Miao Lu

I study reinforcement learning and optimization. I work on building evolving agents that handle long-horizon tasks. Overview:

Long-horizon & self-evolving agents: SUPO and Context-Folding (agents that actively manage their own context, optimized end-to-end with RL), INFUSER (RL with self-synthesized training data).
Algorithmic foundation of RL: RPO (overoptimization in RLHF), MEX (unified RL exploration strategy).
Optimization in deep learning: Benign Oscillation (generalization benefit of large learning rates), Momentum in River-Valley (momentum acceleration in river-valley loss landscapes).

I am a 3rd-year Ph.D. candidate in the Department of Management Science & Engineering at Stanford University, advised by Prof. Jose Blanchet. I received my B.S. in Mathematics from the University of Science and Technology of China. I have interned at NVIDIA Research, ByteDance Seed, and Ubiquant Investment.

miaolu [at] stanford [dot] edu

Selected Publications and Preprints

Authors marked with * contributed equally. See the full publication list under Publications.

INFUSER: Influence-Guided Self-Evolution Improves Reasoning

Siyu Chen, Miao Lu, Beining Wu, Heejune Sheen, Fengzhuo Zhang, Shuangning Li, Zhiyuan Li, Jose Blanchet, Tianhao Wang, Zhuoran Yang

COLM Workshop on Scientific Understanding of Foundation Models (SciFM) 2026

Towards Understanding Momentum Acceleration in River-Valley Loss Landscape

Miao Lu, Zeyu Bian, Kaiyue Wen, Beining Wu, Siyu Chen, Tianhao Wang, Zhiyuan Li

ICML Workshop on High-dimensional Learning Dynamics (HiLD) 2026 Spotlight

Scaling Long-Horizon LLM Agent via Context Folding

Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen

International Conference on Machine Learning (ICML) 2026

Beyond the Context Window: Scaling Agentic RL via End-to-end Optimized Context Compression

Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, Jiecao Chen

Association for Computational Linguistics (ACL) 2026

Can Neural Networks Achieve Optimal Computational-Statistical Tradeoff? An Analysis on Single-Index Model

Siyu Chen*, Beining Wu*, Miao Lu, Zhuoran Yang, Tianhao Wang

NeurIPS Workshop on Mathematics of Modern ML (M3L) 2024 Oral

International Conference on Learning Representations (ICLR) 2025 Oral

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

Zhihan Liu*, Miao Lu*, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang

ICML Workshop on Aligning RL Experimentalists and Theorists (ARLET) 2024

Neural Information Processing Systems (NeurIPS) 2024

Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates

Miao Lu*, Beining Wu*, Xiaodong Yang, Difan Zou

NeurIPS Workshop on Mathematics of Modern ML (M3L) 2023

International Conference on Learning Representations (ICLR) 2024

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

Zhihan Liu*, Miao Lu*, Wei Xiong*, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang

Neural Information Processing Systems (NeurIPS) 2023 Spotlight

Under review at Operations Research (OR)

Learning Pruning-Friendly Networks via Frank-Wolfe: One-Shot, Any-Sparsity, and No Retraining

Miao Lu*, Xiaolong Luo*, Tianlong Chen, Wuyang Chen, Dong Liu, Zhangyang Wang

International Conference on Learning Representations (ICLR) 2022 Spotlight

Background

Experiences

Education

Stanford University · Sep 2023 – present

Ph.D. candidate in Operations Research, advised by Jose Blanchet
Department of Management Science & Engineering

University of Science and Technology of China · Sep 2018 – Jun 2022

B.S. in Mathematics and Applied Mathematics (GPA: 4.06/4.30, Rank: 2/140+)
School of the Gifted Young · 41st Guo Moruo Scholarship

Industrial Experience

NVIDIA Research · Jan 2026 – present

Research Scientist Intern

ByteDance Seed · Jun 2025 – Dec 2025

Research Scientist Intern

Ubiquant Investment · Jun 2022 – Sep 2022

Quantitative Research Intern

Research Visiting

Toyota Technological Institute at Chicago · Jul 2024 – Sep 2024

Student Visitor, hosted by Tianhao Wang and Zhiyuan Li

The University of Hong Kong · Apr 2023 – Aug 2023

Research Assistant, hosted by Difan Zou

Invited Talks

Scaling Long Horizon LLM Agent via Reinforcement Learning — TikTok Research Seminar, Remote, 2025 (Slides)
Optimal Robust Assortment Learning from Observational Data — INFORMS Annual Meeting, Atlanta, 2025 (Slides)
Optimal Computational-Statistical Tradeoff: Single-Index Model — 2nd NeurIPS M3L Workshop, Vancouver, 2024 (Slides)
Distributionally Robust RL with Interactive Data Collection — INFORMS Annual Meeting, Seattle, 2024 (Slides)
Double Pessimism for Distributionally Robust Offline RL — CISS, Princeton, 2024 · INFORMS Annual Meeting, Phoenix, 2023 (Slides)

Awards & Honors

Xinhe Scholarship — outstanding undergraduate researcher, USTC (Mar 2023)
41st Guo Moruo Scholarship — highest scholarship from USTC (Sep 2021)
Yuanqing Yang Scholarship — top scholarship, School of Mathematical Sciences, USTC (Jan 2022)
S.-T. Yau College Student Mathematics Contests, winning prize (Prob. & Stat. track) (Jun 2020)
National Scholarship — highest scholarship from the Ministry of Education of China (2019 & 2020)

Professional Service

Journal Reviewer

Annals of Applied Probability (AOAP) · Operations Research (OR) · Management Science (MS) · Mathematics of Operations Research (MOR) · Transactions on Machine Learning Research (TMLR)

Conference Reviewer

NeurIPS (2023–2026) · ICLR (2024–2026) · ICML (2024–2026) · AISTATS (2025) · AAAI (2025) · ICML Workshops (ARLET, EXAIT, MOSS) · NeurIPS M3L