I study reinforcement learning and optimization. I work on building evolving agents that handle long-horizon tasks. Overview:

  • Long-horizon & self-evolving agents: SUPO and Context-Folding (agents that actively manage their own context, optimized by end-to-end RL), INFUSER (RL with self-synthesized training data).
  • Algorithmic foundation of RL: RPO (overoptimization in RLHF), MEX (unified RL exploration strategy).
  • Optimization in deep learning: Benign Oscillation (generalization benefit of large LR), Momentum in River-valley.

I am a 3rd-year Ph.D. candidate in Management Science & Engineering at Stanford University, advised by Prof. Jose Blanchet. I received my B.S. in Mathematics from the University of Science and Technology of China. I have interned at NVIDIA Research, ByteDance Seed, and Ubiquant Investment.

miaolu [at] stanford [dot] edu

Research

Selected Publications and Preprints

Authors with * contributed equally. See the full publication list in Publications.

Siyu Chen, Miao Lu, Beining Wu, Heejune Sheen, Fengzhuo Zhang, Shuangning Li, Zhiyuan Li, Jose Blanchet, Tianhao Wang, Zhuoran Yang
arXiv preprint, Jun 2026
Towards Understanding Momentum Acceleration in River-Valley Loss Landscape
Miao Lu, Zeyu Bian, Kaiyue Wen, Beining Wu, Siyu Chen, Tianhao Wang, Zhiyuan Li
ICML Workshop on High-dimensional Learning Dynamics (HiLD) 2026 Spotlight
Weiwei Sun, Miao Lu, Zhan Ling, Kang Liu, Xuesong Yao, Yiming Yang, Jiecao Chen
International Conference on Machine Learning (ICML) 2026
Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, Jiecao Chen
Association for Computational Linguistics (ACL) 2026
Siyu Chen*, Beining Wu*, Miao Lu, Zhuoran Yang, Tianhao Wang
NeurIPS Workshop on Mathematics of Modern ML (M3L) 2024 Oral
International Conference on Learning Representations (ICLR) 2025 Oral
Zhihan Liu*, Miao Lu*, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
ICML Workshop on Aligning RL Experimentalists and Theorists (ARLET) 2024
Neural Information Processing Systems (NeurIPS) 2024
Miao Lu*, Beining Wu*, Xiaodong Yang, Difan Zou
NeurIPS Workshop on Mathematics of Modern ML (M3L) 2023
International Conference on Learning Representations (ICLR) 2024
Zhihan Liu*, Miao Lu*, Wei Xiong*, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
Neural Information Processing Systems (NeurIPS) 2023 Spotlight
Under review at Operations Research (OR)
Miao Lu*, Xiaolong Luo*, Tianlong Chen, Wuyang Chen, Dong Liu, Zhangyang Wang
International Conference on Learning Representations (ICLR) 2022 Spotlight

Background

Experiences

Education

Stanford University · Sep 2023 – present
  • Ph.D. candidate in Operations Research, advised by Jose Blanchet
  • Department of Management Science & Engineering
University of Science and Technology of China · Sep 2018 – Jun 2022
  • B.S. in Mathematics and Applied Mathematics (GPA: 4.06/4.30, Rank: 2/140+)
  • School of the Gifted Young · 41st Guo Moruo Scholarship

Industrial Experience

NVIDIA Research · Jan 2026 – present
  • Research Scientist Intern
ByteDance Seed · Jun 2025 – Dec 2025
  • Research Scientist Intern
Ubiquant Investment · Jun 2022 – Sep 2022
  • Quantitative Research Intern

Research Visiting

Toyota Technological Institute at Chicago · Jul 2024 – Sep 2024
  • Student Visitor, hosted by Tianhao Wang and Zhiyuan Li
The University of Hong Kong · Apr 2023 – Aug 2023
  • Research Assistant, hosted by Difan Zou

Invited Talks

  • Scaling Long Horizon LLM Agent via Reinforcement Learning — TikTok Research Seminar, Remote, 2025  Slides
  • Optimal Robust Assortment Learning from Observational Data — INFORMS Annual Meeting, Atlanta, 2025  Slides
  • Optimal Computational-Statistical Tradeoff: Single-Index Model — 2nd NeurIPS M3L Workshop, Vancouver, 2024  Slides
  • Distributionally Robust RL with Interactive Data Collection — INFORMS Annual Meeting, Seattle, 2024  Slides
  • Double Pessimism for Distributionally Robust Offline RL — CISS, Princeton, 2024 · INFORMS Annual Meeting, Phoenix, 2023  Slides

Awards & Honors

  • Xinhe Scholarship — outstanding undergraduate researcher, USTC (Mar 2023)
  • 41st Guo Moruo Scholarship — highest scholarship from USTC (Sep 2021)
  • Yuanqing Yang Scholarship — top scholarship, School of Mathematical Sciences, USTC (Jan 2022)
  • S.-T. Yau College Student Mathematics Contests, winning prize (Prob. & Stat. track) (Jun 2020)
  • National Scholarship — highest scholarship from the Ministry of Education of China (2019 & 2020)

Professional Service

Journal Reviewer

Annals of Applied Probability (AOAP) · Operations Research (OR) · Management Science (MS) · Mathematics of Operations Research (MOR) · Transactions on Machine Learning Research (TMLR)

Conference Reviewer

NeurIPS (2023–2026) · ICLR (2024–2026) · ICML (2024–2026) · AISTATS (2025) · AAAI (2025) · ICML Workshops (ARLET, EXAIT, MOSS) · NeurIPS M3L