I study reinforcement learning and optimization, with a focus on building self-evolving agents that handle long-horizon tasks. Overview:
- Long-horizon & self-evolving agents: SUPO and Context-Folding (agents that actively manage their own context, optimized by end-to-end RL), INFUSER (RL with self-synthesized training data).
- Algorithmic foundation of RL: RPO (overoptimization in RLHF), MEX (unified RL exploration strategy).
- Optimization in deep learning: Benign Oscillation (generalization benefit of large learning rates), Momentum in River-valley.
I am a third-year Ph.D. candidate in the Department of Management Science & Engineering at Stanford University, advised by Prof. Jose Blanchet. I received my B.S. in Mathematics from the University of Science and Technology of China. I have interned at NVIDIA Research, ByteDance Seed, and Ubiquant Investment.
miaolu [at] stanford [dot] edu
Research
Selected Publications and Preprints
Authors marked with * contributed equally. See the full publication list in Publications.
Background
Experiences
Education
- Ph.D. candidate in Operations Research, advised by Jose Blanchet
- Department of Management Science & Engineering
- B.S. in Mathematics and Applied Mathematics (GPA: 4.06/4.30, Rank: 2/140+)
- School of the Gifted Young · 41st Guo Moruo Scholarship
Industrial Experience
- Research Scientist Intern
- Research Scientist Intern
- Quantitative Research Intern
Research Visiting
Toyota Technological Institute at Chicago · Jul 2024 – Sep 2024
- Student Visitor, hosted by Tianhao Wang and Zhiyuan Li
- Research Assistant, hosted by Difan Zou
Invited Talks
- Scaling Long Horizon LLM Agent via Reinforcement Learning — TikTok Research Seminar, Remote, 2025 Slides
- Optimal Robust Assortment Learning from Observational Data — INFORMS Annual Meeting, Atlanta, 2025 Slides
- Optimal Computational-Statistical Tradeoff: Single-Index Model — 2nd NeurIPS M3L Workshop, Vancouver, 2024 Slides
- Distributionally Robust RL with Interactive Data Collection — INFORMS Annual Meeting, Seattle, 2024 Slides
- Double Pessimism for Distributionally Robust Offline RL — CISS, Princeton, 2024 · INFORMS Annual Meeting, Phoenix, 2023 Slides
Awards & Honors
- Xinhe Scholarship — outstanding undergraduate researcher, USTC (Mar 2023)
- 41st Guo Moruo Scholarship — highest scholarship from USTC (Sep 2021)
- Yuanqing Yang Scholarship — top scholarship, School of Mathematical Sciences, USTC (Jan 2022)
- S.-T. Yau College Student Mathematics Contests, winning prize (Prob. & Stat. track) (Jun 2020)
- National Scholarship — highest scholarship from the Ministry of Education of China (2019 & 2020)
Professional Service
Annals of Applied Probability (AOAP) · Operations Research (OR) · Management Science (MS) · Mathematics of Operations Research (MOR) · Transactions on Machine Learning Research (TMLR)
NeurIPS (2023–2026) · ICLR (2024–2026) · ICML (2024–2026) · AISTATS (2025) · AAAI (2025) · ICML Workshops (ARLET, EXAIT, MOSS) · NeurIPS M3L