Aryaman's research archive
Robotics · VLA · world models · reasoning
-
GRPO vs DAPO
Jun 7, 2026 · Note
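1. Asymmetric clipping: DAPO clips the policy ratio with separate lower and upper epsilons, setting the upper epsilon higher so that low-probability tokens can still gain probability mass. This maintains exploration and prevents entropy collapse.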
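A minimal sketch of the asymmetric clip on a single token, assuming the standard PPO-style clipped surrogate. The function name `clipped_surrogate` and the default values (`eps_low=0.2`, `eps_high=0.28`) are illustrative choices for this note, not taken from it.

```python
def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """Per-token clipped surrogate with separate lower and upper clip bounds.

    ratio     : pi_new(token) / pi_old(token)
    advantage : advantage estimate for that token
    eps_low, eps_high : illustrative defaults (assumptions for this sketch)
    """
    # Clamp the ratio into [1 - eps_low, 1 + eps_high].
    clipped_ratio = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


# Symmetric clipping (eps = 0.2 on both sides) caps a positive-advantage
# token at ratio 1.2; the higher upper bound lets the same token reach 1.25.
symmetric = clipped_surrogate(1.25, 1.0, eps_low=0.2, eps_high=0.2)
asymmetric = clipped_surrogate(1.25, 1.0)
```

With a positive advantage, the larger upper bound leaves more room for a token's probability to grow before its gradient is cut off. The lower bound is unchanged, so it limits decreases exactly as in symmetric clipping.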
-
Off-Policy Drift
Jun 7, 2026 · Note
- Classical synchronous RL training waits for data generation to finish before doing any gradient update, so the training GPUs sit idle while rollouts are being generated.
-
Reasoning Models Don't Always Say What They Think
May 30, 2026 · Yanda Chen, Joe Benton, Ansh Radhakrishnan, Jonathan Uesato, … — 2025 · arXiv
CoT reasoning in SOTA models (Claude 3.7 Sonnet, DeepSeek R1) is often unfaithful: hint usage is verbalized <20% of the time in many settings; RL doesn't fix it; reward hacking is rarely verbalized.
chain-of-thought safety reward-hacking runtime-monitoring alignment-faking faithfulness
-
DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models
May 29, 2026 · Cheng Yin, Yankai Lin, Wang Xu, Sikyuen Tam, … — 2025 · arXiv
Identifies two necessary conditions for CoT to work in VLA, then builds DeepThinkVLA (hybrid-attention + SFT→RL) achieving SOTA on LIBERO, LIBERO-Plus, RoboTwin 2.0.
chain-of-thought vla out-of-distribution action-chunking robot-foundation-model hierarchical-policy
-
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
May 28, 2026 · Moo Jin Kim, Chelsea Finn, Percy Liang — 2025 · arXiv
Systematic VLA fine-tuning study yields OpenVLA-OFT: parallel decoding + action chunking + continuous L1 regression achieves 97.1% on LIBERO and outperforms π0/RDT-1B on ALOHA.
vla dexterous-manipulation robot-foundation-model action-chunking film-conditioning
-
Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations
May 28, 2026 · Ali M. Ali, Aryaman Gupta, Hashim A. Hashim — 2024 · arXiv
Hierarchical DRL (model-based approach + PPO landing) for VTOL-UAV autonomous docking on wave-disturbed offshore platforms with sim-to-real transfer.
sim-to-real hierarchical-policy domain-randomization UAV fallback-controller safety
-
Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers
May 28, 2026 · Aryaman Gupta, Kaustav Chakraborty, Somil Bansal — 2023 · arXiv
HJ reachability stress-tests vision-based controllers offline to mine system-level failures; trains runtime anomaly classifier + fallback controller.
reachability-analysis anomaly-detection runtime-monitoring fallback-controller out-of-distribution safety
-
Unsupervised Discovery of Failure Taxonomies from Deployment Logs
May 28, 2026 · Aryaman Gupta, Yusuf Umut Ciftci, Somil Bansal — 2025 · arXiv
Unsupervised framework: VLM-inferred failure explanations + LLM clustering → interpretable failure taxonomies from raw deployment logs.
failure-analysis unsupervised-clustering deployment-logs runtime-monitoring chain-of-thought data-collection
-
π0: A Vision-Language-Action Flow Model for General Robot Control
May 28, 2026 · Kevin Black, Noah Brown, Danny Driess, Adnan Esmail, … — 2024 · arXiv
π0: 3.3B VLA built on PaliGemma + flow-matching action expert, pre-trained on 10k hrs cross-embodiment data, enabling dexterous long-horizon manipulation.
vla flow-matching robot-foundation-model dexterous-manipulation cross-embodiment hierarchical-policy