Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations
Ali M. Ali, Aryaman Gupta, Hashim A. Hashim — 2024 · arXiv
sim-to-real
hierarchical-policy
domain-randomization
UAV
fallback-controller
safety
TL;DR
Hierarchical DRL (model-based approach phase + PPO landing phase) for autonomous VTOL-UAV docking on wave-disturbed offshore platforms, targeting sim-to-real transfer.
Summary
Decomposes offshore UAV landing into a model-based approach phase and a DRL landing phase, and randomizes the wave disturbance in every episode using the JONSWAP spectrum to promote sim-to-real generalization. PPO outperforms the DQN variants, reaching a 0.327 m/s impact velocity versus 0.820 m/s for DQN and converging in under 200 episodes.
Key contributions
- Hierarchical decomposition: a model-based approach phase plus an offline-trained PPO landing phase reduces training time and improves the success rate.
- Per-episode domain randomization via the JONSWAP wave spectrum, used as a stochastic wave-disturbance model for sim-to-real policy generalization.
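Per-episode JONSWAP randomization can be sketched as below. This assumes the common DNV-style JONSWAP parameterization (peak enhancement `gamma=3.3`, normalization factor `1 - 0.287 ln(gamma)`) and synthesizes deck heave as a random-phase sum of sinusoids, which plays the same role as the inverse-Fourier-transform synthesis the paper describes. The sea-state ranges in the hypothetical `randomize_episode` and the frequency grid are illustrative placeholders, not values from the paper.

```python
import math
import random

def jonswap(omega, hs, tp, gamma=3.3):
    """One-sided JONSWAP spectral density S(omega) [m^2 s/rad], DNV-style form."""
    wp = 2.0 * math.pi / tp                      # peak angular frequency [rad/s]
    sigma = 0.07 if omega <= wp else 0.09        # peak-width parameter
    # Pierson-Moskowitz base spectrum with the same Hs and peak frequency
    s_pm = (5.0 / 16.0) * hs**2 * wp**4 * omega**-5 * math.exp(-1.25 * (wp / omega) ** 4)
    # Peak-enhancement exponent
    r = math.exp(-((omega - wp) ** 2) / (2.0 * sigma**2 * wp**2))
    return (1.0 - 0.287 * math.log(gamma)) * s_pm * gamma**r

def sample_wave(hs, tp, n_comp=200, w_min=0.2, w_max=3.0, rng=random):
    """Return eta(t), one random realization of deck heave [m] for an episode."""
    dw = (w_max - w_min) / n_comp
    comps = []
    for k in range(n_comp):
        w = w_min + (k + 0.5) * dw                       # midpoint frequency of bin k
        amp = math.sqrt(2.0 * jonswap(w, hs, tp) * dw)   # component amplitude from S(w)
        comps.append((amp, w, rng.uniform(0.0, 2.0 * math.pi)))  # uniform random phase
    return lambda t: sum(a * math.cos(w * t + p) for a, w, p in comps)

def randomize_episode(rng=random):
    """Hypothetical per-episode draw of sea-state parameters plus a heave signal."""
    hs = rng.uniform(0.5, 2.0)   # significant wave height [m] (assumed range)
    tp = rng.uniform(5.0, 10.0)  # peak period [s] (assumed range)
    return hs, tp, sample_wave(hs, tp, rng=rng)
```

A quick sanity check on the normalization is that the spectrum's zeroth moment should come out close to Hs²/16; calling `randomize_episode()` at every environment reset then gives each episode a fresh sea state and phase realization.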
Novelty
Unlike prior DRL work on UAV docking (Hwangbo et al., Koch et al.), which targets static platforms, this work combines phase decomposition with per-episode JONSWAP wave randomization for moving offshore stations.
Methods
- PPO actor-critic with generalized advantage estimation (GAE) for continuous thrust control in the landing phase.
- JONSWAP spectrum + inverse Fourier transform to generate a randomized wave in every episode.
- DQN, Double DQN, and Dueling DQN with experience replay as discrete-action baselines.
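The PPO-with-GAE landing learner rests on two standard computations: the GAE advantage recursion and PPO's clipped surrogate objective. A minimal sketch over plain Python lists follows; the hyperparameters `gamma=0.99`, `lam=0.95`, and `eps=0.2` are common defaults assumed here, not the paper's reported settings, and the actor and critic networks themselves are omitted.

```python
def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    values[t] = V(s_t); last_value = V(s_T) used to bootstrap the final step;
    dones[t] = 1 if the episode terminated after step t (stops bootstrapping).
    """
    T = len(rewards)
    adv = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        next_v = last_value if t == T - 1 else values[t + 1]
        mask = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_v * mask - values[t]  # TD residual
        running = delta + gamma * lam * mask * running           # discounted sum of residuals
        adv[t] = running
    returns = [a + v for a, v in zip(adv, values)]  # critic regression targets
    return adv, returns

def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Negative PPO clipped surrogate, averaged over samples.

    ratios[i] = pi_new(a_i|s_i) / pi_old(a_i|s_i).
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        total += min(r * a, clipped * a)  # pessimistic (lower) bound
    return -total / len(ratios)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to move the probability ratio outside `[1 - eps, 1 + eps]`, which is what keeps the continuous thrust policy's updates conservative.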
Strengths
- Comparing DQN, Double DQN, Dueling DQN, and PPO with identical network architectures enables fair algorithm benchmarking.
- Per-episode JONSWAP randomization is physically grounded domain randomization rather than generic noise injection.
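The three DQN baselines differ mainly in how the bootstrap target and the Q-value head are formed. A minimal sketch of the standard variants over plain Python lists, assuming a discretized thrust action set; the function names are illustrative and `gamma=0.99` is an assumed default, not the paper's setting.

```python
def dqn_target(r, done, q_next_target, gamma=0.99):
    """Vanilla DQN: max over the target network's own next-state estimates."""
    return r + (0.0 if done else gamma * max(q_next_target))

def double_dqn_target(r, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: online net selects the next action, target net evaluates it."""
    a_star = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return r + (0.0 if done else gamma * q_next_target[a_star])

def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

Decoupling action selection from evaluation is what reduces vanilla DQN's overestimation bias, while the dueling head only changes how Q-values are parameterized; both can be combined with the same replay buffer and network trunk.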
Weaknesses
- No real-world hardware experiments; the sim-to-real transfer claim is not validated beyond numerical simulation.
- Models only 1D vertical (z-axis) dynamics; full 6-DOF landing with lateral disturbances is not addressed.
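To make the 1D scope concrete, the vertical-only setting reduces to a point mass driven by thrust against gravity above a heaving deck, with impact velocity measured as the relative vertical speed at touchdown. The sketch below assumes this point-mass model, a hypothetical 1.5 kg mass, semi-implicit Euler integration, and a `deck_z(t)` heave function (for example, one JONSWAP realization); none of these details are taken from the paper. It carries no lateral or attitude state, which is exactly the gap this weakness points to.

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def simulate_landing(policy, deck_z, mass=1.5, dt=0.01, z0=2.0, t_max=20.0):
    """Integrate m * z'' = u - m * g until the UAV reaches the deck.

    policy(h_rel, v_rel) -> thrust [N]; deck_z(t) -> deck height [m].
    Returns (touchdown_time, impact_speed) or None if no touchdown by t_max.
    """
    z, vz, t = z0, 0.0, 0.0
    while t < t_max:
        zd = deck_z(t)
        vd = (deck_z(t + dt) - zd) / dt   # finite-difference deck velocity
        if z <= zd:                        # touchdown: report relative speed
            return t, abs(vz - vd)
        u = policy(z - zd, vz - vd)        # thrust from relative height/velocity
        vz += (u / mass - G) * dt          # semi-implicit Euler: velocity first
        z += vz * dt
        t += dt
    return None
```

With zero thrust over a flat deck the reported speed approaches the free-fall value sqrt(2 g z0), about 6.26 m/s from 2 m, which gives a scale against which the reported 0.327 m/s (PPO) and 0.820 m/s (DQN) impact velocities can be read.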
Future work
- Incorporate visual feedback from an onboard camera into the landing-phase policy.
- Validate the learned policy on physical VTOL-UAV hardware over water.
Key insights
- Phase decomposition (model-based approach + DRL landing) cuts training complexity without sacrificing policy quality.
- In simulation, per-episode JONSWAP randomization appears sufficient for a policy that generalizes across sea states in maritime UAV docking; whether this carries over to real hardware remains untested.
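The phase decomposition can be sketched as a supervisor that runs a model-based approach law until the UAV is close to the deck, then hands control to the learned landing policy. The PD law with gravity feedforward, the gains, the mass, and the hand-off height `h_switch` are all hypothetical choices for illustration; the summary does not specify the paper's approach-phase controller.

```python
def hierarchical_controller(h, v_rel, landing_policy, h_switch=1.0,
                            kp=2.0, kd=1.5, mass=1.5, g=9.81):
    """Return (phase, thrust) for relative height h [m] and relative velocity v_rel [m/s].

    Above h_switch: model-based PD approach toward hovering at h_switch.
    At or below h_switch: delegate to the learned landing policy.
    """
    if h > h_switch:
        # Gravity feedforward plus PD correction toward (h_switch, zero relative velocity)
        u = mass * (g + kp * (h_switch - h) - kd * v_rel)
        return "approach", u
    return "landing", landing_policy(h, v_rel)
```

Because the learned policy only ever sees states near the hand-off point, its training distribution is much narrower than for a policy that must fly the whole approach, which is one plausible reading of why the decomposition shortens training.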
My thoughts
Initial Reaction
Connections
Implementation Notes
Open Questions
Extracted by claude-sonnet-4-6.