Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers
Aryaman Gupta, Kaustav Chakraborty, Somil Bansal — 2023 · arXiv
reachability-analysis
anomaly-detection
runtime-monitoring
fallback-controller
out-of-distribution
safety
TL;DR
HJ reachability stress-tests vision-based controllers offline to mine system-level failures; trains a runtime anomaly classifier + fallback controller.
Summary
Vision-based controllers fail catastrophically on OOD inputs, and component-level anomaly detectors miss cascading system-level failures. Offline HJ reachability auto-labels unsafe images; an EfficientNet-B0 classifier detects anomalies at runtime; a fallback slows the aircraft, reducing the backward reachable tube (BRT) volume from 25.75% to 18.24% on the TaxiNet taxiing task.
Key contributions
- HJ reachability-based automatic labeling of system-level anomalous images across diverse simulator conditions, with no manual annotation.
- Runtime anomaly classifier + velocity-reduction fallback that shrinks the closed-loop failure set (BRT volume 25.75% → 18.24%) by ~7.5pp on aircraft taxiing.
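The monitor-plus-fallback loop can be sketched as a thin wrapper around the nominal controller. This is a minimal illustration, not the authors' code: the class name `AnomalyAwareController`, the two speeds, and the 0.5 threshold are hypothetical, and it assumes the anomaly classifier outputs a per-image probability while steering still comes from the vision-based controller.

```python
from dataclasses import dataclass


@dataclass
class AnomalyAwareController:
    """Hypothetical wrapper: a runtime anomaly score gates a velocity-reduction fallback."""
    nominal_speed: float = 5.0   # [m/s], illustrative value
    fallback_speed: float = 1.0  # [m/s], illustrative value
    threshold: float = 0.5       # decision threshold on the anomaly probability

    def command(self, anomaly_prob: float, steering: float) -> tuple[float, float]:
        """Return (speed, steering).

        Steering from the vision-based controller is passed through unchanged;
        only the speed drops when the current image is flagged as anomalous.
        """
        if not 0.0 <= anomaly_prob <= 1.0:
            raise ValueError("anomaly_prob must be in [0, 1]")
        flagged = anomaly_prob >= self.threshold
        speed = self.fallback_speed if flagged else self.nominal_speed
        return speed, steering


print(AnomalyAwareController().command(0.9, 0.1))  # (1.0, 0.1)
```

Only the speed channel is touched here, which matches the velocity-reduction idea: a slower aircraft drifts less far while perception is unreliable, one plausible intuition for why the failure set shrinks.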
Novelty
Extends Chakraborty & Bansal (RA-L 2023) from offline failure discovery to online detection and mitigation, without privileged environment information at test time.
Methods
- Hamilton-Jacobi BRT computation via the Level Set Toolbox over a 101³ grid, in the X-Plane simulator
- EfficientNet-B0 binary classifier trained on a 240K-image, BRT-labeled dataset
- Velocity-reduction fallback controller triggered by the anomaly classifier at runtime
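The BRT-based auto-labeling can be illustrated on a toy problem. This sketch is a stand-in under stated assumptions, not the paper's setup: it uses a hypothetical 2-state taxiing model (crosstrack error `x`, heading `th`), a made-up perception model `perceive` that flips the sign of `x` when `x > 4`, and a hypothetical proportional steering law. It also swaps the Level Set Toolbox's PDE solver for a simple discrete-time dynamic-programming recursion, V_k(s) = min(l(s), V_{k+1}(f(s))), evaluated with bilinear interpolation on the grid. A state is in the BRT when V_0(s) <= 0, an image rendered at such a state gets the anomalous label, and "BRT volume" is taken here as the fraction of grid states inside the BRT.

```python
import numpy as np

# Hypothetical 2-state taxiing model (not the paper's 3-state setup):
# state = (crosstrack error x [m], heading th [rad]); |x| >= 10 m counts as leaving the runway.
X = np.linspace(-11.0, 11.0, 89)
TH = np.linspace(-1.0, 1.0, 41)
DT, SPEED, STEPS = 0.1, 5.0, 50  # 5 s horizon


def perceive(x, th):
    """Hypothetical perception: exact, except the sign of x is flipped when x > 4."""
    return np.where(x > 4.0, -x, x), th


def closed_loop_step(x, th):
    """One Euler step of the closed loop under a hypothetical proportional steering law."""
    x_hat, th_hat = perceive(x, th)
    u = np.clip(-0.1 * x_hat - 1.0 * th_hat, -0.5, 0.5)
    return x + DT * SPEED * np.sin(th), th + DT * u


def bilinear(values, xq, tq):
    """Bilinear interpolation of a grid function; queries outside the grid are clamped to its edge."""
    xq = np.clip(xq, X[0], X[-1])
    tq = np.clip(tq, TH[0], TH[-1])
    i = np.clip(np.searchsorted(X, xq) - 1, 0, len(X) - 2)
    j = np.clip(np.searchsorted(TH, tq) - 1, 0, len(TH) - 2)
    wx = (xq - X[i]) / (X[i + 1] - X[i])
    wt = (tq - TH[j]) / (TH[j + 1] - TH[j])
    return ((1 - wx) * (1 - wt) * values[i, j] + wx * (1 - wt) * values[i + 1, j]
            + (1 - wx) * wt * values[i, j + 1] + wx * wt * values[i + 1, j + 1])


def compute_value():
    """Backward recursion V_k(s) = min(l(s), V_{k+1}(f(s))), starting from V_N = l."""
    xx, tt = np.meshgrid(X, TH, indexing="ij")
    margin = 10.0 - np.abs(xx)         # l(s) <= 0 inside the failure set
    xn, tn = closed_loop_step(xx, tt)  # fixed policy, so successors are computed once
    value = margin.copy()
    for _ in range(STEPS):
        value = np.minimum(margin, bilinear(value, xn, tn))
    return value


def is_anomalous(value, x, th):
    """Auto-label: an image rendered at state (x, th) is anomalous if that state lies in the BRT."""
    return bool(bilinear(value, np.array([x]), np.array([th]))[0] <= 0.0)


def brt_volume_fraction(value):
    """Fraction of grid states inside the BRT (failure set included)."""
    return float(np.mean(value <= 0.0))


value = compute_value()
print(is_anomalous(value, 6.0, 0.0), is_anomalous(value, 0.0, 0.0))
print(f"BRT volume: {100 * brt_volume_fraction(value):.1f}% of grid states")
```

Because the policy is fixed, only the value grid changes between iterations; with image-dependent control on a 101³ grid, each successor state would instead require a simulator-rendered image, which is far more costly to evaluate.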
Strengths
- Outperforms prediction-error and ensemble baselines, with explicit state-space visualization showing why component-level labels mislead.
- Emergent semantic understanding (runway markings, night lights) learned without heuristics, validated against ground-truth BRTs on unseen airports.
Weaknesses
- Single domain (aircraft taxiing) with a 3-state system; the curse of dimensionality of grid-based HJ limits scalability; no real-world validation.
- Fallback controller (velocity reduction) is simplistic; no comparison against more capable alternatives (SLAM, MPC), and no formal safety guarantees.
Future work
- Replace grid-based HJ with DeepReach for high-dimensional systems.
- Temporal (visual-history-aware) anomaly detector; incremental TaxiNet retraining on mined failures.
Key insights
- Component-level prediction error is pessimistic in some state regions and optimistic in others, making it a poor proxy for system-level safety.
- BRT-derived labels let a classifier learn semantic failure modes (runway markings, lighting) without any manual heuristics.
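The first insight can be made concrete by counting where the two labelings disagree over the same set of states. In this hypothetical helper, "pessimistic" means high prediction error although the state is outside the BRT (a system-level false alarm), and "optimistic" means low prediction error although the state is inside the BRT (a missed failure). The threshold and the toy arrays in the usage line are illustrative only, not data from the paper.

```python
import numpy as np


def label_disagreement(pred_error, in_brt, error_threshold):
    """Compare component-level labels (prediction error above a threshold)
    with system-level labels (state lies in the BRT) over the same states."""
    pred_error = np.asarray(pred_error, dtype=float)
    in_brt = np.asarray(in_brt, dtype=bool)
    flagged = pred_error > error_threshold
    return {
        "pessimistic": int(np.sum(flagged & ~in_brt)),  # high error, yet the system stays safe
        "optimistic": int(np.sum(~flagged & in_brt)),   # low error, yet the system fails
        "agree": int(np.sum(flagged == in_brt)),
    }


# Hypothetical toy values: per-state prediction errors and BRT membership.
print(label_disagreement([0.2, 3.0, 0.1, 2.5], [False, False, True, True], 1.0))
# {'pessimistic': 1, 'optimistic': 1, 'agree': 2}
```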
My thoughts
A supervised-learning-based runtime monitor trained for binary classification.
Initial Reaction
Connections
Implementation Notes
Open Questions
Connections
- Unsupervised Discovery of Failure Taxonomies from Deployment Logs: both build runtime monitors for vision-based controllers by automatically mining/labeling system-level failures and detecting OOD anomalies.
Related papers
- Unsupervised Discovery of Failure Taxonomies from Deployment Logs d=0.86
- Deep Reinforcement Learning for Sim-to-Real Policy Transfer of VTOL-UAVs Offshore Docking Operations d=1.00
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success d=1.02
- Reasoning Models Don't Always Say What They Think d=1.02
- DeepThinkVLA: Enhancing Reasoning Capability of Vision-Language-Action Models d=1.05
Extracted by claude-sonnet-4-6.