Detecting and Mitigating System-Level Anomalies of Vision-Based Controllers

Aryaman Gupta, Kaustav Chakraborty, Somil Bansal — 2023 · arXiv

Added May 28, 2026

reachability-analysis anomaly-detection runtime-monitoring fallback-controller out-of-distribution safety

TL;DR

Hamilton-Jacobi (HJ) reachability stress-tests vision-based controllers offline to mine system-level failures; the mined failures train a runtime anomaly classifier that triggers a fallback controller.

Summary

Vision-based controllers can fail catastrophically on out-of-distribution (OOD) inputs, and component-level anomaly detectors miss failures that cascade through the closed loop. Offline HJ reachability automatically labels unsafe images, an EfficientNet-B0 classifier detects anomalies at runtime, and a fallback controller slows the aircraft, reducing the backward reachable tube (BRT) volume from 25.75% to 18.24% on the TaxiNet taxiing task.

Key contributions

— HJ reachability-based automatic labeling of system-level anomalous images across diverse simulator conditions, with no manual annotation.
— Runtime anomaly classifier plus velocity-reduction fallback that shrinks the closed-loop failure set by about 7.5 percentage points (BRT volume from 25.75% to 18.24%) on aircraft taxiing.
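
The reported improvement can be read either as an absolute or as a relative change. The short calculation below uses only the two BRT volumes quoted in the summary; the variable names are illustrative.

```python
# BRT volume as a percentage of the state space, as quoted in the summary.
brt_without_fallback = 25.75
brt_with_fallback = 18.24

# Absolute change in percentage points, and change relative to the baseline.
absolute_drop_pp = brt_without_fallback - brt_with_fallback
relative_drop = absolute_drop_pp / brt_without_fallback

print(f"absolute reduction: {absolute_drop_pp:.2f} percentage points")
print(f"relative reduction: {relative_drop:.1%}")
```

This prints an absolute reduction of 7.51 percentage points, which is roughly a 29% relative shrinkage of the failure set.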

Novelty

Extends Chakraborty & Bansal (RA-L 2023) from offline failure discovery to online detection and mitigation, without privileged environment information at test time.

Methods

— Hamilton-Jacobi BRT computation via the Level Set Toolbox over a 101³ state-space grid, using the X-Plane simulator
— EfficientNet-B0 binary classifier trained on a BRT-labeled dataset of 240K images
— Velocity-reduction fallback controller triggered by anomaly classifier at runtime
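
The three steps above can be illustrated on a toy problem. The sketch below is not the paper's pipeline: it replaces the HJ solve with plain forward rollouts of a deterministic 1-D closed loop, replaces the image classifier with a nearest-grid lookup of the offline labels, and uses made-up dynamics, constants, and a made-up perception failure. It only shows the structure: label states offline by BRT membership, monitor at runtime, and slow down when the monitor fires.

```python
from typing import Callable, Dict, List

# All constants below are hypothetical values chosen for the toy example.
HALF_WIDTH = 10.0          # |x| >= HALF_WIDTH counts as leaving the runway [m]
DT, HORIZON = 0.1, 2.0     # integration step and time horizon [s]
NOMINAL_SPEED = 3.0        # [m/s]
FALLBACK_SPEED = 0.8       # reduced speed used when the monitor fires [m/s]
GRID: List[float] = [0.5 * i for i in range(-19, 20)]  # x from -9.5 to 9.5


def perceive(x: float) -> float:
    """Toy perception: the estimate flips sign for x > 6 (a made-up failure mode)."""
    return -x if x > 6.0 else x


def controller(x_hat: float) -> float:
    """Saturated proportional command that steers toward the centerline."""
    return max(-1.0, min(1.0, -x_hat))


def reaches_failure(x0: float, speed_fn: Callable[[float], float]) -> bool:
    """Roll the closed loop forward; True if the runway is left within HORIZON."""
    x = x0
    for _ in range(round(HORIZON / DT)):
        if abs(x) >= HALF_WIDTH:
            return True
        x += DT * speed_fn(x) * controller(perceive(x))
    return abs(x) >= HALF_WIDTH


# Offline: label every grid state by membership in the (approximate) BRT.
labels: Dict[float, bool] = {
    x: reaches_failure(x, lambda _x: NOMINAL_SPEED) for x in GRID
}


def monitor(x: float) -> bool:
    """Stand-in for the learned classifier: nearest-grid lookup of the offline label."""
    nearest = min(GRID, key=lambda g: abs(g - x))
    return labels[nearest]


def speed_with_fallback(x: float) -> float:
    """Slow down whenever the monitor flags the current state as anomalous."""
    return FALLBACK_SPEED if monitor(x) else NOMINAL_SPEED


# Online: recompute the unsafe set with the monitor and fallback in the loop.
unsafe_nominal = sum(labels.values())
unsafe_fallback = sum(reaches_failure(x, speed_with_fallback) for x in GRID)
print(f"unsafe grid states: {unsafe_nominal} -> {unsafe_fallback} of {len(GRID)}")
```

In this toy setup the unsafe set shrinks from 7 to 3 of 39 grid states. Note that the lookup monitor reads the true state, which is exactly the privileged information the paper's image-based classifier is meant to avoid needing at test time.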

Strengths

— Outperforms prediction-error and ensemble baselines with explicit state-space visualization showing why component-level labels mislead.
— Emergent semantic understanding (runway markings, night lights) learned without heuristics and validated against ground-truth BRTs on unseen airports.

Weaknesses

— Single domain (aircraft taxiing) and a 3-state system; the curse of dimensionality limits the scalability of grid-based HJ, and real-world validation is absent.
— Fallback controller (velocity reduction) is trivial; no comparison against more capable alternatives (SLAM, MPC) or formal safety guarantees.

Future work

— Replace grid-based HJ with DeepReach for high-dimensional systems.
— Temporal/visual-history-aware anomaly detector; incremental TaxiNet retraining on mined failures.

Key insights

— Component-level prediction error is pessimistic in some state regions and optimistic in others, which makes it a poor proxy for system-level safety.
— BRT-derived labels enable a classifier to learn semantic failure modes (runway markings, lighting) without any manual heuristics.
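
One way to make the first insight concrete is to count where a threshold on prediction error disagrees with the BRT label. The sketch below is a minimal illustration with made-up numbers; the function name, the threshold, and the data are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def compare_labels(
    pred_error: List[float], in_brt: List[bool], threshold: float
) -> Tuple[int, int]:
    """Count disagreements between a component-level flag and the BRT label.

    pessimistic: error above threshold, yet the state is outside the BRT.
    optimistic:  error at or below threshold, yet the state is inside the BRT.
    """
    pairs = list(zip(pred_error, in_brt))
    pessimistic = sum(1 for e, unsafe in pairs if e > threshold and not unsafe)
    optimistic = sum(1 for e, unsafe in pairs if e <= threshold and unsafe)
    return pessimistic, optimistic


# Made-up per-state prediction errors and BRT membership flags.
pred_error = [0.1, 0.9, 0.8, 0.2, 0.3, 0.7]
in_brt = [False, False, True, True, False, True]

pessimistic, optimistic = compare_labels(pred_error, in_brt, threshold=0.5)
print(f"pessimistic: {pessimistic}, optimistic: {optimistic}")
```

Both counts are nonzero here (one of each), so no single threshold on prediction error reproduces the system-level label for this made-up data.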

My thoughts

A supervised-learning-based runtime monitor, trained as a binary classifier.

Connections

Unsupervised Discovery of Failure Taxonomies from Deployment Logs
Both build runtime monitors for vision-based controllers by automatically mining/labeling system-level failures and detecting OOD anomalies.

Extracted by claude-sonnet-4-6.