Aegis Lights
Self-adaptive urban traffic control system achieving 45–49% reduction in average trip time across all traffic density scenarios.
Overview
Aegis Lights is a self-adaptive infrastructure system designed to optimize urban traffic signal timing in real time. Traditional fixed-cycle signals cannot react to dynamic conditions — accidents, surges, irregular flows — resulting in unnecessary congestion. Aegis Lights applies a closed-loop control architecture to sense, reason, and adapt signal phases continuously, deployed as a containerized microservice system on IBM Cloud.
Problem Statement
Static traffic signal phases fail under variable load. A single incident can cause queue spillback that propagates through an entire intersection network. The goal was to build a controller that could detect anomalous conditions and adapt signal timing without human intervention, using only real-time queue-length and flow data.
Approach
MAPE-K Feedback Loop
The system is architected around the Monitor-Analyze-Plan-Execute-Knowledge (MAPE-K) feedback loop — the standard pattern for self-adaptive systems. Each component has a well-defined responsibility:
- Monitor: Virtual sensor nodes continuously poll the CityFlow simulation for queue lengths, vehicle counts, and incident flags at each intersection lane.
- Analyze: Computes an incident-aware cost model that penalizes queue spillback and flags abnormal congestion states.
- Plan: Selects the next signal phase using a Contextual Bandit algorithm (UCB), balancing exploration of untested timing patterns with exploitation of known low-congestion phases.
- Execute: Applies the selected phase through the simulation API; phase changes are atomic and validated before execution.
- Knowledge: Persists per-intersection statistics, phase history, and incident logs to a SQLite/Parquet knowledge store to inform future decisions.
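One pass through the five stages above can be sketched as follows. This is a minimal illustration, not the project's code: the `LaneReading` fields, the penalty weights, the greedy placeholder planner, and the in-memory `Knowledge` class (standing in for the SQLite/Parquet store) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LaneReading:
    """One virtual-sensor sample for a lane (field names are illustrative)."""
    queue_length: int       # vehicles currently waiting
    capacity: int           # vehicles the lane holds before spillback
    incident: bool = False  # incident flag reported for this lane


@dataclass
class Knowledge:
    """In-memory stand-in for the SQLite/Parquet knowledge store."""
    phase_history: list = field(default_factory=list)
    cost_history: list = field(default_factory=list)


def analyze(readings, spillback_weight=5.0, incident_weight=2.0):
    """Incident-aware cost: queue length plus penalties (weights are assumed)."""
    cost = 0.0
    for r in readings.values():
        lane_cost = float(r.queue_length)
        if r.queue_length >= r.capacity:  # queue has filled the lane
            lane_cost += spillback_weight * r.queue_length
        if r.incident:                    # incidents scale the lane's cost
            lane_cost *= incident_weight
        cost += lane_cost
    return cost


def plan(readings, phases):
    """Placeholder planner: pick the phase that serves the most queued vehicles."""
    return max(phases, key=lambda p: sum(readings[lane].queue_length for lane in phases[p]))


def mape_k_step(monitor, execute, phases, knowledge):
    readings = monitor()                   # Monitor
    cost = analyze(readings)               # Analyze
    phase = plan(readings, phases)         # Plan
    execute(phase)                         # Execute
    knowledge.phase_history.append(phase)  # Knowledge
    knowledge.cost_history.append(cost)
    return phase, cost


readings = {
    "north": LaneReading(queue_length=4, capacity=10),
    "east": LaneReading(queue_length=12, capacity=10, incident=True),
}
phases = {"NS_green": ["north"], "EW_green": ["east"]}
applied = []
kb = Knowledge()
phase, cost = mape_k_step(lambda: readings, applied.append, phases, kb)
print(phase, cost)  # EW_green 148.0
```

Because `monitor`, `plan`, and `execute` are passed around as plain callables, the planner can be replaced without changing the other stages.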
System Architecture & Deployment
The system is deployed as a suite of containerized microservices on IBM Cloud, orchestrated via Kubernetes Jobs and CronJobs:
- CityFlow Simulator: The traffic simulation engine runs as a K8s Job, exposing its state to the rest of the system.
- Virtual Sensors (Python): Lightweight sensor processes poll the simulator and publish readings to the knowledge store.
- MAPE-K Service (Python FastAPI): The core adaptive controller — implements Monitor, Analyze, Plan, Execute as REST endpoints. FastAPI with mutex locking manages concurrent read/write access to simulation state.
- Signal Controllers (Python): Receive phase commands from the MAPE-K service and apply them to the simulator.
- Knowledge Store: SQLite/Parquet for lightweight, schema-flexible persistence of sensor readings, phase history, and incident logs.
- IBM Cloud Object Storage (COS): Long-term artifact storage — simulation logs, performance reports.
- IBM Log Analysis: Centralized logging and observability across all microservices.
- IBM Container Registry (ICR): All service images are built and stored in ICR, pulled by K8s at deploy time.
- IBM Cloud Monitoring (Sysdig): Real-time metrics and alerting for system health.
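The mutex-guarded access to simulation state mentioned for the MAPE-K service could look roughly like the sketch below. It uses only the standard library's `threading.Lock`; the `SimulationState` class, its method names, and the phase identifiers are hypothetical, and the real service may structure its locking differently (for example, with `asyncio.Lock` inside async handlers).

```python
import threading


class SimulationState:
    """Shared signal state guarded by a single mutex (hypothetical class)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._phases = {}   # intersection id -> current phase id
        self._version = 0   # incremented on every successful write

    def read(self, intersection_id):
        """Return (phase, version) as one consistent snapshot."""
        with self._lock:
            return self._phases.get(intersection_id), self._version

    def apply_phase(self, intersection_id, phase_id, valid_phases):
        """Validate first, then write under the lock; False means rejected."""
        if phase_id not in valid_phases:
            return False
        with self._lock:
            self._phases[intersection_id] = phase_id
            self._version += 1
        return True


state = SimulationState()
valid = {"NS_green", "EW_green"}


def writer(phase_id):
    # Each writer stands in for a concurrent execute call.
    for _ in range(100):
        state.apply_phase("intersection_0", phase_id, valid)


threads = [threading.Thread(target=writer, args=(p,)) for p in ["NS_green", "EW_green"] * 4]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state.read("intersection_0")[1])  # 800: no write was lost
```

Validation happens before the lock is taken, so an invalid command never blocks readers, and the write itself is a short critical section.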
Contextual Bandit Controller
Signal phase selection is framed as a multi-armed bandit problem where each "arm" represents a pre-validated phase configuration from a phase library. The Upper Confidence Bound (UCB) algorithm was chosen over epsilon-greedy for its principled exploration bonus, which ensures phases that haven't been recently tested remain in consideration even when a strong default exists.
The reward signal is the negative of the incident-aware cost model — phases that reduce queue buildup and avoid spillback receive higher rewards and are selected more frequently over time.
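To make the selection rule concrete, here is a minimal sketch of the classic, non-contextual UCB1 rule with the reward defined as the negative cost. How context enters the project's bandit is not described here, so this sketch leaves it out; the class name, the exploration constant, and the phase names and costs in the demo are assumptions for illustration.

```python
import math


class UCB1PhaseSelector:
    """UCB1 over a fixed phase library (illustrative, non-contextual)."""

    def __init__(self, phase_ids, exploration=2.0):
        self.phase_ids = list(phase_ids)
        self.exploration = exploration  # assumed constant; tune to the reward scale
        self.counts = {p: 0 for p in self.phase_ids}
        self.mean_reward = {p: 0.0 for p in self.phase_ids}
        self.total = 0

    def select(self):
        # Try every phase once before trusting the statistics.
        for p in self.phase_ids:
            if self.counts[p] == 0:
                return p

        def score(p):
            # Mean reward plus a bonus that grows for rarely tried phases.
            bonus = math.sqrt(self.exploration * math.log(self.total) / self.counts[p])
            return self.mean_reward[p] + bonus

        return max(self.phase_ids, key=score)

    def update(self, phase_id, cost):
        reward = -cost  # lower cost means higher reward
        self.counts[phase_id] += 1
        self.total += 1
        n = self.counts[phase_id]
        # Incremental running mean of the observed rewards.
        self.mean_reward[phase_id] += (reward - self.mean_reward[phase_id]) / n


# Demo with fixed, made-up costs per phase.
true_cost = {"phase_a": 10.0, "phase_b": 2.0, "phase_c": 6.0}
selector = UCB1PhaseSelector(true_cost)
for _ in range(200):
    chosen = selector.select()
    selector.update(chosen, true_cost[chosen])

print(max(selector.counts, key=selector.counts.get))  # phase_b
```

UCB1's standard analysis assumes rewards in a bounded range such as [0, 1]. With unscaled costs, the exploration constant (or a normalization of the cost) determines how often weaker phases get retried.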
Results
- 45–49% reduction in average trip time across low, medium, and high traffic density scenarios compared to fixed-phase baselines.
- Incident-aware cost model successfully rerouted virtual traffic flows when artificial roadway incidents were injected mid-simulation.
- UCB exploration strategy discovered 2–3 non-obvious phase configurations per scenario that outperformed the human-designed defaults.
- FastAPI service handled concurrent monitor and execute calls without deadlock across all test runs.
- Full IBM Cloud deployment demonstrated end-to-end observability via IBM Log Analysis and Sysdig metrics.
Reflection
The MAPE-K pattern proved an excellent fit for this problem — the clean separation of concerns made it straightforward to swap the planning algorithm (e.g., replace UCB with a full RL policy) without touching the monitoring or execution layers. The IBM Cloud deployment surfaced real distributed systems challenges: container startup ordering, mutex contention under concurrent sensor polling, and log aggregation across ephemeral K8s Jobs. The main algorithmic limitation is that the bandit model has no memory of long-term phase sequences; a sequential RL agent (e.g., PPO) could likely push results further by learning multi-step timing strategies.
Assets
Aegis Lights simulation — CityFlow traffic network responding to adaptive signal control in real time.