Edge AI Traffic Analytics and violation Detection

Team: Ashish Kumar Verma, Uppugunduru Sriram Anush, Prince Kumar
GitHub: https://github.com/Prince-IISc-CalUniv/Edge-AI-Traffic-Analytics-and-violation-Detection

1. Problem Statement, Motivation and Objectives

India’s traffic landscape presents some of the most complex and chaotic road conditions in the world. With over 1.55 lakh fatalities in road accidents annually (MoRTH 2023), key violations such as helmetless riding, triple riding on two-wheelers, overspeeding, and poor road infrastructure such as potholes remain leading contributors to road injuries and deaths. Manual enforcement is labour-intensive, inconsistent, and cannot scale across the country’s vast road network.

Traditional AI-based traffic systems rely on cloud processing, which introduces high latency (200-500 ms round-trip), privacy concerns (video data leaving the premise), bandwidth costs, and single points of failure. Edge AI offers a compelling alternative: models compressed to run directly on low-cost devices like the Raspberry Pi can process video locally, in real time, with no internet dependency.

This project demonstrates a multi-task edge AI traffic monitoring system that performs three distinct detection tasks simultaneously on a single Raspberry Pi 5, using separate YOLOv8/v11 models compressed via NCNN and TFLite for real-time performance.

Key Objectives

Detect helmet violations and triple-riding on two-wheelers using rider-to-bike spatial association logic
Estimate vehicle speed via homography-based calibration and flag overspeeding; compute real-time congestion levels (LOS A-F) and volume statistics across vehicle categories
Detect road surface potholes to assess road infrastructure quality
Compress all models (pruning, FP16 quantization, NCNN/TFLite export) to achieve real-time inference (>2 FPS) on the Raspberry Pi 5
Deploy an integrated, self-contained edge system using the Raspberry Pi Camera Module with no cloud or internet required

2. Proposed Solution

The system follows a three-pipeline architecture where each pipeline handles a distinct traffic monitoring task, and a unified inference script orchestrates all three on the edge device.

System Architecture

+---------------------------------------------------------------+
|                    RASPBERRY PI 5 (8 GB)                      |
|                                                               |
|  +----------+    +--------------------------------------+     |
|  | PiCamera |--> |       Unified Inference Engine       |     |
|  | Module   |    |  traffic_monitor.py                  |     |
|  +----------+    |                                      |     |
|                  |  +------------+  +----------------+  |     |
|                  |  | Pipeline 1 |  | Pipeline 2     |  |     |
|                  |  | Helmet &   |  | Traffic Speed, |  |     |
|                  |  | Triple-    |  | Volume, LOS,   |  |     |
|                  |  | Riding     |  | Congestion     |  |     |
|                  |  | NCNN FP16  |  | NCNN FP16      |  |     |
|                  |  +------------+  +----------------+  |     |
|                  |  +--------------------------------+  |     |
|                  |  |        Pipeline 3              |  |     |
|                  |  | Pothole & Traffic Sign Detect. |  |     |
|                  |  | ONNX FP16 / TFLite FP16        |  |     |
|                  |  +--------------------------------+  |     |
|                  +--------------------------------------+     |
|                           |                                   |
|               +-----------+-----------+                       |
|               v           v           v                       |
|         Annotated    Speed Plot   CSV Stats                   |
|         Video        (PNG)        (per-frame)                 |
+---------------------------------------------------------------+

End-to-End Pipeline

Data Collection   -->  Model Training   -->  Compression    -->  Deployment
(Roboflow, HF,         (YOLOv8n/v11n,       (NCNN FP16,        (Raspberry Pi 5,
 Kaggle datasets)       Kaggle GPU)          TFLite FP16)        PiCamera, CV2)

Data: Curated datasets from Roboflow Universe, HuggingFace (UVH-26), and Kaggle are downloaded, remapped to unified class schemas, merged, and split into train/val/test sets.

Model: YOLOv8-Nano and YOLOv11-Nano architectures are fine-tuned on Kaggle GPUs (T4/P100) with domain-specific augmentations (HSV jitter, mosaic, mixup, flip).

Compression: Trained .pt weights are exported to NCNN (FP16) and TFLite (FP16) formats, reducing model size by ~50% while maintaining >95% of original mAP.

Deployment: A single Python script loads all three compressed models, reads the PiCamera stream, runs detection + tracking (ByteTrack), and produces annotated video with real-time HUD overlays.

3. Hardware and Software Setup

Hardware

Component	Specification	Role
Raspberry Pi 5	8 GB RAM, Broadcom BCM2712 (Cortex-A76)	Edge compute platform
PiCamera Module	1280x720 @ 30 FPS (RGB888)	Video input source
MicroSD Card	64 GB UHS-I	OS + model storage
Power Supply	USB-C 5V/5A	Pi power
Display	HDMI monitor (optional)	Real-time visualization

Software

Tool	Version	Purpose
Python	3.11	Runtime
Ultralytics	>= 8.2.0	YOLOv8/v11 training, export, inference
OpenCV	>= 4.8	Video I/O, homography, drawing
NCNN	Latest	Optimized inference on ARM CPUs
NumPy	>= 1.24	Numerical operations
Matplotlib	>= 3.7	Speed distribution plots
picamera2	>= 0.3	Raspberry Pi camera driver
ByteTrack	Built-in (Ultralytics)	Multi-object tracking
Kaggle Notebooks	T4/P100 GPU	Cloud training environment

4. Data Collection and Dataset Preparation

Pipeline 1 - Helmet and Triple-Riding Detection

Data Sources: 4 Roboflow Universe datasets merged into unified schema:

#	Dataset	Source	Images	Original Classes
1	Rider Helmet Detection	Yosia Aser (Roboflow)	3,146	helm, pejalan, pemotor, tanpa-helm
2	Bike Helmet	Santhosh (Roboflow)	1,493	bike, With Helmet, Without Helmet
3	helmet-bike	wangbo (Roboflow)	1,680	bike, electric_bike, with_helmet, without_helmet
4	Helmet Rider Detection	CS 174 (Roboflow)	2,099	full-faced, half-faced, helmetRider, invalid, no-helmet, noHelmetRider

Total: ~8,418 images all remapped to the same 4-class schema.

Unified Class Schema (4 classes):

ID	Class	Description
0	motorcycle	Two-wheelers
1	person	Riders and pedestrians
2	helmet	Head with helmet
3	no_helmet	Head without helmet

Remapping Process:

Each dataset has its own mapping preset (e.g., helm -> helmet, pemotor -> motorcycle)
Invalid/ambiguous classes (e.g., invalid, pejalan) are dropped
All datasets are merged into a single train/val/test split with deduplication

Pipeline 2 - Overspeeding, Volume and Congestion

Data Source: UVH-26 Dataset (HuggingFace)

Indian urban traffic dataset with 26 vehicle classes
COCO-format annotations converted to YOLO format
50% subset sampled (balanced across categories)

Preprocessing:

Downloaded COCO annotation JSONs from HuggingFace Hub
Parallel bulk-downloaded images (8 threads)
Converted COCO bounding boxes to YOLO normalized format
Created data.yaml with train/val splits (85/15 ratio)
Verified annotations with visual sanity checks

Pipeline 3 - Pothole Detection

Data Source: Pothole Image Dataset (Kaggle)

Curated dataset focusing on road surface anomalies and structural damage
Bounding box annotations representing varied lighting and weather conditions

Preprocessing:

Downloaded and extracted raw image data and annotations from Kaggle
Re-mapped all target annotations strictly to ID 0 (pothole) to ensure a unified single-class detection schema
Converted bounding boxes to YOLO normalized format
Segmented the dataset into standard splits: 80% Train / 15% Validation / 5% Test
Generated the data.yaml configuration file required for Ultralytics YOLOv8 training
Verified dataset integrity with visual sanity checks before training

5. Model Design, Training and Evaluation

Pipeline 1 - YOLOv8-Nano (Helmet and Triple-Riding)

Parameter	Value
Architecture	YOLOv8-Nano
Pretrained on	COCO
Input size	320 then 640 (progressive)
Epochs	30 + 50 (two-stage training)
Batch size	32 then 16 (adapted for 640 VRAM)
Optimizer	SGD (default)
Augmentations	Mosaic (1.0), Mixup (0.1), Copy-Paste (0.1), HSV jitter

Two-Stage Training Strategy:

Stage 1 (30 epochs @ 320px): Fast convergence at low resolution, batch size 32
Stage 2 (50 epochs @ 640px): Fine-tuning at full resolution for small object accuracy, batch size 16

Training Platform: Kaggle (T4 GPU)

Pipeline 2 - YOLOv11-Nano (Traffic Detection)

Parameter	Value
Architecture	YOLOv11-Nano (YOLO11n)
Pretrained on	COCO
Input size	640 x 640
Epochs	30
Batch size	16
Optimizer	AdamW
Learning rate	1e-3 (cosine schedule)
Frozen layers	First 10
Patience	3 (early stopping)
Augmentations	Mosaic (1.0), Mixup (0.1), HSV jitter, Flip, Scale (0.5), Translate (0.1), Rotate (+/-5 deg)

Training Platform: Kaggle (T4 GPU)

Pipeline 3 - YOLOv8-Nano (Pothole Detection)

Parameter	Value
Architecture	YOLOv8-Nano
Pretrained on	COCO
Input size	640 x 640
Epochs	50
Batch size	16
Optimizer	SGD (default)
Learning rate	1e-2 (default schedule)
Patience	50 (early stopping)
Augmentations	HSV jitter, Flip (0.5), Mosaic (1.0), Mixup (0.1)

Training Platform: Kaggle (T4 GPU)

Evaluation Metrics

Model	mAP50	mAP50-95	Notes
Pipeline 1 (YOLOv8n .pt)	~0.81	~0.54	Merged Roboflow validation
Pipeline 2 (YOLOv11n .pt)	~0.82	~0.55	UVH-26 validation set
Pipeline 3 (YOLOv8n .pt)	~0.75	~0.48	Pothole Detection

6. Model Compression and Efficiency Metrics

Compression Techniques Applied

1. NCNN Export (FP16 Quantization)

Used for Pipelines 1 and 2, optimized for ARM CPU inference.

model = YOLO("best.pt")
model.export(format="ncnn", imgsz=640, half=True)

Weight precision: FP32 to FP16 (50% size reduction)
NCNN backend: Vulkan-accelerated on Cortex-A76
No calibration dataset required (unlike INT8)

2. ONNX Export (FP16)

Used for Pipeline 3 and generated via Pothole_03_export_quantize.py. ONNX was selected as the final portable cross-platform format for edge pothole detection, resulting in the provided best.onnx file.

model.export(format="onnx", imgsz=640, half=True)

Weight precision: FP32 to FP16 Final Output: best.onnx (used by Pothole_04_edge_inference.py) TFLite Note: TFLite export was initially evaluated during development, but ONNX provided better native integration and performance stability for our specific edge deployment in this pipeline.

Note: INT8 quantization was initially attempted across our pipelines but abandoned because the calibration process requires loading 700+ representative images into RAM simultaneously, causing Out-of-Memory errors on standard PCs. FP16 provides nearly identical speedup without the calibration overhead.

Compression Results

Metric	Pipeline 1 (.pt)	Pipeline 1 (NCNN FP16)	Pipeline 2 (.pt)	Pipeline 2 (NCNN FP16)	Pipeline 3 (.pt)	Pipeline 3 (TFLite/ONNX FP16)
Model Size	~6.2 MB	~3.2 MB	~10.61 MB	~5.38 MB	~11.9 MB	~5.94 MB
mAP50	0.8165	~0.81	0.82	~0.81	0.805	~0.754
Accuracy Drop	-	< 1%	-	< 1%	-	6.34%
Format	PyTorch	NCNN FP16	PyTorch	NCNN FP16	PyTorch	TFLite & ONNX FP16

Trade-offs

FP16 vs INT8: FP16 retains >99% accuracy with a 50% size reduction. INT8 could offer a 75% reduction but requires large calibration datasets, causes memory bottlenecks during export, and risks higher accuracy loss on smaller objects (like distant potholes or helmets).
NCNN vs TFLite vs ONNX: NCNN was used for Pipelines 1 & 2 as it is ~15-20% faster on the ARM Cortex-A76 (Pi 5) due to NEON SIMD optimizations. TFLite and ONNX were used for Pipeline 3 to provide cross-platform portability for varying edge accelerators.
Memory: Peak RAM usage is approximately 1.2 GB when all three integrated models (Traffic, Helmet, and Pothole) are loaded simultaneously well within the Raspberry Pi 5’s 8 GB capacity.

7. Model Deployment and On-Device Performance

Deployment Steps

Flash Raspberry Pi OS (64-bit) to MicroSD card
Install dependencies:
```
pip install -r requirements_rpi.txt
```
Copy compressed models to the Pi:
- helmet_ncnn_model/ - Pipeline 1 (helmet detection)
- best_ncnn_model/ - Pipeline 2 (traffic/vehicle detection)
- best.onnx - Pipeline 3 (pothole model)

Run homography calibration (one-time setup):

python deployment/calibrate_homography.py --image reference_frame.jpg --width 7.0 --length 20.0

Launch the unified inference:

python deployment/traffic_monitor.py \
    --source picam \
    --model best_ncnn_model \
    --helmet-model helmet_ncnn_model \
    --pothole-model best.onnx \
    --speed-limit 60 \
    --calib calibration.json

On-Device Performance (Raspberry Pi 5, 8 GB)

Metric	Value
Frame Rate	~2-3.5 FPS (all 3 models active)
Inference Time (per model)	~120-180 ms
Total Latency (per frame)	~350-500 ms
RAM Usage	~1.2 GB (3 models + OpenCV + tracking)
CPU Utilization	~75-85% (4 cores)
Video Output	1280x720 MP4, annotated in real-time
Tracking	ByteTrack (persistent ID assignment)

Real-Time Behaviour

The system processes live PiCamera frames in a continuous loop:

Main model (traffic detection with ByteTrack tracking) runs every frame
Helmet model runs every 1 second (configurable via HELMET_FRAME_INTERVAL_S)
Speed estimation uses homography to project pixel centroids to real-world coordinates, computing speed over a sliding window of 10 positions per tracked vehicle
Congestion classification uses the Level-of-Service (LOS) A-F scale based on vehicle density and average speed within the calibrated ROI
HUD overlay displays real-time FPS, vehicle count, average speed, congestion level, and violation counts

8. System Prototype (Pictures / Figures)

Vehicle Detection and Tracking

The main traffic model detects and classifies 14 vehicle categories in real time using ByteTrack for persistent multi-object tracking:

Pipeline 1 Sample Results

Figure 1: Vehicle detection results from Pipeline 1 showing classification of Two-wheelers, Sedans, Trucks, SUVs, Three-wheelers, and other vehicle categories on Indian urban traffic scenes.

Helmet Violation Detection

The helmet detection pipeline identifies motorcycles, associates riders using IoU-based spatial logic, and flags violations:

Helmet Detection at Intersection

Figure 2: Helmet and rider detection at a traffic intersection showing motorcycle-rider association.

Violation Report

Figure 3: Detailed violation report showing helmet compliance status and triple-riding detection with color-coded bounding boxes (green = safe, red = violation).

Speed Distribution Output

The system generates a speed distribution histogram after each run, showing the number of vehicles at each speed bin with overspeeding highlighted in red:

Speed Distribution Plot

Figure 4: Vehicle speed distribution from a sample run. Green bars indicate vehicles within the 60 km/h limit; red bars would appear for overspeeding vehicles. The dashed orange line marks the configurable speed limit.

Pothole Detection

Pipeline 3 detects road surface potholes for infrastructure quality assessment:

Pothole Demo 1

Figure 4: Pothole detection on road surface showing multiple detected potholes.

Pothole Demo Video Frame

Figure 5: Pothole detection from video feed with bounding box annotations and confidence scores.

CSV Statistics Output

Per-frame statistics are logged to a CSV file with columns:

frame, t_s, vehicles_in_roi, avg_speed_kmh, los, overspeeding,
total_unique, helmet_bikes, helmet_violations, triple_violations

This enables post-hoc analysis, plotting trends over time, and generating aggregate reports.

Hardware Setup

The deployment setup consists of the Raspberry Pi 5 connected to the PiCamera Module via CSI ribbon cable, with an HDMI display for real-time visualization. The system runs headlessly (via SSH) in production, saving output to disk.

9. Conclusions and Limitations

Key Outcomes

Successfully deployed three YOLO models simultaneously on a single Raspberry Pi 5, demonstrating that multi-task traffic monitoring is feasible on consumer-grade edge hardware (~$80 device).
NCNN FP16 quantization preserved >99% of mAP while halving model size. The NCNN backend outperforms TFLite on the Pi 5’s Cortex-A76 due to NEON SIMD optimization.
Homography-based speed estimation provides calibrated real-world speed measurements without requiring radar or LIDAR, using only a single camera and a one-time 4-point calibration.
ByteTrack multi-object tracking enables persistent vehicle IDs across frames, which is essential for accurate speed computation and avoiding double-counting in volume statistics.
The rider-to-bike association algorithm (IoU + shrink-box heuristic) correctly identifies triple-riding and helmet violations by spatially mapping detected heads/helmets to motorcycle bounding boxes.

Limitations

Frame rate (2-3.5 FPS) is sufficient for monitoring but not for capturing very fast-moving vehicles at highway speeds; a higher-end edge device (e.g., Jetson Nano) would be needed for 15+ FPS
Speed estimation accuracy depends heavily on the quality of the homography calibration; perspective distortion at the edges of the frame introduces error
Helmet model runs at 1 Hz (not every frame) to maintain overall throughput, which means some violations may be missed during the inter-inference gap
No night-time / low-light testing was performed; performance degrades significantly under poor illumination
Pothole model integration is prepared in the code but not fully active in the unified pipeline (placeholder for future work)
No real-world deployment in an actual traffic intersection was conducted; all testing used recorded video feeds

10. Future Work

INT8 quantization with proper calibration pipeline: Use a small representative calibration dataset (100-200 images) loaded in batches to enable INT8 quantization without OOM, potentially doubling inference speed
Jetson Nano / Orin deployment: Leverage GPU-accelerated TensorRT for 15-30 FPS real-time performance
Night-time augmentation: Add synthetic low-light, headlight glare, and rain effects to the training data for 24/7 operational capability
Alert system integration: Add GPIO-based alert triggers (buzzer, LED panel) and MQTT/HTTP notifications for detected violations
Multi-camera stitching: Extend to handle 2-4 camera feeds for full intersection coverage
Dashboard web UI: Build a lightweight Flask/FastAPI dashboard served locally on the Pi for visualizing live statistics
License plate recognition: Add an ANPR module for linking violations to specific vehicles
Federated learning: Enable model updates across multiple deployed Pi units without centralizing video data

11. Challenges and Mitigation

#	Challenge	Impact	Mitigation
1	INT8 quantization OOM - Calibration required loading 700+ images into RAM at once	Could not use INT8 on PCs with <=16 GB RAM	Switched to FP16 quantization; nearly identical speed improvement with zero calibration overhead
2	Dataset class inconsistency - 4 Roboflow datasets had completely different class names (e.g., Indonesian `helm`, `pemotor`)	Direct merging produced garbage labels	Built a per-dataset remapping system with verified presets; manually checked every class mapping against the actual `data.yaml`
3	Missing data.yaml in Dataset 2 - Santhosh’s bike-helmet dataset had no class name file	Could not determine class mapping from names alone	Inspected raw label files to identify class IDs; created manual ID-to-class mapping
4	Homography calibration sensitivity - Small errors in the 4 calibration points caused large speed estimation errors	Speeds were off by 50-200% initially	Developed an interactive calibration tool (`calibrate_homography.py`) with clear visual instructions; recommend using real-world measurements for calibration points
5	Multi-model memory pressure - Loading 3 YOLO models + ByteTrack + OpenCV on Pi 5	RAM usage spiked to ~2 GB, causing swapping	Profiled memory usage; NCNN models use significantly less RAM than PyTorch; final usage stable at ~1.2 GB on 8 GB Pi
6	f-string TypeError in speed display - `avg_speed` was `None` when no vehicles were in ROI, causing format crash	Runtime crash during empty-road periods	Added safe fallback: `avg_speed_display = avg_speed if avg_speed is not None else 0.0`
7	Two-stage training convergence - Pipeline 1 training at 320px underfitted small objects	Low recall on distant motorcycles	Implemented progressive training: 30 epochs @ 320px for fast convergence, then 50 epochs @ 640px for fine-grained accuracy

12. References

Datasets

UVH-26: Indian Urban Vehicle and Helmet Dataset - https://huggingface.co/datasets/iisc-aim/UVH-26
Rider Helmet Detection (Yosia Aser) - Roboflow Universe
Bike Helmet (Santhosh) - Roboflow Universe
Helmet-Bike (wangbo) - Roboflow Universe
Helmet Rider Detection (CS 174) - Roboflow Universe
Indian Roads Dataset (mitangshu11) - Kaggle

Frameworks and Tools

Ultralytics YOLOv8 - https://docs.ultralytics.com/
Ultralytics YOLOv11 - https://docs.ultralytics.com/models/yolo11/
NCNN - High-performance inference framework for mobile/ARM - https://github.com/Tencent/ncnn
ByteTrack: Multi-Object Tracking by Associating Every Detection Box - Zhang et al., ECCV 2022
OpenCV - https://opencv.org/

Concepts and Techniques

Highway Capacity Manual (HCM) - Level-of-Service (LOS) classification for congestion estimation
Homography-based speed estimation - Perspective transformation from image plane to real-world coordinates
Model Compression Survey: Pruning, Quantization, and Knowledge Distillation - Cheng et al., 2024
FOMO (Faster Objects, More Objects) - Edge Impulse, TinyML object detection

Course and Reference Projects

CP 330 Edge AI Course - Prof. Pandarasamy Arjunan, IISc Bangalore - https://www.samy101.com/edge-ai-25/
AI Helmet Project (Reference) - https://github.com/samy101/ai-helmet
Ministry of Road Transport and Highways, India - Road Accident Statistics 2023