Nighttime Street Light Functional Audit Using Computer Vision on Edge Device
Team: Anant Rajput, Kartikeya Gaur, Revathy Ramesh, Vuyyuru Gopi Chand
Code: GitHub Repository
1. Problem Statement, Motivation & Objectives
Urban street lighting is a critical component of public infrastructure: it improves visibility, reduces crime, and supports safe vehicular and pedestrian movement after dark. Yet maintaining expansive street lighting networks at city scale remains a formidable challenge. Cities worldwide operate hundreds of thousands of streetlights (Los Angeles alone maintains over 220,000 units), and globally the number is projected to exceed 350 million. Despite their importance, most cities still rely on outdated maintenance practices: periodic manual inspections and citizen complaint-driven reporting. These approaches are labor-intensive, error-prone, and slow to detect outages, leaving malfunctioning lamps unnoticed for extended periods and compromising both public safety and energy efficiency.
The motivation for an automated, vision-based approach is clear: IoT sensor networks that instrument every lamp post carry prohibitive deployment and maintenance costs, while manual drive-by surveys lack scientific reliability and scalability. Computer vision offers a compelling middle path using cameras mounted on ordinary vehicles to perform continuous nighttime drive-by surveys, turning everyday city fleets into infrastructure monitoring platforms.
Kumar et al. (2016) demonstrated the feasibility of this paradigm using a car-top sensor platform with cameras, lux sensors, GPS, and IMU, achieving lamp identification and height estimation across multiple cities. Building on this foundation, recent deep learning approaches have substantially improved detection accuracy — particularly for small, distant, or partially occluded lamps under challenging nighttime conditions. Edge AI deployment of such models is well-motivated: processing video locally on the vehicle eliminates the latency and bandwidth costs of cloud transmission, enables real-time fault flagging, and avoids transmitting sensitive location-linked footage off-device.
Objectives
- Develop a robust detection model capable of handling nighttime, low-light, and motion-blurred imagery.
- Implement an image preprocessing pipeline (CLAHE, gamma correction) to enhance nighttime visibility.
- Optimize models via pruning and quantization for deployment on resource-constrained hardware.
- Deploy a real-time web-based interface for live monitoring.
2. Proposed Solution
The system is a vehicle-mounted, edge-deployed streetlight fault detector. A camera captures nighttime video during drive-by patrols; the webapp runs a compressed YOLO model to classify each detected lamp as ON or OFF in real time, logging faults with timestamps for subsequent municipal review — with no cloud connectivity required.
End-to-end pipeline:
Data → Nighttime video is captured along pre-planned routes and extracted into frames. Each frame is labelled with bounding boxes for ON_Streetlight and OFF_Streetlight classes. A preprocessing pipeline (CLAHE, gamma correction, unsharp masking) is applied to compensate for low ambient light and motion blur before training.
https://drive.google.com/drive/folders/1SJH9nexL3rW9DqcZN_Q-33obRFqGnCKH?usp=sharing
Model → Multiple full-precision (FP32) baseline models are trained on a GPU workstation (Google Colab, Tesla T4): YOLOv5nu, YOLOv5su, YOLOv8n, and YOLOv8s. The best-performing lightweight model (YOLOv5nu) is selected for compression.
Compression → Post-training quantization (ONNX Runtime dynamic INT8 and FP16), one-shot and iterative magnitude pruning (detection head excluded to preserve accuracy), and knowledge distillation (YOLOv5su → YOLOv5nu and YOLOv8s → YOLOv8n pairs) are applied to reduce model size and inference latency to levels compatible with mobile-based webapp hardware.
Deployment → The compressed model is exported to ONNX (opset 17) and TFLite INT8 formats (input resolution 320×320) and deployed on a mobile-based webapp. On-device inference time is benchmarked using ONNX Runtime on CPU with threads pinned to mimic a Raspberry Pi 4; resource utilisation and detection quality are also measured and reported.
Output → Per-frame bounding box predictions with class labels (ON / OFF) and confidence scores, rendered in real time and logged for fault mapping.
3. Hardware & Software Setup
Hardware:
- Data collection: Smartphone camera mounted on vehicle dashboard, used for nighttime drive-by footage capture across three Bengaluru localities
- Training platform: Google Colab with NVIDIA Tesla T4 GPU (14,913 MiB VRAM)
- Edge deployment platform: Smartphone-based webapp (target device for TFLite INT8 / ONNX inference)
Software:
| Component | Tool / Version |
|---|---|
| Manual Annotation | LabelImg |
| Training framework | Ultralytics 8.4.45 (YOLOv5/v8), PyTorch 2.10.0+cu128 |
| Baseline object detection | YOLOv5nu, YOLOv5su, YOLOv8n, YOLOv8s |
| Preprocessing | OpenCV (CLAHE, gamma, unsharp), Albumentations |
| Compression | PyTorch pruning (torch.nn.utils.prune, head-excluded); ONNX Runtime dynamic INT8; FP16 export |
| Knowledge distillation | Ultralytics KD trainer with model.loss wrap (α=0.5, T=4, AdamW, cosine LR); pairs: YOLOv5su→YOLOv5nu, YOLOv8s→YOLOv8n |
| Export | Ultralytics ONNX export (opset 17, simplify=True), TFLite INT8 (imgsz 320) |
| Edge runtime | TFLite runtime / ONNX Runtime on Mobile Phone; ORT CPU benchmark (4 threads) as RPi 4 proxy |
| Development | Python 3.12.13, Google Colab, Jupyter Notebook |
4. Data Collection & Dataset Preparation
Data source: Custom dataset — nighttime video collected by the project team during vehicle drive-bys across three localities in Bengaluru, Karnataka, India.
Collection routes and distances:
| Area | Route Distance |
|---|---|
| IISc Campus | 4,564.4 m |
| Mathikere | 3,679.3 m |
| Yesvantpur | 8,313.9 m |
| Total | 16,557.7 m (~16.6 km) |
The dataset preparation followed a specific downsampling and annotation pipeline:
- Frame rate: Video data was originally captured at 30 fps.
- Annotation sampling: To create the training set, frames were annotated at a reduced rate of 2 fps.
- Tools: Manual labelling and bounding box creation were performed using LabelImg.
- Final dataset: The resulting processed dataset consists of ~2.1k frames ready for model training.
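The 30 fps to 2 fps reduction amounts to keeping roughly every 15th frame. The sketch below illustrates that arithmetic only; the function name and the rounding rule are assumptions, not the project's actual extraction script.

```python
def sample_frame_indices(total_frames: int,
                         source_fps: float = 30.0,
                         target_fps: float = 2.0) -> list[int]:
    """Return the indices of frames kept when downsampling source_fps to target_fps."""
    if target_fps <= 0 or source_fps < target_fps:
        raise ValueError("target_fps must be positive and not exceed source_fps")
    stride = source_fps / target_fps  # 15 source frames per kept frame at 30 -> 2 fps
    indices = []
    position = 0.0
    while int(round(position)) < total_frames:
        indices.append(int(round(position)))
        position += stride
    return indices


# One minute of 30 fps video has 1800 frames; sampling at 2 fps keeps 120 of them.
kept = sample_frame_indices(total_frames=1800)
print(len(kept), kept[:4])
```

At this rate, ~2.1k annotated frames correspond to roughly 17 to 18 minutes of footage, assuming every sampled frame was kept.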
Dataset characteristics:
- Classes: 2, `ON_Streetlight` (functioning lamp) and `OFF_Streetlight` (faulty / unlit lamp)
- Class imbalance: The OFF class is significantly underrepresented (~11% of instances), reflecting real-world fault rates
- Training/validation/test split: ~70% / 20% / 10%
- Image resolution: 640×640 (resized during training); exported at 320×320 for edge deployment
Imaging conditions: All footage is nighttime, low ambient light, with motion blur from vehicle movement and handheld capture. These conditions make detection significantly harder than standard benchmark datasets.
Preprocessing pipeline (applied before training):
- CLAHE (Contrast Limited Adaptive Histogram Equalisation, clip=2.0, tile=8×8) restores local contrast suppressed by low ambient light
- Gamma correction (γ=1.4) brightens underexposed regions without saturating lamp blobs
- Unsharp mask (strength=1.3, σ=3) recovers edges softened by motion blur
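As an illustration of the gamma step, the sketch below builds a 256-entry lookup table in pure Python. It assumes the convention out = 255 · (in / 255)^(1/γ), under which γ = 1.4 brightens dark pixels; the source does not state which convention the project uses. CLAHE and unsharp masking are normally done with OpenCV (`cv2.createCLAHE`, `cv2.GaussianBlur`) and are not reproduced here.

```python
def build_gamma_lut(gamma: float = 1.4) -> list[int]:
    """Lookup table mapping each 8-bit intensity to its gamma-corrected value.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    so gamma > 1 lifts dark values while 0 and 255 stay fixed.
    """
    inverse = 1.0 / gamma
    return [min(255, int(round(255.0 * (v / 255.0) ** inverse))) for v in range(256)]


def apply_gamma(pixels: list[int], lut: list[int]) -> list[int]:
    """Apply the lookup table to a flat list of 8-bit pixel values."""
    return [lut[p] for p in pixels]


lut = build_gamma_lut(1.4)
dark_row = [0, 16, 64, 128, 255]  # hypothetical pixel values
print(apply_gamma(dark_row, lut))
```

Because the endpoints are fixed, already-saturated lamp pixels at 255 are left unchanged while the dim surroundings are lifted.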
Training augmentations (Albumentations):
- Random rotation ±20°, horizontal flip (p=0.5)
- Random brightness/contrast ±30% (p=0.6)
- Gaussian blur (kernel 3–5, p=0.3)
- CLAHE (p=0.4), random gamma 80–120 (p=0.4)
- Mosaic (1.0), random erasing (0.3), HSV value shift (0.4)
Labelling: Bounding box annotations created manually; each frame is tagged with class ID, centre coordinates, and box dimensions in YOLO format.
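For reference, a YOLO-format label line stores the class ID followed by the box centre and size, all normalised by the image dimensions. The sketch below converts a pixel-space box to that format; the class-ID mapping (0 = ON, 1 = OFF) and the example box are assumptions for illustration.

```python
# Assumed class-ID mapping; the actual IDs depend on the project's dataset YAML.
CLASS_IDS = {"ON_Streetlight": 0, "OFF_Streetlight": 1}


def to_yolo_line(class_id: int, x_min: float, y_min: float,
                 x_max: float, y_max: float, img_w: int, img_h: int) -> str:
    """Convert a pixel-space corner box to a normalised YOLO label line."""
    x_centre = (x_min + x_max) / 2.0 / img_w
    y_centre = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_centre:.6f} {y_centre:.6f} {width:.6f} {height:.6f}"


# Hypothetical unlit lamp occupying pixels (300, 100) to (340, 160) in a 640x640 frame.
line = to_yolo_line(CLASS_IDS["OFF_Streetlight"], 300, 100, 340, 160, 640, 640)
print(line)  # 1 0.500000 0.203125 0.062500 0.093750
```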
5. Model Design, Training & Evaluation
Architecture overview:
All YOLO models follow the standard one-stage detection paradigm: a CNN backbone extracts multi-scale feature maps, a neck (Feature Pyramid Network) fuses them, and detection heads predict bounding boxes and class probabilities at three scales. Models differ in depth and width multipliers, giving a range of parameter counts and computational costs.
Models trained and their specifications:
| Model | Params | GFLOPs | Layers | Weight Size |
|---|---|---|---|---|
| YOLOv5nu | 2,503,334 | 7.1 | 85 | 5.3 MB |
| YOLOv5su | 9,112,310 | 23.8 | 85 | 18.5 MB |
| YOLOv8n | 3,006,038 | 8.1 | 73 | 6.2 MB |
| YOLOv8s | 11,125,971 | 28.4 | 225 | 22.5 MB |
Training configuration (YOLO models):
| Parameter | Value |
|---|---|
| Epochs | 100 |
| Image size | 640×640 |
| Batch size | 16 |
| Learning rate (lr0) | 0.01 |
| Momentum | 0.9 |
| Weight decay | 0.0005 |
| LR schedule | Cosine annealing |
| Bounding box loss | CIoU |
| Objectness/class loss | Binary cross-entropy |
| Hardware | Tesla T4 GPU (Google Colab) |
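To make the cosine annealing entry concrete, the sketch below evaluates a generic cosine decay from lr0 = 0.01 over 100 epochs. The final learning-rate fraction (`final_fraction=0.01`) is an assumed value, as the table does not report it, and warm-up is ignored.

```python
import math


def cosine_lr(epoch: int, total_epochs: int = 100,
              lr0: float = 0.01, final_fraction: float = 0.01) -> float:
    """Cosine decay from lr0 at epoch 0 to lr0 * final_fraction at the last epoch."""
    progress = epoch / total_epochs
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    lr_min = lr0 * final_fraction  # assumed floor, not stated in the report
    return lr_min + (lr0 - lr_min) * cosine


for epoch in (0, 25, 50, 75, 100):
    print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```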
Evaluation results (validation set):
| Model | Precision | Recall | mAP50 | F1 | Inference (GPU) |
|---|---|---|---|---|---|
| YOLOv5nu | 0.833 | 0.826 | 0.886 | 0.829 | 3.2 ms |
| YOLOv5su | 0.856 | 0.809 | 0.862 | 0.832 | 4.9 ms |
| YOLOv8n | 0.833 | 0.803 | 0.877 | 0.817 | 3.4 ms |
| YOLOv8s | 0.858 | 0.808 | 0.878 | 0.832 | not reported |
Key finding: YOLOv5nu achieves the best balance of accuracy and speed: it has the highest mAP50 (0.886), the smallest parameter count (2.5M), and the fastest GPU inference (3.2 ms).
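The F1 column follows from precision and recall as their harmonic mean. The short check below recomputes it from the rounded table values; because the inputs are already rounded to three decimals, the result can differ from the reported F1 in the last digit (for example YOLOv8n).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the evaluation table above.
reported = {
    "YOLOv5nu": (0.833, 0.826),
    "YOLOv5su": (0.856, 0.809),
    "YOLOv8n": (0.833, 0.803),
    "YOLOv8s": (0.858, 0.808),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```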
6. Model Compression & Efficiency Metrics
Four compression strategies were applied to produce edge-ready models, targeting constrained compute and memory budgets on mobile and Raspberry Pi 4 class hardware. The pipeline corrects several methodological issues present in earlier versions (dynamic PyTorch INT8 on Conv2d is a no-op; RPi FPS now measured via ORT CPU benchmarking, not a formula estimate; KD loss is injected by wrapping model.loss rather than dead-code trainer overrides; the detection head is excluded from pruning to prevent mAP collapse).
Strategy 1 — Post-Training Quantization: FP16 and ONNX Runtime Dynamic INT8
Two deployable quantization paths are used:
- FP16 (half-precision): The FP32 PyTorch model is exported to ONNX with half-precision weights. Provides ~2× size reduction with negligible accuracy loss on GPU/ARM hardware with native FP16 support.
- ONNX Runtime Dynamic INT8: The ONNX model is quantized to INT8 using `onnxruntime.quantization.quantize_dynamic` with `QuantType.QInt8`. Unlike PyTorch's `torch.quantization.quantize_dynamic`, ORT dynamic INT8 operates correctly on Conv2d layers, making it the valid post-training INT8 path for YOLO backbones. Provides ~4× size reduction.
- Applied to: all four YOLO variants (YOLOv5nu, YOLOv5su, YOLOv8n, YOLOv8s)
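The ~4× figure comes from storing each weight in 1 byte instead of 4. The sketch below shows a generic symmetric per-tensor INT8 scheme to illustrate the idea and the rounding error it introduces; it is a simplified assumption and not ONNX Runtime's exact internal procedure.

```python
def quantize_symmetric_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using a single per-tensor scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, int(round(w / scale)))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from INT8 values."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.9, -0.25]  # hypothetical FP32 weights
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("int8 values:", q)
print("max round-trip error:", max_err)
print("bytes FP32 vs INT8:", 4 * len(weights), "vs", len(q))
```

The round-trip error is bounded by half the scale, which is why tensors with a few large outlier weights lose more precision under this scheme.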
Strategy 2 — One-Shot Magnitude Pruning
L1 unstructured pruning removes the lowest-magnitude weights from Conv2d layers in a single pass. It is applied to all four YOLO variants at sparsity levels of 20%, 30%, and 50%. The final six Conv2d layers (`prune_skip_last_n_convs=6`, the Detect head's 1×1 convs) are excluded, because pruning them produces large mAP drops for negligible size gain. Pruned models report both raw file size and gzipped file size: unstructured pruning zeroes weights without removing them, so the raw size is unchanged, and the gzip size is the realistic deployable size because runs of zeros compress effectively.
- Applied to: YOLOv5nu, YOLOv5su, YOLOv8n, YOLOv8s
- Sparsity levels: 20%, 30%, 50% of backbone Conv2d weights zeroed
- Reported size: raw `.pt` file size and gzip-compressed size (`FileSizeMB_gz`)
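The relationship between sparsity, raw size, and gzip size can be seen on a toy weight vector. This sketch uses only the standard library and random Gaussian weights as a stand-in for a Conv2d tensor; it illustrates the idea and is not the project's `torch.nn.utils.prune` code.

```python
import gzip
import random
import struct


def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    n_drop = int(round(len(weights) * sparsity))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def raw_and_gzip_size(weights: list[float]) -> tuple[int, int]:
    """Size in bytes of the FP32 buffer before and after gzip compression."""
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return len(raw), len(gzip.compress(raw))


rng = random.Random(0)
dense = [rng.gauss(0.0, 0.05) for _ in range(20000)]  # toy stand-in for a layer

print("dense:", raw_and_gzip_size(dense))
for sparsity in (0.2, 0.3, 0.5):
    pruned = magnitude_prune(dense, sparsity)
    print(f"{sparsity:.0%} pruned:", raw_and_gzip_size(pruned))
```

The raw byte count is identical in every row; only the gzip size shrinks as more weights are zeroed.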
Strategy 3 — Iterative Magnitude Pruning
Progressive pruning in five steps (10% → 20% → 30% → 40% → 50%) with 5-epoch fine-tuning between each step. Crucially, prune masks are kept attached during fine-tuning — an earlier version removed masks between steps, allowing zeroed weights to regrow and invalidating the sparsity budget. The detection head is excluded here as well.
- Applied to: all four YOLO variants
- Steps: 5 rounds × 5 fine-tuning epochs
- Final sparsity: ~50% backbone Conv2d weights removed; masks held in place throughout fine-tuning
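The point about keeping masks attached can be illustrated with a toy loop. Here fine-tuning is replaced by a random weight update (an assumption purely for illustration), and the mask is re-applied after every update so pruned weights cannot regrow.

```python
import random


def extend_mask(weights: list[float], mask: list[int], target_sparsity: float) -> list[int]:
    """Prune the smallest surviving weights until target_sparsity is reached.

    mask uses 1 = keep, 0 = pruned; already-pruned positions stay pruned.
    """
    n_target = int(round(len(weights) * target_sparsity))
    n_more = max(0, n_target - mask.count(0))
    alive = sorted((i for i, m in enumerate(mask) if m), key=lambda i: abs(weights[i]))
    for i in alive[:n_more]:
        mask[i] = 0
    return mask


def fine_tune_step(weights: list[float], mask: list[int], rng: random.Random) -> list[float]:
    """Stand-in for one fine-tuning epoch: random update, then mask re-applied."""
    return [(w + rng.gauss(0.0, 0.01)) * m for w, m in zip(weights, mask)]


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(1000)]
mask = [1] * len(weights)

for target in (0.1, 0.2, 0.3, 0.4, 0.5):   # five pruning rounds
    mask = extend_mask(weights, mask, target)
    weights = [w * m for w, m in zip(weights, mask)]
    for _ in range(5):                      # five fine-tuning "epochs" per round
        weights = fine_tune_step(weights, mask, rng)
    measured = sum(1 for w in weights if w == 0.0) / len(weights)
    print(f"target {target:.0%} -> measured {measured:.0%}")
```

Without the multiplication by the mask in `fine_tune_step`, the update would make zeroed weights nonzero again and the measured sparsity would fall back towards 0%.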
Strategy 4 — Knowledge Distillation
A larger teacher model supervises a smaller student model via a weighted loss combination. The KD loss is injected by wrapping model.loss on the student — an earlier approach that overrode train_step was dead code never called by the Ultralytics BaseTrainer. Two teacher→student pairs are trained:
| Pair | Teacher | Student |
|---|---|---|
| v5s→v5n | YOLOv5su | YOLOv5nu |
| v8s→v8n | YOLOv8s | YOLOv8n |
Loss = α · L_KD + (1−α) · L_task, α=0.5, Temperature T=4
- Optimizer: AdamW (lr=1e-3, weight decay=1e-4)
- Schedule: Cosine annealing, patience=5 early stopping
- Goal: Transfer knowledge from the larger model to a model with fewer parameters at no extra data cost
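The weighted loss can be sketched on plain class logits. The KD term below uses the KL divergence between temperature-softened teacher and student distributions, scaled by T², following Hinton et al. (2015); that choice is an assumption, since the report gives only the weighted sum and does not say which YOLO outputs are distilled.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_term(student_logits: list[float], teacher_logits: list[float],
            temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


def distillation_loss(task_loss: float, student_logits: list[float],
                      teacher_logits: list[float],
                      alpha: float = 0.5, temperature: float = 4.0) -> float:
    """Loss = alpha * L_KD + (1 - alpha) * L_task."""
    return alpha * kd_term(student_logits, teacher_logits, temperature) + (1 - alpha) * task_loss


# Hypothetical two-class logits (ON, OFF) for one detection.
teacher = [3.0, -1.0]
student = [1.5, 0.5]
print(distillation_loss(task_loss=0.8, student_logits=student, teacher_logits=teacher))
```

When the student matches the teacher exactly, the KD term vanishes and the loss reduces to (1 − α) · L_task.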
7. Model Deployment & On-Device Performance
Deployment steps:
- Select models for deployment: YOLOv8n (best mAP50-95) was flagged for mobile phone export
- Export YOLO models: `model.export(format="onnx", imgsz=320, simplify=True, opset=17)` and `model.export(format="tflite", imgsz=320, int8=True)`
- Quantize ONNX for INT8: Apply ORT dynamic INT8 quantization (`quantize_dynamic`, `QuantType.QInt8`) to the exported ONNX file
- Transfer to mobile phone and use web interface: Copy the .onnx files to the phone and run them in the web interface
- Run inference: Feed preprocessed (CLAHE → gamma → unsharp) frames through the runtime at 320×320 resolution; parse bounding box outputs and render detections
GPU reference performance (Tesla T4):
| Model | Preprocess | Inference | Postprocess | Total per frame |
|---|---|---|---|---|
| YOLOv5nu | 1.8 ms | 3.2 ms | 2.2 ms | ~7.2 ms |
| YOLOv5su | 1.8 ms | 4.9 ms | 2.0 ms | ~8.7 ms |
| YOLOv5mu | 1.8 ms | 11.0 ms | 2.4 ms | ~15.2 ms |
| YOLOv8n | 1.8 ms | 3.4 ms | 2.0 ms | ~7.2 ms |
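Per-frame latencies like those above, and the CPU proxy figures for the Raspberry Pi 4, are wall-clock averages over repeated runs. The harness below is a minimal sketch of such a measurement; `fake_inference` is a hypothetical stand-in for a real call such as an ONNX Runtime `session.run(...)` on a session whose `SessionOptions.intra_op_num_threads` is set to 4.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 5, runs: int = 50) -> dict[str, float]:
    """Time repeated calls and report mean/median latency in ms plus derived FPS."""
    for _ in range(warmup):          # warm-up calls are excluded from the timing
        run_once()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def fake_inference() -> int:
    """Hypothetical workload standing in for one model forward pass."""
    return sum(i * i for i in range(20000))


stats = benchmark(fake_inference, warmup=2, runs=10)
print({key: round(value, 3) for key, value in stats.items()})
```

A desktop CPU limited to four threads is still faster per core than a Raspberry Pi 4, so figures from such a proxy are best read as an optimistic bound.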
Real-time behaviour: The deployed system processes each incoming frame, draws bounding boxes labelled ON_Streetlight or OFF_Streetlight with confidence scores, and logs detections to a timestamped GPS CSV for post-route fault mapping.
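A minimal sketch of the logging step is shown below. The column names, the 0.5 confidence threshold, the choice to log only OFF detections, and the example coordinates are all assumptions for illustration; the report does not specify the CSV schema.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple[float, float, float, float]  # x_min, y_min, x_max, y_max in pixels


# Hypothetical CSV schema.
FIELDS = ["timestamp_utc", "latitude", "longitude", "label", "confidence",
          "x_min", "y_min", "x_max", "y_max"]


def log_faults(writer: csv.DictWriter, detections: list[Detection],
               lat: float, lon: float, when: datetime, min_conf: float = 0.5) -> int:
    """Write one row per confident OFF_Streetlight detection; return rows written."""
    written = 0
    for det in detections:
        if det.label == "OFF_Streetlight" and det.confidence >= min_conf:
            x_min, y_min, x_max, y_max = det.box
            writer.writerow({
                "timestamp_utc": when.isoformat(), "latitude": lat, "longitude": lon,
                "label": det.label, "confidence": f"{det.confidence:.2f}",
                "x_min": x_min, "y_min": y_min, "x_max": x_max, "y_max": y_max,
            })
            written += 1
    return written


buffer = io.StringIO()  # an open file would be used on the device
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()

frame_detections = [
    Detection("ON_Streetlight", 0.91, (100, 40, 130, 80)),
    Detection("OFF_Streetlight", 0.78, (300, 100, 340, 160)),
    Detection("OFF_Streetlight", 0.31, (500, 90, 520, 120)),  # below threshold
]
n_rows = log_faults(writer, frame_detections, lat=13.02, lon=77.57,
                    when=datetime(2024, 1, 1, 20, 30, tzinfo=timezone.utc))
print(buffer.getvalue())
```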
8. Conclusions & Limitations
Key outcomes:
- A custom nighttime streetlight detection dataset was successfully collected across ~16.6 km of Bengaluru roads, covering IISc Campus, Mathikere, and Yesvantpur
- YOLOv5nu achieved the best overall efficiency profile: highest mAP50 (0.886) at the smallest parameter count (2.5M, 5.3 MB), with GPU inference of 3.2 ms/frame.
- YOLOv8n was competitive (mAP50=0.877) with similar speed and slightly better mAP50-95 (0.458 vs 0.438)
- Quantization (ORT dynamic INT8 and FP16) and one-shot magnitude pruning were applied to compress the models for edge deployment
- RPi 4 inference speed is measured via real ORT CPU benchmarks (4 threads pinned) rather than a formula estimate, giving realistic deployable performance figures
Limitations:
- Class imbalance: There are very few OFF_Streetlight instances in the validation set; model robustness on diverse fault types (partial occlusion, dimming, intermittent faults) is not fully validated
- Single city, single season: Data was collected in Bengaluru under dry-season nighttime conditions; performance under rain, fog, or different lamp architectures is unknown
9. Future Work
- Dataset expansion: Collect additional OFF_Streetlight examples across rain, fog, and different lamp types to improve class balance and generalisation
- Fault severity grading: Extend the label set beyond binary ON/OFF to include partial fault, flickering, and dim lamp classes for more actionable maintenance prioritisation
- Fleet-scale deployment: Integrate the system into existing municipal maintenance vehicles (BBMP, BESCOM fleet) to survey the full city lighting network on routine patrol routes
10. Challenges & Mitigation
| Challenge | Description | Mitigation |
|---|---|---|
| Low-light image quality | Nighttime frames are noisy, motion-blurred, and underexposed, making features difficult to learn | Applied CLAHE + gamma correction + unsharp masking as a fixed preprocessing pipeline before both training and inference |
| Severe class imbalance | OFF_Streetlight instances are ~8× less frequent than ON, risking the model ignoring faulty lamps | Mosaic and random erasing augmentations increase exposure to rare class instances; model still learned high recall for OFF class |
| Edge export resolution trade-off | Full 640×640 inference is too slow for Mobile Phones; reducing to 320×320 risks missing small distant lamps | Exported all edge models at 320×320; validated that mAP remains acceptable at reduced resolution before committing |
| GPU-to-edge latency gap | Tesla T4 inference times (3–11 ms) are not representative of Raspberry Pi performance | RPi 4 speed is now proxied by real ORT CPU benchmarking with 4 threads pinned (mimicking RPi 4 cores) |
| Data collection logistics | Nighttime driving routes required planned sorties across three localities with consistent camera placement | Used a fixed dashboard mount and pre-planned OSM-exported routes to ensure route reproducibility and coverage |
11. References
- Kumar, S., Deshpande, A., Ho, S.S., Ku, J.S., & Sarma, S.E. (2016). Urban Street Lighting Infrastructure Monitoring Using a Mobile Sensor Platform. IEEE Sensors Journal, 16(12), 4981–4994. https://doi.org/10.1109/JSEN.2016.2552249
- Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. arXiv:1506.02640.
- Ultralytics. (2024). YOLOv5 and YOLOv8 Documentation. https://docs.ultralytics.com
- Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-Excitation Networks. Proceedings of CVPR, 7132–7141.
- Howard, A. et al. (2019). Searching for MobileNetV3. arXiv:1905.02244.
- Jacob, B. et al. (2018). Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. Proceedings of CVPR.
- Han, S., Pool, J., Tran, J., & Dally, W. (2015). Learning both Weights and Connections for Efficient Neural Networks. NeurIPS.
- Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531.
- TensorFlow Lite. (2024). Post-Training Quantization. https://www.tensorflow.org/lite/performance/post_training_quantization
- Buslaev, A. et al. (2020). Albumentations: Fast and Flexible Image Augmentations. Information, 11(2), 125.
- OpenStreetMap contributors. (2024). Bengaluru road network data. https://www.openstreetmap.org