Bottle Detection Pipeline: Real-Time Damage Classification System
Team: Garvit, Shreemay, Naina.
Code: GitHub Repository
This project implements a real-time, on-device bottle damage detection and classification system. It uses YOLOv8 for bottle localization and a ResNet50V2 TFLite model for damage classification, both running on the Raspberry Pi 5.
Download Dataset : https://indianinstituteofscience-my.sharepoint.com/:f:/g/personal/garvitsingh_iisc_ac_in/IgBZL7K_GBsFTKEEFdSSHR2iATd8kEWolyq1uzfXkRLRkwc?e=uzx9Id
Problem Statement
Packaging quality control is a critical step in FMCG manufacturing. Damaged bottles reaching consumers lead to product spoilage, customer complaints, and brand damage. Current automated inspection solutions require:
- Expensive industrial camera rigs
- Cloud API dependencies and proprietary software licences
- Vendor lock-in with per-unit pricing
This project addresses that gap with a fully open-source, offline Edge AI pipeline that can be deployed on low-cost hardware, enabling quality control for small and mid-scale producers who previously had no viable option.
Project Objectives
The main objective is to develop a production-grade bottle inspection module that:
- Detects all bottles in a frame using a real-time object detector
- Classifies each detected bottle as damaged or non_damaged using a fine-tuned CNN
- Provides immediate feedback via annotated frames (green = ok, red = damaged) and a JSON result log
- Runs entirely on-device on a Raspberry Pi, with no cloud dependency
The prototype demonstrates an end-to-end Edge pipeline:
Dataset preparation → Transfer learning (ResNet50V2) → QAT quantization → TFLite export → On-device deployment on RPi 5
Hardware & Software Used
Hardware Required
- Raspberry Pi 5 (ARM64, 8 GB LPDDR4X)
- Pi Camera Module (or USB webcam for camera mode)
- USB power supply / battery bank for portable deployment
Software & Tools Used
- Python 3 with ultralytics, tflite-runtime, opencv-python-headless, numpy, pillow
- PyTorch (ARM CPU build): for running YOLOv8 inference via Ultralytics
- TensorFlow Lite: lightweight inference engine for the ResNet50V2 classifier
- TensorFlow / Keras: model training, evaluation, and TFLite export (run on Colab/Kaggle)
- TensorFlow Model Optimization Toolkit: for Quantization Aware Training (QAT)
- Matplotlib / Seaborn: training curve and confusion matrix plots
- scikit-learn: classification_report, confusion_matrix, compute_class_weight
- Google Colab / Kaggle: cloud GPU environment for training
Pipeline Architecture
The pipeline uses a two-stage architecture (see PPT Slide 2, Pipeline Overview):
| Stage | Tool | Role |
|---|---|---|
| 1: Detection | YOLOv8 | Localize all bottles in the frame with configurable confidence threshold (default 0.10) |
| 1: NMS Filter | Custom containment NMS | Remove large wrapper boxes that contain smaller individual bottle boxes |
| Bridge | OpenCV | Crop each bottle with 20 px padding, preserving context for the classifier |
| 2: Classify | ResNet50V2 TFLite | Predict damaged / non_damaged per crop; softmax probabilities logged |
| 2: Log Output | JSON | Save annotated image (green = ok, red = damaged), per-bottle crop files, structured JSON |
Why two stages? YOLO excels at robust, fast spatial detection across variable bottle counts and positions. ResNet provides richer texture-level feature extraction per crop, which is better suited to detecting subtle surface damage than using the YOLO classification head alone.
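The Bridge stage in the table above pads each detection by 20 px before cropping. The pipeline's own cropping code is not shown here, so the following is only a minimal sketch of one way to do it; the helper name `padded_crop_box` and the clamping to the frame edges are assumptions.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2 in pixels


def padded_crop_box(bbox: Box, frame_w: int, frame_h: int, pad: int = 20) -> Box:
    """Grow bbox by `pad` pixels on every side, clamped to the frame bounds.

    Hypothetical helper: the name and the clamping behaviour are assumptions.
    """
    x1, y1, x2, y2 = bbox
    return (
        max(0, x1 - pad),
        max(0, y1 - pad),
        min(frame_w, x2 + pad),
        min(frame_h, y2 + pad),
    )


# A box near the left and bottom edges of a 640x480 frame gets clamped.
print(padded_crop_box((10, 50, 200, 470), frame_w=640, frame_h=480))
# -> (0, 30, 220, 480)
```

With an OpenCV/NumPy frame, the crop itself would then be `frame[y1:y2, x1:x2]` using the returned coordinates.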
Custom Containment NMS (remove_overlapping_boxes)
Standard IoU-based NMS fails in two real scenarios this pipeline encounters. The custom NMS is containment-based and handles both (see PPT Slide 3 for the logic diagram):
- Duplicate box around a single bottle: YOLO draws a tight box and a slightly larger wrapper box around the same bottle
- Large wrapper box spanning multiple bottles: YOLO draws one big box around a group of 2+ bottles alongside correct individual boxes
Logic: Boxes are sorted by area (largest first). A large box is removed if it contains one or more smaller boxes with ≥ 80% overlap (containment_threshold=0.80, iou_threshold=0.30). Only the tightest individual-bottle boxes are forwarded to the classifier.
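The containment rule described above can be sketched as follows. This is an illustration, not the repository's remove_overlapping_boxes: the function names are hypothetical, "overlap" is assumed to mean the fraction of the smaller box's area that lies inside the larger box, and the role of iou_threshold is not described in this section, so it is left out.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2


def area(box: Box) -> int:
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def containment(outer: Box, inner: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer` (assumed definition)."""
    iw = min(outer[2], inner[2]) - max(outer[0], inner[0])
    ih = min(outer[3], inner[3]) - max(outer[1], inner[1])
    if iw <= 0 or ih <= 0 or area(inner) == 0:
        return 0.0
    return (iw * ih) / area(inner)


def remove_wrapper_boxes(boxes: List[Box], containment_threshold: float = 0.80) -> List[Box]:
    """Drop any box that contains at least one smaller box above the threshold."""
    ordered = sorted(boxes, key=area, reverse=True)  # largest first
    kept = []
    for i, big in enumerate(ordered):
        smaller = ordered[i + 1:]
        if any(containment(big, s) >= containment_threshold for s in smaller):
            continue  # `big` wraps a tighter box, so discard it
        kept.append(big)
    return kept


boxes = [
    (0, 0, 300, 400),     # wrapper around both bottles
    (10, 10, 140, 390),   # bottle 1, tight box
    (160, 10, 290, 390),  # bottle 2, tight box
    (5, 5, 145, 395),     # slightly larger duplicate of bottle 1
]
print(remove_wrapper_boxes(boxes))
# -> [(10, 10, 140, 390), (160, 10, 290, 390)]
```

Both failure cases from the list above are covered by the same test: the group wrapper and the loose duplicate each contain a tighter box, so only the two tight boxes survive.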
Classifier: ResNet50V2 Training (resnet50.ipynb)
The damage classifier is trained in a Jupyter notebook on Colab/Kaggle using a two-phase transfer learning approach. The notebook follows a 14-step pipeline from data loading to TFLite export.
Dataset & Split
- Classes: damaged, non_damaged
- Split: 70% train / 15% validation / 15% test
- Splits are deterministic via SEED=42: files are sorted before the shuffle to guarantee reproducibility across runs
- A data leakage check is performed post-split: train_files.intersection(test_files) must return zero overlap
- Class counts are verified at startup: any missing class folder prints an error before training begins
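A sort-then-seeded-shuffle split with a leakage check could look roughly like the sketch below. It assumes a single flat list of file paths and a hypothetical helper named `split_files`; the notebook may split per class or use different utilities.

```python
import random
from typing import Dict, List

SEED = 42


def split_files(files: List[str], seed: int = SEED) -> Dict[str, List[str]]:
    """70/15/15 split that does not depend on the order the files were listed in."""
    ordered = sorted(files)               # sort first so directory order cannot matter
    random.Random(seed).shuffle(ordered)  # seeded shuffle, same result on every run
    n_train = len(ordered) * 70 // 100
    n_val = len(ordered) * 15 // 100
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],
    }


# Hypothetical file names, for illustration only.
files = [f"damaged/img_{i:03d}.jpg" for i in range(100)]
splits = split_files(files)

# Leakage check: no file may appear in both train and test.
train_files, test_files = set(splits["train"]), set(splits["test"])
assert not train_files.intersection(test_files)
print({name: len(items) for name, items in splits.items()})
# -> {'train': 70, 'val': 15, 'test': 15}
```

Sorting before shuffling matters because directory listings are not guaranteed to come back in the same order on every machine; without it, the same seed could produce different splits.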
Augmentation Pipeline
Training images are augmented on-the-fly using a tf.keras.Sequential augmentation block applied only during training (training=True):
| Transform | Parameter | Purpose |
|---|---|---|
| RandomFlip | horizontal | Mirror-invariance |
| RandomRotation | ±10° (0.10) | Slight tilt tolerance |
| RandomZoom | 10% | Scale variation |
| RandomBrightness | 20% | Lighting variation |
| RandomContrast | 15% | Exposure variation |
| GaussianNoise | σ = 0.05 | Simulate sensor noise |
ResNet50V2 preprocessing (preprocess_input) is applied after augmentation, scaling pixel values from [0, 255] to [-1, 1] as required by the backbone. All three splits use AUTOTUNE prefetching for pipeline efficiency.
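The [0, 255] to [-1, 1] mapping mentioned above is a simple affine rescale. The snippet below reproduces that arithmetic in plain Python as an illustration; the function name is hypothetical, and in the notebook the scaling is done by preprocess_input rather than by hand.

```python
def scale_to_unit_range(pixel):
    """Map a pixel value in [0, 255] to [-1, 1].

    Only arithmetic is used, so this also works elementwise on NumPy arrays.
    """
    return pixel / 127.5 - 1.0


for value in (0, 127.5, 255):
    print(value, "->", scale_to_unit_range(value))
# 0 -> -1.0
# 127.5 -> 0.0
# 255 -> 1.0
```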
Figure 1: Augmented Training Samples
augmented_samples.png: 3×3 grid of augmented training images, saved during Step 5 of resnet50.ipynb. Shows the combined visual effect of RandomFlip, RandomRotation, RandomZoom, RandomBrightness, RandomContrast, and GaussianNoise on real bottle crops.
OUTPUT_DIR/augmented_samples.png
Model Architecture
- Backbone: ResNet50V2 pretrained on ImageNet (include_top=False, weights='imagenet')
- Input: 224 × 224 × 3 RGB
Classification head:
GlobalAveragePooling2D
BatchNormalization
Dense(256, activation='relu', kernel_regularizer=L2(1e-4))
Dropout(0.57)
Dense(128, activation='relu', kernel_regularizer=L2(1e-4))
Dropout(0.30)
Dense(2, activation='softmax') → damaged | non_damaged
Total trainable parameters: classification head only in Phase 1; head + top 30 ResNet layers in Phase 2. See PPT Slide 4 for the inference flow diagram.
Training: Two Phases
Phase 1: Frozen base (up to 15 epochs)
The ResNet50V2 backbone is fully frozen (base_model.trainable = False). Only the classification head trains.
- Optimizer: Adam(lr=1e-3)
- Loss: CategoricalCrossentropy(label_smoothing=0.1)
- Class weights via compute_class_weight('balanced'): handles imbalanced damaged/non-damaged counts
- Callbacks: EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True), ReduceLROnPlateau(factor=0.5, patience=3, min_lr=1e-6), ModelCheckpoint(save_best_only=True)
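For reference, the 'balanced' heuristic weights each class by n_samples / (n_classes × class_count), so the rarer class gets the larger weight. The sketch below reproduces that formula in plain Python; the function name and the image counts are hypothetical and are not the dataset's real numbers.

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """weight[c] = n_samples / (n_classes * count[c])"""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical imbalance: 300 damaged vs 700 non_damaged images.
labels = ["damaged"] * 300 + ["non_damaged"] * 700
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})
# -> {'damaged': 1.667, 'non_damaged': 0.714}
```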
Phase 2: Fine-tuning top 30 layers (up to 20 epochs)
The top 30 layers of the backbone are unfrozen (layer.trainable = True for base_model.layers[-30:]).
- Optimizer: Adam(lr=5e-6), deliberately low to avoid destroying pretrained weights
- Same loss and callbacks; EarlyStopping(patience=7) gives more patience as improvement is slower
- Best weights restored automatically at end of training
Figure 2: Training Curves (Phase 1 + Phase 2)
training_curves.png: 3-panel figure saved during Step 9 of resnet50.ipynb at 150 DPI. Panel 1: Train vs. Val accuracy across all epochs. Panel 2: Train vs. Val loss across all epochs. Panel 3: Train Precision and Recall across all epochs. A vertical dashed line marks the Phase 1 → Phase 2 boundary (fine-tune start). Title: "ResNet50V2 - Training Curves".
OUTPUT_DIR/training_curves.png
Figure 3: Confusion Matrix
confusion_matrix.png: Seaborn heatmap saved during Step 10 of resnet50.ipynb. Test-set confusion matrix with raw counts, axes: true label (y) vs. predicted label (x). Classes: damaged, non_damaged. Title: "Confusion Matrix - Test Set (ResNet50V2)".
OUTPUT_DIR/confusion_matrix.png
Quantization
Two quantized variants are exported for edge deployment:
Post-Training INT8 Quantization (Step 11)
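import tensorflow as tf

# `model` (trained Keras model) and `representative_dataset_gen` are assumed to be defined earlier in the notebook
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen  # 300 calibration batches
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.float32   # float I/O avoids mismatch on RPi
converter.inference_output_type = tf.float32
tflite_int8_model = converter.convert()       # serialized model bytes (variable name is illustrative)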
Quantization Aware Training, QAT (Step 12)
The classification head is wrapped with tfmot.quantization.keras.quantize_model and the full model is re-trained for up to 30 epochs at lr=1e-5 with EarlyStopping(patience=5). QAT simulates quantization noise during training, yielding better accuracy at INT8 precision than post-training quantization. On RPi 5, the QAT model also runs faster (~252 ms) than the post-training INT8 model (~277 ms) because XNNPACK's FP32 SIMD kernels outperform its INT8 path on the ARM Cortex-A76, so QAT is the recommended deployment format.
An FP32 TFLite model is also exported (no optimizations) for debugging and accuracy comparison.
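To make "simulates quantization noise" concrete: during QAT the forward pass rounds values to the INT8 grid and maps them back to float, so the network learns to tolerate the rounding error. The sketch below shows that quantize/dequantize round trip for a single value. It is a generic illustration of affine quantization, not the toolkit's implementation, and the scale and zero-point values are made up.

```python
def fake_quantize(x: float, scale: float, zero_point: int,
                  qmin: int = -128, qmax: int = 127) -> float:
    """Quantize x to an INT8 code, clamp it, then map it back to float."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))      # values outside the INT8 range saturate
    return (q - zero_point) * scale


scale, zero_point = 0.05, 0          # hypothetical quantization parameters
for x in (0.12, -0.31, 10.0):
    x_hat = fake_quantize(x, scale, zero_point)
    print(f"x={x:+.2f}  dequantized={x_hat:+.2f}  error={x_hat - x:+.2f}")
```

Values inside the representable range come back within about half a step (scale / 2) of the original, while out-of-range values such as 10.0 saturate at the largest representable value.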
Figure 4: QAT Training Curves
qat_training_curves.png: 2-panel figure saved during the QAT training cell of resnet50.ipynb. Panel 1: QAT training vs. validation accuracy across all QAT epochs. Panel 2: QAT training vs. validation loss across all QAT epochs.
OUTPUT_DIR/qat_training_curves.png
Output Files
| File | Step | Description |
|---|---|---|
| bottle_classifier_resnet50v2_fp32.keras | 8 | Full FP32 Keras model, best Phase 2 weights |
| best_phase1.keras | 7 | Best checkpoint from Phase 1 (frozen base) |
| best_phase2.keras | 8 | Best checkpoint from Phase 2 (fine-tuned) |
| bottle_classifier_resnet50v2_int8.tflite | 11 | INT8 post-training quantized TFLite (~22 MB) |
| bottle_classifier_resnet50v2_fp32.tflite | 11 | FP32 TFLite (no quantization, for debug/comparison) |
| bottle_classifier_resnet50v2_qat.tflite | 12 | QAT TFLite, recommended for deployment |
| labels.txt | 13 | Class label list (damaged, non_damaged) |
| training_curves.png | 9 | Figure 2: Accuracy / Loss / Precision+Recall (all epochs) |
| confusion_matrix.png | 10 | Figure 3: Test set confusion matrix |
| augmented_samples.png | 5 | Figure 1: 3×3 grid of augmented training samples |
| qat_training_curves.png | 12 | Figure 4: QAT accuracy and loss curves |
Performance Benchmarks (Raspberry Pi 5, ARM64)
All benchmarks run on RPi 5 ARM64 with the XNNPACK delegate. Confidence threshold: 0.10.
Figure 5: Timing Breakdown Chart (best.pt baseline)
PPT Slide 7: Bar chart benchmarked on RPi 5 using the original best.pt FP32 YOLO model. Six bars: YOLO 1 bottle (1550 ms), YOLO 2 bottles (885 ms), ResNet ×1 (280 ms), ResNet ×2 (505 ms), Total 1 bottle (2016 ms), Total 2 bottles (1559 ms). Note: "Includes model load on first run. Subsequent calls are faster."
Inference Timing
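| Configuration | YOLO | ResNet/bottle | Total (1 bottle) | Total (2 bottles) |
|---|---|---|---|---|
| FP32 (best.pt) | ~1550 ms | ~252 ms | ~2016 ms | ~1559 ms |
| QAT (yolov8n.pt) | ~308 ms | ~252 ms | ~2076 ms | ~1351 ms |
| INT8 (yolov8n.pt) | ~308 ms | ~277 ms | ~2245 ms | ~1424 ms |
Note: the yolov8n.pt totals are taken from the 7-bottle and 4-bottle benchmark runs reported in Figure 6 (Run 01 and Run 02), so they are not directly comparable with the 1-bottle and 2-bottle best.pt totals from Figure 5.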
Figure 6: QAT Model Results, 5× YOLO Speedup
PPT Slide 12: Shows the before/after YOLO banner: best.pt ~1550 ms → yolov8n.pt ~308 ms (5× FASTER). Two benchmark runs: Run 01 (7 bottles, YOLO 307.6 ms, ResNet QAT ~252 ms, total 2076 ms) and Run 02 (4 bottles, YOLO 310.9 ms, ResNet QAT ~260 ms, total 1351 ms). Key win note: switching the detection model from best.pt to yolov8n.pt achieved the speedup with no accuracy configuration change.
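The Run 01 and Run 02 totals can be sanity-checked with a simple cost model: one YOLO pass per frame plus one ResNet pass per detected bottle. The sketch below is an assumption about how the total decomposes (it ignores cropping, drawing, and I/O overhead), and the function name is hypothetical.

```python
def estimate_total_ms(yolo_ms: float, resnet_ms_per_bottle: float, n_bottles: int) -> float:
    """Estimated frame time: one detection pass plus one classification per bottle."""
    return yolo_ms + resnet_ms_per_bottle * n_bottles


run_01 = estimate_total_ms(307.6, 252, 7)  # reported total: 2076 ms
run_02 = estimate_total_ms(310.9, 260, 4)  # reported total: 1351 ms
print(f"Run 01 estimate: {run_01:.1f} ms")
print(f"Run 02 estimate: {run_02:.1f} ms")
```

Both estimates land within a few milliseconds of the reported totals, which suggests that per-bottle classification dominates the frame time once yolov8n.pt is used for detection.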
Figure 7: INT8 Model Results
PPT Slide 13: Three-column comparison header: FP32 best.pt (YOLO ~1550 ms, total 2016 ms), QAT yolov8n (YOLO ~308 ms, total 2076 ms), INT8 yolov8n (YOLO ~308 ms, total 2245 ms, slower than QAT). Same two benchmark runs below. The observation panel explains the counterintuitive result: XNNPACK on the RPi 5's ARM Cortex-A76 does not accelerate INT8 convolutions as efficiently as FP32. QAT keeps weights in float32 at runtime, and XNNPACK's highly tuned FP32 SIMD kernels outperform its INT8 path on this SoC. INT8 wins on memory bandwidth, not compute throughput.
Memory Usage (RPi 5, 8 GB LPDDR4X)
| Component | RAM |
|---|---|
| YOLOv8n model weights | ~12 MB |
| ResNet50V2 TFLite INT8 | ~25 MB |
| OpenCV frame buffer | ~50 MB |
| Python runtime + libs | ~200 MB |
| XNNPACK delegate cache | ~30 MB |
| OS + idle processes | ~348 MB |
| Pipeline peak RSS | ~665 MB (8.1% of 8 GB) |
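The peak figure in the last row is the sum of the component rows, and the percentage follows from treating 8 GB as 8 × 1024 MB. A quick check of that arithmetic:

```python
# Component figures copied from the table above (MB).
components_mb = {
    "YOLOv8n model weights": 12,
    "ResNet50V2 TFLite INT8": 25,
    "OpenCV frame buffer": 50,
    "Python runtime + libs": 200,
    "XNNPACK delegate cache": 30,
    "OS + idle processes": 348,
}
TOTAL_RAM_MB = 8 * 1024

peak_mb = sum(components_mb.values())
share = peak_mb / TOTAL_RAM_MB
print(f"peak: {peak_mb} MB ({share:.1%} of 8 GB)")
# -> peak: 665 MB (8.1% of 8 GB)
```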
Figure 8: Memory Usage Profile
PPT Slide 15: Horizontal utilization bar showing pipeline RSS (~665 MB, 8.1%) vs. total 8 GB with ~7.4 GB free. Right panel breaks down RAM by component. Headroom section projects V2 expansion: multi-camera ×4 feeds (~2.2 GB total), 640×480 resolution input (+80 MB), Edge TPU runtime overhead (+50 MB), remaining free after V2 (>5 GB). Note: "Only 8.1% of available RAM consumed. The pipeline can scale to 10+ simultaneous models, higher-resolution inputs, or multi-camera feeds without requiring a hardware upgrade."
Usage Modes
The pipeline supports three operating modes via the --mode flag. Entry point: bottle_pipeline.py. See PPT Slide 6.
Image Mode
Single-shot inspection. Best for validation and debugging.
python bottle_pipeline.py \
--mode image \
--input photo.jpg \
--confidence 0.10 \
--output results/
Outputs: Annotated image with green (ok) / red (damaged) bounding boxes and {class}: {confidence} labels, per-bottle crop files, JSON result log.
Folder Mode
Bulk processing. Logs a summary across all images at the end.
python bottle_pipeline.py \
--mode folder \
--input images/ \
--output batch_results/
Outputs: Batch JSON summary, damage rate across all images, avg processing time per image. Supports .jpg, .jpeg, .png, .bmp, .webp.
Camera Mode: Photo Booth Workflow
Designed for offline QC inspection stations. Runs a repeating 3-phase loop, which avoids keeping the RPi CPU continuously maxed out between captures.
python bottle_pipeline.py \
--mode camera \
--camera 0 \
--no-display # for headless RPi deployments
Loop phases (repeating):
- Live preview + 5 s countdown: shows a live feed with a "Capturing in: N" overlay so the operator can position bottles
- Capture & process: freezes the last frame, runs YOLO + ResNet, draws annotated results. Camera buffer set to 1 frame to minimise stale-frame lag.
- Display results for 10 s: shows the annotated frame with per-bottle labels; the operator reads the result before the next cycle begins
Outputs: Annotated result per capture, running session totals (total bottles / total damaged / damage rate). Keys: q = quit, s = save current annotated frame to disk. Session summary printed to terminal and saved as JSON on exit.
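The three loop phases can be sketched as a small driver with the camera, models, and display passed in as callables. This is an illustrative skeleton, not the code in bottle_pipeline.py: the function name, its parameters, and the one-frame-per-second preview are all assumptions (a real preview would redraw far more often and would also handle the q and s keys).

```python
import time
from typing import Any, Callable, List


def run_photo_booth(
    capture: Callable[[], Any],
    process: Callable[[Any], Any],
    show: Callable[[Any, str], None],
    cycles: int = 1,
    countdown_s: int = 5,
    display_s: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    for _ in range(cycles):
        # Phase 1: live preview with a countdown overlay.
        frame = None
        for remaining in range(countdown_s, 0, -1):
            frame = capture()
            show(frame, f"Capturing in: {remaining}")
            sleep(1)
        # Phase 2: freeze the last preview frame and run detection + classification.
        annotated = process(frame)
        # Phase 3: hold the annotated result so the operator can read it.
        show(annotated, "result")
        sleep(display_s)


# Dry run with stand-ins: no camera, no models, no real waiting.
overlays: List[str] = []
waits: List[float] = []
run_photo_booth(
    capture=lambda: "frame",
    process=lambda frame: f"annotated({frame})",
    show=lambda image, overlay: overlays.append(overlay),
    sleep=waits.append,
)
print(overlays)
print(sum(waits), "seconds per cycle, excluding processing")
```

Injecting `sleep` and the I/O callables keeps the loop logic testable on a development machine without a camera attached.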
Figure 9: Live Detection Output
PPT Slide 11: Real inference output on 4 bottles (Bisleri and similar PET bottles). The pipeline correctly identifies 2 as Not Damaged (61%, 68%) and 2 as Damaged (68%, 72%). All 4 have individual bounding boxes. Summary overlay top-left: "Total: 4 Damaged: 2 Non Damaged: 2". Boxes appear green for both classes in this screenshot; in the actual runtime, damaged bottles get red boxes.
Data Structures
Results are returned as typed Python @dataclass objects for type safety and easy JSON serialization. See PPT Slide 5.
BottleDetection: one detection + classification result per bottle:
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class BottleDetection:
    bottle_id: int
    bbox: Tuple[int, int, int, int]   # x1, y1, x2, y2
    detection_confidence: float
    damage_class: str                 # 'damaged' | 'non_damaged'
    damage_confidence: float
    crop_path: Optional[str]          # path to saved crop file
PipelineResult: full image result, serializable via json.dumps(asdict(result)):
@dataclass
class PipelineResult:
    image_path: str
    timestamp: str
    total_bottles: int
    damaged_count: int
    not_damaged_count: int
    detection_time_ms: float
    classification_time_ms: float
    total_time_ms: float
    detections: List[BottleDetection]
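A short round trip shows how these two dataclasses serialize. The field definitions are copied from above so the example is self-contained; every value in it is invented for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class BottleDetection:
    bottle_id: int
    bbox: Tuple[int, int, int, int]  # x1, y1, x2, y2
    detection_confidence: float
    damage_class: str                # 'damaged' | 'non_damaged'
    damage_confidence: float
    crop_path: Optional[str]


@dataclass
class PipelineResult:
    image_path: str
    timestamp: str
    total_bottles: int
    damaged_count: int
    not_damaged_count: int
    detection_time_ms: float
    classification_time_ms: float
    total_time_ms: float
    detections: List[BottleDetection]


# Made-up values, for illustration only.
detection = BottleDetection(1, (12, 30, 118, 342), 0.42, "damaged", 0.72, None)
result = PipelineResult(
    image_path="photo.jpg",
    timestamp="2026-01-01T12:00:00",
    total_bottles=1,
    damaged_count=1,
    not_damaged_count=0,
    detection_time_ms=308.0,
    classification_time_ms=252.0,
    total_time_ms=560.0,
    detections=[detection],
)

payload = json.dumps(asdict(result), indent=2)  # asdict() recurses into nested dataclasses
restored = json.loads(payload)
print(restored["detections"][0]["bbox"])        # the tuple comes back as a JSON array (list)
```

Note that `json.dump` needs a file object as its second argument; `json.dumps` returns the string directly, which is convenient for logging.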
JSON Output Sample (folder mode, full summary):
{
  "summary": {
    "total_images": 12,
    "total_bottles": 48,
    "total_damaged": 11,
    "total_not_damaged": 37,
    "damage_rate": "22.9%"
  },
  "results": [
    {
      "total_bottles": 3,
      "damaged_count": 1,
      "total_time_ms": 245.3,
      "detections": [...]
    }
  ]
}
Crop files are saved as {image_name}_bottle_{id}_{class}.jpg.
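The summary block in the JSON sample and the crop-file naming pattern can both be derived with a few lines. The helpers below are hypothetical, not the pipeline's own functions, and assume each per-image result exposes total_bottles and damaged_count as in the sample.

```python
from typing import Dict, List


def summarize(results: List[Dict]) -> Dict:
    """Aggregate per-image results into the folder-mode summary fields."""
    total_bottles = sum(r["total_bottles"] for r in results)
    total_damaged = sum(r["damaged_count"] for r in results)
    rate = total_damaged / total_bottles if total_bottles else 0.0
    return {
        "total_images": len(results),
        "total_bottles": total_bottles,
        "total_damaged": total_damaged,
        "total_not_damaged": total_bottles - total_damaged,
        "damage_rate": f"{rate:.1%}",
    }


def crop_filename(image_name: str, bottle_id: int, damage_class: str) -> str:
    """Follows the {image_name}_bottle_{id}_{class}.jpg pattern."""
    return f"{image_name}_bottle_{bottle_id}_{damage_class}.jpg"


# Made-up batch that reproduces the sample summary: 12 images, 48 bottles, 11 damaged.
results = [{"total_bottles": 4, "damaged_count": 1} for _ in range(11)]
results.append({"total_bottles": 4, "damaged_count": 0})
summary = summarize(results)
print(summary["damage_rate"])                 # -> 22.9%
print(crop_filename("photo", 2, "damaged"))   # -> photo_bottle_2_damaged.jpg
```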
Raspberry Pi Setup
Run setup_rpi5.sh for automated 7-step installation. See PPT Slide 8.
# 1. System update
sudo apt update && sudo apt upgrade -y
# 2. System dependencies
sudo apt install -y python3-pip python3-venv libopencv-dev libatlas-base-dev cmake git
# 3. Virtual environment (activate it so the pip installs below go into the venv)
python3 -m venv ~/bottle_pipeline_venv
source ~/bottle_pipeline_venv/bin/activate
# 4. TFLite runtime
pip install tflite-runtime
# 5. PyTorch ARM CPU build
pip install torch torchvision --index-url .../whl/cpu
# 6. Ultralytics YOLO
pip install ultralytics
# 7. Other dependencies
pip install numpy opencv-python-headless pillow
Required file layout:
bottle_pipeline/
├── bottle_pipeline.py
├── setup_rpi5.sh
├── requirements.txt
└── models/
    ├── yolov8s.pt   <- CLI default; swap for yolov8n.pt for speed
    ├── *.tflite
    └── labels.txt
Transfer models from PC:
scp yolov8n.pt pi@raspberrypi:~/bottle_pipeline/models/
scp *.tflite pi@raspberrypi:~/bottle_pipeline/models/
Activate and run:
source ~/bottle_pipeline_venv/bin/activate
cd ~/bottle_pipeline
# CLI default is yolov8s.pt β pass yolov8n.pt explicitly (all benchmarks use yolov8n)
python bottle_pipeline.py --mode camera --yolo-model models/yolov8n.pt
Troubleshooting
See PPT Slide 9 for the full fault tree.
| Error | Fix |
|---|---|
| No module named tflite_runtime | pip install tflite-runtime. Use tflite-runtime on RPi, not full TensorFlow. Fallback: import tensorflow as tf |
| Could not open camera | Run ls /dev/video* && raspi-config; verify /dev/video* exists and enable the camera interface for the Pi Camera Module |
| YOLO model loading error | pip install torch torchvision --index-url .../cpu; ensure the PyTorch ARM CPU build is installed (the download index must specify /whl/cpu) |
| Out of memory (OOM) | Use --yolo-model yolov8n.pt; YOLOv8n uses far less RAM than YOLOv8s. Process images one at a time in folder mode |
| Camera feed lags / stale frames | Camera mode sets CAP_PROP_BUFFERSIZE=1; if lag persists, add a cap.read() discard call before the capture step |
Speed Tuning
| Knob | Recommendation |
|---|---|
| YOLO model size | Prefer yolov8n.pt over yolov8s.pt: ~5× faster on RPi 5 |
| Resolution | 320×240 instead of 640×480: ~2× faster; camera mode defaults to 640×480 |
| Quantization | QAT TFLite is the fastest and most accurate option on RPi 5 |
| Confidence threshold | Higher threshold → fewer crops forwarded → faster overall |
| Batch size | Process 1 image at a time on RPi (pipeline already does this) |
V2 Roadmap: Scaling to Real-Time Conveyor Speeds
See PPT Slide 14.
| Phase | Status | Achievements / Plan |
|---|---|---|
| Phase 1: INT8 Quantization | Done | ResNet50V2 INT8v3 TFLite deployed; validated at ~252 ms/bottle (QAT) and ~277 ms/bottle (INT8); XNNPACK delegate enabled |
| Phase 2: YOLOv8n Nano Switch | Done | Switched best.pt → yolov8n.pt; YOLO 1550 ms → 308 ms (5× speedup measured on RPi 5); conf=0.10 accuracy within acceptable range |
| Phase 3: Edge TPU Integration | Planned | Coral USB Accelerator or Dev Board; INT8 model compiled for Edge TPU; target <50 ms/bottle end-to-end; eliminates CPU bottleneck entirely |
| Phase 4: Conveyor Integration | Planned | GPIO-triggered capture via sensor; Pi Camera 3 (rolling shutter fix); continuous stream zero-buffer mode; alert output → PLC / reject actuator |
Current V1: ~2 to 3.5 s/tray (batch processing, best.pt FP32, RPi CPU only), suitable for an offline QC inspection workflow. Target V2: ~50 ms/bottle on a continuous conveyor stream with an industrial-grade zero missed-defect guarantee.
Team
- Bottle Pipeline Team, 2026
For questions, feedback, or collaboration, please open an issue in this repository.