A robust, detector-agnostic multi-task framework that unifies detection, oriented bounding boxes, and instance segmentation for automated, generalizable rodent behavioral monitoring.
Continuous, non-invasive behavioral monitoring of rodents is fundamental to neuroscience and pharmacological phenotyping, yet existing vision pipelines struggle to generalize across illumination changes, perspective distortion, cage geometry, and dense-occlusion housing conditions.
DeepRodent addresses this with a single multi-scale feature backbone feeding four task-specific heads — axis-aligned detection, rotation-aware oriented bounding boxes for curled or rotated animals, pixel-level instance segmentation under occlusion, and a temporal behavioral embedding — trained jointly and converted, via a post-processing aggregation engine, into trajectory tracking, behavioral-state classification, and spatial occupancy heatmaps.
Attaching DeepRodent's prediction heads to any backbone from YOLOv8 through YOLO12 yields a consistent +2.6 to +3.1 mAP improvement while maintaining real-time inference speed suitable for continuous monitoring.
| Method | Backbone | Precision | Recall | mAP₅₀ | mAP₅₀₋₉₅ | FPS |
|---|---|---|---|---|---|---|
| YOLOv8-Seg | Nano | 91.7 | 89.6 | 92.8 | 73.9 | 188 |
| YOLO11-Seg | Small | 93.5 | 92.1 | 94.2 | 77.4 | 161 |
| YOLO12-Seg | Small | 93.8 | 92.5 | 94.4 | 78.2 | 156 |
| DeepRodent (Ours) | YOLO Family | 95.4 | 94.1 | 96.2 | 84.6 | 154 |
Full cross-environment generalization, ablation, and SOTA comparison tables are reported in the paper.
A multi-scale feature integration backbone (CSP-style blocks with scale-aware softmax fusion) produces one shared representation per frame, which is decoded by four heads under a single joint objective.
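As a rough illustration of what "scale-aware softmax fusion" could mean, the sketch below blends per-scale feature vectors with weights obtained by a softmax over one logit per scale. It assumes the scales have already been resized to a common shape and flattened; `fuse_scales` and `scale_logits` are hypothetical names, not identifiers from the DeepRodent code base.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of per-scale logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(
    features: Sequence[Sequence[float]], scale_logits: Sequence[float]
) -> List[float]:
    """Blend one feature vector per scale into a single shared vector.

    Assumption: every scale was already resized to the same length.
    In a trained model the logits would be learnable parameters.
    """
    if len(features) != len(scale_logits):
        raise ValueError("need exactly one logit per scale")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all scales must share the same length")
    weights = softmax(scale_logits)
    return [
        sum(w * f[i] for w, f in zip(weights, features)) for i in range(length)
    ]


if __name__ == "__main__":
    # Three toy "scales" with equal logits reduce to a plain average.
    print(fuse_scales([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.0, 0.0, 0.0]))
```

Because the weights sum to one, the fused vector stays in the range of its inputs, and raising one logit shifts the blend toward that scale.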
Detection head: axis-aligned bounding boxes for fast, cage-wide localization of every animal in frame.
Oriented bounding box head: rotation-aware localization for curled, rearing, or arbitrarily rotated rodents, regressed with a Gaussian-Wasserstein rotated-IoU loss.
Instance segmentation head: pixel-level instance masks that hold up under high-density occlusion between animals.
Temporal behavioral embedding head: a behavioral embedding feeding trajectory tracking, state classification, and occupancy heatmaps.
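The Gaussian-Wasserstein term used by the oriented-box head can be sketched as follows. The README does not spell out the exact formulation, so this follows the commonly published Gaussian Wasserstein distance for rotated boxes: each box `(cx, cy, w, h, theta)` becomes a 2D Gaussian, and the squared 2-Wasserstein distance between two Gaussians has a closed form for 2x2 covariances. The function names, the `tau` constant, and the `log1p` squashing are assumptions made for illustration.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float, float]  # (cx, cy, w, h, theta in radians)


def box_to_gaussian(box: Box):
    """Map a rotated box to a Gaussian: mean = center, cov = R diag(w^2/4, h^2/4) R^T."""
    cx, cy, w, h, theta = box
    a, b = (w / 2) ** 2, (h / 2) ** 2  # variances along the box axes
    c, s = math.cos(theta), math.sin(theta)
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def gwd_squared(box1: Box, box2: Box) -> float:
    """Squared 2-Wasserstein distance between the two box Gaussians."""
    (mx1, my1), (a1, b1, d1) = box_to_gaussian(box1)
    (mx2, my2), (a2, b2, d2) = box_to_gaussian(box2)
    center = (mx1 - mx2) ** 2 + (my1 - my2) ** 2
    trace_sum = a1 + d1 + a2 + d2
    trace_prod = a1 * a2 + 2 * b1 * b2 + d1 * d2  # Tr(S1 S2)
    det_prod = (a1 * d1 - b1 * b1) * (a2 * d2 - b2 * b2)
    # For 2x2 PSD matrices: Tr(sqrt(M)) = sqrt(Tr(M) + 2 sqrt(det(M))).
    cross = math.sqrt(max(trace_prod + 2 * math.sqrt(max(det_prod, 0.0)), 0.0))
    return center + max(trace_sum - 2 * cross, 0.0)


def gwd_loss(pred: Box, target: Box, tau: float = 1.0) -> float:
    """One common bounded form: 1 - 1 / (tau + log(1 + d^2)). tau is an assumed constant."""
    return 1.0 - 1.0 / (tau + math.log1p(gwd_squared(pred, target)))


if __name__ == "__main__":
    target = (50.0, 50.0, 40.0, 20.0, 0.0)
    print(gwd_loss((52.0, 49.0, 38.0, 22.0, 0.2), target))
```

A useful property of this formulation is that a square box yields the same Gaussian at any angle, so the loss does not penalize rotations that leave the box unchanged.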
The joint objective combines a focal segmentation loss, an IoU box loss, rotated-IoU regression, KL-divergence regularization, uncertainty-guided reweighting, and a cross-domain feature-moment matching term for generalization across laboratory settings. See `docs/ARCHITECTURE.md` for the equation-by-equation mapping to code.
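To make two of these terms concrete, here is a minimal sketch under stated assumptions. The README does not give the reweighting formula, so the example uses a widely used learned log-variance weighting (each loss scaled by `exp(-s)` plus a penalty `s`), and it interprets "feature-moment matching" as the squared gap between per-channel means and variances of two feature batches. Both functions and their names are hypothetical and may differ from what `docs/ARCHITECTURE.md` describes.

```python
import math
from statistics import fmean, pvariance
from typing import Dict, Sequence


def uncertainty_weighted_total(
    losses: Dict[str, float], log_vars: Dict[str, float]
) -> float:
    """Sum of exp(-s_k) * L_k + s_k over tasks k.

    Assumption: s_k is a learnable log-variance per task; a larger s_k
    down-weights a noisy task but pays a linear penalty for doing so.
    """
    total = 0.0
    for name, value in losses.items():
        s = log_vars[name]
        total += math.exp(-s) * value + s
    return total


def moment_matching(
    source: Sequence[Sequence[float]], target: Sequence[Sequence[float]]
) -> float:
    """Squared gap in per-channel mean and variance between two feature batches.

    Rows are samples, columns are feature channels.
    """
    gap = 0.0
    for src_col, tgt_col in zip(zip(*source), zip(*target)):
        gap += (fmean(src_col) - fmean(tgt_col)) ** 2
        gap += (pvariance(src_col) - pvariance(tgt_col)) ** 2
    return gap


if __name__ == "__main__":
    # Placeholder loss values, not measurements from the paper.
    task_losses = {"seg": 2.0, "box": 1.0, "obb": 0.5}
    log_vars = {"seg": 0.0, "box": 0.0, "obb": 0.0}
    lab_a = [[0.0, 1.0], [2.0, 3.0]]
    lab_b = [[1.0, 1.0], [3.0, 3.0]]
    print(uncertainty_weighted_total(task_losses, log_vars) + moment_matching(lab_a, lab_b))
```

With all log-variances at zero the weighting reduces to a plain sum, which makes it a convenient starting point before the weights are learned.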
DeepRodent expects the standard YOLO-style polygon segmentation layout. No data on hand? Generate a synthetic set to smoke-test the pipeline end-to-end.
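For orientation, a YOLO-style polygon segmentation label is a text file with one object per line: a class index followed by polygon vertices as `x y` pairs normalized to the image size. The sketch below parses one such line and converts it to pixel coordinates; the function names are hypothetical helpers for illustration and are not part of DeepRodent.

```python
from typing import List, Tuple

Polygon = List[Tuple[float, float]]


def parse_seg_label(line: str) -> Tuple[int, Polygon]:
    """Parse 'class x1 y1 x2 y2 ...' with coordinates normalized to [0, 1]."""
    parts = line.split()
    # A polygon needs at least three vertices, so 1 + 6 tokens, and an odd count.
    if len(parts) < 7 or len(parts) % 2 == 0:
        raise ValueError("expected: class x1 y1 x2 y2 x3 y3 ...")
    class_id = int(parts[0])
    coords = [float(v) for v in parts[1:]]
    if any(not 0.0 <= v <= 1.0 for v in coords):
        raise ValueError("coordinates must be normalized to [0, 1]")
    return class_id, list(zip(coords[0::2], coords[1::2]))


def to_pixels(polygon: Polygon, width: int, height: int) -> Polygon:
    """Scale normalized vertices to pixel coordinates."""
    return [(x * width, y * height) for x, y in polygon]


if __name__ == "__main__":
    class_id, polygon = parse_seg_label("0 0.25 0.25 0.75 0.25 0.5 0.75")
    print(class_id, to_pixels(polygon, 640, 480))
```

Running a validator like this over every label file is a cheap way to catch unnormalized or truncated annotations before training.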
The post-processing aggregation engine produces trajectory arrays, an occupancy heatmap, and per-frame behavioral-state tags.
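One simple way to turn a trajectory into an occupancy heatmap is to bin the tracked centroids into a coarse grid and normalize by the number of frames. The sketch below shows that idea only; the grid size, the `(x, y)` centroid format, and the function name are assumptions, and the actual aggregation engine may differ.

```python
from typing import List, Sequence, Tuple


def occupancy_heatmap(
    trajectory: Sequence[Tuple[float, float]],
    frame_size: Tuple[int, int],
    grid: Tuple[int, int] = (4, 4),
) -> List[List[float]]:
    """Fraction of frames spent in each grid cell.

    trajectory: per-frame (x, y) centroids in pixels.
    frame_size: (width, height) in pixels.
    grid: (columns, rows) of the heatmap.
    """
    width, height = frame_size
    cols, rows = grid
    counts = [[0] * cols for _ in range(rows)]
    for x, y in trajectory:
        # Clamp so points on or beyond the frame border land in an edge cell.
        col = min(max(int(x / width * cols), 0), cols - 1)
        row = min(max(int(y / height * rows), 0), rows - 1)
        counts[row][col] += 1
    total = len(trajectory)
    return [[c / total if total else 0.0 for c in row] for row in counts]


if __name__ == "__main__":
    track = [(10.0, 10.0), (10.0, 10.0), (90.0, 90.0), (50.0, 10.0)]
    for row in occupancy_heatmap(track, frame_size=(100, 100), grid=(2, 2)):
        print(row)
```

Normalizing by frame count keeps heatmaps comparable across recordings of different lengths.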
All ablations are reported across 3 random seeds, using the multi-seed averaging protocol implemented in `Evaluator.multi_seed_summary`.
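The idea behind multi-seed averaging can be sketched in a few lines: collect one metrics dictionary per seed, then report the mean and sample standard deviation per metric. This is an illustrative stand-in, not the signature or behavior of `Evaluator.multi_seed_summary`; the function name and the metric values below are placeholders.

```python
from statistics import fmean, stdev
from typing import Dict, List


def summarize_seeds(runs: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Per-metric mean and sample standard deviation across seed runs."""
    if not runs:
        raise ValueError("need at least one run")
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs]
        summary[metric] = {
            "mean": fmean(values),
            # Sample std needs at least two runs; report 0.0 for a single run.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


if __name__ == "__main__":
    # Placeholder numbers for three seeds, not results from the paper.
    runs = [
        {"metric_a": 1.0, "metric_b": 10.0},
        {"metric_a": 2.0, "metric_b": 10.0},
        {"metric_a": 3.0, "metric_b": 10.0},
    ]
    print(summarize_seeds(runs))
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two ablation rows exceeds seed-to-seed noise.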
DeepRodent is intended solely as an assistive research framework and is not designed to replace expert veterinary oversight or certified behavioral assessment by trained experimental biologists.
The underlying study used a private, non-invasive laboratory video dataset (secondary analysis of recorded observation clips only); no housing conditions were altered and no invasive procedures were performed for the purpose of data collection. All animal care and handling from the primary data source were conducted under approved IACUC protocols, in accordance with the ARRIVE guidelines and the 3Rs principles (Replacement, Reduction, Refinement).
DeepRodent should be treated as a decision-support tool requiring expert oversight, continual monitoring, and multi-center validation prior to broader deployment in experimental biology workflows.
@article{panboonyuen2026deeprodent,
title = {DeepRodent: A Robust and Generalizable Vision Framework for Automated Rodent Monitoring in Experimental Biology},
author = {Panboonyuen, Teerapong},
year = {2026},
url = {https://github.com/kaopanboonyuen/DeepRodent}
}