This paper dives into the cutting-edge world of road asset detection on Thai highways, showcasing a novel approach that combines an upgraded REG model with Generalized Focal Loss. Our focus is on identifying key road elements—like pavilions, pedestrian bridges, information and warning signs, and concrete guardrails—to boost road safety and infrastructure management. While deep learning methods have shown promise, traditional models often struggle with accuracy in tricky conditions, such as cluttered backgrounds and variable lighting. To tackle these issues, we’ve integrated REG with Generalized Focal Loss, enhancing its ability to detect road assets with greater precision. Our results are impressive: the REGx model led the way with a mAP50 of 80.340, a mAP50-95 of 60.840, a precision of 79.100, a recall of 76.680, and an F1-score of 77.870. These findings highlight the REGx model’s superior performance, demonstrating the power of advanced deep learning techniques to improve highway safety and infrastructure maintenance, even in challenging conditions.
I am thrilled to share that our paper titled “Enhanced REG-Based Object Detection of Road Assets Utilizing Generalized Focal Loss: A Case Study on Thai Highway Imagery” has been accepted for an oral presentation at the 5th International Conference on Highway Engineering (iCHE 2024)! After a long absence from international conferences since my Ph.D. studies, I’m incredibly excited to rejoin the academic community in person and present our latest research.
Dive into the complete details of our research on road asset detection in Thai highways with advanced vision models. Check out the full blog post here: REG: Refined Generalized Focal Loss for Road Asset Detection
Hello, fellow tech enthusiasts! I’m thrilled to unveil a cutting-edge innovation from my latest research—Refined Generalized Focal Loss (REG). This revolutionary approach is transforming road asset detection on Thai highways, and it’s as exciting as it sounds.
So, what’s the big deal with REG? Imagine a detection system that not only sees but truly understands the intricate details of highway scenes. REG pushes the boundaries of current vision-based detection models by tackling the most challenging issues: imbalanced datasets, tiny objects, and complex highway backdrops.
My method (check out Fig. 1) brings a whole new level of precision to the table. By integrating a custom loss function into the detection architecture, REG doesn’t just improve performance—it redefines it. This means sharper, more reliable detection of critical road assets like signs, lane markings, and barriers. And let’s be real, that’s a game-changer for infrastructure management and road safety.
Fig. 1. The REG-based detection framework employs Generalized Focal Loss to master class imbalance in Thai highway road asset detection. Combining Transformer layers with convolutional modules, and using Batch Normalization and Adaptive Dropout, this model stands out for its robustness. It’s finely tuned to capture the unique aspects of Thai highways, focusing on rare and challenging assets. [Refined Generalized Focal Loss]
REG isn’t just a theoretical leap; it’s a practical breakthrough with real-world impact. It’s especially useful for regions with road structures similar to Thai highways, where conventional detection algorithms might falter. By merging Vision Transformers (ViT) with conditional random fields (CRF), we’ve supercharged the model’s ability to segment and identify road assets with pinpoint accuracy.
Thailand’s highway infrastructure plays a critical role in its economic development and connectivity. However, managing and maintaining these extensive road networks presents numerous challenges, particularly in detecting and assessing road assets. Accurate identification of road features such as signs, barriers, and markings is essential for effective maintenance and safety management.
Our proposed method (Fig. 2) enhances the capability of REG-based object detection systems by incorporating a tailored loss function designed to address the unique characteristics of Thai highway imagery. By optimizing the detection process, our approach aims to provide more reliable and precise data for road asset management. This advancement not only contributes to the field of highway engineering but also supports the development of more efficient infrastructure management practices in Thailand.
Fig. 2. The proposed Enhanced REG-based object detection framework integrates Generalized Focal Loss for improved detection accuracy. This approach includes various REG model variants, ranging from REGn to REGx, each offering a balance between computational efficiency and detection performance. The network architecture leverages convolutional layers with Batch Normalization and Leaky ReLU activations. The Generalized Focal Loss, designed to address class imbalance, enhances performance for small and difficult-to-detect objects by focusing on hard examples. [REG: Refined Generalized Focal Loss]

Our contribution didn’t just stop at the models; we also built our own dataset from scratch. By equipping a vehicle with high-resolution cameras, we captured detailed imagery of road assets across Thai highways. This custom dataset forms the backbone of our approach, providing a strong foundation for model training. The training utilizes the AdamW optimizer with specific hyperparameters to optimize convergence and model performance.
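The post doesn’t list the exact training hyperparameters, so the values below (learning rate, weight decay, schedule) are placeholder assumptions; this is just a minimal sketch of what an AdamW training setup looks like in PyTorch.

```python
import torch

# Placeholder model standing in for a REG variant (REGn ... REGx).
model = torch.nn.Conv2d(3, 7, kernel_size=3, padding=1)

# AdamW with decoupled weight decay. The lr / weight_decay values are
# illustrative assumptions, not the paper's reported settings.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for epoch in range(100):
    images = torch.randn(4, 3, 64, 64)     # dummy batch of highway images
    loss = model(images).pow(2).mean()      # stand-in for the GFL objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                        # cosine-decay the learning rate
```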
This paper represents a significant step forward in applying cutting-edge computer vision techniques to real-world problems. We are enthusiastic about presenting our findings at iCHE 2024 and engaging with other experts in the field to explore further innovations and collaborations.
Stay tuned for updates, and a big thank you to my incredible research team:
N. Rattanachona (N’Fuse), P. Thungthin (N’Dear), N. Subsompon (N’Tien). Your hard work and dedication were essential to this project!
Here I am, presenting our work on the Enhanced REG model and its application in detecting road assets!
We have visualizations of the detection results produced by the Enhanced REG model. The bounding boxes and labels demonstrate the model’s ability to accurately locate and classify objects. These visuals reflect the high-resolution output and the model’s performance in detecting road assets in various environments. The clarity of these results illustrates the practical utility of our model in real-time applications. It effectively showcases how our model handles complex and dynamic scenes.
The detection task focuses on identifying seven key classes of road assets: Pavilions, Pedestrian bridges, Information signs, Single-arm poles, Bus stops, Warning signs, and Concrete guardrails (Fig. 3). The challenge lies in dealing with class imbalance — smaller and harder-to-detect objects can be easily overlooked by traditional object detection models. We address this by utilizing Generalized Focal Loss (GFL), which extends the classical Focal Loss to multi-class detection, giving more focus to underrepresented and challenging classes.
Fig. 3. My proposed Generalized Focal Loss for multi-class detection tackles class imbalance across seven asset classes. By extending Focal Loss, we improve detection accuracy for small and difficult-to-classify objects.
For the segmentation task, we detect road assets at the pixel level, focusing on five classes: Pavilions, Pedestrian bridges, Information signs, Warning signs, and Concrete guardrails (Fig. 4). The key here is to ensure that every pixel is correctly classified into one of these categories, which is a non-trivial problem in cluttered highway imagery. My Refined Generalized Focal Loss applies pixel-wise loss calculation, extending GFL into the realm of segmentation.
Fig. 4. The segmentation process classifies each pixel into one of five road asset classes, using Refined Generalized Focal Loss to enhance pixel-wise accuracy in segmentation tasks.
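The post doesn’t spell out the pixel-wise formulation, but a natural way to extend GFL to segmentation (consistent with the detection loss given later) is to average the focal term over all pixels and classes; a sketch of that assumed form:

$$ \mathcal{L}_{\text{GFL}}^{\text{Seg}} = - \frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{5} \alpha_c \left(1 - p_{i,c}\right)^{\gamma} y_{i,c} \log(p_{i,c}) $$

where $N$ is the number of pixels, $y_{i,c}$ is the one-hot ground-truth label for pixel $i$ and class $c$, and $p_{i,c}$ is the predicted probability.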
Now, let’s look at a real-world application of our Enhanced REG model in detecting road assets. This image showcases how effectively our model identifies and classifies different road features such as signs and markings. The accuracy of these detections is vital for applications like autonomous driving and urban infrastructure management. As you can see, the model handles a variety of objects with high precision, demonstrating its robustness in practical scenarios. This performance underscores the model’s potential for real-world deployment.
This chart presents a comparison of performance metrics between our Enhanced REG model and previous versions. We observe significant improvements in precision, recall, and F1-score. The enhancements are particularly evident in challenging conditions, such as varied lighting and traffic scenarios. These metrics highlight the effectiveness of our model’s enhancements. By achieving superior results, our approach sets a new benchmark in object detection accuracy.
Finally, this image illustrates the training process for the Enhanced REG model. It depicts the stages of optimization and fine-tuning, with various datasets and augmentation techniques used to enhance the model’s performance. The iterative process shown here is crucial for achieving the high accuracy demonstrated in our results. Observing these training phases provides insights into how we refined the model. This rigorous approach is key to ensuring the model’s effectiveness and reliability in practical applications.
To further enhance learning, we introduce a spatial-contextual refinement term $g_{i,c}$ that dynamically adjusts the loss based on the geometric and contextual significance of each object class (Fig. 5). This term allows the model to account for the spatial distribution of road assets, making it more adept at handling complex scenes typical of real-world road environments.
Fig. 5. The refinement term $g_{i,c}$ adjusts the loss based on spatial and contextual relevance, improving model learning in complex and cluttered road scenes.
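The exact functional form of the refined loss isn’t given in this post; one plausible reading, under the assumption that $g_{i,c}$ enters as a multiplicative weight on the focal term, is:

$$ \mathcal{L}_{\text{REG}} = - \sum_{i} \sum_{c} g_{i,c} \, \alpha_c \left(1 - p_{i,c}\right)^{\gamma} y_{i,c} \log(p_{i,c}) $$

where $g_{i,c}$ up-weights objects (or pixels) whose spatial context makes them harder, or more important, to detect.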
We then integrate the detection and segmentation tasks into a joint optimization framework. By combining the losses for both tasks (Fig. 6), the model learns complementary representations, allowing it to improve both object detection and pixel-wise segmentation accuracy. This joint approach ensures that the model balances precision and recall across different road asset classes.
Fig. 6. Joint optimization balances detection and segmentation losses, enhancing performance across both tasks by learning complementary features.
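The post doesn’t state the weighting scheme, but a standard way to write such a joint objective (with assumed trade-off weights $\lambda_{\text{det}}$ and $\lambda_{\text{seg}}$) is:

$$ \mathcal{L}_{\text{joint}} = \lambda_{\text{det}} \, \mathcal{L}_{\text{GFL}}^{\text{Detection}} + \lambda_{\text{seg}} \, \mathcal{L}_{\text{GFL}}^{\text{Seg}} $$

where the $\lambda$ terms balance the two tasks during training.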
To further refine REG, we incorporated prediction uncertainty using a Gaussian distribution (Fig. 7). This technique accounts for the inherent noise and ambiguity in complex environments, particularly under varying lighting and cluttered backgrounds, thereby improving both robustness and accuracy.
Fig. 7. We model prediction uncertainty using a Gaussian distribution to handle noise and ambiguity, particularly in challenging road scenes.
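As a rough sketch of how Gaussian uncertainty modeling typically looks in code (the paper’s exact parameterization is not given here), the network can predict a mean and a variance per output and train with a Gaussian negative log-likelihood; all names and sizes below are illustrative.

```python
import torch
import torch.nn as nn

class UncertainBoxHead(nn.Module):
    """Illustrative regression head that predicts a mean and a variance
    for each box coordinate, so the loss can down-weight noisy targets."""

    def __init__(self, in_features: int, num_outputs: int = 4):
        super().__init__()
        self.mean = nn.Linear(in_features, num_outputs)
        # Predict log-variance for numerical stability; exponentiate later.
        self.log_var = nn.Linear(in_features, num_outputs)

    def forward(self, feats: torch.Tensor):
        return self.mean(feats), self.log_var(feats).exp()

head = UncertainBoxHead(in_features=256)
feats = torch.randn(8, 256)     # dummy features for 8 candidate boxes
targets = torch.randn(8, 4)     # dummy ground-truth coordinates

mean, var = head(feats)
# Built-in Gaussian NLL: penalizes error scaled by the predicted variance.
loss = nn.GaussianNLLLoss()(mean, targets, var)
loss.backward()
```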
The optimization of REG is based on advanced techniques in stochastic optimization, where we extend traditional gradient descent to operate on Riemannian manifolds (Fig. 8). Given the non-convex nature of the loss landscape, we utilize variational inference, proximal gradient methods, and Lagrange multipliers, allowing for efficient optimization in multi-task learning.
Fig. 8. Advanced mathematical techniques, including Riemannian stochastic gradient descent, underpin the optimization of REG in complex, high-dimensional spaces.
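To make the Riemannian SGD idea concrete, here is a toy sketch for the unit sphere: the Euclidean gradient is projected onto the tangent space, a step is taken there, and the update is retracted back onto the manifold. This is a generic illustration, not the paper’s actual optimizer.

```python
import torch

def riemannian_sgd_step_sphere(w: torch.Tensor, grad: torch.Tensor, lr: float) -> torch.Tensor:
    """One SGD step constrained to the unit sphere ||w|| = 1.

    1. Project the Euclidean gradient onto the tangent space at w.
    2. Take a gradient step in that tangent direction.
    3. Retract (renormalize) back onto the manifold.
    """
    riem_grad = grad - (grad @ w) * w      # tangent-space projection
    w_new = w - lr * riem_grad             # step in the tangent direction
    return w_new / w_new.norm()            # retraction onto the sphere

# Toy usage: minimize f(w) = w . a subject to ||w|| = 1.
a = torch.tensor([3.0, -1.0, 2.0])
w = torch.nn.functional.normalize(torch.randn(3), dim=0)
for _ in range(200):
    w = riemannian_sgd_step_sphere(w, a, lr=0.1)
# w converges toward -a / ||a||, the constrained minimizer.
```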
Finally, we tested the model’s performance on both detection (Fig. 9) and segmentation tasks (Fig. 10). REG demonstrated significant improvements in mAP50, F1-score, and other key metrics, showcasing its capability to handle both high-overlap detection and detailed mask segmentation.
Fig. 9. REG outperforms other models in detection tasks, especially in high-overlap scenarios, with superior mAP50 and F1 scores.
Fig. 10. The segmentation performance of REG shows exceptional accuracy in generating precise masks, particularly in challenging environments.
In summary, this work introduces Refined Generalized Focal Loss (REG), which significantly improves the detection and segmentation of road assets in complex environments. By applying advanced mathematical techniques and integrating spatial-contextual learning, REG addresses the challenges of class imbalance and localization in highway asset detection. The mathematical insights behind this model, including optimization on Riemannian manifolds and probabilistic refinement, provide a robust framework for future improvements in vision-based infrastructure management systems.
For those interested in exploring the full mathematical derivation and code, please check out the REG: Refined Generalized Focal Loss on GitHub.
Imagine you’re driving along a bustling Thai highway, surrounded by a landscape dotted with various road assets. These assets include everything from pavilions providing shade and rest areas, pedestrian bridges allowing safe crossing, and information signs guiding motorists, to single-arm poles supporting traffic signals, bus stops, warning signs alerting drivers of upcoming hazards, and concrete guardrails safeguarding the road’s edge. Each of these elements plays a critical role in ensuring the safety and efficiency of the highway system.
To manage and maintain these assets effectively, automated systems are employed to detect and segment these features from images captured along the highway. This process involves two main tasks: detection and segmentation.
Detection Tasks:
In detection, the goal is to identify and locate these assets within images. For the Thai highways, there are seven specific classes of road assets to detect: Pavilions, Pedestrian bridges, Information signs, Single-arm poles, Bus stops, Warning signs, and Concrete guardrails.
Segmentation Tasks:
Segmentation takes this a step further by assigning a specific class label to each pixel in the image, providing a detailed map of where each type of asset is located. For the Thai highways, the segmentation focuses on five classes: Pavilions, Pedestrian bridges, Information signs, Warning signs, and Concrete guardrails.
1. Detection:
Picture an advanced AI system analyzing highway images. It scans each image to detect the seven classes of road assets. Using bounding boxes, the system outlines each asset’s location, distinguishing between the pavilions providing shade and the concrete guardrails ensuring safety. This detection process helps in cataloging and managing each asset efficiently.
2. Segmentation:
Moving to segmentation, the AI system processes the same images to create a detailed pixel-level map. Each pixel in the image is classified into one of the five categories, such as pavilions, pedestrian bridges, and warning signs. This precise classification allows for a thorough understanding of where each asset is situated, helping with tasks like maintenance scheduling and safety assessments.
This dual approach—detection and segmentation—ensures that every asset along the Thai highways is accurately identified and mapped. For instance, knowing the exact location of warning signs can help in assessing their visibility and effectiveness. Similarly, detailed segmentation of concrete guardrails aids in monitoring their condition and integrity.
Our research addresses a critical issue in road safety: detecting key road assets such as pedestrian bridges, pavilions, signs, and concrete guardrails. We implemented an enhanced REG model integrated with Generalized Focal Loss, which significantly improves detection accuracy, especially in complex environments with diverse lighting and backgrounds.
In computer vision, both object detection and semantic segmentation are crucial tasks that leverage different approaches and final layer architectures in deep learning models. This document provides an in-depth technical overview of Generalized Focal Loss applied to both tasks, and a detailed comparison of the final layers used in each.
Generalized Focal Loss (GFL) is designed to address class imbalance and focus learning on hard-to-detect objects by adjusting the standard focal loss. This approach is applicable to both detection and segmentation tasks but is formulated slightly differently for each.
Objective: In object detection, GFL helps to improve the accuracy of detecting objects and managing class imbalance by focusing on harder-to-detect objects.
Mathematical Formula:
For detection tasks involving multiple classes (e.g., Pavilions, Pedestrian Bridges, etc.), the Generalized Focal Loss is given by:
$$ \mathcal{L}_{\text{GFL}}^{\text{Detection}} = - \alpha \left(1 - p_t\right)^\gamma \log(p_t) $$
Where $p_t$ is the model’s estimated probability for the ground-truth class, $\alpha$ is a class-balancing weight, and $\gamma$ is the focusing parameter that down-weights easy examples.
For detection, consider a scenario where we need to locate a Pavilion on a highway. The Generalized Focal Loss helps reduce the loss contribution from easily detected Pavilions—those that are in clear view—and shifts the model’s focus to harder cases, like Pavilions that may be partially obscured by other objects or in poor lighting. By emphasizing these challenging examples, the model improves its overall performance on diverse highway scenes.
For segmentation, imagine the task of segmenting an Information Sign pixel by pixel. Here, the Generalized Focal Loss works at a finer level, focusing on accurately predicting the boundaries of the sign, even in complex or cluttered backgrounds. The model learns to pay more attention to pixels where it’s less confident, which results in sharper and more accurate segmentation outcomes.
This dual application of the Generalized Focal Loss—both for bounding box detection and for pixel-level segmentation—enables our model to excel in both tasks, effectively handling the complexities of road asset management in real-world highway conditions.
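To ground the detection formula given earlier, here is a minimal PyTorch sketch of the focal term $-\alpha (1 - p_t)^{\gamma} \log(p_t)$; the same function applies to detection logits of shape (num_boxes, num_classes) and, after flattening, to per-pixel segmentation logits. The defaults for $\alpha$ and $\gamma$ are common choices, not the paper’s reported settings.

```python
import torch
import torch.nn.functional as F

def generalized_focal_loss(logits: torch.Tensor,
                           targets: torch.Tensor,
                           alpha: float = 0.25,
                           gamma: float = 2.0) -> torch.Tensor:
    """Focal-style loss: -alpha * (1 - p_t)^gamma * log(p_t).

    logits:  (N, C) raw class scores (boxes or flattened pixels)
    targets: (N,)   integer class indices
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

# Detection-style usage: 16 candidate boxes over 7 asset classes.
box_logits = torch.randn(16, 7)
box_labels = torch.randint(0, 7, (16,))
det_loss = generalized_focal_loss(box_logits, box_labels)

# Segmentation-style usage: flatten (B, C, H, W) pixel logits to (N, C).
seg_logits = torch.randn(2, 5, 32, 32)          # 5 segmentation classes
seg_labels = torch.randint(0, 5, (2, 32, 32))
seg_loss = generalized_focal_loss(
    seg_logits.permute(0, 2, 3, 1).reshape(-1, 5),
    seg_labels.reshape(-1),
)
```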
In the images, we’re showcasing a progression of deep learning techniques. Starting with (a) as the original input and (b) as the expected target output, we then move through different versions of REG—(c) REGn, (d) REGs, (e) REGm, (f) REGl, and (g) REGx. Now, the key point to note is that (f) and (g) highlight our proposed enhancement, where we’ve integrated a refined Generalized Focal Loss into YOLO. What’s impressive is that they clearly outperform the other methods, in both detection (bounding boxes) and segmentation (pixel-level).
The first image focuses on detection, showing the bounding box results. Meanwhile, the second image dives deeper into instance segmentation, illustrating pixel-level accuracy.
So, let’s break it down. In the first image, you’ll see how each version of REG handles object detection by drawing bounding boxes around the identified objects. This is a core task in computer vision, and we can compare the accuracy and precision of the various YOLO models. With our enhanced method using the refined Generalized Focal Loss, which we’ve integrated into REGl and REGx, you’ll notice a significant improvement in the clarity and correctness of the bounding boxes. These results indicate that our approach performs better at accurately locating objects in the images.
Next, in the second image, the focus shifts to instance segmentation, where instead of just detecting objects with boxes, we’re identifying the exact pixel regions for each object. This is a more complex task that requires higher precision. Here again, our enhanced REG models stand out. The pixel-level accuracy is much more refined, capturing object boundaries more precisely, thanks to the integration of our proposed method. This allows for a more detailed and accurate segmentation of objects within the images.
To summarize, our proposed enhancements to the REG model—through the integration of refined Generalized Focal Loss—deliver significant improvements in both object detection and instance segmentation. The results across both images clearly demonstrate that our approach excels at accurately detecting and precisely segmenting objects. Whether it’s drawing clean bounding boxes or defining exact pixel regions, our method proves to be the clear winner. This shows that refining loss functions can have a big impact on model performance, pushing the boundaries of what’s possible with deep learning in computer vision.
In our paper, ‘Refined Generalized Focal Loss for Road Asset Detection on Thai Highways Using Vision-Based Detection and Segmentation Models’, we explore advancements in object detection and segmentation models tailored for detecting road assets on Thai highways. These assets include a variety of elements crucial for road safety and efficiency.
Generalized Focal Loss (GFL) is an enhancement over traditional focal loss, which aims to address class imbalance by focusing more on hard-to-detect objects. It introduces a dynamic focal weight that is adaptive to different classes, improving detection performance in complex scenarios.
The Generalized Focal Loss is formulated as: $$ \text{GFL}_{\text{det}} = - \frac{1 - p_i^{\gamma}}{1 - p_i} \cdot \log(p_i) $$ where $p_i$ is the predicted probability for the $i$-th class, and $\gamma$ is the focusing parameter.
For segmentation tasks, GFL adapts by focusing on pixel-wise predictions, enhancing the model’s ability to handle imbalanced data and challenging regions within the images.
The primary difference in the loss functions for detection and segmentation tasks is how they handle spatial versus class-level data. Detection models often deal with bounding boxes and class predictions, while segmentation models handle pixel-wise classification.
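A quick way to see this difference is in the tensor shapes each head produces; the snippet below is a generic illustration (the layer names and sizes are made up, not from the paper).

```python
import torch

batch, h, w = 2, 64, 64

# Detection head: per-candidate-box outputs -> class scores + box coordinates.
num_boxes = 100
det_class_logits = torch.randn(batch, num_boxes, 7)   # class-level scores
det_box_coords = torch.randn(batch, num_boxes, 4)     # (x, y, w, h) per box

# Segmentation head: dense, spatial outputs -> one score map per class.
seg_logits = torch.randn(batch, 5, h, w)              # pixel-level scores

# The loss therefore aggregates over boxes in detection ...
print(det_class_logits.shape)   # torch.Size([2, 100, 7])
# ... but over every pixel in segmentation.
print(seg_logits.shape)         # torch.Size([2, 5, 64, 64])
```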
Optimization in object detection and segmentation models involves tuning hyperparameters and adjusting learning rates to improve convergence and performance.
Understanding the following formulas is crucial for implementing and refining GFL in detection and segmentation tasks:
Softmax Function: $$ \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} $$ where $z_i$ is the score for class $i$.
Cross-Entropy Loss: $$ \text{CrossEntropy}(y, \hat{y}) = - \sum_{i} y_i \log(\hat{y}_i) $$ where $y_i$ is the ground truth and $\hat{y}_i$ is the predicted probability.
Dice Coefficient: $$ \text{Dice} = \frac{2 |A \cap B|}{|A| + |B|} $$ where $A$ and $B$ are the predicted and true segmentation masks.
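For reference, here is a compact PyTorch rendering of all three formulas (a generic sketch, independent of the paper’s code):

```python
import torch

def softmax(z: torch.Tensor) -> torch.Tensor:
    """softmax(z_i) = exp(z_i) / sum_j exp(z_j), with max-shift for stability."""
    e = torch.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y: torch.Tensor, y_hat: torch.Tensor) -> torch.Tensor:
    """CrossEntropy(y, y_hat) = -sum_i y_i * log(y_hat_i)."""
    return -(y * torch.log(y_hat)).sum()

def dice(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks a, b."""
    intersection = (a * b).sum()
    return 2.0 * intersection / (a.sum() + b.sum())

scores = torch.tensor([2.0, 0.5, -1.0])
probs = softmax(scores)                      # sums to 1
onehot = torch.tensor([1.0, 0.0, 0.0])
print(cross_entropy(onehot, probs))          # -log p of the true class

mask_pred = torch.tensor([[1.0, 1.0], [0.0, 0.0]])
mask_true = torch.tensor([[1.0, 0.0], [0.0, 0.0]])
print(dice(mask_pred, mask_true))            # 2*1 / (2+1) = 0.6667
```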
Our paper will undergo a fast-track formal review process for potential publication in the Transportmetrica A journal. We’re optimistic that this research will significantly contribute to highway engineering and road asset management fields.
I’m genuinely excited to share our findings at iCHE 2024 and connect with the incredible minds in the field. I hope our research sparks inspiration in others, pushing the boundaries of what’s possible. It would be truly rewarding if our work motivates even one person to contribute to something extraordinary in the world. Research is not just about discovering new things—it’s about igniting ideas, fostering collaboration, and collectively making a positive impact. Here’s to all the future breakthroughs, and may this be just the beginning of many more amazing contributions ahead!
Panboonyuen, Teerapong. (Sep 2024). Refined Generalized Focal Loss for Road Asset Detection on Thai Highways Using Vision Models. Blog post on Kao Panboonyuen. https://kaopanboonyuen.github.io/blog/2024-09-07-refined-generalized-focal-loss-for-road-asset-detection-on-thai-highways-using-vision-models/
For a BibTeX citation:
@article{panboonyuen2024refinedfocal,
title = "Refined Generalized Focal Loss for Road Asset Detection on Thai Highways Using Vision Models.",
author = "Panboonyuen, Teerapong",
journal = "kaopanboonyuen.github.io/",
year = "2024",
month = "Sep",
url = "https://kaopanboonyuen.github.io/blog/2024-09-07-refined-generalized-focal-loss-for-road-asset-detection-on-thai-highways-using-vision-models/"}