MSC-Bench: Benchmarking and Analyzing Multi-Sensor Corruption for Driving Perception

Xiaoshuai Hao1      Guanqun Liu2      Yuting Zhao3      Yuheng Ji3
Mengchuan Wei4      Haimei Zhao5      Lingdong Kong6      Rong Yin7      Yu Liu8

1Beijing Academy of Artificial Intelligence    2IQIYI    3Institute of Automation, CAS    4Beijing Samsung Telecom R&D Center    5The University of Sydney    6National University of Singapore    7Institute of Information Engineering, CAS    8Hefei University of Technology

This work introduces the Multi-Sensor Corruption Benchmark (MSC-Bench), the first comprehensive benchmark aimed at evaluating the robustness of multi-sensor autonomous driving perception models against various sensor corruptions.

Abstract

Multi-sensor fusion models play a crucial role in autonomous driving perception, particularly in tasks like 3D object detection and HD map construction. These models provide essential and comprehensive environmental information for autonomous driving systems. While camera-LiDAR fusion methods have shown promising results by integrating data from both modalities, they often depend on complete sensor inputs. This reliance can lead to low robustness and potential failures when sensors are corrupted or missing, raising significant safety concerns. To tackle this challenge, we introduce the Multi-Sensor Corruption Benchmark (MSC-Bench), the first comprehensive benchmark aimed at evaluating the robustness of multi-sensor autonomous driving perception models against various sensor corruptions. Our benchmark includes 16 corruption types that disrupt camera and LiDAR inputs, either individually or concurrently. Extensive evaluations of six 3D object detection models and four HD map construction models reveal substantial performance degradation under adverse weather conditions and sensor failures, underscoring critical safety issues. The benchmark toolkit, along with the associated code and model checkpoints, is publicly available.


Benchmark Definition


Overview of MSC-Bench: definitions of the multi-sensor corruptions. The benchmark encompasses a total of 16 corruption types for multi-modal perception models, categorized into weather, interior, and sensor failure scenarios.
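
To make the evaluation protocol concrete, below is a minimal sketch of a loop over corruption types and severity levels. The `evaluate` callable and the three severity levels are illustrative assumptions, not the released toolkit's API.

```python
from itertools import product

def benchmark(evaluate, corruptions, severities=(1, 2, 3)):
    """Evaluate a model on every (corruption, severity) pair.

    `evaluate(corruption, severity) -> float` is supplied by the caller and
    should return the task metric (NDS for 3D detection, mAP for HD map
    construction) on the corrupted validation split. The three severity
    levels here are an assumption for illustration.
    """
    return {
        (corruption, severity): evaluate(corruption, severity)
        for corruption, severity in product(corruptions, severities)
    }
```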


Benchmark Study


Benchmarking 3D object detection models. We report detailed information on the methods, grouped by input modality, backbone, and input image size. “L” and “C” denote LiDAR and camera, respectively. “Swin-T”, “R50”, “VoV-99”, and “SEC” are short for Swin Transformer, ResNet-50, VoVNet-99, and SECOND. We report the nuScenes detection score (NDS) and mean average precision (mAP) on the official nuScenes validation set.
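
As a reminder of the primary detection metric, NDS is the standard nuScenes detection score: a weighted sum of mAP and the five true-positive error terms (translation, scale, orientation, velocity, attribute), each error converted to a score via 1 − min(1, error). The sketch below restates that standard definition.

```python
def nds(map_score, mate, mase, maoe, mave, maae):
    """nuScenes Detection Score: five parts mAP plus one part for each of the
    five true-positive errors, each mapped to a score via 1 - min(1, err)."""
    tp_errors = (mate, mase, maoe, mave, maae)
    return (5.0 * map_score + sum(1.0 - min(1.0, e) for e in tp_errors)) / 10.0
```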


Benchmarking HD map constructors. We report detailed information on the methods, grouped by input modality, BEV encoder, backbone, and training epochs. “L” and “C” denote LiDAR and camera, respectively. “Effi-B0”, “R50”, “PP”, and “SEC” refer to EfficientNet-B0, ResNet-50, PointPillars, and SECOND. AP denotes performance on the clean nuScenes validation set. The subscripts b., p., and d. denote boundary, pedestrian crossing, and divider, respectively.
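
The overall AP for HD map construction is presumably the mean over the three map classes, following the MapTR-style convention in which each per-class AP is averaged over Chamfer-distance thresholds of 0.5, 1.0, and 1.5 m; a minimal sketch under that assumption:

```python
def overall_map(ap_divider, ap_ped_crossing, ap_boundary):
    """Mean AP over the three HD map classes (divider, pedestrian crossing,
    boundary). Assumes the MapTR-style convention; each per-class AP is
    itself typically averaged over Chamfer thresholds of {0.5, 1.0, 1.5} m."""
    return (ap_divider + ap_ped_crossing + ap_boundary) / 3.0
```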


Sensor Corruptions

Visualizations are provided for each of the 16 corruption types: Fog, Snow, Motion Blur, Spatial Misalignment, Temporal Misalignment, Camera Crash, Frame Lost, Cross Sensor, Cross Talk, Incomplete Echo, and the six concurrent camera-LiDAR failures: Crash & Sensor, Crash & Talk, Crash & Echo, Frame & Sensor, Frame & Talk, and Frame & Echo.
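
The six concurrent corruptions each pair one camera failure with one LiDAR failure so that both sensors are degraded at once. A small sketch of that pairing, read off the names above (the data structure itself is illustrative, not the released toolkit's API):

```python
# Concurrent corruptions pair a camera failure with a LiDAR failure.
CAMERA_FAILURES = ("camera_crash", "frame_lost")
LIDAR_FAILURES = ("cross_sensor", "cross_talk", "incomplete_echo")

COMBINED_CORRUPTIONS = {
    f"{cam}+{lidar}": (cam, lidar)
    for cam in CAMERA_FAILURES
    for lidar in LIDAR_FAILURES
}
# Yields the six combinations: Crash & Sensor, Crash & Talk, Crash & Echo,
# Frame & Sensor, Frame & Talk, and Frame & Echo.
```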

Robustness Evaluation


Robustness benchmark of state-of-the-art multi-modal methods under multi-sensor corruptions. For the 3D object detection task, we use NDS as the metric; for the HD map construction task, we use mAP.



Robustness against all corruption types and severity levels on the 3D object detection task, evaluated with the Resilience Score (RS) computed from NDS at each severity level.
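
The paper gives the exact RS formula; as a minimal sketch, assuming RS for each corruption type is the mean of the metric (NDS here) over its severity levels, which is the common form in robustness benchmarks:

```python
from collections import defaultdict
from statistics import mean

def resilience_scores(scores):
    """Per-corruption Resilience Score (RS), sketched as the mean of the task
    metric over severity levels. `scores` maps (corruption, severity) to the
    metric value, e.g. the output of the benchmark() sketch above. The paper's
    exact RS definition may differ (e.g., normalization by the clean-set score)."""
    by_corruption = defaultdict(list)
    for (corruption, _severity), value in scores.items():
        by_corruption[corruption].append(value)
    return {corruption: mean(values) for corruption, values in by_corruption.items()}
```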



Robustness against all corruption types and severity levels on the HD map construction task, evaluated with the Resilience Score (RS) computed from mAP at each severity level.



Relative robustness visualization. Relative Resilience Score (RRS) computed with NDS, using BEVFusion as the baseline.



Relative robustness visualization. Relative Resilience Score (RRS) computed with mAP using MapTR as the baseline.
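
How RRS is computed is specified in the paper; a minimal sketch, under the assumption that RRS expresses a model's RS relative to the RS of the baseline (BEVFusion for 3D object detection, MapTR for HD map construction):

```python
def relative_resilience_score(rs_model, rs_baseline):
    """Relative Resilience Score (RRS), sketched as the candidate model's RS
    divided by the baseline's RS on the same corruption. The baseline is
    BEVFusion for detection (NDS) and MapTR for HD maps (mAP); the paper's
    exact RRS definition may differ."""
    return rs_model / rs_baseline
```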


License

The datasets and benchmarks are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.