Understanding Robustness of Visual State Space Models for Image Classification

Visual State Space Model (VMamba) has recently emerged as a promising architecture, exhibiting remarkable performance in various computer vision tasks. However, its robustness has not yet been thoroughly studied. In this paper, we delve into the robustness of this architecture through comprehensive investigations from multiple perspectives. Firstly, we investigate its robustness to adversarial attacks, employing both whole-image and patch-specific adversarial attacks. Results demonstrate superior adversarial robustness compared to Transformer architectures while revealing scalability weaknesses. Secondly, the general robustness of VMamba is assessed against diverse scenarios, including natural adversarial examples, out-of-distribution data, and common corruptions. VMamba exhibits exceptional generalizability with out-of-distribution data but shows scalability weaknesses against natural adversarial examples and common corruptions. Additionally, we explore VMamba’s gradients and back-propagation during white-box attacks, uncovering unique vulnerabilities and defensive capabilities of its novel components. Lastly, the sensitivity of VMamba to image structure variations is examined, highlighting vulnerabilities associated with the distribution of disturbance areas and spatial information, with increased susceptibility closer to the image center. Through these comprehensive studies, we contribute to a deeper understanding of VMamba’s robustness, providing valuable insights for refining and advancing the capabilities of deep neural networks in computer vision applications.
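The abstract does not specify the exact attack configuration used for the whole-image evaluation. As a minimal sketch of what such an evaluation typically looks like, the snippet below implements a standard L-infinity PGD attack in PyTorch; the `model` argument, the epsilon/step settings, and the assumption that inputs lie in [0, 1] are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """Whole-image L-infinity PGD: iteratively perturb the input to maximize
    the classification loss, projecting back into the eps-ball each step.
    Assumes `model` is any image classifier (e.g. a VMamba or Transformer
    variant) and `images` is a batch of tensors scaled to [0, 1]."""
    adv = images.clone().detach()
    # Random start inside the eps-ball around the clean images.
    adv = (adv + torch.empty_like(adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad, = torch.autograd.grad(loss, adv)
        # Ascend the sign of the gradient, then project back into the eps-ball.
        adv = adv.detach() + alpha * grad.sign()
        adv = (images + (adv - images).clamp(-eps, eps)).clamp(0, 1)
    return adv.detach()

# Usage: robust accuracy is then the model's accuracy on pgd_attack(model, x, y).
```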

Further reading