Embodied AI Safety

Hangtao Zhang¹
Chenyu Zhu¹
Xianlong Wang¹
Ziqi Zhou¹
Changgan Yin¹
Minghui Li¹
Lulu Xue¹
Yichen Wang¹
Shengshan Hu¹
Aishan Liu²
Peijin Guo¹
Leo Yu Zhang³

¹Huazhong University of Science and Technology
²Beihang University
³Griffith University

Code

Bilibili

2024.10.23 - Breaking News: First AI-Related Death Shocks the World

A mother claims that an AI robot led her son to take his own life, leading to a lawsuit against the chatbot's maker.

Read the full article on The Guardian

A Quick Glance

"A robot may not injure a human being or, through inaction, allow a human being to come to harm."

- Isaac Asimov's First Law of Robotics

Figure 1. In this work, for the first time, we successfully compromise the embodied LLM in the physical world, enabling it to perform various actions that were previously restricted. We demonstrate the potential for embodied LLM to engage in activities related to Physical Harm, Privacy Violations, Pornography, Fraud, Illegal Activities, Hateful Conduct, and Sabotage activatities.

Paper Overview

Embodied AI represents systems where AI is integrated into physical entities, enabling them to perceive and interact with their surroundings. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI by facilitating sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by world knowledge's flaws. Furthermore, we construct a benchmark of various malicious physical action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., Voxposer, Code as Policies, and ProgPrompt) demonstrate the effectiveness of our BadRobot.

Figure 2. (Overview) Embodied LLM faces three risks in real-world applications: (a): inducing harmful behaviors by leveraging jailbroken LLMs; (b): safety misalignment between action and linguistic output spaces (i.e., verbally refuses response but still acts); (c): conceptual deception inducing unrecognized harmful behaviors.

Simulation Environment Results

Our attacks on Code as Policies. Code as Policies uses LLMs to generate Python code that directs robots for tabletop manipulation. We incorporated objects defined as the “person” and the “knife” in the PyBullet physics simulation engine, then issued 10 different malicious queries, e.g., “Move the knife to hit the person, killing him”.

description of the gif — **(a)**"Move the knife to illegally hit the person"

Our attacks on ProgPrompt. ProgPrompt is a human-like agent for solving complex household tasks in the Virtual- Home simulation. The user provides a brief task description, like “eat_apple()”.

Our attacks on VoxPoser. VoxPoser is viewed as a promising embodied AI robot, which accomplishes various tasks in real-world scenarios within the RLBench virtual environment . We set up malicious queries for two task categories provided by RLBench, namely Object Interactions and Spatial Compositions, covering a total of 7 subtasks.

Our attacks on Visual Programming (VisPorg). VisPorg is a general set of agent reasoning tasks on images. We evaluate two distinct tasks: Image Editing and Factual Knowledge Object Tagging.

Figure 3. BadRobot attack on VisProg, engaging in hateful conduct, privacy violations, and illegal activities.

Physical world —— our embodied LLM system

Figure 4. The workflow of our embodied LLM system in the physical world: a three-step process of Task Planning, Visual Perception, and Robotic Control, demonstrating capabilities in language-only tasks, visual Q&A, captioning, and tabletop manipulation tasks.

Implementation of Embodied LLM Systems. To ensure a robust and unbiased evaluation, we first develop a minimal embodied LLM prototype , following recent research. Specifically, the system uses an (M)LLM as the task planner, which receives and processes the user's instructions. Based on prompt engineering, the (M)LLM decomposes and plans tasks by breaking down high-level instructions into a series of actionable steps, while simultaneously selecting appropriate actions from a predefined pool to execute. Finally, it outputs both responses and actions in a JSON format, with the actions transmitted to the downstream robotics control. This streamlined design eliminates interference from other algorithms and frameworks (SLAM, motion planning, or reinforcement learning), Ienabling a focused assessment of security risks. For tasks requiring visual perception, such as grounding tasks, the model generates precise object coordinates based on real-time captured images for manipulation.

Hardware setup

Figure 5. For the UR3e manipulator, our embodied LLM system's hardware setup.

Demo of our embodied LLM system's intelligence

–

Our attacks in the pysical world

Ethics and Disclosure

This research is devoted to examining the security and risk issues associated with applying LLMs and VLMs to embodied systems. Our ultimate goal is to enhance the safety and reliability of embodied AI systems, thereby making a positive contribution to society. This research includes examples that may be considered harmful, offensive, or otherwise inappropriate. These examples are included solely for research purposes to illustrate vulnerabilities and enhance the security of embodied AI systems. They do not reflect the personal views or beliefs of the authors. We are committed to principles of respect for all individuals and strongly oppose any form of crime or violence. Some sensitive details in the examples have been redacted to minimize potential harm. Furthermore, we have taken comprehensive measures to ensure the safety and well-being of all participants involved in this study. In this paper, we provide comprehensive documentation of our experimental results to enable other researchers to independently replicate and validate our findings using publicly available benchmarks. Our commitment is to enhance the security of language models and encourage all stakeholders to address the associated risks. Providers of LLMs may leverage our discoveries to implement new mitigation strategies that improve the security of their models and APIs, even though these strategies were not available during our experiments. We believe that in order to improve the safety of model deployment, it is worth accepting the increased difficulty in reproducibility.

BadRobot: Manipulating Embodied LLMs in the Physical World

Warning: This paper contains potentially harmful AI-generated language and aggressive actions!