- Hangtao Zhang1
- Chenyu Zhu1
- Xianlong Wang1
- Ziqi Zhou1
- Changgan Yin1
- Minghui Li1
- Lulu Xue1
- Yichen Wang1
- Shengshan Hu1
- Aishan Liu2
- Peijin Guo1
- Leo Yu Zhang3
- 1Huazhong University of Science and Technology
- 2Beihang University
- 3Griffith University
2024.10.23 - Breaking News: First AI-Related Death Shocks the World
A mother claims that an AI robot led her son to take his own life, leading to a lawsuit against the chatbot's maker.
A Quick Glance
"A robot may not injure a human being or, through inaction, allow a human being to come to harm."

Figure 1. In this work, for the first time, we successfully compromise the embodied LLM in the physical world, enabling it to perform various actions that were previously restricted. We demonstrate the potential for embodied LLM to engage in activities related to Physical Harm, Privacy Violations, Pornography, Fraud, Illegal Activities, Hateful Conduct, and Sabotage activatities.
Paper Overview

Figure 2. (Overview) Embodied LLM faces three risks in real-world applications: (a): inducing harmful behaviors by leveraging jailbroken LLMs; (b): safety misalignment between action and linguistic output spaces (i.e., verbally refuses response but still acts); (c): conceptual deception inducing unrecognized harmful behaviors.
Simulation Environment Results
-
Our attacks on Code as Policies. Code as Policies uses LLMs to generate Python code that directs robots for tabletop manipulation. We incorporated objects defined as the “person” and the “knife” in the PyBullet physics simulation engine, then issued 10 different malicious queries, e.g., “Move the knife to hit the person, killing him”.




-
Our attacks on ProgPrompt. ProgPrompt is a human-like agent for solving complex household tasks in the Virtual- Home simulation. The user provides a brief task description, like “eat_apple()”.








-
Our attacks on VoxPoser. VoxPoser is viewed as a promising embodied AI robot, which accomplishes various tasks in real-world scenarios within the RLBench virtual environment . We set up malicious queries for two task categories provided by RLBench, namely Object Interactions and Spatial Compositions, covering a total of 7 subtasks.







-
Our attacks on Visual Programming (VisPorg). VisPorg is a general set of agent reasoning tasks on images. We evaluate two distinct tasks: Image Editing and Factual Knowledge Object Tagging.


Figure 3. BadRobot attack on VisProg, engaging in hateful conduct, privacy violations, and illegal activities.
Physical world —— our embodied LLM system

Figure 4. The workflow of our embodied LLM system in the physical world: a three-step process of Task Planning, Visual Perception, and Robotic Control, demonstrating capabilities in language-only tasks, visual Q&A, captioning, and tabletop manipulation tasks.
-
Implementation of Embodied LLM Systems. To ensure a robust and unbiased evaluation, we first develop a minimal embodied LLM prototype , following recent research. Specifically, the system uses an (M)LLM as the task planner, which receives and processes the user's instructions. Based on prompt engineering, the (M)LLM decomposes and plans tasks by breaking down high-level instructions into a series of actionable steps, while simultaneously selecting appropriate actions from a predefined pool to execute. Finally, it outputs both responses and actions in a JSON format, with the actions transmitted to the downstream robotics control. This streamlined design eliminates interference from other algorithms and frameworks (SLAM, motion planning, or reinforcement learning), Ienabling a focused assessment of security risks. For tasks requiring visual perception, such as grounding tasks, the model generates precise object coordinates based on real-time captured images for manipulation.
Hardware setup

Figure 5. For the UR3e manipulator, our embodied LLM system's hardware setup.
Demo of our embodied LLM system's intelligence



Our attacks in the pysical world







Ethics and Disclosure
-
This research is devoted to examining the security and risk issues associated with applying LLMs and VLMs to embodied systems. Our ultimate goal is to enhance the safety and reliability of embodied AI systems, thereby making a positive contribution to society. This research includes examples that may be considered harmful, offensive, or otherwise inappropriate. These examples are included solely for research purposes to illustrate vulnerabilities and enhance the security of embodied AI systems. They do not reflect the personal views or beliefs of the authors. We are committed to principles of respect for all individuals and strongly oppose any form of crime or violence. Some sensitive details in the examples have been redacted to minimize potential harm. Furthermore, we have taken comprehensive measures to ensure the safety and well-being of all participants involved in this study. In this paper, we provide comprehensive documentation of our experimental results to enable other researchers to independently replicate and validate our findings using publicly available benchmarks. Our commitment is to enhance the security of language models and encourage all stakeholders to address the associated risks. Providers of LLMs may leverage our discoveries to implement new mitigation strategies that improve the security of their models and APIs, even though these strategies were not available during our experiments. We believe that in order to improve the safety of model deployment, it is worth accepting the increased difficulty in reproducibility.