SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds
November 13, 2025 · Research · SIMA Team
Last year, we introduced SIMA (Scalable Instructable Multiworld Agent), a generalist AI that could follow basic instructions across a wide range of virtual environments. SIMA was a crucial first step in teaching AI to translate language into meaningful action in rich, 3D worlds. Today we’re introducing SIMA 2, the next milestone in our research creating general and helpful AI agents. By integrating the advanced capabilities of our Gemini models, SIMA is evolving from an instruction-follower into an interactive gaming companion. Not only can SIMA 2 follow human-language instructions in virtual worlds, it can now also think about its goals, converse with users, and improve itself over time. This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI embodiment in general.
The Power of Reasoning
The first version of SIMA learned to perform over 600 language-following skills, like “turn left,” “climb the ladder,” and “open the map,” across a diverse set of commercial video games. It operated in these environments as a person might, by “looking” at the screen and using a virtual keyboard and mouse to navigate, without access to the underlying game mechanics. With SIMA 2, we’ve moved beyond instruction-following. By embedding a Gemini model as the agent's core, SIMA 2 can do more than just respond to instructions; it can think and reason about them.
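The interface described above (screen pixels in, keyboard and mouse actions out, no access to game internals) can be sketched as a simple observe-act loop. This is an illustrative sketch only, not SIMA's implementation: `Action`, `ToyEnv`, `toy_policy`, and `run_episode` are hypothetical names, and the keyword-matching policy is a stand-in for whatever model actually maps a frame and an instruction to an action.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a screen image: rows of pixel values.
Frame = List[List[int]]


@dataclass(frozen=True)
class Action:
    """One step of virtual keyboard and mouse input (hypothetical type)."""
    keys: Tuple[str, ...] = ()
    mouse_dx: int = 0
    mouse_dy: int = 0


# A policy sees only the current frame and the instruction text.
Policy = Callable[[Frame, str], Action]


class ToyEnv:
    """Toy environment exposing only pixels and an input channel."""

    def __init__(self) -> None:
        self.received: List[Action] = []

    def screenshot(self) -> Frame:
        # A real game would return rendered pixels; this is a blank 2x2 frame.
        return [[0, 0], [0, 0]]

    def send(self, action: Action) -> None:
        # Record the input; game state is never exposed to the agent.
        self.received.append(action)


def toy_policy(frame: Frame, instruction: str) -> Action:
    """Assumed placeholder policy: keyword match instead of a learned model."""
    if "left" in instruction.lower():
        return Action(keys=("a",))
    return Action()


def run_episode(env: ToyEnv, policy: Policy, instruction: str,
                max_steps: int) -> List[Action]:
    """Observe the screen, choose an action, send it, repeat."""
    trace: List[Action] = []
    for _ in range(max_steps):
        frame = env.screenshot()
        action = policy(frame, instruction)
        env.send(action)
        trace.append(action)
    return trace


if __name__ == "__main__":
    env = ToyEnv()
    for step in run_episode(env, toy_policy, "turn left", max_steps=3):
        print(step)
```

The point of the sketch is the boundary: the policy receives nothing but a frame and text, so the same loop could in principle run against any game that renders to a screen and accepts keyboard and mouse input.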
MineDojo: SIMA 1 (left) attempts to follow the instruction while SIMA 2 (right) successfully completes the task in a game it has never seen before.
ASKA: SIMA 1 (left) attempts to follow the instruction “Find a campfire” while SIMA 2 (right) successfully completes the task in a game it has never seen before.
SIMA 2’s new architecture integrates Gemini’s powerful reasoning abilities to help it understand a user’s high-level goal, perform complex reasoning in pursuit of that goal, and skillfully execute goal-oriented actions within games. We trained SIMA 2 using a mixture of human demonstration videos with language labels as well as Gemini-generated labels. As a result, SIMA 2 can now describe to the user what it intends to do and detail the steps it's taking to accomplish its goals.
Moving beyond simple instruction following: SIMA 2 can answer the user’s questions and reason about its own behavior as well as its environment.
In testing, we have found that interacting with the agent feels less like giving it commands and more like collaborating with a companion who can reason about the task at hand. And thanks to our collaboration with our existing and new game partners (see Acknowledgements), we have been able to train and evaluate SIMA 2 on a wider array of games. This is the power of Gemini brought to embodied AI: a world-class reasoning engine that can now perceive, understand, and take action in complex, interactive 3D environments.
SIMA 2 interprets abstract concepts and logical commands by reasoning about its environment and the user's intent.
A Leap in Generalization Performance
The addition of Gemini has also led to improved generalization and reliability. SIMA 2 can now understand more complex and nuanced instructions than its predecessor and is far more successful at carrying them out, particularly in situations or games on which it’s never been trained, such as the new Viking survival game, ASKA, or MineDojo, a research implementation of the popular open-world sandbox game Minecraft.
SIMA 2 can understand and accomplish long and complex tasks
SIMA 2 is successful at carrying out long and complex instructions.
SIMA 2 tackles a completely new game with no prior training, demonstrating impressive progress.
SIMA 2 understands multimodal prompts
User is drawing a sketch on the screen.
SIMA 2 can understand…