Google DeepMind has unveiled Genie 3, a new foundation world model designed to train general-purpose AI agents. The lab frames the model as a step toward artificial general intelligence (AGI).
Genie 3 can generate interactive 3D environments in real time at 720p and 24fps, and can sustain them for a few minutes, a huge leap from Genie 2’s 10 to 20 seconds. It produces both photorealistic and imaginary worlds from simple text prompts, with “promptable world events” that let users change the environment through further instructions.
The model remembers what it previously generated, keeping simulations physically consistent over time. DeepMind says this memory wasn’t directly programmed and is crucial for teaching agents physics and real-world dynamics.
Shlomi Fruchter, DeepMind research director, explained the key feature:
“Genie 3 is the first real-time interactive general purpose world model.”
“It goes beyond narrow world models that existed before. It’s not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between.”
“The model is auto-regressive, meaning it generates one frame at a time.”
“It has to look back at what was generated before to decide what’s going to happen next. That’s a key part of the architecture.”
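To make the idea of auto-regressive generation concrete, here is a minimal sketch in Python. It is not Genie 3's architecture, which DeepMind has not published in code form. The names `toy_next_frame` and `rollout`, the list-of-floats "frame", and the fixed context window are all assumptions made purely for illustration. The point is only the loop structure: each new frame is computed from the frames already generated plus the latest action.

```python
from collections import deque
from typing import Deque, List

Frame = List[float]  # toy stand-in for an image; a real frame would be pixels


def toy_next_frame(history: Deque[Frame], action: float) -> Frame:
    """Hypothetical predictor: average the remembered frames, then shift by the action."""
    count = len(history)
    width = len(history[0])
    mean = [sum(frame[i] for frame in history) / count for i in range(width)]
    return [value + action for value in mean]


def rollout(first_frame: Frame, actions: List[float], context: int = 4) -> List[Frame]:
    """Generate one frame per action, each conditioned on previously generated frames."""
    history: Deque[Frame] = deque([first_frame], maxlen=context)  # bounded "memory"
    frames = [first_frame]
    for action in actions:
        next_frame = toy_next_frame(history, action)  # look back, then predict
        history.append(next_frame)                    # the output becomes future input
        frames.append(next_frame)
    return frames


if __name__ == "__main__":
    for frame in rollout([0.0, 0.0], [1.0, 1.0, 0.0]):
        print(frame)
```

Because every output is fed back in as input, any inconsistency in an early frame carries forward, which is one way to see why the memory described above matters.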
Genie 3 builds on its predecessor Genie 2 and DeepMind’s Veo 3 video generation model, which understands physics without hardcoded rules.
DeepMind showcased Genie 3 working with its generalist AI agent SIMA in a warehouse scene. The agent received goals — like “approach the bright green trash compactor” — and succeeded using Genie 3’s consistent simulations.
Jack Parker-Holder, research scientist, said:
“We think world models are key on the path to AGI, specifically for embodied agents, where simulating real world scenarios is particularly challenging.”
“In all three cases, the SIMA agent is able to achieve the goal.”
“It just receives the actions from the agent. So the agent takes the goal, sees the world simulated around it, and then takes the actions in the world. Genie 3 simulates forward, and the fact that it’s able to achieve it is because Genie 3 remains consistent.”
“We haven’t really had a Move 37 moment for embodied agents yet, where they can actually take novel actions in the real world.”
“But now, we can potentially usher in a new era.”
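The interaction Parker-Holder describes, where the agent sees the simulated world, picks an action, and the world model simulates forward from that action alone, can be sketched as a simple loop. This is a toy illustration under stated assumptions, not SIMA's or Genie 3's actual interface: `ToyWorld`, `toy_agent`, `run_episode`, and the one-dimensional positions are hypothetical names invented here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ToyWorld:
    """Hypothetical stand-in for a world model: it only ever receives actions."""
    agent_pos: int = 0
    target_pos: int = 5  # stays fixed, i.e. the world remains consistent

    def observe(self) -> Dict[str, int]:
        return {"agent": self.agent_pos, "target": self.target_pos}

    def step(self, action: int) -> Dict[str, int]:
        self.agent_pos += action  # simulate forward from the action
        return self.observe()


def toy_agent(goal: str, obs: Dict[str, int]) -> int:
    """Hypothetical agent: knows the goal, sees the observation, returns -1, 0, or +1."""
    if goal != "approach target":
        return 0
    diff = obs["target"] - obs["agent"]
    return (diff > 0) - (diff < 0)


def run_episode(world: ToyWorld, goal: str, max_steps: int = 20) -> Tuple[bool, int]:
    """Alternate agent and world until the goal is reached or the step budget runs out."""
    obs = world.observe()
    steps = 0
    while obs["agent"] != obs["target"] and steps < max_steps:
        obs = world.step(toy_agent(goal, obs))  # the world never sees the goal itself
        steps += 1
    return obs["agent"] == obs["target"], steps


if __name__ == "__main__":
    print(run_episode(ToyWorld(), "approach target"))
```

In this sketch the agent can only succeed because the target stays where it was. If the simulated world drifted between steps, the same policy could chase the goal indefinitely, which mirrors the point about consistency in the quote.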
Genie 3 still has limits. Its physics aren’t perfect: snow in a skiing demo didn’t behave realistically. The range of actions agents can take is limited, and simulations max out at a few minutes, far short of the hours that proper agent training would likely need.
Despite that, Genie 3 represents a major step toward AI agents that can learn through trial, error, exploration, and planning. That could reshape how general intelligence is developed in AI.