Work with Boston Dynamics’ humanoid Atlas robot showcases a new approach to learning complex tasks: an AI model trained on human demonstrations.
Programming a machine for the sheer variety of challenges in a human environment, from handling delicate objects to navigating cluttered spaces, has been a monumental task. Now, Large Behavior Models (LBMs) promise to help robots learn tasks more quickly.
Atlas becomes a robot apprentice
In a demo by Boston Dynamics and Toyota Research Institute (TRI), a single AI model guides the Atlas robot through a lengthy “Spot Workshop” task. The robot coordinates its entire body to pick up parts from a cart, fold them, and place them on a shelf. It then pulls out a low bin to store other components before clearing the remaining items into a large truck.
But the real magic happens when things don’t go to plan. The initial versions of the AI couldn’t cope with surprises. The solution wasn’t to write complex new code. Instead, the team simply had a human operator demonstrate how to recover from mistakes, like a part falling to the floor or a bin lid closing unexpectedly. After retraining the network with these new examples, the robot learned to be reactive and solve problems on its own.
This ability comes from the policy’s capacity to estimate what is happening in the world from its sensor inputs and to react based on the experiences it has seen in training. Programming new behaviours no longer requires years of expert engineering, which creates an opportunity to rapidly scale up what the Atlas robot can do.
The human in the machine
So, how do you teach a robot? The process begins with a human operator stepping into a virtual reality rig. Wearing a VR headset, the operator is fully immersed in the robot’s workspace, seeing what it sees through its head-mounted stereo cameras. With trackers on their hands and feet, they can control Atlas in a fluid, intuitive way, with their movements mapped directly to the machine.
This teleoperation system is what allows for the collection of high-quality data. Whether crouching low to pick something up or taking careful steps to reposition itself, the robot’s actions are guided by a human, generating the raw data needed for learning.
This data then feeds Atlas’s “brain”: a 450-million-parameter model built on a Diffusion Transformer architecture. The model takes in camera images, the robot’s own body position (proprioception), and a high-level language prompt telling it what to do, and in return generates the actions needed to control Atlas’s entire 50-degree-of-freedom body.
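To make that input/output contract concrete, here is a minimal Python sketch. It is not the Boston Dynamics/TRI implementation: the `Observation` fields, the `toy_denoiser` function, and the chunk length are all hypothetical. It only illustrates the general idea behind diffusion-style action generation, which is to start from random noise and refine it step by step, conditioned on what the robot observes.

```python
import random
from dataclasses import dataclass

NUM_DOF = 50  # from the article: Atlas's 50-degree-of-freedom body


@dataclass
class Observation:
    """Hypothetical container for the three inputs the article lists."""
    images: list          # camera frames (placeholders here)
    proprioception: list  # one value per joint
    prompt: str           # high-level language instruction


def toy_denoiser(actions, obs, step_fraction):
    """Stand-in for the trained network (assumption, not the real model).

    It nudges every noisy action row towards the current joint positions,
    so the 'task' this toy policy solves is simply holding still.
    """
    target = obs.proprioception
    return [
        [a + step_fraction * (t - a) for a, t in zip(row, target)]
        for row in actions
    ]


def generate_action_chunk(obs, horizon=4, num_steps=10, seed=0):
    """Start from Gaussian noise and refine it over num_steps iterations."""
    rng = random.Random(seed)
    actions = [[rng.gauss(0.0, 1.0) for _ in range(NUM_DOF)]
               for _ in range(horizon)]
    for k in range(num_steps):
        # Each step covers an equal share of the remaining distance,
        # so the final step lands on the denoiser's target.
        actions = toy_denoiser(actions, obs, 1.0 / (num_steps - k))
    return actions


obs = Observation(images=["left_frame", "right_frame"],
                  proprioception=[0.1] * NUM_DOF,
                  prompt="put the part on the shelf")
chunk = generate_action_chunk(obs)
print(len(chunk), len(chunk[0]))  # horizon rows, one column per joint
```

A real policy would replace `toy_denoiser` with a learned network that also uses the images and the prompt; the toy version ignores both.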
All of this happens in a continuous, iterative loop: collect data, train the model, and evaluate the performance to decide what data to collect next.
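The collect, train, evaluate loop can be sketched as follows. Everything here is a toy assumption made for illustration: the "model" is just a count of demonstrations per task, and a task is deemed to succeed once it has a hypothetical minimum number of demonstrations. The point is the control flow, in which evaluation results decide which data to collect next.

```python
def collect_demonstrations(task, count):
    """Stand-in for teleoperated data collection (hypothetical)."""
    return [{"task": task, "episode": i} for i in range(count)]


def train(dataset):
    """Toy 'training': count how many demonstrations exist per task."""
    counts = {}
    for demo in dataset:
        counts[demo["task"]] = counts.get(demo["task"], 0) + 1
    return counts


def evaluate(model, tasks, demos_needed=3):
    """Toy evaluation: a task passes once it has enough demonstrations."""
    return {task: model.get(task, 0) >= demos_needed for task in tasks}


def data_flywheel(tasks, max_rounds=5, demos_per_round=2):
    dataset, model, results = [], {}, {}
    to_collect = list(tasks)
    rounds_used = 0
    for _ in range(max_rounds):
        rounds_used += 1
        for task in to_collect:
            dataset += collect_demonstrations(task, demos_per_round)
        model = train(dataset)
        results = evaluate(model, tasks)
        # Evaluation decides what to collect next: only the failing tasks.
        to_collect = [task for task, ok in results.items() if not ok]
        if not to_collect:
            break
    return model, results, rounds_used


model, results, rounds_used = data_flywheel(["fold part", "open bin"])
print(results, rounds_used)
```

In practice each of the three stand-in functions would be a substantial system of its own; the sketch only shows how they feed one another.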
Learning complex and unpredictable tasks
By training policies on a huge variety of tasks, the researchers have found the robot gets better at generalising and recovering from errors. The system can learn to perform jobs that would be exceptionally difficult to code by hand due to their complex and unpredictable nature.
Using a single language-conditioned AI model, Atlas has learned to tie a rope, spread a tablecloth, and even manipulate a 22 lb (roughly 10 kg) car tyre.
The team found that for LBMs, the process is the same whether the robot is stacking rigid blocks or folding a t-shirt: if a human can demonstrate it, a robot like Atlas can learn it. As a bonus, they also discovered they could speed up the robot’s actions at runtime, often having it perform tasks 1.5 to 2 times faster than the human demonstration without any drop in performance.
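One simple way to picture a runtime speed-up is to compress the timestamps of a predicted trajectory and then sample it at the controller's rate. The article does not describe how the team does this, so the Python sketch below is an assumption: the function names and the `(time, joint_positions)` trajectory format are hypothetical.

```python
def retime(trajectory, speedup):
    """Compress timestamps so the same motion plays back `speedup` times faster.

    `trajectory` is a list of (time_seconds, joint_positions) waypoints.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return [(t / speedup, list(q)) for t, q in trajectory]


def sample(trajectory, t):
    """Linearly interpolate joint positions at time t, clamping at both ends."""
    if t <= trajectory[0][0]:
        return list(trajectory[0][1])
    for (t0, q0), (t1, q1) in zip(trajectory, trajectory[1:]):
        if t <= t1:
            alpha = (t - t0) / (t1 - t0)
            return [a + alpha * (b - a) for a, b in zip(q0, q1)]
    return list(trajectory[-1][1])


# A two-second, single-joint motion recorded at demonstration speed.
demo = [(0.0, [0.0]), (1.0, [1.0]), (2.0, [2.0])]
fast = retime(demo, 2.0)  # the article reports 1.5x to 2x speed-ups
print(fast[-1][0], sample(fast, 0.25))
```

Retiming alone ignores dynamics: a real robot would still have to respect joint velocity and torque limits at the faster pace.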
The next steps involve scaling up this “data flywheel” to increase the diversity and difficulty of tasks, while also exploring new AI algorithms and ways to incorporate other data sources. It’s a step on the long road toward a future where humanoid robots like Atlas can work with us, and for us, in the real world.
(Image credit: Boston Dynamics)