Google wants to make its robots smarter with the release of Robotic Transformer 2 (RT-2), a new artificial intelligence learning model.

RT-2 is a new version of the company’s vision-language-action (VLA) model. The model teaches robots to better recognize visual and linguistic patterns so they can interpret instructions and infer which objects best match a request.

The researchers tested RT-2 with a robotic arm in a kitchen office setting, asking it to decide what would make a good improvised hammer and to choose a drink to give an exhausted person. They also told the robot to move a drink can to a picture of Taylor Swift.

The new model is trained on both web and robotics data, leveraging research advances in large language models like Google’s own Bard and combining them with robotic data (such as which joints to move). The robot also understands instructions in languages other than English.

For years, scientists have been striving to give robots better inference abilities so they can operate in real-world environments. Teaching a robot used to take a long time, because researchers had to program each direction individually. Thanks to VLA models like RT-2, robots can now draw on a much larger body of information to infer what to do next.

However, Google’s new robot is not perfect. The New York Times saw a live demo and reported that the robot incorrectly identified soda flavors and misidentified fruit as the color white.

Source: The Verge