Covariant, a company founded by Pieter Abbeel, a professor at the University of California, Berkeley, and three of his former students, Peter Chen, Rocky Duan and Tianhao Zhang, spent years collecting data from the cameras and other sensors of warehouse robots. It has now developed robots that can learn from what they see and hear.
By combining this data with the vast amount of text used to train chatbots like ChatGPT, the company has created AI technology that gives its robots a much broader understanding of the world around them.
After identifying patterns in this mix of images, sensory data and text, the technology lets a robot handle unexpected situations in the physical world. The robot knows how to pick up a banana, even if it has never seen a banana before.
The robot can also respond to plain English, much like a chatbot. If you tell it to “pick up the banana,” it knows what that means. If you tell it to “pick up the yellow fruit,” it understands that too. The robot can even generate videos that predict what is likely to happen when it tries to pick up a banana. These videos have no practical use in the warehouse, but they show the robot’s understanding of its surroundings.
“If a robot can predict the next frames of a video, it can pinpoint the right strategy to follow,” said Dr. Abbeel.
Gary Marcus, an AI entrepreneur and professor emeritus of psychology and neuroscience at New York University, said the technology could be useful in warehouses and other situations where mistakes are acceptable. But he said it would be more difficult and risky to use in manufacturing plants and other potentially dangerous situations.
“It comes down to the cost of a mistake,” he said. “When you have a 150-pound robot that can do something harmful, that price can be high.”
As companies train this type of system on increasingly large and diverse data sets, researchers believe it will improve rapidly.
This is very different from how robots used to operate. Typically, engineers programmed robots to perform the exact same movements over and over again, such as picking up a box of a certain size or attaching a rivet to the rear bumper of a car. But those robots could not handle unexpected or random situations.
By learning from digital data, meaning hundreds of thousands of examples of what happens in the physical world, robots can learn to deal with the unexpected. And if these examples are paired with language, robots can also respond to text and voice suggestions, as a chatbot would.
This means that robots, like chatbots and image generators, are becoming more skilled.
Source: The New York Times