Researchers at MIT and the MIT-IBM Watson AI Lab have developed a method for robot navigation that uses large language models (LLMs) instead of visual data, simplifying the process and reducing computational requirements.
The new approach turns visual observations into text captions, which a large language model then uses to guide the robot’s actions.
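To make the pipeline concrete, here is a minimal sketch in Python of a caption-then-prompt loop of this kind. The functions `caption_image` and `query_llm` are hypothetical placeholders for an off-the-shelf image captioner and an LLM API, and the prompt wording and action set are illustrative assumptions, not details from the paper.

```python
from typing import List

# Illustrative action vocabulary; the actual action space is an assumption here.
ACTIONS = ["move forward", "turn left", "turn right", "stop"]

def caption_image(image_bytes: bytes) -> str:
    """Placeholder for an off-the-shelf image captioning model."""
    raise NotImplementedError("plug in a captioning model here")

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a large language model."""
    raise NotImplementedError("plug in an LLM API or local model here")

def choose_action(image_bytes: bytes, goal: str, history: List[str]) -> str:
    # 1. Turn the visual observation into a text caption.
    caption = caption_image(image_bytes)
    # 2. Ask the LLM to pick the next action from the caption alone.
    prompt = (
        f"You are guiding a robot. Goal: {goal}\n"
        f"Previous actions: {', '.join(history) or 'none'}\n"
        f"Current view: {caption}\n"
        f"Reply with exactly one of: {', '.join(ACTIONS)}."
    )
    reply = query_llm(prompt).strip().lower()
    # Fall back to a safe default if the reply is not a recognized action.
    return reply if reply in ACTIONS else "stop"
```

Because every step operates on plain text, the same loop can be run on synthetic captions as easily as on real camera frames, which is what makes rapid generation of training data possible.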
The method is useful when visual data is scarce, and because it operates on text it enables rapid generation of synthetic training data. Although it does not outperform vision-based methods on its own, it improves navigation performance when combined with them. The researchers aim to investigate language-based navigation further and explore possible improvements.
Source: Science Daily