Gemini is Google’s latest major language model, which was first mentioned by company CEO Sundar Pichai at the I/O developer conference in June and has now been unveiled to the public. According to Pichai and Google DeepMind CEO Demis Hassabis, this AI model is a huge step forward and will eventually affect virtually all Google products.

Gemini is not a single AI model but a family of them. There is a lighter version called Gemini Nano, which is designed for Android devices and runs offline. A more powerful version, Gemini Pro, will soon power many of Google's AI services and is now the backbone of Bard. There's also an even more capable model, Gemini Ultra, the most powerful LLM Google has built so far, which seems to be aimed mainly at data center and enterprise applications.

Google is currently rolling out the new AI model in a number of ways. Bard now runs on Gemini Pro, and Pixel 8 Pro users will get some new features powered by Gemini Nano. Gemini Ultra is coming next year.

Starting December 13, developers and enterprise customers can access Gemini Pro through Google Generative AI Studio or Vertex AI on Google Cloud. Gemini is currently only available in English, with other languages expected to follow soon. But Pichai says the model will eventually be integrated into Google's search engine, its ad products, and the Chrome browser.
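For developers, that access looks roughly like the sketch below, which assumes Google's `google-generativeai` Python SDK and an API key in the `GOOGLE_API_KEY` environment variable; the exact package and call shapes may differ from what Google ships, so treat this as illustrative rather than official:

```python
# Hedged sketch: sending a prompt to Gemini Pro via Google's
# generative AI Python SDK (pip install google-generativeai).
import os

MODEL_NAME = "gemini-pro"  # Gemini Pro's model identifier in the API

def ask_gemini(prompt: str) -> str:
    """Send a single text prompt to Gemini Pro and return the text reply."""
    import google.generativeai as genai  # imported lazily so the sketch loads without the SDK
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(MODEL_NAME)
    return model.generate_content(prompt).text

# Only makes a live API call when a key is actually configured.
if __name__ == "__main__" and "GOOGLE_API_KEY" in os.environ:
    print(ask_gemini("Summarize Gemini in one sentence."))
```

The lazy import and the environment-variable guard keep the script loadable and harmless when no key is present, which makes it safe to drop into a larger codebase as a starting point.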

So how does Gemini stack up against OpenAI's GPT-4? "We've done very thorough side-by-side analysis and benchmarking of the systems," says Hassabis. Google compared the two models on 32 established benchmarks, ranging from broad general tests like MMLU (Massive Multitask Language Understanding) to a benchmark measuring each model's ability to generate Python code. "I think we're significantly ahead on 30 out of 32 of those benchmarks," Hassabis noted.

In these benchmarks, Gemini's clearest advantage is its ability to understand and interact with video and audio. Rather than training separate models for images and voice, as OpenAI did with DALL-E and Whisper, Google built a single multimodal model from the ground up.

Currently, the most basic Gemini models take text in and produce text out, but more powerful models like Gemini Ultra can work with images, video, and audio.

Google says it has worked hard to ensure Gemini's safety and accountability, through both internal and external testing and the creation of a dedicated team. Pichai points out that ensuring the security and integrity of data is especially important for enterprise products, which is where most generative AI revenue is made.

Source: The Verge