Nine months after the introduction of Gemini 1.5, Google has unveiled the next major update to its large language model (LLM) lineup: Gemini 2.0. The first model in the new family, Gemini 2.0 Flash, is now available as an experimental option in Google AI Studio and Vertex AI.
Improved Speed and Functionality
Gemini 2.0 Flash boasts "enhanced performance at similarly fast response times" and is said to be "twice as fast" as 1.5 Flash. The upgraded LLM accepts multimodal input, including text, images, video, and audio. It can also produce mixed output that combines images with text, as well as multilingual text-to-speech audio.
New Features and APIs
The new version supports native tool use, including direct access to Google Search, code execution, and third-party user-defined functions. Google is also launching a Multimodal Live API for developers to build on. A version of 2.0 Flash optimized for chat will soon be accessible in both desktop and mobile browsers, with a release in the Gemini mobile app planned for the near future.
Advanced Prototypes
Google's Project Astra, a research prototype, has been upgraded with Gemini 2.0, showing improvements in dialogue, reasoning abilities, and native integration with tools such as Google Search, Lens, and Maps. This prototype can maintain up to 10 minutes of memory during a session.
Another research effort, Project Mariner, uses Gemini 2.0 to comprehend complex instructions and retrieve data from the browser screen. This includes analyzing "pixels and web elements like text, code, images and forms," then employing an experimental Chrome extension to help users complete tasks.
AI Code Assistant
The third prototype, an experimental AI code assistant named Jules, integrates directly into GitHub workflows. It applies reasoning and logic to tackle coding issues and formulate solutions under the supervision of developers.
Google also revealed that it has built AI agents "using Gemini 2.0" that can help users navigate the virtual worlds of video games. These agents reason about gameplay based solely on on-screen action and offer real-time, conversational suggestions on what to do next.
Source: Link