Google is wrapping up 2024 with a significant announcement. On Wednesday, the tech giant from Mountain View shared a wave of updates related to AI, prominently featuring the launch of Gemini 2.0. This new language model boasts cutting-edge multimodal abilities, marking what Google describes as the start of the "agentic era," in which virtual AI agents can complete tasks on your behalf.
Introduction of Gemini 2.0 Flash
Initially, Google is rolling out just one model from the Gemini 2.0 lineup: Gemini 2.0 Flash Experimental. This fast, lightweight model supports multimodal input and output: it can natively generate images interleaved with text as well as multilingual text-to-speech audio, and it can call tools such as Google Search and code execution. For now, these features are available only to developers and trusted testers. Despite its compact size, Gemini 2.0 Flash outperforms Gemini 1.5 Pro on benchmarks for factuality, reasoning, coding, and math, while running at twice the speed. Regular users can access a chat-optimized version of Gemini 2.0 Flash on the web today, with availability in the mobile app coming soon.
Showcasing New Experiences
Google is also showing off several applications built with Gemini 2.0. One of them is an updated Project Astra, the virtual AI agent first presented at Google I/O in May 2024. Thanks to Gemini 2.0, it can now converse in multiple languages, use tools like Google Search, Lens, and Maps, recall information from previous chats, and respond with roughly the latency of human conversation. Project Astra is intended for smartphones and smart glasses, but for now it is being tested by a small group of trusted users. If you want to try the prototype on an Android device, you can sign up for the waitlist. Google also demonstrated the related Multimodal Live API, which enables real-time interaction with a chatbot through video, voice, and screen sharing.
Exploring Project Mariner and Jules
Another notable project is Project Mariner, an experimental Chrome extension that can navigate the web and complete tasks on your behalf. It is being tested by a limited number of users in the US and uses Gemini 2.0's multimodal capabilities "to comprehend and reason through information displayed on your browser, including pixels and web elements like text, images, code, and forms." Google acknowledges that the technology is still early and may not always work reliably, but even as a prototype it is impressive, as a YouTube demo shows.
Google has also introduced Jules, an AI-powered coding agent that integrates directly into GitHub workflows. The company says it can handle bug fixes and other repetitive tasks, freeing you to focus on the development work you actually want to do.
Much of what was announced is currently limited to early testers and developers. Google plans to bring Gemini 2.0 to its broader product lineup, including Search, Workspace, and Maps, early next year. At that point, we'll have a clearer picture of how these new multimodal capabilities hold up in real-world use. There is still no word on Gemini 2.0 Pro or Ultra.
Source: Link