Tag: Gemini 2.0

  • The Future of Voice Assistants: A New Hotword Emerges

On December 11, Google announced a major update to Gemini. Version 2.0 introduces native image and audio capabilities, along with the integration of services such as Spotify, WhatsApp, and Google Home into Google Search. The update is set to roll out widely in early 2025 and is part of Google’s plan to establish Gemini as a next-generation AI assistant, setting it apart from the older Google Assistant.

    A Change in Activation

    Current indications suggest that the hotword “Hey Google” may soon be replaced. Instead, Android users might activate their AI assistant by saying “Hey Gemini.” This was hinted at during the recent reveal of Android XR, Google’s latest operating system tailored for mixed-reality headsets and smart glasses.

    The Future Role of Gemini

    In the recent demo videos of Android XR shared by Android Police on YouTube, the phrase “Hey Gemini” can be heard for the first time. This suggests that Gemini is expected to take on a key role in future AI-driven devices and systems. It is also anticipated that “Hey Gemini” will replace the well-known “Hey Google” on Android smartphones, including the Google Pixel 9. However, the exact timeline for this transition remains uncertain.

    Source: Link

  • Google Gemini 2.0 Launches with Experimental AI Agents

    Google is wrapping up 2024 with a significant announcement. On Wednesday, the tech giant from Mountain View shared a wave of updates related to AI, prominently featuring the launch of Gemini 2.0. This new language model boasts cutting-edge multimodal abilities, marking what Google describes as the start of the “agentic era,” in which virtual AI agents can complete tasks on your behalf.

    Introduction of Gemini 2.0 Flash

    Initially, Google is rolling out just one model from the Gemini 2.0 lineup: Gemini 2.0 Flash Experimental. This ultra-fast, lightweight model supports a range of input and output formats. It can generate images combined with text as well as multilingual text-to-speech audio, and it can natively access Google Search, execute code, and use other tools. For now, these features are available to developers and beta testers. Despite its compact size, Gemini 2.0 Flash outperforms Gemini 1.5 Pro on metrics such as factuality, reasoning, coding, and math, while running at twice the speed. Regular users can access the chat-optimized version of Gemini 2.0 Flash on the web today, with a mobile app version arriving soon.
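
    The tool access described above is exposed through the Gemini API's generateContent endpoint. As a rough sketch, a request body enabling Search grounding and code execution could look like the following; the field names are assumptions modeled on Google's public REST documentation and may not match the shipped API exactly:

```python
import json

# Hypothetical sketch of a Gemini API generateContent request body that
# enables Google Search grounding and code execution as built-in tools.
# Field names are assumptions and may differ from the actual API.
MODEL = "gemini-2.0-flash-exp"  # experimental model name at launch

def build_request(prompt: str) -> dict:
    """Assemble a request body asking the model to use its built-in tools."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]}
        ],
        "tools": [
            {"google_search": {}},   # let the model ground answers in Search
            {"code_execution": {}},  # let the model run code it writes
        ],
    }

if __name__ == "__main__":
    body = build_request("What is the 50th Fibonacci number? Verify with code.")
    # In practice this body would be POSTed to the generateContent endpoint
    # for the chosen model, authenticated with an API key.
    print(json.dumps(body, indent=2))
```

    The response would then interleave model text with tool results (search citations or executed-code output), which the client renders back to the user.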

    Showcasing New Experiences

    Google is also revealing several applications built with Gemini 2.0. One of these is the updated Project Astra, a virtual AI agent first presented in May 2024. Thanks to Gemini 2.0, it can now converse across multiple languages, use tools like Google Search, Lens, and Maps, recall information from previous chats, and respond with roughly the latency of human conversation. Project Astra is intended for smartphones and smart glasses, but is currently being tested by a select group of trusted users. If you’re interested in testing this prototype on your Android device, you can sign up for the waitlist here. Additionally, there’s an impressive demo of the Multimodal Live API, which shares similarities with Project Astra, allowing real-time interaction with a chatbot through video, voice, and screen sharing.

    Exploring Project Mariner and Jules

    Another notable project is Project Mariner, an experimental Chrome browser extension that can navigate the web and accomplish tasks for you. This extension is being tested by a limited number of users in the US and utilizes Gemini 2.0’s multimodal functions “to comprehend and reason through information displayed on your browser, including pixels and web elements like text, images, code, and forms.” Google admits that this technology is still developing and may not always work reliably. However, even in its prototype state, it is quite remarkable, as demonstrated in a YouTube video.

    Additionally, Google has introduced Jules, an AI-driven code agent that integrates directly into GitHub workflows. The company claims it can manage bug fixes and repetitive tasks, allowing you to concentrate on the actual development work you wish to accomplish.

    Much of what has been announced is currently limited to early testers and developers. Google intends to incorporate Gemini 2.0 into its various products, like Search, Workspace, Maps, and more, early next year. At that point, we’ll have a clearer picture of how these new multimodal features and enhancements can be applied in real-world scenarios. There’s still no update on the Gemini 2.0 Ultra and Pro versions.

    Source: Link

  • Google Unveils Gemini 2.0 Models for New Agentic Era

    Nine months after the introduction of Gemini 1.5, Google has unveiled the next significant update for its Large Language Model (LLM), named Gemini 2.0. The first model in this new lineup, Gemini 2.0 Flash, is now available as an experimental option in Google AI Studio and Vertex AI.

    Improved Speed and Functionality

    Gemini 2.0 Flash boasts "enhanced performance at similarly fast response times" and is said to be "twice as fast" compared to 1.5 Flash. The upgraded LLM supports various modes of input, including images, text, video, and audio. Additionally, it can handle mixed media, combining pictures with text, as well as multilingual text-to-speech audio.

    New Features and APIs

    This new version allows direct access to Google Search and supports third-party code execution along with predefined functions. Google is also launching a Multimodal Live API for developers. A chat-optimized version of 2.0 Flash will soon be accessible on both desktop and mobile browsers, with a release in the Gemini mobile app to follow.
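
    The “predefined functions” mentioned here refer to function calling: the developer declares a function schema in the request, and when the request fits, the model responds with a structured call rather than prose, which the application executes and feeds back. A minimal sketch of such a declaration; the function itself is hypothetical and the schema style is an assumption modeled on Google's documented OpenAPI-like pattern:

```python
# Illustrative function-calling declaration for the Gemini API.
# get_weather is a hypothetical example function, and the exact field
# names are assumptions based on Google's documented schema pattern.
def weather_function_declaration() -> dict:
    """Describe a hypothetical get_weather function the model may call."""
    return {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    }

if __name__ == "__main__":
    # The declaration is passed in the request's tools list; when the model
    # decides to call it, the response contains a structured function call
    # with arguments for the application to execute.
    print(weather_function_declaration()["name"])
```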

    Advanced Prototypes

    Google’s Project Astra, a research prototype, has been upgraded with Gemini 2.0, showing improvements in dialogue, reasoning abilities, and native integration with tools such as Google Search, Lens, and Maps. This prototype can maintain up to 10 minutes of memory during a session.

    Another research effort, Project Mariner, utilizes 2.0 to comprehend complex instructions and retrieve data from a browser screen. This includes analyzing "pixels and web elements like text, code, images and forms," and employing an experimental Chrome extension to assist users in completing tasks.

    AI Code Assistant

    The third prototype, an experimental AI code assistant named Jules, can be seamlessly integrated into GitHub workflows. It possesses reasoning and logic skills to address coding challenges and formulate solutions under the supervision of developers.

    Google also revealed that it has created AI agents "using Gemini 2.0" capable of assisting users in navigating the virtual realms of video games. These agents can analyze game dynamics based solely on on-screen actions and provide real-time suggestions on what to do next in a conversational manner.

    Source: Link