Google first showed new voice and video search features for Google Lens at its I/O 2024 event in May; now users can long-press and ask questions with their voice, making search simpler and more convenient.
Enhanced Interaction with Google Lens
Once Lens starts capturing video, users can ask questions about what they see. In one demo, when a user asked, “Why are they swimming together?”, Lens responded with an answer generated by Google Gemini. The capability lets users point their phone at moving objects and ask about them, which makes Lens useful in far more situations. To try it, users can opt into the “AI Overviews and more” experiment in Search Labs.
Custom Gemini Model Powers Video Search
Rajan Patel, Google’s vice president of engineering, explained how the feature works. Google captures the video as a series of image frames and applies the computer vision techniques Lens already uses. Crucially, though, the responses are generated by a custom Gemini model designed to interpret multiple frames in sequence. Once the frames are processed, the model pulls relevant information from the web to formulate an answer.
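Google has not published the implementation, but Patel’s description maps to a four-step pipeline: decode the video into frames, run Lens-style vision on each frame, have a sequence-aware Gemini model interpret the frames together with the spoken question, and pull in web results to compose an answer. The sketch below is a hypothetical Python skeleton of that flow; every name in it (sample_frames, run_lens_vision, gemini_interpret_sequence, search_web, FrameFeatures) is an invented stand-in for unpublished internals, with stub bodies so the flow can run end to end.

```python
# Hypothetical sketch of the Lens video-search flow described above.
# All function and type names are invented; the stubs only illustrate
# how the four steps could fit together.

from dataclasses import dataclass
from typing import List


@dataclass
class FrameFeatures:
    """Per-frame output of the Lens computer-vision stage (hypothetical)."""
    objects: List[str]
    caption: str


def sample_frames(video: bytes, fps: float = 1.0) -> List[bytes]:
    """Step 1: treat the captured video as a series of image frames.
    A real implementation would decode the stream; this stub returns
    the raw bytes as a single 'frame' for illustration."""
    return [video]


def run_lens_vision(frame: bytes) -> FrameFeatures:
    """Step 2: apply existing Lens computer-vision techniques to each
    frame (object detection, captioning). Stubbed with a fixed result."""
    return FrameFeatures(objects=["fish"], caption="a school of fish swimming")


def gemini_interpret_sequence(features: List[FrameFeatures], question: str) -> str:
    """Step 3: stand-in for the custom Gemini model that interprets
    multiple frames in sequence and combines them with the spoken
    question to form a search query."""
    seen = ", ".join(sorted({obj for f in features for obj in f.objects}))
    return f"{question} ({seen})"


def search_web(query: str) -> List[str]:
    """Step 4: pull relevant information from the web. Stubbed."""
    return [f"result snippet for: {query}"]


def answer_video_question(video: bytes, spoken_question: str) -> str:
    """End-to-end flow matching the description above."""
    frames = sample_frames(video)
    features = [run_lens_vision(f) for f in frames]
    query = gemini_interpret_sequence(features, spoken_question)
    evidence = search_web(query)
    return f"Answer based on {len(evidence)} web result(s) for '{query}'"


if __name__ == "__main__":
    print(answer_video_question(b"<video bytes>", "Why are they swimming together?"))
```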
In short, the feature is built largely from technology Google already had, yet it meaningfully expands what Google Lens can do.