Google Launches PaliGemma 2 Vision-Language Models

Google has revealed the successor to its vision-language model PaliGemma, which was introduced in May 2024. The new version, PaliGemma 2, comes in three sizes, with parameter counts of 3, 10, and 28 billion, and resolution options of 224, 448, and 896 pixels.

Advanced Performance Features

According to the company, this model showcases "top-tier performance in recognizing chemical formulas, musical scores, spatial reasoning, and generating reports from chest X-rays."

Enhanced Captioning Abilities

PaliGemma 2 also supports long captioning, producing "thorough, contextually relevant captions for images that go beyond basic object recognition to include descriptions of actions, emotions, and the overall story of the scene."

Accessible and Flexible Options

The new models are designed as a "drop-in replacement" for the original PaliGemma across the various sizes, without the need for "significant code changes." Pre-trained versions are freely available on Hugging Face and Kaggle for anyone interested in testing them, and the models are supported by several frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and gemma.cpp.
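As a rough illustration of the "drop-in" usage with Hugging Face Transformers, the sketch below loads a pre-trained checkpoint and generates a caption. The checkpoint name `google/paligemma2-3b-pt-224` and the `"caption en"` task prompt follow the conventions of the original PaliGemma; treat both as assumptions to check against the model cards on Hugging Face.

```python
def caption_image(image_path, prompt="caption en",
                  model_id="google/paligemma2-3b-pt-224"):
    """Generate a caption for an image with a PaliGemma 2 checkpoint.

    Heavy imports live inside the function so this sketch can be read
    (and imported) without transformers/torch installed. Downloading the
    checkpoint may require accepting the license on Hugging Face.
    """
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    image = Image.open(image_path).convert("RGB")
    # PaliGemma prompts are short task prefixes such as "caption en".
    inputs = processor(text=prompt, images=image,
                       return_tensors="pt").to(model.device)

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=64)

    # Decode only the newly generated tokens, skipping the prompt echo.
    prompt_len = inputs["input_ids"].shape[-1]
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Swapping in a larger checkpoint or a higher-resolution variant should only require changing `model_id`, which is what the "drop-in replacement" claim amounts to in practice.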

Google emphasizes that PaliGemma 2's "adaptability makes it easy to fine-tune for particular tasks and datasets, allowing you to customize its functions to meet your specific requirements."
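To make the fine-tuning claim concrete, here is a minimal single-step training sketch. It assumes the PaliGemma processor's `suffix` argument, which packs the target text as labels so the model returns a loss directly; the function and argument names here are illustrative, not an official recipe.

```python
def finetune_step(model, processor, image, prompt, target, optimizer):
    """One hedged gradient step on a (prompt, image) -> target pair.

    Assumes `processor` accepts a `suffix` keyword that turns the target
    text into labels for the language-modeling loss, as the PaliGemma
    processor in Transformers does.
    """
    inputs = processor(text=prompt, images=image, suffix=target,
                       return_tensors="pt").to(model.device)
    outputs = model(**inputs)   # loss is computed against the suffix labels
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In practice one would loop this over a task-specific dataset, typically freezing the vision encoder and training only the language-model layers to keep memory use manageable.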
