Tag: visual-language model

September 5, 2025

SwitchBot AI Hub: New Smart Home Hub with VLM Features

Key Takeaways

1. SwitchBot bills the AI Hub, showcased at IFA 2025, as the first smart home hub to use Visual Language Model (VLM) technology.
2. It can identify events through compatible smart cameras and activate devices based on those events; this feature requires a subscription.
3. The hub can translate video footage into text, enabling smart actions like turning on lights when someone is recognized reading in a dark room.
4. Users can search recorded video footage by text and receive daily event summaries via the SwitchBot app.
5. It features 32 GB of built-in storage, expandable up to 1 TB, and connects with over 100 smart devices using various connectivity options.

SwitchBot has revealed more details about its AI Hub at IFA 2025. First announced in May 2025, the smart home hub is now being showcased at the event in Berlin. The company claims it is the world's first AI hub to support Visual Language Model (VLM) technology.

Features of the AI Hub

The SwitchBot AI Hub uses a VLM to identify events and convert them into text, which can then be used to trigger smart home devices. This function requires a subscription, and the hub must be connected to one of three compatible smart cameras: the Smart Video Doorbell, the Pan/Tilt Cam Plus 2K Plus, or the Pan/Tilt Cam 3K Plus (currently priced at $69.99 on Amazon).

Real-World Applications

A YouTube video (below) demonstrates how the SwitchBot AI Hub converts video footage into text. In the video, an individual is recognized as “in a dark room” while “reading a book.” According to SwitchBot, this could activate smart lighting in that space. The same feature also enables text searches of recorded video footage. For instance, you might ask, “Show me when I left my phone.” Additionally, users will receive daily event summaries through the SwitchBot app.
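The flow described above (footage becomes a text description, and that text then drives automations and search) can be sketched in a few lines of Python. This is only an illustration of the idea, not SwitchBot's implementation: the `CaptionedEvent` and `Rule` types, the action names, and the sample captions are all hypothetical, and the search is plain keyword matching rather than the natural-language queries the hub is said to accept.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class CaptionedEvent:
    """A camera event plus the text a VLM might produce for it (hypothetical shape)."""
    timestamp: datetime
    camera: str
    caption: str


@dataclass
class Rule:
    """Fire `action` when every keyword appears in an event caption."""
    keywords: tuple[str, ...]
    action: str  # hypothetical action name, e.g. "living_room_light.on"

    def matches(self, event: CaptionedEvent) -> bool:
        text = event.caption.lower()
        return all(word.lower() in text for word in self.keywords)


def actions_for(event: CaptionedEvent, rules: list[Rule]) -> list[str]:
    """Return the actions of all rules that match this event's caption."""
    return [rule.action for rule in rules if rule.matches(event)]


def search(events: list[CaptionedEvent], query: str) -> list[CaptionedEvent]:
    """Return events whose caption contains every query word, oldest first."""
    words = query.lower().split()
    hits = [e for e in events if all(w in e.caption.lower() for w in words)]
    return sorted(hits, key=lambda e: e.timestamp)


# Made-up sample data for illustration only.
EVENTS = [
    CaptionedEvent(datetime(2025, 9, 5, 21, 15), "living room",
                   "A person is reading a book in a dark room"),
    CaptionedEvent(datetime(2025, 9, 5, 8, 30), "kitchen",
                   "A person left a phone on the kitchen table"),
]
RULES = [Rule(keywords=("reading", "dark room"), action="living_room_light.on")]

if __name__ == "__main__":
    for event in EVENTS:
        print(event.camera, "->", actions_for(event, RULES))
    for hit in search(EVENTS, "left phone"):
        print(hit.timestamp.isoformat(), hit.caption)
```

Keeping the captions as plain text is what makes both features cheap here: the same stored strings serve as automation triggers and as a searchable log.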

Storage and Connectivity Options

The SwitchBot AI Hub comes with 32 GB of built-in storage, which can be expanded up to 1 TB, allowing for local data storage and processing. SwitchBot says it connects with over 100 smart home devices and supports Matter Over Bridge, Bluetooth, and dual-band 2.4 GHz/5 GHz Wi-Fi. The company has not yet said when the AI Hub will officially launch or what it will cost.

Tags: visual-language model

December 6, 2024

Google Launches PaliGemma 2 Vision-Language Models

Google has announced PaliGemma 2, the successor to its visual-language model PaliGemma, which was introduced in May 2024. The new version comes in a range of sizes, with parameter counts from 3 billion to 28 billion and input resolutions of up to 896px.

Advanced Performance Features

According to the company, this model showcases “top-tier performance in recognizing chemical formulas, musical scores, spatial reasoning, and generating reports from chest X-rays.”

Enhanced Captioning Abilities

Additionally, it boasts long captioning functionality, offering “thorough, contextually relevant captions for images that go beyond basic object recognition to include descriptions of actions, emotions, and the overall story of the scene.”

Accessible and Flexible Options

The new models, available in various sizes, are designed to be a “drop-in replacement” for the original PaliGemma without the need for “significant code changes.” Pre-trained versions can be found on platforms like Hugging Face and Kaggle, available for free to anyone interested in testing them. PaliGemma 2 also supports several frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
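As a rough illustration of what switching between sizes might look like with Hugging Face Transformers, the sketch below builds a checkpoint name and loads it. The intermediate 10B size, the 224px and 448px resolutions, and the `google/paligemma2-{size}-pt-{resolution}` naming pattern are not stated in this article; they are assumptions based on the pre-trained checkpoints as listed on Hugging Face and should be checked against the model hub before use. The `checkpoint_id` and `load` helpers are hypothetical names.

```python
# Assumed variants; the article itself only gives the 3B-28B range and the 896px maximum.
SIZES = ("3b", "10b", "28b")
RESOLUTIONS = (224, 448, 896)


def checkpoint_id(size: str, resolution: int) -> str:
    """Build a pre-trained checkpoint name under the assumed naming pattern."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution {resolution}; expected one of {RESOLUTIONS}")
    return f"google/paligemma2-{size}-pt-{resolution}"


def load(model_id: str):
    """Load the processor and model; downloads weights, so call it deliberately."""
    # Imported lazily so checkpoint_id() works without transformers installed.
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    return processor, model


if __name__ == "__main__":
    # Changing size or resolution only changes the identifier passed to load().
    print(checkpoint_id("3b", 224))
```

If the naming assumption holds, moving from a smaller to a larger model is a one-argument change, which is the sense in which the variants are interchangeable in calling code.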

Google emphasizes that PaliGemma 2’s “adaptability makes it easy to fine-tune for particular tasks and datasets, allowing you to customize its functions to meet your specific requirements.”

Tags: google, visual-language model
