Microsoft is broadening its Phi-3 family of small language models with the launch of Phi-3-vision. Unlike its text-only siblings, Phi-3-vision is a multimodal model that can analyze and interpret images as well as text.
This 4.2-billion-parameter model is compact enough to run on mobile devices and performs well at general visual reasoning tasks. Users can ask Phi-3-vision questions about images or charts and receive detailed answers. It is not an image generation tool like DALL-E or Stable Diffusion; its strength lies in analyzing and understanding images, not creating them.
Expansion of the Phi-3 family
The introduction of Phi-3-vision follows the release of Phi-3-mini, the smallest model in the Phi-3 family with 3.8 billion parameters. The complete family now consists of Phi-3-mini, Phi-3-vision, Phi-3-small (7 billion parameters), and Phi-3-medium (14 billion parameters).
Emphasis on smaller models
This emphasis on smaller models reflects a growing trend in AI development. Smaller models require less processing power and memory, making them well suited to mobile devices and other resource-constrained settings. Microsoft has already found success with this strategy: its Orca-Math model has reportedly outperformed larger competitors at solving math problems. Phi-3-vision is currently available in preview, while the rest of the Phi-3 family (mini, small, and medium) can be accessed through Azure's model library.