Google ‘Gemma 4’ AI model: This new AI tool can build AI agents for you and handle text, image, audio tasks

Google Gemma 4: Google introduced its new artificial intelligence model, Gemma 4, on April 2, expanding its lineup of AI tools with a focus on multimodal capabilities and flexible deployment. Gemma 4 is designed as a multimodal model, meaning it can process both text and image inputs while generating text outputs. Some smaller versions of the model also support audio features. The model offers multilingual support across more than 140 languages, making it suitable for a wide range of global applications.

The release includes open-weights models in both pre-trained and instruction-tuned versions, allowing developers to adapt the system for different use cases.

Range of model sizes and uses

Gemma 4 comes in four sizes: E2B, E4B, 26B A4B, and 31B. These variations are aimed at different environments, from mobile devices and laptops to high-performance servers. Smaller models are optimised for on-device use, while larger versions are intended for more complex computing tasks.

The models support a context window of up to 256,000 tokens, enabling them to process and understand large amounts of information in a single interaction.
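To give a sense of what a 256,000-token limit means in practice, here is a minimal sketch of how a developer might group documents so that each request stays within the window. The function name `pack_documents`, the `reserve` parameter, and the idea of setting aside tokens for the model's reply are assumptions made for illustration. The token counts are assumed to come from whatever tokenizer the developer uses, and nothing here is part of Gemma 4 itself.

```python
CONTEXT_LIMIT = 256_000  # upper bound quoted in the article


def pack_documents(token_counts, limit=CONTEXT_LIMIT, reserve=0):
    """Greedily group documents (given as token counts) into batches.

    Each batch's total stays within `limit - reserve`, where `reserve`
    is a hypothetical allowance for the prompt and the model's reply.
    Returns a list of batches, each a list of document indices.
    """
    budget = limit - reserve
    batches, current, used = [], [], 0
    for idx, n in enumerate(token_counts):
        if n > budget:
            raise ValueError(f"document {idx} alone exceeds the budget")
        if used + n > budget:
            # The current batch is full, so start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += n
    if current:
        batches.append(current)
    return batches


batches = pack_documents([200_000, 50_000, 10_000])
```

In this example the first two documents fit together in one request, while the third spills into a second.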

Focus on reasoning and coding

All versions of Gemma 4 are built with reasoning capabilities, allowing them to handle step-by-step problem-solving tasks. The models also show improvements in coding, including code generation, completion, and correction.

In addition, the system includes native function-calling support, which enables structured interactions and supports the development of autonomous AI agents.
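Function calling generally works like this: the model emits a structured request naming a tool and its arguments, and the application runs the matching code and returns the result. The sketch below shows only the application side. The JSON shape, the `get_weather` tool, and the `dispatch` helper are hypothetical and do not represent Gemma 4's actual output format.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names the model may request to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(call_json: str) -> str:
    """Parse a model-emitted tool call (assumed JSON shape) and run it."""
    call = json.loads(call_json)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**args)


# Stand-in for text the model might produce when it decides to call a tool.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

An agent loop would feed `result` back to the model as a new message, letting it decide whether to call another tool or answer the user.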

Architecture and technical features

Gemma 4 uses a mix of Dense and Mixture-of-Experts (MoE) architectures. It also features a hybrid attention mechanism that combines local and global processing, helping to balance speed and performance in long-context tasks.
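The difference between local and global attention can be illustrated with attention masks. In the sketch below, a local layer lets each token look only at a fixed number of recent tokens, while a global layer lets it look at everything before it. The window size, the causal (left-to-right) setup, and the `causal_mask` helper are assumptions for illustration; the article does not give Gemma 4's actual layer configuration.

```python
def causal_mask(seq_len, window=None):
    """Return a seq_len x seq_len grid of booleans.

    mask[i][j] is True when token i may attend to token j.
    With window=None every earlier token is visible (global attention);
    with a window, only the most recent `window` tokens, including the
    current one, are visible (local, sliding-window attention).
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


local_mask = causal_mask(4, window=2)   # cheap: cost per token is bounded
global_mask = causal_mask(4)            # costly: sees the full history
```

Mixing the two kinds of layers is one way to keep most of the computation cheap while still letting some layers connect distant parts of a long input.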

The model supports image and video understanding, including object detection, document parsing, and chart analysis. It can also process mixed inputs, such as text and images within a single prompt.

Smaller models support audio tasks like speech recognition and translation. Overall, Gemma 4 is designed to handle a broad range of AI tasks across different platforms and use cases.
