Google has expanded its artificial intelligence lineup with the introduction of Gemma 4, a new collection of open models aimed at improving how developers build and deploy AI across different devices, from mobile phones to large-scale computing systems.
The company presented Gemma 4 as its most capable open models to date, introducing four versions designed for different performance needs: E2B and E4B (with effective parameter counts of 2 billion and 4 billion), a 26B Mixture-of-Experts model, and a 31B dense model, each tailored to balance efficiency against computing demand.
Google said the models are designed to support advanced reasoning and can carry out agent-style tasks by connecting with external tools and software. The models are developed using research aligned with the systems behind Gemini 3 and are compatible with a broad range of hardware, including smartphones, laptops, developer machines, and accelerator-based systems.
The release has been made under the Apache-2.0 license, allowing developers to access, modify, and deploy the models with minimal restrictions across different environments. This approach is intended to widen adoption among developers working on a range of AI applications.
“This breakthrough builds on incredible community momentum: since the launch of our first generation, developers have downloaded Gemma over 400 million times, building a vibrant Gemmaverse of more than 100,000 variants,” Google said.
“We listened closely to what innovators need next to push the boundaries of AI, and Gemma 4 is our answer: breakthrough capabilities made widely accessible under an Apache 2.0 license,” it added.
One of the main features highlighted in Gemma 4 is the ability of the E2B and E4B models to run directly on local devices such as smartphones, laptops, and Internet of Things hardware. This removes the need for a continuous internet connection when running certain AI tasks.
Google noted that these smaller models are built to handle text, images, video, and audio inputs while keeping response times low and managing power consumption efficiently. This makes them suitable for everyday devices where both speed and battery usage are important.
In contrast, the larger 26B Mixture of Experts and 31B Dense models are intended for more demanding computing setups. These include developer workstations, advanced desktop machines, and servers equipped with high-performance graphics processors such as NVIDIA H100 units.
“At the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal capabilities, low-latency processing, and seamless ecosystem integration over raw parameter count,” Google said.
The launch comes amid increasing competition among technology firms seeking to build more capable and efficient artificial intelligence systems that can operate both locally on devices and within large-scale cloud infrastructure.