What is Gemini? Everything you should know about Google's new AI model

Gemini is a powerful artificial intelligence (AI) model from Google that can understand text, images, videos, and audio. As a multimodal model, Gemini is described as capable of completing complex tasks in math, physics, and other areas, and understanding and generating high-quality code in various programming languages.

It is currently available through the Gemini chatbot (formerly Google Bard) and some Google Pixel devices, and will gradually be folded into other Google services. During Google I/O 2024, the company announced new features coming to Gemini, including a new ‘Live’ mode and integrations with Project Astra. Gemini also powers AI Overviews in Google Search.

“Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research,” said Demis Hassabis, CEO and co-founder of Google DeepMind, when announcing Gemini.

“It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video.”

Compared to GPT-4, a primarily text-based model, Gemini performs multimodal tasks natively. GPT-4 excels at language-related tasks, such as content creation and complex text analysis, but at the time of testing it resorted to OpenAI’s plugins to analyze images and access the web, and it relies on DALL-E 3 and Whisper to generate images and process audio.

This approach could change when OpenAI makes GPT-4o widely available, as ChatGPT would no longer rely on three separate models to perform these actions and would instead use a single omnimodel.

Google’s Gemini also appears to be more product-focused than other available models. Gemini is either already integrated into the company’s ecosystem or slated to be, as it powers both the chatbot and Android devices. Other models, like GPT-4 and Meta’s Llama, are more service-oriented and are available to third-party developers for building applications, tools, and services.
