Gemini, Google’s rebuttal to GPT-4, is here

Google has unveiled Gemini, calling it their “largest and most capable AI model.”

On Wednesday, Google DeepMind, the company’s artificial intelligence research lab, introduced the highly anticipated model that reportedly surpasses OpenAI’s GPT-4 on major benchmarks.

Ever since OpenAI launched ChatGPT a year ago, leading tech companies have been locked in a competition for AI advancements. So far, Microsoft has had a slight leg up due to its access to OpenAI’s models as a major investor in the AI company. Google has been uncharacteristically flat-footed. The initial release of Bard — its ChatGPT competitor — was botched. And Google has generally lagged behind releases from OpenAI and Microsoft with Bing and Copilot. Google Gemini, however, might be a big enough advancement to leapfrog ahead of OpenAI.

What is Gemini?

What makes Gemini so good, according to Google, is its multimodal capabilities, sophisticated reasoning, and advanced coding abilities. Unlike other multimodal AI models that are first built on text, then later add on image data, Gemini is natively multimodal. That means it was pre-trained on audio and image modalities in addition to text from the beginning. “This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models,” said the announcement.

Gemini Ultra, the largest version of the model, scored 90 percent compared to GPT-4’s 86.4 percent on MMLU (massive multitask language understanding), which tests for multi-disciplinary knowledge and problem-solving.

So we know Gemini got good grades, but how does it do in the real world? We’re all about to find out. Google has optimized Gemini for three different sizes: Gemini Ultra, the largest model for highly complex tasks; Gemini Pro, the middleweight model capable of most other tasks; and Gemini Nano, an efficient model that’s small enough to live on your phone.

Speaking of which, starting today, Gemini Nano will run on the Google Pixel 8 Pro. For now, Gemini Nano will power two features on the device. It can summarize transcripts in the Recorder app and will also suggest responses through Smart Reply in the phone’s keyboard (Gboard).

Even if you’re not an Android user, you can test out Gemini in Google Bard as of today. Expect to notice an all-around improvement in reasoning, planning, and understanding. Gemini Pro will power Bard, but only for text prompts.

Multimodality is coming soon. Next year, Google plans to announce an advanced version of Bard which will be powered by Gemini Ultra.

Mashable