That Google Gemini video was so amazing because of some slick editing

A demo video of Gemini, Google’s new AI model, isn’t as “mindblowing” as it appears.

On Wednesday, Google released Gemini, a natively multimodal model that, by Google's account, surpassed OpenAI’s GPT-4 in major intelligence benchmarks. A six-minute demo video showing off Gemini’s abilities to follow a ball-in-a-cup trick, locate countries on a map, and identify a simple duck drawing wowed techies on social media and seemed to convince the internet that AGI (artificial general intelligence) is on the horizon.

But it didn’t take long for experts to discover that the Gemini video was a teensy bit exaggerated. As Bloomberg’s Parmy Olson first reported, the video was edited in numerous ways.

How did Google embellish the Gemini demo?

As Google confirmed, the video was not shot in real time. Instead, it was created by “using still image frames from the footage, and prompting via text,” according to a Google spokesperson.

It seems like Gemini is being prompted by the person’s voice, but the audio was actually added later. However, “the user voices over real excerpts from the actual prompts used to produce the Gemini output that follows,” the Google rep said. Additionally, according to the description on YouTube, “latency has been reduced and Gemini outputs have been shortened for brevity.” In other words, Gemini’s speedy response time in the video is not real.

After it was revealed that Gemini’s capabilities were inflated by slick video editing, Google DeepMind VP of Research and Deep Learning Lead Oriol Vinyals posted on X (formerly Twitter) to clear things up. “All the user prompts and outputs in the video are real, shortened for brevity,” said Vinyals. “The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”

But users still felt deceived. “If you want to inspire developers then why don’t you post factual content? The prompts can’t be ‘real’ and shortened at the same time. It was disingenuous and misleading,” commented one user on Vinyals’ post. “Sorry, ‘real but shortened’ isn’t a thing,” said another.

Backlash over the demo overshadowed some of Gemini’s actual achievements. The blog post breaking down how the video was made showed off Gemini’s impressive reasoning skills, even if only through text prompts and still images (as opposed to voice and video). Other promo videos showcased specific use cases, such as how Gemini can extract scientific data from 200,000 research papers or help parents assist their kids with math and physics homework.

That said, whether Gemini’s abilities exceed or fall short of expectations will be up to users to decide.

Mashable