A New Entrant Just Leaped Ahead of OpenAI’s ChatGPT

Dec 13, 2023

The title didn’t say much, but I knew they were on to something. 

It was clear this was the strategic direction they were taking with their research.

The paper, “Collaborating with language models for embodied reasoning,” was published on February 1, 2023, by a team of researchers at Google’s DeepMind division. I read it a few days later.

I knew it was a big deal and that it pointed to a future development path for artificial intelligence.

And yet, no one was talking about it.

I was so excited about the paper that I spoke about it on my return visit to Glenn Beck’s video podcast on February 9th. 

Glenn is passionate about technology and its impact on society and geopolitics. So it was no surprise we spent most of our time digging into the latest developments in artificial intelligence. For those who haven’t seen the interview, you can watch it by going to this link.

The DeepMind research got me excited because it represented a way to improve the performance of the kind of large language models (LLMs) — like OpenAI’s ChatGPT — that have been so popular this year. It does so with a form of artificial intelligence called reinforcement learning.

We can think of reinforcement learning as the ability of an AI to take unstructured inputs from its environment and “learn” the best outcomes through trial and error. I know that might not sound like much — but it is.
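For readers who want to see what "trial and error" looks like in practice, here is a minimal sketch of one of the simplest reinforcement learning setups, an epsilon-greedy "multi-armed bandit." It is a generic textbook illustration, not the method used in the DeepMind paper. The function name and the payout probabilities are hypothetical, chosen only for the example. The learner never sees the payouts directly; it only sees the rewards its choices produce, and it gradually learns which choice is best.

```python
import random


def epsilon_greedy_bandit(true_payouts, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays best purely by trial and error.

    true_payouts: hypothetical win probability for each action. These are
    hidden from the learner, which only observes the rewards it receives.
    """
    rng = random.Random(seed)
    n = len(true_payouts)
    counts = [0] * n        # how many times each action has been tried
    estimates = [0.0] * n   # running average reward for each action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n)
        else:
            # Exploit: pick the action that has looked best so far.
            action = max(range(n), key=lambda a: estimates[a])

        # The environment responds with a reward of 1 (win) or 0 (loss).
        reward = 1.0 if rng.random() < true_payouts[action] else 0.0

        # Update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print("learned best action:", best)
print("times each action was tried:", counts)
```

The small exploration rate (`epsilon`) is the key design choice: without it, the learner could lock onto the first action that happened to pay off and never discover a better one.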

Large language models like ChatGPT are “just” software. We interact with them inside a chat window on a computer or a smartphone. They don’t have the ability to interact with the real world, or with real-time data. 

Most people don’t know this, but the core ChatGPT model was trained only on internet data through September 2021. It’s as if the world no longer existed after that. 

And only last month, OpenAI released an updated version of GPT-4 called GPT-4 Turbo, which is trained on data through April 2023. 

Very few people have access to that product, and even so… it too has no ability to “see” or receive data past that date, or to use data inputs from the present.

The DeepMind research acknowledged the strengths of large language models. And it presented a method of using reinforcement learning to ingest real-time information in order to make a far more intelligent — and useful — artificial intelligence capable of operating in the real world.

This was the point I made towards the very end of my discussion with Glenn. I predicted that we’d see a major development — within a year — that utilizes this technology.

Which is why I wasn’t the least bit surprised when DeepMind announced its latest breakthrough…

Introducing DeepMind's Gemini

On December 6, 2023, DeepMind unveiled Gemini, its “largest and most capable AI model.”

For something as incredible as this, seeing is believing.

If you have just six minutes, I encourage you to watch this short clip to understand what this new AI is capable of doing. 

Doing so will help us all “get it,” so that we can understand the implications of this latest news.

Source: Google

Gemini has the ability to perform tasks that involve complex reasoning. 

It builds on large language model (LLM) technology, and then ingests new information from images, audio, and video inputs.

It has the ability to interpret image sequences like the one shown below:

Source: Google DeepMind

Gemini can also engage in games like ball and cup shuffling, which require it to watch an action and infer the right answers based on what it is seeing…

Source: Google DeepMind

These are just a couple of additional examples of how Gemini uses real world inputs to infer, reason, and even determine the correct course of action.

It’s phenomenal…

And yet the announcement from DeepMind was widely criticized.

Not a PR Firm

Technology journalists have denounced Gemini, claiming that the company intentionally “misled” everyone by editing the video linked above. 

The crux of the issue was that DeepMind edited the video into a smoother format by removing any latency from Gemini (latency is the delay in the response of Gemini as it processes a task).

And yet DeepMind was open about the purpose of the video and the edits.

One of its key executives publicly stated at the release of Gemini that:

“All the user prompts and outputs in the video are real, shortened for brevity. The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”

Removing the latency wasn’t done to misrepresent the technology. It was done to shorten the video.

DeepMind even went so far as to provide the following disclaimer, which we can all see on the YouTube post:

“For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.”

Many tech journalists also wrote that Gemini was basically nothing special, and that Google was just trying to catch up with OpenAI’s ChatGPT.

While it is true that Google has been behind in large language model development, it’s not true that Gemini is just a “me too” product. 

After reviewing all of the criticisms, I found the crux of “their” complaint well captured in the following quote:

"If it wants to inspire developers, it’s not through carefully edited sizzle reels that arguably misrepresent the AI’s capabilities. It's through letting journalists and developers actually experience the product."

And there it is…

The journalists weren’t coddled and given access to the technology. So the knives are out. What an apt representation of what is completely broken in the media today.

But who cares? DeepMind certainly doesn’t. While it’s a division of a publicly traded company, it’s one of the most advanced research and development laboratories for artificial intelligence on the planet. It’s not a public relations firm. 

DeepMind has solved grand challenges, like accurately predicting the 3D structures of nearly all catalogued proteins with AlphaFold and its successor, AlphaFold 2.

This information has been open sourced and made available to the entire world. It will lead to an untold number of breakthroughs in curing human disease.

Who cares if it ruffles a few journalists’ feathers along the way... 

And who cares about their trite, meaningless fluff? The story — the real story — is the technological advancements that DeepMind has continued to deliver at an incredible pace.

Just last week, we explored how DeepMind increased the number of known stable crystal materials by a factor of 10 with its latest research. For those who haven’t seen that issue yet, you can access it here.

Again, that materials research has been made available to the world. It will ultimately lead to breakthroughs in electronics, semiconductors, energy, and computing systems.

And this “gotcha” about there being latency in the system completely misses the point… 

The hard problem to solve is designing and training an advanced artificial intelligence. Latency can and will be improved through additional training and computational resources. That’s a much easier problem to solve.

Put simply, the latency will shrink quickly as DeepMind improves the technology.

So who cares what “they” have to say…

If we care about technological breakthroughs that will benefit the human race, then we want DeepMind focused on more breakthroughs with artificial intelligence, not worrying about pedantic detractors.

And that’s exactly what DeepMind is doing with Gemini.

DeepMind Has the Software, the Hardware is Next…

The most advanced form of Gemini — called Gemini Ultra — will be released early next year. And it has already been benchmarked against OpenAI’s GPT-4.

The short summary is that, according to Google’s published benchmarks, Gemini Ultra is materially better than GPT-4 in every single benchmark except one.

It’s better at reasoning, math, writing software code, and interpreting images, video, and audio.

But the most important and exciting improvement Gemini Ultra brings is that it is multimodal. 

This multimodal ability links back to the research that was published in February — the research I referenced at the beginning of this issue.

Gemini has the ability to take in sensory information from the real world and use it to make inferences and respond.

That may not sound like much, but it is a huge leap.

And the easiest way to visualize that leap is by imagining Gemini Ultra as an operating system for a humanoid robot that has the ability to listen and see its surroundings.

Once the software has been improved, the next leap will be to put the technology into hardware — specifically robots. And that’s when things get really exciting…

Imagine fully functional humanoid robots capable of performing human tasks in real-world settings like industrial facilities, healthcare, and even the home.

Perhaps DeepMind head Demis Hassabis summed it up best in a recent interview:

“We’ve got some interesting innovations we’re working on to bring to future versions of Gemini. You’ll see a lot of rapid advancements next year.”

Next year…

Not five or 10 years down the road.

We’d be smart to believe him. 

It’s coming… much sooner than we think.
