Badly bloodied by OpenAI's GPT-4, Google has struck back with a new, more powerful large language model (LLM) to upgrade Bard and create a suite of new AI services – starting with a model targeted at doctors. It also teased its next-gen Gemini AI.
Launched at the company's I/O '23 conference, the PaLM 2 model leapfrogs its predecessor on pretty much every metric, according to a technical report, but Google chose to highlight three areas in which it believes the new model is particularly strong.
The first is multilingual capabilities. PaLM 2's training data included a greater proportion of non-English text, and the model can now pass a range of language-proficiency exams at a "mastery" level. It's now outperforming Google's own Translate engine, and displaying a nuanced understanding of languages, idioms, metaphors and the cultures behind them.
The second is "reasoning" – there's been a keen focus on maths and scientific papers in the training data, and Google says it's displaying "improved capabilities in logic, common sense reasoning, and mathematics." Maths in particular is an area where LLMs as a whole have struggled; it's just not their forte – and indeed, while PaLM 2 does beat GPT-4 on selected benchmarks, the gains here appear incremental rather than revolutionary.
The third is coding, an area of immense potential for these LLMs. Google claims PaLM 2 is super-capable with Python and JavaScript, but also very strong in a range of more specialized programming languages.
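If you're curious what tapping those coding abilities looks like in practice, here's a minimal sketch using the PaLM API's Python client (the `google.generativeai` package Google offers developers) – note that the prompt and parameters below are illustrative, and you'd need your own API key:

```python
# Minimal sketch: asking PaLM 2 to write code via Google's PaLM API.
# Assumes the google.generativeai package is installed and an API key is available.
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # placeholder – substitute your own key

completion = palm.generate_text(
    model="models/text-bison-001",  # a PaLM 2-based text model
    prompt="Write a Python function that checks whether a string is a palindrome.",
    temperature=0.2,                # low temperature for more deterministic code
    max_output_tokens=256,
)

print(completion.result)  # the model's generated code, returned as plain text
```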
PaLM 2 has already been rolled out as part of the company's embattled Bard search AI. It's also now coming to Workspace, including Gmail and Google Docs, in the form of a collaborative "Duet AI" that can generate images and words for your projects, help you brainstorm, organize spreadsheets, analyze and label data, and handle a bunch of other little things designed to get tasks over the finish line.
But perhaps more interestingly, Google is leaping into the arena of industry-specific AI models, starting out with one targeted specifically at doctors.
Med-PaLM 2 has had most of its alignment and human-feedback tuning done by groups of health researchers and medical professionals. As a result, it's the first AI to achieve an "expert" level on tests designed to mimic US medical licensing exams. It can answer all sorts of health-related questions and is well-read across a wide variety of medical literature.
Like GPT-4, PaLM 2 is also beginning to gain multimodal capabilities – the ability to understand images and other media in the same way it "understands" text. In the context of Med-PaLM 2, this means it'll soon be able to look at your X-rays and other medical scans, and report on them – an area in which AIs have excelled in early trials, sometimes outperforming medical specialists.
Google will open this tool up to a small group of users in the coming months, aiming to "identify safe, helpful use cases" that could see Med-PaLM 2 rolling out into doctors' offices. This is both an exciting and daunting prospect; it promises a leap forward in healthcare and could put incredible tools in the hands of medical professionals.
At the same time, it's hard to ignore the fact that many of humanity's best and brightest students have gone into the medical field. ChatGPT is already looking much, much better at answering medical questions than human doctors – as judged by healthcare professionals themselves, complete with superior bedside manner and empathy – and when these machines inevitably expand their abilities to outperform doctors across the board, it'll be yet another helping of humble pie for a species that considers itself pretty special.
Google also used the opportunity to announce a restructuring effort it hopes will "significantly accelerate" the development of next-generation AIs, merging the Google Research Brain team with DeepMind to form Google DeepMind.
And with this, the company revealed what sounds like an absolute beast of an AI in development: "we’re already at work on Gemini — our next model created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations, like memory and planning. Gemini is still in training, but it’s already exhibiting multimodal capabilities never before seen in prior models. Once fine-tuned and rigorously tested for safety, Gemini will be available at various sizes and capabilities, just like PaLM 2, to ensure it can be deployed across different products, applications, and devices for everyone’s benefit."
Training Gemini from day one on audio, video, images and other media as well as text – and building in the ability to use external tools and APIs – means this thing is designed to learn more like humans do than today's big LLMs, and its ability to interact with the outside world in a range of ways beyond a text window is baked in rather than tacked on. It could well prove as much of a leap forward as anything else we've seen in the last six months – a sobering thought in itself.
On paper, today's announcements appear to show solid progress from Google, bringing it close to where OpenAI has been for a few months now with GPT-4. The stock market certainly seemed satisfied, bumping Alphabet stock up more than 4% – but it'll be interesting to see how PaLM 2 performs in the harsh light of the real world over the coming weeks.
You can see the entire Google I/O keynote presentation in the video below.
Source: Google AI
The one truth I have learned in 25 years of being on the edge of technology in medical practice: the only people who should never be allowed to touch medical software are software engineers… they never get it right, for many reasons.
@Drjohnf, you seem to think that human doctors are infallible. Though these are statistically based models, they have been shown to be more accurate than the average doctor.
It would hopefully free doctors up from basic diagnoses of illness, giving them time to do research or to help deal with conditions that don't fit the most common statistical case.
I do think the risk is high though; current AI, because of its increasing similarity to humans, seems to make the same kinds of errors that people do. E.g. if you test ChatGPT on basic math (pre-Wolfram Alpha plugin), it gets sloppy when doing more than 7-digit addition. That said, it's easier to fix one AI and make it more rigorous in the future than it is to fix every single future med student, and it takes a lot less time to train an AI.
As far as being cost-effective, I think AI has a pretty steep initial cost, but scales very easily. So 1,000 doctors is probably a lot more cost-effective than one doctor AI, but one doctor AI might be cheaper than 100,000 doctors. And if it's cheaper, that theoretically frees up resources to develop more cures for more diseases, or more treatments for more patients.