
Microsoft said it is one step closer to “medical superintelligence” after a new artificial intelligence (AI) tool beat doctors at diagnosing complex medical problems.
Tech giants are racing to develop superintelligence, which refers to an AI system that exceeds human intellectual abilities in every way – and they’re promising to use it to upend healthcare systems around the world.
For the latest experiment, Microsoft tested an AI diagnostic system against 21 experienced physicians, using real-world case studies from 304 patients that were published in the New England Journal of Medicine, a leading medical journal.
The AI tool correctly diagnosed up to 85.5 per cent of cases – roughly four times the accuracy of the group of doctors from the United Kingdom and the United States, who had between five and 20 years of experience.
The model was also cheaper than human doctors, ordering fewer scans and tests to reach the correct diagnosis, the analysis found.
Microsoft said the findings indicate that AI models can reason through complex diagnostic problems that stump physicians, who specialise in their fields but are not experts in every aspect of medicine.
AI, by contrast, “can blend both breadth and depth of expertise, demonstrating clinical reasoning capabilities that, across many aspects of clinical reasoning, exceed those of any individual physician,” Microsoft executives said in a press release.
“This kind of reasoning has the potential to reshape healthcare.”
Microsoft does not see AI replacing doctors anytime soon, saying the tools will instead help physicians automate some routine tasks, personalise patients’ treatment, and speed up diagnoses.
How the model works
Microsoft’s AI system made diagnoses by mimicking a doctor’s process of collecting a patient’s details, ordering tests, and eventually narrowing down a medical diagnosis.
A “gatekeeper agent” held the information from each patient case study. It interacted with a “diagnostic orchestrator” that asked questions and ordered tests, with the gatekeeper returning results drawn from the patient’s real-world workup.
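To make the setup concrete, the sketch below shows, in Python, how a gatekeeper/orchestrator loop of this kind could be wired together. Everything in it is an illustrative assumption rather than Microsoft’s actual implementation: the class names, the toy patient case, and the orchestrator’s scripted decisions, which in the real system would come from a language model choosing what to ask and which tests to order.

# A minimal sketch of the gatekeeper/orchestrator loop described above.
# All names and the toy case data are hypothetical illustrations,
# not Microsoft's published code.

from dataclasses import dataclass, field

@dataclass
class Gatekeeper:
    """Holds the full case file and reveals only what is asked,
    mirroring how information surfaces in a real workup."""
    case: dict  # e.g. {"history": ..., "tests": {...}, "diagnosis": ...}

    def answer(self, question: str) -> str:
        # Answers a question from the case record; a real gatekeeper
        # would respond to arbitrary free-text queries.
        return self.case.get(question, "No information available.")

    def run_test(self, test: str) -> str:
        # Returns the recorded result of an ordered test.
        return self.case["tests"].get(test, "Test not available.")

@dataclass
class Orchestrator:
    """Asks questions, orders tests, and narrows toward a diagnosis.
    Here a fixed script stands in for the model's reasoning."""
    actions: list = field(default_factory=lambda: [
        ("ask", "history"),
        ("test", "chest_xray"),
        ("diagnose", "pneumonia"),
    ])

    def run(self, gatekeeper: Gatekeeper) -> str:
        for kind, payload in self.actions:
            if kind == "ask":
                print("Q:", payload, "->", gatekeeper.answer(payload))
            elif kind == "test":
                print("Test:", payload, "->", gatekeeper.run_test(payload))
            else:
                return payload  # commit to a final diagnosis
        return "undiagnosed"

if __name__ == "__main__":
    case = {
        "history": "Fever and productive cough for three days.",
        "tests": {"chest_xray": "Right lower lobe consolidation."},
        "diagnosis": "pneumonia",
    }
    final = Orchestrator().run(Gatekeeper(case))
    print("Final diagnosis:", final, "| correct:", final == case["diagnosis"])

In the published setup, each test the orchestrator orders also carries a price tag, which is how the researchers could compare the models’ diagnostic spending with the doctors’.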
The company tested the system with leading AI models, including GPT, Llama, Claude, Gemini, Grok, and DeepSeek.
OpenAI’s o3 model, which is integrated into ChatGPT, correctly solved 85.5 per cent of the patient cases, compared to an average of 20 per cent among the group of 21 experienced doctors.
Limitations and next steps
The researchers published their findings online as a preprint, meaning the work has not yet been peer-reviewed.
Microsoft also acknowledged some key limitations, notably that the AI tool has only been tested for complicated health problems, not more common, everyday issues.
The panel of doctors also worked without access to their colleagues, textbooks, or other tools that they might typically use when making diagnoses.
“This was done to enable a fair comparison to raw human performance,” Microsoft said.
The company called for more real-world evidence on AI’s potential in health clinics, and said it will “rigorously test and validate these approaches” before making them more widely available.