Microsoft has unveiled an AI system that it says outperforms human doctors at complex medical diagnoses. Paired with OpenAI's o3 model, the system correctly diagnosed more than 80% of difficult case studies from the New England Journal of Medicine. By comparison, practicing doctors working under test conditions without outside help got only about 20% right.
The system uses a "diagnostic orchestrator" that mimics a panel of expert physicians. It decides which questions to ask and which tests to order, such as blood work or X-rays, before reaching a diagnosis. This step-by-step approach mirrors real clinical reasoning, unlike the typical multiple-choice medical exams that mainly reward memorization.
Microsoft said the approach can span more medical specialties than any individual doctor, and that it also cut costs by ordering fewer unnecessary tests.
The company played down fears that the AI will replace doctors, stressing that it is meant to work alongside clinicians, who handle patient trust and complex judgment calls.
“Their clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do,” Microsoft wrote in a blog post.
Microsoft's AI team, led by Mustafa Suleyman, tested the system on more than 300 interactive cases. Besides OpenAI's models, the team also tried models from Meta, Anthropic, Google (Gemini) and Elon Musk's xAI (Grok).
Microsoft calls this progress a "path to medical superintelligence," suggesting AI that could outperform humans not just on individual tasks but across medicine as a whole. However, the company cautions that the system is not yet ready for clinical use and needs more testing on common, everyday conditions.
The research, which has not yet been peer-reviewed, now heads for peer review and marks a significant milestone for AI in healthcare.