Microsoft MDASH tops CyberGym, uncovers 16 Windows bugs
SkimNews Take
MDASH's result suggests that systems built from many specialized agents, rather than a single monolithic model, may be the more effective approach for complex tasks like cybersecurity.
- Microsoft introduced MDASH, a multi‑model AI system that uses over 100 specialized agents across frontier and custom models to scan code, debate findings, and construct proof‑of‑concept attacks.
- Microsoft’s MDASH scored 88.45% on the UC Berkeley CyberGym benchmark, topping the leaderboard ahead of Anthropic’s Mythos (83.1%) and OpenAI’s GPT-5.5 (81.8%).
- Microsoft disclosed 16 new Windows vulnerabilities found by MDASH, including four critical remote‑code‑execution flaws that were patched in the latest Patch Tuesday release.
- Anthropic’s Mythos, a single‑model system inside an agent framework, placed second on the CyberGym benchmark with an 83.1% score, as reported by the company.
- OpenAI’s GPT-5.5 achieved an 81.8% score on the CyberGym benchmark, also a self‑reported figure.
- The CyberGym benchmark scores are self‑reported and have not been independently verified, meaning the results may not fully reflect real‑world performance.
- Microsoft plans to use MDASH internally and roll it out in a limited private preview for customers, warning that AI will make future Patch Tuesdays “bigger” as vulnerability discovery accelerates.
Why it matters: MDASH gives Microsoft’s security teams and early‑adopter customers a faster way to spot exploitable bugs, which will push patch cycles to accelerate. The same multi‑agent approach could also be weaponized by attackers. Because the CyberGym scores are self‑reported and lack independent verification, AI‑driven security metrics should be read with caution.



