AI’s Ability to Code Explored: Study Identifies Challenges in Autonomous Software Engineering | MIT News

MIT CSAIL and partners drop a new AI-for-software-engineering roadmap aimed at tackling the hard parts beyond simple code completion.

The paper flags big problems still blocking AI from fully automating software grunt work: refactoring, legacy migration, concurrency bug hunting, testing, and scale. It calls out flashy demos as misleading, saying real software isn’t just “undergrad programming exercises” or small code snippets.

Evaluation’s a mess. Current benchmarks only cover tiny fixes or single functions, missing the massive scope and complexity of real work like codebase-wide refactors or performance tuning in major engines. Without better ways to measure progress on these tasks, AI can’t improve the parts that matter most.

Human-machine interaction also hits limits. AI spits out large, unstructured code dumps with weak tests. Developers get little control over the process and no visibility into whether the AI is confident or bluffing. The system can't effectively use debugging or analysis tools the way humans do, and it doesn't know when to ask for help.

Scaling is another headache. Large company codebases differ wildly from public training data, causing AI to hallucinate or break internal rules. Retrieval methods pick similar-looking code, not functionally similar code, worsening errors.
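
To make that concrete, here's a toy sketch (ours, not the paper's) of why surface-level retrieval goes wrong: a simple token-overlap score, standing in for an embedding-based retriever, ranks a lookalike snippet that computes the wrong thing above a differently written but functionally equivalent one. The snippets and scoring function below are hypothetical.

```python
# Illustration only: surface similarity can prefer a lookalike over
# a functional match. Token overlap is a stand-in for a real retriever.

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

query = "def total_price(items): return sum(i.price for i in items)"

# Functionally equivalent, but written differently (explicit loop).
candidate_equivalent = """
def total_price(items):
    acc = 0
    for i in items:
        acc += i.price
    return acc
"""

# Looks almost identical, but computes something else (tax, not price).
candidate_lookalike = "def total_tax(items): return sum(i.tax for i in items)"

for name, cand in [("equivalent", candidate_equivalent),
                   ("lookalike", candidate_lookalike)]:
    print(name, round(token_overlap(query, cand), 2))
# The lookalike scores higher on surface similarity despite doing the wrong thing.
```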

The team wants community-wide pushes to gather richer dev data, build realistic evaluation suites, and create transparent tools. AI should expose uncertainties and invite human input, not just serve as a black-box autocomplete.
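
One way to picture that (a hypothetical sketch, not the paper's proposal): suggestions that carry the model's self-reported confidence and the evidence behind it, and get routed to a human reviewer when either is lacking. All names and thresholds below are made up for illustration.

```python
# Hypothetical design sketch: a patch suggestion that exposes its own
# uncertainty and escalates to a developer below a confidence threshold.

from dataclasses import dataclass

@dataclass
class PatchSuggestion:
    diff: str
    confidence: float       # model's self-reported confidence, 0.0-1.0
    evidence: list[str]     # e.g. tests run, analysis tools consulted

def route(suggestion: PatchSuggestion, threshold: float = 0.8) -> str:
    """Apply well-supported, confident patches; otherwise ask a human."""
    if suggestion.confidence >= threshold and suggestion.evidence:
        return "auto-apply"
    return "request human review"

print(route(PatchSuggestion(diff="...", confidence=0.95, evidence=["unit tests pass"])))
print(route(PatchSuggestion(diff="...", confidence=0.55, evidence=[])))
# -> auto-apply
# -> request human review
```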

Alex Gu, lead author and MIT grad student, framed the urgent goal:

“Why does any of this matter? Software already underpins finance, transportation, health care, and the minutiae of daily life, and the human effort required to build and maintain it safely is becoming a bottleneck. An AI that can shoulder the grunt work — and do so without introducing hidden failures — would free developers to focus on creativity, strategy, and ethics.”

Armando Solar-Lezama, CSAIL PI and paper senior author, added:

“Everyone is talking about how we don’t need programmers anymore, and there’s all this automation now available… But there’s also a long way to go toward really getting the full promise of automation that we would expect.”

They’re presenting their work at ICML. The hope: bit by bit, improved models and tools will move AI from autocomplete sidekick to genuine engineering partner — amplifying human coders, not replacing them.

Read the full paper on arXiv.
