The Laude Institute has named the first winner of the K Prize, an AI coding challenge that tests models on fresh GitHub issues. The winner, Brazilian prompt engineer Eduardo Rocha de Andrade, answered just 7.5% of the problems correctly and took home $50,000 for it.
The challenge, launched by Databricks and Perplexity co-founder Andy Konwinski, is designed to be tougher than existing benchmarks. It runs offline with limited compute, favoring smaller, open models over the massive frontier models from big labs.
Konwinski explained why that difficulty is the point:
“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”
The K Prize uses only GitHub issues filed after the submission deadline, avoiding the training contamination seen in other benchmarks like SWE-Bench. That helps explain the gap: the top score here is a stark 7.5%, while top scores on SWE-Bench's easier test reach 75%.
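To make that filtering step concrete, here is a minimal sketch of how one could pull only issues opened after a cutoff date using GitHub's public search API. The repository name and deadline below are hypothetical placeholders, and this is an illustration of the general idea, not the K Prize's actual pipeline.

```python
# Illustrative sketch: select only GitHub issues created after a cutoff date,
# so none of them could have appeared in a model's training data.
# REPO and CUTOFF are hypothetical; this is not the K Prize's real tooling.
import requests

CUTOFF = "2024-03-12"          # hypothetical submission deadline
REPO = "example-org/example"   # hypothetical target repository

def fresh_issues(repo: str, cutoff: str) -> list[dict]:
    """Return issues created strictly after the cutoff date via the GitHub search API."""
    url = "https://api.github.com/search/issues"
    params = {"q": f"repo:{repo} type:issue created:>{cutoff}", "per_page": 100}
    resp = requests.get(url, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["items"]

if __name__ == "__main__":
    for issue in fresh_issues(REPO, CUTOFF):
        print(issue["number"], issue["title"])
```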
Konwinski has also pledged $1 million to the first open-source model that scores above 90% on the K Prize.
He expects the picture to sharpen over time: "As we get more runs of the thing, we'll have a better sense," he told TechCrunch, "because we expect people to adapt to the dynamics of competing on this every few months."
The new benchmark raises questions about how capable AI coding tools really are. Princeton researcher Sayash Kapoor backs the approach, warning that existing scores may be inflated by contamination or by labs targeting the leaderboard.
Konwinski sums it up bluntly:
“If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”
The K Prize is a tough, contamination-resistant litmus test for AI coding skills, and it has just set the bar a lot higher.