A crowdsourced “final exam” for AI promises to test LLMs like never before. Here’s how the idea, and its implementation, dooms us to fail.
There are all sorts of situations we face in life where we need to consult someone with expert, specialized knowledge in an area where our own expertise is insufficient. If there’s a water leak in your house, you may need to consult a plumber. If there’s pain and immobility in your shoulder, you may need to consult an orthopedist. And if there’s oil leaking from your car, you may need to consult an auto mechanic. Sure, you can attempt to solve these problems on your own — consulting articles on the internet, reading books, relying on trial-and-error, etc. — but no matter how much expert-level knowledge you yourself possess, there will always be a place where your expertise ends. If you want to know more, you’ll have no choice but either to seek out a source who knows what you don’t, or to figure it out for yourself from scratch.
At least, those are the only avenues that were available until recently: consult an expert, figure out the answer on your own, or go and ignorantly mess with the problem at hand while hoping for the best. However, the rise of AI, particularly in the form of large language models…