OpenAI's New "Strawberry" AI Is Still Making Idiotic Mistakes

Victor Tangermann

Fri, September 13, 2024 at 5:15 PM UTC



On Thursday, OpenAI released its long-awaited AI model that had been hyped up under the code name "Strawberry."

The Sam Altman-led company made some big promises in its announcement, claiming that its "o1-preview" AI model "performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology."

With its new "human-like" ability to "reason," the AI model can tackle even more "complex tasks" and "harder problems," according to the company.

But as early testers have already discovered firsthand, it's still miles away from replacing a human scientist or coder.

In fact, if recent posts making the rounds on social media are anything to go by, o1-preview still often struggles with the absolute basics.

For instance, as INSA Rennes researcher Mathieu Acher found, it's still repeatedly suggesting illegal chess moves in response to certain puzzles.

Tasks as basic as counting also remain elusive. In one example flagged by Meta AI scientist Colin Fraser, Strawberry attempts to take on a rudimentary word puzzle about a farmer transporting sheep across a river — and accidentally abandons the correct answer in favor of illogical garble at the end.

Even entering a prompt OpenAI used in its demo, a logic puzzle fittingly involving a strawberry, gave users varying answers.

"o1-preview gives the wrong answer to this prompt 75 percent of the time," one user found.

In fact, some users claim the model still sometimes struggles with one of the most confounding word problems for AI language models: how many times the letter "R" appears in the word "strawberry."

In all fairness, OpenAI was clear right from the start that its latest AI is still a work in progress.

"As an early model, it doesn't yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images," the company wrote in its announcement. "For many common cases GPT-4o will be more capable in the near term."

Thanks to a new "chain of thought" process, o1-preview differs significantly from predecessors like GPT-4o, which powers the company's popular ChatGPT chatbot. Instead of spitting out the first answer it can produce, it takes its time to build out iterative answers before arriving at a conclusion.

That can extend its response time significantly. As one user found, the new AI model took 92 seconds to come up with an answer to a word riddle — before bungling the answer.

OpenAI research scientist Noam Brown, who worked on the new model, argued that having it take its time could result in some groundbreaking answers.

"OpenAI's o1 thinks for seconds, but we aim for future versions to think for hours, days, even weeks," he tweeted. "Inference costs will be higher, but what cost would you pay for a new cancer drug? For breakthrough batteries? For a proof of the Riemann Hypothesis?"

Those lofty conclusions didn't sit well with noted AI critic Gary Marcus.

"I really like a lot of your work, but the tweet below rubs me the wrong way," he wrote in response, "because it invites the inference that running versions of o1 for weeks or months might create a new cancer drug (in reality, at best you just get new candidates, but still need to do the clinical work), or create breakthrough batteries (again you aren’t going to shortcut the lab work) or prove the Riemann Hypothesis."

"This is not realistic," he added. "As you acknowledge o1 is still unreliable even at tic-tac-toe, and in some cases no better than earlier models. Longer processing times are unlikely to reach transcendent reasoning."

(To be fair, Brown also conceded that the new model is still flubbing certain answers, including ones as fundamental as tic-tac-toe.)

Marcus is tapping into a heated debate surrounding the tremendous hype gripping the AI industry.

As companies continue to lock down billions of dollars in funding (OpenAI is looking to raise a whopping $6.5 billion from investors, boosting its already sky-high valuation to $150 billion), skeptics and tech investors alike are growing uneasy about the amount of money being poured into the tech, never mind its environmental impact.

In short, the company's latest AI still falling for the same old traps isn't exactly confidence-inducing.

OpenAI promised that it's only the beginning, though, symbolically naming its model to reset the "counter back to 1" — which, given it's stumbling right out of the gate, might end up being an appropriate name after all.

