New AI University · Jobs Simplified

AI Models Fail Finance Test Due to Lack of Public Answers

Summary

  • Bridgewater and Thinking Machines Lab built a finance-focused AI model by fine-tuning Qwen3-235B.
  • They tested it against two well-known models: OpenAI's GPT and Anthropic's Claude.
  • GPT and Claude fell short because the test's correct answers were not publicly available, so the models could not have picked them up from public data.
  • The new model achieved 84.7% accuracy, beating GPT and Claude on the finance tasks.
  • The new model cost significantly less to create than its competitors.
  • The results of the test have not been verified by outside experts.

Why It Matters

  • This suggests that even top AI models can struggle with complex tasks when they lack access to the right information.
  • It's a reminder that AI's performance depends on the data it's trained on and the environment it operates in.
  • As AI is increasingly used in finance and other critical areas, ensuring access to accurate and publicly available information is crucial.

GenAI EXPLAINED

Qwen3-235B model: A large, general-purpose language model from Alibaba's Qwen family; "235B" refers to its roughly 235 billion parameters. The version in this story was fine-tuned for finance, which turns it into something like a financial calculator: a tool built for one particular job, where it can be more efficient and accurate.

Accuracy: Imagine answering a series of math problems. Your accuracy is the share of problems you get exactly right. The fine-tuned Qwen3-235B model achieved 84.7% accuracy, meaning it answered 84.7% of the test questions correctly.

Fine-tuning: Picture a pianist who can already play many songs and then practices one particular piece until it sounds perfect. Fine-tuning works the same way: a model that already handles many tasks is trained further on examples of one specific task so that it performs better at it.
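The accuracy figure comes down to simple arithmetic: the number of correct answers divided by the total number of questions. Below is a minimal Python sketch of that calculation. The function name accuracy is hypothetical, the sketch assumes every answer is simply right or wrong, and the toy answers are made up for illustration; they are not data from the Bridgewater test.

```python
def accuracy(predictions, answers):
    """Return the fraction of predictions that exactly match the answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must be the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty test")
    # Count positions where the prediction equals the correct answer.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy test with 5 made-up questions (not the article's benchmark).
answers = ["up", "down", "up", "up", "down"]
predictions = ["up", "down", "down", "up", "down"]
print(f"{accuracy(predictions, answers):.1%}")  # 80.0%
```

By the same arithmetic, a hypothetical test of 1,000 questions with 847 answered correctly would give 847 / 1,000 = 84.7% accuracy.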
