Claude 3 Dethrones GPT-4 to Mark Phase Two in LLM Competition

Anthropic, one of OpenAI’s top competitors, introduced its latest Large Language Model (LLM), Claude 3, in early March. The AI community was taken by surprise as Claude 3’s capabilities proved superior to those of OpenAI’s flagship GPT-4, marking the first instance of GPT-4 being outperformed. Meanwhile, Google’s Gemini Ultra trailed behind both.

The launch of Claude 3 ushered in what appears to be the second phase of LLM competition, where companies prioritize in-context understanding, robustness and reasoning over mere scale. The generative AI sector has recently been accelerating rapidly on the back of contributions from key players including OpenAI, Anthropic, Google, Meta and Mistral AI.

The first phase of the LLM competition was set in motion following the debut of OpenAI’s ChatGPT in late 2022. This phase was characterized by a race to scale, with companies vying to develop increasingly powerful models primarily focused on size and computational capabilities.

OpenAI’s GPT-4 once epitomized the zenith of these efforts, setting benchmarks for what generative AI could achieve in terms of understanding and generating human-like text. Many subsequent LLMs, including Google’s Gemini series, Anthropic’s Claude 2, Meta’s Llama series and Mistral AI’s Mistral Large, continued to challenge the dominance of GPT-4, yet failed.

However, the ascendancy of Anthropic’s Claude 3 signals a shift to a new era in which the battlefield has become multipolar.

Phase Two Begins

We believe Claude 3 surpassing GPT-4 marks the start of the second stage of the LLM competition:

  • The Claude 3 family comprises three models, listed in order of increasing capability: Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus. Claude 3 Opus outperforms GPT-4 on all the key performance benchmarks reported in Anthropic’s announcement.
A chart comparing Claude with GPT and Gemini across various parameters
Source: Anthropic website: Claude 3 announcement
  • Claude 3 shows an unprecedented level of understanding of advanced science. For example, Kevin Fischer, a theoretical quantum physicist, was astounded by Claude 3’s grasp of his doctoral thesis.

A snip of Kevin Fischer's Tweet on Claude

A screen snip of Guillaume Verdon's Tweet on Claude

  • Claude 3 not only comprehends complex scientific principles but also exhibits a degree of emergent capability. For example, another expert in quantum computing was taken aback when Claude 3 reinvented his algorithm with just two prompts, without seeing his yet-to-be-published paper.

A screen snip of Guillaume Verdon's Tweet on Claude

  • Claude 3’s degree of “meta-awareness” (which may be nothing more than superb pattern matching against human-created data) let it recognize that it was being tested during a needle-in-the-haystack evaluation. This testing method, like finding a needle in a haystack, is designed to ascertain whether an LLM can accurately pinpoint a key fact buried within hundreds of thousands of words. Originally devised by Greg Kamradt, a member of the open-source community, the approach quickly gained traction among major AI companies. Google, Mistral AI and Anthropic now commonly showcase their new models’ performance through these tests.

A screen snip of Alex Albert's Tweet on Claude

Claude 3 Opus Recall accuracy over 200K

Claude 3 Opus Observations
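
The needle-in-the-haystack evaluation described above can be illustrated with a minimal Python sketch. It is not Greg Kamradt’s or Anthropic’s actual harness: the filler text, the “needle” sentence, the helper names and the toy_model function (a stand-in for a real LLM call) are all assumptions made for illustration.

```python
import re
from typing import Callable

# Signature assumed for illustration: (context, question) -> answer text.
AskFn = Callable[[str, str], str]


def build_haystack(filler: list[str], needle: str, depth: float) -> str:
    """Place the needle at a relative depth (0.0 = start, 1.0 = end)."""
    position = int(depth * len(filler))
    return " ".join(filler[:position] + [needle] + filler[position:])


def run_needle_test(
    ask_model: AskFn,
    filler: list[str],
    needle: str,
    question: str,
    expected: str,
    depths: list[float],
) -> dict[float, bool]:
    """Record, for each depth, whether the model's answer contains the fact."""
    results = {}
    for depth in depths:
        context = build_haystack(filler, needle, depth)
        answer = ask_model(context, question)
        results[depth] = expected.lower() in answer.lower()
    return results


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for an LLM: returns the sentence about a passcode."""
    match = re.search(r"[^.]*passcode[^.]*\.", context)
    return match.group(0).strip() if match else "I don't know."


filler = ["The sky was grey that morning."] * 200
needle = "The secret passcode is 7391."
results = run_needle_test(
    toy_model, filler, needle,
    question="What is the secret passcode?",
    expected="7391",
    depths=[0.0, 0.5, 1.0],
)
print(results)
```

In a real evaluation, toy_model would be replaced by a call to the model under test, and the haystack would be scaled up to the context length being measured.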

What does it mean to be in Stage Two of the LLM competition?

Linear vs Accelerated Progress

We have observed that innovation in the LLM battlefield is currently accelerating. Even though it is only March, a host of contenders, such as Google’s Gemini Ultra and Mistral AI’s Mistral Large, have already attempted to take the throne from OpenAI’s GPT-4. However, it was Anthropic’s Claude 3 Opus that emerged on top, marking a pivotal breakthrough in the ongoing quest for supremacy.

Open vs Closed

Rivalry among closed-source LLMs has escalated, making proprietary generative AI technology a pivotal tactic for building a company’s defensive “moat”.

For instance, Mistral AI initially captured attention with its impressive open-source Mixture of Experts (MoE) lean models but has now pivoted to spotlight its proprietary Mistral Large model.

Advice for Developers

In the ever-changing LLM landscape, developers should recognize that assessments built around their specific use case, which truly gauge a model’s strengths and weaknesses, matter more than blind trust in general benchmarks:

  • Stay agile, ready to integrate newer models or versions as they become available. Today’s choice might need reassessment tomorrow.
  • The importance of understanding each model’s unique strengths, and of continuous exploration and adaptation to the specific needs of your applications, cannot be overemphasized.
  • Just as armor is chosen to suit the battle, prompts should be adapted to suit the model in order to maximize its potential. Comprehensive tutorials are readily available online to guide you.
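
The advice above, assessing models against your own use case rather than relying on general benchmarks, can be sketched as a small comparison harness. This is only an illustration under stated assumptions: model_a and model_b are hypothetical stubs standing in for real API calls, and substring matching is a deliberately simple scoring rule that a real project would likely replace.

```python
from typing import Callable

# Assumed signature for illustration: prompt -> response text.
ModelFn = Callable[[str], str]


def evaluate(
    models: dict[str, ModelFn],
    cases: list[tuple[str, str]],
) -> dict[str, float]:
    """Return each model's pass rate over (prompt, expected substring) cases."""
    scores = {}
    for name, model in models.items():
        passed = sum(
            1 for prompt, expected in cases
            if expected.lower() in model(prompt).lower()
        )
        scores[name] = passed / len(cases)
    return scores


# Hypothetical stubs; swap in real model calls for an actual assessment.
def model_a(prompt: str) -> str:
    return "Paris" if "France" in prompt else "unsure"


def model_b(prompt: str) -> str:
    answers = {"Capital of France?": "Paris", "Capital of Japan?": "Tokyo"}
    return answers.get(prompt, "unsure")


# Test cases should come from your own application, not a public benchmark.
cases = [("Capital of France?", "Paris"), ("Capital of Japan?", "Tokyo")]
scores = evaluate({"model_a": model_a, "model_b": model_b}, cases)
best = max(scores, key=scores.get)
print(scores, best)
```

Because models are passed in as plain callables, a newly released model or a revised prompt can be added to the comparison and rerun against the same cases without changing the harness.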

Wei is a senior consultant at Counterpoint specializing in Artificial Intelligence. She is also the China founder of Humanity+, an international non-profit organization that advocates the ethical use of emerging technologies. She formerly served as a product manager for Embedded Industrial PCs at Advantech. Before that, she was an MBA consultant to Nuance Communications, where her team successfully developed and launched Nuance’s first B2C voice recognition app on iPhone (which later became Siri). Wei’s early years in the industry were spent at IDC’s Massachusetts headquarters and The World Bank’s DC headquarters.
