Google Bard Surpasses GPT-4 on Arena Leaderboard: A Major Upgrade
Google Bard has made a significant leap on the Arena Leaderboard, surpassing GPT-4 to claim the second spot. The Arena Leaderboard, maintained by the LMSYS organization, is an open platform for LLM evaluation that has collected over 200,000 human preference votes and ranks LLMs using the Elo rating system.
Excitement Surrounding Google Bard’s Achievement
The LMSYS organization tweeted about Google Bard’s achievement, and Google retweeted it, expressing excitement about Bard’s usage. The result came as a surprise: Gemini Pro previously held the eighth position on the leaderboard, and the Bard version powered by it now holds the second.
Understanding the Chatbot Arena Leaderboard
The Chatbot Arena Leaderboard is a crowd-sourced platform that benchmarks LLMs in real-world scenarios. Unlike the Hugging Face Open LLM Leaderboard, which tracks LLM performance on existing benchmarks, the Chatbot Arena presents users with two models randomly selected from a pool of 20+ proprietary and open-source large language models.
Users are presented with “Model A” and “Model B” without knowing which specific model is behind each label. Once both models have generated responses, users choose the response they prefer, for example selecting “B” if they find it better. The leaderboard then reveals the model behind the chosen response, such as “Bard Jan 24 (Gemini Pro)” or “GPT-4 Preview.”
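The blind comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Chatbot Arena's actual code: the function names and the model identifiers in `pool` are hypothetical.

```python
import random


def sample_battle(models, rng):
    """Pick two distinct models at random and hide them behind anonymous labels."""
    model_a, model_b = rng.sample(models, 2)
    return {"Model A": model_a, "Model B": model_b}


def reveal(battle, vote):
    """Return the real model name behind the label the user voted for."""
    return battle[vote]


# Hypothetical identifiers, used only for illustration.
pool = ["bard-jan-24-gemini-pro", "gpt-4-turbo", "gemini-pro", "gemini-pro-dev"]

battle = sample_battle(pool, random.Random(0))
winner = reveal(battle, "Model B")  # the user preferred response "B"
```

The user only ever sees the keys “Model A” and “Model B” while voting; the mapping to real model names is consulted after the vote is cast.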
The leaderboard tracks an Elo rating for each model based on user votes. Currently, Bard with Gemini Pro holds the second position with an Elo rating of 1215. However, since this ranking is based on human preferences, it may vary depending on the types of questions and queries asked.
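To make the rating mechanics concrete, here is a minimal sketch of the textbook Elo update applied to pairwise votes. The starting rating of 1000 and the K-factor of 4 are assumptions chosen for illustration, and the leaderboard's actual computation may differ in its details.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=4.0):
    """Update both ratings after one vote.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


def rate(battles, initial=1000.0, k=4.0):
    """Replay (model_a, model_b, winner) votes in order; winner=None means a tie."""
    ratings = {}
    for model_a, model_b, winner in battles:
        r_a = ratings.get(model_a, initial)
        r_b = ratings.get(model_b, initial)
        if winner == model_a:
            score_a = 1.0
        elif winner == model_b:
            score_a = 0.0
        else:
            score_a = 0.5
        ratings[model_a], ratings[model_b] = update_elo(r_a, r_b, score_a, k)
    return ratings


# Hypothetical votes, used only for illustration.
votes = [
    ("bard", "gpt-4", "bard"),
    ("gpt-4-turbo", "bard", "gpt-4-turbo"),
    ("bard", "gpt-4", None),
]
ratings = rate(votes)
```

Because each vote moves the winner up and the loser down by the same amount, the ratings depend on which prompts users happen to ask, which is why a ranking built this way can shift with the mix of queries.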
Distinguishing Different Versions of Gemini Pro
There has been confusion regarding the different versions of Gemini Pro available on the leaderboard. The LMSYS group provides explanations for three different versions:
- Gemini Pro: The Vertex AI API on Google Cloud
- Gemini Pro Dev: The Developer API on Google AI Studio (accessible for free with limitations)
- Bard January 24th Gemini Pro: The latest version, potentially not available to all users
The Bard version accessible on the Google website reflects the last update on December 18th, 2023. The version used in the Chatbot Arena Leaderboard, however, is based on the January 2024 release, which makes it the latest one.
Gemini Pro’s Access to the Internet
The success of Gemini Pro in the latest update may be partially attributed to its access to the internet via the API. Unlike GPT-4 Turbo and other GPT-4 variants, which rely solely on training data, the new version of Bard with Gemini Pro has access to the internet. This was confirmed by asking both GPT-4 and Bard about the winner of the Republican primary in Iowa.
While GPT-4 stated that Donald Trump won the 2020 Republican primary in Iowa, Bard provided more detailed information about the 2024 Republican caucuses held on January 14th. Bard listed the candidates and their respective vote percentages, although there was a minor discrepancy in the results.
It is worth noting that other models on the Chatbot Arena Leaderboard, such as those from Perplexity AI, also have access to the internet. However, the improved performance of Bard with Gemini Pro is a significant advancement.
Verification and Comparison on the Chatbot Arena
The Chatbot Arena offers the ability to run two models side by side to compare their responses. By selecting the Bard January 24th edition and GPT-4 Turbo, the current leaderboard leader, users can test them on different prompts.
For example, when asked a riddle about three killers in a room, GPT-4 Turbo provided two possible scenarios, while Bard with Gemini Pro emphasized responsible language use and the exploration of human behavior. This showcases Bard’s distinct personality compared to GPT-4 Turbo.
It is important to note that while Bard with Gemini Pro has shown impressive performance, users should independently verify and cross-reference its responses to ensure accuracy.
The Future of Google Bard
The latest version of Bard powered by Gemini Pro is currently limited to the Chatbot Arena and is not yet available as part of the Bard interface or the Gemini Pro API accessible through Google AI Studio. However, this development indicates that Google is catching up, and future updates, such as the potential release of Gemini Ultra, may further enhance Bard’s capabilities.
In conclusion, Google Bard’s major upgrade to surpass GPT-4 on the Arena Leaderboard is a significant milestone. The Chatbot Arena provides a unique platform for benchmarking LLMs and comparing their performance. While Bard’s improved performance is impressive, it is essential to exercise caution and independently verify its responses. The future holds exciting possibilities for Google Bard and its continued evolution.