⚡ TLDRs on Google’s Gemma, spoiler: don’t use Gemma 7B (yet?)
👞 Non-technical TLDR
- Google "open-weighted" two models, Gemma-2B and Gemma-7B, which are direct competitors of Microsoft's Phi-2 and Mistral AI's Mistral-7B respectively
- They report that their 7B model is better than Mistral-7B (the current leader in this "size class") on most benchmarks
- The community (and I) see the contrary so far, but it wouldn't be the first time that implementation errors hinder performance; it has only been 4 days, so let's give it some time
- Google created a Gemma tuning competition on Kaggle, awarding $10k per winner: they are pushing adoption
- If Gemma is indeed worse, it could just be seen as a move by Google to get a seat at the "LLM open-sourcers" table without contributing significant value
🔬 Technical TLDR
- Google released weights for Gemma-2B & Gemma-7B, plus a technical report and a standalone C++ inference engine implementation (gemma.cpp)
- The 2B model was trained on 2T tokens, the 7B on 6T. 6T is a lot, but rumours say Mistral-7B was trained on ~8T
- Both models are already available in llama.cpp and Ollama (quick local-run sketch after this list)
- The vocabulary is 8x larger than Llama 2's (256k vs 32k tokens; see the check after this list)
- Google observes better performance from Gemma-7B than from Mistral-7B; the community (me included) observes the contrary so far
- I tested it using llama.cpp; it wouldn't be the first time that implementation errors degrade performance, so it's maybe too early to judge, let's see
- Google introduced yet another new chat template (check the attached screenshot, and the template sketch after this list)
- They kinda lie about the number of parameters: Mistral-7B has 7.2B parameters, while Gemma-7B has 8.5B (parameter-count sketch after this list)
- You can try Gemma on Hugging Face Chat & Perplexity: link in the comments
- Waiting for Chatbot Arena benchmarks
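
For the llama.cpp point above, a minimal local-run sketch using the llama-cpp-python bindings (one way among others to drive llama.cpp from Python); the GGUF file name is a placeholder for whatever quant you converted or downloaded, not an official artifact:

```python
# Minimal local test of Gemma through the llama-cpp-python bindings.
# The GGUF path is a placeholder: point it at whatever Gemma quant
# you converted with llama.cpp or downloaded yourself.
from llama_cpp import Llama

llm = Llama(model_path="./gemma-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm("The capital of France is", max_tokens=16, temperature=0.0)
print(out["choices"][0]["text"])
```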
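To see the vocabulary gap yourself, a quick sketch with transformers (both repos are gated on Hugging Face, so you need to accept the licenses and be logged in):

```python
# Compare tokenizer vocabulary sizes (Gemma ~256k vs Llama 2's 32k).
# Both repos are gated on Hugging Face: accept the licenses and log in first.
from transformers import AutoTokenizer

for name in ["google/gemma-7b", "meta-llama/Llama-2-7b-hf"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name}: vocab size = {tok.vocab_size}")
```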
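The new chat template wraps turns in <start_of_turn>/<end_of_turn> markers; transformers (4.38+) can render it for you. This sketch assumes the instruction-tuned checkpoint google/gemma-7b-it:

```python
# Render Gemma's chat template via transformers (Gemma support landed in 4.38).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [{"role": "user", "content": "Why is the sky blue?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Prints roughly:
# <bos><start_of_turn>user
# Why is the sky blue?<end_of_turn>
# <start_of_turn>model
```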
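And to check the parameter counts, a sketch that just sums tensor sizes; it loads the full weights, so expect it to need a lot of RAM:

```python
# Sum parameter counts for Gemma-7B vs Mistral-7B.
# Loading the full weights needs roughly 15-20 GB of RAM per model in bfloat16.
import torch
from transformers import AutoModelForCausalLM

for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e9:.2f}B parameters")
```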
Links in the comments