Qwen releases QwQ-32B, a small reasoning model that rivals DeepSeek-R1 and o1-mini

TL;DR:

  • Qwen is a serious player, just like DeepSeek
  • a 32B dense model that rivals a 671B mixture-of-experts model (37B parameters activated)
  • weights are available on Hugging Face, no paper yet; you can chat with it at chat.qwen.ai
  • they used two stages of reinforcement learning: the 1st for math and coding tasks, the 2nd for general capabilities
  • for math, the reward is based on an accuracy verifier; for coding, it's based on predefined test cases that are actually executed
  • let's wait for other benchmarks; Qwen's own numbers look decent, but I'm curious whether they replicate and how the model fares on the Open LLM Leaderboard
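The verifiable-reward idea from the bullets above can be sketched in a few lines. This is an illustrative assumption about how such rewards work in general, not Qwen's actual training harness: `math_reward` compares a predicted final answer against a gold answer, and `code_reward` executes the model's generated code against predefined test cases, returning 1.0 only if every test passes. All function names and the test-case format are hypothetical.

```python
# Hypothetical sketch of verifiable rewards for RL on math and coding tasks.
# Not Qwen's implementation; names and formats are illustrative assumptions.

def math_reward(predicted: str, gold: str) -> float:
    """Accuracy verifier: 1.0 if the normalized final answer matches the gold answer."""
    return float(predicted.strip() == gold.strip())


def code_reward(candidate_source: str,
                test_cases: list[tuple[tuple, object]],
                func_name: str = "solution") -> float:
    """Execute the generated code, then check func_name against (args, expected) pairs."""
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # run the model's generated code
        func = namespace[func_name]
        for args, expected in test_cases:
            if func(*args) != expected:
                return 0.0  # any failing test zeroes the reward
    except Exception:
        return 0.0  # crashes or a missing function also score zero
    return 1.0  # all predefined test cases passed


# Usage: a correct and an incorrect completion for the same task.
tests = [((2, 3), 5), ((-1, 1), 0)]
good = "def solution(a, b):\n    return a + b\n"
bad = "def solution(a, b):\n    return a - b\n"
print(code_reward(good, tests))  # 1.0
print(code_reward(bad, tests))   # 0.0
print(math_reward(" 42 ", "42"))  # 1.0
```

In a real pipeline the generated code would run in a sandboxed subprocess with a timeout rather than in-process `exec`, but the binary pass/fail signal is the same.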