We trained 14B and 7B reasoning models that surpass DeepSeek R1 models twice their size in math olympiads

Improving DeepSeek R1 in Math

I joined a team that trained 7B and 14B math reasoning models based on DeepSeek-R1-Distill using SFT and GRPO. Our 14B model achieved 75.8% Maj@32 on AIME'25 (a +8.7% improvement), and our 7B model reached 65.8% Maj@32 (+7.5%). Here is what I learned.

Real game situation with Dixit GPT bot

I enabled ChatGPT to “see” images and made it play Dixit with my friends

To celebrate the week of Bing's integration with ChatGPT, I built an AI bot based on GPT-3 and BLIP-2 to play Dixit, and gathered some friends and co-workers to play against it.