We trained 14B and 7B reasoning models that surpass DeepSeek R1 models twice their size in math olympiads

Improving DeepSeek R1 in Math

I joined a team that trained 7B and 14B math reasoning models based on DeepSeek-R1-Distill using SFT and GRPO. Our 14B model achieved 75.8% Maj@32 on AIME'25 (a +8.7% improvement), and our 7B model reached 65.8% Maj@32 (+7.5%). Here is what I learned.

Real game situation with Dixit GPT bot

I enabled ChatGPT to “see” images and made it play Dixit with my friends

To celebrate the week of Bing's integration with ChatGPT, I built an AI bot based on GPT-3 and BLIP-2 to play Dixit, and gathered some friends and co-workers to play against it.