Reminder: Our goal is to train an open-source model

[Figure: Baichuan-M2’s training system]

Just a quick reminder before we start: our end goal for this project is to train an open-source medical reasoning model, probably using GPT-OSS 20b as the base. We are working our way from right to left on this chart: evaluations and RL environments first, then RL with those environments, then SFT data, and finally mid-training data. (Remember, this is a research project, so plans might change as we learn along the way.)

MedARC Evals*

This month we want to finalize our evaluation environments and use them to benchmark the medical knowledge of open and proprietary models. We will release our results as a MedARC blog post, crediting anyone who has made a meaningful contribution.

This will likely be implemented as a helper script or lightweight library that ties all our medical evaluation environments together and reports all the results in one easy-to-manage location.
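To make the idea concrete, here is a rough sketch of what such a helper could look like. Nothing here reflects an actual MedARC design: the names `run_all`, `to_csv`, `toy_env`, and `echo_model` are hypothetical, and the sketch assumes each environment can be reduced to a function that takes a model and returns a single score.

```python
import csv
import io
from typing import Callable, Dict, List

# Assumption: a "model" is any callable mapping a prompt to an answer string,
# and an "environment" is any callable that scores a model with one float.
Model = Callable[[str], str]
EvalEnv = Callable[[Model], float]


def run_all(models: Dict[str, Model], envs: Dict[str, EvalEnv]) -> List[dict]:
    """Run every environment against every model; one result row per pair."""
    rows = []
    for model_name, model in models.items():
        for env_name, env in envs.items():
            rows.append({"model": model_name, "env": env_name, "score": env(model)})
    return rows


def to_csv(rows: List[dict]) -> str:
    """Serialize the collected rows so all results live in one place."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["model", "env", "score"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy stand-ins for illustration only (not real evaluation data or models).
def toy_env(model: Model) -> float:
    questions = {"2+2?": "4", "Capital of France?": "Paris"}
    correct = sum(model(q) == a for q, a in questions.items())
    return correct / len(questions)


def echo_model(prompt: str) -> str:
    return "4"  # always gives the same answer


rows = run_all({"echo": echo_model}, {"toy_qa": toy_env})
print(to_csv(rows))
```

Keeping environments behind one plain function signature is just one possible choice; it would let new environments be registered by adding an entry to the `envs` dictionary, at the cost of hiding per-question detail that a fuller report might want.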

Incomplete List of Open Questions

Timeline

We’d like to release our blog post and benchmark at the end of October, during the week of the 27th.

Tentative timeline: