MedARC Agentic Medical Fact Verifier

An Open-Source Version of Baichuan-M3’s Medical Fact Verification System

Overview

Reducing hallucination in medical language models requires turning a property of long-form text (“this answer or reasoning trace contains a falsehood”) into a quantifiable signal that can be used for dataset filtering or as an RL reward. Baichuan-M3 improves on Baichuan-M2 in part by focusing on medical fact verification to reduce hallucination rates.
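As a minimal sketch of what such a signal could look like, the function below scores a text by the fraction of its claims judged supported, a FActScore-style precision. It assumes the Medical Fact Verifier assigns each claim one of three verdicts (supported, unsubstantiated, or unclear). The function names, the default of dropping unclear claims from the denominator, and the 0.9 filtering threshold are illustrative assumptions, not part of Baichuan-M3.

```python
from collections import Counter
from typing import Iterable

# Verdict labels assumed to match the three outcomes the verifier can assign.
VERDICTS = {"supported", "unsubstantiated", "unclear"}


def factuality_score(verdicts: Iterable[str], ignore_unclear: bool = True) -> float:
    """Fraction of claims judged 'supported' (a FActScore-style precision).

    If ignore_unclear is True, 'unclear' claims are dropped from the
    denominator instead of counting against the text.
    """
    counts = Counter(verdicts)
    unexpected = set(counts) - VERDICTS
    if unexpected:
        raise ValueError(f"unexpected verdict labels: {sorted(unexpected)}")
    denominator = counts["supported"] + counts["unsubstantiated"]
    if not ignore_unclear:
        denominator += counts["unclear"]
    if denominator == 0:
        # No scorable claims: return 0.0 so callers can filter such samples out.
        return 0.0
    return counts["supported"] / denominator


def keep_for_training(verdicts: Iterable[str], threshold: float = 0.9) -> bool:
    """Hypothetical dataset filter: keep a sample only if its score clears a threshold."""
    return factuality_score(verdicts) >= threshold
```

For example, verdicts of supported, supported, unsubstantiated, unclear give a score of 2/3 when unclear claims are ignored and 0.5 when they are counted. Treating unclear claims as neutral avoids penalizing claims the corpus cannot adjudicate, at the cost of not penalizing unverifiable text; which behavior is preferable may differ between dataset filtering and RL rewards.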

The Baichuan-M3 team broke this task into four steps handled by three models:

Medical Claim Decomposer breaks input text (documents, reasoning traces, model outputs) into individual medical claims.
Medical Fact Verifier compares these claims against a database of previously fact-checked claims, each of the form “Claim X is supported by evidence set Y [under scope Z] as of date T”.
If a claim is new, a Medical Search Agent is dispatched to find supporting or contradictory evidence in a curated medical corpus.
The search results are returned to the Medical Fact Verifier, which decides whether the claim is supported, unsubstantiated, or unclear, and adds a new entry to the medical fact database.
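The four steps above can be sketched as a single loop over claims backed by a cache of earlier verdicts. This is a minimal sketch, assuming exact-match lookup on a normalized claim string plus its scope. The names FactRecord, FactDatabase, and verify_text are hypothetical, and the decomposer, search agent, and verifier are passed in as plain callables standing in for whatever models end up being used.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class FactRecord:
    """One database entry: claim X has verdict V given evidence set Y, under scope Z, as of date T."""
    claim: str
    verdict: str                # "supported", "unsubstantiated", or "unclear"
    evidence: tuple[str, ...]   # identifiers of the evidence documents consulted
    scope: Optional[str]        # e.g. a population or setting the verdict applies to
    as_of: date


@dataclass
class FactDatabase:
    records: dict[tuple[str, Optional[str]], FactRecord] = field(default_factory=dict)

    @staticmethod
    def key(claim: str, scope: Optional[str]) -> tuple[str, Optional[str]]:
        # Naive normalization (case and whitespace only); paraphrases will not match.
        return (" ".join(claim.lower().split()), scope)

    def lookup(self, claim: str, scope: Optional[str]) -> Optional[FactRecord]:
        return self.records.get(self.key(claim, scope))

    def add(self, record: FactRecord) -> None:
        self.records[self.key(record.claim, record.scope)] = record


def verify_text(
    text: str,
    db: FactDatabase,
    decompose: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    judge: Callable[[str, list[str]], str],
    scope: Optional[str] = None,
    today: Optional[date] = None,
) -> list[FactRecord]:
    """Run decompose -> lookup -> (search -> judge -> store) for every claim in text."""
    today = today or date.today()
    records = []
    for claim in decompose(text):                      # 1. Claim Decomposer
        record = db.lookup(claim, scope)               # 2. check previously verified claims
        if record is None:
            evidence = search(claim)                   # 3. Search Agent, new claims only
            verdict = judge(claim, evidence)           # 4. Fact Verifier decides...
            record = FactRecord(claim, verdict, tuple(evidence), scope, today)
            db.add(record)                             # ...and stores the new entry
        records.append(record)
    return records
```

The database exists so that search, likely the most expensive step, runs only for claims not seen before; claims that recur across a dataset reuse the stored verdict. A real system would probably need fuzzier matching than string normalization, since one fact can be phrased many ways, plus a policy for re-checking entries whose as-of date has gone stale.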

We will base our versions of the Medical Claim Decomposer, Medical Fact Verifier, and Medical Search Agent on the following papers and technical reports:

Claim Decomposer: the FActScore literature, including VeriScore, OpenFActScore, and VeriFastScore
Fact Verifier: Med-V1 and VeriFastScore
Search Agent: Chroma Context 1

Each component may be an off-the-shelf model used via prompting; alternatively, we may train up to three separate models ourselves.
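Because each component might be either a prompted off-the-shelf model or a model trained in-house, one way to keep them interchangeable is to write the pipeline against narrow interfaces. The sketch below uses Python's typing.Protocol for this. The interface names, method signatures, prompt wording, and the `complete` callable (standing in for any text-in, text-out LLM call) are all hypothetical.

```python
import re
from typing import Callable, Protocol


class ClaimDecomposer(Protocol):
    def decompose(self, text: str) -> list[str]: ...


class SearchAgent(Protocol):
    def search(self, claim: str) -> list[str]: ...


class FactVerifier(Protocol):
    def judge(self, claim: str, evidence: list[str]) -> str: ...


# Matches a leading list marker followed by whitespace, such as "- ", "* ", "1. " or "2) ".
# Requiring whitespace avoids stripping the start of claims like "5.0 mg ...".
_LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+")


class PromptedDecomposer:
    """A ClaimDecomposer backed by a prompted off-the-shelf model."""

    PROMPT = (
        "Break the following medical text into independent, self-contained "
        "medical claims. Write one claim per line.\n\nText:\n{text}\n\nClaims:\n"
    )

    def __init__(self, complete: Callable[[str], str]):
        self.complete = complete  # any function mapping a prompt to model output

    def decompose(self, text: str) -> list[str]:
        raw = self.complete(self.PROMPT.format(text=text))
        claims = []
        for line in raw.splitlines():
            claim = _LIST_MARKER.sub("", line).strip()
            if claim:  # skip blank lines in the model output
                claims.append(claim)
        return claims


def decompose_all(decomposer: ClaimDecomposer, texts: list[str]) -> list[list[str]]:
    # Depends only on the interface, so a fine-tuned decomposer can be swapped in later.
    return [decomposer.decompose(t) for t in texts]
```

A trained decomposer only needs a decompose method with the same signature. Protocols are checked structurally by type checkers, so it does not have to inherit from ClaimDecomposer.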