Project lead: @Benjamin Warner

Discord channel: https://discord.com/channels/1025299671226265621/1400621466483167334

GitHub Repo: https://github.com/MedARC-AI/amfv

MedARC Meeting Calendar: public link | iCal link

MedARC Agentic Medical Fact Verifier

An Open-Source Version of Baichuan-M3’s Medical Fact Verification System

Overview

Reducing hallucination in medical language models requires turning a property of long-form text (“this answer or reasoning trace contains a falsehood”) into a quantifiable signal that can be used for dataset filtering or as an RL reward. Baichuan-M3 improves on Baichuan-M2 in part by focusing on medical fact verification to reduce hallucination rates.
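As a minimal sketch of what such a signal could look like, the snippet below scores a text by the fraction of its claims judged "supported" and uses that score as a dataset filter. The function names, the 0.9 threshold, and the scoring rule itself are illustrative assumptions, not the method used by Baichuan-M3 or this project.

```python
from typing import List


def supported_fraction(verdicts: List[str]) -> float:
    """Fraction of claims whose verdict is "supported".

    Assumption: a text with no extracted claims scores 1.0, since no
    falsehood was found. A real reward might treat this case differently.
    """
    if not verdicts:
        return 1.0
    return sum(v == "supported" for v in verdicts) / len(verdicts)


def keep_for_dataset(verdicts: List[str], threshold: float = 0.9) -> bool:
    """Hypothetical dataset filter: keep a sample only if its score is high enough."""
    return supported_fraction(verdicts) >= threshold


# Per-claim verdicts for one model answer (toy values).
verdicts = ["supported", "supported", "unclear", "unsubstantiated"]
score = supported_fraction(verdicts)   # could also serve as a scalar RL reward
keep = keep_for_dataset(verdicts)
```

The same scalar could be used directly as a reward term, or weighted so that "unsubstantiated" claims are penalized more heavily than "unclear" ones.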

The Baichuan-M3 team split this task across three models and four steps:

  1. The Medical Claim Decomposer breaks down input text (documents, reasoning traces, model outputs) into individual medical claims
  2. The Medical Fact Verifier takes these claims and compares them against a database of previously fact-checked claims, with entries of the form: “Claim X is supported by evidence set Y [under scope Z] as of date T”
  3. If a claim is not already in the database, a Medical Search Agent is dispatched to find supporting or contradictory evidence in a curated medical corpus
  4. These results are returned to the Medical Fact Verifier, which decides whether the claim is supported, unsubstantiated, or unclear and adds a new entry to the medical fact database
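The four steps above can be sketched as a single control loop. This is an illustrative sketch only: `FactRecord`, `verify_text`, and the toy stubs are hypothetical names, the three components are passed in as plain callables, and the exact-string dictionary lookup stands in for whatever claim matching the real fact database would need.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass
class FactRecord:
    """One fact database entry: claim X, evidence set Y, scope Z, date T."""
    claim: str
    verdict: str                 # "supported", "unsubstantiated", or "unclear"
    evidence: List[str]
    as_of: date
    scope: Optional[str] = None


def verify_text(
    text: str,
    decompose: Callable[[str], List[str]],    # Medical Claim Decomposer
    search: Callable[[str], List[str]],       # Medical Search Agent
    judge: Callable[[str, List[str]], str],   # Medical Fact Verifier
    fact_db: Dict[str, FactRecord],
    today: date,
) -> List[FactRecord]:
    records = []
    for claim in decompose(text):             # step 1: split into claims
        record = fact_db.get(claim)           # step 2: look up known claims
        if record is None:
            evidence = search(claim)          # step 3: gather evidence
            verdict = judge(claim, evidence)  # step 4: decide and store
            record = FactRecord(claim, verdict, evidence, today)
            fact_db[claim] = record
        records.append(record)
    return records


# Toy stubs standing in for the three models (assumptions, not real components).
TOY_CORPUS = {"Claim A": ["evidence-1"]}
search_calls: List[str] = []


def toy_decompose(text: str) -> List[str]:
    return [part.strip() for part in text.split(".") if part.strip()]


def toy_search(claim: str) -> List[str]:
    search_calls.append(claim)
    return TOY_CORPUS.get(claim, [])


def toy_judge(claim: str, evidence: List[str]) -> str:
    return "supported" if evidence else "unsubstantiated"


fact_db: Dict[str, FactRecord] = {}
first = verify_text("Claim A. Claim B.", toy_decompose, toy_search, toy_judge,
                    fact_db, date(2025, 1, 1))
# The second pass reuses the stored entry, so no new search is dispatched.
second = verify_text("Claim A.", toy_decompose, toy_search, toy_judge,
                     fact_db, date(2025, 1, 2))
```

Passing the components in as callables keeps the loop agnostic about whether each one is a prompted off-the-shelf model or a separately trained model.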

We will base our versions of the Medical Claim Decomposer, Medical Fact Verifier, and Medical Search Agent on the following papers/tech reports:

These components might be prompted off-the-shelf models, or up to three separate models that we train ourselves.