Ground Truth
Most of the cohort was manually audited, and the resulting dataset was treated as a high-fidelity ground truth composite. Canonicalization mapped heterogeneous grade descriptions to standardized labels. The regex process prioritized explic…
1 sources - 5 claims
Most of the cohort was manually audited, and the resulting dataset was treated as a high-fidelity ground truth composite. Canonicalization mapped heterogeneous grade descriptions to standardized labels. The regex process prioritized explicit diagnostic headers over summary or clinical history mentions. Ground truth labels were created through human-in-the-loop regex bootstrapping. Reproducibility is limited because the reports contain protected health information and are not publicly available.