From Clinical Trials to Real-World Impact: Introducing a Computational Framework to Detect Endpoint Bias in Opioid Use Disorder Research

Odom, Gabriel J.; Brandt, Laura; Marker, Aaron; et al. (2026). From Clinical Trials to Real-World Impact: Introducing a Computational Framework to Detect Endpoint Bias in Opioid Use Disorder Research. Drug and Alcohol Review, 45(1), e70085. doi:10.1111/dar.70085

cited authors

  • Odom, Gabriel J; Brandt, Laura; Marker, Aaron; Giorgi, Salvatore; Jainarain, Ganesh; Schwartz, H Andrew; Au, Larry; Castro, Clinton; ENDPOINT Consortium

abstract

  • Introduction

    Clinical trial endpoints are a 'finite sequence of instructions to perform a task' (here, measuring treatment effectiveness), which makes them algorithms. Consequently, they may exhibit algorithmic bias: internal and external performance can vary across demographic groups, affecting fairness, validity and clinical decision-making.

    Methods

    We developed the open-source Detecting Algorithmic Bias (DAB) Pipeline in Python to identify endpoint 'performance variance' (a specific form of algorithmic bias) as the proportion of minority participants changes. This pipeline assesses internal performance (on demographically matched test data) and external performance (on demographically diverse validation data) using metrics including F1 scores and the area under the receiver operating characteristic curve (AUROC). We applied it to representative opioid use disorder (OUD) trial endpoints.

    Results

    F1 scores remained stable across minority representation levels, suggesting that the precision-recall balance was consistent despite demographic shifts. Conversely, AUROC was more sensitive, revealing significant performance variance. Training on demographically homogeneous populations boosted internal performance (accuracy within similar cohorts) but critically compromised external generalisability (accuracy within diverse cohorts). This pattern reveals an 'endpoint bias trade-off': optimising performance for a homogeneous population versus achieving generalisable performance in the real world.

    Discussion and conclusions

    Consistently performing endpoints for one demographic profile may lose generalisability during population shifts, potentially introducing endpoint bias. Increasing minority representation in the training data consistently improved generalisability. The endpoint bias trade-off reinforces the importance of diverse recruitment in OUD trials. The DAB Pipeline helps researchers systematically pinpoint when an endpoint may suffer 'performance variance' (i.e., bias). As an open-source tool, it promotes transparent endpoint evaluation and supports selecting demographically invariant OUD endpoints.
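The abstract describes computing F1 and AUROC on cohorts with varying minority representation. The sketch below illustrates that kind of check with standard metric implementations and a purely hypothetical simulated cohort; it is not the DAB Pipeline itself (whose API is not given here), and the `simulate_cohort` function, its parameters, and the assumed weaker score separation for the minority group are all illustrative assumptions.

```python
import random

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic (rank formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate_cohort(n, minority_prop, rng):
    """Toy cohort (illustrative only): the endpoint's score separates
    outcome classes less cleanly for the minority group."""
    y_true, scores = [], []
    for _ in range(n):
        minority = rng.random() < minority_prop
        outcome = rng.random() < 0.5
        sep = 0.15 if minority else 0.35  # assumed weaker separation
        mean = 0.5 + sep if outcome else 0.5 - sep
        y_true.append(int(outcome))
        scores.append(min(1.0, max(0.0, rng.gauss(mean, 0.2))))
    return y_true, scores

rng = random.Random(42)
for prop in (0.0, 0.1, 0.3, 0.5):
    y, s = simulate_cohort(2000, prop, rng)
    preds = [int(v >= 0.5) for v in s]
    print(f"minority={prop:.0%}  F1={f1_score(y, preds):.3f}  "
          f"AUROC={auroc(y, s):.3f}")
```

In this toy setup the threshold-based F1 tends to move less than AUROC as the minority proportion grows, loosely mirroring the abstract's observation that AUROC is the more sensitive detector of performance variance.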

publication date

  • January 1, 2026

keywords

  • Algorithms
  • Bias
  • Clinical Trials as Topic
  • ENDPOINT Consortium
  • Humans
  • Opioid-Related Disorders

Medium

  • Print

start page

  • e70085

volume

  • 45

issue

  • 1