AI could make healthcare fairer—by helping us believe what patients tell us


To test this possibility, the researchers trained a deep-learning model to predict a patient’s self-reported pain level from a knee x-ray. If the resulting model had poor accuracy, this would suggest that self-reported pain is rather arbitrary. But if it had good accuracy, this would provide evidence that self-reported pain is in fact correlated with radiographic markers in the x-ray.

After running several experiments, including some designed to rule out confounding factors, the researchers found that the model was much more accurate than KLG (the Kellgren-Lawrence Grade, the standard radiographic severity score) at predicting self-reported pain levels for both white and Black patients, but especially for Black patients. It reduced the racial disparity at each pain level by nearly half.
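For readers who want a feel for what "reducing the disparity" means in practice, here is a minimal sketch of one way such a comparison could be set up. It is not the researchers' code or method. The numbers are synthetic and invented purely for illustration, and the names `residuals`, `adjusted_gap`, `raw_gap`, `coarse_grade`, and `model_score` are hypothetical. The idea: measure the raw gap in reported pain between two groups, then ask how much of that gap remains once a severity score is taken into account, using an ordinary least-squares fit.

```python
from statistics import mean

def residuals(y, x):
    """Residuals from a least-squares fit of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]

def raw_gap(pain, group):
    """Difference in mean reported pain between group 1 and group 0."""
    g1 = [p for p, g in zip(pain, group) if g == 1]
    g0 = [p for p, g in zip(pain, group) if g == 0]
    return mean(g1) - mean(g0)

def adjusted_gap(pain, group, score):
    """Group coefficient in the fit pain ~ 1 + group + score.

    Computed by partialling the score out of both pain and group
    (the Frisch-Waugh-Lovell approach) to avoid a matrix library.
    """
    pain_res = residuals(pain, score)
    group_res = residuals(group, score)
    num = sum(p * g for p, g in zip(pain_res, group_res))
    den = sum(g * g for g in group_res)
    return num / den

# Synthetic data, NOT from the study: eight imaginary patients.
group = [0, 0, 0, 0, 1, 1, 1, 1]
pain = [20, 30, 40, 50, 40, 50, 60, 70]   # self-reported pain
coarse_grade = [1, 2, 3, 4, 1, 2, 3, 4]   # stand-in for an expert grade
model_score = [2, 3, 4, 5, 3, 4, 5, 6]    # stand-in for a learned score

gap = raw_gap(pain, group)
for name, score in [("coarse_grade", coarse_grade), ("model_score", model_score)]:
    remaining = adjusted_gap(pain, group, score)
    share = 1 - remaining / gap
    print(f"{name}: gap after adjustment = {remaining:.1f}, share explained = {share:.0%}")
```

In this toy setup, a score that picks up more of what patients report leaves a smaller unexplained gap between the groups. The study's actual analysis, data, and model are considerably more involved.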

The goal isn’t necessarily to start using this algorithm in a clinical setting. But by outperforming the KLG methodology, it revealed that the standard way of measuring pain is flawed, at a much greater cost to Black people. This should tip off the medical community to investigate which radiographic markers the algorithm might be seeing, and update their scoring methodology.

“It actually highlights a really exciting part of where these kinds of algorithms can fit into the process of medical discovery,” says Obermeyer. “It tells us if there’s something here that’s worth looking at that we don’t understand. It sets the stage for humans to then step in and, using these algorithms as tools, try to figure out what’s going on.”

“The cool thing about this paper is it is thinking about things from a completely different perspective,” says Irene Chen, a researcher at MIT who studies how to reduce healthcare inequities in machine learning and was not involved in the paper. Instead of training the algorithm based on well-established expert knowledge, she says, the researchers chose to treat the patient’s self-assessment as truth. In doing so, they uncovered important gaps in what the medical field usually considers to be the more “objective” pain measure.

“That was exactly the secret,” agrees Obermeyer. If algorithms are only ever trained to match expert performance, he says, they will simply perpetuate existing gaps and inequities. “This study is a glimpse of a more general pipeline that we are increasingly able to use in medicine for generating new knowledge.”