Why is it so hard to evaluate health program efficacy?
It’s often said that HR leaders are in the health insurance business – and it’s true. Sufficiently large companies tend to be self-funded, meaning they foot the bill for their health plan claims. Benefits teams do not just manage their company’s benefits package, but spend a substantial amount of time trying to manage the health (and therefore risk) of their members.
Over the years, self-funded employers have experimented with a number of strategies to improve health and reduce cost: high deductible health plans, narrow networks, plan design changes, wellness programs, and most recently, targeted health vendors. In most cases, logic tells us that these interventions should be effective, yet measuring their causal effects (the “ROI”) has proven elusive.
Return-on-investment (ROI) for a health benefits program is typically measured across three variables:
- Clinical outcomes
- Patient experience
- Financial savings
Most health benefits interventions have clear clinical goals, and are able to report against those targets. Similarly, customer satisfaction (CSAT) and net promoter score (NPS) surveys are quickly becoming commonplace, making it possible to assess patient experience.
Measuring financial savings remains a challenge. There are many reasons for year-over-year variance. Changes in cost could be random chance, or due to employee turnover, or even related to administrative changes. Even if observed costs go down (or, more realistically, simply grow at a slower rate than originally expected), how can benefits teams attribute that to a particular health intervention?
Why is it so hard to measure the effectiveness of health benefits interventions?
To understand why this is such a difficult problem, it helps to understand the possible analytical approaches.
At a high level, there are two categories of traditional approaches:
- Controlled clinical trial [1]
- Pre / post analysis
A controlled clinical trial, also called a randomized controlled trial (RCT), is often referred to as the “gold standard” for evaluating the effectiveness of an intervention. However, these are rarely undertaken when it comes to evaluating benefits programs.
Why? This level of rigor is required by the FDA, but many digital health solutions fall outside the FDA’s purview [2]. RCTs are also time-consuming, and their slow pace is out of sync with the often breakneck speed of tech innovation. Lastly, the logistics and ethics of an employer providing benefits only to randomly selected participants are questionable.
This leaves pre / post analysis, in which the time period before an intervention is compared to the time period after the intervention. Unfortunately, this analytical framework is rife with pitfalls [3]. Common drawbacks include:
- Regression toward the mean - Is the decrease in PEPM (per employee per month) cost due to chance?
- Selection bias - Were the most motivated patients going to improve anyway?
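The first pitfall can be reproduced with no intervention at all. The sketch below is a toy simulation, not real claims data: the cost distribution, the noise level, and the rule of "enrolling" the top 10% of pre-period spenders are all assumptions chosen for illustration, and `simulate_pre_post` is a hypothetical helper name.

```python
import random
import statistics


def simulate_pre_post(n_members=20000, top_share=0.10, seed=0):
    """Simulate two periods of cost with NO intervention effect at all.

    Each member has a stable underlying cost level plus independent
    period-to-period noise. All numbers are made up for illustration.
    """
    rng = random.Random(seed)
    baseline = [rng.lognormvariate(6.0, 0.5) for _ in range(n_members)]
    # Same noise process in both periods: nothing changes in between.
    pre = [b * rng.lognormvariate(0.0, 0.8) for b in baseline]
    post = [b * rng.lognormvariate(0.0, 0.8) for b in baseline]

    # "Enroll" the members with the highest pre-period cost, as a
    # program targeting high-cost members might.
    cutoff = sorted(pre)[int(n_members * (1 - top_share))]
    enrolled = [i for i in range(n_members) if pre[i] >= cutoff]

    pre_mean = statistics.mean(pre[i] for i in enrolled)
    post_mean = statistics.mean(post[i] for i in enrolled)
    return pre_mean, post_mean, statistics.mean(pre), statistics.mean(post)


pre_enrolled, post_enrolled, pre_all, post_all = simulate_pre_post()
print(f"enrolled: pre={pre_enrolled:.0f} post={post_enrolled:.0f}")
print(f"everyone: pre={pre_all:.0f} post={post_all:.0f}")
```

In this setup the enrolled group's average cost falls sharply in the post period while the overall population average barely moves, even though nothing was done for anyone. A naive pre / post comparison would credit that drop to the program.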
This leaves employers in a difficult position. How can they realistically evaluate their health interventions, and only re-invest in high-value programs?
Introducing the Accorded method for ROI analysis: bottoms-up benchmarking
The Accorded methodology avoids the inherent bias of pre- vs. post-intervention analysis and instead reviews the timeframe in which an intervention was active. It also takes a bottoms-up approach, meaning that performance is evaluated as a cascade, from individual conditions to the member to the population.
The evaluation asks: For each member who engaged with the intervention, did they incur more, less, or similar claims cost to a comparable member?
The methodology to answer this is elegantly simple.
- Assign a core condition for the member, customized for age and gender
Age and gender are crucial to adjust for.
- Evaluate the financial performance on each individual member’s medical condition set
Adjust for differences in risk profiles
- Compare each member’s medical claims cost against a proprietary benchmark panel, developed on all combinations of core condition & comorbid condition, and customized for that member
- In the example below, member 1 and member 20 both incur claims for Dorsalgia – but because their core conditions are different, the expected costs would be different.
- The benchmark cost is the median of a log-normalized distribution of all data points for members with the same claim condition and the same core condition.
Actuarial adjustments ensure precision
- Benchmark panels are adjusted for geography and trend. The performance score is calculated by smoothing for outliers in order to not over-credit or over-punish extreme ends of members’ spend.
Methodology prevents skew from high-cost claimants
- Secondary diagnoses are included in the condition panel and given credit when there are no costs associated with the diagnosis.
- Roll up each member’s condition-specific performance score to a holistic member performance score
Evaluation at the member level ensures precision, but also allows for productive conversation between employers and their partners. This provides actionable insight into where the vendor did well, or could have done better.
- Finally, roll up all members’ scores to a population level
The holistic vendor evaluation is the weighted average of each individual engaged patient's performance result.
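The benchmarking and roll-up steps above can be sketched roughly as follows. This is not Accorded's implementation: the benchmark panel is proprietary, so `PANEL`, its cost values, the "Diabetes" core condition, and the function names `build_benchmarks` and `member_score` are all hypothetical. Two further assumptions are made: the benchmark is taken as exp(mean(log cost)), which equals the median if costs are log-normally distributed, and the member-level roll-up is a plain sum of condition-level differences. Outlier smoothing and the geography and trend adjustments are omitted.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical panel rows: (core_condition, claim_condition, pmpm_cost).
PANEL = [
    ("Thoracic Disc Disorder", "Dorsalgia", 20.0),
    ("Thoracic Disc Disorder", "Dorsalgia", 45.0),
    ("Thoracic Disc Disorder", "Dorsalgia", 80.0),
    ("Diabetes", "Dorsalgia", 10.0),
    ("Diabetes", "Dorsalgia", 40.0),
]


def build_benchmarks(panel):
    """Benchmark per (core, claim) pair: exp(mean(log cost)).

    For a log-normal distribution this equals the median. Treating the
    benchmark this way is an assumption, not a documented method.
    """
    logs = defaultdict(list)
    for core, claim, cost in panel:
        if cost > 0:  # log is undefined for zero-cost rows
            logs[(core, claim)].append(math.log(cost))
    return {key: math.exp(statistics.fmean(vals)) for key, vals in logs.items()}


def member_score(core, claims, benchmarks):
    """Sum of (actual - benchmark) over a member's claim conditions.

    Positive means overspend, negative means underspend.
    """
    return sum(cost - benchmarks[(core, claim)] for claim, cost in claims.items())


benchmarks = build_benchmarks(PANEL)
same_claim_a = benchmarks[("Thoracic Disc Disorder", "Dorsalgia")]
same_claim_b = benchmarks[("Diabetes", "Dorsalgia")]
print(round(same_claim_a, 2), round(same_claim_b, 2))
```

The point of keying the benchmark on the pair (core condition, claim condition) is that the same Dorsalgia claim is judged against a different expected cost depending on the member's core condition.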
Below is an example of one member’s performance evaluation.
In this illustrative example, the member’s core condition is Thoracic Disc Disorder. They also incurred a variety of claims in other conditions, such as Dorsalgia, Sleep Disorders, and Cancer Screening.
Running their claims data through Accorded Platform’s Impact Atlas, we find that:
- This member had an overspend score of $60.74 in Thoracic Disc Disorder.
- This means the member incurred $60.74 more in medical claims than we would expect for a comparable patient, where comparable means matched on age, gender, geography, and comorbidity.
- Note that the benchmark expected cost was $34.12, and the member incurred $100.10, so they technically incurred $65.98 more than expected ($100.10 - $34.12 = $65.98). However, smoothing for outliers reduces this score to $60.74.
- Credit is given for areas where the member incurred less than expected spend.
- For example, the member had an underspend score of -$11.28 in Dorsalgia.
- Credit is also given for areas where the member might be expected to incur spend but did not.
- For example, the member was diagnosed with Sleep Disorders, but only as a secondary diagnosis.
- They incurred no primary spend for Sleep Disorders.
- The benchmark cost for Sleep Disorders for members with Thoracic Disc Disorder as a core condition is $11.70, so this member is credited part of that benchmark cost ($3.95 of the full $11.70).
- Note: these scores are on a PMPM basis.
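The arithmetic in this example can be checked directly. The snippet below uses only the dollar figures quoted above; the variable names are illustrative, and the smoothing function itself is not described in this article, so only its net effect is computed.

```python
from decimal import Decimal

benchmark = Decimal("34.12")       # expected PMPM cost, Thoracic Disc Disorder
actual = Decimal("100.10")         # incurred PMPM cost
smoothed_score = Decimal("60.74")  # reported score after outlier smoothing

raw_overspend = actual - benchmark
smoothing_reduction = raw_overspend - smoothed_score

sleep_benchmark = Decimal("11.70")  # benchmark for Sleep Disorders
sleep_credit = Decimal("3.95")      # credit given for no primary spend
credit_share = sleep_credit / sleep_benchmark

print(raw_overspend)           # 65.98
print(smoothing_reduction)     # 5.24
print(round(credit_share, 3))  # 0.338
```

So smoothing trims $5.24 from the raw overspend in this case, and the Sleep Disorders credit is roughly a third of the full benchmark.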
Having person-level performance scores allows for great flexibility in grouping the members in various population groups. This could be a comparison of various core conditions, diagnosis chapters, or engagement indicators. For example, by merging in engagement flags, this methodology can calculate the spend performance of all members who engaged with an employer’s targeted health program.
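As a rough sketch of that grouping step, the example below joins engagement flags to member-level scores and computes a weighted average per group. The member IDs, scores, and flags are invented, `weighted_score_by_group` is a hypothetical helper, and weighting by member months is an assumption, since the article does not say which weights are used.

```python
from collections import defaultdict

# Hypothetical member-level scores (PMPM; negative = underspend) and
# member months used as weights. None of these values are real.
member_scores = {
    "m1": {"score": 45.0, "months": 12},
    "m2": {"score": -20.0, "months": 6},
    "m3": {"score": 10.0, "months": 12},
    "m4": {"score": -5.0, "months": 12},
}
engaged_flags = {"m1": False, "m2": True, "m3": False, "m4": True}


def weighted_score_by_group(scores, flags):
    """Member-month-weighted average score for each engagement group."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [weighted sum, months]
    for member_id, row in scores.items():
        group = "engaged" if flags.get(member_id, False) else "not engaged"
        totals[group][0] += row["score"] * row["months"]
        totals[group][1] += row["months"]
    return {g: s / m for g, (s, m) in totals.items()}


result = weighted_score_by_group(member_scores, engaged_flags)
print(result)
```

The same function could group by core condition or diagnosis chapter instead, simply by swapping the flag lookup for a different member attribute.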
The custom-tailored benchmarks for each member preclude the need to compare a pre-period with a post-period. Because members in both periods are measured against custom benchmarks that reflect their different condition risks, the methodology produces performance scores for each period separately.
- The practical realities of benefits administration make it difficult to evaluate health interventions with randomized controlled trials, and the inherent drawbacks of typical pre / post analysis make it difficult to draw sound conclusions.
- Most targeted health solutions are able to report on clinical and satisfaction outcomes, but financial outcomes remain elusive.
- Using customized, person-level benchmarking to analyze the timeframe during which an intervention was active avoids many of the pitfalls of pre / post analysis design.
Want to evaluate the efficacy of your health programs this year? Learn how the Accorded Platform is supercharging benefits teams and their consultants with the ROI analysis they need for renewal season.
This article is a summary of efficacy analysis frameworks and an introduction to Accorded’s person-level (“bottoms up”) approach.
[1] Hariton E, Locascio JJ. Randomised controlled trials - the gold standard for effectiveness research: Study design: randomised controlled trials. BJOG. 2018 Dec;125(13):1716. doi: 10.1111/1471-0528.15199. Epub 2018 Jun 19. PMID: 29916205; PMCID: PMC6235704.
[2] Guo, C., Ashrafian, H., Ghafur, S. et al. Challenges for the evaluation of digital health solutions—A call for innovative evidence generation approaches. npj Digit. Med. 3, 110 (2020). https://doi.org/10.1038/s41746-020-00314-2
[3] Tofthagen C. Threats to validity in retrospective studies. J Adv Pract Oncol. 2012 May;3(3):181-3. PMID: 25031944; PMCID: PMC4093311.