- On many metrics, US health measures are inferior to those of other countries, despite substantial US spending on health care.
- OECD mortality metrics aren't adequate indicators of cross-country health status differences or health care system efficiency.
- More carefully structured methodology is necessary to accurately identify cross-country health differences.
The United States spends substantially more on health care per capita than other developed countries. Based on comparison data of health status, the Organisation for Economic Co-operation and Development (OECD) published a report on health system performance, finding that the US system does not perform better than systems in countries that spend less. On many measures, US health status is inferior to that of other countries. We find that these cross-country comparisons cannot adequately separate health system performance from the other, confounding factors that determine health. In this Outlook, we provide a comprehensive critique of the OECD report and suggest several ways to strengthen the analysis. These include improving the accuracy of infant mortality rates, employing life expectancy and premature mortality measures that are less sensitive to external factors, improving controls for external elements, and distinguishing between country-specific differences in health status and countries’ health care system efficiency.
Key points in this Outlook:
- The United States spends substantially more per capita on health care than other developed countries, yet commonly cited reports indicate that the United States does not have superior health system performance.
- The Organisation for Economic Co-operation and Development (OECD) uses mortality metrics to measure health care system performance, but these data do not adequately indicate health status differences and do not accurately judge health care system efficiency.
- The OECD and other researchers must adjust their methods for measuring infant mortality, life expectancy, and premature mortality and control for confounding factors such as lifestyle to give a more accurate picture of health system performance.
The United States spends substantially more on health care as a percentage of gross domestic product (GDP) than other developed countries. In 2010, US health care spending amounted to 17.9 percent of GDP, which worked out to $8,402 per person. On the unadjusted measures customarily used to assess population health, US results are not better than those of countries that spend less, and on many of these measures, US outcomes are inferior.
This raises the question of whether the US health care system is inefficient. The primary source of comparison data on health outcomes is the Organisation for Economic Co-operation and Development’s (OECD) health system performance data and reports. This information is used to support broad criticisms of the US health care system and to compare it unfavorably with others, particularly the state-operated or state-controlled systems of Europe. Illustrations of such critiques include assessments by Washington Post columnist Richard Cohen and the Commonwealth Fund.
Using these health comparison data, the OECD Economics Department issued a major report in 2008, henceforth referred to as “the OECD report.” More recently, the OECD issued an expansion of the report, which is primarily based on the same underlying empirical analysis and was written by some of the same authors as the earlier report.
"The combination of higher delivery costs because of greater NICU use and the unique way the United States counts live briths could lead one to erroneously conclude that the United States is highly inefficient compared to other industrialized nations."
This Outlook offers a brief critical assessment of international health system performance metrics. We will focus on three statistics that the OECD delves into in its report: infant mortality, life expectancy, and premature death. The strengths and weaknesses of these measures are illuminated through brief examples that ultimately demonstrate that the measures do not reflect the efficiency of any country’s health system. Given that organizations such as the OECD continually try to evaluate countries’ health systems, US policymakers and analysts must understand the limitations of such exercises. We conclude with suggested changes in approach and a road map for improved research.
Before describing the key metrics for international comparison, it is useful to recall the relatively recent origin of international health statistics. The OECD was created in 1948 as the Organisation for European Economic Co-operation (OEEC) to administer funds made available by the US Marshall Plan for the reconstruction of Europe after World War II. Later, the OEEC’s membership was extended beyond Europe. In 1961, it was reformed into the OECD. Today, its members are thirty-four developed countries.
Over the last three decades, the OECD has published a set of international health statistics based on data supplied by member countries. The data are collected and collated by the Health Division within the Directorate for Employment, Labor, and Social Affairs.
Health Status Metrics
A common misconception is that people value health care in and of itself. In reality, people value the improved health status that they hope to gain from receiving health care. Indeed, using most health care is unpleasant. Health status is not directly measurable; it can only be approximated through related factors that can be measured.
The OECD report focused on observable measures as proxies of health status to provide comparative statistics. A depressing reality is that these observable measures are all some derivative of mortality. The OECD expects all its member states to provide death registers as part of a planned, one-hundred-year public health mission to identify sources of death and time of death to track epidemiological emergencies such as those resulting from infectious diseases. In the service of OECD, mortality metrics are outcome measures that are meant to proxy health status and the output of health care systems, rather than the consumption of health services.
The OECD uses infant mortality, life expectancy, and premature death as measures of mortality in its report. The validity of each of these measures as a proxy for health system performance is examined below.
Infant Mortality. There are three overlapping OECD infant mortality measures: infant, neonatal, and perinatal mortality. Infant mortality is the number of deaths in the first year per one thousand live births. Neonatal mortality is the number of deaths in the first twenty-eight days per one thousand live births. Perinatal mortality is the number of deaths in the first week after birth, plus fetal deaths after twenty-eight weeks of gestation or fetuses that exceed a weight of one thousand grams.
Partly based on an argument by Nixon and Ulmann, the OECD report states that these infant mortality measures are less influenced by factors unrelated to the health care system than are other possible measures. However, we believe that the opposite is true. One major concern is that the basic definitions of infant mortality are not consistent across countries.
For example, babies who are not viable and who die quickly after birth are more likely to be classified as stillbirths in countries outside the United States, especially in Japan, Sweden, Norway, Ireland, the Netherlands, and France. This is especially likely for babies who die before their birth is legally registered. In the United States, however, nonviable births are often recorded as live births, making the US infant mortality rate appear misleadingly high. In a detailed study of medical records and birth and death certificates in Philadelphia, Gibson and colleagues found that infant mortality had been overstated by 40 percent, merely as a result of these nonviable births that were recorded as live births.
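The effect of the classification rule on the measured rate can be illustrated with a minimal sketch. All counts below are hypothetical and chosen only for illustration; they are not taken from the OECD data or from the Philadelphia study. The function name `rate_per_thousand` is likewise an assumption of this sketch.

```python
def rate_per_thousand(deaths: int, live_births: int) -> float:
    """Deaths per 1,000 live births, the form used for infant mortality."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000.0 * deaths / live_births


# Hypothetical counts for one country-year (not real data).
live_births = 100_000
infant_deaths = 600       # deaths in the first year of life
nonviable_births = 150    # nonviable babies who died shortly after birth

# Convention A: nonviable births are recorded as live births and as infant deaths.
rate_counted_as_live = rate_per_thousand(infant_deaths, live_births)

# Convention B: the same babies are recorded as stillbirths, so they drop out
# of both the numerator and the denominator.
rate_counted_as_stillbirths = rate_per_thousand(
    infant_deaths - nonviable_births, live_births - nonviable_births
)

print(f"{rate_counted_as_live:.2f} vs {rate_counted_as_stillbirths:.2f} per 1,000")
```

With these made-up counts, the same underlying events yield a rate of about 6.0 under one recording convention and about 4.5 under the other, without any difference in the care delivered.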
There is another problem with using infant mortality to represent health care efficacy. US physicians often go to great efforts—at the prenatal and postnatal stages—to save a baby with poor survival chances. The additional prenatal care an American doctor provides may improve the odds of the live birth of a baby with poor survival chances, who is then likely to require extensive neonatal care. Accordingly, the US uses substantially more neonatal intensive care units (NICU) than other industrialized countries. In this case, the additional health care may actually worsen reported infant mortality rates and misleadingly suggest poor care in the United States. Similarly, US physicians are more likely to resuscitate very small premature babies, many of whom nevertheless die and many others of whom live with serious and expensive medical problems. This practice also raises measured infant mortality rates for the United States.
The combination of higher delivery costs because of greater NICU use and the unique way the United States counts live births could lead one to erroneously conclude that the United States is highly inefficient compared to other industrialized nations. Furthermore, infant mortality is strongly and immediately affected by external influences such as the mother’s age, behavior, and lifestyle (meaning factors such as obesity and use of tobacco, alcohol, and illicit drugs). Infant mortality is strongly linked to birth weight and gestational age, which are highly, but not perfectly, correlated. Indeed, the correlation is high enough that researchers will often use one or the other measure according to convenience. In any case, both measures are largely a result of parental lifestyles.
Teenage mothers are more likely to have preterm, low-birth-weight babies. The mortality rate for infants born to US teenage mothers is 1.5 to 3.5 times as high as the rate for infants born to mothers ages twenty-five to twenty-nine. The US rate of births for teenage mothers is very high—2.8 times that of Canada and 7.0 times that of Sweden and Japan. If the United States had the same birth weights as Canada, its infant mortality rate—adjusting for this variable alone—would be slightly lower than Canada’s (5.4 versus 5.5 per one thousand births).
Turning to gestational age, MacDorman and Mathews calculate that if the United States had the same distribution of gestational ages as Sweden, its recorded infant mortality rate would drop by 33 percent, tying it with France as the fifth lowest rate out of twenty-one developed countries. Moreover, in the United States, mortality rates for infants born to unwed mothers were about twice as high as for infants born to married women.
Overall, these lifestyle and socioeconomic factors may reflect poorly on some aspects of society in the United States in comparison to other countries. It is inappropriate, however, to conclude that the root cause is the US health care system rather than societal factors in a dynamic heterogeneous society. Infant mortality is a particularly misleading metric by which to grade country-specific health system performance and to make international comparisons.
"A further limitation of using potential years of life lost as a mortality measurement is that many deaths are caused by other external factors--such as obesity and pollution--which are disguised by the disease they cause."
Life Expectancy. In the abstract, life expectancy (LE) could be an effective metric for comparing international health systems. But there are problems with this measure. One important flaw is that it incorporates infant mortality, which, as discussed above, is confounded by external factors and is not identically measured across all countries covered in the OECD report.
Our main concern is the dependency of LE upon which benchmark age is used. For example, LE can be measured at birth or at older ages such as at the age of forty, sixty, or sixty-five. The OECD uses LE at birth. But LE at older ages is less affected by the measurement, lifestyle, and cultural problems inherent in infant mortality and in LE at birth. Measurement errors and definitional differences related to infant mortality do not directly affect LE at later ages.
Thus, the measurement errors and lifestyle and cultural influences that affect the infant mortality measure are directly imported into LE calculations. In a comparative study of the United States, the United Kingdom, and Germany, Martin Neil Baily and Alan Garber conclude:
Neonatal mortality is heavily influenced by social and economic factors, along with individual health behaviors, that are not strongly related to health care delivery. Overall life expectancy at birth, then, may be an unsuitable measure of health outcomes for the purpose of measuring productivity of health services.
As a result of the problems with infant mortality (as well as mortality due to violence and accidents), the difference between US life expectancy and that of other countries is reduced at later ages. This is demonstrated in empirical studies of the production of health, including in the OECD report itself and also in the raw data. For example, in 2000, female life expectancy at birth was 79.3 years in the United States, 80.3 in the United Kingdom and 81.2 in Germany. Female life expectancy at sixty-five was 19.0 years in the United States, 19.0 years in the United Kingdom and 19.6 years in Germany. The differences decline from 1.0 and 1.9 to 0.0 and 0.6.
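The narrowing of the gap can be checked directly from the figures quoted above. This is a small sketch of the arithmetic only; the helper name `gap_vs_us` is an assumption of the sketch.

```python
# Female life expectancy in 2000, as quoted in the text (years).
le_at_birth = {"United States": 79.3, "United Kingdom": 80.3, "Germany": 81.2}
le_at_65 = {"United States": 19.0, "United Kingdom": 19.0, "Germany": 19.6}


def gap_vs_us(table: dict) -> dict:
    """Each country's life expectancy minus the US value, rounded to 0.1 year."""
    us = table["United States"]
    return {
        country: round(value - us, 1)
        for country, value in table.items()
        if country != "United States"
    }


gap_birth = gap_vs_us(le_at_birth)
gap_65 = gap_vs_us(le_at_65)
print(gap_birth)
print(gap_65)
```

The two dictionaries reproduce the differences stated in the text: 1.0 and 1.9 years at birth, against 0.0 and 0.6 years at age sixty-five.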
Premature Mortality. Premature mortality, which is determined by potential years of life lost (PYLL), is a useful measure if appropriately calculated, though it is also strongly influenced by infant mortality. One advantage—stressed by the OECD—is that PYLL can be linked to cause of death. Since PYLL is calculated from deaths that occur before the defined full life (seventy years in the OECD report), one can include or exclude deaths based on their specific causes. This allows the analyst to reduce, but not eliminate, the confounding of some external causes with health care inputs and with country-specific effects. Oddly, the OECD does not use PYLL measurements for cross-country comparisons.
One can calculate PYLL numbers for categories of diseases that are more related to health care and analyze the effect of the health care system and other variables on PYLL by those categories. Miller and Frech have done this for the respiratory, circulatory, and cancer categories and Or, Wang, and Jamison have done the same for heart disease.
With this in mind, the OECD states that adjustments to PYLL numbers were made in one area, namely to exclude transport accidents, accidental falls, assaults, and suicides. However, while the OECD performs some analyses with these PYLL number adjustments, it does not do so for the country-specific analyses.
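A minimal sketch of the PYLL calculation, assuming a simple unweighted sum over individual deaths, shows how excluding causes changes the total. Published PYLL figures are normally expressed per 100,000 population and age-standardized, which this sketch omits. The death records, the cause labels, and the names `Death` and `pyll` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Death(NamedTuple):
    age: float   # age at death, in years
    cause: str   # recorded cause of death


def pyll(deaths: Iterable[Death], limit_age: float = 70.0,
         excluded_causes: frozenset = frozenset()) -> float:
    """Sum of (limit_age - age) over deaths before limit_age, skipping excluded causes."""
    return sum(
        limit_age - d.age
        for d in deaths
        if d.age < limit_age and d.cause not in excluded_causes
    )


# Hypothetical records (not real data).
records = [
    Death(0.5, "perinatal condition"),
    Death(17.0, "transport accident"),
    Death(45.0, "circulatory disease"),
    Death(68.0, "cancer"),
    Death(82.0, "circulatory disease"),  # after age 70, contributes nothing
]

# The external causes the text says were excluded in the adjusted measure.
EXTERNAL = frozenset({"transport accident", "accidental fall", "assault", "suicide"})

pyll_all = pyll(records)
pyll_adjusted = pyll(records, excluded_causes=EXTERNAL)
print(pyll_all, pyll_adjusted)
```

In this toy example the single infant death accounts for 69.5 of the 149.5 total years lost, which illustrates why PYLL is so sensitive to how infant deaths are recorded, and the death at age eighty-two does not register at all.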
Though helpful, moreover, adjustments of PYLL numbers are not perfect. Accident and assault victims use health care resources, especially if they do not die quickly. But the costs associated with this care cannot be accounted for.
A further limitation of using PYLL as a mortality measurement is that many deaths are caused by other external factors—such as obesity and pollution—which are disguised by the disease they cause (respectively, circulatory and respiratory disease). PYLL cannot be adjusted to reflect these factors; the mediating disease, not the underlying external cause, will be recorded as the cause of death.
In the OECD report, the maximum age at which to establish PYLL is seventy. Thus, the costs and success (or lack of success) of a health care system in extending life and the quality of life beyond age seventy are not reflected. The authors of the report recognize that this is a weakness of this measure. The costs of this care for consumers ages seventy years or more are reflected in the OECD expenditure data, but the health outcomes are not reflected in the PYLL measure.
Accounting for Quality of Life
Mortality data are an inadequate proxy for health system performance for another reason: they measure years of life, but do not reflect the quality of that life. Mortality measures need to be adjusted to give a better picture of health status. The common terms for these adjusted measures are quality-adjusted LE (QALE), disability-adjusted LE (DALE), and health-adjusted LE (HALE). These adjustments depend on the values of the individual consumers and thus differ person by person. In practice, surveys of consumers or experts (typically panels of physicians) are used to find average weights to be applied in research. In some surveys, for instance, a year spent with a migraine headache is considered to be an indicator of very low quality of life and is counted as equivalent to only a month of healthy time; the year with the migraine would be weighted at one-twelfth, or 0.083 of a healthy year.
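The weighting can be sketched as a sum of years multiplied by quality weights. The migraine weight of one-twelfth comes from the example above; the ten healthy years, the convention that a weight of 1.0 means full health, and the function name `quality_adjusted_years` are assumptions made for illustration.

```python
def quality_adjusted_years(spells) -> float:
    """Sum of years times quality weight (1.0 = full health, 0.0 = no value)."""
    total = 0.0
    for years, weight in spells:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("quality weights must lie between 0 and 1")
        total += years * weight
    return total


MIGRAINE_WEIGHT = 1 / 12  # a migraine year counted as one healthy month

# Hypothetical history: ten healthy years followed by one year with migraine.
spells = [(10.0, 1.0), (1.0, MIGRAINE_WEIGHT)]

raw_years = sum(years for years, _ in spells)
adjusted_years = quality_adjusted_years(spells)
print(f"raw: {raw_years:.3f}, quality-adjusted: {adjusted_years:.3f}")
```

An unadjusted measure counts eleven years here, while the weighted measure counts about 10.083, so a treatment that removed the migraine would register only under the adjusted measure.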
The OECD report, however, treats all years of life as the same, regardless of health status. HALE is discussed, but not used. The OECD report sticks with raw LE—rather than using quality-adjusted versions—because of the wider availability of unadjusted LE data, but at the expense of conceptual accuracy. As a result, the OECD report attributes no value to expenditures that permit people to enjoy a better life by, for example, being able to work or to be functional longer; it correlates expenditures only with mortality. Thus, money spent on knee replacements, for instance, would appear to be inefficient in that it does not decrease mortality, despite the obvious advantages of improved mobility and prevention of falls. Therefore, it is difficult to see mortality alone as an accurate measure of health system efficiency.
A Road Map for Improvement
We propose some improvements for future research of this kind, beginning with infant mortality. Infant mortality seems to be the least accurate measure of health status because it is most heavily influenced by factors external to the health care system. However, many of those external factors could be addressed by controlling for birth weight and gestational age. Keeping birth weight and gestational age constant would eliminate some of the confounding effects of lifestyle and other influences. The result of doing so is dramatic, as we have seen. One could form an index by picking a distribution of weights to multiply by the birth-weight-specific infant mortality rates.
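One way to read the proposed index is as a direct standardization: each country's birth-weight-specific mortality rates are applied to a common distribution of birth weights. The sketch below uses two birth-weight categories and entirely hypothetical rates and shares for two unnamed countries; the function name `weighted_rate` is also an assumption.

```python
def weighted_rate(rates: dict, shares: dict) -> float:
    """Category-specific rates (per 1,000 births) averaged with the given birth shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(rates[category] * shares[category] for category in shares)


# Hypothetical deaths per 1,000 births within each birth-weight category.
rates_a = {"low": 50.0, "normal": 2.0}
rates_b = {"low": 55.0, "normal": 2.2}

# Hypothetical shares of births in each category.
shares_a = {"low": 0.10, "normal": 0.90}
shares_b = {"low": 0.05, "normal": 0.95}

crude_a = weighted_rate(rates_a, shares_a)   # country A, own birth weights
crude_b = weighted_rate(rates_b, shares_b)   # country B, own birth weights

# Index for A: A's category-specific rates on B's birth-weight distribution.
standardized_a = weighted_rate(rates_a, shares_b)

print(f"crude A {crude_a:.2f}, crude B {crude_b:.2f}, standardized A {standardized_a:.2f}")
```

In this made-up case, country A has the higher crude rate (6.80 versus 4.84) only because it has more low-birth-weight births. Once both countries are evaluated on the same distribution, A's rate (4.40) is the lower of the two.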
LE at birth and PYLL numbers are at risk of being seriously flawed because of infant mortality miscalculations. Considering a version of PYLL that excludes most of the causes of death that affect infants would decrease this risk.
One can somewhat reduce the problem of confounding variables by focusing on LE at later ages. As discussed (and contrary to the assertions of the OECD report), infant mortality is highly influenced by external factors and by definitional and measurement problems. LE at later ages—such as at forty, sixty, or sixty-five—eliminates the people who have died before the selected ages. Furthermore, many of the lifestyle choices that lead to bad health outcomes are more heavily concentrated among younger consumers and affect LE more at younger ages. For example, in 2003-2005, annual US motor vehicle deaths peaked at 33 per 100,000 people at age seventeen. This peak was a maximum statistic that was not reached at any subsequent age.
Similarly, the all-injury death rate has an early peak at age eighteen. After that, the all-injury death rate does not catch up to that level until age seventy-five. Using LE at birth fails to adjust for these factors and incorrectly lowers the apparent efficiency of the US health care system.
Accidental and violent deaths need to be excluded from PYLL measurements in making country-level comparisons. The OECD pursues this to some extent by excluding certain accidental and violent deaths from its measurements. But since the PYLL results for country-level efficiency are not reported, the result of adjusting for these causes of death is not reflected. The country-level analysis is entirely in terms of LE and infant mortality, which have questionable validity.
"It is overreaching to interpret country-specific variation in health outcomes as a measure of health care system productivity."
Finally, since morbidity is so important, it would also be relevant to use a measure of quality-adjusted or disability-adjusted LE. This change would be a major contribution to the cross-country health status comparisons.
The OECD report raises important questions on how to determine the efficiency of health care in producing positive health outcomes and how to compare and contrast efficiency of systems among different countries. The OECD staff concludes that health care is highly productive in improving health outcomes and that efficiency varies greatly across countries. It provides country-specific estimates of that efficiency.
Unfortunately, major problems in the OECD’s analysis render its conclusions, especially the country-specific conclusions, unreliable. Many external factors that influence health outcomes are either omitted or poorly measured. The net effect is to underweight the role that non-health care factors play in determining health. And since the United States scores relatively poorly on most of these external measures, omitting them or not adequately controlling for them increases the apparent relative inefficiency of the US health care system and probably biases the estimated productivity of health care as well. The OECD report controls to a limited extent for some lifestyle differences by gross measures (for example, consumption of alcohol, tobacco, fruits, and vegetables). It adjusts one health measure, PYLL, for violence and accidents, but does not use that measure for country-specific efficiency numbers. As explained above, we believe that these controls and adjustments are inadequate.
It is overreaching to interpret country-specific variation in health outcomes as a measure of health care system productivity. In reality, the country-specific estimates reflect all differences in country-level influences, whatever their source and measurement issues. As econometrician William Greene stated in a similar context, there are considerable differences among countries that masquerade as inefficiency. More carefully calibrated research is necessary to identify these differences.
H.E. Frech III is an adjunct scholar at AEI and a professor in the Department of Economics at the University of California, Santa Barbara; Stephen T. Parente is an adjunct scholar at AEI and a professor in the Department of Finance at the Carlson School of Management at the University of Minnesota; and John S. Hoff is a visiting scholar at AEI and was health attaché to the US mission to the Organisation for Economic Co-operation and Development, 2005-2009.
1. Anne B. Martin et al., “Growth in US Health Spending Remained Low in 2010; Health Share Of Gross Domestic Product Was Unchanged From 2009,” Health Affairs 31, no. 1 (January 2012): 210.
2. Richard Cohen, “Boehner’s Health Delusion,” Washington Post, November 9, 2010, www.washingtonpost.com/wp-dyn/content/article/2010/11/08/AR2010110804894.html (accessed May 24, 2012) and Karen Davis, Cathy Schoen, and Kristof Stremikis, How the Performance of the U.S. Health Care System Compares Internationally: 2010 Update (New York: The Commonwealth Fund, June 2010).
3. Isabelle Joumard et al., “Health Status Determinants: Lifestyle, Environment, Health Care Resources and Efficiency” (working paper, Economics Department, OECD Publishing, France, 2008).
4. Isabelle Joumard et al., Health Care Systems: Efficiency and Policy Settings (Paris, France: OECD Publishing, 2010).
5. OECD homepage, www.oecd.org/document/58/0,3746,en_2649_201185_1876671_1_1_1_1,00&&en-USS_01DBC.html (accessed June 15, 2012).
6. Joumard et al., “Health Status Determinants,” 7.
7. Ibid., 47-48.
8. John Nixon and Philippe Ulmann, “The Relationship between Health Care Expenditure and Health Outcomes: Evidence and Caveats for a Causal Link,” European Journal of Health Economics 7 (2006): 7–18 and Joumard et al., “Health Status Determinants,” 8.
9. Korbin Liu et al., “International Infant Mortality Rankings: A Look Behind the Numbers,” Health Care Financing Review 13, no. 4 (Summer 1992): 3; Kramer et al., “Registration Artifacts in International Comparisons of Infant Mortality,” Pediatric & Perinatal Epidemiology 16, no. 1 (January 2002): 16; and Marian F. MacDorman and T. J. Mathews, Behind International Rankings of Infant Mortality: How the United States Compares with Europe (Hyattsville, MD: National Center for Health Statistics, 2009): 2.
10. Eric Gibson et al., “Effect of Nonviable Infants on the Infant Mortality Rate in Philadelphia, 1992,” American Journal of Public Health 90, no. 8 (August 2000): 1303. For more on the measurement problems involved in infant mortality, see Joumard et al., “Health Status Determinants,” 47-49; Kramer et al., “Registration Artifacts in International Comparisons of Infant Mortality;” and H. E. Frech III and Richard D. Miller Jr., The Productivity of Health Care and Pharmaceuticals: An International Comparison (Washington, DC: American Enterprise Institute, 1999), 28–29. In a comparison reported by Korbin Liu and Marilyn Moon, a small change in definition (combining infant mortality and stillbirths) moved the United States from eighteenth to fifteenth in infant mortality rate rankings and moved Japan from first to third. See Liu et al., “International Infant Mortality Rankings,” 109. While life expectancy is probably better measured than infant mortality in rich countries, this relationship reverses in poor countries. In those countries, life expectancy is generally derived from infant mortality applied to model life tables, not any actual count of age-specific mortality. See Lant Pritchett and Lawrence H. Summers, “Wealthier is Healthier,” Journal of Human Resources 31, no. 4 (1996): 858–59.
11. Liu et al., “International Infant Mortality Rankings,” 113 and David M. O’Neil and June E. O’Neil, “Health Status, Health Care and Inequality: Canada vs. the U.S.,” Forum for Health Economics & Policy 10, no. 1 (2008): 8–12.
12. Liu et al., “International Infant Mortality Rankings,” 112.
13. O’Neil and O’Neil, “Health Status, Health Care and Inequality,” 10. For a more detailed analysis using slightly older data that makes similar calculations in comparisons to many other countries, see Liu et al., “International Infant Mortality Rankings.”
14. MacDorman and Mathews, Behind International Rankings of Infant Mortality, 3–5.
15. This relationship has likely weakened since the 1980s. Further, it is probably weaker in Europe, where unmarried fathers more often live with their children: Liu et al., “International Infant Mortality Rankings,” 112.
16. Martin Neil Baily and Alan M. Garber, “Health Care Productivity,” Brookings Papers on Economic Activity (1997): 143-215; 188.
17. The Organisation for Economic Co-operation and Development, “OECD Health Data 2011,” updated November 2011, www.oecd.org/document/16/0,3746,en_2649_37407_2085200_1_1_1_37407,00.html (accessed June 18, 2012).
18. While it is possible to adjust for cause of death, such an adjustment is not ideal for this study. For example, if more people with generally risky lifestyles die from accidents, the survivors have better than average lifestyles. Furthermore, as mentioned in the text, risky lifestyles directly raise health care use as well as PYLL.
19. Richard D. Miller Jr. and H. E. Frech III, Health Care Matters: Pharmaceuticals, Obesity and the Quality of Life (Washington, DC: AEI Press, 2004); Zeynep Or, Jia Wang, and Dean Jamison, “International Differences in the Impact of Doctors on Health: A Multilevel Analysis of OECD Countries,” Journal of Health Economics 24 (2005): 545.
20. Joumard et al., “Health Status Determinants,” 8.
21. HALE is an extreme method for adjusting life expectancy for quality of life. It totally discounts years lived in poor health. Also, one can define quality-adjusted life years as well as the quality-adjusted life expectancies used in the text.
22. Miller Jr. and Frech III, Health Care Matters, 20–21.
23. Gwen Bergen et al., Injury in the United States: 2007 Chartbook (Hyattsville, MD: National Center for Health Statistics, 2008): 16; 18.
24. Joumard et al., “Health Status Determinants,” 32–38; 69–72.
25. William Greene, “Distinguishing between Heterogeneity and Inefficiency: Stochastic Frontier Analysis of the World Health Organization’s Panel Data on National Health Care Systems,” Health Economics 13, no. 10 (2004): 959.