ASSESSING DATA QUALITY
This article is about data quality, and its goal is to help you learn to critically assess the quality of public health data. Two estimates of infant mortality rates are used as an example. The first section offers some basics on public health data. The second expands the use of infant mortality rates as a teaching example, asking you to analyze the difference between two datasets. The final section looks at the consequences of these differences and at the use of infant mortality rates to assess population health.
Infant mortality rate: Understanding a vital statistic
We see plenty of numbers for health status indicators from reputable sources, but what do these numbers mean? Why are these the data chosen as worthy of attention, and what do they really indicate? How do numbers from one source compare with those from another source, and what does this mean for policy considerations? This exercise uses a comparison of infant mortality rates reported by the World Health Organization (WHO) and the Demographic and Health Surveys (DHS) as a tool to examine these questions. This interactive exercise is intended to help you find and critically evaluate public health data.
Data as indicators
The Association of Schools of Public Health defines the field of public health as “the science of protecting and improving the health of communities through education, promotion of healthy lifestyles, and research for disease and injury prevention.” In short, the goal of public health is to improve the health and well-being of people – at the population rather than the individual level. Thus, measurement of health focuses on population health, which is defined as “the health outcomes of a group of individuals, including the distribution of such outcomes within the group.” Study of population health includes “health outcomes, patterns of health determinants, and policies and interventions that link these two.”1

One initiative to advocate for improved health and well-being at the population level is the Millennium Declaration. Efforts to quantify the ideas laid out in the declaration created eight Millennium Development Goals, or MDGs. But how to measure progress toward these goals? Sixty indicators were chosen. Selection of these indicators was hotly debated, and as Richard Alderslade frequently says, we live under the “tyranny of the easily counted.” Nevertheless, the MDG indicators are generally described as reasonably valid measures of that which they are supposed to count. Progress toward achieving the MDGs, as measured by the indicators, is actively tracked and assessed. For this exercise, we will be using data gathered to measure Goal 4: “Reduce child mortality,” which has the target of “Reduce by two-thirds, between 1990 and 2015, the under-five mortality rate.” The indicators chosen to measure progress toward this target are 1) under-five mortality rate, 2) infant mortality rate, and 3) the proportion of 1-year-old children immunized against measles. (See pages 30 to 36 of the United Nations Development Group’s (UNDG) Indicators for Monitoring the Millennium Development Goals for comments on MDG 4’s indicators.) The data we use in this exercise to teach assessment of public health data is for the second MDG 4 indicator: infant mortality.
Data definitions
Why do children die?
How important a cause of child mortality is HIV/AIDS?
What is an infant mortality rate (IMR)? When we see a number for this, what does it mean? What is the numerator? What is the denominator? How are these data gathered?
Two datasets with records of infant mortality rates are those of the World Health Organization (WHO) and the Demographic and Health Surveys (DHS). Let’s start this investigation by comparing their definitions of infant mortality.
How does DHS define infant mortality? How does the WHO? Are these the same?
Why are infant mortality rates a “probability”? DHS explains:
A vital statistics approach in which the numbers of deaths to children under age 12 months in a particular period are divided by the numbers of births in the same period. What is estimated is a rate of mortality but not a probability; a variation in the number of births with time will change the rate without changes in the underlying probabilities. To correct this, separation factors would need to be used, which would have to come from the other variants.
The UN explanation described in their “methods of computation” is also useful.
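The distinction DHS draws between a rate and a probability can be illustrated with a small sketch. Everything numeric here is hypothetical: the underlying probability `q`, the birth counts, and especially `same_year_share`, an assumed stand-in for the separation factors DHS mentions (the share of a cohort's infant deaths that fall in the calendar year of birth). It is not the DHS or WHO procedure, only a toy model of why the period rate can move when births change.

```python
def period_infant_mortality_rate(infant_deaths, live_births):
    """Deaths under 12 months in a period per 1000 live births in that period."""
    return 1000 * infant_deaths / live_births


def deaths_in_year(q, births_this_year, births_last_year, same_year_share=0.5):
    """Infant deaths occurring in one calendar year (toy model).

    q is the underlying probability of dying before age 1.
    same_year_share is an ASSUMED fraction of a cohort's infant deaths that
    occur in the cohort's year of birth; the rest occur the following year.
    """
    from_this_cohort = q * births_this_year * same_year_share
    from_last_cohort = q * births_last_year * (1 - same_year_share)
    return from_this_cohort + from_last_cohort


q = 0.05  # hypothetical probability: 50 deaths per 1000 live births

# Stable births: the period rate matches the underlying probability.
stable = period_infant_mortality_rate(deaths_in_year(q, 1000, 1000), 1000)

# Births double from one year to the next: same probability, lower rate.
growing = period_infant_mortality_rate(deaths_in_year(q, 2000, 1000), 2000)

print(round(stable, 1), round(growing, 1))  # 50.0 37.5
```

In the second case nothing about a child's chance of survival has changed, yet the rate drops because this year's larger birth count sits in the denominator while some of the deaths belong to last year's smaller cohort.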
Regardless, the DHS and WHO definitions look to be similar enough to be considered the same, right? But are they in fact measuring the same thing?
To answer this question, let’s first look at the definition of the numerator and denominator for their probability equations. Find the WHO definition of the numerator for infant mortality. How do they define the numerator? The denominator?
Now do the same for the DHS. How do they define what is included in the numerator? How do they define their denominator?
Notice that DHS uses the WHO definition of live birth, but there are some differences between the two agencies’ definitions.
Now, how is the DHS ratio expressed? The WHO’s?
Are their ratios the same? Notice again that, while both are legitimate statistical approaches for deciding what to include and how to make these calculations, there are some differences. These differences come not so much from a stand on the methods used to make probability calculations as from the messiness usually associated with data collection and the need to adjust the results for greater accuracy than actual counts will produce.
Data collection
The UNDG Indicators for Monitoring the Millennium Development Goals offers this comment on where to get reliable data: “The best source of data is a complete vital statistics registration system-one covering at least 90 per cent of vital events in the population.” They go on to say that this kind of data is generally not available in low- or even middle-income countries. When this is the case, estimates are obtained “from sample surveys or derived by applying direct and indirect estimation techniques to registration, census or survey data” (Page 33).
When consistently, thoroughly and accurately gathered, vital statistics are the “gold standard” because they are not estimates but are a record of actual events. In other words, vital statistics record all births and deaths, and the records often include a cause of death along with some demographic information.
Large, carefully done surveys, on the other hand, can also offer considerable reliability. (Sample size is particularly important when gathering IMR data because an infant death is a relatively rare event (UNDG, page 33).) The advantage of a survey over a vital statistics system is that surveys can gather additional, key data, linking household and community data to the numbers of births and deaths. The additional information permits analysis of both physical and social determinants of health. The drawback of surveys is that even when they are able to create a sample that accurately reflects the spectrum of people within the entire population (something difficult to accomplish), surveys rely upon an individual’s memory and understanding of events. These are often flawed.
Thus, the ideal method of data collection is to make use of vital statistics and supplement these with large, representative surveys - confirming and adjusting the results of each to produce an even more accurate and reliable estimate that includes data on health determinants.2, 3
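To see why sample size matters so much for a rare event, here is a minimal sketch of the sampling error of an estimated IMR. It assumes simple random sampling of births and a plain binomial standard error, which real household surveys do not satisfy (they use clustered designs, so actual uncertainty is usually larger). The function names, the IMR of 50 and the sample sizes are illustrative assumptions.

```python
import math


def imr_standard_error(imr_per_1000, n_births):
    """Binomial standard error of an IMR estimate, in deaths per 1000 live births.

    ASSUMES simple random sampling of n_births; ignores survey design effects.
    """
    p = imr_per_1000 / 1000
    return 1000 * math.sqrt(p * (1 - p) / n_births)


def approx_95_interval(imr_per_1000, n_births):
    """Rough 95% interval using the normal approximation (estimate +/- 1.96 SE)."""
    se = imr_standard_error(imr_per_1000, n_births)
    return imr_per_1000 - 1.96 * se, imr_per_1000 + 1.96 * se


# Hypothetical true IMR of 50 per 1000, surveyed with different numbers of births.
for n in (500, 5000, 50000):
    low, high = approx_95_interval(50, n)
    print(f"{n:>6} births: roughly {low:.1f} to {high:.1f} per 1000")
```

Because the standard error shrinks only with the square root of the sample size, quadrupling the number of births surveyed merely halves the uncertainty.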
So, how do the WHO and DHS collect their data? What are their methods?
Do the differences in data collection methods produce different results?
COMPARING THE DATA
To answer whether the data are different, we must look at the data itself, which means downloading it from the agencies’ websites and doing some basic statistical comparisons. The first step is to access WHO infant mortality data. Use the WHOSIS database (available on the WHO website) to select all countries, and download the data as a comma separated values file.
If you need help with downloading from WHOSIS, here are the steps: Go to the WHOSIS detailed database search. For this exercise, “Select all countries” for the regions/countries; the indicator is “Infant mortality rate (per 1000 live births) for both sexes”; and the time period is “Select all time periods.” Next, create the table. The table will display values for the years 1990, 2000 and 2006. To download the data as comma separated values, select the “Export (.csv)” link.4
Now do the same for the DHS statistics. Go to the DHS STATcompiler to build a customized table. First, select “All surveys” from the surveys available. To select the indicator “Infant mortality,” click on “Early childhood mortality,” then “Infant and child mortality,” and next “Infant and child mortality (5 year rates).” A box will pop up. In this box, check “Infant mortality (1q0)” and “All values,” and click the select button. Finally, click the button labeled “Create Table.” When the table comes up, select the button “Sort - Survey,” and then “Export to Excel.”
What does the data reveal? Which countries have the highest rates of infant mortality in 1990? Which have the lowest rates? What about in 2005? What might be some reasons for the improvement in some countries and the deterioration in others?

Now compare results based on the different data sets. What do you see? A quick glance tells you that these two datasets are not immediately comparable. What do you see that makes these two lists different? Try merging some of the data to get a comparison of relative equals. What do you see? Do any countries change position in their overall ranking?
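One possible way to merge the two downloads is sketched below. WHOSIS reports fixed years (1990, 2000 and 2006), while DHS surveys fall in whatever year each survey was run, so the sketch pairs each DHS survey with the nearest WHOSIS year. The three-year matching window (`max_gap`), the data layout and all country names and numbers are assumptions made for illustration; they are not the rule we used for our own spreadsheet.

```python
WHOSIS_YEARS = (1990, 2000, 2006)  # years shown in the WHOSIS table


def nearest_reference_year(survey_year, reference_years=WHOSIS_YEARS, max_gap=3):
    """Return the closest WHOSIS year, or None if it is more than max_gap years away.

    On a tie, the earlier reference year wins (min keeps the first match).
    """
    closest = min(reference_years, key=lambda year: abs(year - survey_year))
    return closest if abs(closest - survey_year) <= max_gap else None


def merge_estimates(whosis, dhs_surveys, max_gap=3):
    """Pair each DHS survey with the WHOSIS estimate for the nearest year.

    whosis: dict mapping (country, year) -> IMR per 1000 live births
    dhs_surveys: list of (country, survey_year, IMR per 1000 live births)
    """
    merged = []
    for country, survey_year, dhs_imr in dhs_surveys:
        ref_year = nearest_reference_year(survey_year, max_gap=max_gap)
        if ref_year is None or (country, ref_year) not in whosis:
            continue  # no comparable WHOSIS figure for this survey
        merged.append({
            "country": country,
            "whosis_year": ref_year,
            "dhs_year": survey_year,
            "whosis_imr": whosis[(country, ref_year)],
            "dhs_imr": dhs_imr,
        })
    return merged


# Made-up values for fictional countries, for illustration only.
whosis = {("Country A", 1990): 90.0, ("Country A", 2000): 75.0,
          ("Country B", 2006): 40.0}
dhs = [("Country A", 1992, 84.0), ("Country A", 1996, 80.0),
       ("Country B", 2005, 43.5), ("Country C", 2000, 55.0)]

pairs = merge_estimates(whosis, dhs)
for row in pairs:
    print(row)
```

Here the 1996 survey is dropped because no WHOSIS year lies within three years of it, and Country C is dropped because it has no WHOSIS entry. A wider or narrower window changes which pairs count as "relative equals," which is itself a judgment about data quality.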
To see the spreadsheet that we created with all the data merged, sign in to Google documents using the account name “nyumph” and the password “nyumph12.” Open the spreadsheet document named “dhs & whosis data only.”
Now what do you see? From an aerial view (without analyzing the numbers), are the datasets the same? Yes, the years are not exactly the same, but absent armed conflict, an epidemic or widespread famine, do death rates generally change substantially from year to year? Should there not be similarities? How different are these data? What follows helps answer these questions.
When there were similar data from each agency, we looked at both the absolute difference (positive or negative) and the percentage difference between the infant mortality rate estimates of the WHO and the DHS. To see our results, go again to Google documents using the account name “nyumph” and the password “nyumph12.” This time, open the spreadsheet document named “compare dhs & whosis.” As you can see, there is no universal trend in the differences across the samples available (approximately 50). Table 1 (below) summarizes the differences we found between the two sets of infant mortality data.
Table 1 - Differences between WHOSIS and DHS infant mortality rates

| | Minimum value | Maximum value | Mean | Median |
|---|---|---|---|---|
| Absolute difference between WHOSIS IMR & DHS IMR c1990 | -26.5 | 67.9 | 9.0 | 7.2 |
| % difference between WHOSIS & DHS c1990 | -55.2% | 61.3% | 8.1% | 10.0% |
| Absolute difference between WHOSIS IMR & DHS IMR c2000 | -24.9 | 35.9 | 0.2 | -0.7 |
| % difference between WHOSIS & DHS c2000 | -67.3% | 34.1% (Georgia) | -1.9% | -1.0% |
| Absolute difference between WHOSIS IMR & DHS IMR c2006 | -10.1 | 85.8 | 11.5 | 0.4 |
| % difference between WHOSIS & DHS c2006 | -53.2% | 55.2% | 5.0% | 0.8% |
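The kind of summary shown in Table 1 can be sketched in a few lines. Two conventions below are assumptions, since the table does not state them: differences are taken as WHOSIS minus DHS, and the percentage difference is expressed relative to the DHS value. The input pairs are made-up numbers, not values from either dataset.

```python
from statistics import mean, median


def absolute_differences(pairs):
    """pairs: list of (whosis_imr, dhs_imr). ASSUMED sign: WHOSIS minus DHS."""
    return [whosis - dhs for whosis, dhs in pairs]


def percent_differences(pairs):
    """ASSUMED base: the difference is expressed as a percentage of the DHS value."""
    return [100 * (whosis - dhs) / dhs for whosis, dhs in pairs]


def summarize(values):
    """Minimum, maximum, mean and median, as reported in each row of Table 1."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "median": median(values)}


# Hypothetical (WHOSIS, DHS) pairs for one reference year.
pairs_c1990 = [(60.0, 50.0), (40.0, 50.0), (33.0, 30.0)]

print(summarize(absolute_differences(pairs_c1990)))
print(summarize(percent_differences(pairs_c1990)))
```

Note how the choice of base matters: the same 10-point gap is a 20% difference when the DHS figure is 50 but would be about 17% if the WHOSIS figure of 60 were used as the base. Whichever convention you choose, state it alongside the table.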
How else would you compare the two datasets?
We wondered whether, although there is no clear relationship between data points within a given year, there is a similarity in trends over time. In other words, even if the numbers are not the same for a particular country in a given year, did infant mortality rates decrease (or increase) at similar rates in both data sets? How would you examine this question?
We looked at the question of trends by comparing the percentage change over time in the WHOSIS data for a given country with the DHS percentage change for that country. Take a look at the spreadsheet “compare whosis & dhs w/more” in Google documents using the account name “nyumph” and the password “nyumph12.” What do you see? We see again that the data in the two datasets are not consistent, even when it comes to identifying a national trend. Table 2 (below) is a summary of our findings. In Table 2, a negative result means that the WHOSIS data shows a greater percentage decrease or a smaller percentage increase in the infant mortality rate than the DHS data during the period identified. A positive value means the reverse.
Table 2 - Differences between trends over time between WHOSIS and DHS infant mortality rates*

| | Minimum value | Maximum value | Mean | Median | Sample size |
|---|---|---|---|---|---|
| Difference between WHOSIS & DHS percentage change from c1990 to c2000 | -81.9 | | | | |
| Difference between WHOSIS & DHS percentage change from c2000 to c2006 | | | | | |
| Difference between WHOSIS & DHS percentage change from c1990 to c2006 | | | | | |
*The data in this table does not assess progress of infant mortality rates. Rather, it looks at the relationship between the WHOSIS and the DHS data. It identifies no single type of difference between these data. One method does not consistently yield lower or higher numbers within a particular country or even among countries. Instead, the data from the 48 countries with reports from both WHOSIS and DHS in the years c1990 to c2006 showed 31 countries in which both WHOSIS and DHS reported an overall decrease in the infant mortality rate, 7 in which both reported an increase, 9 in which WHOSIS reported a decrease but DHS reported an increase, and 1 in which WHOSIS reported an increase but DHS reported a decrease.
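The trend comparison behind Table 2 can be sketched as follows. The sign convention follows the text: the difference is the WHOSIS percentage change minus the DHS percentage change, so a negative value means WHOSIS shows the greater decrease (or smaller increase). The function names, the handling of "no change," and the example values are assumptions for illustration.

```python
from collections import Counter


def percent_change(start, end):
    """Percentage change in the IMR from the start year to the end year."""
    return 100 * (end - start) / start


def trend_difference(whosis_start, whosis_end, dhs_start, dhs_end):
    """WHOSIS percentage change minus DHS percentage change."""
    return percent_change(whosis_start, whosis_end) - percent_change(dhs_start, dhs_end)


def direction(start, end):
    if end < start:
        return "decrease"
    if end > start:
        return "increase"
    return "no change"  # not a category in the text; kept so nothing is misfiled


def agreement(whosis_start, whosis_end, dhs_start, dhs_end):
    """Label a country by the direction each source reports."""
    return (f"WHOSIS {direction(whosis_start, whosis_end)}, "
            f"DHS {direction(dhs_start, dhs_end)}")


# Hypothetical countries: (WHOSIS start, WHOSIS end, DHS start, DHS end).
countries = {
    "Country A": (100.0, 70.0, 100.0, 90.0),  # both fall, WHOSIS falls faster
    "Country B": (50.0, 55.0, 50.0, 45.0),    # the sources disagree on direction
}

for name, values in countries.items():
    print(name, trend_difference(*values), agreement(*values))

tally = Counter(agreement(*values) for values in countries.values())
print(tally)
```

Tallying the labels across all countries with data from both sources yields the kind of four-way breakdown given in the note to Table 2.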
SO WHAT DOES THIS ALL MEAN?
Excerpt from Rajaratnam et al.’s description of their methods to produce better IMR estimates
“We also analysed survey and census data for deaths in the household. We adjusted estimates on the basis of household deaths from single surveys by use of the growth balance method. When completeness of death reporting was estimated to be more than 100%, we adjusted the death rates downwards, with the logic that respondents might be telescoping deaths – i.e., including deaths that occurred outside the recall period in the period of recall. Child death registration is usually lower than is adult death registration, so estimates corrected upward (24 in total) must be viewed as lower bound estimates of child mortality. Sensitivity of our results to inclusion of these sources is presented in webappendix pp 10-11.”
What do the differences in datasets mean to those who use this data, whether to evaluate progress toward achieving the MDGs or to determine the effectiveness of an intervention? Appropriate responses include looking more deeply into the data or trying to gather more, or both. Looking more deeply into the data and gathering more can include combining good data from multiple sources, making comparisons of what is found, digging more deeply into anomalies and then reworking estimates to make them better. The UNDG offers their list of reputable data sources for infant mortality data (Page 34).
Gathering more data can also mean using multiple methods, e.g., surveys, focus groups and in-depth interviews. Methods that go beyond vital statistics alone not only develop the context and complexity behind the numbers but can also clarify and offer insights into why any differences in the data might exist.
Julie Knoll Rajaratnam (Institute for Health Metrics and Evaluation) et al.’s Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970-2010: a systematic analysis of progress towards Millennium Development Goal 4 is an example of gathering more data by combining multiple datasets. They also looked more deeply into the data, comparing the numbers and adjusting the results using advanced statistical methods to produce much more accurate estimates than could be produced using one source alone. The article offers a brief description of their methods. Their webappendix offers considerably more detail. (These are accessible through free registration with The Lancet.)
Once data is gathered and established as reasonably reliable, the question becomes, “What does this data mean?” Why did rates go down, or up? What were the interventions? Were some interventions more successful than others? Was progress greater among some groups? Answers to these questions are not found in the data but require additional research.
Why pay attention to an IMR?
Can infant mortality rates be used to assess the overall health of a population, young and mature?
So far, we have demonstrated that gathering reliable data is expensive and challenging. So a reasonable question is, why bother? Why use funds to gather data when those funds could be used to deliver care? The reason is that it is impossible to answer the questions we just raised without reliable measures of population health. First, we must know that interventions do no harm. Next, money to improve population health is not limitless. Identifying which interventions have the greatest success is critical, as is identifying under what conditions and with whom these interventions are successful. Also critical is learning why they were successful.5 In short, data on outcomes is needed to identify how best to use the resources available to achieve better population health. It is also important for identifying who is being left behind.
So, what statistics best measure population health and well-being? So far, we’ve concentrated on death rates among very young children. Is this rate actually important? Would longevity be a better measure? Perhaps. But was the long (or short) life a vibrant one, without pain or trauma, or was it difficult? Does the indicator measure only physical health, or are mental and emotional health incorporated? QALY,6 DALY and HALE are examples of attempts to integrate quality of life measurements into longevity, but even if we were able to measure this perfectly, what about age and gender differences, urban and rural, rich and poor, and the like? Again, we are back to the groups question. Are some people being left out of progress toward improved health and life? Effectively measuring population health means being able to disaggregate data.
That said, infant mortality rates can offer a quick look into an entire population’s health. Infant mortality rates are a measure of both mother and child health. In many settings, women and children are two of the more physically and socially vulnerable groups. If these two groups are healthy, the likelihood is strong that others are healthy as well, including adult males. The UNDG puts it this way: Infant mortality rates “reflect the social, economic and environmental conditions in which children (and others in society) live, including their health care.” (For arguments on using IMR as a measure of population health, see the box Can infant mortality rates be used to assess the overall health of a population, young and mature?) The UNDG describes another advantage of tracking mortality rates over some of the more comprehensive measures described above - data availability: “Since data on the incidence and prevalence of diseases (morbidity data) frequently are unavailable, mortality rates are often used to identify vulnerable populations” (Pages 32-33).
In the end, although IMR is an imperfect measure, because it reflects economic, social and environmental conditions, it can offer a shorthand description of overall population health. It can also be used to identify those especially vulnerable within that population.
Is a national IMR enough?
Disparities in infant mortality rates reveal that some groups are more vulnerable than others. Within-country differences are sometimes greater than between-country differences. Rates differ for the urban and rural, rich and poor, boys and girls.
Table 3 is a breakout of infant mortality rates in South Africa based on income quintile. The ratio of high-income to low is 1:3.62. In other words, for every baby born into wealth who dies before age one, nearly four born into poverty are lost.
Table 3: Sample indicators of child health by wealth quintile in South Africa, 1998 7

| Childhood mortality | low | 2nd | 3rd | 4th | high |
|---|---|---|---|---|---|
| Infant mortality rate per 1000 live births | 61.6 | 51.6 | 35.8 | 34.0 | 17.0 |
| Under-five mortality rate per 1000 live births | 87.4 | 71.0 | 48.6 | 39.8 | 21.9 |
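The ratio quoted for Table 3 is simple arithmetic, and a short sketch makes it easy to repeat for other indicators or countries. The numbers below are the Table 3 values; the function name and the choice to also report the absolute gap are ours, added for illustration.

```python
def disparity_ratio(rate_lowest_quintile, rate_highest_quintile):
    """Deaths in the poorest quintile for every one death in the richest."""
    return rate_lowest_quintile / rate_highest_quintile


# Values from Table 3 (South Africa, 1998), per 1000 live births.
table3 = {
    "Infant mortality rate": {"low": 61.6, "high": 17.0},
    "Under-five mortality rate": {"low": 87.4, "high": 21.9},
}

for indicator, by_quintile in table3.items():
    ratio = disparity_ratio(by_quintile["low"], by_quintile["high"])
    gap = by_quintile["low"] - by_quintile["high"]
    print(f"{indicator}: ratio {ratio:.2f}, gap {gap:.1f} per 1000")

# Infant mortality rate: ratio 3.62, gap 44.6 per 1000
# Under-five mortality rate: ratio 3.99, gap 65.5 per 1000
```

Reporting both the ratio and the absolute gap is useful because they can tell different stories: a ratio can stay high even as the number of excess deaths per 1000 births falls.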
Is there a difference between boys and girls?
“Girls have a survival advantage over boys during the first year of life, largely based on biological differences. This is especially so during the first month of life when perinatal conditions are most likely to be the cause or a contributing cause of death. While infant mortality is generally higher for boys than for girls, in some countries girls’ biological advantage is outweighed by gender-based discrimination (see also INDICATOR 13, 'Under-five mortality rate’). However, under-five mortality better captures the effect of gender discrimination than infant mortality, as nutrition and medical interventions are more important after age one” (UNDG, page 33).
There is also disparity among income groups in Brazil (2002): for every infant death in the richest 20% of households in Brazil, two babies die in the poorest (15.8 per 1000 for the richest 20%, and 34.9 for the poorest).8
From 1991 to 1994, there were 28 additional infant deaths per 1000 live births in rural India as compared to urban India: 80 deaths in rural and 52 in urban settings.9
Great differences also exist within the United States. The states of Massachusetts and Minnesota have infant mortality rates below 5 per 1000 live births, while the IMR of the District of Columbia is 12.6. Louisiana and Mississippi are not far behind the District of Columbia, with infant mortality rates over 10. These rates are all still better than the richest of South Africa and Brazil, on average. However, the “on average” is important. African-American children living in the U.S. state of Hawaii have an infant mortality rate of 20.5. In eleven U.S. states, African-American children have infant mortality rates greater than 15 per 1000 live births.10 That is a higher infant death rate than the national average for more than 70 of the world’s countries.11
These disparities represent unnecessary deaths. If it is possible for more children in one region or among one group to keep living, then why does it not happen in other regions or among other groups? What is present or missing from the lives and circumstances of these children who die unnecessarily? Factors with both a statistical and logical (common sense) relationship to infant mortality rates are access to clean water and improved sanitation; maternal or child nutrition status; access to health care, including the presence of a skilled birth attendant, the level of prenatal care, or the distance from advanced care and emergency obstetrical intervention; maternal demographics such as the age of the mother, her level of education, the number of previous births, and the length of time since the previous pregnancy; and child demographics such as size and weight at birth and the sex of the child. Policies aimed at reducing infant mortality must investigate the causes of higher infant mortality and target interventions at these causes.
But the first step to this is to track their deaths.
FOOTNOTES
1. Kindig, David and Greg Stoddart (2003). What is population health? American Journal of Public Health 93.3 (March 2003): 380-383.
2. For more on estimating infant and child deaths, see the description of methods in JK Rajaratnam et al.’s Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970-2010: a systematic analysis of progress towards Millennium Development Goal 4 and their Supplementary webappendix. (The article and appendix are accessible through free registration with The Lancet.) CJL Murray et al. describe WHO methods of statistical estimation in Modified Logit Life Table System: Principles, Empirical Validation and Application, GPE Discussion Paper No.39.
3. The UNDG Indicators for Monitoring the Millennium Development Goals (page 34) includes their recommendations for data sources to use when comparing and establishing more accurate estimates of IMR.
4. WHOSIS is being incorporated into WHO’s Global Health Observatory (GHO) Database; therefore, an alternate approach to this data is through the GHO. Once on their home page, click “World Health Statistics,” “Mortality and burden of disease,” and then “Child mortality.” A table will appear with three indicators: child, infant and neonatal mortality rates. To get just infant mortality rates plus the additional years needed for this exercise, click on the “Indicators” button, and select “Infant mortality.” To download this data, click the “Export” button. This table gives you 2008 data instead of the 2006 we used in this exercise, so your results will differ a little from those we discuss in the main text, but the ideas will be the same.
5. Identifying which elements are necessary for successful intervention requires research outside of that used to establish reliable vital statistics, such as infant mortality rates.
6. See also Ceri Phillips’ “What is a QALY?”; A. Bowling’s Health-related quality of life: A discussion of the concept, its use and measurement background: the 'Quality of life’; and our discussion under the subheading "Measures of disease burden" in the Test your knowledge of public health section, Question 2. Epidemiological transition.
7. Gwatkin, Davidson, Shea Rutstein, Kiersten Johnson, Eldaw Abdalla Suliman, Adam Wagstaff, and Agbessi Amouzou. Socioeconomic Differences in Health, Nutrition, and Population - South Africa. Washington, D.C.: World Bank, April 2007, pages 3-4, 24-27. Childhood Illness and Mortality Definitions: Infant mortality rate: number of deaths to children under 12 months of age, based on experience during the ten years preceding the survey. Under-five mortality rate: number of deaths to children under five years of age, based on experience during the ten years preceding the survey.
8. UNICEF. Brazil: Wide disparities in infant mortality rates between and within selected regions, by family income and by mother’s ethnicity, 2002.
9. UNESCAP. Reducing disparities: Balanced development of urban and rural areas and regions within the countries of Asia and the Pacific. United Nations, 2001.
10. Kaiser Family Foundation. 50 State Comparisons.
ADDITIONAL RESOURCES OF INTEREST:
Black, Robert E, Saul S Morris and Jennifer Bryce (2003). “Where and why are 10 million children dying every year?” The Lancet 361.9376 (28 June 2003): 2226-34.
Davalos, Maria (2007). Sources of International Poverty and Inequality Data for Monitoring MDG 1: Primary, Secondary and Household survey data sources. Prepared for the UNDP Poverty Group, New York, November 2007.
Fotso, Jean-Christophe, Alex Chika Ezeh, Nyovani Janet Madise and James Ciera (2007). “Progress towards the child mortality millennium development goal in urban sub-Saharan Africa: the dynamics of population growth, immunization, and access to clean water.” BMC Public Health 7 (27 December 2007): 218-228.
Hogan, Margaret C, Kyle J Foreman, Mohsen Naghavi, Stephanie Y Ahn, Mengru Wang, Susanna M. Makela, Alan D Lopez, Rafael Lozano and Christopher JL Murray (2010). “Maternal mortality for 181 countries, 1980-2008: a systematic analysis of progress towards Millennium Development Goal 5.” The Lancet 375.9726 (8 May 2010): 1609-1623. (Accessible through free registration with The Lancet.)
Institute for Health Metrics and Evaluation. Child Mortality (Global).
Laxminarayan, Ramanan, Anne J Mills, Joel G Breman, Anthony R Measham, George Alleyne, Mariam Claeson, Prabhat Jha, Philip Musgrove, Jeffrey Chow, Sonbol Shahid-Salles, Dean T Jamison. “Advancement of global health: key messages from the Disease Control Priorities Project.” The Lancet 367.9517 (8 April 2006): 1193-1208. (Accessible through free registration with The Lancet.)
Mosley, WH, and LC Chen (1984). “An analytic framework for the study of child survival in developing countries.” Population and Development Review 10 (1984): 25-45. An extract was published in the Bulletin of the World Health Organization 81.2 (2003): 140-149. See also, Hill, Kenneth. “Frameworks for studying the determinants of child survival.” Bulletin of the World Health Organization 81.2 (2003): 138-139.
Murray, Christopher JL, Thomas Laakso, Kenji Shibuya, Kenneth Hill and Alan D Lopez (2007). Can we achieve Millennium Development Goal 4? New analysis of country trends and forecasts of under-5 mortality to 2015. The Lancet 370.9592 (22 September 2007): 1040-1054. (Accessible through free registration with The Lancet.)
The New York Times Letters to the Editor (2010). “A Welcome Fall in Maternal Deaths.” 18 April 2010: Mary Robinson, Serra Sippel, Rachel Ward and Nan Strauss, and Dorothy Balaba Byansi.
Rajaratnam, Julie Knoll, Jake R Marcus, Abraham D Flaxman, Haidong Wang, Alison Levin-Rector, Laura Dwyer, Megan Costa, Alan D Lopez, Christopher JL Murray (2010). Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970-2010: a systematic analysis of progress towards Millennium Development Goal 4. The Lancet 375.9730 (5 June 2010): 1988-2008. (Accessible through free registration with The Lancet.)
Sahn, David E and David C Stifel (2003). “Progress Toward the Millennium Development Goals in Africa.” World Development 31.1 (January 2003): 23-52.
UNICEF's Multiple Indicator Cluster Survey (MICS).
World Health Organization's Partnership for Maternal, Newborn and Child Health.
World Health Organization Europe (2007). “Disparities in progress towards the Millennium Development Goals on reducing child and maternal mortality.” Fact sheet 08/07. Belgrade/Copenhagen: World Health Organization, 17 September 2007.
