This resource provides examples of analyzing claims data. It offers explanations and videos on using synthetic claims data developed by the Centers for Medicare & Medicaid Services (CMS), along with instructions for acquiring and working with the data.

Administrative data from health insurance claims is a compelling resource for improving population health by addressing cost, quality, and outcomes. Health care is a data-intensive industry. Information is collected routinely for clinical purposes as part of every health care encounter, and health care data is also created for other purposes, including payment via submitted claims. Claims data include encounter-level information on diagnoses, treatments, and billed and paid amounts. Clinical data from electronic health records (EHR) are critical for analyses to improve health care delivery. However, claims data can effectively complement EHR data by providing a comprehensive view of a patient's interactions across the continuum of the health care system, reducing selection bias, and providing access to large and diverse samples (Stein et al., 2014).

Despite its importance, claims data presents many challenges. One is assessing data quality and accounting for incomplete or missing data. Others include integrating data from multiple sources and developing methods for describing the utilization or appropriateness of care (Stein et al., 2014). Technical challenges in creating specific datasets from claims data include:

  • Converting claims into unique visits
  • Identifying incomplete claims data
  • Categorizing providers and locations of service
  • Selecting the most valuable measures of utilization and expenditures (Tyree, Lind, and Lafferty, 2006)
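The first two challenges, converting claims into visits and identifying incomplete claims, can be illustrated with a short Python sketch. It assumes a simplified claim record stored as a dict with hypothetical keys (`claim_id`, `bene_id`, `provider_id`, `service_date`, `paid_amount`), treats a claim as incomplete if any of those fields is missing or blank, and defines a visit as all claims sharing the same beneficiary, provider, and service date. Real visit definitions are usually more involved (for example, spanning multiple days), so treat this as a starting point rather than a standard method.

```python
from collections import defaultdict

# Hypothetical field names for illustration; map them to the columns in your file.
REQUIRED_FIELDS = ("claim_id", "bene_id", "provider_id", "service_date", "paid_amount")


def is_complete(claim):
    """True if every required field is present and non-blank."""
    return all(claim.get(f) not in (None, "") for f in REQUIRED_FIELDS)


def find_incomplete(claims):
    """Return the claims that are missing at least one required field."""
    return [c for c in claims if not is_complete(c)]


def claims_to_visits(claims):
    """Collapse complete claims into visits keyed by (beneficiary, provider, date).

    Assumption: claims sharing all three values belong to the same visit.
    Incomplete claims are skipped because they cannot be assigned a key.
    """
    visits = defaultdict(lambda: {"claim_ids": [], "paid_amount": 0.0})
    for c in claims:
        if not is_complete(c):
            continue
        key = (c["bene_id"], c["provider_id"], c["service_date"])
        visits[key]["claim_ids"].append(c["claim_id"])
        visits[key]["paid_amount"] += float(c["paid_amount"])
    return dict(visits)


# Made-up example: c1 and c2 form one visit, c3 another, c4 lacks a provider.
claims = [
    {"claim_id": "c1", "bene_id": "B1", "provider_id": "P1", "service_date": "20090302", "paid_amount": "120"},
    {"claim_id": "c2", "bene_id": "B1", "provider_id": "P1", "service_date": "20090302", "paid_amount": "30"},
    {"claim_id": "c3", "bene_id": "B1", "provider_id": "P2", "service_date": "20090302", "paid_amount": "80"},
    {"claim_id": "c4", "bene_id": "B2", "provider_id": "", "service_date": "20090305", "paid_amount": "50"},
]
visits = claims_to_visits(claims)
print(len(visits))                                       # 2
print(visits[("B1", "P1", "20090302")]["paid_amount"])   # 150.0
print([c["claim_id"] for c in find_incomplete(claims)])  # ['c4']
```

Keeping the completeness check separate from visit building makes it easy to report how many claims were excluded, which is itself a useful data-quality measure.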

DE-SynPUF Overview

The Data Entrepreneurs' Synthetic Public Use File (DE-SynPUF) is a realistic set of claims data covering 2008 through 2010, made available by CMS. The data is synthetic: it is derived from actual Medicare claims through a process that protects beneficiaries' identities, so it mirrors the structure of real claims but has limited value for drawing conclusions about actual Medicare beneficiaries. The dataset is intended to support training in data analysis, data mining, and software development that may lead to increased knowledge from claims data in practice.

The DE-SynPUF consists of five types of administrative data linked by a unique patient-level identifier: beneficiary summary, inpatient claims, outpatient claims, carrier claims, and prescription drug events. The dataset draws on a 5 percent sample of Medicare beneficiaries in 2008, and the full file includes over 100 million records across the three years.
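Because every file carries the same patient-level identifier, linking them amounts to a key lookup. The sketch below assumes the identifier column is named `DESYNPUF_ID` and uses `CLM_ID`, `CLM_PMT_AMT`, and `BENE_SEX_IDENT_CD` as example columns, following the DE-SynPUF codebook as we understand it; confirm the names against the codebook for the files you download. The rows are made up, and `io.StringIO` stands in for files opened with `open(path, newline="")`.

```python
import csv
import io


def link_claims(beneficiaries, claims, id_field="DESYNPUF_ID"):
    """Attach each claim to its beneficiary summary row via the shared ID.

    Returns (linked, orphans): `linked` maps each ID to its beneficiary row and
    list of claims; `orphans` holds claims with no matching beneficiary row.
    """
    linked = {b[id_field]: {"beneficiary": b, "claims": []} for b in beneficiaries}
    orphans = []
    for claim in claims:
        entry = linked.get(claim[id_field])
        if entry is None:
            orphans.append(claim)  # unmatched IDs are a data-quality signal
        else:
            entry["claims"].append(claim)
    return linked, orphans


# Made-up rows standing in for a beneficiary summary file and an outpatient claims file.
BENE_CSV = "DESYNPUF_ID,BENE_SEX_IDENT_CD\nA001,1\nA002,2\n"
OUTPATIENT_CSV = (
    "DESYNPUF_ID,CLM_ID,CLM_PMT_AMT\n"
    "A001,900001,50.00\n"
    "A001,900002,70.00\n"
    "A003,900003,20.00\n"
)

beneficiaries = list(csv.DictReader(io.StringIO(BENE_CSV)))
outpatient = list(csv.DictReader(io.StringIO(OUTPATIENT_CSV)))
linked, orphans = link_claims(beneficiaries, outpatient)
print(len(linked["A001"]["claims"]))   # 2
print(len(linked["A002"]["claims"]))   # 0
print([c["CLM_ID"] for c in orphans])  # ['900003']
```

If you load more than one year of beneficiary summary rows, an ID can appear more than once; this sketch keeps only the last row it sees for each ID, so a fuller version would key on both ID and year.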

To acquire the DE-SynPUF data, go to the DE-SynPUF website and choose the data you want to download. The data is divided into 20 separate samples; selecting a sample lets you download all of the datasets for that sample of beneficiaries. The video below shows how to navigate the website and download a sample of the data.
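Downloaded files are typically zip archives that each contain a CSV. A minimal sketch for streaming rows out of one archive with the Python standard library follows; it assumes the archive holds a CSV with a header row, which you should confirm after downloading. The demo builds a tiny archive in memory so the example runs without a download.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_source):
    """Yield rows as dicts from the first .csv member of a zip archive.

    `zip_source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        with zf.open(csv_names[0]) as raw:
            # ZipFile.open returns a bytes stream; wrap it so csv can read text.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)


# Self-contained demo: a tiny in-memory archive instead of a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sample.csv", "DESYNPUF_ID,CLM_ID\nA001,1\nA002,2\n")
buffer.seek(0)

for row in read_zipped_csv(buffer):
    print(row["DESYNPUF_ID"], row["CLM_ID"])
# A001 1
# A002 2
```

For a real download, pass the file path instead, for example `read_zipped_csv("outpatient_sample_1.zip")` (a hypothetical file name). Because the function is a generator, rows are read one at a time, which helps with the larger claims files.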

Analyzing Claims Data

Claims data is a rich source of diagnosis, procedure, and utilization information, and many kinds of analysis can turn it into knowledge that supports decision-making. Claims data can be used to compare prices of health care services at the local, state, regional, or national level, and to compare the services that providers or health care organizations deliver for specific diagnoses (or combinations of diagnoses). It can also be used to evaluate the quality of care that providers deliver. According to the Pew Charitable Trusts, "claims data can reveal whether a doctor followed nationally recommended medical protocols for treating patients diagnosed with diabetes. How many received quarterly exams? Did they receive an eye exam? How many were admitted to a hospital?" (Vestal, 2014).
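As a sketch of the diabetes example in the quotation, the function below computes the share of beneficiaries with a diabetes diagnosis who had at least one eye exam. The code sets are illustrative assumptions, not an official measure specification: ICD-9-CM codes beginning with `250` (diabetes mellitus, written without the decimal point, matching the ICD-9 era of 2008 through 2010 claims) and a small set of CPT ophthalmological exam codes. The claim layout (`bene_id`, `dx_codes`, `proc_codes`) is hypothetical, and the sketch ignores the measurement periods and eligibility rules a real quality measure would apply.

```python
# Illustrative code sets only; substitute the codes from the measure you follow.
DIABETES_DX_PREFIX = "250"  # ICD-9-CM 250.xx, diabetes mellitus (no decimal point)
EYE_EXAM_CODES = {"92002", "92004", "92012", "92014"}  # CPT ophthalmological exams


def eye_exam_rate(claims):
    """Share of beneficiaries with a diabetes diagnosis who had an eye exam.

    Diagnoses and exams are pooled per beneficiary, so they may come from
    different claims. Returns None when no beneficiary has a diabetes code.
    """
    diabetic, examined = set(), set()
    for c in claims:
        if any(dx.startswith(DIABETES_DX_PREFIX) for dx in c["dx_codes"]):
            diabetic.add(c["bene_id"])
        if EYE_EXAM_CODES.intersection(c["proc_codes"]):
            examined.add(c["bene_id"])
    if not diabetic:
        return None
    return len(diabetic & examined) / len(diabetic)


# Made-up claims: B1 and B2 have diabetes codes; only B1 has an eye exam.
claims = [
    {"bene_id": "B1", "dx_codes": ["25000"], "proc_codes": ["99213"]},
    {"bene_id": "B1", "dx_codes": ["3669"], "proc_codes": ["92014"]},
    {"bene_id": "B2", "dx_codes": ["25002"], "proc_codes": ["99214"]},
    {"bene_id": "B3", "dx_codes": ["4019"], "proc_codes": ["92012"]},
]
print(eye_exam_rate(claims))  # 0.5
```

B3 has an eye exam but no diabetes code, so it is excluded from the denominator; the same set-based pattern extends to the other questions in the quotation, such as counting hospital admissions among diabetic beneficiaries.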

The video below uses the CMS DE-SynPUF to illustrate how claims data can support population health analytics. The exercise shows how to identify high outpatient utilizers from outpatient claims data and examine their economic impact.
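A comparable analysis can be sketched in a few lines. This version counts outpatient claims per beneficiary, flags the top 10 percent by claim count as high utilizers, and reports their share of total payments. The column names `DESYNPUF_ID` and `CLM_PMT_AMT` follow the DE-SynPUF outpatient file as we understand it; the 10 percent cutoff is an arbitrary illustrative choice rather than a standard definition, and the exercise in the video may define high utilizers differently.

```python
from collections import Counter, defaultdict


def high_utilizers(claims, top_percent=10):
    """Return (top_ids, payment_share) for the heaviest outpatient users.

    Beneficiaries are ranked by claim count (ties broken by total payments);
    the top `top_percent` percent, rounded up and at least one, are flagged.
    Assumes dict rows with DE-SynPUF-style 'DESYNPUF_ID' and 'CLM_PMT_AMT' keys.
    """
    counts = Counter()
    paid = defaultdict(float)
    for c in claims:
        bene = c["DESYNPUF_ID"]
        counts[bene] += 1
        paid[bene] += float(c["CLM_PMT_AMT"])
    if not counts:
        return [], 0.0
    # Integer ceiling division avoids floating-point surprises in the cutoff.
    n_top = max(1, (len(counts) * top_percent + 99) // 100)
    ranked = sorted(counts, key=lambda b: (counts[b], paid[b]), reverse=True)
    top = ranked[:n_top]
    total = sum(paid.values())
    share = sum(paid[b] for b in top) / total if total else 0.0
    return top, share


# Made-up outpatient claims for three beneficiaries.
claims = [
    {"DESYNPUF_ID": "A001", "CLM_PMT_AMT": "50.00"},
    {"DESYNPUF_ID": "A001", "CLM_PMT_AMT": "70.00"},
    {"DESYNPUF_ID": "A001", "CLM_PMT_AMT": "30.00"},
    {"DESYNPUF_ID": "A002", "CLM_PMT_AMT": "20.00"},
    {"DESYNPUF_ID": "A003", "CLM_PMT_AMT": "30.00"},
]
top, share = high_utilizers(claims)
print(top, share)  # ['A001'] 0.75
```

Ranking by claim count measures utilization, while the payment share measures economic impact; ranking by payments instead would answer a related but different question.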


National Rural Health Resource Center
