CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF)

Author: Centers for Medicare and Medicaid Services (CMS)

The Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) is a set of realistic claims data covering 2008 to 2010, made available by CMS. As the name indicates, the records are synthetic: they are derived from real Medicare claims data but provided in a form that protects patients’ identities. The dataset is intended for training in data analysis and data mining, and for developing software that may lead to increased knowledge from claims data in practice.

The DE-SynPUF consists of five types of administrative data, linked together by a unique patient-level identifier: beneficiary summary, inpatient claims, outpatient claims, carrier claims and prescription drug events. The dataset includes a 5% sample of Medicare beneficiaries in 2008, and in total it contains over 100 million records across the three years covered.
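To illustrate what linking by a patient-level identifier means in practice, here is a minimal Python sketch that attaches inpatient claims to beneficiary summary rows through a shared key. The column names (BENE_ID, CLAIM_ID, PAYMENT) and all values are hypothetical stand-ins, not the actual DE-SynPUF layout; check the file documentation for the real field names before adapting it.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for two of the five file types.
# Column names and values are illustrative assumptions, not real DE-SynPUF records.
BENEFICIARY_CSV = """BENE_ID,BIRTH_YEAR
A001,1936
A002,1941
A003,1929
"""

INPATIENT_CSV = """BENE_ID,CLAIM_ID,PAYMENT
A001,C10,4000
A001,C11,1500
A003,C12,7000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def link_claims(beneficiaries, claims, key="BENE_ID"):
    """Attach each beneficiary's claims to their summary row via the shared key."""
    claims_by_id = defaultdict(list)
    for claim in claims:
        claims_by_id[claim[key]].append(claim)
    linked = []
    for bene in beneficiaries:
        # Beneficiaries with no claims of this type get an empty list.
        linked.append({**bene, "claims": claims_by_id.get(bene[key], [])})
    return linked


linked = link_claims(read_rows(BENEFICIARY_CSV), read_rows(INPATIENT_CSV))

# Total inpatient payment per beneficiary, as a simple use of the linked data.
totals = {
    row["BENE_ID"]: sum(int(c["PAYMENT"]) for c in row["claims"])
    for row in linked
}
print(totals)
```

The same pattern would apply to the outpatient, carrier and prescription drug event files, since each is keyed to the same beneficiary. With files of the size described here, a dataframe library or a database would likely be a better fit than plain dictionaries.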

To acquire the DE-SynPUF data, go to the DE-SynPUF website and choose the data you want to download. The data is segmented into 20 unique samples. Click on a sample and choose to download all of the datasets for that sample of beneficiaries. The video tutorial offers an example of how to interact with the website and download a sample of the data.

A video tutorial about using this data is available as part of the Population Health Portal.