Data Codebook
CRITICAL Dataset
The CRITICAL dataset is the first cross-Clinical and Translational Science Award (CTSA) initiative to create a multi-site, multi-modal, de-identified clinical dataset. It combines the depth of detailed per-patient data with the breadth of a large multi-site cohort, addressing a major unmet need in healthcare research. The dataset encompasses comprehensive longitudinal inpatient and outpatient data, covering the periods before, during, and after ICU admission, for approximately 400,000 distinct critical-care patients. This diverse dataset supports the exploration of urgent clinical problems and facilitates the development of fair and generalizable AI tools for advanced patient monitoring and decision support.
The dataset has been curated to serve the research community, fostering innovations in AI/machine learning (ML), outcomes research, and other translational science domains. Its unique combination of size, diversity, and comprehensiveness makes it a valuable resource for tackling long-standing clinical challenges.
The CRITICAL dataset is designed for:
- AI/ML research to develop and validate predictive models and decision support systems.
- Outcomes-related research for better understanding of critical-care patient trajectories.
- Exploration of clinical translation in broader research communities.
Key Features:
- Largest Benchmark Dataset: Contains data from approximately 400,000 critical-care patients, making it the largest publicly shared, disease-independent clinical dataset of its kind.
- Diversity: Includes extensive racial, ethnic, and geographic profiles to support fair and generalizable AI development.
- Comprehensive Coverage: Features longitudinal data capturing inpatient and outpatient records before, during, and after ICU admissions.
Users of the dataset are bound by the signed Data Use Agreement (DUA), which strictly prohibits any attempt to reidentify patients, among other requirements.
Structure
CRITICAL is released as a collection of CSV files. It structures its data according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model v5.3. Due to data availability at participating sites, CRITICAL includes a subset of the OMOP v5.3 tables.
Version 1.0 of the CRITICAL Dataset contains the following 17 OMOP tables, stored as CSVs. Links provided below are to the original OMOP documentation for each table, with any CRITICAL-specific conventions or differences noted.
The .zip file containing all of the tables is approximately 40 GB. Unzipped, the files are approximately 300 GB in total.
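Because the unzipped tables total roughly 300 GB, loading a whole file into memory is unlikely to be practical on most machines. The following is a minimal sketch of streaming a table row by row with Python's standard `csv` module. It assumes each CSV has a header row with OMOP column names; the sample data, the helper names `iter_rows` and `count_by_column`, and the concept IDs are illustrative placeholders, not part of the CRITICAL release.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterator, TextIO


def iter_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one CSV row at a time as a dict keyed by column name."""
    yield from csv.DictReader(handle)


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Count rows per distinct value of `column` without loading the whole file."""
    counts: Counter = Counter()
    for row in iter_rows(handle):
        counts[row[column]] += 1
    return counts


# Tiny in-memory stand-in for a MEASUREMENT file (all values are made up).
SAMPLE = io.StringIO(
    "measurement_id,person_id,measurement_concept_id\n"
    "1,10,1001\n"
    "2,10,1001\n"
    "3,11,1002\n"
)
sample_counts = count_by_column(SAMPLE, "measurement_concept_id")
print(sample_counts.most_common())
```

For a real table, the same function can be given a file handle opened with `open(path, newline="")`, where `path` points to wherever the table was unzipped; the actual file names in the archive are not specified here.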
Adjustments
Tables have had the following adjustments/transformations for all records:
- Date shifting: Date shifting is applied to all date and date/time fields.
- Suppress source fields: All *_source_value fields have been set to NULL.
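As a sanity check after download, one could confirm that every *_source_value column in a table is indeed empty. The sketch below does this with the standard library. How NULL is serialized in the CSV files is not stated here, so the `NULL_TOKENS` set (empty string or the literal text `NULL`) is an assumption, and `non_null_source_columns` is a hypothetical helper name.

```python
import csv
import io
from typing import List, TextIO

# Assumption: NULL appears in the CSVs as an empty field or the literal "NULL".
NULL_TOKENS = {"", "NULL"}


def non_null_source_columns(handle: TextIO) -> List[str]:
    """Return the *_source_value columns that contain any non-NULL value."""
    reader = csv.DictReader(handle)
    source_cols = [c for c in (reader.fieldnames or []) if c.endswith("_source_value")]
    offenders = set()
    for row in reader:
        for col in source_cols:
            # Short rows yield None for missing fields; treat that as NULL too.
            if (row[col] or "").strip() not in NULL_TOKENS:
                offenders.add(col)
    return sorted(offenders)


# Made-up PERSON-like sample in which the source fields are suppressed.
sample = io.StringIO(
    "person_id,gender_source_value,race_source_value\n"
    "1,,NULL\n"
    "2,,\n"
)
print(non_null_source_columns(sample))
```

An empty list means every source field in the table was suppressed as described; any column name returned would warrant a closer look.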
Tables
- PERSON
- OBSERVATION_PERIOD
- VISIT_OCCURRENCE
- VISIT_DETAIL
- CONDITION_OCCURRENCE
- DRUG_EXPOSURE
  - Columns stop_reason, sig, and lot_number have been set to NULL for all records.
- PROCEDURE_OCCURRENCE
- DEVICE_EXPOSURE
- MEASUREMENT
- OBSERVATION
  - Column value_as_string has been set to NULL for all records.
- DEATH
- SPECIMEN
- LOCATION
  - Only location_id is provided; all other columns are set to NULL.
- CARE_SITE
- PROVIDER
  - Columns provider_name, npi, dea, and year_of_birth have been set to NULL for all records.
- DRUG_ERA
- CONDITION_ERA
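The table-specific conventions above can be captured in a small lookup so that analysis code skips columns known to be empty. This is a minimal sketch: the column lists come from this codebook, while the names `NULLED_COLUMNS`, `LOCATION_KEPT`, and `usable_columns` are hypothetical, and the example headers are abbreviated rather than complete OMOP column lists.

```python
from typing import Dict, List, Set

# Columns set to NULL for all records, per table (taken from the list above).
NULLED_COLUMNS: Dict[str, Set[str]] = {
    "DRUG_EXPOSURE": {"stop_reason", "sig", "lot_number"},
    "OBSERVATION": {"value_as_string"},
    "PROVIDER": {"provider_name", "npi", "dea", "year_of_birth"},
}

# LOCATION is the special case: only location_id carries data.
LOCATION_KEPT: Set[str] = {"location_id"}


def usable_columns(table: str, header: List[str]) -> List[str]:
    """Return the columns of `header` that are not suppressed for `table`."""
    table = table.upper()
    if table == "LOCATION":
        return [c for c in header if c in LOCATION_KEPT]
    dropped = NULLED_COLUMNS.get(table, set())
    return [
        c for c in header
        if c not in dropped and not c.endswith("_source_value")
    ]


# Abbreviated example header for DRUG_EXPOSURE.
drug_header = [
    "drug_exposure_id", "person_id", "drug_concept_id",
    "stop_reason", "sig", "lot_number", "drug_source_value",
]
print(usable_columns("DRUG_EXPOSURE", drug_header))
```

The helper also drops *_source_value columns, since those are suppressed in every table.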
How to Cite/Acknowledge
If you use the CRITICAL dataset in your research, please acknowledge it as follows:
CRITICAL Dataset. Sponsored by the NIH National Center for Advancing Translational Sciences (NCATS) through grant number U01TR003528.
Additionally, please include the following DOI citation in your publications: 10.5281/zenodo.14532192
License/Terms of Use
The CRITICAL dataset is provided under the terms outlined in the signed DUA. Access to the CRITICAL dataset requires institutions to sign a DUA and designate administrators (DRI Admins) to oversee user access. Individual users from approved institutions submit applications through the CRITICAL website, including their role, affiliation, and intended use of the data. After institutional administrators review and approve the application, users receive instructions to download the data for a 30-day period. For details, refer to the DUA.
Key provisions include but are not limited to:
- Prohibition of Reidentification: Users must not attempt to reidentify patients in the dataset under any circumstances.
- Responsible Use: Users must adhere to ethical research practices and comply with applicable regulations and institutional review board (IRB) requirements.