2 Data sources
The Hospital-TTD-Mod relies on a synthesis of routine administrative healthcare data, national health surveys, published epidemiological literature, and official economic tariffs.
This chapter outlines the primary data sources utilised within the engine and the specific role each plays in the simulation architecture.
2.1 Routine administrative data
The foundational demographic and clinical data for the model is constructed from routine hospital data and national population statistics.
2.1.1 Admitted Patient Care (APC) Hospital Episode Statistics (HES)
The primary measure of acute service utilisation is derived from the English Hospital Episode Statistics (HES) Admitted Patient Care dataset. HES contains record-level data on all admissions to NHS hospitals in England.
HES data are used to derive the volume of admissions, the number of unique individuals admitted, and the diagnostic categorisation (using ICD-10 codes) of both focal admissions and subsequent readmissions.
The model processes adult admissions (aged 16 and over). It includes only patients who had an ordinary admission to hospital and who did not die before discharge. Maternity-related admissions are excluded.
The causal assignment of a hospital admission to a specific disease is based on a scan of the first five diagnostic positions of the admission episode of a continuous inpatient spell. If a tobacco-related diagnosis is found, the diagnosis in the highest-ranked diagnostic position is used to attribute the admission to one of the disease categories used in the model. To identify each patient's comorbidities, all diagnostic positions for all treatment episodes recorded in the year are scanned.
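The attribution rule above can be sketched as follows. This is a minimal illustration, not the model's code: the lookup table `TOBACCO_RELATED`, the function names, and the three-character matching are all assumptions made for the example, and "highest-ranked" is read here as the earliest diagnostic position that holds a tobacco-related code.

```python
from typing import Iterable, Optional, Sequence, Set

# Hypothetical lookup from ICD-10 three-character codes to model disease
# categories. The model's actual code list is not reproduced here.
TOBACCO_RELATED = {
    "I21": "cardiovascular",  # acute myocardial infarction
    "J44": "respiratory",     # other chronic obstructive pulmonary disease
    "C34": "cancer",          # malignant neoplasm of bronchus and lung
}

MAX_POSITIONS = 5  # only the first five diagnostic positions are scanned


def attribute_admission(diagnoses: Sequence[str]) -> Optional[str]:
    """Return the disease category for an admission episode, or None.

    `diagnoses` is ordered by diagnostic position (primary diagnosis first).
    Assumption: the earliest tobacco-related position wins.
    """
    for code in diagnoses[:MAX_POSITIONS]:
        category = TOBACCO_RELATED.get(code[:3].upper())
        if category is not None:
            return category
    return None


def comorbidity_codes(episodes: Iterable[Sequence[str]]) -> Set[str]:
    """Collect three-character codes from every position of every episode."""
    codes: Set[str] = set()
    for episode in episodes:
        for code in episode:
            codes.add(code[:3].upper())
    return codes


# Example: a tobacco-related code in position 2 attributes the admission.
category = attribute_admission(["K35", "J449", "I219"])
```

Note the asymmetry the text describes: attribution looks only at the first five positions of the admission episode, whereas the comorbidity scan covers every position of every episode in the year.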
2.1.2 Office for National Statistics (ONS) mortality data
The model incorporates mortality risks for patients discharged from hospital.
ONS mortality rates provide age-, sex-, and IMD-specific baseline survival probabilities. Disease-specific mortality risks are not used. The incorporation of mortality risks for discharged patients limits the volume of future readmissions that can be generated by the cohort, ensuring that health economic savings are not artificially inflated.
Note: this was not a feature of the initial Excel version of the model.
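To illustrate why mortality caps the readmissions a cohort can generate, the sketch below applies a sequence of annual mortality probabilities to a discharged cohort and counts readmissions only among survivors. The function names, the constant readmission rate, and all numeric values are hypothetical; the model's own rates are stratum-specific and are not reproduced here.

```python
from typing import List, Sequence


def surviving_cohort(n_discharged: float,
                     annual_mortality: Sequence[float]) -> List[float]:
    """Expected number of patients alive at the end of each follow-up year."""
    alive = n_discharged
    survivors = []
    for q in annual_mortality:
        alive *= (1.0 - q)  # apply that year's probability of death
        survivors.append(alive)
    return survivors


def expected_readmissions(n_discharged: float,
                          annual_mortality: Sequence[float],
                          readmission_rate: float) -> List[float]:
    """Expected readmissions per year, counted only among survivors.

    Assumption: a single constant annual readmission rate per survivor.
    """
    return [alive * readmission_rate
            for alive in surviving_cohort(n_discharged, annual_mortality)]


# Hypothetical stratum: 1,000 discharged patients, 10% annual mortality.
with_mortality = expected_readmissions(1000, [0.10, 0.10, 0.10], 0.2)
without_mortality = expected_readmissions(1000, [0.0, 0.0, 0.0], 0.2)
```

Ignoring mortality (the second call) yields more readmissions in every year, and therefore a larger pool of readmissions that an intervention could appear to prevent.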
2.2 National health surveys
Routine hospital data does not comprehensively capture smoking status. Therefore, the model calibrates hospital data against national survey estimates.
2.2.1 Health Survey for England (HSE)
The HSE is an annual cross-sectional survey of the health of the general population living in private households in England.
A pooled sample of the HSE (2017 to 2019) is used to estimate baseline smoking prevalence (current, former, and never smokers) across specific demographic strata (age, sex, and IMD) and broad clinical condition categories.
As detailed in Chapter 3, these general population estimates are dynamically calibrated against local acute hospital audit data to reflect the higher prevalence of smoking within admitted hospital cohorts.
2.2.2 Sub-national estimates
To estimate regional impact using the model, synthetic population estimates of smoking at neighbourhood level in England are used (see forthcoming RiskPopLocal publication). The estimates are made using spatial microsimulation via Iterative Proportional Fitting. This method integrates individual microdata from pooled Health Survey for England (HSE) data for the 2017, 2018, and 2019 annual waves with small-area demographic data (age/sex, ethnicity, quintile of IMD 2019, highest qualification, legal partnership status, self-assessed general health, and National Statistics Socioeconomic Classification (NS-SEC)) from the 2021 UK Census.
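The reweighting idea behind Iterative Proportional Fitting can be sketched in a few lines. This is a toy illustration under stated assumptions, not the RiskPopLocal implementation: the function `ipf_weights`, the two constraint variables, and all counts are invented for the example, and every category observed in the survey records is assumed to appear in the area constraints.

```python
from typing import Dict, List


def ipf_weights(individuals: List[dict],
                constraints: Dict[str, Dict[str, float]],
                n_iter: int = 50,
                tol: float = 1e-9) -> List[float]:
    """Reweight survey individuals so weighted totals match area margins.

    `constraints` maps a variable name to the target count per category
    for one small area (e.g. from census tables).
    """
    weights = [1.0] * len(individuals)
    for _ in range(n_iter):
        max_change = 0.0
        for var, targets in constraints.items():
            # Current weighted total in each category of this variable.
            totals = {cat: 0.0 for cat in targets}
            for person, w in zip(individuals, weights):
                totals[person[var]] += w
            # Scale each weight so the category totals hit their targets.
            for i, person in enumerate(individuals):
                cat = person[var]
                if totals[cat] > 0:
                    factor = targets[cat] / totals[cat]
                    weights[i] *= factor
                    max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins already matched
            break
    return weights


# Hypothetical survey records and area margins (both sum to 100 people).
survey = [
    {"sex": "F", "age": "16-44", "smoker": True},
    {"sex": "F", "age": "45+", "smoker": False},
    {"sex": "M", "age": "16-44", "smoker": False},
    {"sex": "M", "age": "45+", "smoker": True},
]
area = {"sex": {"F": 60, "M": 40}, "age": {"16-44": 30, "45+": 70}}

w = ipf_weights(survey, area)
prevalence = sum(wi for p, wi in zip(survey, w) if p["smoker"]) / sum(w)
```

The area-level smoking prevalence is then the weighted share of smokers among the reweighted survey records. A production implementation would typically also handle empty cells and integerise the weights, which this sketch omits.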
2.3 Epidemiological literature
The translation of successful smoking cessation into long-term disease reduction requires parameterisation from established epidemiological literature (Webster et al. 2018).
2.3.2 Risk decay curves after quitting smoking
The reduction in disease risk following smoking cessation is not instantaneous.
The model utilises condition-specific risk decay curves to calculate dynamic relative risks over time. These curves define the speed and shape of the risk reduction (e.g., rapid decay for cardiovascular events, slow decay for respiratory conditions, and even slower decay for cancers). Type II diabetes is assumed to follow the cardiovascular decay curve, and conditions that are not cardiovascular, respiratory, or cancer are conservatively assumed to follow the cancer curve.
The temporal lag structures are parameterised based on estimates derived from the Cancer Prevention Study II in the United States (Oza et al. 2011; Kontis et al. 2014).
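As an illustration of how a decay curve turns a current-smoker relative risk into a time-varying one, the sketch below assumes that the excess risk decays exponentially with years since quitting. The exponential form, the rate constants, and the names `DECAY_RATE`, `curve_for` and `relative_risk` are assumptions made for this example only; the model's curves are parameterised from the cited literature and may have a different shape.

```python
import math

# Hypothetical decay rates per year; a larger value means faster decay.
DECAY_RATE = {
    "cardiovascular": 0.50,
    "respiratory": 0.15,
    "cancer": 0.05,
}


def curve_for(condition_group: str) -> str:
    """Map a condition group to the decay curve it follows.

    Type II diabetes borrows the cardiovascular curve; any other group
    without its own curve falls back, conservatively, to the cancer curve.
    """
    if condition_group in DECAY_RATE:
        return condition_group
    if condition_group == "type_2_diabetes":
        return "cardiovascular"
    return "cancer"


def relative_risk(rr_current: float, years_since_quit: float,
                  condition_group: str) -> float:
    """Relative risk versus never smokers after quitting.

    Assumed form: RR(t) = 1 + (RR_current - 1) * exp(-k * t).
    """
    k = DECAY_RATE[curve_for(condition_group)]
    return 1.0 + (rr_current - 1.0) * math.exp(-k * years_since_quit)


# Five years after quitting, starting from a relative risk of 2.0.
rr_cvd = relative_risk(2.0, 5, "cardiovascular")
rr_resp = relative_risk(2.0, 5, "respiratory")
rr_cancer = relative_risk(2.0, 5, "cancer")
```

Under this assumed form the relative risk equals the current-smoker value at the moment of quitting and approaches 1 over time, with the ordering of the three curves matching the text: cardiovascular fastest, cancer slowest.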
2.4 Economic parameters and tariffs
The health economic evaluation model requires unit costs and quality-of-life multipliers to calculate short-term budget impacts and long-term cost-utility.
2.4.1 NHS payment scheme tariffs
The financial value of a prevented hospital admission is calculated using official NHS tariffs.
The model assigns an average Healthcare Resource Group (HRG) unit cost to each disease category. This provides the gross financial saving associated with each prevented readmission event.
Unit costs are drawn from the NHS Payment Scheme (e.g., 2023/24 prices workbook).
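The gross saving calculation described above amounts to multiplying prevented readmissions by the average unit cost for each disease category and summing. A minimal sketch follows; the function name and every figure are hypothetical placeholders, not values from the NHS Payment Scheme workbook.

```python
from typing import Dict


def gross_saving(prevented: Dict[str, float],
                 unit_cost: Dict[str, float]) -> float:
    """Sum of prevented readmissions times average unit cost per category."""
    return sum(n * unit_cost[category] for category, n in prevented.items())


# Hypothetical average HRG unit costs (GBP) and prevented readmissions.
unit_cost = {"cardiovascular": 3000.0, "respiratory": 2500.0}
prevented = {"cardiovascular": 10, "respiratory": 4}

saving = gross_saving(prevented, unit_cost)
```

The result is gross rather than net because the cost of delivering the intervention has not yet been deducted.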
2.4.2 Lifetime cost savings
While the Tier 1 evaluation focuses on short-term readmissions, the Tier 2 evaluation requires an estimation of the total lifetime societal healthcare cost avoided by a successful quit.
The model applies age- and sex-specific lifetime cost-saving multipliers to the projected volume of successful long-term quitters, that is, only to patients estimated to achieve a lifetime quit.
These multipliers are derived from the economic modelling work by Godfrey et al. (2011), updated to current price years.
2.4.3 Quality-Adjusted Life Years (QALYs)
To calculate the Incremental Cost-Effectiveness Ratio (ICER), the model must quantify improvements in health-related quality of life.
The model applies age- and sex-specific lifetime QALY gain multipliers to the successful quitting cohort.
The lifetime QALY gains are parameterised based on Stapleton and West (2012).
Note: QALY estimates were not a feature of the initial Excel model.
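The sketch below shows how stratum-specific lifetime multipliers and the ICER fit together. It is an illustration under stated assumptions: the strata, multiplier values, intervention cost, and the definition of incremental cost as intervention cost minus lifetime cost savings are all hypothetical, and the function names are not taken from the model.

```python
from typing import Dict, Tuple

Stratum = Tuple[str, str]  # (age band, sex)


def lifetime_total(quitters: Dict[Stratum, float],
                   multiplier: Dict[Stratum, float]) -> float:
    """Apply a per-quitter lifetime multiplier to each age/sex stratum."""
    return sum(n * multiplier[stratum] for stratum, n in quitters.items())


def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost per QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("ICER is undefined without a positive QALY gain")
    return incremental_cost / incremental_qalys


# Hypothetical long-term quitters and per-quitter lifetime multipliers.
quitters = {("45-54", "F"): 10, ("45-54", "M"): 20}
qaly_gain = {("45-54", "F"): 1.0, ("45-54", "M"): 0.5}
cost_saving = {("45-54", "F"): 1000.0, ("45-54", "M"): 500.0}

total_qalys = lifetime_total(quitters, qaly_gain)
total_savings = lifetime_total(quitters, cost_saving)

intervention_cost = 50000.0  # hypothetical programme cost
net_cost = intervention_cost - total_savings  # assumed definition
cost_per_qaly = icer(net_cost, total_qalys)
```

With these placeholder figures the cohort gains 20 QALYs and saves 20,000 in lifetime costs, so a 50,000 intervention has a net cost of 30,000 and an ICER of 1,500 per QALY.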