Overview
Spatial coverage
The data were collected for the whole of Scotland.
Temporal coverage
- May 1st 2001 to April 30th 2008 for cardiovascular diseases, cancer, breast cancer screening, mental health and maternal and child health.
- May 1st 2001 to April 30th 2010 for respiratory diseases, gastrointestinal diseases and primary care (pilot).
- May 1st 2001 to April 30th 2013 for the ongoing phase, which aims to continue methodological work on the use of cardiovascular and respiratory risk factors from primary care and, in addition, to study:
  - all-cause mortality
  - all-cause hospitalisation
  - hospitalisation and mortality for:
    - infectious and parasitic diseases
    - injuries, accidents and poisoning
  - uptake of bowel cancer screening and pathology of screen-detected bowel cancers
Species
Human population
Methods
Steps
The methods and many of the findings of our retrospective cohort study (SHELS) have been published [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. We followed a strict protocol that preserved anonymity and kept personal identifiers separate from the census and clinical data. Figure 1 shows our approach to linkage. We used computerised probability matching of names, sex, addresses and dates of birth to link the 2001 census for Scotland to the Scottish Community Health Index (CHI), a register of patients using the NHS (National Health Service). At this stage, all other data fields in the two datasets were excluded. The CHI numbers and the unique census numbers were then encrypted: the CHI number with a one-way cryptographic ('hashing') algorithm, and the census number with an algorithm developed by National Records of Scotland (NRS).
Figure 1: Overview of the record linkage process.
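The text above says only that "computerised probability matching" was used. The standard framework for this is Fellegi–Sunter scoring, in which each compared field contributes a log-likelihood weight and the summed weight decides whether a record pair is linked. The sketch below illustrates that idea under stated assumptions: the m/u probabilities, the use of postcode as a proxy for address, and the two cut-off values are all hypothetical and are not the parameters used in SHELS.

```python
import math

# Hypothetical m- and u-probabilities per comparison field (illustrative only,
# NOT the values used by SHELS):
#   m = P(field agrees | record pair is a true match)
#   u = P(field agrees | record pair is not a match)
# "postcode" is an assumed stand-in for the address comparison.
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "forename": (0.90, 0.02),
    "sex": (0.99, 0.50),
    "date_of_birth": (0.97, 0.001),
    "postcode": (0.85, 0.005),
}


def match_weight(agreements: dict) -> float:
    """Sum the log2 agreement/disagreement weights for one record pair."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += math.log2(m / u)  # agreement weight (positive when m > u)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight: float, upper: float = 15.0, lower: float = 5.0) -> str:
    """Assign a pair to link / non-link / clerical review (hypothetical cut-offs)."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


if __name__ == "__main__":
    # A census record and a CHI record that agree on everything except postcode,
    # e.g. because the person moved house.
    pair = {"surname": True, "forename": True, "sex": True,
            "date_of_birth": True, "postcode": False}
    w = match_weight(pair)
    print(round(w, 1), classify(w))
```

Summing log weights lets a strong agreement on a discriminating field such as date of birth outweigh a disagreement on a field that changes over time, such as address.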
About 95% (approximately 4.65 million) of the 4.9 million people who took part in the 2001 census were linked in this way to the Scottish Community Health Index, with at least 85% linked in every ethnic group. This represents about 92% of the 2001 population. The resulting linked file of encrypted CHI and census numbers is the key to all subsequent linkage of health data to the 2001 census records.
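The look-up file described above lets a health extract that carries only an encrypted CHI number be joined to census records without either side seeing the other's identifiers. The following is a minimal sketch of that join, assuming a keyed one-way hash (HMAC-SHA256) for the CHI number; the source states only that "a one-way cryptographic ('hashing') algorithm" was used, so the algorithm, the key, the field names and the function names `hash_chi` and `link_extract` are all hypothetical. Census IDs are treated as already-encrypted opaque strings, since the NRS algorithm is not described.

```python
import hashlib
import hmac

# Hypothetical secret key. A keyed hash is assumed here so that outsiders
# cannot recompute hashes from known CHI numbers; the source does not name
# the algorithm actually used.
SECRET_KEY = b"example-key-held-by-linkage-team"


def hash_chi(chi_number: str) -> str:
    """One-way keyed hash (HMAC-SHA256) of a CHI number, as a hex string."""
    return hmac.new(SECRET_KEY, chi_number.encode("utf-8"), hashlib.sha256).hexdigest()


def link_extract(extract, lookup):
    """Attach encrypted census IDs to a health extract via the look-up table.

    extract: list of dicts, each with a 'chi_hash' key (no other identifiers)
    lookup:  dict mapping hashed CHI -> encrypted census ID
    Returns (linked, unlinked) lists.
    """
    linked, unlinked = [], []
    for record in extract:
        census_id = lookup.get(record["chi_hash"])
        if census_id is None:
            unlinked.append(record)  # e.g. person not in the 2001 census
        else:
            linked.append({**record, "census_id": census_id})
    return linked, unlinked


if __name__ == "__main__":
    # Fictitious CHI numbers and census IDs, for illustration only.
    lookup = {hash_chi("0101011234"): "CENSUS-ENC-001"}
    extract = [
        {"chi_hash": hash_chi("0101011234"), "diagnosis": "I21"},
        {"chi_hash": hash_chi("0202025678"), "diagnosis": "J45"},
    ]
    linked, unlinked = link_extract(extract, lookup)
    print(len(linked), len(unlinked))
```

Because the hash is deterministic for a given key, the same person's records in different health extracts map to the same look-up entry, which is what allows one look-up table to serve every later linkage.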
Using our retrospective cohort, we were able to analyse ethnic variations in various health and healthcare areas: cardiovascular diseases [2, 3, 4, 5, 6], cancer [7, 8, 9], maternal and child health [10], mental health [11], gastrointestinal diseases and respiratory diseases (to be submitted for publication in 2014). We have also linked primary care records from 10 general practices in Edinburgh and Glasgow.
Both hospitalisation diagnoses and causes of death (see Tables 2 and 3 in the appendix) were available in each health area dataset. Other health datasets were linked for the analyses of maternal and child health and mental health outcomes (see Table 1 in the appendix).
These morbidity and mortality data were examined in relation to ethnicity, adjusting for demographic and socioeconomic measures obtained from the 2001 census (see Table 1 in the appendix).
Sampling strategy
Not relevant here.
Quality Control
- The health data undergo quality control procedures by the Information Services Division (ISD) of NHS National Services Scotland. A Data Quality Assurance team ensures that ISD health records are accurate, consistent and comparable across time and between sources. (http://www.isdscotland.org/Products-and-Services/Data-Quality/ – accessed 17/09/2014, contact: nss.isd-dmdataquality@nhs.net)
- The Census data have undergone quality control procedures by NRS. (http://www.gro-scotland.gov.uk/files2/the-census/2001-census/census-assessment.pdf – accessed 17/09/2014, contact: customer@gro-scotland.gsi.gov.uk)
- Extracted datasets were checked, and incidence figures were calculated for comparison with official published statistics.
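The last quality-control check above compares incidence calculated from an extract with official published statistics. A minimal sketch of that comparison follows, assuming a crude rate per 100,000 person-years and a relative tolerance of 10%; the tolerance, the function names and the example figures are hypothetical, and the published checks may well have used age-standardised rates instead.

```python
def incidence_per_100k(events: int, person_years: float) -> float:
    """Crude incidence rate per 100,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events * 100_000 / person_years


def within_tolerance(observed: float, published: float, tolerance: float = 0.10) -> bool:
    """True if `observed` differs from `published` by at most `tolerance` (relative)."""
    if published <= 0:
        raise ValueError("published rate must be positive")
    return abs(observed - published) / published <= tolerance


if __name__ == "__main__":
    # Made-up figures for illustration only.
    rate = incidence_per_100k(events=1_200, person_years=2_400_000)
    print(rate, within_tolerance(rate, published=48.0))
```

A large discrepancy would point to a problem in the extract (for example, a missing year or a wrong diagnosis code list) rather than to a real epidemiological difference.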
Constraints
Ninety-five percent of census records were linked to the CHI, and at least 85% linkage was achieved for each ethnic group. In any health dataset, a small percentage of records could not be linked, for reasons including:
- CHI number missing or not matching the look-up table
- A person was not in Scotland in 2001
- A person was not in the 2001 census
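Linkage completeness above is reported both overall and per ethnic group, with at least 85% in each group. A minimal sketch of how such per-group rates, and any groups falling below a threshold, might be computed from a list of records is shown below; the record layout, function names and group labels are hypothetical.

```python
from collections import Counter


def linkage_rates(records):
    """Proportion of records linked, per ethnic group.

    records: iterable of (ethnic_group, is_linked) tuples.
    """
    totals, linked = Counter(), Counter()
    for group, is_linked in records:
        totals[group] += 1
        if is_linked:
            linked[group] += 1
    return {group: linked[group] / totals[group] for group in totals}


def groups_below(rates, threshold=0.85):
    """Ethnic groups whose linkage rate falls below `threshold`, sorted by name."""
    return sorted(group for group, rate in rates.items() if rate < threshold)


if __name__ == "__main__":
    # Toy records with made-up group labels.
    records = (
        [("Group A", True)] * 95 + [("Group A", False)] * 5
        + [("Group B", True)] * 80 + [("Group B", False)] * 20
    )
    rates = linkage_rates(records)
    print(rates, groups_below(rates))
```

Reporting rates per group, not only overall, matters here because differential linkage between ethnic groups could bias comparisons of their health outcomes.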
Privacy
Analytical datasets contain no personal identifiers. Statistical output is subject to an NRS disclosure protocol and to scrutiny by a disclosure committee. Researchers require government baseline security clearance and research governance training to access the data in a safe setting at NRS.
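The NRS disclosure protocol is not described here, but a common element of such protocols is small-cell suppression of tabulated counts. The sketch below is a hypothetical illustration only: the threshold of 5, the choice to leave zero counts visible and the function name are assumptions, and a real protocol may also require secondary suppression so that hidden cells cannot be recovered from row or column totals.

```python
from typing import Dict, Union


def suppress_small_cells(counts: Dict[str, int], threshold: int = 5) -> Dict[str, Union[int, str]]:
    """Replace non-zero counts below `threshold` with a '<threshold' marker.

    The threshold and the choice to leave zeros visible are hypothetical;
    this does not perform secondary suppression.
    """
    out: Dict[str, Union[int, str]] = {}
    for cell, n in counts.items():
        if 0 < n < threshold:
            out[cell] = f"<{threshold}"  # primary suppression of a small cell
        else:
            out[cell] = n
    return out


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(suppress_small_cells({"Group A": 3, "Group B": 41, "Group C": 0}))
```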
Ethics
The work was approved by the Multicentre Research Ethics Committee for Scotland and the Privacy Advisory Committee of NHS National Services Scotland, plus Community Health Index Advisory Group and Caldicott Guardian approval, where required.
Dataset description
Object name
SHELS data are not yet open access; for further information about the datasets, see the tables in the appendix. Detailed metadata and a data dictionary will be produced for each health extract once open access is approved.
Data type
Secondary data, processed data.
Ontologies
None
Format names and versions
The datasets are stored in the safe haven as SAS datasets (.sas7bdat).
Creation dates
Creation date by health area:
- Cardiovascular disease: April 2009
- Maternal and child health: October 2009
- Mental health: December 2009
- Cancer (including breast cancer screening): May 2010
- Gastrointestinal diseases: June 2012
- Respiratory diseases: November 2012
- Primary care risk factors: April 2013
- All-cause mortality: 2014 (ongoing)
- All-cause hospitalisation: 2014 (ongoing)
- Infectious and parasitic disease (including blood borne viruses): 2014 (ongoing)
- Injuries, accident and poisoning: 2014 (ongoing)
- Uptake of bowel cancer screening and pathology of screen-detected bowel cancers: 2014 (ongoing)
Dataset creators
A person independent of the core data analysis team linked the 2001 census numbers held by NRS to the CHI held by ISD, creating the SHELS look-up table with encrypted numbers. ISD extracted health datasets and NRS staff linked them to the Census using the look-up table held at NRS. Then, once the anonymised linked datasets were created, the SHELS researchers were responsible for the data analysis.
Language
English
Programming language
SAS (alternatively Stata) code was used to prepare and analyse the data. Programs are available on request.
Licence
Not open licence
Accessibility criteria
The linked anonymised Census data and the health data are accessible on a stand-alone computer in a locked room at NRS. Currently, access is restricted to SHELS researchers who have security clearance.
Repository location
Not currently accessible; please contact the Principal Investigator (raj.bhopal@ed.ac.uk).
Publication date
Not applicable
Reuse potential
The datasets contain health and socioeconomic data for a wide range of health areas. NRS is currently assessing the feasibility of making SHELS data open access. If this happens, other researchers could reuse the data, with appropriate approval from ethics and privacy advisory committees, to analyse associations between any census variable and health in the Scottish population, as recorded in any dataset that carries the CHI number.
Appendix
Table 1
Health datasets and the census information linked to them.
| Health information | Census information |
|---|---|
| Dataset 1: CHD/stroke | |
| In-patient and day-case discharges from the Scottish Morbidity Records SMR01 (general) database | Ethnic group |
| Mortality data from NRS | Religion, current |
| Dataset 2: cancers | |
| In-patient and day-case discharges from the SMR01 and SMR06 (cancer registration) databases | Religion of upbringing |
| Mortality data from NRS | Country of birth |
| Screening data from the Scottish Breast Screening Programme | Age |
| Dataset 3: maternal and child health | |
| In-patient and day-case discharges from the SMR01 and SMR02 (maternity) databases | Sex |
| Child health surveillance data | Long-term illness |
| Scottish Birth Record data from NRS | Self-assessed health |
| Mortality data from NRS | Marital status |
| Dataset 4: mental health | |
| In-patient and day-case discharges from the SMR01 and SMR04 (psychiatric) databases | Labour force status |
| Mortality data from NRS | Socioeconomic status |
| Data from the Mental Welfare Commission for Scotland | Highest qualification |
| Dataset 5: gastrointestinal | |
| In-patient and day-case discharges from the SMR01 (general) database | Scottish Index of Multiple Deprivation decile |
| Mortality data from NRS | Car ownership |
| Dataset 6: respiratory | |
| In-patient and day-case discharges from the SMR01 (general) database | Housing tenure |
| Mortality data from NRS | Household size |
| Dataset 7: all-cause hospitalisation* | |
| In-patient and day-case discharges from the SMR01 (general) database | Number of rooms |
| Mortality data from NRS | Urban/rural indicator |
| Dataset 8: all-cause mortality* | |
| Mortality data from NRS | Health board (Glasgow, Lothian, Tayside, Other) |
| Dataset 9: unintentional injuries and poisoning* | |
| In-patient and day-case discharges from the SMR01 (general) database | Mobile (temporary) accommodation |
| Mortality data from NRS | Self-contained accommodation |
| Dataset 10: all infectious and parasitic diseases* | |
| In-patient and day-case discharges from the SMR01 database | Central heating |
| Mortality data from NRS | Moved within last year |
| Dataset 11: hepatitis C, hepatitis B and HIV* | |
| Records from the hepatitis C, hepatitis B and HIV datasets held by Health Protection Scotland | Economic activity last week |
| Dataset 12: bowel cancer screening* | |
| Bowel Cancer Screening Key Performance Indicator (KPI) dataset held by ISD | |
| Cancer registrations (SMR06 database) | |
| Dataset 13: primary care (sample of 10 GP practices) | |
| Cardiovascular and respiratory risk factors | |
| Further linked to Datasets 1, 6 and 8 | |
| Limited data on morbidity and prescribed drugs relating to cardiovascular and respiratory diseases | |
* Under preparation
Table 2
SHELS health outcomes by health area for Phases 1-4 datasets (with outcomes between 2001 and 2013, minimum follow-up of 7 years).
| Phase | Health area | Health outcomes |
|---|---|---|
| Phase 1/2 | Cardiovascular diseases | First myocardial infarction and 28-day survival |
| | | First chest pain |
| | | First stroke |
| | | First episode of heart failure |
| | Mental health | First psychiatric disorder (any diagnosis) |
| | | First mood disorder |
| | | First psychotic disorder |
| | | Emergency detention certificate |
| | | Short-term detention certificate |
| | | Compulsory treatment order |
| | Cancer | Any first cancer (excluding non-melanoma skin cancer) |
| | | First lung cancer |
| | | First prostate cancer |
| | | First breast cancer |
| | | First colorectal cancer |
| | | Breast cancer screening non-attendance |
| | Maternal and child health | Mean maternal age |
| | | Mean gestational age |
| | | Birthweight |
| | | Proportion (%) of smokers (smoking history and smoking during pregnancy) |
| | | Proportion of preterm births |
| | | Caesarean delivery |
| | | Proportion breastfeeding at 6-8 weeks |
| Phase 3 | Respiratory diseases | All-cause non-cancer respiratory diseases |
| | | Asthma |
| | | COPD |
| | | All upper respiratory infections |
| | | Tonsillitis |
| | | All lower respiratory infections |
| | | Pneumonia |
| | | Influenza |
| | Gastrointestinal diseases | Peptic ulcer disease |
| | | Oesophagitis |
| | | Gastritis |
| | | Gallstones |
| | | Pancreatitis |
| | | Irritable bowel syndrome |
| | | Appendicitis |
| | | Ulcerative colitis |
| | | Crohn’s disease |
| | | Any inflammatory bowel disease |
| | | Diverticular disease |
| | Primary care | Pilot study: data extraction and data quality assessment |
| Phase 4 | Infectious diseases | Mortality/hospitalisation for infectious or parasitic diseases, and incidence/prevalence of HIV and hepatitis B and C |
| | Unintentional accidents | Mortality/hospitalisation for accidents, injuries and poisonings |
| | Cancer | Participation in and outcome of the Scottish Bowel Cancer Screening Programme |
| | Other | All-cause mortality |
| | | All-cause hospitalisation, length of stay and readmission |
| | | Use of morbidity and risk-factor data from primary care |
Table 3
Contents of the health-related datasets.
| Database | Fields |
|---|---|
| General hospital discharge record (SMR01) linked to mortality | Age in years |
| | Sex |
| | Admission date |
| | Admission type |
| | Admission reason |
| | Duration of hospital admission |
| | Days waiting |
| | ICD diagnostic group |
| | Main condition |
| | Other condition |
| | OPCS4 codes |
| | Main operation |
| | Date of the main operation |
| | Other operation |
| | Date of the other operation |
| | Date of death (if applicable) |
| | Date of discharge |
| | Discharge type |
| | Discharge/transfer to |
| | Inpatient/day case marker |
| | Summarised admission code |
| | Summarised discharge code |
| Maternity and birth record (SMR02) linked to mortality | Age at admission (years) |
| | Age at conception (years) |
| | Type of antenatal care |
| | Total previous: |
| | - Pregnancies |
| | - Spontaneous abortions |
| | - Therapeutic abortions |
| | - Caesarean sections |
| | - Stillbirths |
| | - Neonatal deaths |
| | Previous admissions this pregnancy |
| | Parity |
| | Date of booking |
| | Original booking |
| | Delivery plan - place |
| | Delivery plan - management |
| | Height of mother |
| | Mother height group |
| | Weight of mother |
| | Type of abortion |
| | Management of abortion |
| | Sterilisation after abortion |
| | Principal complication after abortion |
| | Estimated gestation |
| | Duration of pregnancy |
| | Antenatal steroids |
| | Diabetes |
| | Smoking history (booking) |
| | Smoker during pregnancy |
| | Condition on discharge |
| | Drug misuse (this pregnancy) |
| | Drugs used 1-4 |
| | Ever injected illicit drugs |
| | Typical weekly alcohol consumption |
| | Indication of labour |
| | Duration of labour |
| | Analgesia during labour/delivery |
| Psychiatric inpatient records (SMR04) linked to mortality | As for SMR01 plus: |
| | Status on admission |
| | Admission referral from |
| | Previous psychiatric care |
| | Type of psychiatric care provided |
| | Age on discharge (years) |
| | Discharge main condition |
| | Discharge other conditions 1-5 |
| | ECT 1st treatment date |
| | ECT treatments (number this episode) |
| | Clinical facility start |
| | Clinical facility end |
| | Arrangements for aftercare 1-4 |
| | Care plan arrangements |
| Cancer registry (SMR06) linked to mortality | Date of incidence |
| | Side |
| | Site ICD9 |
| | ICD10 cancer site |
| | ICDO2 |
| | Type ICDO |
| | Morphology |
| | Date of death |
| | Vital status |
| | Embarkation date |
| | Cause of death 1-4 |
| | Death certificate only |
| | Grade classification |
| | Grade cell type |
| | MVB diagnosis |
| | Histological verification |
| | Method 1st detection |
| | Stage clinical T |
| | Stage clinical N |
| | Stage clinical M |
| | Stage colorectal |
| | Tumour size |
| | Nodes examined |
| | Number of nodes examined |
| | Positive nodes |
| | Number of positive nodes |
| | ER status |
| | Surgery |
| | Date 1st surgery |
| | Hosp GP 1st surgery |
| | Referred to radiotherapy |
| | Treated with radiotherapy |
| | Date of 1st radiotherapy |
| | Hospital 1st radiotherapy |
| | Chemotherapy |
| | Date 1st chemotherapy |
| | Hospital GP 1st chemotherapy |
| | Hormone therapy |
| | Date 1st hormone therapy |
| | Hosp GP 1st hormone therapy |
| | Other therapy |
| | Type other therapy |
| | Date 1st other therapy |
| | Hosp GP 1st other therapy |
| NRS death records | Date of event |
| | Primary cause of death |
| | Secondary causes of death |
| Scottish Breast Screening data | Date of death |
| | Cancer found |
| | Episode |
| | Episode type |
| | Episode invited |
| | Episode attended |
| | Episode outcome |
| | Cancer diagnosis date |
| | Facility flag |
| | Referral flag |
| Child health data | Birth details |
| | Birthweight |
| | Head circumference |
| | Number born (current pregnancy) |
| | Number born alive (current pregnancy) |
| | Birth order (current) |
| | Birth place type |
| | Age of mother |
| | Gestational age |
| | Delivery mode |
| | Onset of labour |
| | Location |
| | Early life/newborn: |
| | - Feeding at 6 weeks |
| | - Neonatal care level |
| | - Newborn screening (all) |
| | - Newborn status exam |
| | Health markers: |
| | - Height |
| | - Weight |
| | - BMI |
| | - Immunisation activity |
| | - History of present illness (?) |
| | - Significant health concerns |
| | - Sympathetic nervous system (??) status |
| | - Dental status |
| | Basic health measurements: |
| | - Height |
| | - Weight |
| | - BMI |
| Primary care risk factor data | Date of registration |
| | Date of deregistration |
| | Tobacco consumption |
| | Tobacco consumption date |
| | Family history of disease |
| | Family history of disease date |
| | Exercise status |
| | Exercise date |
| | Diabetes |
| | Diabetes date |
| | CHD |
| | CHD date |
| | Stroke |
| | Stroke date |
| | Atrial fibrillation |
| | AF date |
| | Statins |
| | Statins date |
| | Height |
| | Height date |
| | Weight |
| | Weight date |
| | Cholesterol |
| | Cholesterol date |
| | Systolic blood pressure |
| | Systolic blood pressure date |
| | Diastolic blood pressure |
| | Diastolic blood pressure date |
| | Asthma |
| | Asthma date |
| | Asthma prescription |
| | Asthma prescription date |
| Scottish Bowel Cancer Screening Programme linked to cancer registry (SMR06) | Date test kit sent to participant |
| | Health board of residence |
| | Sex |
| | Age in years |
| | Screening test result |
| | Flag for kit completed in error |
| | Health board identifier/code |
| | Date of notification of a screening result |
| | Colonoscopy performed |
| | Date colonoscopy performed |
| | Reason for not having a colonoscopy |
| | Colonoscopy completed |
| | Invasive cancer detected |
| | ICD-10 classification of neoplasm |
| | Tumour classification (after surgery) |
| | Nodal classification (after surgery) |
| | Metastases classification (after surgery) |
| | Dukes’ stage derived from TNM classification (tumour/nodal status/metastasis) |
| | Polyps detected |
| | Adenoma detected |
| | Count of adenomas |
| | Maximum dimension of the largest adenoma |
| | Polyp cancer detected |
| | Polypectomy performed at colonoscopy |
| | Complication from the colonoscopy requiring admission |
| | Death |
| | Site ICD9 |
| | ICD10 cancer site |
| | ICDO2 |
| | Type ICDO3 |
| | Grade cell type |
| | Stage colorectal |
| General hospital discharge record (SMR01) linked to hepatitis B dataset from Health Protection Scotland (HPS) and mortality | Sex |
| | Age at diagnosis (years) |
| | NHS board of residence (at diagnosis) |
| | Date of earliest HBsAg-positive specimen |
| | Source of 1st positive test (hospital, routine/antenatal screen, GP, other community setting) |
| | HBV test result (recent/acute, chronic) |
| | Date of late diagnosis |
| | Late diagnosis indicator |
| | Date of death |
| General hospital discharge record (SMR01) linked to hepatitis C dataset from HPS and mortality | As for hepatitis B plus: |
| | Date of earliest positive specimen |
| | Source of 1st positive HCV test (hospital, routine/antenatal screen, GP, other community setting) |
| | HCV test result |
| | Description of result |
| | HCV genotype |
| | Risk group |
| | Date of first attendance (specialist services) |
| | Time from diagnosis to 1st attendance (specialist services) |
| | Date started antiviral therapy |
| | Time from diagnosis to start of antiviral therapy |
| | Response to antiviral therapy (sustained viral response (SVR)/non-SVR) |
| General hospital discharge record (SMR01) linked to HIV dataset from HPS and mortality | As for hepatitis B plus: |
| | Date of earliest positive specimen |
| | Source of HIV diagnosis test (hospital, routine/antenatal screen, GP, other community setting) |
| | Risk group |
| | Date of AIDS diagnosis (symptoms) |
| | New case (known in Scotland or elsewhere/new to Scotland/unknown) |
| | Infected outside Scotland (yes/no) |
| | Follow-up status (attending/not attending/lost/dead/left Scotland/recent) |
| | Date last attended healthcare services |
| | Date of 1st attendance in HIV specialist care |
| | Time from diagnosis to 1st attendance in HIV specialist care |
| | Date of 1st attendance for CD4 measurement |
| | 1st CD4 result (category: low, medium, high) |