The age structure of the population appears to play a key role in determining the severity of symptoms and the mortality of the disease caused by SARS-CoV-2 infection. The importance of demographic structure in determining the pandemic’s progression and impact has been well recognized by researchers; see, e.g., [1, 2]. A clearer understanding of how the contagion spreads among age classes also appears to be fundamental for devising effective containment measures and for establishing priorities in vaccination campaigns. Despite the importance of age-related COVID-19 data, and despite repeated calls for countries to provide them (see, e.g., [2, 3, 4]), this type of data has to date been essentially unavailable to the public, and even to researchers. This motivated us to formally request specific age-related COVID-19 data from the Italian authority in charge of COVID-19 surveillance (Istituto Superiore di Sanità, ISS), so as to make them available to the public for research purposes.
2. Context and Methods
Spatial and temporal coverage
The data refers to the population of Italy and covers the period from January 29, 2020 to October 15, 2021, with daily frequency. Data for the early phase of the contagion (i.e., prior to March 2020) have several missing values for some age classes.
Methodology and quality control
The data reported in the file are the data present in the Italian COVID-19 surveillance system, updated to the extraction date of October 15, 2021. The data represents aggregations of positive cases for SARS-CoV-2 derived from the Integrated COVID-19 Surveillance coordinated by the ISS (Ordinance no. 640 of February 27, 2020). The Integrated Surveillance data is updated daily by each Region, both with new cases and with new information on previously communicated cases, as it becomes available. In addition, the constant quality control of the data occasionally requires the Regions to cancel cases that were mistakenly duplicated.
The data collected is in a continuous phase of consolidation and, as expected in an emergency situation, some information is incomplete. In particular, there may be a delay of a few days between the execution of the diagnostic swab and its reporting on the dedicated platform. Therefore, the number of cases observed in the days closest to the extraction date must be interpreted as provisional and incomplete. The same applies to the reporting of hospitalizations and deaths.
The data reported are disaggregated in a manner that guarantees compliance with the privacy legislation. In particular, it should be noted that for frequency values between 1 and 4 the value is expressed as “<5”.
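The privacy rule above means that a count cell is not always a plain integer. A minimal Python sketch of one way to read such cells is given below. The helper names `parse_count` and `count_bounds` are hypothetical, and the treatment of an empty cell as a missing value is an assumption, since the encoding of missing values is not specified here.

```python
from typing import Optional, Tuple

# Marker used in the data for frequency values between 1 and 4.
CENSORED_MARKER = "<5"


def parse_count(cell: str) -> Tuple[Optional[int], bool]:
    """Return (value, censored) for one count cell.

    A "<5" cell yields (None, True). An empty cell is assumed to be a
    missing value and yields (None, False). Anything else is read as an int.
    """
    text = cell.strip()
    if text == CENSORED_MARKER:
        return None, True
    if text == "":
        return None, False
    return int(text), False


def count_bounds(cell: str) -> Optional[Tuple[int, int]]:
    """Return the (lowest, highest) count compatible with the cell."""
    value, censored = parse_count(cell)
    if censored:
        return 1, 4  # "<5" stands for a value between 1 and 4
    if value is None:
        return None  # missing value: no bounds available
    return value, value


print(parse_count("<5"))
print(count_bounds("<5"))
print(count_bounds("17"))
```

Keeping the censoring flag separate from the value lets downstream code choose its own treatment, for instance an interval, a fixed imputation, or exclusion of the cell.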
3. Dataset description
Two files are provided. The first file is the main COVID-19 data file named “covid_ageclass_Italy.csv” while the second file named “ageclass_pop.csv” is an ancillary file that contains the population cardinality for each age class.
Format names and versions
File format is textual comma separated values (CSV).
The dataset was extracted from the national official database, upon request from the authors, by Dr. Patrizio Pezzotti from the Epidemiology, Biostatistics and Mathematical Models Department of the ISS.
The data is provided under the CC0-Public Domain Dedication waiver license.
Publication date: Nov. 22, 2021
The data file “covid_ageclass_Italy.csv” contains 6069 rows (plus the headings row) and eight columns. The columns contain the following data:
- “date” contains the day to which the data in the other columns refers. Depending on the column, it is the date of the microbiologically confirmed diagnosis of SARS-CoV-2 infection, the date of hospitalization, the date of recovery, the date of death, etc.
- “age_class” is the age class, in a ten-year range. In some rare cases it can be “unknown.”
- “cases” contains the number of confirmed positive SARS-CoV-2 infected cases for that day in the given age class.
- “hospitalized” contains the number of patients hospitalized (due to COVID) on that day in the given age class.
- “intensive_care” contains the number of patients that entered intensive care (due to COVID) on that day in the given age class.
- “deceased” contains the number of deceased persons (with death ascribed to COVID) on that day in the given age class.
- “recovered” contains the number of persons that recovered (from COVID) on that day in the given age class.
- “active_infected” contains the total number of persons with an active SARS-CoV-2 infection on the given day in the given age class.
The ancillary data file “ageclass_pop.csv” contains 10 rows (plus the headings row) and two columns. The first column “age_class” contains the age class; the second column “population” contains the number of individuals resident in Italy in that age class, as of January 2020.
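The file layout described above can be read with the Python standard library alone. The sketch below is illustrative and rests on several assumptions: the `date` column is taken to be in ISO format (YYYY-MM-DD), age classes are taken to be labeled like "50-59", and the sample rows are invented for the example and are not taken from the dataset. The function name `read_covid_rows` is hypothetical.

```python
import csv
import io
from datetime import date

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "date", "age_class", "cases", "hospitalized",
    "intensive_care", "deceased", "recovered", "active_infected",
]


def read_covid_rows(handle):
    """Read the main data file into a list of dicts.

    Count cells are left as strings because they may contain "<5".
    The date is parsed assuming ISO format (an assumption).
    """
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        record["date"] = date.fromisoformat(record["date"])
        rows.append(record)
    return rows


# Invented sample in the assumed layout, used only to exercise the reader.
SAMPLE = """date,age_class,cases,hospitalized,intensive_care,deceased,recovered,active_infected
2021-03-01,50-59,120,10,<5,<5,95,4000
2021-03-01,unknown,<5,0,0,0,0,7
"""

rows = read_covid_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0]["date"], rows[1]["age_class"])
```

For the real file, the same function could be called as `read_covid_rows(open("covid_ageclass_Italy.csv", newline=""))`, provided the assumptions about the date format hold.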
A cumulative summary of part of the data is shown in Table 1. Mortality is computed here simply as the ratio between the deceased individuals in a given age class and the population of that class. Lethality is computed as the ratio between the deceased individuals in a given age class and the infected individuals (cases) in that class. Values reported as “<5” in the data are imputed with a default value of 2. Figure 1 shows a pie chart of the deaths by age. Figure 2 shows an example of time-series data representing the daily cases for the 50–59 age class; the regular spikes in the plot correspond to Sundays. Figure 3 shows the time series of the active infected individuals for the 50–59 age class; three major infection peaks are visible, the first in mid-April 2020, the second in late November 2020, and the third in early April 2021.
|AGE CLASS|POPULATION|CASES|INTENSIVE CARE|DECEASED|%MORTALITY|%LETHALITY|
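The mortality and lethality definitions used for Table 1 can be sketched in Python as follows. This is an illustration under stated assumptions and not necessarily the authors' own procedure: the function names `impute` and `summarize` are hypothetical, empty cells are assumed to count as zero, and the rows and population figure in the example are invented rather than taken from the dataset.

```python
def impute(cell, default=2):
    """Convert a count cell to int, imputing "<5" with a default of 2.

    Empty cells are treated as 0 (an assumption about missing values).
    """
    text = cell.strip()
    if text == "<5":
        return default
    if text == "":
        return 0
    return int(text)


def summarize(rows, population):
    """Cumulate cases and deaths per age class and derive both ratios.

    mortality_pct = 100 * deceased / population of the class
    lethality_pct = 100 * deceased / cases of the class
    """
    totals = {}
    for row in rows:
        entry = totals.setdefault(row["age_class"], {"cases": 0, "deceased": 0})
        entry["cases"] += impute(row["cases"])
        entry["deceased"] += impute(row["deceased"])

    summary = {}
    for age_class, entry in totals.items():
        pop = population.get(age_class)
        cases, deceased = entry["cases"], entry["deceased"]
        summary[age_class] = {
            "cases": cases,
            "deceased": deceased,
            # None when the denominator is unknown or zero
            "mortality_pct": 100 * deceased / pop if pop else None,
            "lethality_pct": 100 * deceased / cases if cases else None,
        }
    return summary


# Invented rows and population, for illustration only.
example_rows = [
    {"age_class": "50-59", "cases": "100", "deceased": "<5"},
    {"age_class": "50-59", "cases": "<5", "deceased": "3"},
]
example_population = {"50-59": 1000}

summary = summarize(example_rows, example_population)
print(summary["50-59"])
```

In this example the two "<5" cells each contribute 2, so the class accumulates 102 cases and 5 deaths. Age classes without a population entry, such as "unknown", get a mortality of `None` instead of raising an error.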
4. Reuse potential
The data can be used for research purposes, including aggregation, analysis, reference, model (e.g., SIRD) building and validation, teaching or collaboration.
Data accessibility statement
This data paper is available as a preprint on arXiv at https://arxiv.org/abs/2104.06199.