1. Overview


The age structure of the population appears to play a key role in determining the severity of symptoms and the mortality of the disease caused by SARS-CoV-2 infection. The importance of the demographic structure in determining the pandemic’s progression and impact has been well recognized by researchers; see, e.g., [1, 2]. A clearer understanding of how contagion spreads among age classes also appears to be fundamental for devising effective containment measures and for setting priorities in vaccination campaigns. Despite the importance of age-related COVID-19 data, and despite repeated calls for countries to provide it (see, e.g., [2, 3, 4]), this type of data has to date been essentially unavailable to the public, and even to researchers. This motivated us to formally request specific age-related COVID-19 data from the Italian authority in charge of COVID-19 surveillance (Istituto Superiore di Sanità, ISS), so as to make the data available to the public for research purposes.

2. Context and Methods

Spatial and temporal coverage

The data refers to the population of Italy and covers the period from January 29, 2020 to October 15, 2021, with daily frequency. Data for the early phase of the contagion (i.e., prior to March 2020) has several missing values for some age classes.

Methodology and quality control

The data reported in the file is the data present in the Italian COVID-19 surveillance system, updated to the extraction date of October 15, 2021. It consists of aggregations of positive SARS-CoV-2 cases derived from the Integrated COVID-19 Surveillance coordinated by the ISS (Ordinance no. 640 of February 27, 2020). The Integrated Surveillance data is updated daily by each Region, both with new cases and with new information on previously communicated cases as it becomes available. In addition, ongoing quality control of the data occasionally requires the Regions to delete cases that were mistakenly duplicated.

The data collected is in a continuous phase of consolidation and, as expected in an emergency situation, some information is incomplete. In particular, there may be a delay of a few days between the execution of the diagnostic swab and its reporting on the dedicated platform. Therefore, the number of cases observed in the days closest to the extraction date must be interpreted as provisional and incomplete. The same applies to the reporting of hospitalizations and deaths.


The data reported are disaggregated in a manner that guarantees compliance with the privacy legislation. In particular, it should be noted that for frequency values between 1 and 4 the value is expressed as “<5”.
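Because of this masking, count fields cannot be read directly as integers. The following is a minimal sketch of one way to handle them, assuming that masked values appear in the CSV as the literal string `<5` and that missing values appear as empty fields (both are assumptions about the file encoding). The helper name `parse_count` and its `censored_value` parameter are hypothetical; the default of 2 matches the imputation used in the Data overview below.

```python
def parse_count(raw, censored_value=2):
    """Convert a count field to int.

    Assumptions (not confirmed by the dataset documentation):
    - privacy-masked counts appear literally as "<5";
    - missing values appear as empty strings.
    Masked counts map to `censored_value`; missing values map to None.
    """
    text = raw.strip()
    if text == "":
        return None
    if text == "<5":
        return censored_value
    return int(text)
```

Any value between 1 and 4 is equally compatible with a masked field, so analyses that are sensitive to small counts may want to repeat the computation with `censored_value=1` and `censored_value=4` to bound the effect.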

3. Dataset description

Object name

Two files are provided. The first is the main COVID-19 data file, named “covid_ageclass_Italy.csv”; the second, named “ageclass_pop.csv”, is an ancillary file that contains the population size of each age class.

Format names and versions

Both files are plain-text comma-separated values (CSV) files.

Dataset creators

The dataset was extracted from the national official database, upon request from the authors, by Dr. Patrizio Pezzotti from the Epidemiology, Biostatistics and Mathematical Models Department of the ISS.


The data is provided under the CC0-Public Domain Dedication waiver license.

Repository location

dataverse.harvard.edu: https://doi.org/10.7910/DVN/VSS4CO

Publication date: Nov. 22, 2021

Data description

The data file “covid_ageclass_Italy.csv” contains 6069 rows (plus the headings row) and eight columns. The columns contain the following data:

  1. “date” contains the day to which the data in the other columns refers. Depending on the column, it is the date of the microbiologically confirmed diagnosis of SARS-CoV-2 infection, the date of hospitalization, the date of recovery, the date of death, etc.
  2. “age_class” is the age class, in ten-year ranges. In some rare cases it can be “unknown.”
  3. “cases” contains the number of confirmed SARS-CoV-2 positive cases on that day in the given age class.
  4. “hospitalized” contains the number of patients hospitalized (due to COVID) on that day in the given age class.
  5. “intensive_care” contains the number of patients who entered intensive care (due to COVID) on that day in the given age class.
  6. “deceased” contains the number of deceased persons (with death ascribed to COVID) on that day in the given age class.
  7. “recovered” contains the number of persons who recovered (from COVID) on that day in the given age class.
  8. “active_infected” contains the total number of persons with an active SARS-CoV-2 infection on the given day in the given age class.

The ancillary data file “ageclass_pop.csv” contains 10 rows (plus the headings row) and two columns. The first column, “age_class”, contains the age class; the second column, “population”, contains the number of individuals resident in Italy in that age class, as of January 2020.
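As an illustration of how the two files fit together, the sketch below sums the daily counts per age class and attaches the class population. It assumes that the header names are exactly the column names quoted above, that counts are integers or the masked string `<5`, that blank fields can be counted as zero, and that age-class labels are spelled identically in both files. The function names are hypothetical. The “active_infected” column is deliberately excluded from the sums because it is a daily stock rather than a daily flow.

```python
import csv
from collections import defaultdict

# Daily flow columns that can meaningfully be summed over time.
COUNT_COLUMNS = ("cases", "hospitalized", "intensive_care", "deceased", "recovered")


def _to_int(raw, censored_value=2):
    # Assumption: masked counts appear literally as "<5"; blanks count as 0.
    text = (raw or "").strip()
    if text == "<5":
        return censored_value
    return int(text) if text else 0


def cumulative_by_age_class(data_file, pop_file):
    """Sum daily counts per age class and attach the class population.

    Both arguments are open text files (or any iterable of CSV lines).
    Age classes absent from the population file get population None.
    """
    totals = defaultdict(lambda: dict.fromkeys(COUNT_COLUMNS, 0))
    for row in csv.DictReader(data_file):
        for col in COUNT_COLUMNS:
            totals[row["age_class"]][col] += _to_int(row[col])
    population = {
        r["age_class"]: int(r["population"]) for r in csv.DictReader(pop_file)
    }
    return {
        age: {**sums, "population": population.get(age)}
        for age, sums in totals.items()
    }
```

In practice the function would be called with the two files opened via `open("covid_ageclass_Italy.csv", newline="")` and `open("ageclass_pop.csv", newline="")`. Rows with the “unknown” age class end up with a population of `None`, since the ancillary file has no matching entry under the assumptions above.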

Data overview

A cumulative summary of part of the data is shown in Table 1. Mortality is computed simply as the ratio between deceased individuals in a given age class and the population of that class. Lethality is computed as the ratio between deceased individuals in a given age class and the infected individuals (cases) in that class. Both ratios are expressed as percentages in Table 1. Values reported as “<5” in the data are imputed with a default value of 2. Figure 1 shows a pie chart of the deaths by age. Figure 2 shows an example of time-series data representing the daily cases for the 50–59 age class; the regular spikes in the plot correspond to Sundays. Figure 3 shows the time series of the active infected individuals for the 50–59 age class; three major infection peaks are visible, the first in mid-April 2020, the second in late November 2020, and the third in early April 2021.

Table 1

Summary table.

Age class  Population  Cases    Intensive care  Deceased  Mortality (%)  Lethality (%)
0–9        4892494     273207   166             34        0.0007         0.0124
10–19      5706116     491752   233             44        0.0008         0.0089
20–29      6084382     585913   540             148       0.0024         0.0253
30–39      6854632     595830   1258            399       0.0058         0.0670
40–49      8937229     747765   3493            1215      0.0136         0.1625
50–59      9414195     793382   9353            4848      0.0515         0.6111
60–69      7364364     500624   15637           13825     0.1877         2.7616
70–79      5968373     360470   18035           33309     0.5581         9.2404
80–89      3628160     267017   8970            53126     1.4643         19.8961
>=90       791543      91373    1606            25578     3.2314         27.9930
ALL        59641488    4707333  59291           132526    0.2222         2.8153
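The mortality and lethality figures can be reproduced from the counts with two one-line ratios. The short sketch below does this for the “ALL” row of Table 1; the function names are hypothetical, and the factor of 100 reflects the observation that the tabulated values are consistent with ratios expressed as percentages.

```python
def mortality_pct(deceased, population):
    """Deceased individuals as a percentage of the age-class population."""
    return 100.0 * deceased / population


def lethality_pct(deceased, cases):
    """Deceased individuals as a percentage of confirmed cases."""
    return 100.0 * deceased / cases


# Values taken from the "ALL" row of Table 1.
population, cases, deceased = 59_641_488, 4_707_333, 132_526
print(round(mortality_pct(deceased, population), 4))  # 0.2222
print(round(lethality_pct(deceased, cases), 4))       # 2.8153
```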

Figure 1. Deaths by age class.

Figure 2. Cases for the 50–59 age class.

Figure 3. Active infected individuals for the 50–59 age class.
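The weekly spikes visible in the daily case series (Figure 2) are commonly attenuated with a 7-day moving average before further analysis. The following is a minimal sketch of a trailing moving average in plain Python; it is not a processing step applied to the published data, and the function name `moving_average` is hypothetical.

```python
def moving_average(values, window=7):
    """Trailing moving average over `window` samples.

    For the first window-1 entries, the average is taken over the
    samples available so far, so the output has the same length as
    the input.
    """
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the sample that just left the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

With a window of 7, every full window contains exactly one sample for each day of the week, so a reporting pattern that repeats weekly is averaged out.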

4. Reuse potential

The data can be used for research purposes, including aggregation, analysis, reference, model (e.g., SIRD) building and validation, teaching or collaboration.
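As an illustration of the modeling use case, the sketch below simulates a basic single-population SIRD model with a one-day forward-Euler step. It is a generic textbook formulation, not a model fitted to this dataset. The function name and the parameters `beta` (transmission rate), `gamma` (recovery rate), and `nu` (death rate) are hypothetical, and the total `n` is kept fixed at its initial value as a simplifying assumption. When comparing with the data, the compartments I, R, and D would plausibly correspond to “active_infected”, cumulative “recovered”, and cumulative “deceased”.

```python
def simulate_sird(s0, i0, r0, d0, beta, gamma, nu, days):
    """Forward-Euler simulation (1-day step) of a single-population SIRD model.

    Returns a list of (S, I, R, D) tuples of length days + 1.
    Assumption: the normalizing population n is fixed at its initial total.
    """
    n = s0 + i0 + r0 + d0
    s, i, r, d = float(s0), float(i0), float(r0), float(d0)
    trajectory = [(s, i, r, d)]
    for _ in range(days):
        new_infections = beta * s * i / n
        new_recoveries = gamma * i
        new_deaths = nu * i
        s -= new_infections
        i += new_infections - new_recoveries - new_deaths
        r += new_recoveries
        d += new_deaths
        trajectory.append((s, i, r, d))
    return trajectory
```

An age-structured variant would replace the scalar `beta` with a contact matrix between age classes and use one set of compartments per class, with initial populations taken from “ageclass_pop.csv”.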

Data accessibility statement

This data paper is available as a preprint on arXiv at https://arxiv.org/abs/2104.06199.