1. Overview

Introduction/Study Description

VectorNet [1] is a joint initiative of the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC), which started in May 2014. The project supports the collection of distribution data on tick, sandfly, mosquito and Culicoides midge vectors, related to both animal and human health.

While VectorNet and its predecessor VBORNET [2] have made substantial progress collating European data on key vector species, the coverage is still incomplete. The ‘Gap Analysis’ work within these projects aims to identify those areas of likely species distribution within the project extent where there are no current data. These estimates were produced throughout the project and were intended to meet two objectives: firstly to help direct extensive VectorNet sampling efforts in the field, and secondly to provide first indications of the current likely extent and distribution of key vector species within continental Europe and its surrounding regions. The models provided here are the latest iteration using the distribution data available at the end of 2018. It is hoped that publishing these models will aid experts to engage the more extensive research and professional community in the drive to expand and validate the VectorNet database, and will also contribute to the veterinary and public health planning for Europe and its neighbouring countries. Readers are encouraged to contact the authors or visit the VectorNet website [1] for further details of the project, and to view distribution maps of arthropod disease vectors of midges, ticks, mosquitos, and sandflies.

For each model, abundance maps with a resolution of 1 km were generated using both the Boosted Regression Trees and Random Forest spatial modelling techniques available through the VECMAP [3] system. The outputs from each technique were ensembled to create a ‘consensus’ output of the natural logarithm (Ln) of the maximum annual number per trap per day.

2. Context

Spatial coverage

Description: Continental Europe and surrounding regions

Northern boundary: 71.8

Southern boundary: 33.5

Eastern boundary: 62.3

Western boundary: –11.2

Temporal coverage

(01/04/2014 – 01/05/2018).

Species

Culicoides imicola Kieffer, Culicoides obsoletus (Meigen), Culicoides scoticus Downes and Kettle, Culicoides dewulfi Goetghebuer, Culicoides chiopterus (Meigen), Culicoides pulicaris (Linnaeus), Culicoides lupicaris Downes and Kettle, Culicoides punctatus (Meigen) and Culicoides newsteadi Austen.

Culicoides imicola is a proven bluetongue virus (BTV) vector species: it is livestock-associated, the virus has been isolated numerous times from field-collected individuals, and the entire transmission cycle has been reproduced experimentally for this species [4, 5]. The other listed species, belonging to the Avaritia and Culicoides subgenera, are considered probable vectors based on their ecological habits, on virus isolation or viral genome detections from field-collected individuals, and on experimental infections. BTV was isolated from field-collected C. obsoletus [6, 7, 8] and C. pulicaris [9], although it was not clear whether these taxa referred to single species or to groups of species. BTV-8 genome from C. dewulfi and C. chiopterus field individuals has been identified by real-time RT-PCR in the Netherlands [10, 11] and in France [12]. In the Basque country, BTV-1 genome was detected by real-time RT-PCR from C. obsoletus/C. scoticus, C. pulicaris and C. lupicaris parous females [13]. Culicoides obsoletus and C. scoticus from the United Kingdom have been experimentally infected by BTV-8 and BTV-9, with C. scoticus showing higher viral titers [14]. Pools of C. pulicaris were found infected with BTV-2 in Sicily [15], and BTV genome was detected in C. punctatus and C. newsteadi field-collected specimens in Italy [16].

3. Methods

Steps

Model training data

The reported distributions of each vector species held in the VectorNet archive in May 2018 were used as the basis for the species-presence training data for the analysis. They were formally released to the authors on request to ECDC (reference number 18-1421).

The raw input data were provided by light trap surveillance of adult Culicoides, set up mostly on ruminant farms across continental Europe and surrounding regions (72N to 33.5N, 11.2W to 62E). Sampling was concentrated in western countries and supplemented by transect samples in eastern and northern Europe. Data from central EU are relatively sparse (see maps in Appendix 1). These data were obtained either from national surveillance systems or from surveys carried out by the VectorNet project. Species were identified using a morphological identification key [17] from field collections or, in some cases, retrospectively from stored collections from national surveillance systems.

Midge abundance varies throughout the year, and several metrics may be used to represent it. The one used here for every species is the mean annual maximum number per trap per day. Data were used only from locations that were sampled with at least one collection per month throughout the season of peak abundance. If data from more than a single year were available, the annual average was used. For each species, zero values from the abundance datasets were included in the input data, but were not supplemented by absences from locations for which only presence/absence data were available. These values represent a standardised measure of abundance at the annual resolution, and so represent one aspect of absolute abundance. They are not, however, comparable with traditional absolute abundance measures as they are not associated with a specific date.
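The abundance metric described above can be illustrated with a minimal sketch. It assumes one reading of the text: the largest catch per trap per day is taken for each site and year, and these annual maxima are then averaged across years. The record layout, site names and function name are hypothetical and are not taken from the VectorNet archive.

```python
from collections import defaultdict
from statistics import mean


def mean_annual_maximum(records):
    """Return {site: mean of the annual maximum catch per trap per day}.

    records: iterable of (site, year, count_per_trap_per_day) tuples.
    This layout is an assumption made for illustration only.
    """
    annual_max = defaultdict(dict)
    for site, year, count in records:
        # Keep the largest daily catch seen so far for this site and year.
        # Counts are non-negative, so 0.0 is a safe starting value.
        previous = annual_max[site].get(year, 0.0)
        annual_max[site][year] = max(previous, count)
    # Average the annual maxima when more than one year is available.
    return {site: mean(list(by_year.values()))
            for site, by_year in annual_max.items()}


# Hypothetical trap records; zero catches are kept as valid observations.
records = [
    ("farm_A", 2016, 12.0), ("farm_A", 2016, 40.0),
    ("farm_A", 2017, 20.0), ("farm_A", 2017, 8.0),
    ("farm_B", 2017, 0.0),
]
result = mean_annual_maximum(records)
# farm_A: (40 + 20) / 2 = 30.0; farm_B: 0.0
```

The sketch omits the sampling-frequency filter (at least one collection per month during the peak season), which would need to be applied before this step.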

Maps of the recorded distributions at that time are presented as overlays to the model outputs, in Appendix 1 available within this data package.

Modelling procedure

A range of modelling techniques are available in the VECMAP [3] system, of which Boosted Regression Trees (BRT) and Random Forest (RF) [18], using 10–25 repeated bootstraps per replicate, were used. Five replicates were implemented for each method. Each model was run with a 25% holdback for validation, which also ensured variability between replicates. BRT model parameters were adjusted to result in 1000 trees; the RF parameters were set to the system defaults, namely 100 trees, the best 15% of the available covariates, and each tree using approximately 70% of available sample data with replacement. An ensembled average (and an associated standard deviation image) was then produced from the ten replicates (five per method). The standard deviation maps provide useful indicators of uncertainty in the model outputs.
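The ensembling step can be sketched as a per-pixel mean and standard deviation across replicate prediction grids. This is an illustration only, not the VECMAP implementation: the grids are small hypothetical lists of rows, and the use of the population standard deviation (rather than the sample form) is an assumption, since the text does not state which was used.

```python
from statistics import mean, pstdev


def ensemble(replicates):
    """Return (mean_grid, sd_grid) computed cell by cell.

    replicates: list of equally sized 2-D grids (lists of rows), one per
    model replicate. Population SD is an assumption, not a documented choice.
    """
    n_rows = len(replicates[0])
    n_cols = len(replicates[0][0])
    mean_grid, sd_grid = [], []
    for r in range(n_rows):
        mean_row, sd_row = [], []
        for c in range(n_cols):
            # Collect the predictions of every replicate for this cell.
            values = [grid[r][c] for grid in replicates]
            mean_row.append(mean(values))
            sd_row.append(pstdev(values))
        mean_grid.append(mean_row)
        sd_grid.append(sd_row)
    return mean_grid, sd_grid


# Two hypothetical 1 x 2 replicate grids (the study used ten replicates).
replicates = [[[1.0, 2.0]], [[3.0, 2.0]]]
mean_grid, sd_grid = ensemble(replicates)
# mean_grid == [[2.0, 2.0]]; sd_grid == [[1.0, 0.0]]
```

Cells where replicates disagree receive a larger standard deviation, which is why the standard deviation image can be read as an indicator of uncertainty.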

The covariates offered to the modelling procedures were drawn from a standardised set of environmental parameters, and in particular a suite of Fourier processed MODIS satellite imagery [19] which provides a range of biologically interpretable variables related to levels and seasonality of temperature and vegetation related factors during the period 2001–2015. These are summarised in Table 1 and are all available to registered members of the PALE-Blu Data Website [20]. Each BRT model was run with the top ten predictors identified in the trial model runs for each species, which are listed at the end of Appendix 1.
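To show how mean, amplitude and phase variables of this kind can arise from a seasonal time series, the sketch below applies a discrete Fourier transform to a single pixel's series. It is a generic illustration and not the processing chain of [19]: the monthly time step, the three harmonics, and the convention of reporting phase as the time step of the harmonic's peak are all assumptions.

```python
import cmath
import math


def harmonics(series, n_harmonics=3):
    """Summarise one seasonal cycle of equally spaced observations."""
    n = len(series)
    mean = sum(series) / n
    summary = {
        "mean": mean,
        "minimum": min(series),
        "maximum": max(series),
        "variance": sum((x - mean) ** 2 for x in series) / n,
    }
    for k in range(1, n_harmonics + 1):
        # k-th Fourier coefficient of the series (k cycles per period).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series)) / n
        summary[f"amplitude_{k}"] = 2 * abs(coeff)
        # Assumed convention: time step at which harmonic k first peaks.
        angle = (-cmath.phase(coeff)) % (2 * math.pi)
        summary[f"phase_{k}"] = angle * n / (2 * math.pi * k)
    return summary


# Hypothetical monthly series: mean 10, annual amplitude 5, peak at month 3.
series = [10 + 5 * math.cos(2 * math.pi * (t - 3) / 12) for t in range(12)]
summary = harmonics(series)
```

For this synthetic series the first harmonic recovers an amplitude of about 5 and a peak timing of about 3, while the second and third harmonics have amplitudes close to zero.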

Table 1

Covariates offered to modelling procedures.


1 ER011503A0: Middle infra-red mean
2 ER011503A1: Middle infra-red amplitude 1
3 ER011503A2: Middle infra-red amplitude 2
4 ER011503A3: Middle infra-red amplitude 3
5 ER011503MN: Middle infra-red minimum
6 ER011503MX: Middle infra-red maximum
7 ER011503P1: Middle infra-red phase 1
8 ER011503P2: Middle infra-red phase 2
9 ER011503P3: Middle infra-red phase 3
10 ER011503VR: Middle infra-red variance
11 ER011507A0: Daytime LST mean
12 ER011507A1: Daytime LST amplitude 1
13 ER011507A2: Daytime LST amplitude 2
14 ER011507A3: Daytime LST amplitude 3
15 ER011507MN: Daytime LST minimum
16 ER011507MX: Daytime LST maximum
17 ER011507P1: Daytime LST phase 1
18 ER011507P2: Daytime LST phase 2
19 ER011507P3: Daytime LST phase 3
20 ER011507VR: Daytime LST variance
21 ER011508A0: Nighttime LST mean
22 ER011508A1: Nighttime LST amplitude 1
23 ER011508A2: Nighttime LST amplitude 2
24 ER011508A3: Nighttime LST amplitude 3
25 ER011508MN: Nighttime LST minimum
26 ER011508MX: Nighttime LST maximum
27 ER011508P1: Nighttime LST phase 1
28 ER011508P2: Nighttime LST phase 2
29 ER011508P3: Nighttime LST phase 3
30 ER011508VR: Nighttime LST variance
31 ER011514A0: NDVI mean
32 ER011514A1: NDVI amplitude 1
33 ER011514A2: NDVI amplitude 2
34 ER011514A3: NDVI amplitude 3
35 ER011514MN: NDVI minimum
36 ER011514MX: NDVI maximum
37 ER011514P1: NDVI phase 1
38 ER011514P2: NDVI phase 2
39 ER011514P3: NDVI phase 3
40 ER011514VR: NDVI variance
41 ER011515A0: EVI mean
42 ER011515A1: EVI amplitude 1
43 ER011515A2: EVI amplitude 2
44 ER011515A3: EVI amplitude 3
45 ER011515MN: EVI minimum
46 ER011515MX: EVI maximum
47 ER011515P1: EVI phase 1
48 ER011515P2: EVI phase 2
49 ER011515P3: EVI phase 3
50 ER011515VR: EVI variance
51 EDV590EL: DEM (Elevation)
52 EDV590RG: DEM (Ruggedness)
53 ERPRECA0: WORLDCLIM precipitation mean
54 ERPRECA1: WORLDCLIM precipitation amplitude 1
55 ERPRECA2: WORLDCLIM precipitation amplitude 2
56 ERPRECA3: WORLDCLIM precipitation amplitude 3
57 ERPRECMN: WORLDCLIM precipitation minimum
58 ERPRECMX: WORLDCLIM precipitation maximum
59 ERPRECP1: WORLDCLIM precipitation phase 1
60 ERPRECP2: WORLDCLIM precipitation phase 2
61 ERPRECP3: WORLDCLIM precipitation phase 3
62 ERPRECVR: WORLDCLIM precipitation variance
63 ERXXGRPD: GRUMP Human Population density
64 ERV59EL500: SRTM Elevation
65 EREELCBARE: consensus % bare ground
66 EREELCDCBD: consensus % deciduous broadleaved forest
67 EREELCEVBD: consensus % evergreen broadleaved forest
68 EREELCEVBD: consensus % evergreen needleleaved forest
69 EREELCFLD: consensus % flooded
70 EREELCHERB: consensus % herbaceous cover
71 EREELCMANG: consensus % managed land
72 EREELCOTR: consensus % other land cover
73 EREELCSHR: consensus % shrub cover
74 EREELCURB: consensus % urban
75 EREELCSNOW: consensus % snow
76 EREELCWAT: consensus % water

LST = Land Surface Temperature; NDVI = Normalised Difference Vegetation Index; EVI = Enhanced Vegetation Index; DEM = Digital Elevation Model. All files starting with ER0115 are Fourier processed MODIS satellite imagery produced by the Environmental Research Group Oxford [19].

Layers labelled WORLDCLIM were derived from the WorldClim datasets [21].

The GRUMP layer was derived from the population layers produced by [22].

All files with EREELC in the file name were derived from the EarthEnv consensus land cover data product [23].

All layers were extracted and standardised by ERGO for PALE-Blu (www.palebludata.com) [20].

Quality Control

As indicated above, only raw data with sufficient samples per site to ensure reliability were used as model inputs. The model outputs were evaluated using the extensive set of standard accuracy metrics (e.g. R-squared, AIC, Kappa, confusion matrices) provided by the VECMAP [3] software. Provided that the accuracy metrics indicated sufficient statistical reliability, the outputs were ensembled as described above. AUCs for the training sets exceeded 0.85 for all models.
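As an illustration of one of the metrics named above, the following sketch computes Cohen's Kappa from a confusion matrix. It assumes that predictions and observations have first been classified into categories (for example presence and absence); the matrix values are hypothetical and the function is not part of VECMAP.

```python
def cohen_kappa(matrix):
    """Cohen's Kappa for a square confusion matrix.

    matrix[i][j] is the number of cases observed in class i and
    predicted as class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Proportion of cases on the diagonal (observed agreement).
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Agreement expected by chance from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 matrix: rows are observed, columns are predicted.
confusion = [[40, 10],
             [5, 45]]
kappa = cohen_kappa(confusion)
# observed = 0.85, expected = 0.5, so kappa = 0.35 / 0.5 = 0.7
```

Kappa corrects the raw agreement for the agreement expected by chance, so a value of 0 indicates chance-level performance and 1 indicates perfect agreement.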

Sampling strategy

The abundance data used to train the maps were collected by longitudinal UV-light trap collections, a method commonly used to survey adult Culicoides populations at a wide scale. The reliability of UV-light trap collections to assess the ‘aggressive density’ on animals (which is the abundance parameter related to the risk of transmission) is still under debate and may be species dependent [24, 25, 26, 27, 28]. However, it is worth highlighting that abundances assessed by UV-light traps have been used for more than a decade to manage animal movements under EU regulations, and that this system has demonstrated its utility.

Constraints

There were no constraints in data production.

Privacy

Not applicable. No human data were used in the analyses or are provided in these datasets.

Ethics

Not Applicable – no personal data has been provided, and no animal welfare constraints apply to entomological sampling.

4. Dataset description

Object name

VectorNet/PALE-Blu Midge Abundance Models

Data type

Processed data; Interpretation of data

Ontologies

N/A.

Format names and versions

JPG, TIF, TFW, DOCX

Creation dates

01/05/2018 – 01/04/2019.

Dataset creators

The modelling work was led by William Wint (ERGO, the Environmental Research Group Oxford) using data assembled and processed by Thomas Balenghien (CIRAD). The data were provided by the authors listed above, together with additional collaborators of the VectorNet project, who are listed with literature sources in the table in Appendix 2.

Language

English

Programming language

N/A.

Licence

CC-BY 4.0.

Accessibility criteria

The data are distributed as GeoTIFF files, a standard, openly documented GIS raster format. GeoTIFFs can be read directly by most GIS software and some other software packages, including proprietary tools (ESRI ArcGIS) and open source tools (Quantum GIS (QGIS) and the R-project raster package). If the user has no suitable software already installed, the authors suggest downloading the open source QGIS software free of charge from http://www.qgis.org to view these data.

A simple schematic of the data layers and directories found within this data package is shown below with descriptions where filenames are not self-explanatory:

  • Appendices – Zipfile containing the appendices for this document.
    • ohd_VNMIDGESV1Appendix1.Pdf: document with quick looks of ensemble models with and without training data, and a summary of best covariate predictors
    • ohd_VNMIDGESV2Appendix2.Pdf. Full list of training data sources
  • Model output ZIPS – Each zip contains 1) geotiffs of ensemble model mean, standard deviation, for display and interrogation within GIS and geostatistical software*; and 2) the quicklook jpg format graphics for display in word processors and the like. Zip file names as follows:
    • chiopterusensemblemay18.zip. Files for Culicoides chiopterus
    • obsoletusandscoticusensemblemay18.zip. Files for Culicoides obsoletus/Culicoides scoticus
    • dewulfiensemblemay18.zip. Files for model of Culicoides dewulfi
    • imicolaensemblemay18.zip. Files for model of Culicoides imicola
    • pulicarisensemblemay18.zip. Files for model of Culicoides pulicaris
    • lupicarisensemblemay18.zip. Files for model of Culicoides lupicaris
    • pulicarisandlupicarisensemblemay18.zip. Files for model of Culicoides pulicaris/lupicaris
    • punctatusensemblemay18.zip. Files for model of Culicoides punctatus
    • newsteadiensemblemay18.zip. Files for model of Culicoides newsteadi

* Only the .tif files within this directory are listed. Other file formats of the same name within the directory (e.g. .tfw) are ancillary files that provide additional data to the GIS software and, as a rule, should be copied along with the TIFF file of the same name if you are moving the data between directories.
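To show why the .tfw file matters, the sketch below parses the six values of a world file and converts a pixel position into map coordinates. The line order follows the standard world file convention (x pixel size, two rotation terms, negative y pixel size, then the x and y coordinates of the centre of the upper-left pixel). The numeric values are hypothetical and are not taken from the files in this data package.

```python
def parse_world_file(text):
    """Return the six world file parameters (A, D, B, E, C, F) as floats."""
    a, d, b, e, c, f = (float(value) for value in text.split())
    return a, d, b, e, c, f


def pixel_to_map(params, col, row):
    """Map coordinates of the centre of the pixel at (col, row)."""
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical .tfw content for a north-up raster with 0.01-unit pixels.
tfw_text = "0.01\n0.0\n0.0\n-0.01\n-11.195\n71.795\n"
params = parse_world_file(tfw_text)
origin = pixel_to_map(params, 0, 0)   # centre of the upper-left pixel
other = pixel_to_map(params, 10, 20)  # 10 columns right, 20 rows down
```

Without this file, software that relies on the world file rather than on embedded GeoTIFF tags cannot place the raster correctly, which is the reason for keeping the two files together.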

Publication date

09/09/2020

5. Reuse potential

These layers have been created in an attempt to identify probable areas of species distribution where there are currently no sample data. These maps, therefore, attempt to identify the actual distribution of each species and so could be useful in identifying areas at risk from the disease for which each species is a vector and to identify suitable areas for further sampling. The VectorNet project plans to utilise these datasets in such a way.

The covariates of the models are also mainly climate orientated. A possible avenue of further work, therefore, could be to use the models to assess the potential change in distribution after a shift in climate parameters.