VectorNet is a joint initiative of the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC), which started in May 2014. The project supports the collection of distribution data on tick, sandfly, mosquito and Culicoides midge vectors, related to both animal and human health.
While VectorNet and its predecessor VBORNET have made substantial progress collating European data on key vector species, the coverage is still incomplete. The ‘Gap Analysis’ work within these projects aims to identify those areas of likely species distribution within the project extent where there are no current data. These estimates were produced throughout the project and were intended to meet two objectives: firstly, to help direct extensive VectorNet sampling efforts in the field, and secondly, to provide first indications of the current likely extent and distribution of key vector species within continental Europe and its surrounding regions. The models provided here are the latest iteration, using the distribution data available at the end of 2018. It is hoped that publishing these models will help experts to engage the wider research and professional community in the drive to expand and validate the VectorNet database, and will also contribute to veterinary and public health planning for Europe and its neighbouring countries. Readers are encouraged to contact the authors or visit the VectorNet website for further details of the project, and to view distribution maps of the arthropod disease vectors covered (midges, ticks, mosquitos and sandflies).
For each model, abundance maps with a resolution of 1 km were generated using both the Boosted Regression Trees and Random Forest spatial modelling techniques available through the VECMAP system. The outputs from each technique were ensembled to create a ‘consensus’ output of the natural logarithm (Ln) of the maximum annual number per trap per day.
Description: Continental Europe and surrounding regions
Northern boundary: 71.8
Southern boundary: 33.5
Eastern boundary: 62.3
Western boundary: –11.2
(01/04/2014 – 01/05/2018).
Culicoides imicola Kieffer, Culicoides obsoletus (Meigen), Culicoides scoticus Downes and Kettle, Culicoides dewulfi Goetghebuer, Culicoides chiopterus (Meigen), Culicoides pulicaris (Linnaeus), Culicoides lupicaris Downes and Kettle, Culicoides punctatus (Meigen) and Culicoides newsteadi Austen.
Culicoides imicola is a proven vector of bluetongue virus (BTV): it is a livestock-associated species, numerous isolations of the virus have been made from field-collected individuals, and the entire transmission cycle has been reproduced experimentally for this species [4, 5]. The other listed species, belonging to the Avaritia and Culicoides subgenera, are considered probable vectors based on their ecological habits, on virus isolation or viral genome detections from field-collected individuals, and on experimental infections. BTV was isolated from field-collected C. obsoletus [6, 7, 8] and C. pulicaris; it was, however, not clear whether these taxa referred to species or groups of species. BTV-8 genome from C. dewulfi and C. chiopterus field individuals has been identified by real-time RT-PCR in the Netherlands [10, 11] and in France. In the Basque country, BTV-1 genome was detected by real-time RT-PCR from C. obsoletus/C. scoticus, C. pulicaris and C. lupicaris parous females. Culicoides obsoletus and C. scoticus from the United Kingdom have been experimentally infected with BTV-8 and BTV-9, with C. scoticus showing higher viral titers. Pools of C. pulicaris were found infected with BTV-2 in Sicily, and BTV genome was detected in C. punctatus and C. newsteadi field-collected specimens in Italy.
Model training data
The reported distributions of each vector species held in the VectorNet archive in May 2018 were used as the basis for the species-presence training data for the analysis. They were formally released to the authors on request to ECDC (reference number 18-1421).
The raw input data were provided by light trap surveillance of adult Culicoides, set up mostly on ruminant farms across continental Europe and surrounding regions (approximately 72°N to 33.5°N and 11.2°W to 62°E), concentrated in Western countries and supplemented by transect samples in eastern and northern Europe. Data from central EU are relatively sparse (see maps in Appendix 1). These data were obtained either from National surveillance systems or from surveys carried out by the VectorNet project. Species were identified using a morphological identification key, either from field collections or, in some cases, retrospectively from stored collections from National surveillance systems.
Midge abundance varies throughout the year, and several metrics may be used to represent abundance. The one used here for every species is the mean annual maximum number per trap per day. Data were used only from locations that were sampled with at least one collection per month throughout the season of peak abundance. If data from more than a single year were available, the annual average was used. For each species, zero values from the abundance datasets were included in the input data, but were not supplemented by zeros from locations for which only presence/absence data were available. These values represent a standardised measure of abundance at the annual resolution, and so represent one aspect of absolute abundance. They are not, however, comparable with traditional absolute abundance measures, as they are not associated with a specific date.
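The abundance metric described above can be illustrated with a minimal sketch. The record layout (site, year, catch per trap per day), the site names and the values are hypothetical and are not taken from the VectorNet archive; the monthly sampling filter is not implemented here because the peak season is not defined in this paper.

```python
from collections import defaultdict
from statistics import mean


def mean_annual_maximum(records):
    """Return {site: mean of the yearly maximum catch per trap per day}.

    `records` is an iterable of (site, year, catch_per_trap_per_day) tuples.
    This layout is an assumption made for illustration only.
    """
    annual_max = defaultdict(dict)
    for site, year, catch in records:
        previous = annual_max[site].get(year)
        # Keep the largest catch seen for this site in this year.
        annual_max[site][year] = catch if previous is None else max(previous, catch)
    # Average the yearly maxima when more than one year is available.
    return {site: mean(by_year.values()) for site, by_year in annual_max.items()}


# Hypothetical example data: zero catches are kept as valid observations.
records = [
    ("farm_A", 2016, 12.0),
    ("farm_A", 2016, 40.0),
    ("farm_A", 2017, 20.0),
    ("farm_B", 2016, 0.0),
    ("farm_B", 2016, 0.0),
]
result = mean_annual_maximum(records)
```

In this example `farm_A` has yearly maxima of 40 and 20, giving a mean annual maximum of 30, while `farm_B` stays at zero.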
Maps of the recorded distributions at that time are presented as overlays to the model outputs, in Appendix 1 available within this data package.
A range of modelling techniques are available in the VECMAP system, of which Boosted Regression Trees (BRT) and Random Forest (RF), using 10–25 repeated bootstraps per replicate, were used. Five replicates were implemented for each method. Each model was run with a 25% holdback for validation, which also ensured variability between replicates. BRT model parameters were adjusted to result in 1000 trees; the RF parameters were set to the system defaults, namely 100 trees, the best 15% of the available covariates, and each tree using approximately 70% of the available sample data with replacement. An ensembled average (and an associated standard deviation image) was then produced from the ten replicates (five per method). The standard deviation maps provide useful indicators of uncertainty in the model outputs.
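The ensembling step can be sketched as a per-pixel mean and standard deviation across replicate rasters. This is an illustration only and not the VECMAP implementation: the grids are hypothetical, NoData handling is omitted, and the use of the population standard deviation is an assumption, as the paper does not state which form was used.

```python
from statistics import mean, pstdev


def ensemble(replicates):
    """Return (mean_grid, sd_grid) computed cell by cell.

    `replicates` is a list of equally sized 2-D grids (lists of rows).
    """
    n_rows = len(replicates[0])
    n_cols = len(replicates[0][0])
    mean_grid, sd_grid = [], []
    for r in range(n_rows):
        mean_row, sd_row = [], []
        for c in range(n_cols):
            # Collect the value of this cell from every replicate.
            values = [grid[r][c] for grid in replicates]
            mean_row.append(mean(values))
            sd_row.append(pstdev(values))  # assumption: population SD
        mean_grid.append(mean_row)
        sd_grid.append(sd_row)
    return mean_grid, sd_grid


# Two hypothetical 2 x 2 replicates; the models described here used ten.
replicates = [
    [[1.0, 2.0], [0.0, 4.0]],
    [[3.0, 2.0], [0.0, 6.0]],
]
mean_grid, sd_grid = ensemble(replicates)
```

Cells where the replicates agree have a standard deviation of zero, which is why the standard deviation image can be read as a map of between-replicate disagreement.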
The covariates offered to the modelling procedures were drawn from a standardised set of environmental parameters, and in particular a suite of Fourier-processed MODIS satellite imagery, which provides a range of biologically interpretable variables related to levels and seasonality of temperature and vegetation related factors during the period 2001–2015. These are summarised in Table 1 and are all available to registered members of the PALE-Blu Data Website. Each BRT model was run with the top ten predictors identified in the trial model runs for each species, which are listed at the end of Appendix 1.
|1||ER011503A0: Middle infra-red mean||38||ER011514P2: NDVI phase 2|
|2||ER011503A1: Middle infra-red amplitude 1||39||ER011514P3: NDVI phase 3|
|3||ER011503A2: Middle infra-red amplitude 2||40||ER011514VR: NDVI variance|
|4||ER011503A3: Middle infra-red amplitude 3||41||ER011515A0: EVI mean|
|5||ER011503MN: Middle infra-red minimum||42||ER011515A1: EVI amplitude 1|
|6||ER011503MX: Middle infra-red maximum||43||ER011515A2: EVI amplitude 2|
|7||ER011503P1: Middle infra-red phase 1||44||ER011515A3: EVI amplitude 3|
|8||ER011503P2: Middle infra-red phase 2||45||ER011515MN: EVI minimum|
|9||ER011503P3: Middle infra-red phase 3||46||ER011515MX: EVI maximum|
|10||ER011503VR: Middle infra-red variance||47||ER011515P1: EVI phase 1|
|11||ER011507A0: Daytime LST mean||48||ER011515P2: EVI phase 2|
|12||ER011507A1: Daytime LST amplitude 1||49||ER011515P3: EVI phase 3|
|13||ER011507A2: Daytime LST amplitude 2||50||ER011515VR: EVI variance|
|14||ER011507A3: Daytime LST amplitude 3||51||EDV590EL: DEM (Elevation)|
|15||ER011507MN: Daytime LST minimum||52||EDV590RG: DEM (Ruggedness)|
|16||ER011507MX: Daytime LST maximum||53||ERPRECA0: WORLDCLIM precipitation mean|
|17||ER011507P1: Daytime LST phase 1||54||ERPRECA1: WORLDCLIM precipitation amplitude 1|
|18||ER011507P2: Daytime LST phase 2||55||ERPRECA2: WORLDCLIM precipitation amplitude 2|
|19||ER011507P3: Daytime LST phase 3||56||ERPRECA3: WORLDCLIM precipitation amplitude 3|
|20||ER011507VR: Daytime LST variance||57||ERPRECMN: WORLDCLIM precipitation minimum|
|21||ER011508A0: Nighttime LST mean||58||ERPRECMX: WORLDCLIM precipitation maximum|
|22||ER011508A1: Nighttime LST amplitude 1||59||ERPRECP1: WORLDCLIM precipitation phase 1|
|23||ER011508A2: Nighttime LST amplitude 2||60||ERPRECP2: WORLDCLIM precipitation phase 2|
|24||ER011508A3: Nighttime LST amplitude 3||61||ERPRECP3: WORLDCLIM precipitation phase 3|
|25||ER011508MN: Nighttime LST minimum||62||ERPRECVR: WORLDCLIM precipitation variance|
|26||ER011508MX: Nighttime LST maximum||63||ERXXGRPD: GRUMP Human Population density|
|27||ER011508P1: Nighttime LST phase 1||64||ERV59EL500: SRTM Elevation|
|28||ER011508P2: Nighttime LST phase 2||65||EREELCBARE: consensus % bare ground|
|29||ER011508P3: Nighttime LST phase 3||66||EREELCDCBD: consensus % deciduous broadleaved forest|
|30||ER011508VR: Nighttime LST variance||67||EREELCEVBD: consensus % evergreen broadleaved forest|
|31||ER011514A0: NDVI mean||68||EREELCEVBD: consensus % evergreen needleleaved forest|
|32||ER011514A1: NDVI amplitude 1||69||EREELCFLD: consensus % flooded|
|33||ER011514A2: NDVI amplitude 2||70||EREELCHERB: consensus % herbaceous cover|
|34||ER011514A3: NDVI amplitude 3||71||EREELCMANG: consensus % managed land|
|35||ER011514MN: NDVI minimum||72||EREELCOTR: consensus % other land cover|
|36||ER011514MX: NDVI maximum||73||EREELCSHR: consensus % shrub cover|
|37||ER011514P1: NDVI phase 1||74||EREELCURB: consensus % urban|
|75||EREELCSNOW: consensus % snow|
|76||EREELCWAT: consensus % water|
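The Fourier-derived covariates in Table 1 (mean, amplitudes 1 to 3, phases 1 to 3, minimum, maximum and variance) can be illustrated with a minimal harmonic decomposition of a single-pixel time series. This is a sketch only, not the processing chain used to produce the layers: the 12-step series is synthetic, the phase is expressed here as the timing of the harmonic's first peak in time steps, and the variance is the population variance; all three are assumptions.

```python
import math


def harmonic_summary(series, n_harmonics=3):
    """Summarise one time series with keys mirroring the Table 1 suffixes."""
    n = len(series)
    mean_value = sum(series) / n
    summary = {
        "A0": mean_value,
        "MN": min(series),
        "MX": max(series),
        "VR": sum((x - mean_value) ** 2 for x in series) / n,
    }
    for k in range(1, n_harmonics + 1):
        # Cosine and sine coefficients of the k-th harmonic.
        a = 2.0 / n * sum(x * math.cos(2 * math.pi * k * t / n)
                          for t, x in enumerate(series))
        b = 2.0 / n * sum(x * math.sin(2 * math.pi * k * t / n)
                          for t, x in enumerate(series))
        summary[f"A{k}"] = math.hypot(a, b)
        # Convert the phase angle to the time step of the first peak.
        angle = math.atan2(b, a) % (2 * math.pi)
        summary[f"P{k}"] = angle * n / (2 * math.pi * k)
    return summary


# Synthetic annual cycle: mean 10, amplitude 3, peaking at time step 2.
series = [10 + 3 * math.cos(2 * math.pi * (t - 2) / 12) for t in range(12)]
summary = harmonic_summary(series)
```

For this synthetic series the first harmonic recovers the amplitude and peak timing that were used to build it, and the second and third amplitudes are effectively zero.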
As indicated above, only raw data with sufficient samples per site to ensure reliability were used as model inputs. The model outputs were evaluated using the standard, and very extensive, accuracy metrics (e.g. R-squared, AIC, Kappa, confusion matrices) provided by the VECMAP software. Provided that the accuracy metrics indicated sufficient statistical reliability, the outputs were ensembled as described above. AUCs for the training sets for all the models exceed 0.85.
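As an illustration of one of the metrics named above, Cohen's Kappa can be computed from a two-class confusion matrix as follows. The counts are hypothetical and the function is a generic sketch; it does not reproduce the VECMAP output, nor the way the abundance predictions were classed for evaluation.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's Kappa from a 2 x 2 confusion matrix of counts."""
    n = tp + fp + fn + tn
    # Observed agreement: proportion of cases on the diagonal.
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical validation counts for a holdback sample of 100 sites.
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
```

With these counts the observed agreement is 0.85 and the chance agreement is 0.5, so Kappa is 0.7.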
The abundance data used to train the maps were collected by longitudinal UV-light trap collections, a method commonly used to survey adult Culicoides populations at a wide scale. The reliability of UV-light trap collections to assess the ‘aggressive density’ on animals (which is the abundance parameter related to the risk of transmission) is still under debate and may be species dependent [24, 25, 26, 27, 28]. However, it is worth highlighting that abundances assessed by UV-light traps have been used for more than a decade to manage animal movements under EU regulations, and that this system has demonstrated its utility.
There were no constraints in data production.
Not applicable. No human data were used in the analyses or are provided in these datasets.
Not Applicable – no personal data has been provided, and no animal welfare constraints apply to entomological sampling.
4. Dataset description
VectorNet/PALE-Blu Midge Abundance Models
Processed data; Interpretation of data
Format names and versions
JPG, TIF, TFW, DOCX
The start and end dates of when the data was created
01/05/2018 – 01/04/2019.
The modelling work was led by William Wint (ERGO, the Environmental Research Group Oxford) using data assembled and processed by Thomas Balenghien (CIRAD) and provided by the authors listed above together with additional collaborators of the VectorNet project as listed, with literature sources in the table in Appendix 2.
The open licence under which the data has been deposited: CC-BY 4.0.
The data are distributed as GeoTIFF files, a standard GIS raster format. GeoTIFFs can be read by most GIS software and some other software packages, so the raster data can be accessed and analysed directly. The format is compatible with both proprietary software (ESRI ArcGIS) and open source software such as Quantum GIS (QGIS) or the R-project raster package. If the user has no suitable software already installed, the authors suggest downloading the open source QGIS software free of charge from http://www.qgis.org to view these data.
A simple schematic of the data layers and directories found within this data package is shown below with descriptions where filenames are not self-explanatory:
Appendices – Zipfile containing the appendices for this document.
- ohd_VNMIDGESV1Appendix1.Pdf: document with quick looks of ensemble models with and without training data, and a summary of best covariate predictors
- ohd_VNMIDGESV2Appendix2.Pdf. Full list of training data sources
Model output ZIPS – Each zip contains 1) geotiffs of ensemble model mean, standard deviation, for display and interrogation within GIS and geostatistical software*; and 2) the quicklook jpg format graphics for display in word processors and the like. Zip file names as follows:
- chiopterusensemblemay18.zip. Files for Culicoides chiopterus
- obsoletusandscoticusensemblemay18.zip. Files for Culicoides obsoletus/Culicoides scoticus
- dewulfiensemblemay18.zip. Files for model of Culicoides dewulfi
- imicolaensemblemay18.zip. Files for model of Culicoides imicola
- pulicarisensemblemay18.zip. Files for model of Culicoides pulicaris
- lupicarisensemblemay18.zip. Files for model of Culicoides lupicaris
- pulicarisandlupicarisensemblemay18.zip. Files for model of Culicoides pulicaris/lupicaris
- punctatusensemblemay18.zip. Files for model of Culicoides punctatus
- newsteadiensemblemay18.zip. Files for model of Culicoides newsteadi
* Only the .tif files within this directory are listed. Other file formats of the same name within the directory (e.g. .tfw) are ancillary files that provide additional data to the GIS software and as a rule should be copied along with the TIFF file of the same name if you are moving the data between directories
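The role of the ancillary .tfw (world) files can be illustrated with a short sketch that reads the six georeferencing parameters and converts a pixel position to map coordinates. The six-line layout is the standard world file convention; the parameter values below are hypothetical and are not copied from the files in this data package.

```python
def parse_world_file(text):
    """Parse the six lines of a world file.

    Line order: x pixel size (A), rotation (D), rotation (B),
    y pixel size (E, normally negative), then the x (C) and y (F)
    coordinates of the centre of the upper-left pixel.
    """
    a, d, b, e, c, f = (float(value) for value in text.split())
    return a, d, b, e, c, f


def pixel_to_map(params, col, row):
    """Return the map (x, y) of the centre of pixel (col, row)."""
    a, d, b, e, c, f = params
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical world file content for a north-up raster in degrees.
example_tfw = "0.01\n0.0\n0.0\n-0.01\n-11.195\n71.795\n"
params = parse_world_file(example_tfw)
x, y = pixel_to_map(params, col=100, row=200)
```

Because the georeferencing lives in the .tfw file, a TIFF copied without it may open as an unreferenced image in software that relies on the world file.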
5. Reuse potential
These layers have been created in an attempt to identify probable areas of species distribution where there are currently no sample data. These maps, therefore, attempt to identify the actual distribution of each species, and so could be useful both in identifying areas at risk from the diseases for which each species is a vector and in identifying suitable areas for further sampling. The VectorNet project plans to utilise these datasets in such a way.
The covariates of the models are also mainly climate orientated. A possible avenue of further work, therefore, could be to use the models to assess the potential change in distribution after a shift in climate parameters.