1. Overview

Introduction/Study Description

VBORNET [1] was an initiative of the European Centre for Disease Prevention and Control (ECDC), which ran from 2009 to 2014. The project established a European network of entomological and public health specialists to assist ECDC in its preparedness activities on vector-borne diseases (VBD). As part of this work, a database collating validated records of key species distributions was commissioned. This data paper focusses on four sand fly species, Phlebotomus ariasi, Phlebotomus papatasi, Phlebotomus perniciosus and Phlebotomus tobbi, all vectors of leishmaniasis.

VectorNet [2] has continued this work and builds upon VBORNET by supporting the collection of data on vectors and pathogens in vectors, related to both animal and human health. VectorNet is a joint initiative of the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC), which started in May 2014.

Whilst VBORNET and VectorNet have made substantial progress collating European data on key vector species, the coverage is still incomplete. The ‘Gap Analysis’ work within these projects aims to identify those areas of likely species distribution within the project extent where there are no current data. These estimates produced by spatial modelling techniques are intended to meet two objectives: firstly to help direct extensive VectorNet sampling efforts in the field; and secondly to provide first indications of the current likely extent and distribution of key vector species within continental Europe and its surrounding regions. It is hoped that publishing these models will aid the VectorNet network of experts to engage the wider research and professional community in the drive to expand and validate the VectorNet database. Readers are encouraged to contact the authors or visit the VectorNet website [2].

For each species, probability of presence maps at a resolution of 1 km were generated using a variety of well-established spatial modelling techniques available through the VECMAP system [3]. Both the input data and the resulting models were iteratively assessed by project experts, and the best performing models are included in this data package.

2. Context

Spatial coverage

Description: Continental Europe

Northern boundary: 71.8

Southern boundary: 33.5

Eastern boundary: 62.3

Western boundary: –11.2

Temporal coverage

Known presence up to 31/01/2013.

Species

Phlebotomus ariasi, vector of Leishmania infantum and phleboviruses.

Phlebotomus papatasi, vector of Leishmania major and phleboviruses.

Phlebotomus perniciosus, vector of Leishmania infantum and phleboviruses.

Phlebotomus tobbi, vector of Leishmania infantum, Leishmania donovani and phleboviruses.

3. Methods

The following method was applied to each of the species.

Steps

Identifying presence and absence training data

The distributions of each of the four sand fly species reported by VBORNET were used as the basis for the species presence training data. Data from the VBORNET map published in January 2013 were used for Phlebotomus perniciosus and Phlebotomus tobbi, and data from the map published in January 2014 were used for Phlebotomus ariasi and Phlebotomus papatasi. Maps of the known distributions recorded at those times are presented in Appendix 1, available within this data package. These reported distributions were recorded in VBORNET at the coarse scale of NUTS 3 polygons. The data originate from a combination of aggregated data contributed by the authors and listed contributors and a literature review completed by the VBORNET vector group leaders. The full data set and sources are available to contributors of VBORNET and VectorNet.

Habitat suitability and environmental limits

The recorded distributions were too coarse to be used by the model framework. In addition, the selected modelling methods required information on both presence and absence to calibrate the modelling process. It was therefore necessary to identify areas of absence within NUTS 3 regions assigned as present. To do this, a suitability mask at 1 km resolution was compiled by asking experts within the network (see the Dataset creators section) to identify primary, secondary and unsuitable land cover classes. For the phase two models (Ph. ariasi and Ph. papatasi), environmental limiting factors derived from remotely sensed imagery were also identified and used in the mask.

Environmental limits masks were created using altitude and temperature limits derived from the SRTM 100m Digital Elevation Model [4] and the BIOCLIM [5] temperature layers respectively. For Phlebotomus ariasi, the minimum altitude within a 1 km square had to be below 1700 m and the value of the BIOCLIM Tmax layer had to lie between 15 and 32 degrees centigrade. For Ph. papatasi, the minimum altitude had to be below 2000 m and the value of the BIOCLIM Tmean layer had to lie between 20 and 30 degrees centigrade. Whilst these values are loosely based on laboratory findings (personal communication with Ozge Erisoz Kasap; see the Dataset creators section), they were assessed visually and calibrated to account for differences between laboratory measurements of species behaviour and remotely sensed values recorded at coarse resolution.
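As a rough illustration of how such a limits mask could be assembled, the sketch below applies the thresholds stated above to small in-memory grids. The function name, the dictionary layout, the nested-list grid representation and the sample values are all assumptions made for illustration, as is the choice to treat the temperature bounds as inclusive; the original masks were built from the SRTM and BIOCLIM rasters and the exact procedure is not described here.

```python
# Thresholds as stated in the text; the dictionary layout is an assumption.
LIMITS = {
    "PHAR": {"max_altitude_m": 1700, "temp_range_c": (15, 32)},  # BIOCLIM Tmax
    "PHPA": {"max_altitude_m": 2000, "temp_range_c": (20, 30)},  # BIOCLIM Tmean
}


def environmental_mask(min_altitude, temperature, species):
    """Return a grid of 1 (within limits) or 0 (outside limits).

    min_altitude and temperature are aligned grids (lists of rows), one
    value per 1 km cell. Temperature bounds are assumed to be inclusive.
    """
    limits = LIMITS[species]
    t_low, t_high = limits["temp_range_c"]
    mask = []
    for alt_row, temp_row in zip(min_altitude, temperature):
        mask.append([
            1 if alt < limits["max_altitude_m"] and t_low <= temp <= t_high else 0
            for alt, temp in zip(alt_row, temp_row)
        ])
    return mask


# Hypothetical 2 x 2 grids: minimum altitude (m) and Tmax (degrees centigrade).
altitude = [[250, 1800], [900, 1650]]
tmax = [[28, 25], [12, 31]]
phar_mask = environmental_mask(altitude, tmax, "PHAR")
print(phar_mask)
```

In this toy grid the second cell fails on altitude and the third on temperature, so only the first and last cells remain within the limits for Ph. ariasi.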

The land cover masks were defined using the 100m Corine land cover dataset [6], and the 300m GLOBCOVER [7] product where no Corine data were available. The suitability of each land cover class for each species, as defined by the experts, is given in Tables 1 and 2.

Table 1

Reclassified suitability values for the Corine [6] land cover classes.

CORINE LABEL PHTO PHPE PHAR PHPA

Continuous urban fabric 0 0 0 1
Discontinuous urban fabric 0 0 0 1
Industrial or commercial units 0 0 0 1
Road and rail networks and associated land 0 0 0 1
Port areas 0 0 0 1
Airports 0 0 0 1
Mineral extraction sites 0 0 0 1
Dump sites 0 0 0 1
Construction sites 1 1 1 1
Green urban areas 1 1 1 1
Sport and leisure facilities 0 0 0 1
Non-irrigated arable land 1 1 1 1
Permanently irrigated land 1 1 1 1
Rice fields 1 0 0 1
Vineyards 1 1 1 1
Fruit trees and berry plantations 1 1 1 1
Olive groves 1 1 1 1
Pastures 1 0 0 1
Annual crops associated with permanent crops 1 1 1 1
Complex cultivation patterns 1 1 1 1
Land principally occupied by agriculture, with significant areas of natural vegetation 1 1 1 1
Agro-forestry areas 1 1 1 1
Broad-leaved forest 1 1 1 0
Coniferous forest 1 1 1 1
Mixed forest 1 1 1 1
Natural grasslands 1 0 0 1
Moors and heathland 1 1 1 1
Sclerophyllous vegetation 1 1 1 1
Transitional woodland-shrub 1 1 1 1
Beaches, dunes, sands 0 0 0 0
Bare rocks 1 1 1 0
Sparsely vegetated areas 0 1 1 1
Burnt areas 0 0 0 0
Glaciers and perpetual snow 0 0 0 0
Inland marshes 0 0 0 0
Peat bogs 0 0 0 0
Salt marshes 0 0 0 0
Salines 0 0 0 0
Intertidal flats 0 0 0 0
Water courses 0 0 0 0
Water bodies 0 0 0 0
Coastal lagoons 0 0 0 0
Estuaries 0 0 0 0
Sea and ocean 0 0 0 0

suitability layers 1 = suitable and 0 = unsuitable. PHPE = Phlebotomus perniciosus, PHTO = Phlebotomus tobbi, PHAR = Phlebotomus ariasi and PHPA = Phlebotomus papatasi.

Table 2

Reclassified suitability values for the Globcover [7] land cover classes.

GLOBCOVER LABEL PHTO PHPE PHAR PHPA

Post-flooding or irrigated croplands (or aquatic) 1 0 0 1
Rainfed croplands 1 0 0 1
Mosaic cropland (50–70%) / vegetation (grassland/shrubland/forest) (20–50%) 1 1 1 1
Mosaic vegetation (grassland/shrubland/forest) (50–70%) / cropland (20–50%) 1 1 1 1
Closed to open (>15%) broadleaved evergreen or semi-deciduous forest (>5m) 1 1 1 1
Closed (>40%) broadleaved deciduous forest (>5m) 1 1 1 1
Open (15–40%) broadleaved deciduous forest/woodland (>5m) 1 1 1 1
Closed (>40%) needleleaved evergreen forest (>5m) 1 1 1 1
Open (15–40%) needleleaved deciduous or evergreen forest (>5m) 1 1 1 1
Closed to open (>15%) mixed broadleaved and needleleaved forest (>5m) 1 1 1 1
Mosaic forest or shrubland (50–70%) / grassland (20–50%) 1 1 1 1
Mosaic grassland (50–70%) / forest or shrubland (20–50%) 1 1 1 1
Closed to open (>15%) (broadleaved or needleleaved, evergreen or deciduous) shrubland (<5m) 0 1 1 1
Closed to open (>15%) herbaceous vegetation (grassland, savannas or lichens/mosses) 0 1 1 1
Sparse (<15%) vegetation 0 1 1 1
Closed to open (>15%) broadleaved forest regularly flooded (semi-permanently or temporarily) – Fresh or brackish water 0 0 0 0
Closed (>40%) broadleaved forest or shrubland permanently flooded – Saline or brackish water 0 0 0 0
Closed to open (>15%) grassland or woody vegetation on regularly flooded or waterlogged soil – Fresh, brackish or saline water 0 0 0 0
Artificial surfaces and associated areas (Urban areas >50%) 1 1 1 1
Bare areas 0 0 0 0
Water bodies 0 0 0 0
Permanent snow and ice 0 0 0 0
No data (burnt areas, clouds, …) 0 0 0 0

suitability layers 1 = suitable and 0 = unsuitable. PHPE = Phlebotomus perniciosus, PHTO = Phlebotomus tobbi, PHAR = Phlebotomus ariasi and PHPA = Phlebotomus papatasi.
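The reclassification described by Tables 1 and 2 amounts to a lookup from land cover class to a 0/1 suitability code, with GLOBCOVER used only where Corine has no data. A minimal sketch for Ph. papatasi (PHPA) follows; the handful of classes shown are copied from the tables, while the function name and the use of None to signal missing Corine coverage are assumptions for illustration.

```python
# A few PHPA suitability codes copied from Table 1 (Corine) and Table 2 (Globcover).
CORINE_PHPA = {
    "Continuous urban fabric": 1,
    "Olive groves": 1,
    "Broad-leaved forest": 0,
    "Water bodies": 0,
}
GLOBCOVER_PHPA = {
    "Rainfed croplands": 1,
    "Bare areas": 0,
    "Water bodies": 0,
}


def land_cover_suitability(corine_label, globcover_label):
    """Return 1 (suitable) or 0 (unsuitable) for one cell.

    The Corine class is used when available; None is taken here to mean
    that the cell lies outside Corine coverage, in which case the
    Globcover class is used instead.
    """
    if corine_label is not None:
        return CORINE_PHPA[corine_label]
    return GLOBCOVER_PHPA[globcover_label]


# A cell covered by Corine, and one that falls back to Globcover.
inside_corine = land_cover_suitability("Broad-leaved forest", "Rainfed croplands")
outside_corine = land_cover_suitability(None, "Rainfed croplands")
print(inside_corine, outside_corine)
```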

Modelling procedure

A range of modelling techniques available in the VECMAP [3] system, including Non-Linear Discriminant Analysis [8], Logistic Regression [9] and Random Forests [10], each using 10–25 repeated bootstraps per run, was used to provide a range of outputs for expert assessment.

The covariates offered to the modelling procedures were drawn from a standardised set of ecological parameters, and in particular a suite of Fourier processed MODIS satellite imagery [11] which provides a range of biologically interpretable variables related to levels and seasonality of temperature and vegetation related factors during the period 2000–2012. These are summarised in Table 3, and are all available to registered members of the VMerge/EDENext Data Website (www.vmergedata.com) [12].

Table 3

Covariates offered to modelling procedures.


1 ED1803A0: Middle infra-red mean
2 ED1803A1: Middle infra-red amplitude 1
3 ED1803A2: Middle infra-red amplitude 2
4 ED1803A3: Middle infra-red amplitude 3
5 ED1803MN: Middle infra-red minimum
6 ED1803MX: Middle infra-red maximum
7 ED1803P1: Middle infra-red phase 1
8 ED1803P2: Middle infra-red phase 2
9 ED1803P3: Middle infra-red phase 3
10 ED1803VR: Middle infra-red variance
11 ED1807A0: Daytime LST mean
12 ED1807A1: Daytime LST amplitude 1
13 ED1807A2: Daytime LST amplitude 2
14 ED1807A3: Daytime LST amplitude 3
15 ED1807MN: Daytime LST minimum
16 ED1807MX: Daytime LST maximum
17 ED1807P1: Daytime LST phase 1
18 ED1807P2: Daytime LST phase 2
19 ED1807P3: Daytime LST phase 3
20 ED1807VR: Daytime LST variance
21 ED1808A0: Nighttime LST mean
22 ED1808A1: Nighttime LST amplitude 1
23 ED1808A2: Nighttime LST amplitude 2
24 ED1808A3: Nighttime LST amplitude 3
25 ED1808MN: Nighttime LST minimum
26 ED1808MX: Nighttime LST maximum
27 ED1808P1: Nighttime LST phase 1
28 ED1808P2: Nighttime LST phase 2
29 ED1808P3: Nighttime LST phase 3
30 ED1808VR: Nighttime LST variance
31 ED1814A0: NDVI mean
32 ED1814A1: NDVI amplitude 1
33 ED1814A2: NDVI amplitude 2
34 ED1814A3: NDVI amplitude 3
35 ED1814MN: NDVI minimum
36 ED1814MX: NDVI maximum
37 ED1814P1: NDVI phase 1
38 ED1814P2: NDVI phase 2
39 ED1814P3: NDVI phase 3
40 ED1814VR: NDVI variance
41 ED1815A0: EVI mean
42 ED1815A1: EVI amplitude 1
43 ED1815A2: EVI amplitude 2
44 ED1815A3: EVI amplitude 3
45 ED1815MN: EVI minimum
46 ED1815MX: EVI maximum
47 ED1815P1: EVI phase 1
48 ED1815P2: EVI phase 2
49 ED1815P3: EVI phase 3
50 ED1815VR: EVI variance
51 EDBC2K12: BioClim Annual Precipitation
52 EDBC2K13: BioClim Precipitation of Wettest Month
53 EDBC2K14: BioClim Precipitation of Driest Month
54 EDBC2K15: BioClim Precipitation Seasonality (Coefficient of Variation)
55 EDBC2K16: BioClim Precipitation of Wettest Quarter
56 EDBC2K17: BioClim Precipitation of Driest Quarter
57 EDBC2K18: BioClim Precipitation of Warmest Quarter
58 EDBC2K19: BioClim Precipitation of Coldest Quarter
59 EDV590AS: DEM (Aspect)
60 EDV590EL: DEM (Elevation)
61 EDV590RG: DEM (Ruggedness)
62 EDWC57A0: WORLDCLIM precipitation mean
63 EDWC57A1: WORLDCLIM precipitation amplitude 1
64 EDWC57A2: WORLDCLIM precipitation amplitude 2
65 EDWC57A3: WORLDCLIM precipitation amplitude 3
66 EDWC57MN: WORLDCLIM precipitation minimum
67 EDWC57MX: WORLDCLIM precipitation maximum
68 EDWC57P1: WORLDCLIM precipitation phase 1
69 EDWC57P2: WORLDCLIM precipitation phase 2
70 EDWC57P3: WORLDCLIM precipitation phase 3
71 EDWC57VR: WORLDCLIM precipitation variance
72 EDXXGRPD: GRUMP Population density
73 EDXXGRPW: GRUMP Population weighted
74 EDXXJRCA: JRC Access
75 EDXXLPG1: Length of Growing Period LGP

LST = Land Surface Temperature; NDVI = Normalised Difference Vegetation Index; EVI = Enhanced Vegetation Index; DEM = Digital Elevation Model. All files starting with ED18 are Fourier processed MODIS satellite imagery produced by the TALA Research Group, Oxford [11].

Files with Bioclim and Worldclim in filename derived from WORLDCLIM datasets [5].

GRUMP derived from population layers produced by [13].

JRC Accessibility downloaded from [14].

Length of growing Period derived from data provided by FAO, Rome. Available from www.vmerge.com [12].

All layers extracted and standardised by ERGO for EDENext (www.edenextdata.com) [15]
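To give a sense of what the mean, amplitude and phase covariates in Table 3 represent, the sketch below decomposes a single yearly series into its first three harmonics with a discrete Fourier transform. It assumes that amplitudes and phases 1 to 3 correspond to one, two and three cycles per year, reports phase in radians, and computes minimum, maximum and variance from the raw series; the definitions used in the Fourier processed imagery [11] may differ, and the function name and dictionary keys are chosen here only to echo the file suffixes.

```python
import cmath
import math


def harmonic_summary(series, n_harmonics=3):
    """Summarise one yearly cycle of evenly spaced values.

    Returns the mean (A0), minimum (MN), maximum (MX), variance (VR) and,
    for each harmonic k, its amplitude (Ak) and phase in radians (Pk).
    Valid for k smaller than half the number of samples.
    """
    n = len(series)
    mean = sum(series) / n
    summary = {
        "A0": mean,
        "MN": min(series),
        "MX": max(series),
        "VR": sum((v - mean) ** 2 for v in series) / n,
    }
    for k in range(1, n_harmonics + 1):
        # Discrete Fourier coefficient for k cycles per year.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(series))
        summary[f"A{k}"] = 2 * abs(coeff) / n
        summary[f"P{k}"] = cmath.phase(coeff)
    return summary


# Hypothetical monthly series: mean of 10 with a single annual cycle of amplitude 5.
monthly = [10 + 5 * math.cos(2 * math.pi * t / 12) for t in range(12)]
result = harmonic_summary(monthly)
print(round(result["A0"], 3), round(result["A1"], 3), round(result["A2"], 3))
```

For this synthetic series the annual amplitude recovers the value of 5 used to build it, and the higher harmonics are effectively zero.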

Output layers

The suitability-masked model outputs are provided as probability maps at pixel level, with a resolution of 1 km, for each species. A quick view image for each vector species is provided in Appendix 2, available within this data package.

Sampling strategy

Training sample point data for the model was extracted as follows:

  • Random presence points were created within any part of a NUTS 3 polygon recorded as present where the suitability mask did not indicate unsuitability.
  • Random absence points were selected from areas identified in the mask as unsuitable.
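The two rules above can be sketched as a selection over aligned grids. This is a minimal illustration only: the grid representation, function name, point count and seed are assumptions, and absence cells are drawn here from any unsuitable cell in the grid, which is one possible reading of the text.

```python
import random


def sample_training_points(presence_region, suitability, n_points, seed=0):
    """Draw random presence and absence cells from two aligned grids.

    presence_region: 1 where the cell lies in a NUTS 3 polygon recorded as present.
    suitability: 1 where the mask marks the cell as suitable, 0 where unsuitable.
    Returns two lists of (row, column) tuples.
    """
    rng = random.Random(seed)
    presence_cells, absence_cells = [], []
    for r, (region_row, suit_row) in enumerate(zip(presence_region, suitability)):
        for c, (in_region, suitable) in enumerate(zip(region_row, suit_row)):
            if suitable == 0:
                absence_cells.append((r, c))      # unsuitable: candidate absence
            elif in_region == 1:
                presence_cells.append((r, c))     # suitable and reported present
    presence = rng.sample(presence_cells, min(n_points, len(presence_cells)))
    absence = rng.sample(absence_cells, min(n_points, len(absence_cells)))
    return presence, absence


# Hypothetical 2 x 3 grids.
region = [[1, 1, 0], [1, 0, 0]]
mask = [[1, 0, 1], [1, 0, 1]]
presence_pts, absence_pts = sample_training_points(region, mask, n_points=5)
print(sorted(presence_pts), sorted(absence_pts))
```

Suitable cells outside any polygon recorded as present are left unsampled in this sketch, since their status is unknown.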

Quality Control

The model outputs were initially evaluated using the standard and extensive accuracy metrics (e.g. R-squared, AIC, Kappa, confusion matrices) provided by the VECMAP [3] software. Provided the accuracy metrics indicated sufficient statistical reliability, the range of models was then sent to selected experts, who were asked to choose from the selection provided. The experts included individuals listed in the Dataset creators section of this paper.
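For readers unfamiliar with the Kappa statistic mentioned above, the sketch below computes Cohen's kappa from a two-class confusion matrix using the standard formula. It is independent of VECMAP, whose implementation is not described here, and the counts are hypothetical.

```python
def cohens_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a presence/absence confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical evaluation of 100 test points.
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
print(round(kappa, 3))
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so kappa works out at 0.7.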

In the first phase of modelling (Ph. perniciosus and Ph. tobbi), the best model selected by the experts was used as the final model for that species. During the second phase (Ph. ariasi and Ph. papatasi), ensembles of the different modelling techniques were preferred, in an attempt to even out any bias inherent in individual modelling methods. A model that was not approved by the network experts was not included in the ensemble.
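One simple way to combine approved models is a cell-by-cell unweighted mean of their probability grids, as sketched below. The text does not state how the ensembles were actually calculated, so the averaging rule, the function name and the model labels are assumptions for illustration only.

```python
def ensemble_mean(model_outputs, approved):
    """Cell-by-cell mean of the probability grids of the approved models.

    model_outputs maps a model label to a grid (list of rows) of
    probabilities; approved is the set of labels accepted by the experts.
    """
    selected = [grid for name, grid in model_outputs.items() if name in approved]
    if not selected:
        raise ValueError("no approved models to combine")
    n_rows, n_cols = len(selected[0]), len(selected[0][0])
    return [
        [sum(grid[r][c] for grid in selected) / len(selected) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 1 x 2 outputs from three techniques; one was not approved.
outputs = {
    "random_forest": [[0.8, 0.2]],
    "logistic_regression": [[0.6, 0.4]],
    "discriminant_analysis": [[0.1, 0.9]],
}
combined = ensemble_mean(outputs, approved={"random_forest", "logistic_regression"})
print([[round(v, 2) for v in row] for row in combined])
```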

Ground truthing has yet to be completed on these models, although the VectorNet project has subsequently sponsored fieldwork that will visit areas which have been modelled but for which no data are currently available. Retrospective quality assessments should therefore be possible in the future.

Constraints

There were no constraints in the data production.

Privacy

N/A.

Ethics

N/A.

4. Dataset description

Data type

Processed data; Interpretation of data.

Ontologies

N/A.

Format names and versions

JPG, JP2, TIF, TFW, XML.

Creation dates

The data were created on 13/04/2013.

Dataset creators

The contributors listed in the table below all contributed data to the VBORNET database for one or all of the species detailed in this paper. Bulent Alten and Ozge Erisoz Kasap were key in defining the unsuitable habitat and environmental limits used in the input suitability masks. Bulent Alten’s extensive experience of research in the field of Phlebotomines in and around Europe was also extremely useful when assessing the maps and the success of the model outputs.

Contributor Affiliation

Alten, Bulent Hacettepe University, Ankara, Turkey
Dikolli, Enkelejda Institute of Public Health, Tirana, Albania
Falcuta, Elena Cantacuzino Institute, Bucharest, Romania
Gunay, Filiz Hacettepe University, Ankara, Turkey
Hendrickx, Guy Avia-GIS, Belgium
Ivovic, Vladimir University of Primorska, Koper, Slovenia
Karakus, Mehmet Hacettepe University, Ankara, Turkey
Kasap, Ozge Erisoz Hacettepe University, Ankara, Turkey
Kavur, Hakan Cukurova University, Dept of Medical Parasitology, Adana, Turkey
Ognyan, Mikov National Centre of Infectious and Parasitic Diseases, Parasitology and Tropical Medicine, Sofia, Bulgaria
Oguz, Gizem Hacettepe University, Ankara, Turkey
Ozbel, Yusuf Ege University Faculty of Medicine Department of Parasitology, Izmir, Turkey
Pajovic, Igor University of Montenegro, Biotechnical Faculty, Montenegro
Petric, Dusan Faculty of Agriculture, University of Novi Sad, Serbia
Saska, Aleksandra Science and Research Center, Koper, Slovenia
Schaffner, Francis Consultancy, France
Sousa, Carla A. Instituto de Higiene e Medicina Tropical, Lisbon, Portugal
Zygutiene, Milda Centre for Communicable diseases and AIDS, Vilnius, Lithuania

Language

English.

Programming language

N/A.

Licence

The data have been deposited under the open CC-BY licence.

Accessibility criteria

The data are distributed as GeoTIFF files, a standard GIS raster format. GeoTIFFs can be read directly by most GIS software and some other software packages, so the raster data can be accessed and analysed without conversion. The format is compatible with proprietary software (ESRI ArcGIS) and with open source software (Quantum GIS (QGIS) or the R-project raster package). If the user has no suitable software already installed, the authors suggest downloading the open source QGIS software free of charge from http://www.qgis.org to view these data.

A simple schematic of the data layers and directories found within this data package is shown below with descriptions where filenames are not self-explanatory:

  • Appendices – Directory containing the appendices for this document.
    • ohd_VBNPhBD_AltenEtAl_Appendix1.pdf
    • ohd_VBNPhBD_AltenEtAl_Appendix2.pdf
  • Quickview – Directory containing small JPEG files allowing the reader to view the data visually without specialist software.
    • appendix1mapsPHAR.jpg – VBORNET Status Phlebotomus ariasi
    • appendix1mapsPHPA.jpg – VBORNET Status Phlebotomus papatasi
    • appendix1mapsPHPE.jpg – VBORNET Status Phlebotomus perniciosus
    • appendix1mapsPHTO.jpg – VBORNET Status Phlebotomus tobbi
    • appendix2mapsPHAR.jpg – Model output Phlebotomus ariasi
    • appendix2mapsPHPA.jpg – Model output Phlebotomus papatasi
    • appendix2mapsPHPE.jpg – Model output Phlebotomus perniciosus
    • appendix2mapsPHTO.jpg – Model output Phlebotomus tobbi
  • Tiff – Directory containing model output data for display and interrogation within GIS and geostatistical software.*
    • pharMskense.tif – Model output Phlebotomus ariasi
    • phpaMskense.tif – Model output Phlebotomus papatasi
    • phpemodelMsk.tif – Model output Phlebotomus perniciosus
    • phtomodelMsk.tif – Model output Phlebotomus tobbi

*Only the tif files within this directory are listed. Other file formats of the same name within the directory are ancillary files that provide additional data to the GIS software and as a rule should be copied along with the TIFF file of the same name if you are moving the data between directories.
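The TFW ancillary files are world files: six lines of plain text giving the affine transform from pixel position to map coordinates. The sketch below parses such a file and converts a pixel position to the coordinates of that pixel's centre. The parameter values are hypothetical and are not taken from this data package, and the function names are chosen for illustration.

```python
def parse_world_file(text):
    """Parse the six world file parameters in file order.

    Line order: x pixel size (A), rotation term (D), rotation term (B),
    y pixel size (E, usually negative), then the x (C) and y (F)
    coordinates of the centre of the upper-left pixel.
    """
    a, d, b, e, c, f = (float(line) for line in text.split())
    return a, d, b, e, c, f


def pixel_to_map(col, row, params):
    """Map coordinates of the centre of the pixel at (col, row)."""
    a, d, b, e, c, f = params
    return a * col + b * row + c, d * col + e * row + f


# Hypothetical world file for a north-up raster with 1000 m pixels.
example_tfw = """1000.0
0.0
0.0
-1000.0
500.0
4999500.0
"""
params = parse_world_file(example_tfw)
x, y = pixel_to_map(col=2, row=3, params=params)
print(x, y)
```

Because the GIS software relies on these parameters to position the raster, a TIFF moved without its TFW file may lose its georeferencing unless the same information is embedded in the TIFF itself.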

Publication date

The dataset was published in the repository on 23/08/2016.

5. Reuse potential

These layers have been created in an attempt to identify probable areas of species distribution where there are currently no sample data. These maps therefore could be useful in identifying suitable areas for further sampling in an attempt to identify the true distribution of the species. The VectorNet project plans to utilise these datasets in such a way.

The covariates of the models are also mainly climate orientated. A possible avenue of further work therefore could be to use the models to assess the potential change in distribution after a shift in climate parameters.