1. Overview
Introduction/Study Description
VBORNET [1] was an initiative of the European Centre for Disease Prevention and Control (ECDC), which ran from 2009 to 2014. The project established a European network of entomological and public health specialists to assist ECDC in its preparedness activities on vector-borne diseases (VBD). As part of this work, a database collating validated records of key species distributions was commissioned. This data paper focusses on four sand fly species that are vectors of leishmaniasis: Phlebotomus ariasi, Phlebotomus papatasi, Phlebotomus perniciosus and Phlebotomus tobbi.
VectorNet [2] has continued this work and builds upon VBORNET by supporting the collection of data on vectors and pathogens in vectors, related to both animal and human health. VectorNet is a joint initiative of the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC), which started in May 2014.
Whilst VBORNET and VectorNet have made substantial progress collating European data on key vector species, the coverage is still incomplete. The ‘Gap Analysis’ work within these projects aims to identify those areas of likely species distribution within the project extent where there are no current data. These estimates produced by spatial modelling techniques are intended to meet two objectives: firstly to help direct extensive VectorNet sampling efforts in the field; and secondly to provide first indications of the current likely extent and distribution of key vector species within continental Europe and its surrounding regions. It is hoped that publishing these models will aid the VectorNet network of experts to engage the wider research and professional community in the drive to expand and validate the VectorNet database. Readers are encouraged to contact the authors or visit the VectorNet website [2].
For each species, probability of presence maps at a resolution of 1 km were generated using a variety of well-established spatial modelling techniques available through the VECMAP system [3]. Both the input data and the resulting models were iteratively assessed by project experts, and the best-performing models are included in this data package.
2. Context
Spatial coverage
Description: Continental Europe
Northern boundary: 71.8
Southern boundary: 33.5
Eastern boundary: 62.3
Western boundary: –11.2
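As a minimal sketch, the bounding box above can be used to check whether a candidate sampling location falls inside the project extent. It assumes the four boundaries are decimal degrees of latitude and longitude, which the text does not state explicitly; the function name is illustrative only.

```python
# Extent quoted in the Spatial coverage section (assumed to be decimal degrees).
NORTH, SOUTH, EAST, WEST = 71.8, 33.5, 62.3, -11.2


def in_extent(lat: float, lon: float) -> bool:
    """Return True when a point falls inside the stated bounding box."""
    return SOUTH <= lat <= NORTH and WEST <= lon <= EAST


print(in_extent(39.9, 32.9))  # approximate location of Ankara: True
print(in_extent(30.0, 31.2))  # approximate location of Cairo, south of the extent: False
```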
Temporal coverage
Known presence up to 31/01/2013.
Species
Phlebotomus ariasi, vector of Leishmania infantum and phleboviruses.
Phlebotomus papatasi, vector of Leishmania major and phleboviruses.
Phlebotomus perniciosus, vector of Leishmania infantum and phleboviruses.
Phlebotomus tobbi, vector of Leishmania infantum, Leishmania donovani and phleboviruses.
3. Methods
For each of the species the following method was followed.
Steps
Identifying presence and absence training data
The distributions of each of the four sand fly species reported by VBORNET were used as the basis for the species presence training data for the analysis. Data reported from the VBORNET map published in January 2013 were used for Phlebotomus perniciosus and Phlebotomus tobbi, and from the map published in January 2014 for Phlebotomus ariasi and Phlebotomus papatasi. Maps of the recorded known distributions at that time are presented in Appendix 1, available within this data package. These reported distributions were recorded in VBORNET at a coarse NUTS 3 polygon scale. The data originate from a combination of aggregated data contributed by the authors and listed contributors, and a literature review completed by the VBORNET vector group leaders. The full data set and sources are available to contributors of VBORNET and VectorNet.
Habitat suitability and environmental limits
The recorded distributions were too coarse to be utilised by the model framework. In addition, the selected modelling methods required information on both presence and absence to calibrate the modelling process. It was therefore necessary to identify areas of absence within NUTS 3 regions assigned as present. To do this a suitability mask at 1 km resolution was compiled by requesting experts within the network (see the Data Creators section) to identify primary, secondary and unsuitable land cover classes. For the phase two models (Ph. ariasi and Ph. papatasi) environmental limiting factors which are derived from remotely sensed imagery were also identified and used in the mask.
Environmental limits masks were created using altitude and temperature limits derived from the SRTM 100m Digital Elevation Model [4] and BIOCLIM [5] temperature layers respectively. For Phlebotomus ariasi, the minimum altitude within a 1 km square had to be below 1700 m, and the BIOCLIM Tmax layer had to lie between 15 and 32 degrees centigrade. For Ph. papatasi, the minimum altitude had to be below 2000 m, and the BIOCLIM Tmean layer had to lie between 20 and 30 degrees centigrade. Whilst these values are loosely based on laboratory findings (personal communication with Ozge Erisoz Kasap; see the Dataset creators section), they were assessed visually and calibrated to account for differences between laboratory measurements of species behaviour and remotely sensed values recorded at coarse resolution.
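The altitude and temperature limits described above can be expressed as a simple cell-by-cell test. The following is a minimal sketch and not the VECMAP implementation: the grids are plain nested lists, the names are hypothetical, and it assumes the temperature bounds are inclusive while the altitude bound is strict ("below").

```python
# Limits quoted in the text; the dictionary layout and names are illustrative only.
LIMITS = {
    "PHAR": {"max_altitude_m": 1700, "temp_c": (15, 32)},  # BIOCLIM Tmax
    "PHPA": {"max_altitude_m": 2000, "temp_c": (20, 30)},  # BIOCLIM Tmean
}


def within_limits(species, min_altitude_m, temperature_c):
    """Return 1 if a 1 km cell passes both limits for the species, otherwise 0."""
    limits = LIMITS[species]
    low, high = limits["temp_c"]
    altitude_ok = min_altitude_m < limits["max_altitude_m"]
    temperature_ok = low <= temperature_c <= high  # inclusive bounds are an assumption
    return int(altitude_ok and temperature_ok)


def limits_mask(species, altitude_grid, temperature_grid):
    """Apply within_limits to every cell of two equally shaped grids."""
    return [
        [within_limits(species, a, t) for a, t in zip(alt_row, temp_row)]
        for alt_row, temp_row in zip(altitude_grid, temperature_grid)
    ]


# Two toy cells: one at 500 m and one at 1800 m, both at 25 degrees centigrade.
mask = limits_mask("PHAR", [[500, 1800]], [[25, 25]])
```

In this toy example the second cell fails the Ph. ariasi altitude limit but would pass the Ph. papatasi limit of 2000 m.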
The land cover masks were defined utilising the 100m Corine land cover dataset [6] and the 300m GLOBCOVER [7] product where no Corine data was available. Definitions of land class suitability for each species as defined by experts can be found in Tables 1 and 2.
Table 1
Reclassed values defining the Corine [6] suitability layers.
CORINE LABEL | PHTO | PHPE | PHAR | PHPA |
---|---|---|---|---|
Continuous urban fabric | 0 | 0 | 0 | 1 |
Discontinuous urban fabric | 0 | 0 | 0 | 1 |
Industrial or commercial units | 0 | 0 | 0 | 1 |
Road and rail networks and associated land | 0 | 0 | 0 | 1 |
Port areas | 0 | 0 | 0 | 1 |
Airports | 0 | 0 | 0 | 1 |
Mineral extraction sites | 0 | 0 | 0 | 1 |
Dump sites | 0 | 0 | 0 | 1 |
Construction sites | 1 | 1 | 1 | 1 |
Green urban areas | 1 | 1 | 1 | 1 |
Sport and leisure facilities | 0 | 0 | 0 | 1 |
Non-irrigated arable land | 1 | 1 | 1 | 1 |
Permanently irrigated land | 1 | 1 | 1 | 1 |
Rice fields | 1 | 0 | 0 | 1 |
Vineyards | 1 | 1 | 1 | 1 |
Fruit trees and berry plantations | 1 | 1 | 1 | 1 |
Olive groves | 1 | 1 | 1 | 1 |
Pastures | 1 | 0 | 0 | 1 |
Annual crops associated with permanent crops | 1 | 1 | 1 | 1 |
Complex cultivation patterns | 1 | 1 | 1 | 1 |
Land principally occupied by agriculture, with significant areas of natural vegetation | 1 | 1 | 1 | 1 |
Agro-forestry areas | 1 | 1 | 1 | 1 |
Broad-leaved forest | 1 | 1 | 1 | 0 |
Coniferous forest | 1 | 1 | 1 | 1 |
Mixed forest | 1 | 1 | 1 | 1 |
Natural grasslands | 1 | 0 | 0 | 1 |
Moors and heathland | 1 | 1 | 1 | 1 |
Sclerophyllous vegetation | 1 | 1 | 1 | 1 |
Transitional woodland-shrub | 1 | 1 | 1 | 1 |
Beaches, dunes, sands | 0 | 0 | 0 | 0 |
Bare rocks | 1 | 1 | 1 | 0 |
Sparsely vegetated areas | 0 | 1 | 1 | 1 |
Burnt areas | 0 | 0 | 0 | 0 |
Glaciers and perpetual snow | 0 | 0 | 0 | 0 |
Inland marshes | 0 | 0 | 0 | 0 |
Peat bogs | 0 | 0 | 0 | 0 |
Salt marshes | 0 | 0 | 0 | 0 |
Salines | 0 | 0 | 0 | 0 |
Intertidal flats | 0 | 0 | 0 | 0 |
Water courses | 0 | 0 | 0 | 0 |
Water bodies | 0 | 0 | 0 | 0 |
Coastal lagoons | 0 | 0 | 0 | 0 |
Estuaries | 0 | 0 | 0 | 0 |
Sea and ocean | 0 | 0 | 0 | 0 |
Suitability layers: 1 = suitable and 0 = unsuitable. PHPE = Phlebotomus perniciosus, PHTO = Phlebotomus tobbi, PHAR = Phlebotomus ariasi and PHPA = Phlebotomus papatasi.
Table 2
Reclassed values defining the Globcover [7] suitability layers.
GLOBCOVER LABEL | PHTO | PHPE | PHAR | PHPA |
---|---|---|---|---|
Post-flooding or irrigated croplands (or aquatic) | 1 | 0 | 0 | 1 |
Rainfed croplands | 1 | 0 | 0 | 1 |
Mosaic cropland (50–70%) / vegetation (grassland/shrubland/forest) (20–50%) | 1 | 1 | 1 | 1 |
Mosaic vegetation (grassland/shrubland/forest) (50–70%) / cropland (20–50%) | 1 | 1 | 1 | 1 |
Closed to open (>15%) broadleaved evergreen or semi-deciduous forest (>5m) | 1 | 1 | 1 | 1 |
Closed (>40%) broadleaved deciduous forest (>5m) | 1 | 1 | 1 | 1 |
Open (15–40%) broadleaved deciduous forest/woodland (>5m) | 1 | 1 | 1 | 1 |
Closed (>40%) needleleaved evergreen forest (>5m) | 1 | 1 | 1 | 1 |
Open (15–40%) needleleaved deciduous or evergreen forest (>5m) | 1 | 1 | 1 | 1 |
Closed to open (>15%) mixed broadleaved and needleleaved forest (>5m) | 1 | 1 | 1 | 1 |
Mosaic forest or shrubland (50–70%) / grassland (20–50%) | 1 | 1 | 1 | 1 |
Mosaic grassland (50–70%) / forest or shrubland (20–50%) | 1 | 1 | 1 | 1 |
Closed to open (>15%) (broadleaved or needleleaved, evergreen or deciduous) shrubland (<5m) | 0 | 1 | 1 | 1 |
Closed to open (>15%) herbaceous vegetation (grassland, savannas or lichens/mosses) | 0 | 1 | 1 | 1 |
Sparse (<15%) vegetation | 0 | 1 | 1 | 1 |
Closed to open (>15%) broadleaved forest regularly flooded (semi-permanently or temporarily) – Fresh or brackish water | 0 | 0 | 0 | 0 |
Closed (>40%) broadleaved forest or shrubland permanently flooded – Saline or brackish water | 0 | 0 | 0 | 0 |
Closed to open (>15%) grassland or woody vegetation on regularly flooded or waterlogged soil – Fresh, brackish or saline water | 0 | 0 | 0 | 0 |
Artificial surfaces and associated areas (Urban areas >50%) | 1 | 1 | 1 | 1 |
Bare areas | 0 | 0 | 0 | 0 |
Water bodies | 0 | 0 | 0 | 0 |
Permanent snow and ice | 0 | 0 | 0 | 0 |
No data (burnt areas, clouds, …) | 0 | 0 | 0 | 0 |
Suitability layers: 1 = suitable and 0 = unsuitable. PHPE = Phlebotomus perniciosus, PHTO = Phlebotomus tobbi, PHAR = Phlebotomus ariasi and PHPA = Phlebotomus papatasi.
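The reclassification in Tables 1 and 2, with Globcover used only where Corine has no data, can be sketched as a lookup. The example below is illustrative only: it copies a handful of PHPA values from the two tables, keys them by class label rather than by raster code, and assumes that a missing Corine value is represented by `None`.

```python
# A few PHPA (Phlebotomus papatasi) rows copied from Tables 1 and 2.
CORINE_PHPA = {
    "Continuous urban fabric": 1,
    "Olive groves": 1,
    "Broad-leaved forest": 0,
    "Water bodies": 0,
}
GLOBCOVER_PHPA = {
    "Rainfed croplands": 1,
    "Bare areas": 0,
    "Water bodies": 0,
}


def land_cover_suitability(corine_label, globcover_label):
    """Use the Corine class when present, otherwise fall back to Globcover.

    corine_label is None where no Corine data are available (an assumption
    about how missing coverage is encoded).
    """
    if corine_label is not None:
        return CORINE_PHPA[corine_label]
    return GLOBCOVER_PHPA[globcover_label]


inside_corine = land_cover_suitability("Broad-leaved forest", "Rainfed croplands")
outside_corine = land_cover_suitability(None, "Rainfed croplands")
```

The Corine class takes precedence in the first call, so the Globcover value is only consulted in the second.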
Modelling procedure
A range of modelling techniques available in the VECMAP [3] system including Non Linear Discriminant Analysis [8], Logistic Regression [9] and Random Forests [10], using 10–25 repeated bootstraps per run, were used to provide a range of outputs for expert assessment.
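The repeated bootstraps mentioned above resample the training data with replacement before each model fit. How VECMAP implements this is not described here, so the following is only a generic sketch of bootstrap resampling; the function name, record layout and seed are hypothetical.

```python
import random


def bootstrap_samples(records, n_bootstraps=10, seed=0):
    """Return n_bootstraps resamples, each drawn with replacement from records."""
    rng = random.Random(seed)  # fixed seed so that the sketch is repeatable
    n = len(records)
    return [
        [records[rng.randrange(n)] for _ in range(n)]
        for _ in range(n_bootstraps)
    ]


# Toy training records: (presence flag, covariate value).
training = [(1, 0.82), (1, 0.75), (0, 0.20), (0, 0.31), (1, 0.66)]
resamples = bootstrap_samples(training, n_bootstraps=10)
```

Each resample has the same size as the training set, and a model would be fitted to each one in turn so that the spread of the fitted outputs can be examined.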
The covariates offered to the modelling procedures were drawn from a standardised set of ecological parameters, and in particular a suite of Fourier processed MODIS satellite imagery [11] which provides a range of biologically interpretable variables related to levels and seasonality of temperature and vegetation related factors during the period 2000–2012. These are summarised in Table 3, and are all available to registered members of the VMerge/EDENext Data Website (www.vmergedata.com) [12].
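The mean, amplitude and phase variables listed in Table 3 summarise the seasonal cycle of each satellite channel. The exact processing applied to the MODIS imagery is documented in [11] and not reproduced here; the sketch below only illustrates the general idea of a temporal Fourier summary for a single pixel, assuming one evenly sampled cycle and a phase expressed in radians. The key names mirror the A0, A1 to A3 and P1 to P3 suffixes used in the layer names.

```python
import cmath
import math


def harmonic_summary(series, n_harmonics=3):
    """Mean, plus amplitude and phase of the first harmonics of one full cycle."""
    n = len(series)
    summary = {"A0": sum(series) / n}
    for k in range(1, n_harmonics + 1):
        # Discrete Fourier coefficient of harmonic k, scaled so that a pure
        # cosine of amplitude A yields a coefficient of magnitude A.
        coeff = (2 / n) * sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)
        )
        summary[f"A{k}"] = abs(coeff)
        summary[f"P{k}"] = -cmath.phase(coeff)  # timing of the peak, in radians
    return summary


# Synthetic monthly series: mean 10, annual amplitude 3, peak shifted by 1 radian.
monthly = [10 + 3 * math.cos(2 * math.pi * t / 12 - 1.0) for t in range(12)]
result = harmonic_summary(monthly)
```

For this synthetic series the summary recovers the mean, the annual amplitude and the phase shift, while the second and third harmonics have amplitudes close to zero.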
Table 3
Covariates offered to modelling procedures.
No. | Layer | Description |
---|---|---|
1 | ED1803A0 | Middle infra-red mean |
2 | ED1803A1 | Middle infra-red amplitude 1 |
3 | ED1803A2 | Middle infra-red amplitude 2 |
4 | ED1803A3 | Middle infra-red amplitude 3 |
5 | ED1803MN | Middle infra-red minimum |
6 | ED1803MX | Middle infra-red maximum |
7 | ED1803P1 | Middle infra-red phase 1 |
8 | ED1803P2 | Middle infra-red phase 2 |
9 | ED1803P3 | Middle infra-red phase 3 |
10 | ED1803VR | Middle infra-red variance |
11 | ED1807A0 | Daytime LST mean |
12 | ED1807A1 | Daytime LST amplitude 1 |
13 | ED1807A2 | Daytime LST amplitude 2 |
14 | ED1807A3 | Daytime LST amplitude 3 |
15 | ED1807MN | Daytime LST minimum |
16 | ED1807MX | Daytime LST maximum |
17 | ED1807P1 | Daytime LST phase 1 |
18 | ED1807P2 | Daytime LST phase 2 |
19 | ED1807P3 | Daytime LST phase 3 |
20 | ED1807VR | Daytime LST variance |
21 | ED1808A0 | Nighttime LST mean |
22 | ED1808A1 | Nighttime LST amplitude 1 |
23 | ED1808A2 | Nighttime LST amplitude 2 |
24 | ED1808A3 | Nighttime LST amplitude 3 |
25 | ED1808MN | Nighttime LST minimum |
26 | ED1808MX | Nighttime LST maximum |
27 | ED1808P1 | Nighttime LST phase 1 |
28 | ED1808P2 | Nighttime LST phase 2 |
29 | ED1808P3 | Nighttime LST phase 3 |
30 | ED1808VR | Nighttime LST variance |
31 | ED1814A0 | NDVI mean |
32 | ED1814A1 | NDVI amplitude 1 |
33 | ED1814A2 | NDVI amplitude 2 |
34 | ED1814A3 | NDVI amplitude 3 |
35 | ED1814MN | NDVI minimum |
36 | ED1814MX | NDVI maximum |
37 | ED1814P1 | NDVI phase 1 |
38 | ED1814P2 | NDVI phase 2 |
39 | ED1814P3 | NDVI phase 3 |
40 | ED1814VR | NDVI variance |
41 | ED1815A0 | EVI mean |
42 | ED1815A1 | EVI amplitude 1 |
43 | ED1815A2 | EVI amplitude 2 |
44 | ED1815A3 | EVI amplitude 3 |
45 | ED1815MN | EVI minimum |
46 | ED1815MX | EVI maximum |
47 | ED1815P1 | EVI phase 1 |
48 | ED1815P2 | EVI phase 2 |
49 | ED1815P3 | EVI phase 3 |
50 | ED1815VR | EVI variance |
51 | EDBC2K12 | BioClim Annual Precipitation |
52 | EDBC2K13 | BioClim Precipitation of Wettest Month |
53 | EDBC2K14 | BioClim Precipitation of Driest Month |
54 | EDBC2K15 | BioClim Precipitation Seasonality (Coefficient of Variation) |
55 | EDBC2K16 | BioClim Precipitation of Wettest Quarter |
56 | EDBC2K17 | BioClim Precipitation of Driest Quarter |
57 | EDBC2K18 | BioClim Precipitation of Warmest Quarter |
58 | EDBC2K19 | BioClim Precipitation of Coldest Quarter |
59 | EDV590AS | DEM (Aspect) |
60 | EDV590EL | DEM (Elevation) |
61 | EDV590RG | DEM (Ruggedness) |
62 | EDWC57A0 | WORLDCLIM precipitation mean |
63 | EDWC57A1 | WORLDCLIM precipitation amplitude 1 |
64 | EDWC57A2 | WORLDCLIM precipitation amplitude 2 |
65 | EDWC57A3 | WORLDCLIM precipitation amplitude 3 |
66 | EDWC57MN | WORLDCLIM precipitation minimum |
67 | EDWC57MX | WORLDCLIM precipitation maximum |
68 | EDWC57P1 | WORLDCLIM precipitation phase 1 |
69 | EDWC57P2 | WORLDCLIM precipitation phase 2 |
70 | EDWC57P3 | WORLDCLIM precipitation phase 3 |
71 | EDWC57VR | WORLDCLIM precipitation variance |
72 | EDXXGRPD | GRUMP Population density |
73 | EDXXGRPW | GRUMP Population weighted |
74 | EDXXJRCA | JRC Access |
75 | EDXXLPG1 | Length of Growing Period LGP |
LST = Land Surface Temperature; NDVI = Normalised Difference Vegetation Index; EVI = Enhanced Vegetation Index; DEM = Digital Elevation Model. All files starting with ED18 are Fourier processed MODIS satellite imagery produced by the TALA Research Group, Oxford [11].
Files with BioClim or WORLDCLIM in the name are derived from the WORLDCLIM datasets [5].
GRUMP derived from population layers produced by [13].
JRC Accessibility downloaded from [14].
Length of growing Period derived from data provided by FAO, Rome. Available from www.vmerge.com [12].
All layers extracted and standardised by ERGO for EDENext (www.edenextdata.com) [15]
Output layers
The suitability-masked model outputs are provided as probability maps at the pixel level, with a resolution of 1 kilometre, for each species. A quick view for each vector species is presented in Appendix 2, available within this data package.
Sampling strategy
Training sample point data for the models were extracted as follows:
- Random presence points were created from any area within a NUTS 3 polygon recorded as present and where the suitability mask did not indicate unsuitability.
- Random absence points were selected from areas identified in the mask as unsuitable.
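The two sampling rules above can be sketched on a toy grid. This is an illustration under stated assumptions rather than the procedure used in VECMAP: the grids are nested lists of 0/1 flags, the function name is hypothetical, and absence points are drawn from any unsuitable cell regardless of the NUTS 3 status.

```python
import random


def sample_training_points(present, suitable, n_points, seed=0):
    """Draw random presence and absence cells as (row, column) pairs.

    present[r][c] is 1 if the cell lies in a NUTS 3 polygon recorded as present.
    suitable[r][c] is 1 if the suitability mask marks the cell as suitable.
    """
    rng = random.Random(seed)
    presence_pool, absence_pool = [], []
    for r, (present_row, suitable_row) in enumerate(zip(present, suitable)):
        for c, (p, s) in enumerate(zip(present_row, suitable_row)):
            if s == 0:
                absence_pool.append((r, c))  # unsuitable: candidate absence
            elif p == 1:
                presence_pool.append((r, c))  # suitable and recorded as present
    presences = rng.sample(presence_pool, min(n_points, len(presence_pool)))
    absences = rng.sample(absence_pool, min(n_points, len(absence_pool)))
    return presences, absences


present_grid = [[1, 1, 0], [1, 0, 0]]
suitable_grid = [[1, 0, 1], [1, 1, 0]]
presences, absences = sample_training_points(present_grid, suitable_grid, n_points=2)
```

Cells that are suitable but lie outside a polygon recorded as present fall into neither pool, which reflects the fact that the species status there is unknown.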
Quality Control
The model outputs were initially evaluated using the standard, and extensive, accuracy metrics (e.g. R-squared, AIC, Kappa, confusion matrices) provided by the VECMAP [3] software. Provided the accuracy metrics indicated sufficient statistical reliability, the range of models was then sent to selected experts, who were asked to choose from the selection provided. Experts included individuals listed in the Dataset creators section of this paper.
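Of the metrics named above, the confusion matrix and Kappa are straightforward to reproduce from a set of observed and predicted presence/absence labels. The sketch below uses the standard definition of Cohen's Kappa and is not taken from the VECMAP software; the function names and toy labels are hypothetical.

```python
def confusion_counts(observed, predicted):
    """Return (tp, fp, fn, tn) for two equally long lists of 0/1 labels."""
    tp = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 1)
    fp = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 1)
    fn = sum(1 for o, p in zip(observed, predicted) if o == 1 and p == 0)
    tn = sum(1 for o, p in zip(observed, predicted) if o == 0 and p == 0)
    return tp, fp, fn, tn


def cohen_kappa(tp, fp, fn, tn):
    """Agreement beyond chance; undefined when chance agreement equals 1."""
    total = tp + fp + fn + tn
    observed_agreement = (tp + tn) / total
    expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)


observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(observed, predicted)
kappa = cohen_kappa(*counts)
print(counts, kappa)  # (3, 1, 1, 3) 0.5
```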
In the first phase of modelling (Ph. perniciosus and Ph. tobbi), the best model selected by the experts was used as the final model for that species. During phase 2 of the modelling (Ph. ariasi and Ph. papatasi), ensembles of the different model techniques were preferred, to attempt to iron out any inherent bias within individual modelling methods. Naturally, if a model was not approved by the network experts, it was not included in the ensemble.
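The text does not state how the approved phase 2 models were combined. As a minimal sketch, the example below assumes an unweighted cell-by-cell mean of the probability maps from the approved models; the function name and the toy values are hypothetical.

```python
def ensemble_mean(probability_maps, approved):
    """Cell-wise mean of the probability maps whose model was approved.

    The unweighted mean is an assumption made for illustration only.
    """
    kept = [m for m, ok in zip(probability_maps, approved) if ok]
    if not kept:
        raise ValueError("no approved models to combine")
    n_rows, n_cols = len(kept[0]), len(kept[0][0])
    return [
        [sum(m[r][c] for m in kept) / len(kept) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three toy one-row maps; the third model was not approved by the experts.
maps = [[[0.2, 0.8]], [[0.4, 0.6]], [[1.0, 1.0]]]
ensemble = ensemble_mean(maps, approved=[True, True, False])
```

Excluding the rejected model before averaging mirrors the rule that unapproved models did not enter the ensemble.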
Ground truthing has yet to be completed on these models. However, the VectorNet project has subsequently sponsored fieldwork that will visit areas which have been modelled but currently have no data available, so retrospective quality assessments should be completed in the future.
Constraints
There were no constraints in the data production.
Privacy
N/A.
Ethics
N/A.
4. Dataset description
Object name
Data type
Processed data; Interpretation of data.
Ontologies
N/A.
Format names and versions
JPG, JP2, TIF, TFW, XML.
Creation dates
The data were created on 13/04/2013.
Dataset creators
The contributors listed in the table below all contributed data to the VBORNET database for one or all of the species detailed in this paper. Bulent Alten and Ozge Erisoz Kasap were key in defining the unsuitable habitat and environmental limits used in the input suitability masks. Bulent Alten’s extensive research experience in the field of phlebotomines in and around Europe was also extremely useful when assessing the maps and the success of the model outputs.
Contributor | Affiliation |
---|---|
Alten, Bulent | Hacettepe University, Ankara, Turkey |
Dikolli, Enkelejda | Institute of Public Health, Tirana, Albania |
Falcuta, Elena | Cantacuzino Institute, Bucharest, Romania |
Gunay, Filiz | Hacettepe University, Ankara, Turkey |
Hendrickx, Guy | Avia-GIS, Belgium |
Ivovic, Vladimir | University of Primorska, Koper, Slovenia |
Karakus, Mehmet | Hacettepe University, Ankara, Turkey |
Kasap, Ozge Erisoz | Hacettepe University, Ankara, Turkey |
Kavur, Hakan | Cukurova University, Dept of Medical Parasitology, Adana, Turkey |
Ognyan, Mikov | National Centre of Infectious and Parasitic Diseases, Parasitology and Tropical Medicine, Sofia, Bulgaria |
Oguz, Gizem | Hacettepe University, Ankara, Turkey |
Ozbel, Yusuf | Ege University Faculty of Medicine Department of Parasitology, Izmir, Turkey |
Pajovic, Igor | University of Montenegro, Biotechnical Faculty, Montenegro |
Petric, Dusan | Faculty of Agriculture, University of Novi Sad, Serbia |
Saska, Aleksandra | Science and Research Center, Koper, Slovenia |
Schaffner, Francis | Consultancy, France |
Sousa, Carla A. | Instituto de Higiene e Medicina Tropical, Lisbon, Portugal |
Zygutiene, Milda | Centre for Communicable diseases and AIDS, Vilnius, Lithuania |
Language
English.
Programming language
N/A.
Licence
The data have been deposited under the open CC-BY licence.
Accessibility criteria
The data are distributed in GeoTIFF format, a standard GIS raster format. GeoTIFFs can be read directly by most GIS software and some other software packages, including proprietary (ESRI ArcGIS) and open source (Quantum GIS (QGIS) or the R-project raster package) tools. If the user has no suitable software already installed, the authors suggest downloading the open source QGIS software free of charge from http://www.qgis.org to view these data.
A simple schematic of the data layers and directories found within this data package is shown below with descriptions where filenames are not self-explanatory:
- Appendices – Directory containing the appendices for this document.
  - ohd_VBNPhBD_AltenEtAl_Appendix1.pdf
  - ohd_VBNPhBD_AltenEtAl_Appendix2.pdf
- Quickview – Directory containing small JPEG files allowing the reader to view the data visually without specialist software.
  - appendix1mapsPHAR.jpg – VBORNET Status Phlebotomus ariasi
  - appendix1mapsPHPA.jpg – VBORNET Status Phlebotomus papatasi
  - appendix1mapsPHPE.jpg – VBORNET Status Phlebotomus perniciosus
  - appendix1mapsPHTO.jpg – VBORNET Status Phlebotomus tobbi
  - appendix2mapsPHAR.jpg – Model output Phlebotomus ariasi
  - appendix2mapsPHPA.jpg – Model output Phlebotomus papatasi
  - appendix2mapsPHPE.jpg – Model output Phlebotomus perniciosus
  - appendix2mapsPHTO.jpg – Model output Phlebotomus tobbi
- Tiff – Directory containing model output data for display and interrogation within GIS and geostatistical software.*
  - pharMskense.tif – Model output Phlebotomus ariasi
  - phpaMskense.tif – Model output Phlebotomus papatasi
  - phpemodelMsk.tif – Model output Phlebotomus perniciosus
  - phtomodelMsk.tif – Model output Phlebotomus tobbi
*Only the tif files within this directory are listed. Other file formats of the same name within the directory are ancillary files that provide additional data to the GIS software and as a rule should be copied along with the TIFF file of the same name if you are moving the data between directories.
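Among the ancillary files, a TFW world file stores the six coefficients that georeference its TIFF. As a minimal sketch of why it should travel with the image, the example below parses a world file and converts a pixel position to map coordinates using the standard six-line world file layout. The coefficient values are invented for illustration and are not taken from this data package.

```python
def parse_world_file(text):
    """Return the six world file coefficients in file order: A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return tuple(values)


def pixel_to_map(coeffs, col, row):
    """Map coordinates of the centre of the pixel at (col, row)."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c  # A: pixel width, B: rotation term, C: x of first pixel
    y = d * col + e * row + f  # D: rotation term, E: pixel height (negative), F: y of first pixel
    return x, y


# Hypothetical world file contents: 0.01 degree cells, north-up, no rotation.
example_tfw = "0.01\n0.0\n0.0\n-0.01\n-11.195\n71.795\n"
coeffs = parse_world_file(example_tfw)
x, y = pixel_to_map(coeffs, col=10, row=20)
```

Without the world file (or equivalent embedded georeferencing), GIS software has no way to place the pixels on the map, which is why the ancillary files should be copied together with the TIFF.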
Repository location
Publication date
The dataset was published in the repository on 23/08/2016.
5. Reuse potential
These layers have been created in an attempt to identify probable areas of species distribution where there are currently no sample data. These maps therefore could be useful in identifying suitable areas for further sampling in an attempt to identify the true distribution of the species. The VectorNet project plans to utilise these datasets in such a way.
The covariates of the models are also mainly climate orientated. A possible avenue of further work therefore could be to use the models to assess the potential change in distribution after a shift in climate parameters.