VBORNET Gap Analysis: Sand Fly Vector Distribution Models Utilised to Identify Areas of Potential Species Distribution in Areas Lacking Records

The known distributions of these sand fly species within the project area (Europe, the Mediterranean Basin, North Africa, and Eurasia) are currently incomplete to a greater or lesser degree. The models are designed to fill the gaps with predicted distributions, to provide a) assistance in targeting surveys to collect distribution data for those areas with no field-validated information, and b) a first indication of project-wide distributions.


Overview
Introduction/Study Description
VBORNET [1] was an initiative of the European Centre for Disease Prevention and Control (ECDC), which ran from 2009 to 2014. The project established a European network of entomological and public health specialists to assist ECDC in its preparedness activities on vector-borne diseases (VBD). As part of this work, a database collating validated records of the distributions of key species was commissioned. This data paper focusses on four sand fly species that are vectors of leishmaniasis: Phlebotomus ariasi, Phlebotomus papatasi, Phlebotomus perniciosus and Phlebotomus tobbi.
VectorNet [2] has continued this work and builds upon VBORNET by supporting the collection of data on vectors and pathogens in vectors, related to both animal and human health. VectorNet is a joint initiative of the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC), which started in May 2014.
Whilst VBORNET and VectorNet have made substantial progress in collating European data on key vector species, coverage is still incomplete. The 'Gap Analysis' work within these projects aims to identify areas of likely species distribution within the project extent where no data currently exist. These estimates, produced by spatial modelling techniques, are intended to meet two objectives: firstly, to help direct extensive VectorNet sampling efforts in the field; and secondly, to provide a first indication of the current likely extent and distribution of key vector species within continental Europe and its surrounding regions. It is hoped that publishing these models will help the VectorNet network of experts engage the wider research and professional community in the drive to expand and validate the VectorNet database. Readers are encouraged to contact the authors or visit the VectorNet website [2].
For each species, probability-of-presence maps at 1 km resolution were generated using a variety of well-established spatial modelling techniques available through the VECMAP system [3]. Both the input data and the resulting models were iteratively assessed by project experts, and the best-performing models are included in this data package.

Methods
The following method was applied to each species.

Identifying presence and absence training data
The distributions of each of the four sand fly species reported by VBORNET were used as the basis for the presence training data for the analysis. Data from the VBORNET map published in January 2013 were used for Phlebotomus perniciosus and Phlebotomus tobbi, and from the map published in January 2014 for Phlebotomus ariasi and Phlebotomus papatasi. Maps of the known distributions recorded at that time are presented in Appendix 1 within this data package. These distributions were recorded in VBORNET at the coarse scale of NUTS 3 polygons. The data originate from a combination of data contributed by the authors and the listed contributors, and a literature review completed by the VBORNET vector group leaders. The full data set and sources are available to contributors of VBORNET and VectorNet.

Habitat suitability and environmental limits
The recorded distributions were too coarse to be used directly by the modelling framework. In addition, the selected modelling methods required information on both presence and absence to calibrate the modelling process. It was therefore necessary to identify areas of absence within NUTS 3 regions recorded as present. To do this, a suitability mask at 1 km resolution was compiled by asking experts within the network (see the Dataset creators section) to identify primary, secondary and unsuitable land cover classes. For the phase two models (Ph. ariasi and Ph. papatasi), environmental limiting factors derived from remotely sensed imagery were also identified and incorporated into the mask.
Environmental limits masks were created using altitude and temperature limits, derived from the SRTM 100 m Digital Elevation Model [4] and the BIOCLIM [5] temperature layers respectively. For Phlebotomus ariasi, the minimum altitude within a 1 km square had to be below 1700 m, and the BIOCLIM Tmax layer had to lie between 15 and 32 °C. For Ph. papatasi, the minimum altitude had to be below 2000 m, and the BIOCLIM Tmean layer had to lie between 20 and 30 °C. Whilst these values are loosely based on laboratory findings (personal communication with Ozge Erisoz Kasap; see the Dataset creators section), they were assessed visually and calibrated to account for differences between laboratory measurements of species behaviour and remotely sensed values recorded at coarse resolution.
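The altitude and temperature thresholds above reduce to a simple per-cell rule. The sketch below applies them to small in-memory grids, with plain Python lists standing in for raster arrays. It assumes the DEM and BIOCLIM layers have already been brought onto a common grid, that temperatures are in °C (some BIOCLIM distributions store them multiplied by 10), and that the temperature bounds are inclusive. The function and dictionary names are hypothetical, not part of VECMAP.

```python
# Per-species limits transcribed from the text (altitude in metres, temperature
# in degrees Celsius). Inclusive temperature bounds are an assumption.
# "temp_layer" records which BIOCLIM layer the caller is expected to pass in.
SPECIES_LIMITS = {
    "Ph. ariasi": {"max_min_altitude": 1700, "temp_layer": "Tmax", "temp_range": (15.0, 32.0)},
    "Ph. papatasi": {"max_min_altitude": 2000, "temp_layer": "Tmean", "temp_range": (20.0, 30.0)},
}


def block_minimum(dem, factor=10):
    """Aggregate a fine DEM to coarse cells by taking the minimum of each
    factor x factor block (e.g. ~100 m pixels -> 1 km cells when factor=10).
    Trailing rows/columns that do not fill a complete block are dropped."""
    rows, cols = len(dem) // factor, len(dem[0]) // factor
    return [
        [
            min(dem[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def environmental_limits_mask(species, min_altitude, temperature):
    """Return a 2-D list of booleans, True where a 1 km cell lies within the
    species' altitude and temperature limits.

    min_altitude: minimum altitude (m) within each 1 km cell.
    temperature:  the species' BIOCLIM layer (Tmax or Tmean) in deg C,
                  on the same 1 km grid.
    """
    limits = SPECIES_LIMITS[species]
    low, high = limits["temp_range"]
    return [
        [alt < limits["max_min_altitude"] and low <= t <= high
         for alt, t in zip(alt_row, temp_row)]
        for alt_row, temp_row in zip(min_altitude, temperature)
    ]
```

Taking the block minimum, rather than the mean, mirrors the wording "the minimum altitude within a 1 km square": a cell passes if any part of it lies below the altitude limit.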
The land cover masks were defined using the 100 m Corine land cover dataset [6], with the 300 m GLOBCOVER [7] product used where no Corine data were available. The land class suitability for each species, as defined by the experts, is given in Tables 1 and 2.
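The Corine-first, GLOBCOVER-fallback rule can be sketched as a lookup per cell. Tables 1 and 2 are not reproduced here, so the class codes and suitability labels below are illustrative placeholders only, and both land cover products are assumed to have been resampled to the same 1 km grid already, with None marking cells that have no Corine data. The names are hypothetical.

```python
# Hypothetical stand-ins for Tables 1 and 2: land cover class code -> suitability.
# The codes and labels are illustrative placeholders, not the experts' classification.
CORINE_SUITABILITY = {211: "primary", 242: "primary", 311: "secondary", 335: "unsuitable"}
GLOBCOVER_SUITABILITY = {14: "primary", 50: "secondary", 210: "unsuitable"}


def land_cover_suitability(corine, globcover):
    """Classify each 1 km cell using Corine where it has data (not None) and
    GLOBCOVER otherwise. Codes missing from the lookup return None
    (unclassified) rather than being silently treated as unsuitable."""
    result = []
    for corine_row, globcover_row in zip(corine, globcover):
        row = []
        for corine_code, globcover_code in zip(corine_row, globcover_row):
            if corine_code is not None:
                row.append(CORINE_SUITABILITY.get(corine_code))
            else:
                row.append(GLOBCOVER_SUITABILITY.get(globcover_code))
        result.append(row)
    return result
```

Returning None for unknown codes is a deliberate choice in this sketch: because unsuitable cells later become candidate absences, an unrecognised class should not quietly generate absence points.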

Modelling procedure
A range of modelling techniques available in the VECMAP [3] system, including Non-Linear Discriminant Analysis [8], Logistic Regression [9] and Random Forests [10], were applied with 10 to 25 bootstrap repetitions per run to provide a range of outputs for expert assessment.
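VECMAP's internal implementation is not described here, so the following is only a generic illustration of what "repeated bootstraps per run" means for one of the listed techniques, logistic regression: the model is refitted on training sets resampled with replacement, and, as an assumption, the bootstrap predictions are combined by averaging. Plain gradient descent is used to keep the sketch dependency-free; all function names are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    """Presence probability for covariate vector x; w[0] is the intercept."""
    return _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent.
    X: list of covariate vectors; y: list of 0/1 labels (absence/presence)."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def bootstrap_models(X, y, n_boot=10, seed=0):
    """Refit the model on n_boot training sets drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return models


def mean_probability(models, x):
    """Average the bootstrap models' predicted probabilities for one cell."""
    return sum(predict(w, x) for w in models) / len(models)
```

With n_boot between 10 and 25, this matches the range of repetitions quoted above. The spread of the individual bootstrap predictions, not just their mean, is also a useful indication of how stable each model is.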
The covariates offered to the modelling procedures were drawn from a standardised set of ecological parameters, in particular a suite of Fourier-processed MODIS satellite imagery [11], which provides a range of biologically interpretable variables describing the levels and seasonality of temperature and vegetation-related factors during the period 2000-2012. These are summarised in Table 3 and are all available to registered members of the VMerge/EDENext Data Website (www.vmergedata.com) [12].
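The exact Fourier processing applied to the MODIS imagery is not detailed in this section. One common form of temporal Fourier processing summarises each pixel's time series by its mean together with the amplitude and phase of the annual, bi-annual and tri-annual cycles. The sketch below shows that decomposition for a single pixel, assuming one year of equally spaced values; the function name is hypothetical and the outputs may not correspond exactly to the variables listed in Table 3.

```python
import cmath
import math


def temporal_fourier(series, n_harmonics=3):
    """Decompose one year of equally spaced observations (e.g. monthly land
    surface temperature or vegetation index values) into a mean plus annual,
    bi-annual, tri-annual ... cycles.

    Returns (mean, [(amplitude, phase), ...]); for harmonic k the series
    contains amplitude * cos(2*pi*k*t/n - phase). Requires n_harmonics < n / 2.
    """
    n = len(series)
    if not 0 < n_harmonics < n / 2:
        raise ValueError("need 0 < n_harmonics < len(series) / 2")
    mean = sum(series) / n
    harmonics = []
    for k in range(1, n_harmonics + 1):
        # Discrete Fourier coefficient for k cycles per year.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(series))
        amplitude = 2.0 * abs(coeff) / n
        phase = (-cmath.phase(coeff)) % (2.0 * math.pi)  # radians, in [0, 2*pi)
        harmonics.append((amplitude, phase))
    return mean, harmonics
```

For the annual harmonic, phase * n / (2 * pi) gives the time step of the seasonal peak, which is what makes these variables biologically interpretable (e.g. the month of maximum temperature or vegetation greenness).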

Output layers
The suitability-masked model outputs are produced as probability maps at the pixel level, with a resolution of 1 km for each species. A quick-view map for each vector species is provided in Appendix 2 within this data package.

Sampling strategy
Training sample point data for the model were extracted as follows:
• Random presence points were generated anywhere within a NUTS 3 polygon recorded as present, provided the suitability mask did not indicate that the location was unsuitable.
• Random absence points were selected from areas identified in the mask as unsuitable.
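The two sampling rules can be sketched on a grid of 1 km cells. Sampling whole cells rather than continuous point coordinates, drawing without replacement, and taking absences from any unsuitable cell (not only those inside present NUTS 3 regions) are all assumptions of this sketch; the function name is hypothetical.

```python
import random


def sample_training_points(in_present_nuts3, unsuitable, n_presence, n_absence, seed=0):
    """Draw presence and absence training cells as (row, col) tuples.

    in_present_nuts3: 2-D booleans, True where the cell lies in a NUTS 3
                      region recorded as present.
    unsuitable:       2-D booleans, True where the suitability mask marks the
                      cell as unsuitable.
    """
    rng = random.Random(seed)
    presence_pool, absence_pool = [], []
    for r, (present_row, unsuitable_row) in enumerate(zip(in_present_nuts3, unsuitable)):
        for c, (present, bad) in enumerate(zip(present_row, unsuitable_row)):
            if bad:
                absence_pool.append((r, c))  # unsuitable -> candidate absence
            elif present:
                presence_pool.append((r, c))  # present region, not unsuitable
    # Sample without replacement, capped at the size of each pool.
    presence = rng.sample(presence_pool, min(n_presence, len(presence_pool)))
    absence = rng.sample(absence_pool, min(n_absence, len(absence_pool)))
    return presence, absence
```

Checking unsuitability before presence means a cell flagged as unsuitable can never become a presence point, even if its NUTS 3 region is recorded as present.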

Quality Control
The model outputs were initially evaluated using a standard and extensive set of accuracy metrics (e.g. R-squared, AIC, Kappa, confusion matrices) provided by the VECMAP [3] software. Provided the accuracy metrics indicated sufficient statistical reliability, the range of models was then sent to selected experts, who were asked to choose from the selection provided. The experts included individuals listed in the Dataset creators section of this paper.
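As an illustration of two of the metrics named above, the sketch below builds a confusion matrix from predicted probabilities and computes Cohen's Kappa from it. It is not VECMAP's implementation; the 0.5 presence threshold and the function names are hypothetical.

```python
import math


def confusion_matrix(observed, probabilities, threshold=0.5):
    """Count (tp, fp, fn, tn) after converting probabilities to presence/absence."""
    tp = fp = fn = tn = 0
    for obs, p in zip(observed, probabilities):
        predicted = p >= threshold
        if predicted and obs:
            tp += 1
        elif predicted:
            fp += 1
        elif obs:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def cohens_kappa(tp, fp, fn, tn):
    """Agreement between predictions and observations beyond that expected by chance."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    # Chance agreement from the row and column totals of the confusion matrix.
    expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected_agreement == 1.0:
        return math.nan  # undefined: predictions and observations all in one class
    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)
```

For example, observations [1, 1, 1, 0, 0, 0, 1, 0] against probabilities [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3] give the matrix (3, 1, 1, 3): 75% raw agreement, but a Kappa of 0.5 once the 50% chance agreement is discounted.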
In the first phase of modelling (Ph. perniciosus and Ph. tobbi), the best model selected by the experts was used as the final model for that species. In the second phase (Ph. ariasi and Ph. papatasi), ensembles of the different modelling techniques were preferred, in an attempt to smooth out any bias inherent in individual modelling methods. Any model not approved by the network experts was excluded from the ensemble.
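How the phase two ensembles were combined is not specified beyond excluding unapproved models. A minimal sketch, assuming an unweighted per-pixel mean of the approved models' probabilities, followed by the suitability mask (masked cells returned as None), could look like this; the function and model names are hypothetical.

```python
def ensemble_map(model_maps, approved, suitable):
    """Average the probability maps of expert-approved models, cell by cell.

    model_maps: dict of model name -> 2-D list of presence probabilities.
    approved:   names of models accepted by the experts.
    suitable:   2-D booleans from the suitability mask; masked cells become None.
    """
    chosen = [model_maps[name] for name in sorted(approved) if name in model_maps]
    if not chosen:
        raise ValueError("no approved models available to ensemble")
    return [
        [
            sum(m[r][c] for m in chosen) / len(chosen) if suitable[r][c] else None
            for c in range(len(suitable[r]))
        ]
        for r in range(len(suitable))
    ]
```

An unweighted mean gives each approved technique equal influence; weighting by an accuracy metric such as Kappa would be an alternative design, but nothing in the text indicates that it was used.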
Ground truthing of these models has yet to be completed. However, the VectorNet project has subsequently sponsored fieldwork that will visit areas which have been modelled but currently have no data available, so retrospective quality assessments should be possible in the future.

Constraints
There were no constraints in the data production.

Creation dates
Data creation date: 13/04/2013.

Dataset creators
The contributors listed in the table below all contributed data to the VBORNET database for one or more of the species detailed in this paper. Bulent Alten and Ozge Erisoz Kasap were key in defining the unsuitable habitat and environmental limits used in the input suitability masks. Bulent Alten's extensive research experience with phlebotomine sand flies in and around Europe was also extremely useful when assessing the maps and the success of the model outputs.
The data are distributed as GIS raster files in GeoTIFF format, a standard, openly specified GIS raster format. GeoTIFFs can be read directly by most GIS software and by some other software packages, including both proprietary tools (ESRI ArcGIS) and open-source tools (Quantum GIS (QGIS) or the R raster package). If no suitable software is already installed, the authors suggest downloading the open-source QGIS software, free of charge, from http://www.qgis.org to view these data.
A simple schematic of the data layers and directories found within this data package is shown below, with descriptions where filenames are not self-explanatory:
*Only the TIFF files within this directory are listed. Other files with the same name but different extensions in the directory are ancillary files that provide additional data to the GIS software; as a rule, they should be copied along with the TIFF file of the same name if you move the data between directories.

Publication date
The dataset was published in the repository on 23/08/2016.

Reuse potential
These layers were created in an attempt to identify probable areas of species distribution where there are currently no sample data. The maps could therefore be useful for identifying suitable areas for further sampling, in an attempt to establish the true distribution of each species. The VectorNet project plans to use these datasets in this way.
The model covariates are also mainly climate-oriented. A possible avenue for further work could therefore be to use the models to assess the potential change in distribution following a shift in climate parameters.

Additional Files
The additional files for this article can be found as follows: