Wild boar Sus scrofa are an important component of the ecological and epidemiological systems within which vector-borne diseases persist. Wild boar are hosts to a number of vector species, and they can therefore impact on disease cycles as reservoirs of pathogens. Information on wild boar distribution and abundance could therefore make an important contribution to models of vector-borne disease risk.
With a single exception, the many studies that have focussed on the distribution, abundance and habitat use of wild boar were generally carried out in relatively small areas such as national parks, or at most at country level. Given the broader, continental scale required to advise European policy on disease management effectively, an attempt has been made to produce a continental-scale distribution and abundance map.
This study combines a review of the existing literature with abundance-related data from a range of sources, including national hunting organisations and international and national distribution databases, to provide a continental dataset and perspective of boar distribution and abundance.
To create the final European 1 km resolution boar map, the combined quantitative data described above were constrained using a habitat suitability mask derived from the GlobCover land cover database, informed by published descriptions of habitat preference as well as expert opinion. A number of spatial distribution modelling tools available from the VECMAP Modelling suite were used to produce three final modelled distribution outputs for Europe using the Random Forest approach. These comprise a 1 km probability of presence/absence layer, a 1 km abundance index based on presence and habitat availability, and a 1 km ranked abundance map based on regional abundance studies and national hunting figures.
Description: Continental Europe, including European Russia.
Northern boundary: 72.
Southern boundary: 10.
Eastern boundary: 60.
Western boundary: –24.5.
Sus scrofa, wild boar, pig (feral).
Binary presence and absence
Five independent sets of distribution data were combined to produce a single presence/absence mask. The data sets used were as follows:
- The EMMA Database: Mapping Europe’s mammals using data from the Atlas of European Mammals.
- The Global Biodiversity Information Facility (GBIF).
- IUCN Red List Dataset.
- The National Biodiversity Network UK 10k Data.
- Spanish Ministry of Agriculture National Inventory of Biodiversity.
For much of the indicated range, the distributions detailed above were, by their nature, indications of current presence limits; within these designated boundaries there was no indication of absence. To introduce absences within these limits, suitability masks were defined using species-specific habitat preferences derived from land cover classes, using GLOBCOVER at 1 km resolution, downloaded from the EDENext Data Portal. These suitability definitions are recorded in Table 1.
|GlobCover code|Land cover class|Suitability (1 = suitable, 0 = unsuitable)|
|---|---|---|
|11|Post-flooding or irrigated croplands (or aquatic)|0|
|20|Mosaic cropland (50–70%) / vegetation (grassland/shrubland/forest) (20–50%)|1|
|30|Mosaic vegetation (grassland/shrubland/forest) (50–70%) / cropland (20–50%)|1|
|40|Closed to open (>15%) broadleaved evergreen or semi-deciduous forest (>5m)|1|
|50|Closed (>40%) broadleaved deciduous forest (>5m)|1|
|60|Open (15–40%) broadleaved deciduous forest/woodland (>5m)|1|
|70|Closed (>40%) needleleaved evergreen forest (>5m)|1|
|90|Open (15–40%) needleleaved deciduous or evergreen forest (>5m)|1|
|100|Closed to open (>15%) mixed broadleaved and needleleaved forest (>5m)|1|
|110|Mosaic forest or shrubland (50–70%) / grassland (20–50%)|1|
|120|Mosaic grassland (50–70%) / forest or shrubland (20–50%)|1|
|130|Closed to open (>15%) (broadleaved or needleleaved, evergreen or deciduous) shrubland (<5m)|1|
|140|Closed to open (>15%) herbaceous vegetation (grassland, savannas or lichens/mosses)|0|
|150|Sparse (<15%) vegetation|0|
|160|Closed to open (>15%) broadleaved forest regularly flooded (semipermanently or temporarily)|1|
|170|Closed (>40%) broadleaved forest or shrubland permanently flooded – Saline or brackish water|0|
|180|Closed to open (>15%) grassland or woody vegetation on regularly flooded or waterlogged soil|1|
|190|Artificial surfaces and associated areas (Urban areas >50%)|0|
|220|Permanent snow and ice|0|
|230|No data (burnt areas, clouds, …)|0|
The presence/absence data described in the previous section were combined with the suitability layer and aggregated to a 10 km grid as the proportion of suitable habitat. These values were then sampled and offered to the Random Forest modelling framework within VECMAP, outlined later in this paper.
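The masking and aggregation step can be illustrated with a minimal Python sketch. It is not the VECMAP implementation: the function names are hypothetical, the grids are plain nested lists rather than GIS rasters, and it assumes that "combined" means a 1 km cell counts only when it is both inside the recorded presence range and in a land cover class scored 1 in Table 1.

```python
# GlobCover class codes scored 1 (suitable) in Table 1.
SUITABLE_CLASSES = {20, 30, 40, 50, 60, 70, 90, 100, 110, 120, 130, 160, 180}


def suitability_mask(landcover):
    """Map a 2-D grid of GlobCover codes to 1 (suitable) or 0 (unsuitable)."""
    return [[1 if code in SUITABLE_CLASSES else 0 for code in row]
            for row in landcover]


def aggregate_proportion(mask, presence, factor=10):
    """Aggregate 1 km cells into factor x factor blocks (10 km by default).

    Assumption: each output cell holds the proportion of its 1 km cells
    that are both within the presence range and suitable habitat.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    out = []
    for r0 in range(0, n_rows, factor):
        out_row = []
        for c0 in range(0, n_cols, factor):
            cells = [mask[r][c] * presence[r][c]
                     for r in range(r0, min(r0 + factor, n_rows))
                     for c in range(c0, min(c0 + factor, n_cols))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


# Toy example: a 10 x 20 km strip, deciduous forest (50) on the left,
# urban (190) on the right, with boar presence recorded everywhere.
landcover = [[50] * 10 + [190] * 10 for _ in range(10)]
presence = [[1] * 20 for _ in range(10)]
print(aggregate_proportion(suitability_mask(landcover), presence))  # [[1.0, 0.0]]
```

In this toy case the urban block receives a value of 0 even though it lies within the recorded range, which is how the mask introduces absences inside the presence limits.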
Boar Abundance Inputs
A comprehensive literature review of Sus scrofa abundance studies was undertaken [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33]. It unearthed a piecemeal collection of abundance data focused mainly on small areas such as national parks or, in some cases, up to country level. These data were recorded by different methods and across different time periods, and their spatial coverage across Europe was far from regular. A notable exception was a recent review of wild boar population trends in 18 countries in Europe, based on hunting statistics.
To complement these abundance data, hunting figures were also identified for a number of countries at both national level and sub-national level [34, 35, 36, 37, 38]. After discussion with boar specialists it was agreed that, at least within a single country, hunting data could be considered as a valid proxy for abundance. In order to get the most complete coverage across the continent, it was decided to convert the available data to relative abundance indices that could be compared across countries by normalising the available number according to known national abundance figures.
The data were thus categorised into quartiles (categories 1–4), with a fifth category of 0 for no or negligible boar numbers, where known or inferred in areas defined as unsuitable habitat. The resulting database provided categorical boar abundance ranging from 0 to 4 (0 = none/negligible boar abundance; 4 = high abundance).
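As an illustration of the ranking step, the following Python sketch assigns classes 0 to 4 from a set of relative abundance indices. The function name and the input layout are hypothetical, the indices are assumed to have been normalised already, and the quartile breaks are computed with the standard library's default method, which may differ from the one used in the study.

```python
from statistics import quantiles


def rank_abundance(values, unsuitable=None):
    """Convert relative abundance indices to ranked classes 0-4.

    values: dict mapping a region name to a non-negative abundance index.
    unsuitable: optional set of regions forced to class 0.
    Class 0 is none/negligible; classes 1-4 are quartiles of the rest.
    """
    unsuitable = unsuitable or set()
    positive = [v for region, v in values.items()
                if v > 0 and region not in unsuitable]
    q1, q2, q3 = quantiles(positive, n=4)  # three quartile break points

    classes = {}
    for region, v in values.items():
        if region in unsuitable or v <= 0:
            classes[region] = 0
        elif v <= q1:
            classes[region] = 1
        elif v <= q2:
            classes[region] = 2
        elif v <= q3:
            classes[region] = 3
        else:
            classes[region] = 4
    return classes


# Toy example with eight regions of increasing abundance and one empty region.
indices = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7, "h": 8, "z": 0}
print(rank_abundance(indices))
```

Because the break points are computed only from regions with non-zero abundance in suitable habitat, the zero class does not distort the quartile boundaries.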
Model predictor suite
A suite of spatial covariate layers of environmental data was used by the VECMAP model tools to define statistical relationships with the variable to be modelled. This predictor suite included a wide range of remotely sensed variables, as follows:
- Remotely sensed climatic indicators derived by Temporal Fourier Analysis (TFA) of MODIS satellite imagery of several temperature parameters, and vegetation indices for the period 2001–2008.
- Digital Elevation from the Shuttle Radar Topography Mission, together with derived aspect and ruggedness.
- Temporal Fourier Analysis (TFA) of precipitation, and allied Bioclimatic Indicator (Bioclim) precipitation variables from the WORLDCLIM datasets.
- Length of Growing Period from the United Nations Food and Agriculture Organisation.
- Travel time to major towns from the Joint Research Centre at Ispra.
- Human population density derived from the Global Rural Urban Mapping project at CIESIN.
- A distance-weighted human population index layer representing the likelihood of human visits based on the population within 30 km.
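To show what a Temporal Fourier Analysis predictor summarises, the Python sketch below extracts the mean, amplitude and phase of the annual cycle from a regular time series. It is a simplified illustration and not the processing chain used for the MODIS or WORLDCLIM layers: the function name is hypothetical, the series is assumed to be gap-free and to span a whole number of years, and only the first (annual) harmonic is computed.

```python
import cmath
import math


def annual_harmonic(series, samples_per_year):
    """Return (mean, amplitude, phase) of the annual cycle in a series.

    Assumes evenly spaced samples covering a whole number of years.
    For x(t) = mean + A*cos(2*pi*t/samples_per_year + phi), this
    returns (mean, A, phi).
    """
    n = len(series)
    if samples_per_year < 3 or n % samples_per_year != 0:
        raise ValueError("series must span a whole number of years")
    k = n // samples_per_year  # number of annual cycles in the record
    mean = sum(series) / n
    # Discrete Fourier coefficient at the annual frequency.
    coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(series))
    amplitude = 2 * abs(coeff) / n
    phase = cmath.phase(coeff)
    return mean, amplitude, phase


# Synthetic monthly temperature series over 8 years:
# mean 10, annual amplitude 5, peak at the first sample of each year.
monthly = [10 + 5 * math.cos(2 * math.pi * t / 12) for t in range(96)]
mean, amplitude, phase = annual_harmonic(monthly, samples_per_year=12)
print(round(mean, 6), round(amplitude, 6))
```

Each returned quantity can be stored as its own raster layer, so that a multi-year image series is reduced to a handful of seasonality descriptors per pixel.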
Random Forest Spatial Modelling
Three measures of distribution/abundance were offered to the Random Forest module using R-project modules embedded within the VECMAP software. This flexible modelling framework can utilise either categorical or continuous input. In this case the three inputs and their corresponding outputs were: a presence/absence (Boolean) layer, which resulted in a probability surface output; the percentage of suitable habitat where presence is recorded, which resulted in a direct RF regression continuous output; and a classified boar abundance index, which resulted in an RF categorical model output.
Sample points were extracted for input into the three Random Forest models from a 10 km matrix defining each of the three input variables within known distributions. Overall, approximately 12,000 random points were used across Europe. The following VECMAP default sample parameters were used to define the Random Forest prediction for each of the models:
- Prediction forest size: 100.
- Prediction forest sample size: 90.
- Prediction forest node size: 7.
These models are a first attempt at quantifying the boar distribution at this scale, and there has been no ground truth validation of these maps so far. All the model outputs, however, satisfy standard accuracy metrics (R squared or Cohen’s kappa coefficient, where relevant), assuring statistical reliability. Model outputs have also been informally reviewed by project boar experts.
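For readers unfamiliar with the two accuracy metrics named above, the Python sketch below gives their textbook definitions. It is illustrative only: the function names are hypothetical, and it does not reproduce how VECMAP computes or reports these statistics.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(observed)
    agreement = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_freq = Counter(observed)
    pred_freq = Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    expected = sum(obs_freq[c] * pred_freq[c] for c in obs_freq) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when only one class occurs")
    return (agreement - expected) / (1 - expected)


def r_squared(observed, predicted):
    """Coefficient of determination for a continuous prediction."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R squared is undefined for constant observations")
    return 1 - ss_res / ss_tot


# Categorical output (e.g. abundance classes) versus observed classes.
print(cohens_kappa([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
# Continuous output (e.g. proportion of suitable habitat) versus observations.
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
```

Kappa suits the presence/absence and ranked abundance models, where outputs are classes, while R squared suits the continuous regression output.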
There were no constraints involved in data production.
4. Dataset description
primary data, processed data, interpretation of data.
Format names and versions
TIF, JPEG, JPEG2000, XML.
As per author list.
All three layers have been provided as a quick look map in JPEG format to view from any image viewer.
The data themselves are distributed as GIS raster data in two formats: GeoTIFF, a standard GIS raster format, and GeoJP2 (JPEG 2000), a non-proprietary format. GeoTIFF and GeoJP2 files can be read directly by most GIS software and some other software packages. These formats are compatible with proprietary software (ESRI ArcGIS) and open source software (Quantum GIS (QGIS) or the R-project raster package). If the user has no suitable software already installed, the authors suggest downloading the open source QGIS software free of charge from http://www.qgis.org to view these data.
Retrieved 12:12, Aug 05, 2015 (GMT).
5. Reuse potential
Wild boar is a large mammal whose numbers and distribution are increasing in mainland Europe. The species’ potential impact on the environment, human activities and farming practices ensures that the model outputs will be of interest to ecologists, human and animal health authorities and policy makers in a number of fields beyond the epidemiological goal of this study.
The authors declare that they have no competing interests.