IAP-25-051
Point process methods for mapping wildlife distributions from biased and incomplete data
Wildlife distributions are influenced by environmental factors and species interactions. Accurate distribution maps that capture this variation across space and time are crucial for ecology, conservation, biodiversity monitoring and the management of human–wildlife conflicts. Typically, these maps are derived from multiple data sources, including small-scale structured surveys, sparse historical records, large-scale but potentially biased citizen science contributions, limited GPS tracking, and low-resolution satellite observations. Because each of these sources is incomplete and biased, there is a pressing need for statistical methods that can integrate multiple data types and correct for their limitations.
Previous research has demonstrated that combining data types can correct for sampling bias (Fithian et al., 2015). Furthermore, reviews in the field highlight the need for models that capture complex spatial and temporal dynamics (Martínez-Minaya et al., 2018) while balancing complexity with ecological interpretability (Laxton et al., 2023). Building on these ideas, this PhD project will develop point process methods to integrate multiple biased and incomplete data sources, model detection processes and produce reliable wildlife distribution maps with quantified uncertainty.
Methodology
Spatial point pattern models provide a robust statistical approach to integrating multiple data sources, accounting for biases in the data and estimating the true distribution map of wildlife species. In this approach, the distribution map is represented by the intensity function, which describes the expected density of individuals at a given space-time location. The modelling framework assumes that wildlife occurrences are governed by an underlying point process with an unknown intensity function. In addition, for each data source there is an unknown detection probability function that describes how likely an individual is to be recorded at a given location and time. Instead of observing the actual occurrence process, each data source therefore provides an observation of a thinned version of the process, in which the detection probability represents sampling bias, imperfect detection and data incompleteness.
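For concreteness, the thinned-process model described above can be written as follows. This is a standard formulation offered as an illustration, not necessarily the exact model the project will adopt. It assumes a Poisson occurrence process (or a Cox process, with the likelihood taken conditionally on the intensity). Here $W$ is the study region, $\mathcal{T}$ the study period, $(s_{ki}, t_{ki})$ the $n_k$ records from source $k$, $\mathbf{x}$ and $\mathbf{z}_k$ hypothetical occurrence and detection covariates, $g$ a link function (e.g. logit), and $\xi$ a Gaussian random field. Including $\xi$ gives a log-Gaussian Cox process, the class commonly fitted with INLA. Summing over sources assumes they are conditionally independent given $\lambda$, a common working assumption in integrated species distribution models.

```latex
\begin{aligned}
\lambda_k(s,t) &= p_k(s,t)\,\lambda(s,t), \qquad 0 \le p_k(s,t) \le 1, \\
\ell_k &= \sum_{i=1}^{n_k} \log\{p_k(s_{ki},t_{ki})\,\lambda(s_{ki},t_{ki})\}
          - \int_{\mathcal{T}} \int_{W} p_k(s,t)\,\lambda(s,t)\,\mathrm{d}s\,\mathrm{d}t, \\
\ell &= \sum_{k=1}^{K} \ell_k, \\
\log \lambda(s,t) &= \beta_0 + \mathbf{x}(s,t)^\top \boldsymbol{\beta} + \xi(s,t), \qquad
g\{p_k(s,t)\} = \alpha_{k0} + \mathbf{z}_k(s,t)^\top \boldsymbol{\alpha}_k .
\end{aligned}
```

Because only the product $p_k \lambda$ enters $\ell_k$, an overall scale can be traded between detection and intensity. In practice this is usually resolved by including at least one source with known or standardised effort, or covariates that affect detection but not occurrence.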
This PhD project aims to develop robust and reliable statistical methods (e.g., based on the Integrated Nested Laplace Approximation, INLA) that combine multiple biased data sources to estimate the distribution map of wildlife occurrences and provide uncertainty measures for the estimated map.
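The following is a minimal sketch in Python (NumPy) of why combining sources helps. It rests on strong simplifying assumptions:

- The study region is discretised into grid cells, so cell counts are Poisson with mean equal to cell area × intensity × detection probability.
- Occurrence and detection are log-linear in hypothetical covariates `x` (habitat) and `z` (e.g. rescaled distance from roads), with no spatial or temporal random effects.
- One hypothetical source is a structured survey with complete detection in a random 10% of cells. The other is an opportunistic source whose detection probability is unknown.
- The two sources are treated as conditionally independent given the intensity.

Parameters are estimated by maximum likelihood with Newton-Raphson, a stand-in for the INLA-based Bayesian fitting mentioned above. The helper `fit_poisson_glm` and all parameter values are invented for this simulation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical study region: a 100 x 100 grid of unit-area cells, flattened.
n_cells = 100 * 100
cell_area = 1.0

# Hypothetical covariates (not project data):
#   x - habitat covariate that drives true abundance;
#   z - value in (0, 1), e.g. rescaled distance from roads, assumed to affect
#       detection only; it is correlated with x, so ignoring it biases x's effect.
x = rng.normal(size=n_cells)
z = 1.0 / (1.0 + np.exp(-(x + rng.normal(size=n_cells))))

beta_true = np.array([0.5, 0.8])     # log intensity = beta0 + beta1 * x
alpha_true = np.array([-0.5, -3.0])  # log p2 = alpha0 + alpha1 * z, so 0 < p2 < 1

# Latent occurrence process: number of individuals in each cell.
lam = np.exp(beta_true[0] + beta_true[1] * x)
n_true = rng.poisson(lam * cell_area)

# Source 1 (hypothetical structured survey): 10% of cells, perfect detection.
surveyed = rng.random(n_cells) < 0.10
y1 = n_true[surveyed]

# Source 2 (hypothetical opportunistic records): all cells, but each individual
# is reported independently with probability p2 (binomial thinning).
p2 = np.exp(alpha_true[0] + alpha_true[1] * z)
y2 = rng.binomial(n_true, p2)


def fit_poisson_glm(X, y, offset, n_iter=50, tol=1e-10):
    """Maximum likelihood for a log-link Poisson regression via Newton-Raphson.

    Returns the estimates and standard errors from the inverse Fisher information.
    """
    # Starting values: least-squares fit to log(y + 0.5), which avoids log(0).
    theta = np.linalg.lstsq(X, np.log(y + 0.5) - offset, rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ theta + offset)
        info = X.T @ (mu[:, None] * X)  # Fisher information (= negative Hessian)
        step = np.linalg.solve(info, X.T @ (y - mu))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ theta + offset)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))
    return theta, np.sqrt(np.diag(cov))


# Joint model with theta = (beta0, beta1, alpha0, alpha1):
#   source 1 rows: log E[y1] = log(area) + beta0 + beta1 * x
#   source 2 rows: log E[y2] = log(area) + beta0 + beta1 * x + alpha0 + alpha1 * z
x1 = x[surveyed]
X1 = np.column_stack([np.ones_like(x1), x1, np.zeros_like(x1), np.zeros_like(x1)])
X2 = np.column_stack([np.ones(n_cells), x, np.ones(n_cells), z])
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2]).astype(float)
theta_hat, se = fit_poisson_glm(X, y, np.full(len(y), np.log(cell_area)))

# Naive comparison: source 2 alone, ignoring detection (alpha0 is absorbed into
# the intercept, and the detection gradient leaks into the habitat slope).
X_naive = np.column_stack([np.ones(n_cells), x])
naive_hat, _ = fit_poisson_glm(X_naive, y2.astype(float),
                               np.full(n_cells, np.log(cell_area)))

truth = np.concatenate([beta_true, alpha_true])
for name, est, s, true in zip(["beta0", "beta1", "alpha0", "alpha1"], theta_hat, se, truth):
    print(f"{name}: estimate {est:.2f} (SE {s:.2f}), true {true:.2f}")
print(f"naive beta1 from source 2 only: {naive_hat[1]:.2f}")
```

Fitting the opportunistic source on its own cannot separate habitat effects from reporting effects. Its habitat slope therefore absorbs part of the detection gradient. The joint fit uses the survey's known effort to pin down the intensity and attributes the remainder to detection. The standard errors come from the inverse Fisher information. Because the two sources share individuals in surveyed cells, these standard errors may be somewhat optimistic.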
The analysis will be conducted primarily on butterfly species data collected across the UK over multiple years (e.g., from the Butterflies for the New Millennium and UK Butterfly Monitoring Scheme (UKBMS) recording schemes). To evaluate the impact of climate change and land use change, climate data from the UK Climate Projections 2018 (UKCP18) project and land use data from the Spatially explicit Projections of EnvironmEntal Drivers (SPEED) project will be used, as well as data from other UKCEH sources.
Project Timeline
Year 1
• Data preparation: standardise and clean data, perform georeferencing and extract relevant environmental covariates (land cover, elevation, climate, vegetation, roads, population density).
• Start assessing potential biases in the species distribution data.
• Develop initial space-time models.
• Become familiar with Bayesian inference approaches such as INLA.
• Explore integrated statistical models for combining multiple data sources.
Year 2
• Develop hierarchical/joint models in which detection probabilities depend on human activity and environmental information, to correct for uneven sampling effort and reporting bias while balancing complexity and interpretability.
• Explore and apply Bayesian and likelihood-based approaches (e.g., INLA, MCMC or variational methods) for estimation and uncertainty quantification, and assess these estimation methods via simulation studies.
• Prepare research papers and reports for publication.
• Submit research papers to scientific journals.
• Address feedback and revisions from peer reviews.
• Present findings at conferences and workshops.
Year 3
• Develop statistical models to combine multiple data sources while addressing the bias of each source.
• Develop methods to evaluate the accuracy and predictive abilities of the proposed models.
• Produce high-resolution spatio-temporal maps with quantified uncertainty.
• Prepare research papers and reports for publication.
• Submit research papers to scientific journals.
• Address feedback and revisions from peer reviews.
• Present findings at conferences and workshops.
• Finalize research findings and conclusions.
Year 3.5
• Compare the results of the proposed methods with those of existing methods.
• Create reproducible code and software in R or Python for use in ecology and conservation applications.
• Write recommendations for the management and conservation of the populations of interest.
• Submit research papers to scientific journals.
• Address feedback and revisions from peer reviews.
• Produce a final PhD thesis summarizing all findings.
• Reflect on the project’s impact and contributions to the field.
Training & Skills
The PhD student will receive training in statistical analysis and modelling, with a focus on spatio-temporal modelling and inference tools such as INLA for analysing ecological data.
The student will also receive training in effective coding practices such as version control and large-scale computing using clusters. This training will help the PhD student to enhance statistical, modelling and ecological skills. In addition, collaboration skills will be enhanced by working with experts from various fields, including ecology, statistics, and climate science, fostering interdisciplinary teamwork. The PhD student will gain expertise in preparing research papers and presentations, enabling effective communication of findings to both scientific and non-scientific audiences. The PhD program will encourage critical thinking and problem-solving skills to address challenges and uncertainties that may arise during the research process.
References & further reading
• Baddeley, A., Rubak, E., & Turner, R. (2015). Spatial point patterns: methodology and applications with R. CRC Press.
• Backstrom, L. J., Callaghan, C. T., Worthington, H., Fuller, R. A., & Johnston, A. (2025). Estimating sampling biases in citizen science datasets. International Journal of Avian Science, 167(1), 73-87.
• Fithian, W., Elith, J., Hastie, T., & Keith, D. A. (2015). Bias correction in species distribution models: pooling survey and collection data for multiple species. Methods in Ecology and Evolution, 6(4), 424-438.
• Illian, J., Penttinen, A., Stoyan, H., & Stoyan, D. (2008). Statistical analysis and modelling of spatial point patterns. John Wiley & Sons.
• Laxton, M. R., Rodríguez de Rivera, Ó., Soriano-Redondo, A., & Illian, J. B. (2023). Balancing structural complexity with ecological insight in spatio-temporal species distribution models. Methods in Ecology and Evolution, 14(1), 162-172.
• Martínez-Minaya, J., Cameletti, M., Conesa, D., & Pennino, M. G. (2018). Species distribution modeling: a statistical review with focus in spatio-temporal issues. Stochastic Environmental Research and Risk Assessment, 32(11), 3227-3244.
• Møller, J., & Waagepetersen, R. P. (2003). Statistical inference and simulation for spatial point processes. CRC Press.
• Paradinas, I., Illian, J., & Smout, S. (2023). Understanding spatial effects in species distribution models. PLOS ONE, 18(5), e0285463.
• Tang, B., Clark, J. S., & Gelfand, A. E. (2021). Modeling spatially biased citizen science effort through the eBird database. Environmental and Ecological Statistics, 28(3), 609-630.
