Cory Merow, Matthew J. Smith, Thomas C. Edwards, Antoine Guisan, Sean McMahon, Signe Normand, Wilfried Thuiller, Rafael O. Wuest, Niklaus E. Zimmermann, and Jane Elith
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species’ occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g., generalized linear/additive models, tree-based models, and maximum entropy), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building ‘underfit’ models, which have insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building ‘overfit’ models, which have excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing underfitting against overfitting and, consequently, how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences of opinion between those who favor simpler models and those who favor more complex ones. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
Greg McInerny and Drew Purves. Fine-scale environmental variation in species distribution modelling: regression dilution, latent variables and neighbourly advice, Methods in Ecology and Evolution, British Ecological Society, 25 January 2011.
Greg J. McInerny and Rampal S. Etienne. Ditch the niche – is the niche a useful concept in ecology or species distribution modelling?, Journal of Biogeography, 2012.
Daniel Montoya, Drew W Purves, Itziar Rodriguez, and Miguel A Zavala. Do species distribution models explain spatial structure within tree species ranges?, Global Ecology and Biogeography, August 2009.
Greg J. McInerny and Rampal S. Etienne. Pitch the niche – taking responsibility for the concepts we use in ecology and species distribution modelling, Journal of Biogeography, 2012.
Glenn Marion, Greg J. McInerny, Jörn Pagel, Stephen Catterall, Alex R. Cook, Florian Hartig, and Robert B. O'Hara. Parameter and uncertainty estimation for process-oriented population and distribution models: data, statistics and the niche, Journal of Biogeography, 2012.
Mindy M. Syfert, Matthew J. Smith, and David A. Coomes. The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models, PLOS One, PLoS, February 2013.
Mindy M Syfert, Lucas N Joppa, Matthew J Smith, David A Coomes, Steven P Bachman, and Neil A Brummitt. Using species distribution models to inform IUCN Red List assessments, Biological Conservation, Elsevier, June 2014.