Partitioning environment and space in species-by-site matrices: a comparison of methods for community ecology and macroecology

2019 
Community ecologists and macroecologists have long sought to evaluate the importance of environmental conditions in determining species composition across sites (hereafter, species-environment relationship; SER). Different methods have been used to estimate SERs, but how they differ and how reliable each one is remain poorly known. We compared the performance of four families of statistical methods in estimating the contribution of the environment to variation in the occurrence and abundance of co-occurring species while accounting for spatial correlation. These methods included distance-based regression (MRM), constrained ordination (RDA and CCA), generalised linear, mixed, and additive models (GLM, GLMM, GAM), and tree-based machine learning (regression trees, boosted regression trees, and random forests). We first used a simple process-based simulation model of community assembly to generate data with a known strength of (i) niche processes driven by environmental conditions and (ii) spatial processes driven by environmental autocorrelation and dispersal limitation. We then applied the different methods to infer the spatially explicit SER and compared their performance in partitioning the environmental and spatial fractions of variation. We found that the machine learning methods, namely boosted regression trees and random forests, most accurately reproduced the true simulated trends for both occurrence and abundance data. GAM was also reliable, although likelihood optimisation did not converge for small sample sizes. GAM is a good option when a priori hypotheses about the functional form of individual species-environment relationships are available. The remaining methods performed worse under virtually all simulated conditions. Our results suggest that tree-based machine learning is a robust and user-friendly approach that can be widely used for partitioning explained variation in species-by-site matrices.
The appropriate use of methods to estimate SERs and assess the importance of drivers of community assembly and species distributions across studies, spatial scales, and disciplines will contribute towards synthesis in community ecology and biogeography.
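The idea of partitioning explained variation into environmental and spatial fractions can be illustrated with the classic three-fit subtraction scheme: fit the response on environment only, on space only, and on both, then derive the pure environmental fraction [a], the shared fraction [b], and the pure spatial fraction [c] from the three R² values. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it uses ordinary least squares rather than any of the compared methods, a single hypothetical species with a Gaussian niche, a hypothetical spatially structured environment, and trend-surface coordinates as the spatial predictors. The names `r_squared` and `partition` and all simulation parameters are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (an intercept is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def partition(env, space, y):
    """Split explained variation into pure environment [a], shared [b], pure space [c]."""
    r_env = r_squared(env, y)
    r_space = r_squared(space, y)
    r_all = r_squared(np.column_stack([env, space]), y)
    return {
        "env": r_all - r_space,              # [a]: explained only by environment
        "shared": r_env + r_space - r_all,   # [b]: can be negative in practice
        "space": r_all - r_env,              # [c]: explained only by space
        "unexplained": 1.0 - r_all,          # residual variation
    }


rng = np.random.default_rng(42)
n_sites = 200

# Hypothetical site coordinates in a unit square.
coords = rng.uniform(0.0, 1.0, size=(n_sites, 2))

# Hypothetical spatially structured environment: a gradient along x plus noise.
env = coords[:, 0] + 0.2 * rng.normal(size=n_sites)

# Hypothetical Gaussian niche: abundance peaks where env = 0.5 (width 0.15).
abundance = np.exp(-((env - 0.5) ** 2) / (2 * 0.15 ** 2)) + 0.1 * rng.normal(size=n_sites)

# Quadratic environmental terms let a linear model capture the unimodal response.
env_terms = np.column_stack([env, env ** 2])

fractions = partition(env_terms, coords, abundance)
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

Because the three OLS models are nested, [a] and [c] cannot be negative and the four fractions always sum to one. Swapping in another model class, such as a random forest, would in principle only change how each R² is computed, but that substitution is an assumption here, and with non-nested fits the pure fractions are no longer guaranteed to be non-negative.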