vignettes/Ostats-multivariate.Rmd
Ostats-multivariate.Rmd
This vignette on multi-variable community-level trait overlap statistics is intended for users who are already familiar with the single-variable O-statistics we introduce in the first package vignette.
We decided to include functions in our package to estimate community-level niche overlap statistics (O-statistics) in multivariate space because of ecologists’ long-standing interest in how species’ niches take shape across multiple environmental dimensions. Since the niche of a species can be determined by several abiotic and biotic variables, Hutchinson (1957) introduced the mathematical formulation of a niche as an n-dimensional hypervolume. Since then, many approaches have been developed to measure the multivariate niche of a species (Green 1971; Swanson et al. 2015). Quantifying the niche in multiple dimensions can help understand how life strategy trade-offs and interactions among multiple traits shape organisms’ structure and function.
Blonder and colleagues rekindled interest in using hypervolumes to
describe species’ niches in multiple dimensions. They worked out the
underlying theory (Blonder et al. 2014; Lamanna et al. 2014; Blonder
2018) and released the R package hypervolume.
Their package uses stochastic geometry to estimate multivariate kernel
densities and . The multivariate O-statistic estimation in the
Ostats
package relies on functions imported from hypervolume.
In this vignette, we demonstrate how you can use functions in the
Ostats
package to study trait overlap in multiple
dimensions.
The dataset we use in this vignette includes traits collected on
carnivorous pitchers of the northern pitcher plant, Sarracenia
purpurea, at 5 sites in North America and represent a subset of
data from Freedman et al. (2021). Pitcher traits measured include
orthogonal rosette diameters, pitcher length, pitcher width, keel width,
mouth diameter, and lip thickness. See Ellison and Gotelli (2002) for a
diagram illustrating traits measured, or see the dataset documentation
page for more details by typing ?pitcher_traits
in your
console. Site locations and identifying information have been removed
for this example.
Here we show a few randomly selected rows from the pitcher traits
dataset. Here, all individuals are of the same species. Therefore we
will use populations (sites) as the sp
variable in our
analysis, with a single value for plots
.
set.seed(1)
idx <- sort(sample(nrow(pitcher_traits), 10, replace = FALSE))
knitr::kable(pitcher_traits[idx,])
site_id | plant_id | rosette_diameter_1 | rosette_diameter_2 | pitcher_length | pitcher_width | keel_width | mouth_diameter | lip_thickness | |
---|---|---|---|---|---|---|---|---|---|
1 | FLK | 1 | 16 | 14.0 | 15.4 | 42.40 | 13.25 | 29.31 | 4.48 |
14 | FLK | 14 | 13 | 3.0 | 9.7 | 30.63 | 9.77 | 20.05 | 3.05 |
60 | MYR | 10 | 20 | 10.0 | 15.0 | 44.88 | 12.75 | 26.67 | 4.04 |
65 | MYR | 15 | 29 | 16.0 | 16.2 | 48.28 | 19.74 | 22.09 | 3.13 |
69 | MYR | 19 | 5 | 9.0 | 17.9 | 50.55 | 17.62 | 24.50 | 3.27 |
203 | CRI | 3 | 32 | 17.0 | 21.5 | 32.79 | 11.07 | 21.49 | 2.39 |
211 | CRI | 11 | 11 | 9.5 | 17.5 | 37.42 | 16.48 | 20.73 | 1.87 |
220 | CRI | 20 | 52 | 31.0 | 19.0 | 38.00 | 13.78 | 25.13 | 2.57 |
282 | SAC | 10 | 11 | 8.5 | 15.5 | 42.05 | 17.60 | 21.55 | 2.90 |
287 | SAC | 15 | 23 | 23.0 | 13.0 | 34.77 | 12.75 | 22.15 | 1.11 |
Here we calculate the univariate O-statistics for each of the pitcher traits separately, then a single multivariate O-statistic representing overlap of the trait hypervolumes.
When estimating hypervolumes, as with any other multivariate analysis, it is critical to ensure that the hypervolume axes are not too highly correlated with one another. If traits are too highly correlated, the calculated hypervolume will not accurately represent the distribution in trait space. In this case, we resolve the issue by excluding a few of the traits with high correlations. You could also choose to use a variance maximization procedure like PCA instead.
First ensure that the data are in the appropriate form for
Ostats()
and Ostats_multivariate()
. For this
example we will remove individuals with incomplete data. Using
cor()
on the cleaned dataset, we can see the relatively
high correlations among some of the traits.
pitcher_traits <- pitcher_traits[complete.cases(pitcher_traits), ]
pitcher_sites <- as.factor(pitcher_traits$site_id)
round(cor(pitcher_traits[, -(1:2)]), 2)
#> rosette_diameter_1 rosette_diameter_2 pitcher_length
#> rosette_diameter_1 1.00 0.83 0.64
#> rosette_diameter_2 0.83 1.00 0.71
#> pitcher_length 0.64 0.71 1.00
#> pitcher_width 0.38 0.44 0.70
#> keel_width 0.46 0.51 0.69
#> mouth_diameter 0.38 0.35 0.43
#> lip_thickness -0.11 -0.19 -0.15
#> pitcher_width keel_width mouth_diameter lip_thickness
#> rosette_diameter_1 0.38 0.46 0.38 -0.11
#> rosette_diameter_2 0.44 0.51 0.35 -0.19
#> pitcher_length 0.70 0.69 0.43 -0.15
#> pitcher_width 1.00 0.82 0.49 0.12
#> keel_width 0.82 1.00 0.29 -0.03
#> mouth_diameter 0.49 0.29 1.00 0.47
#> lip_thickness 0.12 -0.03 0.47 1.00
We select four traits that represent dimensions of different parts of the plant’s anatomy. As a result they are relatively uncorrelated with one another: rosette diameter, pitcher width, mouth diameter, and lip thickness.
traits_to_use <- c("rosette_diameter_1", "pitcher_width", "mouth_diameter", "lip_thickness")
pitcher_trait_matrix <- as.matrix(pitcher_traits[, traits_to_use])
round(cor(pitcher_trait_matrix), 2)
#> rosette_diameter_1 pitcher_width mouth_diameter
#> rosette_diameter_1 1.00 0.38 0.38
#> pitcher_width 0.38 1.00 0.49
#> mouth_diameter 0.38 0.49 1.00
#> lip_thickness -0.11 0.12 0.47
#> lip_thickness
#> rosette_diameter_1 -0.11
#> pitcher_width 0.12
#> mouth_diameter 0.47
#> lip_thickness 1.00
Next we center and scale all the columns to ensure that the hypervolume O-statistic is not influenced by different magnitudes of each trait.
apply(pitcher_trait_matrix, 2, range)
#> rosette_diameter_1 pitcher_width mouth_diameter lip_thickness
#> [1,] 4 22.28 9.43 0.60
#> [2,] 65 83.01 36.66 6.44
pitcher_trait_matrix_scaled <- scale(pitcher_trait_matrix, center = TRUE, scale = TRUE)
round(apply(pitcher_trait_matrix_scaled, 2, range), 2)
#> rosette_diameter_1 pitcher_width mouth_diameter lip_thickness
#> [1,] -1.31 -1.96 -2.64 -1.81
#> [2,] 3.62 4.36 2.31 3.21
For demonstration purposes, let’s calculate the univariate O-statistics for each of the four traits so that we can compare them to the multivariate statistic.
pitcher_univariate <- Ostats(traits = pitcher_trait_matrix_scaled,
plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
sp = pitcher_sites,
run_null_model = FALSE
)
Now calculate the multivariate O-statistic for the traits combined.
We still specify the argument random_seed
even in the
absence of a null model, because the hypervolumes are constructed with a
stochastic algorithm.
pitcher_multivariate <- Ostats_multivariate(traits = pitcher_trait_matrix_scaled,
plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
sp = pitcher_sites,
random_seed = 333,
run_null_model = FALSE,
hypervolume_args = list(method = 'box'),
hypervolume_set_args = list(num.points.max = 1000)
)
Notice that Ostats_multivariate()
accepts additional
arguments to hypervolume::hypervolume()
, which it uses to
construct the hypervolumes, and
hypervolume::hypervolume_set()
, which it uses to estimate
the hypervolume overlaps. For example, the default method for
hypervolume construction is method = "gaussian"
, and the
default maximum number of points to use for hypervolume overlap
estimation is num.points.max = 10^(3+sqrt(n))
, where
n
is the number of dimensions. In this vignette, we
modified the defaults to use a different method ("box"
) for
hypervolume construction and fewer points (1000
) for
overlap estimation:
hypervolume_args = list(method = 'box'), hypervolume_set_args = list(num.points.max = 1000).
Those options speed processing time dramatically. See the help
documentation for hypervolume::hypervolume
and
hypervolume::hypervolume_set
for more details.
Compare the univariate and multivariate O-statistics.
pitcher_univariate$overlaps_norm
#> rosette_diameter_1 pitcher_width mouth_diameter lip_thickness
#> 1 0.528308 0.7419213 0.8198156 0.5733967
pitcher_multivariate$overlaps_norm
#> [,1]
#> 1 0.7784172
The single multivariate O-statistic is not comparable to the univariate statistics because of differing dimensionality. Therefore we need to evaluate them against null models.
If run_null_model
is set to TRUE
,
Ostats_multivariate()
will do nperm
iterations
of a null model, defaulting to 99
. Here we use a smaller
number for demonstration purposes.
First, for comparison purposes, run the null models for the univariate O-statistics for each of the four traits.
pitcher_univariate_withnull <- Ostats(traits = pitcher_trait_matrix_scaled,
plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
sp = pitcher_sites,
random_seed = 666,
run_null_model = TRUE,
nperm = 50
)
Next, do the same for the multivariate overlap.
pitcher_multivariate_withnull <- Ostats_multivariate(traits = pitcher_trait_matrix_scaled,
plots = factor(rep(1, nrow(pitcher_trait_matrix_scaled))),
sp = pitcher_sites,
random_seed = 555,
run_null_model = TRUE,
nperm = 50,
hypervolume_args = list(method = 'box'),
hypervolume_set_args = list(num.points.max = 1000)
)
Comparing the standardized effect sizes, normalized to remove the effect of varying sample sizes across sites, we can see that three of the four individual traits have O-statistics significantly lower than the null expectation (lower than the 2.5%ile of the null distribution), while the multivariate O-statistic is not significantly different from the null expectation:
O-statistic | rosette diameter | pitcher width | mouth diameter | lip thickness | multivariate |
---|---|---|---|---|---|
observed | 0.528 | 0.742 | 0.820 | 0.573 | 0.768 |
null 2.5% | 0.722 | 0.760 | 0.767 | 0.731 | 0.766 |
null 97.5% | 0.866 | 0.875 | 0.888 | 0.869 | 0.812 |
We can visualize the overlap of the different hypervolumes using the
function Ostats_multivariate_plot()
. This function returns
a list of plot objects, one for each community. Each plot displays an
upper triangular matrix of panels, one for each pair of traits, with the
cross-sections of the hypervolumes in those two dimensions, as well as
the raw data points.
For example, we can plot the overlap of the hypervolumes in
4-dimensional space formed by the individual traits of the pitcher
plants at each site. Note that although we calculated the hypervolume
overlaps on the scaled traits, we are plotting the hypervolumes using
the unscaled trait axes. However, this requires us to set the argument
contour_level = 0.0001
because the densities of the
unscaled hypervolumes are too low to plot using the default
contour_level = 0.01
.
Ostats_multivariate_plot(plots = factor(rep(1, nrow(pitcher_trait_matrix))),
sp = pitcher_sites,
traits = pitcher_trait_matrix,
contour_level = 0.0001,
overlap_dat = pitcher_multivariate
)
We can customize the plot by supplying a color palette to the
colorvalues
argument (the default is a colorblind-friendly
palette imported from viridis),
and add additional spacing around the hypervolume contours using the
axis_expansion
argument. The default value of this argument
is 0.01
, or 1% in all directions. Finally, we can suppress
plotting the raw data and only plot the contours using
plot_points = FALSE
.
Ostats_multivariate_plot(plots = factor(rep(1, nrow(pitcher_trait_matrix))),
sp = pitcher_sites,
traits = pitcher_trait_matrix,
contour_level = 0.0001,
overlap_dat = pitcher_multivariate,
colorvalues = rainbow(length(unique(pitcher_sites))),
axis_expansion = 0.05,
plot_points = FALSE
)
The code underlying Ostats_multivariate_plot()
was
adapted from the plot.Hypervolumelist()
function in the hypervolume
package.
When working with multivariate trait data, it is critical to examine correlations among traits in the dataset. If you construct hypervolumes with traits that are highly correlated and measure their overlap, you will get spurious results. In addition, it’s important to consider the issue of dimensionality. It is difficult if not impossible to estimate hypervolume overlap in more than five or six dimensions. A study that quantified trait space across a wide variety of systems found that using three to six dimensions jointly maximizes trait space quality and computational feasibility (Mouillot et al., in press). Others have critiqued using hypervolume kernel density estimation to quantify species niches (Qiao et al. 2017); the technical issues only get worse as more dimensions are added. Using a variance maximization technique like PCA could be helpful to reduce dimensionality. Regardless of what method you choose, the absolute amount of overlap can’t be compared among hypervolumes with different dimensionality, making it especially important to compare the O-statistics with null distributions.
Finally, some work has shown that trait-based inference derived from analyzing multiple traits may not improve our understanding of biological systems in many cases, relative to inference based on one well-chosen biologically meaningful trait, especially when many commonly-measured traits are different manifestations of the same few underlying biological processes (Swenson 2013, but see Kraft et al. 2015). In light of that, you should carefully consider which trait or traits are relevant to your research question. Whenever possible, compare inference based on single trait overlap to the inference you get when considering multiple traits, and select the most parsimonious model.