This document gives a demonstration how to use the package to obtain a maximum-likelihood estimate of the protracted birth-death speciation model.
First thing is to load the PBD package itself:
rm(list = ls())
library(PBD)
We will also need ape for branching.times
:
library(ape)
Here we simulate a tree with known parameters:
seed <- 43
set.seed(seed)
b_1 <- 0.3 # speciation-initiation rate of good species
la_1 <- 0.2 # speciation-completion rate
b_2 <- b_1 # the speciation-initiation rate of incipient species
mu_1 <- 0.1 # extinction rate of good species
mu_2 <- mu_1 # extinction rate of incipient species
pars <- c(b_1, la_1, b_2, mu_1, mu_2)
age <- 15 # the age for the simulation
phylogenies <- pbd_sim(pars = pars, age = age)
plot(phylogenies$recontree)
plot(phylogenies$igtree.extant)
## no colors provided. using the following legend:
## g i
## "black" "red"
plot(phylogenies$tree)
names(phylogenies)
## [1] "tree" "stree_random" "stree_oldest" "stree_youngest"
## [5] "L" "sL_random" "sL_oldest" "sL_youngest"
## [9] "igtree.extinct" "igtree.extant" "recontree" "reconL"
## [13] "L0"
Now we try to recover the parameters by maximum likelihood estimation:
brts <- branching.times(phylogenies$recontree) # branching times
init_b <- 0.2 # speciation-initiation rate
init_mu_1 <- 0.05 # extinction rate of good species
init_la_1 <- 0.3 # speciation-completion rate
#init_mu_2 <- 0.05 # extinction rate of incipient species # not used
# The initial values of the parameters that must be optimized
initparsopt <- c(init_b, init_mu_1, init_la_1)
# The extinction rates between incipient and good species are equal
exteq <- TRUE
# The first element of the branching times is the crown age (and not the stem age)
soc <- 2
# Conditioning on non-extinction of the phylogeny
# as I actively selected for a nice phylogeny
cond <- 1
# Give the likelihood of the phylogeny (instead of the likelihood of the branching times)
btorph <- 1
Maximum likelihood estimation can now be performed:
r <- pbd_ML(
brts = brts,
initparsopt = initparsopt,
exteq = exteq,
soc = soc,
cond = cond,
btorph = btorph,
verbose = FALSE
)
The ML parameter estimates are:
knitr::kable(r)
b | mu_1 | lambda_1 | mu_2 | loglik | df | conv |
---|---|---|---|---|---|---|
0.2817166 | 0.1003351 | 0.20635 | 0.1003351 | -37.1221 | 3 | 0 |
Comparing the known true value with the recovered values:
loglik_true <- PBD::pbd_loglik(pars, brts = brts)
## Parameters: 0.3, 0.2, 0.3, 0.1, 0.1, Loglikelihood: -37.324677
df <- as.data.frame(r)
df <- rbind(df, c(b_1, mu_1, la_1, mu_2, loglik_true, NA, NA))
row.names(df) <- c("ML", "true")
knitr::kable(df)
b | mu_1 | lambda_1 | mu_2 | loglik | df | conv | |
---|---|---|---|---|---|---|---|
ML | 0.2817166 | 0.1003351 | 0.20635 | 0.1003351 | -37.12210 | 3 | 0 |
true | 0.3000000 | 0.1000000 | 0.20000 | 0.1000000 | -37.32468 | NA | NA |
Ideally, all parameter columns should have the same values.
To test for the uncertainty of our ML estimate, we can do a parametric bootstrap.
The function pbd_bootstrap
consists of a few steps:
endmc <- 10 # Sets the number of simulations for the bootstrap
b <- pbd_bootstrap(
brts = brts,
initparsopt = initparsopt,
exteq = exteq,
soc = soc,
cond = cond,
btorph = btorph,
plotltt = FALSE,
endmc = endmc,
seed = seed
)
## Finding the maximum likelihood estimates ...
##
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -37.25841
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.281715, mu_1: 0.100335, lambda_1: 0.206352, mu_2: 0.100335
## Maximum loglikelihood: -37.122103
## The expected duration of speciation for these parameters is: 2.807343
## The median duration of speciation for these parameters is: 2.206200
## Bootstrapping ...
##
## Simulated data set 1 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -22.03909
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.375753, mu_1: 0.311977, lambda_1: 0.230678, mu_2: 0.311977
## Maximum loglikelihood: -21.596081
## The expected duration of speciation for these parameters is: 2.030928
## The median duration of speciation for these parameters is: 1.543204
## Simulated data set 2 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -57.99061
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.312123, mu_1: 0.081059, lambda_1: 0.190419, mu_2: 0.081059
## Maximum loglikelihood: -57.720450
## The expected duration of speciation for these parameters is: 2.943368
## The median duration of speciation for these parameters is: 2.365253
## Simulated data set 3 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -115.261
## Optimizing the likelihood - this may take a while.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
## Parameter values have been used that cause numerical problems.
##
## Maximum likelihood parameter estimates: b: 751.298215, mu_1: 751.000348, lambda_1: 0.004574, mu_2: 751.000348
## Maximum loglikelihood: -111.629201
## The expected duration of speciation for these parameters is: 0.386462
## The median duration of speciation for these parameters is: 0.310353
## Simulated data set 4 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -107.9402
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.271860, mu_1: 0.000000, lambda_1: 0.126555, mu_2: 0.000000
## Maximum loglikelihood: -106.870092
## The expected duration of speciation for these parameters is: 4.218409
## The median duration of speciation for these parameters is: 3.570808
## Simulated data set 5 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -47.84195
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.709877, mu_1: 0.574456, lambda_1: 0.225125, mu_2: 0.574456
## Maximum loglikelihood: -46.601652
## The expected duration of speciation for these parameters is: 1.643556
## The median duration of speciation for these parameters is: 1.279479
## Simulated data set 6 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -187.144
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 13.979806, mu_1: 13.679004, lambda_1: 0.020678, mu_2: 13.679004
## Maximum loglikelihood: -183.432908
## The expected duration of speciation for these parameters is: 1.413933
## The median duration of speciation for these parameters is: 1.171087
## Simulated data set 7 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -41.38109
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.492110, mu_1: 0.309936, lambda_1: 0.059444, mu_2: 0.309936
## Maximum loglikelihood: -40.448749
## The expected duration of speciation for these parameters is: 4.546741
## The median duration of speciation for these parameters is: 3.828187
## Simulated data set 8 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -87.55681
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.300024, mu_1: 0.000001, lambda_1: 0.153513, mu_2: 0.000001
## Maximum loglikelihood: -86.151794
## The expected duration of speciation for these parameters is: 3.610683
## The median duration of speciation for these parameters is: 3.031341
## Simulated data set 9 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -100.522
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.209810, mu_1: 0.000000, lambda_1: 0.393479, mu_2: 0.000000
## Maximum loglikelihood: -99.916686
## The expected duration of speciation for these parameters is: 2.036930
## The median duration of speciation for these parameters is: 1.540704
## Simulated data set 10 out of 10
## You are optimizing b mu_1 lambda_1
## You are fixing nothing
## Extinction rate of incipient species is exactly the same as for good species.
## The likelihood for the initial parameter values is -61.11511
## Optimizing the likelihood - this may take a while.
##
## Maximum likelihood parameter estimates: b: 0.173103, mu_1: 0.000000, lambda_1: 0.890545, mu_2: 0.000000
## Maximum loglikelihood: -60.368468
## The expected duration of speciation for these parameters is: 1.026129
## The median duration of speciation for these parameters is: 0.738873
knitr::kable(b[[3]])
ntips | b | mu_1 | lambda_1 | mu_2 | loglik | df | conv | exp_durspec | median_durspec |
---|---|---|---|---|---|---|---|---|---|
9 | 0.3757529 | 0.3119773 | 0.2306781 | 0.3119773 | -21.59608 | 3 | 0 | 2.0309277 | 1.5432035 |
23 | 0.3121229 | 0.0810590 | 0.1904194 | 0.0810590 | -57.72045 | 3 | 0 | 2.9433681 | 2.3652530 |
40 | 751.2982150 | 751.0003476 | 0.0045737 | 751.0003476 | -111.62920 | 3 | 0 | 0.3864617 | 0.3103533 |
41 | 0.2718601 | 0.0000000 | 0.1265550 | 0.0000000 | -106.87009 | 3 | 0 | 4.2184093 | 3.5708078 |
19 | 0.7098773 | 0.5744562 | 0.2251248 | 0.5744562 | -46.60165 | 3 | 0 | 1.6435563 | 1.2794793 |
69 | 13.9798056 | 13.6790037 | 0.0206780 | 13.6790037 | -183.43291 | 3 | 0 | 1.4139334 | 1.1710870 |
15 | 0.4921098 | 0.3099361 | 0.0594435 | 0.3099361 | -40.44875 | 3 | 0 | 4.5467409 | 3.8281874 |
35 | 0.3000241 | 0.0000011 | 0.1535128 | 0.0000011 | -86.15179 | 3 | 0 | 3.6106829 | 3.0313409 |
38 | 0.2098101 | 0.0000000 | 0.3934793 | 0.0000000 | -99.91669 | 3 | 0 | 2.0369305 | 1.5407041 |
23 | 0.1731033 | 0.0000000 | 0.8905452 | 0.0000000 | -60.36847 | 3 | 0 | 1.0261295 | 0.7388728 |
From the bootstrap analysis, we get
Putting this in a table:
dg <- rbind(df,
list(
b = b[[1]]$b,
mu_1 = b[[1]]$mu_1,
lambda_1 = b[[1]]$lambda_1,
mu_2 = b[[1]]$mu_2,
loglik = b[[1]]$loglik,
df = b[[1]]$df,
conv = b[[1]]$conv
),
list(
b = b[[3]]$b,
mu_1 = b[[3]]$mu_1,
lambda_1 = b[[3]]$lambda_1,
mu_2 = b[[3]]$mu_2,
loglik = b[[3]]$loglik,
df = b[[3]]$df,
conv = b[[3]]$conv
)
)
dg
## b mu_1 lambda_1 mu_2 loglik df conv
## ML 0.2817166 1.003351e-01 0.206350020 1.003351e-01 -37.12210 3 0
## true 0.3000000 1.000000e-01 0.200000000 1.000000e-01 -37.32468 NA NA
## 3 0.2817148 1.003351e-01 0.206351748 1.003351e-01 -37.12210 3 0
## 4 0.3757529 3.119773e-01 0.230678073 3.119773e-01 -21.59608 3 0
## 5 0.3121229 8.105904e-02 0.190419381 8.105904e-02 -57.72045 3 0
## 6 751.2982150 7.510003e+02 0.004573677 7.510003e+02 -111.62920 3 0
## 7 0.2718601 3.331886e-08 0.126555041 3.331886e-08 -106.87009 3 0
## 8 0.7098773 5.744562e-01 0.225124828 5.744562e-01 -46.60165 3 0
## 9 13.9798056 1.367900e+01 0.020678046 1.367900e+01 -183.43291 3 0
## 10 0.4921098 3.099361e-01 0.059443526 3.099361e-01 -40.44875 3 0
## 11 0.3000241 1.094309e-06 0.153512850 1.094309e-06 -86.15179 3 0
## 12 0.2098101 4.833861e-08 0.393479254 4.833861e-08 -99.91669 3 0
## 13 0.1731033 9.079251e-10 0.890545200 9.079251e-10 -60.36847 3 0
row.names(dg) <- c("ML", "true", "ML2", paste("BS", 1:endmc, sep = ""))
knitr::kable(dg)
b | mu_1 | lambda_1 | mu_2 | loglik | df | conv | |
---|---|---|---|---|---|---|---|
ML | 0.2817166 | 0.1003351 | 0.2063500 | 0.1003351 | -37.12210 | 3 | 0 |
true | 0.3000000 | 0.1000000 | 0.2000000 | 0.1000000 | -37.32468 | NA | NA |
ML2 | 0.2817148 | 0.1003351 | 0.2063517 | 0.1003351 | -37.12210 | 3 | 0 |
BS1 | 0.3757529 | 0.3119773 | 0.2306781 | 0.3119773 | -21.59608 | 3 | 0 |
BS2 | 0.3121229 | 0.0810590 | 0.1904194 | 0.0810590 | -57.72045 | 3 | 0 |
BS3 | 751.2982150 | 751.0003476 | 0.0045737 | 751.0003476 | -111.62920 | 3 | 0 |
BS4 | 0.2718601 | 0.0000000 | 0.1265550 | 0.0000000 | -106.87009 | 3 | 0 |
BS5 | 0.7098773 | 0.5744562 | 0.2251248 | 0.5744562 | -46.60165 | 3 | 0 |
BS6 | 13.9798056 | 13.6790037 | 0.0206780 | 13.6790037 | -183.43291 | 3 | 0 |
BS7 | 0.4921098 | 0.3099361 | 0.0594435 | 0.3099361 | -40.44875 | 3 | 0 |
BS8 | 0.3000241 | 0.0000011 | 0.1535128 | 0.0000011 | -86.15179 | 3 | 0 |
BS9 | 0.2098101 | 0.0000000 | 0.3934793 | 0.0000000 | -99.91669 | 3 | 0 |
BS10 | 0.1731033 | 0.0000000 | 0.8905452 | 0.0000000 | -60.36847 | 3 | 0 |
We expect rows ML and ML2 to be identical. Their values are indeed very similar.
We can calculate the loglikelihood for
ml_b <- b[[1]]$b
ml_mu_1 <- b[[1]]$mu_1
ml_la_1 <- b[[1]]$lambda_1
ml_mu_2 <- b[[1]]$mu_2
ml_pars1 <- c(ml_b, ml_mu_1, ml_la_1, ml_mu_2)
ml_pars2 <- c(cond, btorph, soc, 0, "lsoda")
l <- pbd_loglik(
pars1 = ml_pars1,
pars2 = ml_pars2,
brts = brts
)
print(l)
## [1] -37.1221
# Create .md, .html, and .pdf files
setwd(paste(getwd(), "vignettes", sep = "/"))
knit("PBD_ML_demo.Rmd")
markdown::markdownToHTML('PBD_ML_demo.md', 'PBD_ML_demo.html', options=c("use_xhml"))
system("pandoc -s PBD_ML_demo.html -o PBD_ML_demo.pdf")