Introduction to rWCVP

The World Checklist of Vascular Plants (WCVP) is a global consensus view of vascular plant species. It provides names, synonymy, taxonomy, and distributions for the > 340,000 vascular plant species known to science.

We have developed rWCVP to make some common tasks that use the WCVP easier. These include standardising a list of taxon names against the WCVP, getting and mapping the distribution of a species, and creating a checklist of taxa found in a particular region.

Accessing the WCVP

For the functions in rWCVP to work, you need to have access to a copy of the WCVP.

One way to load the WCVP is to install the associated data package, rWCVPdata:

if (!require(rWCVPdata)) {
  install.packages("rWCVPdata",
    repos = c(
      "https://matildabrown.github.io/drat",
      "https://cloud.r-project.org"
    )
  )
}
# taxonomy data
names <- rWCVPdata::wcvp_names

# distribution data
distributions <- rWCVPdata::wcvp_distributions

If the data package isn’t available, or you’d prefer to use a different version of the WCVP, you can provide a local copy of the data to the main functions in this package. For instance, to generate a checklist:

names <- read_csv("/path/to/wcvp_names.csv")
distributions <- read_csv("/path/to/wcvp_distributions.csv")

checklist <- wcvp_checklist("Acacia",
  taxon_rank = "genus", area_codes = "CPP",
  wcvp_names = names, wcvp_distributions = distributions
)

Be careful if you’re using your own WCVP version! The structure of the WCVP tables sometimes changes between versions. rWCVP should be set up to work with the latest version of WCVP, and any previous versions that share the same structure.

Filtering the WCVP

Some rWCVP functions involve filtering the WCVP to generate lists or summaries of vascular plant species in particular areas. These functions accept two arguments for filtering the WCVP:

These arguments can be combined in the wcvp_checklist, wcvp_occ_mat, and wcvp_summary to generate outputs for focal taxa in a desired area. For example, Myrtaceae species in Brazil:

checklist <- wcvp_checklist("Myrtaceae",
  taxon_rank = "family",
  area_codes = c("BZC", "BZN", "BZS", "BZE", "BZL")
)

A note on filtering by taxon

When filtering by taxon, you need to tell the function what taxonomic rank the name you’re providing is using the taxon.rank argument.

For example, making a summary table for the genus Poa:

wcvp_summary("Poa", taxon_rank = "genus")
#> $Taxon
#> [1] "Poa"
#> 
#> $Area
#> [1] "the world"
#> 
#> $Grouping_variable
#> [1] "area_code_l3"
#> 
#> $Total_number_of_species
#> [1] 573
#> 
#> $Number_of_regionally_endemic_species
#> [1] 573
#> 
#> $Summary
#> # A tibble: 280 × 6
#>    area_code_l3 Native Endemic Introduced Extinct Total
#>    <chr>         <int>   <int>      <int>   <int> <int>
#>  1 ABT              21       0          5       0    26
#>  2 AFG              23       1          0       0    23
#>  3 AGE              13       2          5       0    18
#>  4 AGS              24       0         10       0    34
#>  5 AGW              34       8          3       0    37
#>  6 ALA               6       0          3       0     9
#>  7 ALB              17       0          0       0    17
#>  8 ALG               8       0          1       0    11
#>  9 ALT              30       3          1       0    31
#> 10 ALU               7       0          3       0    10
#> # … with 270 more rows

You can provide taxon names of the ranks species, genus, family, order, or higher. The WCVP only provides taxonomic information up to the family rank. We have included a table called taxonomic_mapping to map families to orders and higher taxonomies, based on APG IV.

head(taxonomic_mapping)
#>        higher       order          family
#> 1 Angiosperms    Acorales       Acoraceae
#> 2 Angiosperms Alismatales    Alismataceae
#> 3 Angiosperms Alismatales Aponogetonaceae
#> 4 Angiosperms Alismatales         Araceae
#> 5 Angiosperms Alismatales      Butomaceae
#> 6 Angiosperms Alismatales   Cymodoceaceae

We use this table behind the scenes to allow you to filter using these higher taxonomic ranks.

For example, making a summary table for all Poales:

wcvp_summary("Poales", taxon_rank = "order")
#> $Taxon
#> [1] "Poales"
#> 
#> $Area
#> [1] "the world"
#> 
#> $Grouping_variable
#> [1] "area_code_l3"
#> 
#> $Total_number_of_species
#> [1] 23770
#> 
#> $Number_of_regionally_endemic_species
#> [1] 23770
#> 
#> $Summary
#> # A tibble: 368 × 6
#>    area_code_l3 Native Endemic Introduced Extinct Total
#>    <chr>         <int>   <int>      <int>   <int> <int>
#>  1 ABT             359       0         70       0   429
#>  2 AFG             485      16         15       0   504
#>  3 AGE             908      31        166       0  1074
#>  4 AGS             389      20         88       0   477
#>  5 AGW             890     130        105       0   995
#>  6 ALA             711       2        153       0   864
#>  7 ALB             363       4         14       0   387
#>  8 ALD              44       7         13       0    57
#>  9 ALG             445      11         41       0   497
#> 10 ALT             383       9          6       0   389
#> # … with 358 more rows

A note on filtering by area

The WCVP lists taxon distributions using the World Geographic Scheme for Recording Plant Distributions (WGSRPD) at level 3. This level corresponds to “botanical countries”, which mostly follow the boundaries of countries, except where large countries are split or outlying areas omitted.

The functions in rWCVP expect area to be provided as a vector of WGSRPD level 3 codes. These can be annoying to look up for the entire region you’re interested in. For example, to filter by species in Brazil you need to provide a vector of 5 codes.

To make things easier, rWCVP has a function that converts the name of a region to a vector of WGSRPD level 3 codes.

get_wgsrpd3_codes("Brazil")
#> [1] "BZC" "BZE" "BZL" "BZN" "BZS"

This can be input directly into functions that filter the WCVP by area.

wcvp_summary("Poa", taxon_rank = "genus", area = get_wgsrpd3_codes("Southern Hemisphere"))
#> $Taxon
#> [1] "Poa"
#> 
#> $Area
#> [1] "Southern Hemisphere (incl. equatorial Level 3 areas)"
#> 
#> $Grouping_variable
#> [1] "area_code_l3"
#> 
#> $Total_number_of_species
#> [1] 264
#> 
#> $Number_of_regionally_endemic_species
#> [1] 237
#> 
#> $Summary
#> # A tibble: 74 × 6
#>    area_code_l3 Native Endemic Introduced Extinct Total
#>    <chr>         <int>   <int>      <int>   <int> <int>
#>  1 AGE              13       2          5       0    18
#>  2 AGS              24       0         10       0    34
#>  3 AGW              34       8          3       0    37
#>  4 ANT               0       0          1       0     1
#>  5 ASC               0       0          1       0     1
#>  6 ASP               2       1          3       0     5
#>  7 ATP               8       1          3       0    11
#>  8 BOL              31       2          3       0    34
#>  9 BOR               2       1          0       0     2
#> 10 BUR               3       0          0       0     3
#> # … with 64 more rows