pestr Workflow

Michal Jan Czyz

2021-01-16

Overview

The pestr package is a set of functions and wrappers for quick and painless retrieval of data on pests and their hosts from EPPO Data Services and the EPPO Global Database. First, it extracts scientific names of organisms (and viruses), as well as synonyms and common names, from an SQLite database, which can be downloaded with the eppo_database_download() function. Second, four functions use the REST API to extract data on hosts, categorization, taxonomy and pests. A further function downloads csv files containing information on the distribution of organisms (and viruses). Importantly, the csv files are never saved to the hard drive; they are used directly to create a data.frame that can be assigned to a variable in R. Besides these features, the package provides helper functions, e.g. for connecting to the database or storing the EPPO token as a variable.

Example workflow

Setting up token and connecting to SQLite database

To start working with the pestr package, register (free of charge) with EPPO Data Services. Then run create_eppo_token and assign the result to a variable, which will be used by the functions that connect to the REST API.

eppo_token <- create_eppo_token('<<your_EPPO_token>>')
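
If you prefer not to hard-code the token in a script, one option is to keep it in an environment variable and read it at run time. The following is only a sketch: the variable name EPPO_TOKEN is a hypothetical choice, not something pestr defines, and the call to create_eppo_token is made only when the package is installed and the variable is set.

```r
# EPPO_TOKEN is a hypothetical variable name; set it e.g. in ~/.Renviron
token_string <- Sys.getenv("EPPO_TOKEN", unset = "")

if (!nzchar(token_string)) {
  # Nothing to work with yet: tell the user instead of failing later
  message("EPPO_TOKEN is not set; register with EPPO Data Services to obtain a token.")
} else if (requireNamespace("pestr", quietly = TRUE)) {
  # Same call as above, but the token string never appears in the script
  eppo_token <- pestr::create_eppo_token(token_string)
}
```

This keeps the token out of scripts that are shared or put under version control.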

Next, run the eppo_database_download function, which downloads an archive with the SQLite file (by default to your working directory; this can be overridden with the filepath argument). On Linux, the file is extracted into your working directory (or the directory given in the filepath argument). On Windows, you will be asked to extract the database file manually.

eppo_database_download()

The last step of the setup is to connect to the database file, which is done with the eppo_database_connect function.

eppo_SQLite <- eppo_database_connect()

With these three short steps you are ready to go.

Extracting tables with scientific, common and synonym names

Currently, searching for pest names supports scientific names, synonyms and common names. By default the search works with partial names: when you query for Cydia packardi you get information related to this species, while when you query for Cydia you get information on the whole genus. Likewise, when you search for Droso you get information on all species that contain Droso in their names. Moreover, you can pass a whole vector of terms in one shot, e.g. c('Xylella', 'Cydia packardi', 'Droso').
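
To get a feel for what partial matching means, here is a minimal base R sketch on a toy names table. It is not how pestr queries the SQLite database internally; the data frame toy_names and the case-insensitive grepl matching are assumptions made for illustration only.

```r
# Toy stand-in for the names table; the real data live in the EPPO SQLite file
toy_names <- data.frame(
  codeid = c(6698, 8607, 8608, 9907),
  fullname = c("Cydia pomonella", "Cydia inopinata",
               "Cydia leucostoma", "Ephialtes cydiae"),
  stringsAsFactors = FALSE
)

queries <- c("Cydia", "abcdef")

# One pattern that matches a name containing any of the query terms
pattern <- paste(queries, collapse = "|")
hits <- toy_names[grepl(pattern, toy_names$fullname, ignore.case = TRUE), ]

# Query terms that matched no name at all
found <- vapply(
  queries,
  function(q) any(grepl(q, toy_names$fullname, ignore.case = TRUE)),
  logical(1)
)
not_found <- queries[!found]

hits
not_found
```

Note that the partial term Cydia also matches Ephialtes cydiae, because the term is looked for anywhere in the name, while abcdef ends up among the terms that were not found.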

# Create vector of names that you are looking for
pests_query <- c('Cydia', "Triticum aestivum", "abcdef", "cadang")

Then query for the names and assign the result to a variable. This variable will contain the eppocodes that other functions use to extract data from the EPPO REST API. eppo_names_tables takes two arguments: the first is a vector of names to query the database with, the second is the variable holding the connection to the SQLite database.

pest_names <- eppo_names_tables(pests_query, eppo_SQLite)
names(pest_names)
#names that exist in database
head(pest_names[[1]], 5)
#names that were not found
head(pest_names[[2]], 5)
#preferred names for eppocodes from first table
head(pest_names[[3]], 5)
#all names that are associated to eppocodes from first data frame
head(pest_names[[4]], 5)
#> [1] "exist_in_DB"          "not_in_DB"            "pref_names"          
#> [4] "all_associated_names"

Names that exist in database
codeid  fullname
6698    Cydia pomonella
8607    Cydia inopinata
8608    Cydia leucostoma
8609    Cydia sp.
9907    Ephialtes cydiae

Preferred names for eppocodes from first table
codeid  fullname              eppocode
6698    Cydia pomonella       CARPPO
8607    Grapholita inopinata  CYDIIN
8608    Cydia leucostoma      CYDILE
8609    Cydia sp.             CYDISP
9907    Ephialtes cydiae      EPHICY

All names that are associated to eppocodes from first data frame
codeid  fullname                      preferred  codelang  eppocode
6698    æblevikler                    0          da        CARPPO
6698    Obstmade                      0          de        CARPPO
6698    carpocapse des pommes         0          fr        CARPPO
6698    pyrale de la pomme            0          fr        CARPPO
6698    ver des pommes et des poires  0          fr        CARPPO

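The result is a list containing three data.frames and one vector:

  • exist_in_DB: data.frame with the names that exist in the database;
  • not_in_DB: vector with the names that were not found;
  • pref_names: data.frame with the preferred names for the eppocodes from the first table;
  • all_associated_names: data.frame with all names associated with the eppocodes from the first data frame.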

REMEMBER: Other eppo_tabletools_ functions use results of this function or raw eppocodes to access data from EPPO Global Database and EPPO Data Services.

eppo_tabletools_ functions to extract categorization, hosts, taxonomy, distribution and pests

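These functions work independently of each other, so there is no need to use all of them or to call them in any particular order. The functions for categorization, hosts, taxonomy and pests take either two arguments:

  • the result of eppo_names_tables;
  • the EPPO token created with create_eppo_token;

OR three arguments:

  • token: the EPPO token;
  • raw_eppocodes: a character vector of eppocodes;
  • use_raw_codes: set to TRUE to query by the codes given in raw_eppocodes.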

Categorization of pests

eppo_tabletools_cat returns a list with two elements:

  • data.frame with categorization tables
  • data.frame with categorization for each eppocode condensed to single cell.
pests_cat <- eppo_tabletools_cat(pest_names, eppo_token)
#long format table
head(pests_cat[[1]], 5)
#compact table with information for each eppocode condensed into one cell
head(pests_cat[[2]], 5)

Long format table with pests categorization
eppocode  nomcontinent  isocode  country          qlist  qlistlabel       yr_add  yr_del  yr_trans
CARPPO    Africa        EG       Egypt            2      A2 list          2018    NA      NA
CARPPO    Africa        3G       Southern Africa  2      A2 list          2001    NA      NA
CARPPO    America       CA       Canada           X      Quarantine pest  2019    NA      NA
CARPPO    Asia          BH       Bahrain          1      A1 list          2003    NA      NA
CARPPO    Asia          CN       China            1      A1 list          1993    NA      NA

Compact table with condensed information on categorization
eppocode categorization
CARPPO Africa: Egypt: A2 list: add/del/trans: 2018/NA/NA; Southern Africa: A2 list: add/del/trans: 2001/NA/NA | America: Canada: Quarantine pest: add/del/trans: 2019/NA/NA | Asia: Bahrain: A1 list: add/del/trans: 2003/NA/NA; China: A1 list: add/del/trans: 1993/NA/NA | RPPO/EU: APPPC: A2 list: add/del/trans: 1993/NA/NA
CYDIIN Africa: Egypt: A1 list: add/del/trans: 2018/NA/NA; Morocco: Quarantine pest: add/del/trans: 2018/NA/NA; Tunisia: Quarantine pest: add/del/trans: 2012/NA/NA | America: Canada: Quarantine pest: add/del/trans: 2019/NA/NA; Mexico: Quarantine pest: add/del/trans: 2018/NA/NA | Asia: Bahrain: A1 list: add/del/trans: 2003/NA/NA; Israel: Quarantine pest: add/del/trans: 2009/NA/NA; Jordan: A1 list: add/del/trans: 2013/NA/NA | Europe: Turkey: A1 list: add/del/trans: 2016/NA/NA; Ukraine: A1 list: add/del/trans: 2019/NA/NA | RPPO/EU: EPPO: A2 list: add/del/trans: 1994/NA/1999; EU: A1 Quarantine pest (Annex II A): add/del/trans: 2019/NA/NA
CYDILE NA: NA: NA: add/del/trans: NA/NA/NA
CYDISP NA: NA: NA: add/del/trans: NA/NA/NA
EPHICY NA: NA: NA: add/del/trans: NA/NA/NA

If you want to limit the data received from EPPO Data Services, and you know exactly what you are looking for, you can use eppocodes directly.

pests_cat <- eppo_tabletools_cat(token = eppo_token,
                                 raw_eppocodes = c("LASPPA", "TRZAX", "CCCVD0"),
                                 use_raw_codes = TRUE)
pests_cat[[2]]
Limited results of using eppocodes LASPPA, TRZAX, CCCVD0
eppocode categorization
LASPPA Africa: Egypt: A1 list: add/del/trans: 2018/NA/NA; Morocco: Quarantine pest: add/del/trans: 2018/NA/NA; Tunisia: Quarantine pest: add/del/trans: 2012/NA/NA | America: Mexico: Quarantine pest: add/del/trans: 2018/NA/NA | Asia: Bahrain: A1 list: add/del/trans: 2003/NA/NA; Israel: Quarantine pest: add/del/trans: 2009/NA/NA; Jordan: A1 list: add/del/trans: 2013/NA/NA | Europe: Russia: A1 list: add/del/trans: 2014/NA/NA; Turkey: A1 list: add/del/trans: 2016/NA/NA; Ukraine: A1 list: add/del/trans: 2019/NA/NA | RPPO/EU: EAEU: A1 list: add/del/trans: 2018/NA/NA; EPPO: A1 list: add/del/trans: 1995/NA/NA; EU: A1 Quarantine pest (Annex II A): add/del/trans: 2019/NA/NA
TRZAX NA: NA: NA: add/del/trans: NA/NA/NA
CCCVD0 Africa: Egypt: Regulated non-quarantine pest: add/del/trans: 2018/NA/NA; Morocco: Quarantine pest: add/del/trans: 2018/NA/NA | America: Argentina: A1 list: add/del/trans: 2019/NA/NA; Brazil: A1 list: add/del/trans: 2018/NA/NA; Chile: A1 list: add/del/trans: 2019/NA/NA; Mexico: Quarantine pest: add/del/trans: 2018/NA/NA; United States of America: Quarantine pest: add/del/trans: 1989/NA/NA | Asia: Bahrain: A1 list: add/del/trans: 2003/NA/NA; China: A2 list: add/del/trans: 1988/NA/NA; Israel: Quarantine pest: add/del/trans: 2009/NA/NA | Europe: Turkey: A1 list: add/del/trans: 2016/NA/NA | RPPO/EU: APPPC: A2 list: add/del/trans: 1988/NA/NA; CAHFSA: A1 list: add/del/trans: 1990/NA/NA; COSAVE: A2 list: add/del/trans: 2018/NA/NA; EPPO: A1 list: add/del/trans: 1994/NA/NA; EU: A1 Quarantine pest (Annex II A): add/del/trans: 2019/NA/NA; PPPO: A2 list: add/del/trans: 1993/NA/NA

Hosts of pests

eppo_tabletools_hosts returns a list of two data.frames:

  • a long format table with all the data, for all pests combined;
  • a compact table with hosts combined into a single cell for each eppocode.
pests_hosts <- eppo_tabletools_hosts(pest_names, eppo_token)

head(pests_hosts[[1]], 5)
head(pests_hosts[[2]], 5)

Long format table with pests hosts
eppocode  codeid  host_eppocode  idclass  labelclass  full_name
CARPPO    37021   MABSD          1        Major host  Malus domestica
CARPPO    29214   CYDOB          9        Host        Cydonia oblonga
CARPPO    35259   IUGRE          9        Host        Juglans regia
CARPPO    41521   PRNAR          9        Host        Prunus armeniaca
CARPPO    41563   PRNDO          9        Host        Prunus domestica

Compact table with condensed information on hosts
eppocode hosts
CARPPO Major host: Malus domestica; Host: Cydonia oblonga, Juglans regia, Prunus armeniaca, Prunus domestica, Prunus dulcis, Prunus persica, Pyrus communis
CYDIIN Major host: Malus domestica; Wild/Weed: Malus baccata; Host: Cydonia oblonga, Malus, Pyrus, Pyrus communis; Experimental: Prunus
CYDILE Host: NA
CYDISP Host: NA
EPHICY Host: NA
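
The relation between the long and the compact table can be sketched in base R. The function condense_hosts below is a hypothetical helper written for this illustration, not the code pestr uses; it takes the five CARPPO rows shown in the long table above and collapses them into a single cell, grouping host names by labelclass.

```r
# The five CARPPO rows from the long format table above
hosts_long <- data.frame(
  eppocode = rep("CARPPO", 5),
  labelclass = c("Major host", "Host", "Host", "Host", "Host"),
  full_name = c("Malus domestica", "Cydonia oblonga", "Juglans regia",
                "Prunus armeniaca", "Prunus domestica"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per eppocode, hosts grouped by labelclass
condense_hosts <- function(df) {
  per_code <- split(df, df$eppocode)
  cells <- vapply(per_code, function(one_code) {
    classes <- unique(one_code$labelclass) # keeps first-seen order
    parts <- vapply(classes, function(cl) {
      hosts <- one_code$full_name[one_code$labelclass == cl]
      paste0(cl, ": ", paste(hosts, collapse = ", "))
    }, character(1))
    paste(parts, collapse = "; ")
  }, character(1))
  data.frame(eppocode = names(cells), hosts = unname(cells),
             stringsAsFactors = FALSE)
}

hosts_compact <- condense_hosts(hosts_long)
hosts_compact$hosts
#> [1] "Major host: Malus domestica; Host: Cydonia oblonga, Juglans regia, Prunus armeniaca, Prunus domestica"
```

The result is shorter than the CARPPO cell in the compact table above only because the toy input holds just the first five rows of the long table.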

Taxonomy of pests and hosts

eppo_tabletools_taxo, like the other functions from this family, returns a list with two data.frames:

  • the first is a long format taxonomy table;
  • the second is a table with the main category of each eppocode.

Suppose that, from the previous name query, we are interested only in viroids and viruses. As they usually have the phrase viroid or virus in their names, we can limit the query to the matching eppocodes.

library(dplyr) # provides the %>% pipe and filter()

virs_eppocodes <- pest_names$all_associated_names %>%
  dplyr::filter(grepl("viroid", fullname) | grepl("virus", fullname)) %>%
  .[,5] %>% #eppocodes are in 5th column
  unique()
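
The same selection can be written without dplyr. The sketch below uses a toy data frame (toy_associated, an assumption standing in for pest_names$all_associated_names, with one row repeated on purpose to show the effect of unique()) and picks the eppocode column by name rather than by position.

```r
# Toy stand-in for pest_names$all_associated_names (only two columns kept)
toy_associated <- data.frame(
  fullname = c("Obstmade",
               "Cydia pomonella granulovirus",
               "Coconut cadang-cadang viroid",
               "Coconut cadang-cadang viroid"),
  eppocode = c("CARPPO", "CPGV00", "CCCVD0", "CCCVD0"),
  stringsAsFactors = FALSE
)

# TRUE for every name that contains "viroid" or "virus"
is_vir <- grepl("viroid|virus", toy_associated$fullname)

# Keep each matching eppocode once
toy_virs_eppocodes <- unique(toy_associated$eppocode[is_vir])
toy_virs_eppocodes
#> [1] "CPGV00" "CCCVD0"
```

Selecting the column by name keeps working even if the order of columns in the table changes.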

We can now pass virs_eppocodes as the raw_eppocodes argument and, as a consequence, receive the taxonomy of viroids and viruses only.

virs_taxonomy <- eppo_tabletools_taxo(token = eppo_token,
                                      raw_eppocodes = virs_eppocodes,
                                      use_raw_codes = TRUE)
virs_taxonomy$long_table #you can also access list elements by their names
virs_taxonomy$compact_table

Long table of viruses and viroids taxonomy
codeid  eppocode  prefname                      level
60969   CPGV00    Viruses and viroids           1
64582   CPGV00    Baculoviridae                 2
84121   CPGV00    Betabaculovirus               3
65443   CPGV00    Cydia pomonella granulovirus  4
60969   CCCVD0    Viruses and viroids           1
111354  CCCVD0    Riboviria                     2
65268   CCCVD0    Pospiviroidae                 3
65799   CCCVD0    Cocadviroid                   4
64718   CCCVD0    Coconut cadang-cadang viroid  5

Compact table of viruses and viroids taxonomy
eppocode  taxonomy
CPGV00    Baculoviridae
CCCVD0    Riboviria

Pests of hosts

It is possible to obtain data on the pests of particular hosts with the eppo_tabletools_pests function. Let's say we want to know all the pests associated with Abies alba (eppocode: ABIAL).

abies_pests <- eppo_tabletools_pests(token = eppo_token,
                                     raw_eppocodes = "ABIAL",
                                     use_raw_codes = TRUE)
head(abies_pests[[1]], 5)
head(abies_pests[[2]], 5)

Long table of Abies alba pests
eppocode  pests_eppocode  idclass  labelclass    fullname
ABIAL     MELMME          10       Experimental  Melampsora medusae (as Abies)
ABIAL     MELMMD          10       Experimental  Melampsora medusae f. sp. deltoidis (as Abies)
ABIAL     ACLRGL          9        Host          Acleris gloverana (as Abies)
ABIAL     ACLRVA          9        Host          Acleris variana (as Abies)
ABIAL     AREAB           9        Host          Arceuthobium abietinum (as Abies)

Compact table of Abies alba pests
eppocode pests
ABIAL Experimental: Melampsora medusae (as Abies), Melampsora medusae f. sp. deltoidis (as Abies); Host: Acleris gloverana (as Abies), Acleris variana (as Abies), Arceuthobium abietinum (as Abies), Arceuthobium douglasii (as Abies), Arceuthobium laricis (as Abies), Arceuthobium tsugense (as Abies), Bursaphelenchus xylophilus (as Abies), Chionaspis pinifoliae, Chionaspis pinifoliae (as Abies), Choristoneura freemani (as Abies), Choristoneura fumiferana (as Abies), Chrysomyxa abietis (as Abies), Coniferiporia weirii (as Pinaceae), Crisicoccus pini (as Abies), Dendroctonus micans, Dendrolimus sibiricus (as Abies), Dendrolimus spectabilis (as Abies), Dendrolimus superans (as Abies), Dothistroma septosporum, Dryocoetes confusus (as Abies), Gnathotrichus sulcatus (as Pinaceae), Gremmeniella abietina (as Abies), Heterobasidion irregulare (as Abies), Ips amitinus, Ips amitinus (as Abies), Ips subelongatus (as Abies), Ips typographus, Leptoglossus occidentalis (as Abies), Malacosoma disstria (as Abies), Monochamus alternatus (as Abies), Monochamus marmorator (as Abies), Monochamus obtusus (as Abies), Monochamus saltuarius (as Abies), Monochamus scutellatus (as Abies), Monochamus sutor (as Abies), Monochamus titillator (as Abies), Monochamus urussovi (as Abies), Phacidium coniferarum (as Abies), Phytophthora cinnamomi (as Pinaceae), Phytophthora ramorum, Pissodes castaneus, Polygraphus proximus (as Abies), Sirex ermak (as Abies), Sirex noctilio (as Abies), Tetropium gracilicorne (as Abies), Trichoferus campestris (as Abies); Major host: Chrysomyxa abietis, Monochamus sutor, Neonectria neomacrospora

Distribution of pests

eppo_tabletools_distri does not connect to the REST API; instead it downloads csv files directly from the EPPO Global Database. As a consequence there is no token argument (the function does not need the EPPO token); it is called with a variable containing the result of eppo_names_tables. The function returns a two-element list:

  • the first element contains a data.frame with the distribution of each organism/virus, including invalid records and eradicated status;
  • the second contains a single cell with the condensed distribution for each eppocode, including only records with present status.
pest_distri <- eppo_tabletools_distri(pest_names)
head(pest_distri[[1]], 5)
head(pest_distri[[2]], 5)

Long table with distribution of pests
eppocode  continent  country    state  country.code  state.code  Status
CARPPO    Africa     Algeria    NA     DZ            NA          Present, no details
CARPPO    Africa     Egypt      NA     EG            NA          Present, no details
CARPPO    Africa     Libya      NA     LY            NA          Present, no details
CARPPO    Africa     Mauritius  NA     MU            NA          Present, no details
CARPPO    Africa     Morocco    NA     MA            NA          Present, no details

Compact table with distribution of pests
eppocode distribution
CARPPO Africa: Algeria, Egypt, Libya, Mauritius, Morocco, South Africa, Tunisia; America: Argentina, Bolivia, Brazil, Canada, Chile, Colombia, Mexico, Peru, United States of America, Uruguay; Asia: Afghanistan, China, India, Iran, Iraq, Israel, Jordan, Kazakhstan, Kyrgyzstan, Lebanon, Pakistan, Syria, Tajikistan, Turkmenistan, Uzbekistan; Europe: Albania, Armenia, Austria, Azerbaijan, Belarus, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Georgia, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Malta, Moldova, Netherlands, Norway, Poland, Portugal, Romania, Russia, Serbia, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, Ukraine, United Kingdom; Oceania: Australia, New Zealand
CYDIIN Asia: China, Japan; Europe: Russia
CYDILE NA: NA
CYDISP NA: NA
EPHICY NA: NA
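
The idea of turning csv content into a data.frame without writing a file, and of keeping only present records in the compact cell, can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the csv text is typed inline instead of being downloaded, only three columns are kept, and the Exampleland row is made up to show a record that gets filtered out.

```r
# Inline text stands in for the downloaded csv content; nothing is written to disk
csv_text <- paste(
  "continent,country,Status",
  "Africa,Algeria,\"Present, no details\"",
  "Africa,Egypt,\"Present, no details\"",
  "Europe,Exampleland,Absent", # made-up row, not EPPO data
  sep = "\n"
)

# Parse the text directly into a data.frame
distri <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only records whose status starts with "Present"
present <- distri[grepl("^Present", distri$Status), ]

# Collapse countries per continent, then continents into one cell
by_continent <- tapply(present$country, present$continent, paste, collapse = ", ")
compact <- paste(paste0(names(by_continent), ": ", by_continent), collapse = "; ")
compact
#> [1] "Africa: Algeria, Egypt"
```

The quoted Status values matter here: without the quotes, the comma in "Present, no details" would be read as a column separator.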

Whole condensed table in one shot (currently does not include eppo_tabletools_pests):

Last but not least, the package offers a simple wrapper over the functions mentioned above. If you want one table with all the information (names, categorization, hosts, distribution and taxonomy) condensed to one cell per pest, use the eppo_table_full function. As called in the example, it takes three arguments: a vector of names to query, the variable holding the connection to the SQLite database, and the EPPO token:

eppo_fulltable <- eppo_table_full(c("Meloidogyne ethiopica", "Crataegus mexicana"),
                                  eppo_SQLite,
                                  eppo_token)

eppo_fulltable

codeid:          84193
eppocode:        CSCME
Preferred_name:  Crataegus mexicana
Other_names:     Other languages: aubépine du Mexique, Mexican hawthorn, tejocote
hosts:           Host: NA
categorization:  NA: NA: NA: add/del/trans: NA/NA/NA
distribution:    NA: NA
taxonomy:        Plantae

codeid:          79276
eppocode:        MELGET
Preferred_name:  Meloidogyne ethiopica
Other_names:     Other languages: root-knot nematode
hosts:           Major host: Actinidia chinensis, Actinidia deliciosa, Solanum lycopersicum, Vitis labrusca, Vitis vinifera; Wild/Weed: Ageratum conyzoides, Datura stramonium, Solanum nigrum; Host: Acacia mearnsii, Agave sisalana, Asparagus officinalis, Beta vulgaris, Brassica oleracea, Capsicum frutescens, Citrullus lanatus, Cucumis melo, Cucumis sativus, Cucurbita, Ensete ventricosum, Glycine max, Lactuca sativa, Nicotiana tabacum, Phaseolus vulgaris, Polymnia sonchifolia, Prunus persica, Saccharum officinarum, Sida rhombifolia, Solanum tuberosum, Vicia faba, Vigna unguiculata
categorization:  Africa: Morocco: Quarantine pest: add/del/trans: 2018/NA/NA | RPPO/EU: EPPO: Alert list: add/del/trans: 2011/NA/NA
distribution:    Africa: Ethiopia, Kenya, Mozambique, South Africa, Tanzania, Zimbabwe; America: Brazil, Chile, Peru
taxonomy:        Nematoda