PopVar
To make progress in breeding, populations should have a favorable mean and high genetic variance (Bernardo 2010). These two parameters can be combined into a single measure called the usefulness criterion (Schnell and Utz 1975), visualized in Figure 1.
Ideally, breeders would identify the set of parent combinations that,
when realized in a cross, would give rise to populations meeting these
requirements. PopVar
is a package that uses phenotypic and
genomewide marker data on a set of candidate parents to predict the
mean, genetic variance, and superior progeny mean in bi-parental or
multi-parental populations. Thre package also contains functionality for
performing cross-validation to determine the suitability of different
statistical models. More details are available in Mohammadi, Tiede, and
Smith (2015). A dataset think_barley
is included for
reference and examples.
You can install the released version of PopVar from CRAN with:
And the development version from GitHub with:
Below is a description of the functions provided in
PopVar
:
Function | Description |
---|---|
pop.predict |
Uses simulations to make predictions in recombinant inbred line populations; can internally perform cross-validation for model selections; can be quite slow. |
pop.predict2 |
Uses deterministic equations to make predictions in
populations of complete or partial selfing and with or without the
induction of doubled haploids; is much faster than
pop.predict ; does not perform cross-validation or model
selection internally. |
pop_predict2 |
Has the same functionality as
pop.predict2 , but accepts genomewide marker data in a
simpler matrix format. |
x.val |
Performs cross-validation to estimate model performance. |
mppop.predict |
Uses deterministic equations to make predictions in 2- or 4-way populations of complete or partial selfing and with or without the induction of doubled haploids; does not perform cross-validation or model selection internally. |
mpop_predict2 |
Has the same functionality as
mppop.predict , but accepts genomewide marker data in a
simpler matrix format. |
Below are some example uses of the functions in
PopVar
:
The code below simulates a single population of 1000 individuals for each of 150 crosses. For the sake of speed, the marker effects are predicted using RR-BLUP and no cross-validation is performed.
out <- pop.predict(G.in = G.in_ex, y.in = y.in_ex, map.in = map.in_ex,
crossing.table = cross.tab_ex,
nInd = 1000, nSim = 1,
nCV.iter = 1, models = "rrBLUP")
#> Warning in read.cross.csv(dir, file, na.strings, genotypes, estimate.map, : There is no genotype data!
#> Warning in summary.cross(cross): Some markers at the same position on chr
#> 1,2,3,4,5,6,7; use jittermap().
#> Warning in summary.cross(cross): Strange genotype pattern on chr 7.
The function returns a list, one element of which is called
predictions.
This element is itself a list of matrices
containing the predictions for each trait. They can be combined as
such:
predictions1 <- lapply(X = out$predictions, FUN = function(x) {
x1 <- as.data.frame(apply(X = x, MARGIN = 2, FUN = unlist), stringsAsFactors = FALSE)
cbind(x1[,c("Par1", "Par2")], sapply(X = x1[,-1:-2], as.numeric))
})
# Display the first few lines of the predictions for grain yield
knitr::kable(head(predictions1$Yield_param.df))
Par1 | Par2 | midPar.Pheno | midPar.GEBV | pred.mu | pred.mu_sd | pred.varG | pred.varG_sd | mu.sp_low | mu.sp_high | low.resp_FHB | low.resp_DON | low.resp_Height | high.resp_FHB | high.resp_DON | high.resp_Height | cor_w/_FHB | cor_w/_DON | cor_w/_Height |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
FEG66-08 | MN97-125 | 99.200 | 100.65607 | 100.69374 | NA | 7.775435 | NA | 95.81105 | 105.3767 | 23.26397 | 23.80253 | 78.47839 | 25.28640 | 25.26019 | 75.84409 | 0.3153217 | 0.2925117 | -0.3240946 |
MN99-71 | FEG90-31 | NaN | 99.17635 | 99.08537 | NA | 5.211866 | NA | 95.16167 | 102.9523 | 21.69551 | 23.78150 | 77.63352 | 24.49852 | 24.74633 | 75.79760 | 0.4556383 | 0.1435835 | -0.1690035 |
MN96-141 | FEG183-52 | NaN | 101.28761 | 101.22351 | NA | 5.246749 | NA | 97.32545 | 105.2152 | 23.75849 | 23.98108 | 79.86153 | 27.34374 | 28.54024 | 75.92728 | 0.7173738 | 0.7423061 | -0.5477210 |
MN99-02 | FEG183-52 | NaN | 100.78451 | 100.70508 | NA | 9.942299 | NA | 95.26656 | 106.3046 | 25.58743 | 23.86081 | 74.82438 | 26.19262 | 26.60713 | 73.76061 | 0.1184871 | 0.4397981 | -0.1501410 |
FEG99-10 | FEG148-56 | NaN | 98.68505 | 98.65199 | NA | 4.355991 | NA | 95.10603 | 102.3650 | 19.45197 | 20.18099 | 85.84220 | 22.86477 | 24.37191 | 79.85329 | 0.5651892 | 0.6177282 | -0.5894789 |
MN99-62 | MN01-46 | 105.775 | 102.29724 | 102.34664 | NA | 5.171023 | NA | 98.58555 | 106.0047 | 26.13869 | 28.50720 | 72.82744 | 26.42637 | 28.32423 | 74.19458 | 0.1548802 | -0.2145202 | 0.4339962 |
Generating predictions via simulated populations can become
computationally burdensome when many thousands or hundreds of thousands
of crosses are possible. Fortunately, deterministic equations are
available to generate equivalent predictions in a fraction of the time.
These equations are provided in the pop.predict2
and
pop_predict2
functions.
The pop.predict2
function takes arguments in the same
format as pop.predict
. We have eliminated the arguments for
marker filtering and imputation and cross-validation, as the
pop.predict2
function does not support these steps. (You
may continue to conduct cross-validation using the x.val
function.) Therefore, the genotype data input for
pop.predict2
must not contain any missing
data. Further, these predictions assume fully inbred parents,
so marker genotypes must only be coded as -1 or 1. The data
G.in_ex_imputed
contains genotype data that is formatted
properly.
Below is an example of using the pop.predict2
function:
out2 <- pop.predict2(G.in = G.in_ex_imputed, y.in = y.in_ex, map.in = map.in_ex,
crossing.table = cross.tab_ex, models = "rrBLUP")
knitr::kable(head(subset(out2, trait == "Yield")))
parent1 | parent2 | trait | pred_mu | pred_varG | pred_musp_low | pred_musp_high | cor_w_FHB | cor_w_DON | cor_w_Yield | cor_w_Height | pred_cor_musp_low_FHB | pred_cor_musp_low_DON | pred_cor_musp_low_Yield | pred_cor_musp_low_Height | pred_cor_musp_high_FHB | pred_cor_musp_high_DON | pred_cor_musp_high_Yield | pred_cor_musp_high_Height | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | FEG66-08 | MN97-125 | Yield | 100.41166 | 7.768362 | 95.52021 | 105.3031 | 0.3412831 | 0.3303378 | NA | -0.4004154 | 22.27035 | 23.76433 | NA | 78.84012 | 24.96750 | 25.91641 | NA | 75.11838 |
7 | MN99-71 | FEG90-31 | Yield | 98.98546 | 5.298105 | 94.94591 | 103.0250 | 0.4959745 | 0.2776666 | NA | -0.1612311 | 21.86563 | 23.41697 | NA | 77.04669 | 24.85938 | 25.18159 | NA | 75.75153 |
11 | MN96-141 | FEG183-52 | Yield | 101.07137 | 5.571655 | 96.92884 | 105.2139 | 0.6132612 | 0.6961326 | NA | -0.4603399 | 24.36629 | 23.67733 | NA | 79.59345 | 27.88028 | 28.40806 | NA | 75.16009 |
15 | MN99-02 | FEG183-52 | Yield | 100.82967 | 9.499737 | 95.42053 | 106.2388 | 0.0249962 | 0.4211875 | NA | -0.1929275 | 26.15174 | 23.73807 | NA | 74.76672 | 26.29180 | 26.60344 | NA | 72.86270 |
19 | FEG99-10 | FEG148-56 | Yield | 98.01593 | 4.121803 | 94.45292 | 101.5789 | 0.5959334 | 0.6401323 | NA | -0.5080571 | 19.67552 | 19.58928 | NA | 85.81854 | 23.38585 | 24.23917 | NA | 80.69997 |
23 | MN99-62 | MN01-46 | Yield | 102.51094 | 4.673197 | 98.71709 | 106.3048 | 0.0755600 | -0.0827658 | NA | 0.4010907 | 26.07342 | 28.37124 | NA | 72.67125 | 26.20580 | 28.16102 | NA | 74.51340 |
Note that the output of
pop.predict2
is no longer a list, but a data frame containing the combined predictions for all traits.
The formatting requirements of G.in
for
pop.predict
and pop.predict2
are admittedly
confusing. Marker genotype data is commonly stored as a n x
p matrix, where n is the number of entries and
p the number of markers. The function pop_predict2
accommodates this general marker data storage. Here is an example:
out3 <- pop_predict2(M = G.in_ex_mat, y.in = y.in_ex, map.in = map.in_ex,
crossing.table = cross.tab_ex, models = "rrBLUP")
knitr::kable(head(subset(out2, trait == "Yield")))
parent1 | parent2 | trait | pred_mu | pred_varG | pred_musp_low | pred_musp_high | cor_w_FHB | cor_w_DON | cor_w_Yield | cor_w_Height | pred_cor_musp_low_FHB | pred_cor_musp_low_DON | pred_cor_musp_low_Yield | pred_cor_musp_low_Height | pred_cor_musp_high_FHB | pred_cor_musp_high_DON | pred_cor_musp_high_Yield | pred_cor_musp_high_Height | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | FEG66-08 | MN97-125 | Yield | 100.41166 | 7.768362 | 95.52021 | 105.3031 | 0.3412831 | 0.3303378 | NA | -0.4004154 | 22.27035 | 23.76433 | NA | 78.84012 | 24.96750 | 25.91641 | NA | 75.11838 |
7 | MN99-71 | FEG90-31 | Yield | 98.98546 | 5.298105 | 94.94591 | 103.0250 | 0.4959745 | 0.2776666 | NA | -0.1612311 | 21.86563 | 23.41697 | NA | 77.04669 | 24.85938 | 25.18159 | NA | 75.75153 |
11 | MN96-141 | FEG183-52 | Yield | 101.07137 | 5.571655 | 96.92884 | 105.2139 | 0.6132612 | 0.6961326 | NA | -0.4603399 | 24.36629 | 23.67733 | NA | 79.59345 | 27.88028 | 28.40806 | NA | 75.16009 |
15 | MN99-02 | FEG183-52 | Yield | 100.82967 | 9.499737 | 95.42053 | 106.2388 | 0.0249962 | 0.4211875 | NA | -0.1929275 | 26.15174 | 23.73807 | NA | 74.76672 | 26.29180 | 26.60344 | NA | 72.86270 |
19 | FEG99-10 | FEG148-56 | Yield | 98.01593 | 4.121803 | 94.45292 | 101.5789 | 0.5959334 | 0.6401323 | NA | -0.5080571 | 19.67552 | 19.58928 | NA | 85.81854 | 23.38585 | 24.23917 | NA | 80.69997 |
23 | MN99-62 | MN01-46 | Yield | 102.51094 | 4.673197 | 98.71709 | 106.3048 | 0.0755600 | -0.0827658 | NA | 0.4010907 | 26.07342 | 28.37124 | NA | 72.67125 | 26.20580 | 28.16102 | NA | 74.51340 |
The code below compares the functions pop.predict
and
pop.predict2
with respect to computation time and
results:
time1 <- system.time({
capture.output(pop.predict.out <- pop.predict(
G.in = G.in_ex_imputed, y.in = y.in_ex, map.in = map.in_ex, crossing.table = cross.tab_ex,
nInd = 1000, nSim = 1, nCV.iter = 1, models = "rrBLUP"))
})
#> Warning in read.cross.csv(dir, file, na.strings, genotypes, estimate.map, : There is no genotype data!
#> Warning in summary.cross(cross): Some markers at the same position on chr
#> 1,2,3,4,5,6,7; use jittermap().
#> Warning in summary.cross(cross): Strange genotype pattern on chr 7.
time2 <- system.time({pop.predict2.out <- pop.predict2(
G.in = G.in_ex_imputed, y.in = y.in_ex, map.in = map.in_ex,
crossing.table = cross.tab_ex,model = "rrBLUP")})
# Print the time (seconds) required for each function.
c(pop.predict = time1[[3]], pop.predict2 = time2[[3]])
#> pop.predict pop.predict2
#> 226.059 0.495
Plot results
predictions1 <- lapply(X = pop.predict.out$predictions, FUN = function(x) {
x1 <- as.data.frame(apply(X = x, MARGIN = 2, FUN = unlist), stringsAsFactors = FALSE)
cbind(x1[,c("Par1", "Par2")], sapply(X = x1[,-1:-2], as.numeric))
})
pop.predict.out1 <- predictions1$Yield_param.df[,c("Par1", "Par2", "pred.varG")]
pop.predict2.out1 <- subset(pop.predict2.out, trait == "Yield", c(parent1, parent2, pred_varG))
toplot <- merge(pop.predict.out1, pop.predict2.out1, by.x = c("Par1", "Par2"),
by.y = c("parent1", "parent2"))
plot(pred.varG ~ pred_varG, toplot,
xlab = "pop.predict2", ylab = "pop.predict",
main = "Comparsion of predicted genetic variance")
PopVar
also includes functions for predicting the mean,
genetic variance, and superior progeny mean of multi-parent populations.
The mppop.predict
function takes the same inputs as
pop.predict
or pop.predict2
, and the
mppop_predict2
function takes the same inputs as
pop_predict2
. Both require the additional argument
n.parents
, which will determine whether the populations are
formed by 2- or 4-way matings (support for 8-way populations is
forthcoming.)
Below is an example of using the mppop.predict
function:
# Generate predictions for all possible 4-way crosses of 10 sample parents
sample_parents <- sample(unique(unlist(cross.tab_ex)), 10)
mp_out <- mppop.predict(G.in = G.in_ex_imputed, y.in = y.in_ex, map.in = map.in_ex,
parents = sample_parents, n.parents = 4, models = "rrBLUP")
knitr::kable(head(subset(mp_out, trait == "Yield")))
parent1 | parent2 | parent3 | parent4 | trait | pred_mu | pred_varG | pred_musp_low | pred_musp_high | FHB | DON | Yield | Height | pred_cor_musp_low_FHB | pred_cor_musp_low_DON | pred_cor_musp_low_Yield | pred_cor_musp_low_Height | pred_cor_musp_high_FHB | pred_cor_musp_high_DON | pred_cor_musp_high_Yield | pred_cor_musp_high_Height | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | FEG105-33 | FEG183-25 | FEG43-82 | M119 | Yield | 97.29426 | 8.158764 | 92.28141 | 102.3071 | 0.0713393 | 0.3085768 | NA | -0.2212458 | 25.62796 | 22.45121 | NA | 81.83201 | 26.10974 | 24.57670 | NA | 79.41704 |
7 | FEG105-33 | FEG183-25 | FEG43-82 | MN00-51 | Yield | 100.00013 | 9.442453 | 94.60732 | 105.3929 | 0.2330148 | 0.3650999 | NA | -0.4313815 | 25.03099 | 21.53116 | NA | 83.20797 | 26.61280 | 24.07334 | NA | 78.64447 |
11 | FEG105-33 | FEG183-25 | FEG43-82 | MN01-87 | Yield | 98.96977 | 8.776115 | 93.77072 | 104.1688 | 0.2269710 | 0.3567679 | NA | -0.4463608 | 25.82477 | 22.15697 | NA | 82.79400 | 27.18251 | 24.41795 | NA | 77.39818 |
15 | FEG105-33 | FEG183-25 | FEG43-82 | MN96-186 | Yield | 100.06909 | 9.815362 | 94.57082 | 105.5674 | 0.2314497 | 0.3752859 | NA | -0.4381169 | 24.97936 | 21.87328 | NA | 83.44308 | 26.54520 | 24.51590 | NA | 78.94722 |
19 | FEG105-33 | FEG183-25 | FEG43-82 | MN98-34 | Yield | 98.79925 | 9.162671 | 93.48693 | 104.1116 | 0.1589555 | 0.2866203 | NA | -0.2997278 | 26.05402 | 22.33769 | NA | 81.56223 | 26.99360 | 24.12245 | NA | 77.64547 |
23 | FEG105-33 | FEG183-25 | FEG43-82 | MN98-60 | Yield | 97.95432 | 9.913708 | 92.42857 | 103.4801 | 0.0573674 | 0.3384932 | NA | -0.2880704 | 25.68047 | 22.64719 | NA | 81.50776 | 26.02667 | 24.99225 | NA | 78.18482 |
Below is an example of using the mppop_predict2
function:
# Generate predictions for all possible 4-way crosses of 10 sample parents
mp_out2 <- mppop_predict2(M = G.in_ex_mat, y.in = y.in_ex, map.in = map.in_ex,
parents = sample_parents, n.parents = 4, models = "rrBLUP")
knitr::kable(head(subset(mp_out2, trait == "Yield")))
parent1 | parent2 | parent3 | parent4 | trait | pred_mu | pred_varG | pred_musp_low | pred_musp_high | FHB | DON | Yield | Height | pred_cor_musp_low_FHB | pred_cor_musp_low_DON | pred_cor_musp_low_Yield | pred_cor_musp_low_Height | pred_cor_musp_high_FHB | pred_cor_musp_high_DON | pred_cor_musp_high_Yield | pred_cor_musp_high_Height | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | FEG105-33 | FEG183-25 | FEG43-82 | M119 | Yield | 97.29426 | 8.158764 | 92.28141 | 102.3071 | 0.0713393 | 0.3085768 | NA | -0.2212458 | 25.62796 | 22.45121 | NA | 81.83201 | 26.10974 | 24.57670 | NA | 79.41704 |
7 | FEG105-33 | FEG183-25 | FEG43-82 | MN00-51 | Yield | 100.00013 | 9.442453 | 94.60732 | 105.3929 | 0.2330148 | 0.3650999 | NA | -0.4313815 | 25.03099 | 21.53116 | NA | 83.20797 | 26.61280 | 24.07334 | NA | 78.64447 |
11 | FEG105-33 | FEG183-25 | FEG43-82 | MN01-87 | Yield | 98.96977 | 8.776115 | 93.77072 | 104.1688 | 0.2269710 | 0.3567679 | NA | -0.4463608 | 25.82477 | 22.15697 | NA | 82.79400 | 27.18251 | 24.41795 | NA | 77.39818 |
15 | FEG105-33 | FEG183-25 | FEG43-82 | MN96-186 | Yield | 100.06909 | 9.815362 | 94.57082 | 105.5674 | 0.2314497 | 0.3752859 | NA | -0.4381169 | 24.97936 | 21.87328 | NA | 83.44308 | 26.54520 | 24.51590 | NA | 78.94722 |
19 | FEG105-33 | FEG183-25 | FEG43-82 | MN98-34 | Yield | 98.79925 | 9.162671 | 93.48693 | 104.1116 | 0.1589555 | 0.2866203 | NA | -0.2997278 | 26.05402 | 22.33769 | NA | 81.56223 | 26.99360 | 24.12245 | NA | 77.64547 |
23 | FEG105-33 | FEG183-25 | FEG43-82 | MN98-60 | Yield | 97.95432 | 9.913708 | 92.42857 | 103.4801 | 0.0573674 | 0.3384932 | NA | -0.2880704 | 25.68047 | 22.64719 | NA | 81.50776 | 26.02667 | 24.99225 | NA | 78.18482 |
Bernardo, Rex. 2010. Breeding for Quantitative Traits in Plants. 2nd ed. Woodbury, Minnesota: Stemma Press.
Mohammadi, Mohsen, Tyler Tiede, and Kevin P. Smith. 2015. “PopVar: A Genome-Wide Procedure for Predicting Genetic Variance and Correlated Response in Biparental Breeding Populations.” Crop Science 55 (5): 2068–77. https://doi.org/10.2135/cropsci2015.01.0030.
Schnell, F. W., and H. F. Utz. 1975. “F1-leistung und elternwahl euphyder züchtung von selbstbefruchtern.” In Bericht über Die Arbeitstagung Der Vereinigung Österreichischer Pflanzenzüchter, 243–48. Gumpenstein, Austria: BAL Gumpenstein.