What exactly is a Bayes model? I asked the author of BGLR

Published: 2023-09-26 22:13:46 | Author: 生物信息与育种 (Bioinformatics and Breeding)

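Many genomic selection tutorials, including papers, R package documentation, and demos, take a very academic approach: split the data into a training set and a test set, run model fitting and prediction in one go, and compare the accuracies.

In real applications, however, model building and prediction are usually separate steps: the model is built on a reference population and then applied to a test population. If you do not understand this process, you never learn what the model actually is, what it looks like, or what parameters it has, let alone how the test population is predicted and how its genotypes are plugged into the model.

As we know, an RRBLUP model consists of a set of marker effects plus beta (the fixed-effect coefficient). To predict new data, multiply the additive genotype matrix by the additive marker effects and add beta. If dominance is modeled (for example, when predicting hybrids), also add the dominance genotype matrix multiplied by the dominance marker effects.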

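## additive
## AddCode: additive genotype codes (individuals x markers)
## AddEffect: estimated additive marker effects; Beta: fixed-effect estimate
AddValue = as.matrix(AddCode) %*% AddEffect
Predicted = as.data.frame(AddValue + Beta)

## additive + dominance
## DomCode: dominance genotype codes; DomEffect: estimated dominance marker effects
AddValue = as.matrix(AddCode) %*% AddEffect
DomValue = as.matrix(DomCode) %*% DomEffect
Predicted = as.data.frame(AddValue + DomValue + Beta)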
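To make the matrix arithmetic above concrete, here is a minimal self-contained sketch in base R. All numbers are made up for illustration: the genotype codes, the marker effects, and `Beta` are assumptions, not output from a real rrBLUP fit, and the dominance coding (1 for heterozygotes, 0 otherwise) is one common convention rather than the only possible one.

```r
# Toy genotypes: 3 new individuals x 4 markers, additive coding 0/1/2 (made-up values)
AddCode <- matrix(c(0, 1, 2, 1,
                    2, 0, 1, 1,
                    1, 1, 0, 2), nrow = 3, byrow = TRUE)

# Assumed dominance coding: 1 for heterozygotes, 0 for homozygotes
DomCode <- (AddCode == 1) * 1

# Hypothetical parameters standing in for what a fitted model would provide
AddEffect <- c(0.5, -0.2, 0.1, 0.3)   # additive marker effects
DomEffect <- c(0.1, 0.0, -0.1, 0.2)   # dominance marker effects
Beta <- 10                            # fixed effect (here just an intercept)

# Additive-only prediction
AddValue <- AddCode %*% AddEffect
PredictedAdd <- as.numeric(AddValue + Beta)

# Additive + dominance prediction
DomValue <- DomCode %*% DomEffect
PredictedAddDom <- as.numeric(AddValue + DomValue + Beta)

print(data.frame(PredictedAdd, PredictedAddDom))
```

The point is that the "model" is nothing more than the stored effect vectors and `Beta`; once those are saved, any new genotype matrix with the same markers in the same column order can be predicted without refitting.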

Prediction with classical machine-learning models is even simpler: call a single function on the saved model object. Here is an SVM example:

library(kernlab)
model_classifier <- ksvm(letter ~ ., data = train_data,
                          kernel = "vanilladot")  

predict(model_classifier, test_data)

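So what form does a Bayes model take, and how do you predict new data with it?

About two and a half years ago I explored BGLR, a feature-rich R package that implements the common Bayesian and BLUP methods. Its official documentation (https://github.com/gdlc/BGLR-R) describes usage in great detail, so I will not cover that here. I knew how to run it, but when I tried to compute predictions for new data from the fitted model, the results did not match. I wrote to the BGLR author, Paulino Perez Rodríguez, and to my surprise he replied very quickly, which cleared up the questions above.

Below is the email exchange between a beginner and an expert. I hope it helps newcomers.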

Q

Sent: Thursday, 15 April 2021 3:39
To: Paulino Perez Rodríguez
Subject: Ask the R package BGLR how to predict the phenotype based on marker effect

Dear professor,

I am XX from XX. The R package BGLR you developed is very awesome and I have benefited a lot. But I don't know how to calculate the phenotype when I want to use the built model to predict my new data. I know that the return value fit$yHat of BGLR represents the predicted value, and fit$ETA[[1]]$b is the marker effect. In the R package rrBLUP, the result of m_valid %*% snp_effect + fit$beta is used to calculate the new phenotype prediction value.

So, how can I use the new genotype to get the predicted phenotype based on the marker effect in BGLR?

Thank you very much and looking forward to your reply.

Best Wishes

A

From: Paulino Perez Rodríguez [mailto:perpdgo@colpos.mx]
Sent: 15 April 2021 23:39
Subject: Re: Ask the R package BGLR how to predict the phenotype based on marker effect

Hi,

just assign missing values to the phenotypes that you want to predict when fitting the model. fit$yHat will provide the predictions for both observed and unobserved phenotypes.

Q

Sent: Thursday, 15 April 2021 21:00
To: Paulino Perez Rodríguez
Subject: Re: Ask the R package BGLR how to predict the phenotype based on marker effect

Dear Prof.

Thank you for your reply! But maybe I didn’t express my meaning clearly. I want to use the current data to build a model to predict the next data instead of modeling every batch of data. Therefore, I need to use the parameters of the model to calculate the phenotype of the next data. fit$yHat is only the result of the current data prediction, but what should I do if I want to calculate the phenotype of the next data based on the current model?

In other words, I have built a model with old data, and now I have a batch of genotype data of new materials, how do I predict their phenotype?

Thanks again!

Regards

A

From: Paulino Perez Rodríguez [mailto:perpdgo@colpos.mx]
Sent: 16 April 2021 11:02
Subject: Re: Ask the R package BGLR how to predict the phenotype based on marker effect

Just obtain the marker effects for the training set and then with the markers for the new materials do as follows:

GEBV = Xnew*betaHat 

where GEBV are the Genomic estimated breeding value

Xnew the matrix of markers for new materials

betaHat the marker effects estimated using training data (old data).

This is an option, another option is to use GBLUP.

See attachments for equations, slides 17 and 19.

Q

Sent: Thursday, 15 April 2021 22:45
To: Paulino Perez Rodríguez
Subject: Re: Ask the R package BGLR how to predict the phenotype based on marker effect

Dear Prof.

Thank you for your reply so quickly! I probably understand. But what is the relationship between GEBV and yHat? I tried to calculate X%*%beta+varE, but it is not equal to yHat. Can GEBV be used for phenotype selection?

Thanks again
Regards

A

From: Paulino Perez Rodríguez [mailto:perpdgo@colpos.mx]
Sent: 17 April 2021 0:19
Subject: Re: Ask the R package BGLR how to predict the phenotype based on marker effect

actually if you are going to rank individuals you do not need yHat, just the GEBVs

in this particular model,

yHat = muHat + GEBV

X%*%beta+varE does not make sense,

Regards.

Over

I got it. Thank you very much!

Regards.

After reading this, I trust that both my confusion and the answer are clear to you.
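The two options from the replies can be sketched in R as follows. This is only an illustration under stated assumptions: the genotypes and phenotypes are simulated, the model choice (`"BRR"`) and the iteration counts are arbitrary picks of mine, and the `old`/`new` split is hypothetical. The accessors `fit$mu`, `fit$ETA[[1]]$b`, and `fit$yHat` are the ones discussed in the emails. Note that `fit$varE` is the estimated residual variance, not an intercept, which is why adding it to `X %*% beta` "does not make sense".

```r
library(BGLR)

set.seed(1)
n <- 120; p <- 50                                 # simulated individuals and markers
X <- matrix(rbinom(n * p, 2, 0.3), n, p)          # 0/1/2 genotype codes
b_true <- rnorm(p, sd = 0.3)                      # simulated "true" marker effects
y <- as.numeric(5 + X %*% b_true + rnorm(n))      # simulated phenotypes

old <- 1:100                                      # hypothetical training material
new <- 101:120                                    # hypothetical new material

# Option 1: estimate marker effects on the old data, apply them to new genotypes
fit <- BGLR(y = y[old],
            ETA = list(list(X = X[old, ], model = "BRR")),
            nIter = 3000, burnIn = 1000, verbose = FALSE,
            saveAt = file.path(tempdir(), "old_"))  # keep BGLR's sample files out of the working dir

betaHat  <- fit$ETA[[1]]$b                        # estimated marker effects
GEBV_new <- as.numeric(X[new, ] %*% betaHat)      # sufficient for ranking candidates
yHat_new <- fit$mu + GEBV_new                     # same ranking, on the phenotype scale

# Check on the training data that yHat = muHat + GEBV
gap <- max(abs(fit$yHat - (fit$mu + as.numeric(X[old, ] %*% betaHat))))

# Option 2 (the first reply): mask the new phenotypes with NA and fit everything together
yNA <- y
yNA[new] <- NA
fitNA <- BGLR(y = yNA,
              ETA = list(list(X = X, model = "BRR")),
              nIter = 3000, burnIn = 1000, verbose = FALSE,
              saveAt = file.path(tempdir(), "na_"))
yHat_new2 <- fitNA$yHat[new]                      # predictions for the masked individuals
```

In practice, option 1 means the saved "model" is just `fit$mu` and the vector `fit$ETA[[1]]$b`. The new genotype matrix must use the same markers, in the same column order and with the same coding, as the training matrix. Both fits are MCMC runs, so the two options should give similar but not identical predictions.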