LDAEXC: LncRNA-Disease Associations Prediction with Deep Autoencoder and XGBoost Classifier.

Posted: 2023-12-08 09:50:26 | Author: 王闯wangchuang2017
Abstract
Substantial scientific evidence has revealed that long non-coding RNAs (lncRNAs) are involved in the progression of complex human diseases and in biological life activities. Therefore, identifying novel, potentially disease-related lncRNAs is helpful for the diagnosis, prognosis and therapy of many complex human diseases. Since traditional laboratory experiments are costly and time-consuming, a large number of computational algorithms have been proposed for predicting the relationships between lncRNAs and diseases. However, there is still much room for improvement. In this paper, we introduce an accurate framework named LDAEXC to infer LncRNA-Disease Associations with a deep autoencoder and an XGBoost Classifier. LDAEXC uses different similarity views of lncRNAs and human diseases to construct features for each data source. The constructed feature vectors are then fed into a deep autoencoder to obtain reduced features, and finally an XGBoost classifier computes the latent lncRNA-disease association scores from the reduced features. Fivefold cross-validation experiments on four datasets showed that LDAEXC reached AUC scores of 0.9676 ± 0.0043, 0.9449 ± 0.022, 0.9375 ± 0.0331 and 0.9556 ± 0.0134, respectively, significantly higher than those of other advanced computational methods. Extensive experimental results and case studies of two complex diseases (colon and breast cancers) further indicated the practicality and excellent prediction performance of LDAEXC in inferring unknown lncRNA-disease associations.

LDAEXC uses disease semantic similarity, lncRNA expression similarity, and the Gaussian interaction profile (GIP) kernel similarity of lncRNAs and diseases for feature construction. The constructed features are fed into a deep autoencoder to extract reduced features, and an XGBoost classifier predicts the lncRNA-disease associations from the reduced features.
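To make the feature-construction step concrete, here is a minimal sketch, not the authors' code. It computes GIP kernel similarity with the standard definition used in association prediction, K[i, j] = exp(-γ‖p_i − p_j‖²) with γ = γ′ / mean‖p_i‖², where γ′ = 1 is an assumed default. It then builds one feature vector per (lncRNA, disease) pair by concatenating the lncRNA's row from every lncRNA similarity view with the disease's row from every disease similarity view. The helpers `pair_features` and `build_dataset`, this concatenation scheme, the toy matrix `A`, and the identity-matrix placeholders for expression and semantic similarity are all hypothetical. The autoencoder and XGBoost stages are not shown; the sketch stops at the feature matrix they would consume.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel similarity between the rows of `profiles`.

    K[i, j] = exp(-gamma * ||p_i - p_j||^2), with gamma = gamma_prime / mean_i ||p_i||^2.
    """
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    # Guard against an all-zero profile matrix (no known associations at all).
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    # Pairwise squared Euclidean distances: ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-gamma * sq_dists)


def pair_features(lnc_views, dis_views, i, j):
    """Hypothetical pair feature: row i of every lncRNA view + row j of every disease view."""
    return np.concatenate([view[i] for view in lnc_views] + [view[j] for view in dis_views])


def build_dataset(assoc, lnc_views, dis_views):
    """Hypothetical helper: one feature row per (lncRNA, disease) pair, labelled by `assoc`."""
    n_lnc, n_dis = assoc.shape
    X = np.array([pair_features(lnc_views, dis_views, i, j)
                  for i in range(n_lnc) for j in range(n_dis)])
    y = assoc.reshape(-1)  # row-major order matches the (i, j) loop above
    return X, y


# Toy association matrix: 4 lncRNAs x 3 diseases (1 = known association).
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)
lnc_gip = gip_kernel(A)    # 4 x 4: rows of A are lncRNA interaction profiles
dis_gip = gip_kernel(A.T)  # 3 x 3: columns of A are disease interaction profiles
# Placeholders for expression / semantic similarity (real ones need external data).
lnc_expr = np.eye(4)
dis_sem = np.eye(3)

x = pair_features([lnc_expr, lnc_gip], [dis_sem, dis_gip], 0, 2)
X, y = build_dataset(A, [lnc_expr, lnc_gip], [dis_sem, dis_gip])
print(x.shape, X.shape)  # (14,) (12, 14)
```

Each pair vector has length 4 + 4 + 3 + 3 = 14 here. With real data its length grows with the number of lncRNAs and diseases, which is one reason a dimensionality-reducing step such as the autoencoder is useful before classification.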
Fivefold and tenfold cross-validation experiments on a benchmark dataset showed that LDAEXC achieved AUC scores of 0.9676 and 0.9682, respectively, significantly higher than those of other state-of-the-art methods.
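The evaluation protocol behind these numbers can be sketched as stratified k-fold cross-validation that reports the mean and standard deviation of the per-fold AUC, the form in which the abstract quotes results such as 0.9676 ± 0.0043. This is an illustration under assumptions: the abstract does not say how folds or negative samples were formed, whether ± is a population or sample standard deviation (`np.std` below uses the population form), or which XGBoost settings were used. `centroid_scorer` is a hypothetical stand-in classifier chosen so the example runs without extra packages, and the data are synthetic.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def stratified_folds(y, k, rng):
    """Split sample indices into k folds with roughly equal positive/negative ratios."""
    y = np.asarray(y)
    pos = rng.permutation(np.flatnonzero(y == 1))
    neg = rng.permutation(np.flatnonzero(y == 0))
    return [np.concatenate([p, n])
            for p, n in zip(np.array_split(pos, k), np.array_split(neg, k))]


def cross_validated_auc(X, y, fit_predict, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold; return (mean AUC, std AUC)."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    aucs = []
    for test_idx in stratified_folds(y, k, rng):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        scores = fit_predict(X[train_mask], y[train_mask], X[test_idx])
        aucs.append(roc_auc(y[test_idx], scores))
    return float(np.mean(aucs)), float(np.std(aucs))


def centroid_scorer(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: higher score = closer to the positive centroid."""
    mu_pos = X_train[y_train == 1].mean(axis=0)
    mu_neg = X_train[y_train == 0].mean(axis=0)
    return (np.linalg.norm(X_test - mu_neg, axis=1)
            - np.linalg.norm(X_test - mu_pos, axis=1))


rng = np.random.default_rng(42)
X = np.vstack([rng.normal(1.0, 1.0, size=(50, 5)),    # 50 synthetic "positive" pairs
               rng.normal(-1.0, 1.0, size=(50, 5))])  # 50 synthetic "negative" pairs
y = np.array([1] * 50 + [0] * 50)
mean_auc, std_auc = cross_validated_auc(X, y, centroid_scorer, k=5)
print(f"5-fold AUC = {mean_auc:.4f} +/- {std_auc:.4f}")
```

With the `xgboost` package installed, an XGBoost scorer could be passed instead, for example `lambda X_tr, y_tr, X_te: xgboost.XGBClassifier().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]`, and the rows of `X` would be the autoencoder-reduced pair features. Setting `k=10` gives the tenfold variant.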
 
大量的科学证据表明,长链非编码RNA ( long non-coding RNAs,lncRNAs )参与了人类复杂疾病的进程和生物生命活动。因此,鉴定新的和潜在的疾病相关lncRNA有助于人类许多复杂疾病的诊断、预后和治疗。由于传统的实验室实验成本高、耗时长,大量的计算机算法被提出用于预测lncRNAs与疾病之间的关系。但是,仍有很大的提升空间。在本文中,我们引入了一个名为LDAEXC的精确框架,通过深度自编码器和XGBoost分类器来推断LncRNA -疾病关联。LDAEXC利用lncRNA和人类疾病的不同相似性视图为每个数据源构建特征。然后,将构建的特征向量输入到深度自编码器中得到约简后的特征,最后利用XGBoost分类器利用约简后的特征计算潜在的lncRNA -疾病关联分数。在4个数据集上的5折交叉验证实验表明,LDAEXC的AUC得分分别达到0.9676 ± 0.0043、0.9449 ± 0.022、0.9375 ± 0.0331和0.9556 ± 0.0134,显著高于其他先进的同类计算机方法。大量的实验结果和两个复杂疾病(结肠癌和乳腺癌)的案例研究进一步表明LDAEXC在推断未知lncRNA -疾病关联方面的实用性和出色的预测性能。TLDAEXC利用疾病语义相似度、lncRNA表达相似度、lncRNA与疾病的高斯交互轮廓核相似度进行特征构建。将构建的特征送入深度自编码器提取降维后的特征,并使用XGBoost分类器基于降维后的特征预测lncRNA -疾病关联。在基准数据集上的五折和十折交叉验证实验表明,LDAEXC能够取得0.9676和0.9682的AUC分数,显著高于其他先进的同类方法。