Identification of functionally methylated regions based on discriminant analysis through integrating methylation and gene expression data
Abstract
DNA methylation is essential not only in cellular differentiation but also in disease. Identifying differentially methylated patterns between case and control groups is important for understanding the mechanism and possible functionality of complex diseases. We propose a method to find candidate functionally methylated regions, that is, regions that are not only differentially methylated but also affect gene expression. The method integrates methylation and gene expression data and is based on distance discriminant analysis (DDA). When identifying differentially methylated regions (DMRs), it does not require clustering methylation sites or partitioning the genome in advance. As a result, the identified DMRs have larger coverage than those found by bump hunting and Ong's method. Furthermore, by incorporating gene expression data as a complementary source, we determine whether these DMRs are functional by estimating the difference in expression of the corresponding genes. In a comparison with bump hunting and Ong's method on simulated data, our method is more powerful in identifying DMRs whose sites lie farther apart in the genome or that consist of only a few sites, and it achieves higher sensitivity and specificity. It is also more robust to heterogeneity in the data. Applied to different real datasets, our method shows that most functional DMRs are hyper-methylated and located in CpG-rich regions (e.g. islands, TSS200 and TSS1500), consistent with the fact that the methylation levels of CpG islands are higher in tumors than in normal tissue. Comparing the results across datasets, we find that methylation changes in some regions may be related to disease through changes in the expression of the corresponding genes, which demonstrates the effectiveness of our method.
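As a rough illustration of the distance discriminant idea that the method builds on, the following Python sketch assigns a sample to whichever group (case or control) has the nearer centroid over a set of methylation sites. It is a textbook nearest-centroid rule under simplifying assumptions, not the procedure proposed in this work: the function names, the toy beta values, and the per-site pooled variance (a diagonal stand-in for a full covariance matrix) are all assumptions made for illustration.

```python
from statistics import mean, variance


def centroid(samples):
    """Per-site mean across samples; each sample is a list of beta values."""
    return [mean(site) for site in zip(*samples)]


def pooled_variance(case, control, floor=1e-6):
    """Per-site pooled variance of the two groups (assumed diagonal covariance).

    The floor avoids division by zero for sites with no variation.
    """
    n1, n2 = len(case), len(control)
    pooled = []
    for site_case, site_ctrl in zip(zip(*case), zip(*control)):
        v = ((n1 - 1) * variance(site_case)
             + (n2 - 1) * variance(site_ctrl)) / (n1 + n2 - 2)
        pooled.append(max(v, floor))
    return pooled


def squared_distance(x, center, var):
    """Variance-scaled squared distance from sample x to a group centroid."""
    return sum((xi - ci) ** 2 / vi for xi, ci, vi in zip(x, center, var))


def classify(x, case, control):
    """Assign x to the group whose centroid is nearer (hypothetical helper)."""
    var = pooled_variance(case, control)
    d_case = squared_distance(x, centroid(case), var)
    d_ctrl = squared_distance(x, centroid(control), var)
    return "case" if d_case < d_ctrl else "control"


# Toy data: three samples per group, two methylation sites each (made up).
case_group = [[0.80, 0.90], [0.85, 0.80], [0.90, 0.85]]
control_group = [[0.20, 0.10], [0.15, 0.20], [0.10, 0.15]]

label_high = classify([0.82, 0.88], case_group, control_group)
label_low = classify([0.12, 0.18], case_group, control_group)
```

Intuitively, a region in which samples can be assigned to their true group reliably by such a rule is one where the two groups differ in methylation; how the distances are actually turned into DMR calls and combined with expression data is specific to the method and is not reproduced here.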