Q155 : A model for identifying and eliminating noise and bias from Hi-C data
Thesis > Central Library of Shahrood University > Computer Engineering > MSc > 2018
Authors:
Saman Khakmardan [Author], Mohsen Rezvani [Supervisor], Ali Pouyan [Supervisor], Mansoor Fateh [Advisor], Hamid Alinejad Rokny [Advisor]
Abstract: Today, the analysis of medical data can improve our understanding of the structure of the human body and the factors that influence it. Researchers in medicine therefore try to identify these factors in order to prevent the spread of disease. One such factor is the way DNA strands are arranged in the three-dimensional space of the chromosomes. The underlying idea is that, within a folded DNA strand, two DNA regions that lie close together in space can interact, and these interactions have a large impact on body function. Various laboratory protocols have therefore been developed to obtain structural information about chromosomes; one of them is Hi-C. This protocol is subject to many systematic and experimental errors. Consequently, to use Hi-C data and better understand body function, it is essential to remove noise and extract meaningful information from the data. Many studies have addressed this problem, and various statistical methods have been developed to solve it. These methods generally take a global statistical approach to noise detection. However, noise detection can be improved by also considering a local approach. In this thesis, we therefore combine the local and global approaches. To this end, we propose a method based on a deep neural network called an autoencoder, chosen for its ability to remove noise. The proposed method first models the Hi-C data statistically. It then uses the neural network to refine the statistical model by taking into account the influence of data points on one another; in other words, it generates a new model using the neural network. The simulation results show that the proposed method, with 3,771 credible interactions, performed better than the reference method (GOTHiC), with 3,772 valid interactions.
Also, the correlation coefficient between distance and the number of interactions between two regions was -0.1951 for the proposed method and -0.3100 for the reference method. As a result, in the proposed method the number of interactions between two regions depends more closely on the distance between them. In general, this study shows that Hi-C data can be modeled with a neural network and that, based on the model the network produces, noise can be removed and meaningful interactions in Hi-C data can be identified. In addition, alongside the proposed method, we developed a tool called MHiC for removing noise and identifying meaningful interactions in the R environment. Besides the proposed method, the tool implements the GOTHiC, HiCNorm and FitHiC methods for noise removal and detection of meaningful interactions. Unlike existing implementations, each method in the tool accepts input from HiC-Pro, HOMER and HiCUP, as well as input in the format designed for HiCNorm. The tool also visualizes Hi-C data as contact maps and arc diagrams. The tool developed in this study thus enables visualization of Hi-C data and provides a set of existing techniques for noise elimination.
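The correlation between distance and interaction count mentioned in the abstract can be illustrated with a minimal sketch. It assumes the Pearson coefficient is meant (the abstract does not name the coefficient), and the `pearson` helper, the `interactions` list and its bin layout are hypothetical toy data, not the thesis data or the MHiC implementation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy interactions: (bin_i, bin_j, read_count), bins on one chromosome.
interactions = [
    (0, 1, 120), (0, 2, 60), (0, 4, 25),
    (1, 2, 110), (1, 5, 18), (2, 3, 95),
]

# Distance between two regions, measured here in bins (an assumption).
distances = [abs(j - i) for i, j, _ in interactions]
counts = [count for _, _, count in interactions]

r = pearson(distances, counts)
print(f"correlation between distance and interaction count: {r:.4f}")
```

On this toy input the coefficient is negative, because read counts fall as the distance between regions grows; the values quoted in the abstract come from the thesis experiments and are not reproduced by this sketch.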
Keywords:
#Hi-C #Deep Neural Network #noise removal #Bioinformatics #regression
Keeping place: Central Library of Shahrood University