Data mining technique applied to DNA sequencing

Authors

  • R. Jamuna, Department of Computer Science, S.R. College, Bharathidasan University, Trichy

Keywords:

DNA, Genome, Technique, Sequencing, Human genome

Abstract

CpG islands (CGIs) play a vital role in genome analysis as genomic markers. Identification of CpG pairs has contributed not only to the prediction of promoters but also to the understanding of the epigenetic causes of cancer. In the human genome, wherever the dinucleotide CG occurs, the C nucleotide (cytosine) typically undergoes a chemical modification (methylation), and there is a relatively high probability that the modified C mutates into a T. For biologically important reasons, this modification process is suppressed in short stretches of the genome, such as ‘start’ regions. In these regions, CpG dinucleotides are found more often than elsewhere; such regions are called CpG islands. DNA methylation is an effective means by which gene expression is silenced. In normal cells, DNA methylation functions to prevent the expression of imprinted and inactive X chromosome genes. In cancerous cells, DNA methylation inactivates tumor-suppressor genes as well as DNA repair genes, which can disrupt cell-cycle regulation. Most current methods for identifying CGIs suffer from various limitations and involve considerable human intervention. This paper presents a simple search technique that applies Markov chain models, as a data mining approach, to gene sequences. A Markov chain model is used to study the probability of occurrence of the C-G pair in a given gene sequence. Maximum likelihood estimators of the transition probabilities have been developed for each model, and the resulting log-odds ratio indicates the presence or absence of CpG islands in the given gene, which yields information relevant to cancer detection in the human genome.
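
The approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes two first-order Markov chains over the nucleotides A, C, G and T: one trained on known CpG island sequences and one trained on non-island (background) sequences. The maximum likelihood estimate of each transition probability is the count of that transition divided by the total count of transitions leaving the same nucleotide. The function names, the pseudocount and the short training strings are hypothetical and chosen only for illustration; a real analysis would use annotated genomic data.

```python
import math

ALPHABET = "ACGT"


def estimate_transitions(sequences, pseudocount=1.0):
    """Maximum likelihood estimate of first-order transition probabilities.

    a[s][t] = c[s][t] / sum over t' of c[s][t'], where c counts how often
    nucleotide t follows nucleotide s in the training sequences.
    The pseudocount is an assumption added here to avoid zero probabilities.
    """
    counts = {s: {t: pseudocount for t in ALPHABET} for s in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for prev, cur in zip(seq, seq[1:]):
            if prev in counts and cur in counts:
                counts[prev][cur] += 1
    probs = {}
    for s in ALPHABET:
        total = sum(counts[s].values())
        probs[s] = {t: counts[s][t] / total for t in ALPHABET}
    return probs


def log_odds(seq, island_model, background_model):
    """Sum of log2 ratios of island to background transition probabilities.

    A positive score favours the island model, a negative score the
    background model. Pairs containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper()
    score = 0.0
    for prev, cur in zip(seq, seq[1:]):
        if prev in island_model and cur in island_model:
            score += math.log2(island_model[prev][cur] / background_model[prev][cur])
    return score


# Hypothetical toy training data, for illustration only.
island_examples = ["CGCGGCGCCGCGCG", "GCGCGCGGCGCGCC"]
background_examples = ["ATTATAAATGCATTTA", "TTAACATGATTTAAAT"]

island_model = estimate_transitions(island_examples)
background_model = estimate_transitions(background_examples)

cg_rich_score = log_odds("CGCGCGCG", island_model, background_model)
at_rich_score = log_odds("ATATTAAT", island_model, background_model)
print(cg_rich_score, at_rich_score)
```

With this toy data the CG-rich query scores above zero and the AT-rich query scores below zero. For sequences of different lengths, dividing the score by the number of transitions would make the values comparable.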

Published

2015-07-31

How to Cite

Jamuna, R. (2015). Data mining technique applied to DNA sequencing. International Research Journal of Management, IT and Social Sciences, 2(7), 15–19. Retrieved from https://sloap.org/journals/index.php/irjmis/article/view/314

Section

Peer Review Articles