Hadoop distributed file system mechanism for processing large datasets across computer clusters using programming techniques
Keywords:
file, hadoop, memory, pipeline, system
Abstract
In this paper, we show that the performance of HDFS I/O operations is improved by integrating set associativity into the cache design and by changing the write pipeline topology to a fully connected digraph. In the read operation, a set-associative cache offers far more candidate locations (words) than direct mapping, so the miss ratio is very low; this reduces the swapping of data between main memory and cache memory and thereby improves memory I/O performance. In the write operation, instead of using the sequential pipeline, we construct a fully connected graph over the data blocks listed in the NameNode metadata. In the sequential pipeline, the data is first copied to the source node of the pipeline; the source node then copies the data to the next data block in the pipeline, and the same copy step repeats until the last data block in the pipeline is reached. The acknowledgment must follow the same path in reverse, from the last block back to the source block. Consequently, the time required to transfer the data to all the data blocks in the pipeline and to complete the acknowledgment is almost 2n times the time needed to copy the data from one data block to another (where n is the replication factor).
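To make the timing argument above concrete, the sketch below models the two write topologies under the simplifying assumption of a uniform block-copy time between any two nodes and with network contention ignored. The class and method names are ours for illustration only; they are not taken from the paper or from the HDFS code base.

```java
// A minimal back-of-the-envelope latency model, assuming a uniform per-hop copy
// time. Names and numbers are illustrative, not part of HDFS or of this paper.
public class ReplicationLatencySketch {

    // Sequential pipeline: data is forwarded node-by-node through all n replicas,
    // and the acknowledgment travels back along the same chain, giving roughly
    // 2 * n hops in total (the 2n factor quoted in the abstract).
    static double sequentialPipelineTime(int replicationFactor, double hopTime) {
        return 2.0 * replicationFactor * hopTime;
    }

    // Fully connected digraph: the writer pushes the block to every replica in
    // parallel and collects acknowledgments directly, so the latency is roughly
    // one copy plus one acknowledgment, independent of the replication factor.
    static double fullyConnectedTime(double hopTime) {
        return 2.0 * hopTime;
    }

    public static void main(String[] args) {
        int n = 3;        // default HDFS replication factor
        double t = 1.0;   // normalized time to copy one block between two nodes
        System.out.printf("sequential pipeline  : %.1f units%n", sequentialPipelineTime(n, t));
        System.out.printf("fully connected graph: %.1f units%n", fullyConnectedTime(t));
    }
}
```

With the default replication factor of 3, this model gives about 6 copy-time units for the sequential pipeline (matching the 2n estimate above) versus roughly 2 units when the writer fans out to all replicas in parallel; in practice the parallel fan-out is bounded by the writer's outbound bandwidth, which this simple model does not capture.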
Articles published in the International Research Journal of Management, IT and Social sciences (IRJMIS) are available under Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IRJMIS right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.
Articles published in IRJMIS can be copied, communicated and shared in their published form for non-commercial purposes provided full attribution is given to the author and the journal. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
This copyright notice applies to articles published in IRJMIS volumes 7 onwards. Please read about the copyright notices for previous volumes under Journal History.