Hadoop Distributed File System mechanism for processing large datasets across computer clusters using programming techniques
In this paper, we have shown that the performance of HDFS I/O operations improves by integrating set associativity into the cache design and by replacing the sequential write pipeline with a fully connected digraph topology. For read operations, a set-associative cache offers far more candidate locations (words) for a block than direct mapping, so the miss ratio is much lower; this reduces the swapping of data between main memory and cache memory and thereby improves memory I/O performance. For write operations, instead of using the sequential pipeline, we construct a fully connected graph over the data blocks listed in the NameNode metadata. In the sequential pipeline, the data is first copied to the source node of the pipeline; the source node then copies it to the next DataNode, and this copy process repeats until the last DataNode in the pipeline is reached. The acknowledgment must travel back along the same path, from the last node to the source. Consequently, the time required to replicate the data to all DataNodes in the pipeline and to complete the acknowledgment process is almost 2n times the copy time between two DataNodes (for a replication factor of n).
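The latency argument above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it assumes a unit-cost model in which copying one block between any two DataNodes (and returning one acknowledgment) each takes one time unit, and it ignores bandwidth contention at the source node. The function names and the cost model are assumptions introduced here for illustration.

```python
def sequential_pipeline_latency(n: int) -> int:
    """Sequential HDFS write pipeline with replication factor n:
    the block hops across n-1 links (source -> ... -> last node),
    then the acknowledgment traverses the same n-1 links in reverse,
    giving a total of 2*(n-1), i.e. almost 2n, unit copy times."""
    data_transfer = n - 1      # forward hops through the pipeline
    acknowledgments = n - 1    # ack hops back to the source
    return data_transfer + acknowledgments


def fully_connected_latency(n: int) -> int:
    """Fully connected digraph over the DataNodes: the source sends to
    all n-1 replicas in parallel and collects all acknowledgments in
    parallel, so latency stays at roughly 2 unit copy times for any n
    (one parallel send round plus one parallel ack round)."""
    return 2 if n > 1 else 0


# Compare the two topologies for a few replication factors.
for n in (3, 5, 10):
    print(f"n={n}: pipeline={sequential_pipeline_latency(n)} units, "
          f"fully connected={fully_connected_latency(n)} units")
```

Under this model the pipeline cost grows linearly with the replication factor while the fully connected topology stays constant, which is the intuition behind the proposed change; real deployments would also have to account for the source node's outbound bandwidth being shared across the parallel transfers.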
Copyright (c) 2019 International research journal of management, IT and social sciences
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.