Abstract: Increasing demand to provision real-time smart application services has led many organizations to collect data continuously. As a result, the volume of collected data has grown very large. MapReduce is the preferred framework for processing massive data. Hadoop is a widely used MapReduce framework across the community because it is open source. Cloud service providers such as Microsoft Azure HDInsight offer resources to customers, who pay only for what they use. Minimizing execution time on such platforms is therefore highly desirable. This work presents a novel makespan model for the Hadoop MapReduce framework, namely OHMR (Optimized Hadoop MapReduce), to process data in real time and utilize system resources efficiently. OHMR presents an accurate model to compute job makespan time and also presents a model to provision the amount of cloud resources required to meet task deadlines. Experiments are conducted on the Microsoft Azure HDInsight cloud platform, considering a bioinformatics application, to evaluate the performance of OHMR against the existing model; the results show a significant improvement in computation time. Overall, good correlation is reported between practical and theoretical makespan values.

Keywords: Big data, Cloud computing, Hadoop, MapReduce, Parallel computing, Scheduler.


DOI: 10.17148/IARJSET.2020.7630
