Staff View: Hadoop MapReduce's InputSplit based Indexing for Join Query Processing

Hadoop MapReduce's InputSplit based Indexing for Join Query Processing

Join queries are among the most commonly used forms of query, in which records from two or more tables or files are retrieved to give a comprehensive, comparable and contrasted view of certain data. However, processing join queries comes with higher overhead, since all the tables or files i...

Main Authors:	Ahmad, R., Zakaria, M.N., Abdullahi, A.U.
Format:	Conference or Workshop Item
Institution:	Universiti Teknologi Petronas
Record Id / ISBN-0:	utp-eprints.23589 /
Published:	Institute of Electrical and Electronics Engineers Inc. 2019
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85084312161&doi=10.1109%2fICCSCE47578.2019.9068559&partnerID=40&md5=3d99a4359ec8de94a0728a053c23d1ee http://eprints.utp.edu.my/23589/

id	utp-eprints.23589
recordtype	eprints
spelling	utp-eprints.23589 2021-08-19T07:55:59Z Hadoop MapReduce's InputSplit based Indexing for Join Query Processing Ahmad, R. Zakaria, M.N. Abdullahi, A.U. Join queries are among the most commonly used forms of query, in which records from two or more tables or files are retrieved to give a comprehensive, comparable and contrasted view of certain data. However, processing join queries comes with higher overhead, since all the tables or files involved in the process have to be considered. It can easily be imagined how large that overhead becomes when the data contained in such tables/files is big data. The use of indexing on Hadoop and its abstractions has resulted in improved performance when processing queries. However, even with some of the indexing approaches, join query processing incurs higher overhead, except when the indexing technique reduces the amount of data to be processed before the query processing even starts. One indexing technique that ensures this is the InputSplit based index. This paper showcases how InputSplit based indexing can be implemented in Hadoop MapReduce, as well as the experimental results of running a join query using such an index. The results show at least a 50% reduction in runtime when compared to both normal Hadoop MapReduce and Clustered Index based on blockIds query processing approaches. © 2019 IEEE. Institute of Electrical and Electronics Engineers Inc. 2019 Conference or Workshop Item NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85084312161&doi=10.1109%2fICCSCE47578.2019.9068559&partnerID=40&md5=3d99a4359ec8de94a0728a053c23d1ee Ahmad, R. and Zakaria, M.N. and Abdullahi, A.U. (2019) Hadoop MapReduce's InputSplit based Indexing for Join Query Processing. In: UNSPECIFIED. http://eprints.utp.edu.my/23589/
institution	Universiti Teknologi Petronas
collection	UTP Institutional Repository
description	Join queries are among the most commonly used forms of query, in which records from two or more tables or files are retrieved to give a comprehensive, comparable and contrasted view of certain data. However, processing join queries comes with higher overhead, since all the tables or files involved in the process have to be considered. It can easily be imagined how large that overhead becomes when the data contained in such tables/files is big data. The use of indexing on Hadoop and its abstractions has resulted in improved performance when processing queries. However, even with some of the indexing approaches, join query processing incurs higher overhead, except when the indexing technique reduces the amount of data to be processed before the query processing even starts. One indexing technique that ensures this is the InputSplit based index. This paper showcases how InputSplit based indexing can be implemented in Hadoop MapReduce, as well as the experimental results of running a join query using such an index. The results show at least a 50% reduction in runtime when compared to both normal Hadoop MapReduce and Clustered Index based on blockIds query processing approaches. © 2019 IEEE.
format	Conference or Workshop Item
author	Ahmad, R. Zakaria, M.N. Abdullahi, A.U.
spellingShingle	Ahmad, R. Zakaria, M.N. Abdullahi, A.U. Hadoop MapReduce's InputSplit based Indexing for Join Query Processing
author_sort	Ahmad, R.
title	Hadoop MapReduce's InputSplit based Indexing for Join Query Processing
title_short	Hadoop MapReduce's InputSplit based Indexing for Join Query Processing
title_full	Hadoop MapReduce's InputSplit based Indexing for Join Query Processing
title_fullStr	Hadoop MapReduce's InputSplit based Indexing for Join Query Processing
title_full_unstemmed	Hadoop MapReduce's InputSplit based Indexing for Join Query Processing
title_sort	hadoop mapreduce's inputsplit based indexing for join query processing
publisher	Institute of Electrical and Electronics Engineers Inc.
publishDate	2019
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85084312161&doi=10.1109%2fICCSCE47578.2019.9068559&partnerID=40&md5=3d99a4359ec8de94a0728a053c23d1ee http://eprints.utp.edu.my/23589/
_version_	1741196699993374720
score	11.62408
