Join query enhancement processing (jqpro) with big rdf data on a distributed system using hashing-merge join technique

Nahla Mohammedelzein, Elawad Babiker (2021) Join query enhancement processing (jqpro) with big rdf data on a distributed system using hashing-merge join technique. PhD thesis, Universiti Malaysia Pahang (Contributors, Thesis advisor: Mazlina, Abdul Majid).

Join query enhancement processing (jqpro) with big rdf data on a distributed system using hashing-merge join - Accepted Version


Semantic web technologies have emerged over the last few years across many fields of study, and their data are still growing rapidly. In particular, increased capabilities for storing and publishing data in standard open web formats have made the technology far more successful, since data published this way are both human-readable and machine-processable. As the number of Resource Description Framework (RDF) triples increases, so does the demand for complex multiple RDF queries. Such complex queries often produce many common subexpressions, which makes it extremely challenging to reduce the number of RDF queries and the transmission time over vast amounts of related RDF data. Moreover, recent literature shows that join query processing of Big RDF data suffers from many problems with respect to execution time and throughput. Hash-based encoding yields low execution times, but loading takes a long time, so not all graphs can be loaded. This is because RDF collects and analyses large data in swarms and must therefore deal with the inherent challenge of efficient swarm storage. Efficient storage and retrieval of large amounts of potentially schema-less data has also proven exceedingly difficult for RDF data stores. For instance, it is particularly difficult to work with semantic and SPARQL query languages, as well as with huge and complex graph patterns. To address these problems, a Join Query Processing Model (JQPro) is introduced for Big RDF data. The objectives of this research are: (i) to formulate plan generator algorithms for join query processing on the basis of previous research; (ii) to develop an enhanced Join Query Processing (JQPro) model based on SPARQL and Hadoop MapReduce that uses a hashing-merge join technique to process Big RDF data; and (iii) to evaluate and compare the performance of the JQPro model with existing models in terms of execution time, throughput, and CPU utilization.
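Objective (ii) names a hashing-merge join on Hadoop MapReduce, but the abstract gives no implementation details. The following is a minimal single-process Python sketch under stated assumptions, not the JQPro implementation: RDF terms are dictionary-encoded to integer IDs (one common reading of "hash-based encoding"), a map step tags the bindings of two triple patterns by their shared subject, a hypothetical `shuffle` step hash-partitions those bindings across reducers, and each reducer sorts its partition and merges the matching bindings. The toy triples and all function names are illustrative only.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksFor", "ump"),
    ("bob", "worksFor", "ump"),
    ("carol", "worksFor", "uitm"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]


def encode(triples):
    """Dictionary-encode every RDF term to a compact integer ID."""
    term_to_id = {}
    encoded = [tuple(term_to_id.setdefault(t, len(term_to_id)) for t in triple)
               for triple in triples]
    return encoded, term_to_id


def map_phase(encoded, left_pred, right_pred):
    """Emit (join_key, (tag, value)) for two triple patterns sharing ?subject."""
    for s, p, o in encoded:
        if p == left_pred:
            yield s, ("L", o)
        elif p == right_pred:
            yield s, ("R", o)


def shuffle(pairs, num_reducers):
    """Hash-partition intermediate pairs so equal keys reach the same reducer."""
    partitions = defaultdict(list)
    for key, tagged in pairs:
        partitions[hash(key) % num_reducers].append((key, tagged))
    return partitions


def reduce_phase(partition):
    """Sort one partition by key, then merge the L and R bindings of each key."""
    partition.sort(key=lambda kv: kv[0])
    for key, group in groupby(partition, key=lambda kv: kv[0]):
        group = list(group)
        left = [v for _, (tag, v) in group if tag == "L"]
        right = [v for _, (tag, v) in group if tag == "R"]
        for lv in left:
            for rv in right:
                yield key, lv, rv


def join(triples, left_pred, right_pred, num_reducers=2):
    """Answer `?x left_pred ?a . ?x right_pred ?b` and decode the results."""
    encoded, term_to_id = encode(triples)
    id_to_term = {i: t for t, i in term_to_id.items()}
    pairs = map_phase(encoded, term_to_id[left_pred], term_to_id[right_pred])
    results = []
    for _, partition in sorted(shuffle(pairs, num_reducers).items()):
        for key, lv, rv in reduce_phase(partition):
            results.append((id_to_term[key], id_to_term[lv], id_to_term[rv]))
    return sorted(results)


if __name__ == "__main__":
    print(join(TRIPLES, "worksFor", "name"))
```

For the pattern `?x worksFor ?org . ?x name ?n`, `join(TRIPLES, "worksFor", "name")` returns `[('alice', 'ump', 'Alice'), ('bob', 'ump', 'Bob')]`; `carol` has no `name` triple and drops out. Hash partitioning sends every binding for a given key to the same reducer, so each reducer can merge locally without cross-partition communication.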
Throughput was employed to measure the amount of information that a system can process per unit of time. In addition, CPU utilization was measured as an important resource in big join query processing, particularly during the map and reduce phases. Furthermore, the hash-join and sort-merge algorithms were used for join query processing because they allow more datasets to be joined. In the sort-merge join, both inputs are sorted on the join attribute, so that tuples with the same join-column value are grouped together, and the sorted relations are then merged to find matching tuples. A processing framework for RDF queries was then introduced, and benchmarks were used for performance evaluation. Finally, standard statistical analysis was used to validate and compare the performance of the JQPro model with current models. The synthetic benchmarks Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) v06 were used for measurement. The experiments were carried out on three datasets ranging from 10 million to 1 billion RDF triples, produced by the WatDiv data generator with scale factors of 10, 100 and 1000, respectively. For LUBM, datasets of 500, 1000 and 2000 million triples were used, with a selective dataset for each experimental query. The results revealed a strong correlation between execution time and throughput, with a Pearson correlation coefficient of 99.9%. Furthermore, the findings show that the JQPro solution was comparable to gStore, RDF-3X, RDFox and PARJ, with an 87.77% improvement in execution time.
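The sort-merge join described above can be illustrated with a short Python sketch. It assumes both inputs fit in memory as lists of tuples and that the join attribute is given by a column index; the function name and sample bindings are hypothetical, and this is a generic textbook version rather than the thesis's MapReduce code. Runs of equal keys on both sides are cross-joined, so duplicate join values are handled correctly.

```python
def sort_merge_join(left, right, left_key, right_key):
    """Equi-join two lists of tuples by sorting both on the join attribute and merging."""
    left = sorted(left, key=lambda row: row[left_key])
    right = sorted(right, key=lambda row: row[right_key])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1  # advance the side with the smaller key
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][left_key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == rk:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    out.append(l_row + r_row)
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    # Hypothetical bindings for ?x worksFor ?org and ?x name ?n.
    works = [("bob", "ump"), ("alice", "ump"), ("carol", "uitm"), ("alice", "uitm")]
    names = [("bob", "Bob"), ("alice", "Alice")]
    print(sort_merge_join(works, names, 0, 0))
```

With the sample bindings, the call returns `[('alice', 'ump', 'alice', 'Alice'), ('alice', 'uitm', 'alice', 'Alice'), ('bob', 'ump', 'bob', 'Bob')]`. After sorting, each input is scanned only once, which is why sort-merge joins suit large inputs that are already, or cheaply, ordered on the join attribute.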
CPU utilization increased significantly as a result of extensive map and reduce computation. It is therefore inferred that the JQPro solution is timely and innovative, as it provides efficient execution time and CPU utilization, allowing users to query Big RDF data more effectively and seamlessly.

Item Type: Thesis (PhD)
Additional Information: Thesis (Doctor of Philosophy) -- Universiti Malaysia Pahang – 2021, SV: Ts. Dr. Mazlina Binti Abdul Majid, NO.CD: 13247
Uncontrolled Keywords: Join query enhancement processing (jqpro), big rdf data, hashing-merge join technique
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Faculty/Division: Institute of Postgraduate Studies
Faculty of Computing
Depositing User: Mr. Nik Ahmad Nasyrun Nik Abd Malik
Date Deposited: 25 Aug 2023 02:15
Last Modified: 25 Aug 2023 02:15