DREAM

Distributed RDF Engine with Adaptive Query
Optimization & Minimal Communication

A Quadrant-IV Paradigm

Results



We fully implemented DREAM in C using MPICH 3.0.4 [1] and evaluated it on both a private cloud and a public cloud (i.e., Amazon EC2). We further compared DREAM against two state-of-the-art distributed RDF systems, Huang et al. [2] and H2RDF+ [3]. As workloads, we used two standard benchmark suites, YAGO2 [4] and LUBM [5]. On average, DREAM outperformed Huang et al. and H2RDF+ by 81% and 91%, respectively. In addition, DREAM reduced network traffic by averages of 16% and 13.4% versus Huang et al. and H2RDF+, respectively. Lastly, we assessed the scalability of DREAM on Amazon EC2 using large-scale datasets ranging from 3 billion triples (or 700 GB) to 7 billion triples (or 1.2 TB). The observed runtimes show that DREAM scales well to such large datasets. For more details on these experiments, as well as further studies, please refer to Section 4 of the "DREAM: Distributed RDF Engine with Adaptive Query Planner and Minimal Communication" paper.
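
As a rough illustration of the kind of MPICH-based communication pattern such an engine can rely on, the sketch below shows a master rank broadcasting a small, serialized query plan to all ranks and gathering back only compact per-worker statistics, so that plans and statistics travel over the network rather than bulk data. Every name in the sketch (PLAN_MAX, evaluate_subquery, the dummy SPARQL string) is a hypothetical placeholder and not taken from DREAM's actual code base.

    /* Hypothetical sketch: broadcast a small plan, gather small statistics. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define PLAN_MAX 256

    /* Placeholder for local subquery evaluation; returns a result count. */
    static long evaluate_subquery(const char *plan, int rank)
    {
        (void)plan;                 /* a real engine would parse and run the plan */
        return 1000L * (rank + 1);  /* dummy result count for illustration */
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        char plan[PLAN_MAX] = {0};
        long local_count;
        long *counts = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* The master serializes the chosen query plan; small by design. */
            strncpy(plan, "SELECT ?s WHERE { ?s <p> <o> }", PLAN_MAX - 1);
        }

        /* Ship only the plan text to every rank, not the RDF data itself. */
        MPI_Bcast(plan, PLAN_MAX, MPI_CHAR, 0, MPI_COMM_WORLD);

        /* Each rank evaluates the plan against its locally stored data. */
        local_count = evaluate_subquery(plan, rank);

        if (rank == 0)
            counts = malloc((size_t)size * sizeof *counts);

        /* Send back only compact per-worker statistics (one long each). */
        MPI_Gather(&local_count, 1, MPI_LONG, counts, 1, MPI_LONG, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            long total = 0;
            for (int i = 0; i < size; i++)
                total += counts[i];
            printf("total matches across %d workers: %ld\n", size, total);
            free(counts);
        }

        MPI_Finalize();
        return 0;
    }

The sketch builds and runs with the standard MPICH toolchain (mpicc to compile, mpiexec to launch).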

[1] MPICH. https://www.mpich.org/.
[2] Jiewen Huang, Daniel J. Abadi, and Kun Ren. Scalable SPARQL querying of large RDF graphs. PVLDB, 4(11), 2011.
[3] Nikolaos Papailiou, Ioannis Konstantinou, Dimitrios Tsoumakos, Panagiotis Karras, and Nectarios Koziris. H2RDF+: High-performance distributed joins over large-scale RDF graphs. In IEEE Big Data, 2013.
[4] Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Edwin Lewis-Kelham, Gerard de Melo, and Gerhard Weikum. YAGO2: Exploring and querying world knowledge in time, space, context, and many languages. In WWW Companion, 2011.
[5] The LUBM Benchmark. http://swat.cse.lehigh.edu/projects/lubm/.