Optimal Joins Using Compressed Quadtrees

dc.contributor.authorArroyuelo, Diego
dc.contributor.authorNavarro, Gonzalo
dc.contributor.authorReutter, Juan L.
dc.contributor.authorRojas-Ledesma, Javiel
dc.date.accessioned2025-01-20T21:06:13Z
dc.date.available2025-01-20T21:06:13Z
dc.date.issued2022
dc.description.abstractWorst-case optimal join algorithms have gained a lot of attention in the database literature. We now count several algorithms that are optimal in the worst case, and many of them have been implemented and validated in practice. However, the implementation of these algorithms often requires an enhanced indexing structure: to achieve optimality one either needs to build completely new indexes or must populate the database with several instantiations of indexes such as B+-trees. Either way, this means spending an extra amount of storage space that is typically one or two orders of magnitude more than what is required to store the raw data.
dc.description.abstractWe show that worst-case optimal algorithms can be obtained directly from a representation that regards the relations as point sets in variable-dimensional grids, without the need of any significant extra storage. Our representation is a compressed quadtree for the static indexes and a quadtree built on the fly that shares subtrees (which we dub a qdag) for intermediate results. We develop a compositional algorithm to process full join queries under this representation, which simulates navigation of the quadtree of the output, and show that the running time of this algorithm is worst-case optimal in data complexity.
dc.description.abstractWe implement our index and compare it experimentally with state-of-the-art alternatives. Our experiments show that our index uses even less space than what is needed to store the data in raw form (and replaces it) and one or two orders of magnitude less space than the other indexes. At the same time, our query algorithm is competitive in time, even sharply outperforming other indexes in various cases.
dc.description.abstractFinally, we extend our framework to evaluate more expressive queries from relational algebra, including not only joins and intersections but also unions and negations. To obtain optimality on those more complex formulas, we introduce a lazy version of qdags we dub lqdags, which allow us navigate over the quadtree representing the output of a formula while only evaluating what is needed from its components. We show that the running time of our query algorithms on this extended set of operations is worst-case optimal under some constraints. Moving to full relational algebra, we also show that lqdags can handle selections and projections. While worst-case optimality is no longer guaranteed, we introduce a partial materialization scheme that extends results from Deep and Koutris regarding compressed representation of query results.
dc.fuente.origenWOS
dc.identifier.doi10.1145/3514231
dc.identifier.eissn1557-4644
dc.identifier.issn0362-5915
dc.identifier.urihttps://doi.org/10.1145/3514231
dc.identifier.urihttps://repositorio.uc.cl/handle/11534/93339
dc.identifier.wosidWOS:000807914000004
dc.issue.numero2
dc.language.isoen
dc.revistaAcm transactions on database systems
dc.rightsacceso restringido
dc.subjectJoin algorithms
dc.subjectcompact data structures
dc.subjectQuadtrees
dc.subjectAGM bound
dc.titleOptimal Joins Using Compressed Quadtrees
dc.typeartículo
dc.volumen47
sipa.indexWOS
sipa.trazabilidadWOS;2025-01-12
Files