Technical Details

1. The Tree Compressor

For small in-memory representation of trees, ComQu uses grammar-based tree compression. The advantage of such compression
over other compression methods is that queries can be executed directly on the compressed structure, without prior decompression.
The method is not only applicable to XML, but can be used in any scenario where large or many trees need to fit into memory for efficient processing.

BPLEX -- first implementation of grammar-based tree compressor; produces very small pointer-based tree representations.
Currently, no other compressor achieves comparable compression performance to BPLEX. For mor details see:

RePair -- new, much faster compressor than BPLEX; currently being developed in cooperation with University of Leipzig.
Produces similar compression ratios as BPLEX, but slashes time and memory uasge by factors between 10 and 100.

Collaborators: Markus Lohrey (University of Leipzig)


2. Succinct Binary Coding

Outputs of BPLEX or RePair can be further compressed using succinct binary representations, instead of pointers.
The resulting structures can be used for XML selectivity estimation, see VLDB and ICDE papers below.
They produce the smallest queriable XML representations currently available, see XANTEC paper.

Collaborators: Torsten Grust (TU Munich)
Jens Teubner (IBM Watson)


3. Efficient Processing over Compressed Trees

In general, a certain slowdown occurs when executing queries over compressed structures. However, for special tasks, such as
XML Schema validation or Core XPath queries, algorithms can be provided which run faster on the compressed structure than on the original,
see TCS paper below. We can also efficiently test membership in compressed sets of trees as used in the paper with Inaba, see below.

For high-speed query processing, ComQu will use an uncompressed but succinct tree representation. As an outcome of the
Dagstuhl Seminar on Structure-Based Compression of Complex Massive Data, we are currently developing such a representation,
together with an efficient XPath and XQuery implementation on top of it. An early (rough) draft can be found here

Collaborators: Angela Bonifati (ICAR - CNR - Rende)
Greogory Leighton (University of Calgary)
Veli Mäkinen (University of Helsinki)
Gonzalo Navarro (Universidad de Chile - Santiago)
Andrea Pugliese (University of Calabria)