Technical Details
1. The Tree Compressor
For small in-memory representation of trees, ComQu uses
grammar-based tree compression. The advantage of such compression
over other compression methods is that queries can be executed
directly on the compressed structure, without prior decompression.
The method is not only applicable to XML, but can be used in any
scenario where large or many trees need to fit into memory for
efficient processing.
BPLEX -- first implementation of grammar-based tree compressor;
produces
very small pointer-based tree representations.
Currently, no other compressor achieves comparable compression
performance to BPLEX. For mor details see:
RePair -- new, much faster compressor than BPLEX;
currently being developed in cooperation with University of Leipzig.
Produces similar compression ratios as BPLEX, but slashes time and memory
uasge by factors between 10 and 100.
Collaborators: Markus Lohrey (University of Leipzig)
2. Succinct Binary Coding
Outputs of BPLEX or RePair can be further compressed using succinct
binary representations, instead of pointers.
The resulting structures can be used for
XML selectivity estimation, see VLDB and ICDE papers below.
They produce the smallest queriable XML representations currently
available, see XANTEC paper.
-
XML Tree Structure Compression
S. Maneth, N. Mihaylov, and S. Sakr
To appear in
XANTEC'08,
DEXA Workshops, Proceedings, IEEE Computer Society, 2008.
-
Dependable Cardinality Forecasts for XQuery
J. Teubner, T. Grust, S. Maneth, and S. Sakr
To appear in
VLDB'08,
Proceedings, ACM Press, 2008.
-
Structural Selectivity Estimation for XML Documents
D. Fisher, S. Maneth
In
ICDE 2007,
Proceedings,
pp. 626-635,
IEEE Press, 2007.
Collaborators: Torsten Grust (TU Munich)
Jens Teubner (IBM Watson)
3. Efficient Processing over Compressed Trees
In general, a certain slowdown occurs when executing
queries over compressed structures.
However, for special tasks, such as
XML Schema validation or
Core XPath queries, algorithms can be provided which
run
faster on the compressed structure than on the
original,
see TCS paper below. We can also efficiently test membership
in
compressed sets of trees as used in the paper
with Inaba, see below.
For high-speed query processing,
ComQu will use an uncompressed but succinct tree representation.
As an outcome of the
Dagstuhl Seminar on
Structure-Based Compression of Complex Massive Data, we
are currently developing such a representation,
together
with an efficient
XPath and XQuery implementation on top of it.
Collaborators: Angela Bonifati (ICAR - CNR - Rende)
Greogory Leighton (University of Calgary)
Veli Mäkinen (University of Helsinki)
Gonzalo Navarro (Universidad de Chile - Santiago)
Andrea Pugliese (University of Calabria)