Technical Details
1. The Tree Compressor
For small in-memory representation of trees, ComQu uses grammar-based tree compression. The advantage of such compressionover other compression methods is that queries can be executed directly on the compressed structure, without prior decompression.
The method is not only applicable to XML, but can be used in any scenario where large or many trees need to fit into memory for efficient processing. BPLEX -- first implementation of grammar-based tree compressor; produces very small pointer-based tree representations.
Currently, no other compressor achieves comparable compression performance to BPLEX. For mor details see:
-
Efficient Memory Representation of XML Document Trees
G. Busatto, S. Maneth, and M. Lohrey
Information Systems 33:456-474 (2008).
Produces similar compression ratios as BPLEX, but slashes time and memory uasge by factors between 10 and 100. Collaborators: Markus Lohrey (University of Leipzig)
2. Succinct Binary Coding
Outputs of BPLEX or RePair can be further compressed using succinct binary representations, instead of pointers.The resulting structures can be used for XML selectivity estimation, see VLDB and ICDE papers below.
They produce the smallest queriable XML representations currently available, see XANTEC paper.
-
XML Tree Structure Compression
S. Maneth, N. Mihaylov, and S. Sakr
In XANTEC'08, DEXA Workshops, Proceedings, pp 243-247, IEEE Computer Society, 2008. -
Dependable Cardinality Forecasts for XQuery
J. Teubner, T. Grust, S. Maneth, and S. Sakr
In VLDB'08, Proceedings, pp. 463-477, VLDB Endowment, 2008. -
Structural Selectivity Estimation for XML Documents
D. Fisher, S. Maneth
In ICDE 2007, Proceedings, pp. 626-635, IEEE Press, 2007.
Jens Teubner (IBM Watson)
3. Efficient Processing over Compressed Trees
In general, a certain slowdown occurs when executing queries over compressed structures. However, for special tasks, such asXML Schema validation or Core XPath queries, algorithms can be provided which run faster on the compressed structure than on the original,
see TCS paper below. We can also efficiently test membership in compressed sets of trees as used in the paper with Inaba, see below.
-
The Complexity of Tree Transducer Output Languages
K. Inaba and S. Maneth
In FSTTCS 2008, Proceedings, IBFI, 2008. -
The Complexity of Tree Automata and XPath on Grammar-Compressed Trees
M. Lohrey and S. Maneth
Theoret. Comput. Sci. 363:196-210 (2006)
Dagstuhl Seminar on Structure-Based Compression of Complex Massive Data, we are currently developing such a representation,
together with an efficient XPath and XQuery implementation on top of it. An early (rough) draft can be found here Collaborators: Angela Bonifati (ICAR - CNR - Rende)
Greogory Leighton (University of Calgary)
Veli Mäkinen (University of Helsinki)
Gonzalo Navarro (Universidad de Chile - Santiago)
Andrea Pugliese (University of Calabria)
