XML Query Technologies

XML is a popular data exchange format. It is similar to HTML (the markup language used for web pages), but more general, as it allows custom markup tags to be defined. Today, much data is exchanged in XML, and many disciplines have defined their own "dialects" of XML (e.g., XBRL for business data or SVG for vector graphics). Often, XML data needs to be queried and processed directly, i.e., no "host" application is available that can be used to alter or query the data. In response to this need, the big database vendors have implemented extensive XML add-ons to their database systems. These solutions work well if the XML data is small or shallow. However, for complex or large XML data sets, the performance of these databases degrades. The XML Query Technologies project develops tools for high-performance XML query processing.
We have implemented a search engine for XML data that is memory-efficient and supports ultra-fast querying: it answers XPath queries faster than any other system we have seen so far.
Under the hood of Ultra Fast XML Search?

XML data does not work well with conventional database systems because XML represents hierarchical (tree-shaped) data, which is not easily mapped into the table format of conventional databases. The philosophy of the XML Query Technologies project has been to design a query engine entirely from scratch that stores XML natively, so that the XML tree structure is preserved and can therefore be queried efficiently. Technically speaking, our current implementation of the XPath Search Engine (developed together with researchers from Chile and Finland) is based on these three ingredients:
- a Fast Tree Index for the tree structure of the XML data,
- a Fast Text Index for its text content, plus
- a state-of-the-art query engine that compiles XPath queries into calls to the APIs of the Tree and Text Indexes.
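The division of labour between the three ingredients can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the project's implementation: the class names `TreeIndex` and `TextIndex`, the function `query_contains`, and the sample document are all hypothetical, the indexes are plain uncompressed Python lists, and the text search is a naive scan. It also simplifies XPath semantics: a pattern must occur inside a single text node, whereas `contains(., "...")` in XPath tests the concatenated string value of the element.

```python
import xml.etree.ElementTree as ET


class TreeIndex:
    """Toy tree index: nodes are numbered in document (preorder) order."""

    def __init__(self, root):
        self.tags = []    # tags[i]  = tag name of node i
        self.size = []    # size[i]  = number of nodes in the subtree rooted at i
        self.texts = []   # (node_id, text) pairs, handed over to the text index
        self._build(root)

    def _build(self, elem):
        node_id = len(self.tags)
        self.tags.append(elem.tag)
        self.size.append(1)  # placeholder, fixed once the subtree is complete
        if elem.text and elem.text.strip():
            self.texts.append((node_id, elem.text.strip()))
        for child in elem:
            self._build(child)
            # Text following a child element still belongs to this element.
            if child.tail and child.tail.strip():
                self.texts.append((node_id, child.tail.strip()))
        self.size[node_id] = len(self.tags) - node_id

    def is_ancestor_or_self(self, a, b):
        # In preorder numbering, the subtree of a is the range [a, a + size[a]).
        return a <= b < a + self.size[a]


class TextIndex:
    """Toy text index: a naive scan over all text nodes."""

    def __init__(self, texts):
        self.texts = texts

    def nodes_containing(self, pattern):
        return sorted({node_id for node_id, text in self.texts if pattern in text})


def build_indexes(xml_string):
    """Split one XML document into a tree index and a text index."""
    tree = TreeIndex(ET.fromstring(xml_string))
    return tree, TextIndex(tree.texts)


def query_contains(tree, text, tag, pattern):
    """Simplified evaluation of //tag[contains(., pattern)] using both index APIs."""
    hits = text.nodes_containing(pattern)           # text index: where is the pattern?
    return [
        node_id
        for node_id, node_tag in enumerate(tree.tags)
        if node_tag == tag                           # tree index: tag test
        and any(tree.is_ancestor_or_self(node_id, h) for h in hits)
    ]


DOC = (
    "<lib>"
    "<book><title>XML Basics</title><author>Ann</author></book>"
    "<book><title>Trees</title><author>Bob</author></book>"
    "</lib>"
)

if __name__ == "__main__":
    tree, text = build_indexes(DOC)
    print(query_contains(tree, text, "book", "Bob"))  # [4]
```

In this sketch the query function never touches the XML text itself; it only combines answers from the two index interfaces, which is the role the query engine plays in the list above. The printed node id is the preorder number of the second `book` element.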
Fast In-Memory XPath Search over Compressed Text and Tree Indexes.
D. Arroyuelo, F. Claude, S. Maneth, V. Mäkinen, G. Navarro, K. Nguyen, J. Siren, N. Välimäki.
Available under CoRR abs/0907.2089.