Automated Data Analysis Outcomes
Main Achievements
Some theoretical and more applied outcomes of our recent research are listed here.
- Structured Topic Models: Text topic modelling is a statistical method that discovers multiple independent topics in a text collection. It has attracted increasing interest because the discovered topics are often semantically meaningful. We have developed models for structured text such as groups of tweets, blogs with comments, and patents.
- Graphical Models and Structured Learning: Data increasingly arrives in highly structured form, digital images being a characteristic example. In typical images, the ground is below the sky, ships float on water, and people are not upside down. How can we construct algorithms that systematically and rigorously exploit this structure to improve the accuracy of Computer Vision systems? This question is an example of our research in Graphical Models and Structured Learning. We develop novel mathematical models for understanding structured data and apply them to challenging problems in Computer Vision and other domains.
- Decision-theoretic Methods for Information Retrieval: Rederiving information retrieval (IR) from decision-theoretic objectives reduces the number of tuning parameters, increases robustness, and can optimize revenue in settings such as the retrieval of sponsored results. The result of our work in this area is a reformalisation of traditional IR methods that improves retrieval results (time is money!) and takes full advantage of the economic opportunities available on the web. See Guo and Sanner, 2010.
- Learning Graph Matching: Graph matching is a fundamental problem in computer science that can model many real-world problems, such as image recognition and webpage ranking. We have devised the first method that learns from training data to boost the performance of graph matching algorithms. Graph matching algorithms that were previously fast but inaccurate can now be made accurate by applying the proposed methodology. This has already begun to have an impact in the research community, since the method can be used to improve the quality of any existing graph matching algorithm. See Caetano et al. 2009 for details.
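The reference Guo and Sanner (2010), listed below, concerns maximal marginal relevance (MMR), a classic re-ranking rule that trades a document's relevance to the query against its similarity to documents already selected. The sketch below implements only plain greedy MMR, not the probabilistic latent variant developed in that paper; the function name `mmr_rerank`, the dictionary inputs, and the trade-off weight `lam` are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def mmr_rerank(
    relevance: Dict[str, float],
    similarity: Dict[Tuple[str, str], float],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Hypothetical helper: plain greedy MMR re-ranking.

    relevance[d] is the similarity of document d to the query.
    similarity[(a, b)] is the similarity between documents a and b,
    treated as symmetric; missing pairs count as 0.0.
    This is NOT the probabilistic latent MMR of Guo and Sanner (2010).
    """

    def sim(a: str, b: str) -> float:
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    def marginal_relevance(d: str, selected: List[str]) -> float:
        # Penalise d by its closest match among already-selected documents.
        redundancy = max((sim(d, s) for s in selected), default=0.0)
        return lam * relevance[d] - (1.0 - lam) * redundancy

    selected: List[str] = []
    remaining = sorted(relevance)  # sorted so ties break by document id
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda d: marginal_relevance(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    rel = {"a": 0.9, "b": 0.85, "c": 0.5}
    pair_sim = {("a", "b"): 0.95, ("a", "c"): 0.1, ("b", "c"): 0.2}
    print(mmr_rerank(rel, pair_sim, k=2, lam=0.5))  # ['a', 'c']
```

Setting `lam` to 1 ranks purely by relevance (here `['a', 'b']`); smaller values push near-duplicates such as `b` further down the list.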
Past Achievements
- High Performance Text Topic Modelling: Text topic modelling is a statistical method that discovers multiple independent topics in a text collection. It has generated increasing interest in the advertising domain, where groupings of related web page content and related users can help in targeting adverts. We have developed a high performance software package that runs on gigabytes of text using sparse vector techniques.
- Knowledge-based Topic Classifier for Text: We have built a knowledge-based topic classifier, based on Wikipedia content, that identifies relevant topics for natural language text sources, including web pages. The key technical contribution is the use of multiple named entities to jointly disambiguate the topic of a web page, even when individual named entities have multiple interpretations. Empirically, it is much more robust than an SVM-based supervised topic classification system trained on 6 GB of content covering 500 topics. The new system uses 1,600,000 topic labels and tends to produce fewer false positives among its highest-rated results. Owing to its efficiency and accuracy, potential applications include topic labeling for ad serving, automated search engines, and opinion mining.
- Kernel Tests of Independence: The technology referred to as "Hilbert space embeddings of distributions" uses a statistic called "maximum mean discrepancy" (MMD) to test whether two probability distributions are different. This is a general, automatic technique for testing whether data come from different sources, and it works in complex domains such as bioinformatics or medical imaging, where previous statistical methods offered no alternative. This will allow more reliable analysis of such data. See Gretton et al. 2008, below.
- Recommender Systems: Search sites such as Google and Yahoo, and consumer recommendation sites such as Amazon and movie sites, need ways of ranking the results they return to users. We have developed efficient ranking algorithms that can be adapted effectively from user preference data and user click-through data. These were the first such algorithms to scale to the data sizes required for internet use, and so would allow intermediate-sized companies to provide quality search capability. See Le and Smola, 2007, and Weimer et al., 2008.
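The MMD two-sample test described in the Kernel Tests bullet can be sketched in a few lines. This is a minimal illustration only, assuming a Gaussian (RBF) kernel with a fixed, hand-picked bandwidth and a permutation test to calibrate the rejection threshold; these choices and all function names (`rbf_kernel`, `mmd2_unbiased`, `mmd_permutation_test`) are assumptions made here, not a description of the group's software.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rbf_kernel(a: Point, b: Point, bandwidth: float) -> float:
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[Point], ys: Sequence[Point],
                  bandwidth: float = 1.0) -> float:
    """Unbiased estimate of squared MMD; each sample needs >= 2 points."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal i == j (unbiased estimator).
    k_xx = sum(rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


def mmd_permutation_test(xs: Sequence[Point], ys: Sequence[Point],
                         bandwidth: float = 1.0, num_permutations: int = 200,
                         seed: int = 0) -> float:
    """Permutation p-value for H0: xs and ys come from the same distribution."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, bandwidth)
    pooled: List[Point] = list(xs) + list(ys)
    m = len(xs)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # under H0 the sample labels are exchangeable
        if mmd2_unbiased(pooled[:m], pooled[m:], bandwidth) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (at_least_as_extreme + 1) / (num_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    sample_a = [(rng.gauss(0.0, 1.0),) for _ in range(25)]
    sample_b = [(rng.gauss(3.0, 1.0),) for _ in range(25)]
    print(mmd_permutation_test(sample_a, sample_b, num_permutations=100))
```

With one-dimensional samples drawn from N(0, 1) and N(3, 1), the observed statistic should exceed nearly every permuted value, giving a p-value close to its floor of 1/(num_permutations + 1). The pairwise loops cost O((m + n)^2) kernel evaluations per permutation, so this sketch only suits small samples.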
Software
As well as our academic publications, we have substantial output in the form of software released in the Elefant platform.
Highlighted Publications
Full publications are available at the NICTA publication page or the SML publication page. Here we list a few of our more significant publications, with NICTA authors underlined. Other publications are available from machine learning related conferences such as NIPS, ICML, ECML, AAAI, KDD, ISMB, CVPR, SIGIR, ICCV and CIKM.
2010
Downey, C. and Sanner, S., Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda, In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel 2010.
Guo, S. and Sanner, S., Probabilistic Latent Maximal Marginal Relevance, Proceedings of the 33rd Annual ACM SIGIR Conference, ACM, Geneva, Switzerland 2010.
Guo, S. and Sanner, S., Real-time Multiattribute Bayesian Preference Elicitation with Pairwise Comparison Queries, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Yee Whye Teh and Mike Titterington (Eds.), Chia Laguna Resort, Sardinia, Italy, pp.289-296, 2010.
McAuley, J. and Caetano, T. S., Exploiting within-clique factorizations in junction-tree algorithms, AISTATS, 2010.
McAuley, J. and Caetano, T. S., Exploiting Data-Independence for Fast Belief-Propagation, ICML, 2010.
Petterson, J., Caetano, T. S., McAuley, J., and Yu, Jin, Exponential Family Graph Matching and Ranking, NIPS, 2010.
McAuley, J., Campos, Teofilo de, and Caetano, T. S., Unified graph matching in Euclidean spaces, CVPR, 2010.
Quadrianto, N., Kersting, K., Tuytelaars, T., and Buntine, W., Beyond 2D-grids: a dependence maximization view on image browsing, MIR '10: Proceedings of the international conference on Multimedia information retrieval, ACM, New York, NY, USA pp.339-348, 2010.
Quadrianto, N., Smola, A. J., Song, L., and Tuytelaars, T., Kernelized Sorting, IEEE Transactions on Pattern Analysis and Machine Intelligence, 99 (PrePrints) , IEEE Computer Society, Los Alamitos, CA, USA 2010.
Shi, Q., Li, Hanxi, and Shen, Chunhua, Rapid Face Recognition Using Hashing, Proc. IEEE Conf. Computer Vision and Pattern Recognition, San Francisco, USA 2010.
Thomas, O., Sunehag, P., Dror, G., Yun, S., Kim, S., Robards, M., Smola, A., Green, D., and Saunders, P., Wearable-sensor activity analysis using semi-Markov models with a grammar, Pervasive and Mobile Computing, 2010.
Tuytelaars, T., Lampert, C. H., Blaschko, M. B., and Buntine, W., Unsupervised Object Discovery: A Comparison, International Journal of Computer Vision, 88 (2) , pp.282-302, 2010.
2009
Buntine, W., Estimating Likelihoods for Topic Models, Proceedings of the 1st Asian Conference on Machine Learning, Nanjing, China 2009.
Buntine, W., Grobelnik, M., Mladenic, D., and Shawe-Taylor, J. (Eds.), Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2009, Bled, Slovenia, September 7-11, 2009, Proceedings, Part I and II, Lecture Notes in Artificial Intelligence, Springer, Berlin 2009. ISBN: 978-3-642-04173-0 + 978-3-642-04179-2.
Caetano, T. S., McAuley, J., Cheng, L., Le, Q. V., and Smola, A. J., Learning Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 30 (6) , pp.1048-1058, 2009.
Chen, L., McAuley, J.J., Feris, R.S., Caetano, T.S. and Turk, M., Shape Classification Through Structured Learning of Matching Measures, International Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
Hutter, M., Feature Markov Decision Processes, Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), Atlantis Press, Arlington, Virginia pp.61-66, 2009.
Hutter, M., Feature Dynamic Bayesian Networks, Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), Atlantis Press, pp.67-73, 2009.
Jakulin, A., Buntine, W., La Pira, T. and Brasher, H., Analyzing the U.S. Senate in 2003: Similarities, Clusters, and Blocs, Political Analysis, June 2009, doi:10.1093/pan/mpp006.
McAuley, J.J., Caetano, T. S., and Smola, A.J., Robust Near-Isometric Matching via Structured Learning of Graphical Models, Advances in Neural Information Processing Systems (NIPS), 2009.
Quadrianto, N., Kersting, K., Reid, M., Caetano, T. S., and Buntine, W., Kernel Conditional Quantile Estimation via Reduction Revisited, IEEE International Conference on Data Mining, IEEE Computer Society, 2009.
Quadrianto, N., Smola, A. J., Caetano, T. S., and Le, Q. V., Estimating Labels from Label Proportions, Journal of Machine Learning Research, 10, MIT Press, pp.2349-2374, 2009.
Quadrianto, N., Song, L., and Smola, A.J., Kernelized Sorting, Neural Information Processing Systems (NIPS 21), pp.1289-1296, 2009.
Robards, M. and Sunehag, P., Semi-Markov kmeans clustering and activity recognition from body-worn sensors, IEEE International Conference on Data Mining, 2009.
Sanner, S. and Boutilier, C., Practical Solution Techniques for First-order MDPs, Artificial Intelligence Journal (AIJ), 2009.
Sanner, S., Goetschalckx, R., Driessens, K., and Shani, G., Bayesian Real-time Dynamic Programming, 21st International Joint Conference on Artificial Intelligence (IJCAI-09), Boutilier (Eds.), Pasadena, USA pp.1-8, 2009.
Yu, Jin, Vishwanathan, S.V.N., and Zhang, Jian, The Entire Quantile Path of a Risk-Agnostic SVM Classifier, Conference on Uncertainty in Artificial Intelligence (UAI-09), 2009.
2008
Astashkin, S. and Sunehag, P., Real method of interpolation on subcouples of codimension one, Studia Mathematica, 185 (2) , pp.151-168, 2008.
Gretton, A., Fukumizu, K., Teo, C.H., Song, L., Schölkopf, B., and Smola, A.J., A Kernel Statistical Test of Independence, Advances in Neural Information Processing Systems 20, J.C. Platt and D. Koller and Y. Singer and S. Roweis (Eds.), MIT Press, Cambridge, MA 2008.
Hofmann, T., Schölkopf, B., and Smola, A.J., Kernel methods in machine learning, Annals of Statistics, 36 (3) , pp.1171-1220, 2008.
Hutter, M., Algorithmic Complexity, Scholarpedia, 3 (1) , pp.2573, 2008.
McAuley, J., Caetano, T.S., and Barbosa, M.S., Graph rigidity, cyclic belief propagation and point pattern matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Ryabko, D. and Hutter, M., Predicting Non-Stationary Processes, Applied Mathematics Letters, 21 (5) , pp.477-482, 2008.
Nurmi, P., Lagerspetz, E., Buntine, W., Floreen, P. and Kukkonen, J., Product Retrieval for Grocery Stores, SIGIR '08: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, 2008.
Quadrianto, N., Smola, A.J., Caetano, T.S., and Le, Q.V., Estimating labels from label proportions, International Conference on Machine Learning (ICML), 2008.
Shen, Hao, Hüper, Knut, and Kleinsteuber, Martin, Local Convergence Analysis of FastICA and Related Algorithms, IEEE Transactions on Neural Networks, 19 (6) , pp.1022-1032, 2008.
Vishwanathan, S.V.N., Borgwardt, K.M., Schraudolph, N., and Kondor, I.R., On Graph Kernels, Journal of Machine Learning Research, 2008.
Weimer, M., Karatzoglou, A., Le, Q.V., and Smola, A.J., COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking, Advances in Neural Information Processing Systems 20, J.C. Platt and D. Koller and Y. Singer and S. Roweis (Eds.), MIT Press, Cambridge, MA 2008.
2007
Bakir, G. and Taskar, B. and Vishwanathan, S.V.N. and Hofmann, T. and Schölkopf, B. and Smola, A.J. (Eds.), Machine Learning with Structured Outputs, MIT Press, 2007.
Bray, Matthieu, Koller-Meier, Esther, Schraudolph, N., and Van Gool, Luc, Fast Stochastic Optimization for Articulated Structure Tracking, Image and Vision Computing, 25 (3) , pp.352-364, 2007.
Caetano, T.S., Cheng, L., Le, Q.V., and Smola, A.J., Learning Graph Matching, International Conference on Computer Vision (ICCV), 2007.
Chernov, Alexey, Hutter, M., and Schmidhuber, J., Algorithmic Complexity Bounds on Future Prediction Errors, Information and Computation, 205 (2) , pp.242-261, 2007.
Günter, S., Schraudolph, N., and Vishwanathan, S.V.N., Fast Iterative Kernel Principal Component Analysis, Journal of Machine Learning Research, 8 , pp.1893-1918, 2007.
Helmke, Uwe, Hüper, K., Lee, Pei Yean, and Moore, John, Essential Matrix Estimation Using Gauss-Newton Iterations on a Manifold, International Journal of Computer Vision, 74 (2) , pp. 117-136, 2007.
Hutter, M., Legg, S., and Vitányi, Paul M.B., Algorithmic Probability, Scholarpedia, 2 (8) , pp.2572, 2007.
Hutter, M. and Muchnik, Andrej A., On Semimeasures Predicting Martin-Löf Random Sequences, Theoretical Computer Science, 382 (3) , pp.247-261, 2007.
Hutter, M., On Universal Prediction and Bayesian Confirmation, Theoretical Computer Science, 384 (1) , pp.33-48, 2007.
Hutter, M., Exact Bayesian Regression of Piecewise Constant Functions, Bayesian Analysis, 2 (4) , pp.635-664, 2007.
Hüper, K. and Leite, Fatima Silva, On the Geometry of Rolling and Interpolation Curves on S^n , SO_n , and Grassmann Manifolds, Journal of Dynamical and Control Systems, 13 (4) , pp.467-502, 2007.
Le, Q.V. and Smola, A.J., Direct Optimization of Ranking Measures, Journal of Machine Learning Research, 2007.
McAuley, J., Costa, L. da F., and Caetano, T.S., Rich club phenomenon across complex network hierarchies, Applied Physics Letters, 91 (084103) , 2007.
Song, L., Smola, A.J., Gretton, A., Bedo, J., and Borgwardt, K.M., Feature selection via dependence maximization, Journal of Machine Learning Research, 2007.
Vishwanathan, S.V.N., Smola, A. J., and Vidal, R., Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes, International Journal of Computer Vision, 73 (1) , Springer-Verlag, Netherlands pp.95-119, 2007.
Zhou, J., Cheng, L., and Bischof, W.F., Online learning with Novelty Detection in Human-guided Road Tracking, IEEE Transactions on Geoscience and Remote Sensing (TGRS), 45 (12) , pp.3967-3977, 2007.
