Wray Buntine

Principal Researcher
Canberra Research Laboratory


Dr Buntine is a principal researcher in the Statistical Machine Learning group at NICTA's Canberra Lab, and an Adjunct at the Australian National University. His interests are theoretical and applied research in machine learning, probabilistic methods, and information access, for instance document analysis and document retrieval. A common thread in his current research is non-parametric Bayesian methods for topic modelling.

Dr Buntine relocated from Helsinki, Finland, to Canberra, Australia, in April 2007 to join NICTA. He was most recently at the University of Helsinki and HIIT, and previously at NASA Ames Research Center, UC Berkeley, and Google.

Dr Buntine's early work at NASA Ames Research Center was in machine learning of decision trees and Bayesian networks, and he wrote one of the first Bayesian non-parametric theses in machine learning. He also has a development background using Linux, and has produced two GPL'd suites of machine learning software: IND for decision trees in the 1990s and MPCA for component analysis in the mid-2000s. His undergraduate study was at the University of Queensland, where he studied pure and applied mathematics and completed an MQual (1st class) in Computer Science.

A full resume and a curated publication list are available, as well as slides from an overview research talk, "Discrete Non-Parametric Models for Natural Language Processing".

Prospective Students

Sample projects are listed on the official ANU website.


Contact Details

Postal address: Locked Bag 8001, Canberra ACT 2601 AUSTRALIA

Office location: Building A, 7 London Circuit, Canberra



Current Research

Along with past and present PhD students Lan Du and Changyou Chen, I'm working on applications in topic models of Poisson-Dirichlet (or Pitman-Yor) processes and a near relative called Normalised Generalised Gamma processes. The latter are a kind of Poisson process and thus allow clean birth, death and transition operators. We apply these to topic modelling as a way of handling different document and linguistic structures.
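For readers new to Pitman-Yor processes, here is a minimal Python sketch of the standard "Chinese restaurant" seating scheme that is commonly used to describe them. It is an illustration only, not code from this research: the function name `sample_pitman_yor_crp` and the parameter names `discount` and `concentration` are assumptions chosen for this example.

```python
import random


def sample_pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    With n customers already seated at K tables, the next customer joins
    table k with weight (size_k - discount) and opens a new table with
    weight (concentration + discount * K).  The weights sum to
    n + concentration, so they are normalised implicitly by the sampler.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        if not table_sizes:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * len(table_sizes))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[choice] += 1   # join an existing table
    return table_sizes


if __name__ == "__main__":
    sizes = sample_pitman_yor_crp(1000, discount=0.5, concentration=1.0)
    print(len(sizes), "tables; largest has", max(sizes), "customers")
```

Setting `discount` to 0 gives the seating scheme of the Dirichlet process; a larger discount tends to produce more, smaller tables, which is one reason these processes are a popular fit for word frequencies in text.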

A good set of overview slides:

Some recent publications are:


See the NICTA publications page under "Buntine".  Google Scholar and Uni. Trier's DBLP have good collections, including some of the oldies, as does Jie Tang's ArnetMiner entry for W.L.B.  I also have one patent, thanks to Jon Oliver. My Erdos number is 3.

Slides from my ALTA 2011 Keynote, "Discovery in Text: Visualisation, Topics and Statistics".  Slides from my Australasian AI 2012 tutorial, "Discrete Non-parametric Methods for Machine Learning and Linguistics".  Slides from my teaser talk on document analysis for AI students at ANU, "Document Analysis Outline".

W.r.t. Probabilistic Programming, here's a research proposal I did in 1995: We propose to develop a special purpose toolkit for programming with probabilities intended for use on machine learning problems.  The toolkit is to be implemented as an extension to C++.  The proposed toolkit includes a set of software libraries and an intelligent precompiler that allows probability definitions to be included in a program and probability statements to be efficiently spliced into the code.  An initial prototype shows that compact code for complex learning algorithms can be written quickly, including a significant number of popular learning algorithms and many more novel hybrids.

Software and Data

Released my long-used library for computing generalised second-order Stirling numbers. Also cleaned up my workhorse for preprocessing text, the DCA-Bags suite of Perl routines, which automates the task of producing bags and lists as input to topic modelling software. Data sets are available for this software too.
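For context, generalised second-order Stirling numbers can be tabulated with a simple recurrence, S(n+1, m) = S(n, m-1) + (n - m·a)·S(n, m), where a is the discount parameter. The Python sketch below illustrates that recurrence only; it is not the released library or its interface, the function name `generalised_stirling_table` is an assumption made for this example, and it works in plain floating point, so it is suitable only for small n.

```python
def generalised_stirling_table(n_max, a):
    """Return table with table[n][m] = S(n, m) for discount a.

    Recurrence: S(n+1, m) = S(n, m-1) + (n - m*a) * S(n, m),
    with S(0, 0) = 1, S(n, 0) = 0 for n > 0, and S(n, m) = 0 for m > n.
    """
    if not 0 <= a < 1:
        raise ValueError("a must lie in [0, 1)")
    table = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    table[0][0] = 1.0
    for n in range(n_max):
        for m in range(1, n + 2):
            table[n + 1][m] = table[n][m - 1] + (n - m * a) * table[n][m]
    return table


if __name__ == "__main__":
    table = generalised_stirling_table(4, 0.0)
    print(table[4][1:])
```

With a = 0 the recurrence reduces to the unsigned Stirling numbers of the first kind, which gives a convenient sanity check. The values grow very quickly with n, so a practical implementation would normally store logarithms rather than raw values.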

All are available on the Software and Data page.


General Chair for Asian Machine Learning Conference (ACML) 2013, being held in Canberra, Nov 13-15th. Senior PC for IJCAI 2013 and ECIR 2013. PC for NAACL 2013, ECML-PKDD 2013, ICTIR 2013, CIKM 2013, UAI 2013, NIPS 2013.

Gave a tutorial, "Discrete Non-parametric Methods for Machine Learning and Linguistics", at AI 2012 on Dec. 4th. Co-chair of Asian Machine Learning Conference (ACML) 2012, held in Singapore, along with Steven H.-C. Hoi. Senior PC for SDM 2012, ECIR 2012, AAI-25 (Australasian Joint Conf. on AI) and PC for UAI 2012, CIKM 2012, PGM 2012, NIPS 2012.

Co-organising MLSS Singapore 2011. Senior PC for IJCAI 2011, CIKM 2011, ACML 2011 and SWM2011 (workshop at IJCAI), PC for KDD 2011. Local arrangements for ALTA 2011.

Co-chair for European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2009.

Gave a tutorial at 1st Asian Conference on Machine Learning (ACML'09), November 2-4, 2009, Nanjing, China.  Slides in "one per page with some URLs" and "4 per page but no URLs".

On the editorial boards of Data Mining and Knowledge Discovery and New Generation Computing. Previously on the editorial boards of the Machine Learning Journal, Statistics and Computing, Applied Intelligence and the Journal of Artificial Intelligence Research.

Served on program committees and/or reviewing for AAAI, IJCAI, UAI, KDD, COLT, ICML, ECMLPKDD, ECIR, SIGIR, PAKDD, NIPS, Discovery Science, AI and Statistics, WSDM and WWW. Co-Chair for the International Workshop on Intelligent Information Access, Helsinki, July 2006; the Second International Workshop on Open Source Information Retrieval at SIGIR in Seattle, August 2006; and the Workshop on Web Search Technology - from Search to Semantic Search at the 1st Asian Semantic Web Conference, Beijing, September 2006.