Wray Buntine

Principal Researcher
Canberra Research Laboratory

General

Dr Buntine is a principal researcher in the Statistical Machine Learning group at NICTA's Canberra Lab, and an Adjunct at the Australian National University. His interests are theoretical and applied research in machine learning, probabilistic methods, and information access, for instance document analysis and document retrieval. A common thread in his current research is non-parametric Bayesian methods for topic modelling.

Dr Buntine relocated from Helsinki, Finland to Canberra, Australia in April 2007 to join NICTA. He was most recently at the University of Helsinki and HIIT, and previously at NASA Ames Research Center, UC Berkeley, and Google.

Dr Buntine's early work at NASA Ames Research Center was in machine learning of decision trees and Bayesian networks, and he wrote one of the first Bayesian non-parametric theses in machine learning. He also has a development background using Linux, and has produced two GPL'd suites of machine learning software: IND for decision trees in the 1990s and MPCA for component analysis in the mid 2000s. His undergraduate studies were at the University of Queensland, where he studied pure and applied mathematics and completed an MQual (1st class) in Computer Science.

A full resume is available, as well as slides from an overview research talk, "Discrete Non-Parametric Models for Natural Language Processing".

Prospective Students

Sample projects are listed on the official ANU website.

Contact Details

Postal address: Locked Bag 8001, Canberra ACT 2601 AUSTRALIA

Office location: Building A, 7 London Circuit, Canberra

Email:Wray<dot>Buntine<at>nicta.com.au 

 

Current Research

Along with past and present PhD students Lan Du and Changyou Chen, I'm working on applying Poisson-Dirichlet (or Pitman-Yor) processes, and a near relative called Normalised Generalised Gamma processes, to topic models. The latter are a kind of Poisson process and thus allow clean birth, death and transition operators. We apply these to topic modelling as a way of handling different document and linguistic structures. A good overview set of slides is "Discrete Non-Parametric Models for Natural Language Processing". Some recent publications are:

Publications 

See the NICTA publications page under "Buntine".  Google Scholar and Uni. Trier's DBLP have good collections, including some of the oldies, as does Jie Tang's ArnetMiner entry for W.L.B.  I also have 4 patents, mostly thanks to Jon Oliver. My Erdos number is 42.

Slides from my ALTA 2011 Keynote, "Discovery in Text: Visualisation, Topics and Statistics".  Slides from my Australasian AI 2012 tutorial, "Discrete Non-parametric Methods for Machine Learning and Linguistics".  Slides from my teaser talk on document analysis for AI students at ANU, "Document Analysis Outline".

Software 

I have placed my long-used library for computing generalised second-order Stirling numbers, the secret behind our great structured topic modelling software, on NICTA Forge and MLOSS.org, calling it libstb. The theory behind this is explained in our arXiv tutorial article "A Bayesian View of the Poisson-Dirichlet Process".
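To give a flavour of what such a library computes, here is a minimal Python sketch that tabulates generalised Stirling numbers in log space. It assumes the standard recurrence S(n+1, m) = S(n, m-1) + (n - m*a) S(n, m), with S(0, 0) = 1 and discount parameter 0 <= a < 1; for a = 0 this reduces to the unsigned Stirling numbers of the first kind. The function names `log_add` and `log_stirling_table` are hypothetical and chosen for illustration only; this is not libstb's API.

```python
import math


def log_add(x, y):
    """Return log(exp(x) + exp(y)) stably; -inf stands for log(0)."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = (x, y) if x >= y else (y, x)
    return hi + math.log1p(math.exp(lo - hi))


def log_stirling_table(n_max, a):
    """Return t with t[n][m] = log S(n, m) for 0 <= m <= n <= n_max.

    Assumed recurrence (illustrative, not taken from libstb):
        S(n+1, m) = S(n, m-1) + (n - m*a) * S(n, m)
    with S(0, 0) = 1 and S(n, m) = 0 whenever m > n or (m == 0 and n > 0).
    """
    if not 0.0 <= a < 1.0:
        raise ValueError("discount a must satisfy 0 <= a < 1")
    t = [[-math.inf] * (n_max + 1) for _ in range(n_max + 1)]
    t[0][0] = 0.0  # log 1
    for n in range(n_max):
        for m in range(1, n + 2):
            term = t[n][m - 1]
            if m <= n:
                # n - m*a > 0 here because 1 <= m <= n and a < 1
                term = log_add(term, math.log(n - m * a) + t[n][m])
            t[n + 1][m] = term
    return t
```

Working in log space matters because the numbers grow roughly factorially with n, so a plain floating-point table overflows for the counts seen in real corpora.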

Also cleaned up my workhorse for preprocessing text, the DCA-Bags suite of Perl routines.  This automates the task of producing bags and lists as input to topic modelling software.  I've created output for 5 different programmes so far (ldac, Matlab toolbox, ...), and input processing includes all sorts of tricks including some fancy tokenisation, stop words, optional stemming and collocations (n-grams).  Used it so far for Wikipedia (the lot), PubMed and Reuters RCV1 (the lot).  Again at NICTA Forge.
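As a rough illustration of the bag-producing step, the Python sketch below tokenises documents, drops stop words and writes each document in the sparse format read by ldac (one line per document: the number of distinct terms, then `id:count` pairs). It is not the DCA-Bags code, which is in Perl and does far more (stemming, collocations, several output formats); the tokeniser, the function name `docs_to_ldac` and the stop list are assumptions made for the example.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")  # crude tokeniser: runs of letters/digits


def docs_to_ldac(docs, stop_words=frozenset()):
    """Return (vocab, lines).

    vocab is a list of terms indexed by term id; lines holds one string per
    document in LDA-C sparse format: "<distinct terms> <id>:<count> ...".
    """
    term_ids = {}
    lines = []
    for text in docs:
        tokens = [t for t in TOKEN_RE.findall(text.lower())
                  if t not in stop_words]
        counts = Counter()
        for tok in tokens:
            # New terms get the next free id, in order of first appearance.
            counts[term_ids.setdefault(tok, len(term_ids))] += 1
        fields = [str(len(counts))]
        fields += [f"{i}:{c}" for i, c in sorted(counts.items())]
        lines.append(" ".join(fields))
    vocab = sorted(term_ids, key=term_ids.get)
    return vocab, lines
```

Calling `docs_to_ldac(texts, stop_words={"the", "on", "and"})` returns the vocabulary alongside the document lines, so the two can be written to separate files as topic modelling tools usually expect.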

Community

General Chair for the Asian Conference on Machine Learning (ACML) 2013, being held in Canberra, 13-15 November. Senior PC for IJCAI 2013 and ECIR 2013. PC for NAACL 2013, ECML-PKDD 2013, ICTIR 2013.

Gave a tutorial, "Discrete Non-parametric Methods for Machine Learning and Linguistics", at AI 2012 on 4 December. Co-chair of the Asian Conference on Machine Learning (ACML) 2012, held in Singapore, along with Steven H.-C. Hoi. Senior PC for SDM 2012, ECIR 2012, AAI-25 (Australasian Joint Conf. on AI) and PC for UAI 2012, CIKM 2012, PGM 2012, NIPS 2012.

Co-organising MLSS Singapore 2011. Senior PC for IJCAI 2011, CIKM 2011, ACML 2011 and SWM2011 (workshop at IJCAI), PC for KDD 2011. Local arrangements for ALTA 2011.

Co-chair for European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2009.

Gave a tutorial at 1st Asian Conference on Machine Learning (ACML'09), November 2-4, 2009, Nanjing, China.  Slides in "one per page with some URLs" and "4 per page but no URLs".

On the editorial boards of Data Mining and Knowledge Discovery and New Generation Computing. Previously on the editorial boards of the Machine Learning Journal, Statistics and Computing, Applied Intelligence and the Journal of Artificial Intelligence Research.

Served on program committees and/or reviewing for AAAI, IJCAI, UAI, KDD, COLT, ICML, ECMLPKDD, ECIR, SIGIR, PAKDD, NIPS, Discovery Science, AI and Statistics, WSDM and WWW. Co-chair for the International Workshop on Intelligent Information Access, Helsinki, July 2006; the Second International Workshop on Open Source Information Retrieval at SIGIR in Seattle, August 2006; and the Workshop on Web Search Technology - from Search to Semantic Search at the 1st Asian Semantic Web Conference, Beijing, September 2006.