Discrete Component Analysis

The Discrete Component Analysis (DCA) software is being developed both as a stand-alone package and as a plug-in to the Elefant system, a machine learning toolbox from NICTA. Currently the software runs in stand-alone mode, using the data streaming libraries from the older and now unsupported MPCA system developed at the Helsinki Institute for IT. The software is written in C and compiles on Linux and Mac OS X.

The models presented here are known under many names, including latent Dirichlet allocation, multi-aspect models, multinomial PCA, and non-negative matrix factorisation.
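As a rough illustration of what these differently named models have in common: each describes a document's word distribution as a non-negative mixture of a small number of component distributions. The sketch below shows only that mixing step. It is not taken from the DCA source, and the function name, parameter names, and row-major layout are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mix component word distributions into one document-level distribution.
 *
 *   proportions: n_components non-negative weights, assumed to sum to 1
 *   components:  row-major n_components x n_words matrix; each row is
 *                assumed to be a probability distribution over words
 *   word_probs:  output array of length n_words
 *
 * All names are hypothetical; this is not the DCA package's API.
 */
void mix_components(const double *proportions, const double *components,
                    size_t n_components, size_t n_words, double *word_probs)
{
    assert(proportions != NULL && components != NULL && word_probs != NULL);

    for (size_t w = 0; w < n_words; w++) {
        double p = 0.0;
        /* Sum each component's contribution to word w. */
        for (size_t k = 0; k < n_components; k++) {
            p += proportions[k] * components[k * n_words + w];
        }
        word_probs[w] = p;
    }
}
```

For example, with proportions (0.25, 0.75) and two components (0.5, 0.5, 0.0) and (0.0, 0.5, 0.5) over three words, the mixed distribution is (0.125, 0.5, 0.375). Fitting the proportions and components from data is where the models differ, and that part is not shown here.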

The following reports are available for the first public release, version 0.202:

The release includes the following examples:

The software, published under the MPL license, can be downloaded in "tar.gz" format. It is built using GNU configure and autotools. Many of the examples require a number of Perl-based scripts to be installed for data massaging and HTML reporting. Otherwise, installation mainly requires the GNU Scientific Library (GSL) and the Judy library, both of which are easy to install on most distributions.