Automated Data Analysis

Problem

Large quantities of data are being generated in many areas, including medicine, business and the internet, but integrated software tools for building smart components that automatically analyze such data are lacking.

Solution

In this project, we are working with our partners to develop an architecture and platform that acts as an integrated software toolkit for automated data analysis. Our goal is a practical, generic technology for data understanding, analysis and summarization that can be used in key application areas, including medicine and the internet. Elefant (Efficient Learning, Large-scale Inference, and Optimization Toolkit) is an open source library for machine learning, licensed under the Mozilla Public License (MPL). We aim to develop an open source data analysis platform for prototyping and deploying data analysis algorithms.

We're also developing algorithms, software and demonstrators for document analysis including decision-theoretic and probabilistic information retrieval, text classification and topic modelling. 

Team

Here you can find the Automated Data Analysis team.  Christfried Webers leads the Elefant (Code) sub-project and Wray Buntine leads the Documents sub-project.

    Recent News

    • Congratulations to PhD student Lan Du for getting his ECML-PKDD 2010 paper accepted into the Machine Learning Journal, making it one of the top submissions for the conference.
    • We'll be seeing you at AI&Stats, CVPR, SIGIR, ICML, ECML-PKDD, PGM and ICDM this year.
    • Our software Elefant ranks first in number of downloads on the machine learning software site MLOSS.org (2457 downloads); see download stats.
    • Congratulations to our PhD students Dmitry Kamenetsky and Qinfeng (Javen) Shi, who have recently submitted their PhD dissertations, and to Drs. Choon Hui Teo, Jin Yu and Xinhua Zhang, who have recently been awarded their PhDs.
    • We've just wrapped up a sizeable effort for the Bayesian section of Encyclopedia of Machine Learning.
    • Wray Buntine and Tiberio Caetano give two tutorials at ECML PKDD 2009 on document analysis and graphical models.
    Organisers at ECML PKDD 2009 Conference and programme chairs

    Highlighted Papers for 2010

    • Du, L., Buntine, W.L., and Jin, H.,  "A Segmented Topic Model based on the Two-parameter Poisson-Dirichlet Process,"  ECML-PKDD, Barcelona, 2010.
    • McAuley, J. and Caetano, T. S.,  "Exploiting within-clique factorizations in junction-tree algorithms,"  AISTATS, Sardinia, 2010.
    • Guo, S. and Sanner, S.,  "Probabilistic latent maximal marginal relevance,"  Proceedings of the 33rd Annual ACM SIGIR Conference, ACM, Geneva, Switzerland, 2010.
    • Quadrianto, N., Kersting, K., Tuytelaars, T., and Buntine, W.,  "Beyond 2D-grids: a dependence maximization view on image browsing,"  MIR '10: Proceedings of the International Conference on Multimedia Information Retrieval, ACM, New York, NY, USA, pp.339-348, 2010.