Data analysis seeks to find patterns, anomalies, and correlations in data of various forms: structured or unstructured, static or streamed. From a computer science perspective, data analysis requires distributed platforms optimized for storing, retrieving, and processing large amounts of data – larger than can be held on a single computer. The processing itself builds on techniques known from data mining, machine learning, statistics, and predictive analytics. Deploying these techniques efficiently at large scale often requires completely new approaches and algorithms.

Building on our background in massively parallel computing, modern computer architectures, high-performance interconnects, and upcoming hardware trends, we focus on developing and optimizing highly scalable algorithms and distributed platforms for data analysis.

We evaluate our algorithms and prototypes in scientific domains such as earth system science, high energy physics, medicine, and materials science, and we develop domain-specific methods to efficiently analyze and handle very large data sets and high-bandwidth data streams.