Data analysis aims to find patterns, anomalies, and correlations in data of various forms: structured or unstructured, static or streamed. From a computer science perspective, data analysis requires distributed platforms optimized for storing, retrieving, and processing large amounts of data – more than can be held on a single computer. The processing itself builds on techniques known from data mining, machine learning, statistics, and predictive analytics. Deploying these techniques efficiently at scale often requires entirely new approaches and algorithms.
Drawing on our background in massively parallel computing, modern computer architectures, high-performance interconnects, and upcoming hardware trends, we focus on developing and optimizing highly scalable algorithms and distributed platforms for data analysis.
We evaluate our algorithms and prototypes in scientific domains such as high energy physics, earth system science, medicine, and materials science. In these domains, we also develop domain-specific methods to efficiently analyze and handle extensive data sets and high-bandwidth data streams.