Generation N Data Management System
Community grid computing aims to provide researchers in a scientific field (e.g. climate research) with an integrated environment for efficient and secure collaborative work, and for outreach to the general public, in a way that matches the typical workflows and procedures of that community. Such a platform usually needs to support the sharing of scientific data as well as the sharing of HPC compute and storage resources across administrative boundaries. Doing so requires coordinating activities between geographically distributed computing centers, which is commonly achieved with community-tailored grid middleware. We developed the Generation N Data Management System (GNDMS) to address the data management needs of community grids whose distributed, computationally intensive workflows require facilities for data staging, replication, and clean-up.
GNDMS is a set of Globus Toolkit 4 (GT4) WSRF services and associated tools for grid data management based on staging and co-scheduling. GNDMS prepares, copies, replicates, caches, and deletes large datasets between supercomputer centers in an orchestrated and secure way. It abstracts over data sources via a data integration layer and provides logical names, data transfers via GridFTP, proper handling of GSI certificate delegation, and workspace management. Beyond data management, the implementation provides components for remote logging, run-time reconfiguration, persistence, and fail-over that go beyond what the underlying Globus Toolkit 4 offers.
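To make the idea of logical names and staging more concrete, the following is a minimal sketch of a logical-name catalog: one logical dataset name maps to several physical replicas (GridFTP URLs), and a staging step prefers a replica already hosted at the target site to avoid a wide-area copy. This is an illustration under stated assumptions only. The class `LogicalNameCatalog`, its methods, and the example host names are hypothetical and are not part of the GNDMS API; the real system performs transfers via GridFTP with GSI delegation, which this sketch does not attempt.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch (not GNDMS code): maps a logical dataset name to its
 * physical replicas and picks one for staging at a given site.
 */
public class LogicalNameCatalog {
    // Logical name -> physical replica URLs, kept in registration order.
    private final Map<String, List<String>> replicas = new LinkedHashMap<>();

    /** Register a physical replica (e.g. a gsiftp:// URL) under a logical name. */
    public void register(String logicalName, String physicalUrl) {
        replicas.computeIfAbsent(logicalName, k -> new ArrayList<>()).add(physicalUrl);
    }

    /** Return an immutable copy of all known replicas; empty if the name is unknown. */
    public List<String> resolve(String logicalName) {
        return List.copyOf(replicas.getOrDefault(logicalName, List.of()));
    }

    /**
     * Prefer a replica whose host equals the target site, so no wide-area
     * transfer is needed; otherwise fall back to the first registered replica.
     */
    public Optional<String> selectForSite(String logicalName, String siteHost) {
        List<String> candidates = resolve(logicalName);
        for (String url : candidates) {
            if (siteHost.equals(URI.create(url).getHost())) {
                return Optional.of(url);
            }
        }
        return candidates.stream().findFirst();
    }

    public static void main(String[] args) {
        LogicalNameCatalog catalog = new LogicalNameCatalog();
        // Example names and hosts are made up for illustration.
        catalog.register("climate/t2m-1990", "gsiftp://gridftp.site-a.example/data/t2m-1990.nc");
        catalog.register("climate/t2m-1990", "gsiftp://gridftp.site-b.example/archive/t2m-1990.nc");
        System.out.println(
            catalog.selectForSite("climate/t2m-1990", "gridftp.site-b.example").orElse("none"));
    }
}
```

Separating the logical name from its physical locations is what lets a data management layer replicate, cache, or delete copies without changing the names that workflows refer to; the site-preference rule shown here is just one simple selection policy.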
GNDMS was originally written and deployed for the data management needs of the Collaborative Climate Community Data and Processing Grid (C3-Grid) and is now being used in the Plasma-Technologie-Grid (PT-Grid) as part of the German D-Grid grid computing initiative. Nevertheless, the implementation is flexible and was written for reuse by other grid projects with similar data management requirements. Its core components may also be useful to developers of GT4 services that are not concerned with data management.
GNDMS is available free of charge for commercial and non-commercial use as open source under the Apache License 2.0 (APL 2.0).
Ulrike Golas, Florian Schintke
Jörg Bachmann, Maik Jorra, Stefan Plantikow