Efficient Synchronization of Distributed
Data Management
Data management plays an important role in efficient cluster and grid computing. One of its aspects is the distribution and synchronization of collections of files.

Sample Configuration File:
<nsync_config>
In nsync, the synchronization of a repository system is built from pairwise synchronizations, which are composed into a synchronization of the whole system. Updates can be performed concurrently on any repository.

Optimized Pairwise Synchronization
rsync makes it possible to synchronize two files on different hosts without transmitting an entire file. nsync uses this algorithm for low-bandwidth connections. For new files, nsync uses a plain copy or a compressed copy.

Algorithm of Rsync
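The block-matching idea behind rsync can be sketched in a few lines of Python. This is an illustration of the general technique described in Tridgell's thesis, not nsync's C++ code: the tiny block size, the function names (`signature`, `delta`, `patch`), and the use of MD5 as the strong hash are all assumptions made for the example.

```python
import hashlib

BLOCK = 4  # unrealistically small block size, chosen for illustration only


def weak_sum(data):
    """Weak checksum (a, b) of a block; cheap to update when the window slides."""
    a = sum(data) & 0xFFFF
    b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte to the right without re-reading the block."""
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - size * out_byte + a) & 0xFFFF
    return a, b


def signature(old):
    """Receiver side: checksums of every full block of the old file."""
    sig = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        entry = (hashlib.md5(block).digest(), start // BLOCK)
        sig.setdefault(weak_sum(block), []).append(entry)
    return sig


def delta(new, sig):
    """Sender side: describe the new file as block references plus literal bytes."""
    ops, literal, pos, current = [], bytearray(), 0, None
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        if current is None:
            current = weak_sum(window)
        match = None
        for strong, index in sig.get(current, ()):
            # The weak sum only nominates candidates; the strong hash confirms.
            if hashlib.md5(window).digest() == strong:
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += BLOCK
            current = None
        else:
            literal.append(new[pos])
            if pos + BLOCK < len(new):
                current = roll(*current, new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops):
    """Receiver side: rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)
```

Only the `("data", ...)` entries carry file content over the network; `("copy", n)` entries are small block references, which is why the approach pays off on low-bandwidth links when the two versions share most of their content.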
Performance Comparison
Host_B: 70MB Source
Compositions based on Gossip
Compositions of pairwise synchronizations for basic topologies can be derived from the gossip problem. The solutions of the gossip problem are all-to-all communications with a minimal number of steps (parallel communication rounds). For complete graphs with n nodes, about log2 n steps are needed.
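For the special case of a complete graph whose node count is a power of two, a gossip schedule with log2 n rounds can be written down directly: in round k, node i exchanges with node i XOR 2^k. The Python sketch below illustrates that textbook construction only; the function names are hypothetical, and it is not claimed to be the schedule nsync computes.

```python
def gossip_rounds(n):
    """Pairing schedule for n = 2**k nodes: one list of disjoint pairs per round."""
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch only handles n = 2**k")
    rounds = []
    bit = 1
    while bit < n:
        # Each node i is paired with the node whose index differs in one bit.
        rounds.append([(i, i ^ bit) for i in range(n) if i < (i ^ bit)])
        bit <<= 1
    return rounds


def simulate(n):
    """Run the schedule; known[i] is the set of nodes whose updates i has seen."""
    known = [{i} for i in range(n)]
    schedule = gossip_rounds(n)
    for pairs in schedule:
        for a, b in pairs:
            # A pairwise synchronization leaves both sides with the union.
            merged = known[a] | known[b]
            known[a] = merged
            known[b] = set(merged)
    return len(schedule), known
```

Calling `simulate(8)` runs three rounds, after which every node has seen the updates of all eight nodes.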
Topology Adaptive Compositions
The pairwise synchronizations are composed into a synchronization of the whole system. The quality of this composition directly determines the runtime of the synchronization.

Example: Low Bandwidth
Example: High Bandwidth
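Why the composition matters can be illustrated with a toy cost model in Python. Everything in it is an assumption made for the example and is not taken from nsync: each pairwise synchronization is taken to move a fixed amount of data, pairs that use the same link in the same round share its capacity equally, and a round lasts as long as its slowest link. The host names, link names, and capacities are hypothetical.

```python
def schedule_time(rounds, link_of, capacity, megabytes):
    """Estimated runtime in seconds of a schedule under the toy model above."""
    total = 0.0
    for pairs in rounds:
        load = {}  # number of pairwise synchronizations per link in this round
        for pair in pairs:
            link = link_of[frozenset(pair)]
            load[link] = load.get(link, 0) + 1
        # Sharing a link among `count` pairs multiplies each transfer time.
        total += max(megabytes * count / capacity[link]
                     for link, count in load.items())
    return total


# Hypothetical system: hosts A, B at one site and C, D at another.
link_of = {
    frozenset("AB"): "lan1", frozenset("CD"): "lan2",
    frozenset("AC"): "wan", frozenset("BD"): "wan",
}
capacity = {"lan1": 100.0, "lan2": 100.0, "wan": 1.0}  # MB/s, assumed values

# Two rounds, but both slow transfers compete for the WAN link.
two_rounds = [[("A", "C"), ("B", "D")], [("A", "B"), ("C", "D")]]
# Three rounds, with a single transfer crossing the WAN link.
three_rounds = [[("A", "B"), ("C", "D")], [("A", "C")], [("A", "B"), ("C", "D")]]
```

With 100 MB per pairwise synchronization, the model gives 201 seconds for `two_rounds` and 102 seconds for `three_rounds`: under these assumptions, a composition with more steps can still finish sooner when it sends less traffic over the slow link.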
References
T. Schütt, Synchronisation verteilter Verzeichnisstrukturen, Diploma Thesis, March 2002.
A. Tridgell, Efficient Algorithms for Sorting and Synchronization, PhD Thesis, April 2000.
Hromkovic et al., Dissemination of information in interconnection networks, Combinatorial Network Theory, 1996.
More Details on nsync
General
Current developments in high energy physics show that the management of large datasets plays an important role. The DataGrid project needs to distribute large datasets over several computing centers across Europe and to keep these datasets synchronized. Within clusters, too, tools for efficient synchronization and distribution of data are becoming more important.

For nsync, a method for synchronizing distributed directory structures was developed and implemented that allows independent changes to be made to arbitrary repositories simultaneously. The method needs no central instance, so the system scales better than many existing systems. Results from graph theory were used and extended to take the network topology and the bandwidth between the computers into account. Because synchronization is offline, changes are propagated only when the user initiates it. This can be useful after a completed transaction that consists of changes to several files. nsync is written in C++.
Papers
Related Work
Contact
Thorsten Schütt