 

Science Commons - Sharing Scientific Data

In every brainstorming session I participated in at the BIRS Workshop on Mathematical Foundations of Scientific Visualization, Computer Graphics, and Massive Data Exploration, the lack of standard datasets came up. Such datasets are necessary to allow a sound review process and a comparison of scientific achievements in the field of visualization. Some of my thoughts on this are summarized below.

Obstacles to Making Datasets Available

Ownership of the dataset prevents publication

Datasets are usually owned by cooperation partners from outside the field of scientific visualization. This prevents visualization groups from deciding on their own whether to publish a dataset. Cooperation partners may have reasons to keep their data proprietary; these reasons are largely the same as the other obstacles listed here. If the ownership of a dataset is unclear, keeping it locked away might be an easy way to avoid problems.

Commercial Interest

Companies might want to sell datasets, or the datasets might contain commercially valuable information.

Personal Rights

Medical datasets might be protected by personal rights that prohibit publication even if the data are anonymized.

Scientific Interest

Researchers who created a dataset might want to analyze it and publish their results before making it available. Researchers working on the visualization of a dataset might want to show some cool images before others can create them.

Cost of Publication

Some work must be put into a dataset before it can be published; conversion to a specific data format and documentation of its context are two examples. Publication itself also requires funds, for example for setting up a web server and providing the download bandwidth.

Attribution

The owner of a dataset might fear that, once it is released to the public, attribution of the dataset will be lost.

Tackling the Obstacles

The most important point is to enable the owner of a dataset to make a well-founded decision about publishing it. She should be able to clearly state potential problems with publication. The next step is to learn about solutions that avoid these problems. A decision to publish may also become more likely if she learns about the benefits of publication.

One building block might be to restrict the use of a dataset by means of a license. Releasing software source code is now a well-established process: a number of open source licenses [OSI] are available and well documented. More recently, Creative Commons [CC] established licenses targeted at artists and other authors who plan to release their work. The Science Commons Proposal [SCP] sketches how this work can be extended to deal with issues specific to science.

To get started, the existing Creative Commons Licenses [CCL] could perhaps be used, but this idea does not yet seem to be widespread. A few examples can be found on Common Content [CCNT] by searching for "Data". Another idea would be to create a new publication channel for scientific datasets similar to EG-Models [EGM].

In the long term, I believe Science Commons could establish a sound foundation for dealing with licensing issues in science. The legal expertise already available will facilitate this process, as would input from scientists. This vision is sketched in an interview with Lawrence Lessig [LL].

On January 1, 2005, Science Commons [SC] will be launched as a new project.

Datasets

Visualization 2004 Contest: simulation of a hurricane from the National Center for Atmospheric Research,
http://www.cse.msstate.edu/~graphics/vis04contest/

Kwan-Liu Ma's Time-Varying Volume Data Repository,
http://www.cs.ucdavis.edu/~ma/ITR/tvdr.html

Links

[OSI] Open Source Licenses,
http://opensource.org/licenses/

[CC] Creative Commons,
http://creativecommons.org

[SCP] Science Commons Proposal @ Creative Commons,
http://creativecommons.org/projects/science/proposal

[SC] Science Commons,
http://science.creativecommons.org

[CCL] Creative Commons License,
http://creativecommons.org/license/

[CCNT] Common Content,
http://www.commoncontent.org/

[EGM] EG-Models,
http://www.eg-models.de/

[LL] Interview with Lawrence Lessig,
http://www.biomedcentral.com/openaccess/archive/?page=features&issue=16


All original works on this website, unless otherwise noted, are copyright protected and licensed under a Creative Commons License.