In all brainstorming sessions I participated in at the BIRS Workshop on Mathematical Foundations of Scientific Visualization, Computer Graphics, and Massive Data Exploration, the lack of standard datasets came up. Such datasets are necessary for a sound reviewing process and for comparing scientific achievements in the field of visualization. Some of my thoughts on this are summarized below.
Ownership of the dataset prevents publication
Most of the time, datasets are owned by cooperation partners from outside the field of scientific visualization. This prevents visualization groups from deciding on their own whether to publish a dataset. Cooperation partners may have reasons to keep their data proprietary, and these reasons are basically the same as the obstacles listed here. If the ownership of a dataset is unclear, keeping it locked away may be the easiest way to avoid problems.
Commercial Interest
Companies may want to sell datasets, or the datasets may contain commercially valuable information.
Personal Rights
Medical datasets may be protected by personal rights that forbid publication even after they have been anonymized.
Scientific Interest
Researchers who created a dataset might want to analyze and publish their results before making it available. Researchers working on the visualization of a dataset might want to show some cool images before others can create them.
Cost of Publication
Preparing a dataset for publication takes work; converting it to a specific data format and documenting its context are two examples. Publication itself also costs money, for example for setting up a web server and providing the download bandwidth.
Attribution
The owner of a dataset may fear that, once it is released to the public, attribution of the dataset will be lost.
The most important point is to enable the owner of a dataset to make a well-founded decision about publishing it. She should be able to clearly state the potential problems with publication. The next step is to learn about solutions that avoid them. A decision to publish also becomes more likely if she learns about the benefits of publication.
One building block might be to restrict the usage of a dataset by means of a license. Releasing software source code is now a well-established process: many open source licenses [OSI] are available and well documented. More recently, Creative Commons [CC] established licenses targeted at artists who plan to release their work. The Science Commons Proposal [SCP] sketches how this work can be extended to deal with issues specific to science.
To get started, the existing Creative Commons licenses [CCL] could perhaps be used, but this idea does not seem to be very widespread. A few examples can be found on Common Content [CCNT] by searching for "Data". Another idea would be to create a new publication channel for scientific datasets, similar to EG-Models [EGM].
In the long term, I believe Science Commons could establish a solid foundation for licensing in science. Its existing expertise in legal matters will facilitate this process, as would input from scientists. This vision is sketched in an interview with Lawrence Lessig [LL].
On January 1, 2005, Science Commons [SC] will be launched as a new project.
Visualization 2004 Contest: simulation of a hurricane from the National Center for Atmospheric Research,
http://www.cse.msstate.edu/~graphics/vis04contest/
Kwan-Liu Ma's Time-Varying Volume Data Repository,
http://www.cs.ucdavis.edu/~ma/ITR/tvdr.html
[OSI] Open Source Licenses,
http://opensource.org/licenses/
[CC] Creative Commons,
http://creativecommons.org
[SCP] Science Commons Proposal @ Creative Commons,
http://creativecommons.org/projects/science/proposal
[SC] Science Commons,
http://science.creativecommons.org
[CCL] Creative Commons License,
http://creativecommons.org/license/
[CCNT] Common Content,
http://www.commoncontent.org/
[EGM] EG-Models,
http://www.eg-models.de/
[LL] Interview with Lawrence Lessig,
http://www.biomedcentral.com/openaccess/archive/?page=features&issue=16