Data Archiving

So! You'd like to archive and/or share your data! But where do you begin?

The center of the Milky Way galaxy imaged by NASA's Spitzer Space Telescope is displayed on a quarter-of-a-billion-pixel, high-definition 23-foot-wide (7-meter) LCD science visualization screen at NASA's Ames Research Center in Moffett Field, Calif.
Image credit: NASA/Ames/JPL-Caltech

 

There is a large and growing number of options available to you for data preservation and sharing. One of our favorites is Zenodo, which was developed and is managed by CERN. Zenodo is an online repository that accepts research products in many forms: articles, datasets, images, posters, software, and much more! Deposit into Zenodo is free and open: there are no file format restrictions, and files up to 50 GB can be uploaded (though Zenodo will accept larger files on a case-by-case basis).

Advantages:

  • Your data will be assigned a DOI. This will mean that whatever you upload will have a unique identifier and will be more easily citable (and discoverable!). This is great for sharing!
  • You can control how accessible you want your dataset to be. Zenodo lets you select access rights, access conditions, and licenses for your uploads.
  • Your data will be preserved! As previously mentioned, CERN developed and manages Zenodo. Your data will live on CERN servers in the CERN Data Center.

Want to get started?

First, let’s talk about your data:

  • What format is your data in? Is there an accepted standard in your research community? Is there a format you can use that is platform agnostic for even greater accessibility? These are questions that you’ll want to consider. If possible, select an open format, like CSV, XML, or JSON.
  • What type of data are you uploading? Is the data you’re uploading raw data, or has it been manipulated for, or modified due to, analysis? If you would like your data to be reused, you may want to consider uploading raw data rather than any modified data to the greatest extent possible. On the other hand, if you are archiving data associated with a particular publication, you may want to archive and/or share your derived data.
  • What does your data look like? If you’re uploading tabular data, it should be structured in a way that makes sense. Column headers should be your data’s variables, and each row should contain an individual observation.
  • What about your metadata? Metadata (data about your data) is helpful to include—though in a separate file. A separate metadata file can allow you to include comments, to include information about your data’s units, to specify how you’ve treated null values, etc.
  • You may have a number of different files to include in your dataset—perhaps you have several data files, a metadata file, and a readme file. You can package these up together in an archive folder, such as a ZIP file. Although Zenodo will allow you to upload multiple files, packaging them together might make them easier to retrieve.
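
If you prepare your files with a script, the points above can be sketched in Python roughly as follows. This is a minimal illustration rather than a required workflow: it writes a small table with one variable per column and one observation per row, records units and null-value handling in a separate metadata file, and bundles everything into a single ZIP file. The function name, file names, column names, and values are all invented for this example, so substitute your own.

```python
import csv
import json
import tempfile
import zipfile
from pathlib import Path

# Invented example data: one variable per column, one observation per row.
EXAMPLE_COLUMNS = ["object_id", "ra_deg", "dec_deg", "flux_mjy"]
EXAMPLE_ROWS = [
    {"object_id": "obj-001", "ra_deg": 10.5, "dec_deg": -4.1, "flux_mjy": 3.2},
    {"object_id": "obj-002", "ra_deg": 10.7, "dec_deg": -4.2, "flux_mjy": None},
]
# Metadata lives in its own file: units, null handling, comments.
EXAMPLE_METADATA = {
    "units": {"ra_deg": "degrees", "dec_deg": "degrees", "flux_mjy": "mJy"},
    "null_values": "Missing measurements are left as empty fields.",
}


def package_dataset(out_dir, rows, columns, metadata, readme_text):
    """Write data.csv, metadata.json, and README.txt, then zip them together."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    data_path = out_dir / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()    # column headers are the variables
        writer.writerows(rows)  # None is written as an empty field

    meta_path = out_dir / "metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")

    readme_path = out_dir / "README.txt"
    readme_path.write_text(readme_text, encoding="utf-8")

    archive_path = out_dir / "dataset.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in (data_path, meta_path, readme_path):
            archive.write(path, arcname=path.name)
    return archive_path


if __name__ == "__main__":
    # Write into a temporary folder so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        archive = package_dataset(
            tmp, EXAMPLE_ROWS, EXAMPLE_COLUMNS, EXAMPLE_METADATA,
            "Example dataset. See metadata.json for units.",
        )
        with zipfile.ZipFile(archive) as packed:
            print(packed.namelist())
```

Keeping the metadata and readme as separate files inside the archive means someone who downloads the ZIP gets the data and its documentation in one step.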

Now that you have a file (or several!) to upload, navigate to www.zenodo.org. Select the “Login” or “Sign up” option at the top of the page. You should be brought to a login screen, where you can either sign up for a new Zenodo account or log in with an existing GitHub or ORCID account.

Once you’ve logged in (creating an account first, if necessary), select the “Upload” tab at the top of the page. You’ll be brought to an upload landing page. From here, select “New Upload.”

The upload process is straightforward. The key is to include as much valid metadata with your upload as possible. Doing so increases the findability and usability of your dataset.

Files

Screenshot of Zenodo upload Files section
 

Start the upload process by selecting (or dragging and dropping) the file(s) you plan to upload. After selecting the files, you’ll have to press the “Start upload” button at the top-right of the box.

Upload type

Screenshot of Zenodo upload Upload Type section
 

The next section asks you to select the type of your upload. You’ll likely want to select the “dataset” option, but data comes in many shapes and sizes, so if another option (like “image”) describes your upload better, select that instead.

Basic information

The next section asks you to fill in basic information about your upload. Here, you’ll fill in: 

Screenshot of Zenodo upload Basic Information section, part 1
 

  • The publication date you’d like associated with your dataset. The date you begin the upload process is the default.
  • The title of your dataset.
  • The author(s), along with affiliation information, of the dataset.

Screenshot of Zenodo upload Basic Information section, part 2
 

  • A description of the dataset.
  • Some keywords associated with your dataset. We recommend using appropriate terms from the Unified Astronomy Thesaurus (UAT) later in the “Subject” section, but if there are any terms you’d like to use that aren’t in the UAT, you should use them here in the “Keywords” field.

License

Screenshot of Zenodo upload License section
 

The following section requires you to select access rights and a license for your dataset. The options you select will depend on what you’d like to allow others to do with your data. If you would like to allow others to access and/or use your work without seeking your permission, we recommend selecting Open Access and an appropriate license. Selecting Open Access also gives your upload higher visibility on Zenodo.

  • Open Access: By selecting this option, your dataset will be fully available according to the terms of the Creative Commons license you select. For more information about the available Creative Commons licenses and what they mean, please visit https://creativecommons.org/licenses/
  • Embargoed Access: This option is similar to Open Access, but allows you to select a future date on which your upload will be published. Select the date and the license if you’d like this option.
  • Restricted Access: Select this option if you plan to allow others to access and/or use your work on a case-by-case basis. Someone who would like access to your dataset will have to contact you for permission. Please note that it is against Zenodo’s terms to charge other users for access to your uploads.
  • Closed Access: Select this option if you would like to prevent others from accessing your upload. Only the metadata you include with your upload will be visible to others. This option may be appropriate if you would like to use Zenodo to archive or preserve your dataset but not necessarily to share it with anyone else.
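
If it helps to see the four options side by side, here is a small Python sketch of what each one asks of you before you publish. It is only a planning aid based on the descriptions above: the function name, the lowercase option names, and the license string are our own hypothetical choices, not part of Zenodo.

```python
from datetime import date

# Hypothetical short names for the four options described above.
ACCESS_OPTIONS = ("open", "embargoed", "restricted", "closed")


def check_access_settings(access, license_id=None, embargo_date=None, today=None):
    """Return a list of problems with the chosen settings (empty if none)."""
    today = today or date.today()
    if access not in ACCESS_OPTIONS:
        return [f"unknown access option: {access!r}"]

    problems = []
    # Open and embargoed uploads are both released under a license.
    if access in ("open", "embargoed") and not license_id:
        problems.append("select a license")
    # Embargoed uploads also need a future release date.
    if access == "embargoed":
        if embargo_date is None:
            problems.append("select an embargo end date")
        elif embargo_date <= today:
            problems.append("embargo end date must be in the future")
    # Restricted and closed uploads need no license or date here.
    return problems


if __name__ == "__main__":
    # "cc-by-4.0" is an illustrative license label, not a verified Zenodo value.
    print(check_access_settings("open", license_id="cc-by-4.0"))
    print(check_access_settings("embargoed", license_id="cc-by-4.0"))
```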

Communities

Screenshot of Zenodo upload Communities section
 

Zenodo lets you request that your upload be included in a specific “community.” Communities function essentially as collections. If you have a specific community you’d like to add your dataset to, type the community’s name into the box. Once you publish your upload, the community’s owner will be notified of your request to have your dataset added.

You can also start your own community, especially if you’d like to colocate several different uploads (either your own or a group’s). You can create your own community to publish works like conference proceedings (check out how here!), or works on a specific group or topic. For a few examples of astronomy-relevant Zenodo communities, check out the CfA Historical Materials Collection or the Astronomy Thesis Collection.

Funding

Screenshot of Zenodo upload Funding section
 

You should enter information into the “Grants” box only if you’ve received funding from the European Commission.

Related/alternate identifiers

Screenshot of Zenodo upload Related identifiers section
 

Use this section to link your upload to other related works. For example, if you’ve published an article related to the dataset you’re uploading, you may want to include that work’s DOI and/or ADS bibcode and to select the appropriate relationship from the dropdown menu. You’re not limited just to assigned identifiers, though; you can also include URLs or URNs and their relationship to your upload.

Have code that you want to share along with your data? You can get a DOI for your code, too. Review GitHub’s guide to Making Your Code Citable to learn how, and include the DOI as a related identifier for your dataset!

This step might not seem necessary—and, indeed, Zenodo doesn’t require it—but if there are any links you can make to any other works, we strongly recommend you do so. Doing so helps to increase the visibility of your work and any related work.
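
If you have several related works to link, it can help to collect them in one place before filling in the form. The Python sketch below is one way to do that, under some assumptions: the scheme guessing is a rough heuristic (not validation), the relation names are written in the style of the DataCite vocabulary and should be checked against the options in Zenodo’s dropdown, and every identifier shown is a made-up placeholder.

```python
import re

# Rough DOI shape: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def guess_scheme(identifier):
    """Guess what kind of identifier a string is (heuristic only)."""
    if identifier.startswith(("http://", "https://")):
        return "url"
    if identifier.lower().startswith("urn:"):
        return "urn"
    if DOI_PATTERN.match(identifier):
        return "doi"
    # ADS bibcodes are 19 characters long and begin with a four-digit year.
    if len(identifier) == 19 and identifier[:4].isdigit():
        return "ads"
    return "unknown"


def related_identifier(identifier, relation):
    """Describe one related work and how it relates to this upload."""
    return {
        "identifier": identifier,
        "scheme": guess_scheme(identifier),
        "relation": relation,
    }


# Placeholder identifiers, invented for this example.
related = [
    related_identifier("10.1234/example.5678", "isSupplementTo"),
    related_identifier("2099ApJ...999..999X", "isSupplementTo"),
    related_identifier("https://example.org/my-analysis-code", "isSupplementedBy"),
]
for entry in related:
    print(entry["scheme"], entry["identifier"], entry["relation"])
```

Anything reported as “unknown” is worth a second look before you paste it into the form.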

Contributors

Screenshot of Zenodo upload Contributors section
 

This section allows you to identify all other individuals who contributed to your dataset (beyond the authors you identified in the “Basic information” section). These are individuals who perhaps don’t have primary intellectual responsibility for the dataset you’re uploading, but still played a role in its creation. If applicable, select an appropriate role from the dropdown for each contributor.

References, Journal, Conference, Book/Report/Chapter, & Thesis

Screenshot of Zenodo upload collapsed sections
 

These sections are unlikely to apply to a dataset you upload. Instead, if your dataset is related to another published work like an article, thesis, conference proceeding, etc., you should include that work’s identifier(s) in the “Related/alternate identifiers” section with the appropriate relationship.

Subject

Screenshot of Zenodo upload Subjects section
 

Add any appropriate terms from a taxonomy or controlled vocabulary, like the Unified Astronomy Thesaurus (UAT). Include both the term from the UAT (in the “Term” box) and the URL for the term (in the “Identifier” box). As mentioned previously, include any terms you’re not able to find in the UAT in the “Keywords” field in the “Basic information” section.
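
If you’re working from a long list of terms, the following Python sketch shows one way to sort them into the two fields: terms you have found in the UAT go to “Subject” with their URL, and everything else goes to “Keywords.” The function name is hypothetical, and the lookup table is something you would build yourself by searching the UAT; the URL shown is a placeholder, not a real UAT identifier.

```python
def split_terms(terms, vocabulary):
    """Split terms into (subjects, keywords) using a {term: URL} lookup."""
    subjects, keywords = [], []
    for term in terms:
        url = vocabulary.get(term.lower())
        if url:
            # Found in the controlled vocabulary: record term and identifier.
            subjects.append({"term": term, "identifier": url})
        else:
            # Not found: keep it as a free-text keyword.
            keywords.append(term)
    return subjects, keywords


# Placeholder lookup table; replace the URL with the real one from the UAT.
my_vocabulary = {"galaxies": "https://example.org/uat/0000"}

subjects, keywords = split_terms(["Galaxies", "my custom term"], my_vocabulary)
print(subjects)
print(keywords)
```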

Save and Publish!

Screenshot of Zenodo upload Save and Publish section
 

Now that you’ve completed the record, you’re ready to publish your dataset! Double check to make sure you’ve completed all of the steps, and then select the “Save” button (either at the top or bottom of the page). Once your upload has been saved, you’ll be able to select “Publish” to complete the process.

Please contact library staff at library@cfa.harvard.edu if you have any questions or need any assistance with this process.