
Public Data Repositories

last updated at 2022-05-09

What are data repositories?

Public data repositories are one option for publishing your research data. They usually focus on the data itself, as opposed to other research outputs such as manuscripts. Data repositories assign a persistent identifier (e.g. a DOI) to your dataset and thereby meet the requirements of most journals.
We differentiate between domain-specific and general-purpose repositories.

Domain-specific data repositories

Domain-specific data repositories are well established in a domain or community that specializes in a certain data type. They frequently co-develop or foster compliance with metadata standards (see metadata) and often curate data. Depositing data at these repositories is recommended.
The following table lists examples of relevant endpoint repositories (ER) for data produced by DataPLANT participants. Check the links below for additional repositories.

| Repository | Description | Biological data domain | DataPLANT Templates available |
|---|---|---|---|
| EBI-ENA | European Nucleotide Archive | genome / transcriptome sequences | |
| EBI-ArrayExpress | Archive of Functional Genomics Data | transcriptome | |
| EBI-MetaboLights | Database of Metabolomics | metabolome | |
| EBI-PRIDE | PRoteomics IDEntifications Database | proteome | |
| EBI-BioImage Archive | Stores and distributes biological images | imaging, microscopy | |
| e!DAL-PGP | Plant Genomics & Phenomics Research Data Repository | phenome | |
| NCBI-GEO | Gene Expression Omnibus | transcriptome | |
| NCBI-GenBank | Genetic Sequence Database | genome | |
| NCBI-SRA | Sequence Read Archive | genome / transcriptome sequences | |

General-purpose repositories

Where no suitable domain-specific repository exists, general-purpose repositories are an option to publicly deposit research data and receive a PID. One benefit of general-purpose repositories is that they accept virtually any data type. Research data packages that mix data types and computational workflows can also be deposited, which suits typical plant science investigations. However, since these repositories can only foster compliance with metadata standards at a very generic level (e.g. bibliographic or technical, see metadata), they limit the capacity for FAIR reuse of data.
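
The choice described above can be sketched as a simple lookup: prefer a domain-specific repository for the data domain at hand and fall back to a general-purpose one otherwise. The mapping below is taken from the example table of endpoint repositories in the previous section; the function name `suggest_repositories` and the fallback label are assumptions made for illustration and are not part of any DataPLANT tool.

```python
# Illustrative sketch only: the mapping mirrors the example table of
# endpoint repositories; the function name and fallback label are hypothetical.
DOMAIN_REPOSITORIES = {
    "genome": ["EBI-ENA", "NCBI-GenBank", "NCBI-SRA"],
    "transcriptome": ["EBI-ENA", "EBI-ArrayExpress", "NCBI-GEO", "NCBI-SRA"],
    "metabolome": ["EBI-MetaboLights"],
    "proteome": ["EBI-PRIDE"],
    "imaging": ["EBI-BioImage Archive"],
    "microscopy": ["EBI-BioImage Archive"],
    "phenome": ["e!DAL-PGP"],
}

GENERAL_PURPOSE = "general-purpose repository"


def suggest_repositories(data_domain: str) -> list[str]:
    """Return domain-specific repositories, or a general-purpose fallback."""
    key = data_domain.strip().lower()
    return DOMAIN_REPOSITORIES.get(key, [GENERAL_PURPOSE])


print(suggest_repositories("Proteome"))
print(suggest_repositories("soil chemistry"))
```

A lookup like this only narrows the candidates; the final choice still depends on the submission requirements of each repository.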

Examples for general-purpose repositories include

Finding a suitable repository

The following resources provide good starting points to seek a suitable repository for your research data.

Submitting data to a public data repository

Depositing research data at a public data repository can be tedious. Domain-specific repositories in particular require compliance with specific submission routines (a) in terms of format and content and (b) for both "raw data" and "metadata". Only data types relevant to the respective domain are accepted, and they must be provided in the proper data formats. To guarantee that the information required to properly describe the data is present, these repositories require adherence to domain-specific metadata standards, represented in the proper format, and often require the use of controlled vocabularies and ontologies. Finally, the technicalities of how to collect and submit the (meta)data vary greatly between repositories, ranging from plain upload via file transfer (e.g. FTP) and APIs to online web forms and specialized software requiring local installation. The large repository providers invest a lot in harmonizing their formats and submission routines. Still, there is a long way to go, and we are currently far from a unified approach where "If you know one, you know them all."
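
To illustrate the kind of checks such a submission routine implies, the following minimal sketch validates a metadata record for required fields and controlled-vocabulary values before upload. The field names, the vocabulary, the sample record, and the function `validate_record` are hypothetical placeholders; each real repository defines its own checklist and formats.

```python
# Hypothetical checklist: real repositories publish their own required
# fields and controlled vocabularies.
REQUIRED_FIELDS = ("title", "organism", "data_type")
CONTROLLED_VOCABULARY = {
    "data_type": {"genome", "transcriptome", "proteome", "metabolome"},
}


def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    # Fields bound to a controlled vocabulary must use one of its terms.
    for field, allowed in CONTROLLED_VOCABULARY.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"{field}={value!r} is not in the controlled vocabulary")
    return problems


# Made-up example record with a free-text data type.
record = {
    "title": "Drought stress time series",
    "organism": "Arabidopsis thaliana",
    "data_type": "RNA",
}
for problem in validate_record(record):
    print(problem)
```

Running such checks locally before submission catches free-text values early, which is the same problem that ontology-backed annotation aims to avoid.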

Submitting to repositories

How does DataPLANT support me in submitting to public data repositories?

The following table gives an overview of DataPLANT tools and services related to submitting data to repositories. Follow the link in the first column for details.

Name: ARC (Annotated Research Context)
Type: Standard
Tasks on metadata: Structure
  • Package data with metadata

Name: Swate (Swate Workflow Annotation Tool for Excel)
Type: Tool
Tasks on metadata: Collect and structure
  • Annotate experimental and computational workflows with ISA metadata schema
  • Easy use of ontologies and controlled vocabularies
  • Metadata templates for versatile data types

Name: ARC Commander
Type: Tool
Tasks on metadata: Collect, structure and share
  • Add bibliographical metadata to your ARC
  • ARC version control and sharing via DataPLANT's DataHUB
  • Automated metadata referencing and version control as your ARC grows

Name: DataHUB
Type: Service
Tasks on metadata: Share
  • Federated system to share ARCs
  • Manage who can view or access your ARC

DataPLANT Support

Besides these technical solutions, DataPLANT supports you with community-engaged data stewardship. For further assistance, feel free to reach out via our helpdesk or by contacting us directly.