Home Fundamentals Research Data Management FAIR Data Principles Metadata Ontologies Data Sharing Data Publications Data Management Plan Version Control & Git Public Data Repositories Persistent Identifiers Electronic Lab Notebooks (ELN) DataPLANT Implementations Annotated Research Context User Journey ARC specification ARC Commander QuickStart QuickStart (Experts) Swate QuickStart Walk-through Best Practices For Data Annotation DataHUB DataPLAN Ontology Service Landscape ARC Commander Manual Setup Git Installation ARC Commander Installation Windows MacOS Linux ARC Commander DataHUB Access Before we start Central Functions Initialize Clone Connect Synchronize Configure Branch ISA Metadata Functions ISA Metadata Investigation Study Assay Update Export ARCitect Manual Installation - Windows Installation - macOS Installation - Linux QuickStart QuickStart - Videos ARCmanager Manual What is the ARCmanager? How to use the ARCmanager Swate Manual Swate Installation Excel Browser Excel Desktop Windows – installer Windows – manually macOS – manually Organization-wide Core Features Annotation tables Building blocks Building Block Types Adding a Building Block Using Units with Building Blocks Filling cells with ontology terms Advanced Term Search Templates File Picker Expert Features Contribute Templates ISA-JSON DataHUB Manual Overview User Settings Generate a Personal Access Token (PAT) Projects Panel ARC Panel Forks Working with files ARC Settings ARC Wiki Groups Panel Create a new user group Data publications Passing Continuous Quality Control Submitting ARCs with ARChigator Track publication status Use your DOIs Guides ARC User Journey Create your ARC ARC Commander QuickStart ARC Commander QuickStart (Experts) ARCitect QuickStart Annotate Data in your ARC Annotation Principles ISA File Types Best Practices For Data Annotation Swate QuickStart Swate Walk-through Share your ARC Register at the DataHUB DataPLANT account Invite collaborators to your ARC Sharing ARCs via the DataHUB Work with your ARC Using ARCs with Galaxy Computational Workflows CWL Introduction CWL runner installation CWL Examples CWL Metadata Recommended ARC practices Syncing recommendation Keep files from syncing to the DataHUB Working with large data files Adding external data to the ARC ARCs in Enabling Platforms Publication to ARC Troubleshooting Git Troubleshooting Contribute Swate Templates Knowledge Base Teaching Materials Events 2023 Nov: CEPLAS PhD Module Oct: CSCS CEPLAS Start Your ARC Sept: MibiNet CEPLAS Start Your ARC July: RPTU Summer School on RDM July: Data Steward Circle May: CEPLAS Start Your ARC Series Start Your ARC Series - Videos Events 2024 CEPLAS ARC Trainings – Spring 2024 MibiNet CEPLAS DataPLANT Tool-Workshops Frequently Asked Questions

ARCs in Enabling Platforms

last updated at 2023-08-04

🚧 work in progress 🚧

About this guide

In this guide we explore how the ARC can help streamline data flows and project management in enabling platforms.

UserAdvanced ModeRead

Before we can start

It helps to be familiar with

Enabling Platforms

Central enabling platforms can range from small single-lab services to large instrumentation-based research infrastructures such as core facilities. Platforms offer a specific set of methods and assays as a routine support to a community of researchers. Typical examples with application in plant sciences include platforms offering "omics" technologies (genome, transcriptome, proteome, metabolome), phenotyping, (microscopic) imaging, genetic engineering or plant growth facilities.

Project communication and data flow

For simplicity, in this guide we call researchers, clients and customers approaching the platform "collaborators".

collaborator platform team

Central platforms interact with many versatile collaborators. Most platforms have established workflows or routines for (i) project initiation, (ii) submission of samples to be assayed, (iii) exchange of and access to generated data. This communication typically includes a ping-pong of meetings and emails to shape the study in mind, elaborate the biological question and hypothesis, define the most suitable method offered by the platform. To effectively process the project, the platform raises requirements for how samples need to be prepared and submitted.

Project management

As a platform you manage a lot of projects in parallel. Keeping these projects up-to-date and coordinated between collaborators and platform team members is challenging.

Here's a few tips to support your project management:

Streamlined data exchange

During project collaboration a lot of information including metadata is exchanged. Here's an idea what your data flow could look like with the ARC:

  1. Data exchange between collaborator and platform occurs through the ARC shared via the DataHUB. Depending on the phase of the project this involves different data and information.
    (1a) The collaborator primarily describes the investigation's goal and study design, annotates the submitted samples with metadata and adds associated protocols. Once available the collaborator receives assayed data and relevant information.
    (1b) The platform team receives relevant sample metadata to run the respective assay and complements the ARC with platform-specific assay metadata, protocols and standard operating procedures (SOPs).
  2. A clone of the ARC is stored locally at the platform.
  3. From the platform data source, e.g. a workstation attached to your assay device, the generated data can directly be written or transferred to the assay dataset folder of your ARC stored in the file storage

A few tips to streamline your data exchange:

Data storage, handling and transfer

Your platform produces a lot of valuable raw data. Depending on your setup and the type of data you generate, the data is stored on individual machines, hard drives, institutional or internal file shares. For instance, workstations directly associated with measuring devices require quickest data transfer rates or specific security restrictions and write data to proximal data stores rather than a cloud storage.

The DataHUB can help as a platform to centralize data from different sources and function as a geo-redundant backup with tracked changes:

Track changes and provenance

Different people are involved in the projects passing your platform. From team members to internal and external collaboration partners (collaborators). This makes it sometimes hard to keep track of who contributed how, what, when, why and to manage who needs access to which data.

Routine computations

Your platform supports researchers with a specific set of methods and assays. You perform routine measurements, follow established protocols and SOPs to generate specific types of samples and data. Aligned with your lab workflows, you also follow computational routines: For the same types of (raw) data you re-use computational steps for processing and analysis. Reusing computational workflows across projects helps comprehensibility for your collaborators and ensures reproducibility and quality for your data.

The standardized ARC structure helps with routine computations:

🚧 Articles and guides on CWL are currently work in progress.

Data publication

At some point you and your collaborator want to publish the data resulting from the project.

There are two major options: You can publish the current version of the ARC and receive a DOI. Some scientific journals require data to be published in domain-specific repositories, specialized for a certain type of assay. For this purpose converters help you export the relevant dataset and metadata into a repository-accepted format.

🚧 Both these routines are work in progress.

💡 We are working on converters to read and reshape the relevant data and metadata of your ARC into a format accepted by domain-specific repositories. You can support this by telling us relevant repositories for your type of data or help creating templates.

Preparing your platform to benefit from the ARC

Here we recommend steps that you can take to prepare your platform towards using ARCs. They are optional and it depends on your platform, whether these are suitable.

☑️ Create an example ARC

☑️ Share your example ARC via the DataHUB

☑️ Create a DataHUB group for your platform

☑️ Design metadata templates for sample annotation

☑️ Write a short guide for your collaborators

☑️ Start packaging your projects as ARCs 🚀

Meet your collaborators in an ARC

You are all set and ready to use ARCs for collaboration in your platform. Here we try to address a few questions from discussions with platform heads.

Who initiates the ARC: platform or collaborator?

You decide whether you prepare the ARC for you collaborators (scenario A) or meet them half the way (scenario B). Eventually this depends on different factors, e.g. the type of collaboration you agreed upon or whether or not the collaborators are used to work with the ARC and associated tools.

Following the exemplary scenario A, you could setup the ARC for your collaboration (1), add the relevant studies and assays (2) as well as templates (3) as discussed with them and ask them to complement the required metadata (4) and protocols (5) before you can run the assays and add the dataset (6).

Alternatively, collaborators already working with ARCs could invite you to "their" ARC (exemplary scenario B). They can independently set up the ARC and fill metadata (4) based on your prepared templates (3).

💡 In scenario B the collaborator might invite you to a very large ARC with data not really relevant for your platform-specific collaboration. In this case you might want to exclude irrelevant data or avoid downloading large data when syncing the ARC.

Can I retain my established naming convention for project management and data storage?

Running a central platform, you probably follow an established project management system or naming convention. This is particularly important for how project folders are named on your platform's data storage. When implementing ARCs, you do not need to change this system. You can simply name the ARCs the same way you are used to name your project folders.

This is also true for scenario B exemplified above. You can simply fork the collaborator's ARC and rename it according to your system.

💡 You might want to consider requiring your collaborators to name the assays (folder name or assay identifier) according to your project management system.

How can I share ARCs with non-ARC collaborators?

Not all of your collaborators use ARCs or are planning to do so, but you still want to interact with those collaborators in the same routines employed with everyone else. They only need a DataHUB account and they can simply sign in with their existing scientific account.

The DataHUB comes with built-in features that allow interaction with the ARC solely via the web browser without any addional tools.

DataPLANT Support

Besides these technical solutions, DataPLANT supports you with community-engaged data stewardship. For further assistance, feel free to reach out via our helpdesk or by contacting us directly .
Contribution Guide 📖
✏️ Edit this page