Is My ARC Good Enough?

You followed Viola’s steps in the start here guide and now feel overwhelmed? That is understandable: a guide streamlined around a demo dataset is a whole different story than achieving the same with your own complex data.
Here we provide recommendations and considerations for structuring an ARC based on your current project and datasets. Remember: creating an ARC is an ongoing process, and it’s meant to evolve over time.

The “final” ARC does not exist – Immutable, Yet Evolving!

Think of your ARC as an evolving entity that adapts and improves as your project progresses.

  • Don’t aim for perfection right away: At first, your ARC doesn’t need to be flawless. You’re not expected to win an award for the best ARC from the outset. The goal is for it to be useful to you. As long as your ARC serves its purpose—whether by organizing data, tracking workflows, or aiding in reproducibility—that’s a win.
  • Priorities vary across researchers: Different people may have different ideas about what should be made FAIR first and what can be polished later. Allow yourself to start with the basics and improve it step by step.

So, don’t stress about making your ARC perfect from the get-go—focus on making it functional.

Start Simple: Just Dump the Files Into Your ARC

An ARC’s core principle is that “everything is a file.” It’s common to work with a collection of files and folders in your daily research. Why not just start by organizing them into an ARC?

  • Initial File Dump: At first, don’t worry too much about the precise structure. Simply place your files into an “additional payload” folder within the ARC. This will help you get started without overthinking the details.
  • Version Control with Git: By putting your files in the ARC, you instantly gain the benefit of version control through Git. This helps you track changes and maintain a history of your files.
  • Safe Backup via DataHUB: Once you upload your ARC to the DataHUB, you’ll also have a secure backup of your files.
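
For illustration, a first “file dump” ARC could look like the sketch below. The top-level layout follows the ARC specification; the payload folder and the files inside it are invented placeholders:

    my-arc/
    ├── isa.investigation.xlsx    (top-level investigation metadata)
    ├── studies/                  (to be filled later)
    ├── assays/                   (to be filled later)
    ├── workflows/                (to be filled later)
    ├── runs/                     (to be filled later)
    └── unsorted-payload/         (additional payload: your initial file dump)
        ├── greenhouse_notes_2023.docx
        └── raw_measurements.xlsx

Because an ARC is a regular Git repository, the standard Git commands already give you a file history (tools such as the ARC Commander wrap these steps, e.g. via arc sync):

    git add .
    git commit -m "Initial file dump into additional payload folder"
    git push    # once a DataHUB remote is configured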

Add Metadata to Make Your ARC More Shareable and Citable

Next, enrich your ARC with some basic metadata:

  • Project and Creator Info: Include metadata about your project and the researchers involved. This step makes your ARC more shareable and citable from the start.
  • Link to the Investigation: Add this metadata to your investigation file. This is an easy way to ensure your work is discoverable and properly credited (see the sketch below).
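
As a sketch, the corresponding entries in the investigation file (isa.investigation.xlsx at the ARC root) could look as follows. The field names come from the ISA investigation format used by ARCs; the identifier and person details are invented:

    INVESTIGATION
    Investigation Identifier          my-first-arc
    Investigation Title               Drought stress response in tomato
    Investigation Description         Pilot experiment comparing ...
    INVESTIGATION CONTACTS
    Investigation Person Last Name    Doe
    Investigation Person First Name   Jane
    Investigation Person Email        jane.doe@university.example
    Investigation Person Affiliation  Example University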

Sketch Your Laboratory Workflows

A key goal of an ARC is to trace each finding or result back to its originating biological experiment. To achieve this, your ARC will need to link dataset files to individual samples through a series of processes (laboratory or computational steps) with defined inputs and outputs.

  • Map Out Your Lab Workflows: Before diving into the structure of your ARC, take some time to sketch what you did in the lab. What experiments did you perform? What samples did you analyze? Which protocols did you follow? This sketch will help you understand how to organize your data and workflows later.
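
Such a sketch needs no special format; a plain chain of materials and process steps is enough. A hypothetical example for an RNA-Seq experiment:

    plant (leaf sample)   --[growth & harvest protocol]-->   frozen leaf material
    frozen leaf material  --[RNA extraction protocol]-->     RNA extract
    RNA extract           --[library prep & sequencing]-->   raw reads (FASTQ files)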

Organize Your Files into studies and assays

Once you have a better understanding of your lab processes, you can begin organizing your ARC:

  • Define studies and assays: Structure your data by moving files into relevant folders, such as studies and assays. This makes it clear where the raw data (dataset) is stored and which protocols were used to generate that data.
  • Reference Protocols: As you organize, simply reference the existing protocols (stored as free-text documents) in your ARC. This ensures consistency without overwhelming you with unnecessary details at this stage.
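
Continuing the example from above, the earlier file dump could be sorted roughly like this (study and assay names are invented; the subfolder layout follows the ARC specification):

    my-arc/
    ├── isa.investigation.xlsx
    ├── studies/
    │   └── drought-exposure/
    │       ├── isa.study.xlsx
    │       ├── protocols/
    │       │   └── growth_and_harvest.docx
    │       └── resources/
    └── assays/
        └── rnaseq/
            ├── isa.assay.xlsx
            ├── protocols/
            │   └── rna_extraction.docx
            └── dataset/
                ├── leaf_sample_01.fastq.gz
                └── leaf_sample_02.fastq.gz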

Simple First: Link Input and Output Nodes

Before delving into complex parameterization or detailed annotation tables, start simple:

  • Connect Inputs and Outputs: Begin by connecting your studies and assays through input and output nodes. This allows you to trace the flow of data through your workflows without getting bogged down by excessive detail.
  • Re-draw Lab Workflows: At this stage, you can essentially redraw your lab workflows as tables, mapping each process step to its inputs and outputs.
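
A minimal sketch of such a table, with headers in the style of ISA/ARC annotation tables and invented sample names:

    Input [Source Name]    Protocol REF           Output [Sample Name]
    plant_leaf_01          rna_extraction.docx    rna_extract_01
    plant_leaf_02          rna_extraction.docx    rna_extract_02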

Parameterize Your Protocols for Machine Readability

Once you have the basic structure in place, you can start making your data more machine-readable and searchable:

  • Parameterize Protocols: To improve reproducibility, break down your protocols and workflows into structured annotation tables. This will allow you to capture the parameters used at each step of your research.
  • Make It Searchable: This will make your study more discoverable and ensure that your methods are clear and reproducible.
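
Building on the minimal table above, parameters simply become additional columns between input and output. The parameter names and values below are invented for illustration:

    Input [Source Name]   Parameter [extraction kit]   Parameter [incubation temperature]   Output [Sample Name]
    plant_leaf_01         RNeasy Plant Mini Kit        37 °C                                rna_extract_01
    plant_leaf_02         RNeasy Plant Mini Kit        37 °C                                rna_extract_02

Annotation tools such as Swate can help fill these columns with ontology terms, which is what ultimately makes the tables machine-readable and searchable.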

Keep It Simple for Your Data Analysis Workflows

The same approach applies to your data analysis workflows:

  • Treat Data Analysis as Protocols: Regardless of whether your data analysis involves clickable software or custom code, treat it like a protocol. For now, just store the results in your dataset folder.
  • Iterate as You Go: You don’t need to go into deep detail at first. Just focus on capturing the core analysis steps, and refine them later as your project progresses.
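
As a sketch, “treat it like a protocol” can be as simple as a free-text description stored next to the result files (names invented):

    assays/rnaseq/
    ├── protocols/
    │   └── differential_expression.md    (tool, version, and settings used)
    └── dataset/
        ├── leaf_sample_01.fastq.gz
        └── deg_results.csv               (output of the analysis)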

Making Data Analysis More Reproducible: Use CWL, Containers, and Dependency Management

If you want to make your data analysis more reproducible and ensure that your workflows are easily reusable, consider wrapping your analysis tools in CWL (Common Workflow Language) and using containers:

  • CWL for Reproducibility: Use CWL to describe your computational workflows in a standardized way. This ensures that others can run your analysis with the same inputs and parameters, regardless of their system.
  • Containerization: Leverage Docker or Singularity containers to encapsulate all software dependencies. This makes it easier to share your workflows and ensures they run consistently across different environments.
  • Manage Dependencies: Use tools like Conda or Docker to manage your software dependencies, avoiding issues with mismatched versions or missing libraries.
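
A minimal sketch of a CWL tool description is shown below; the script name, input file, and container image are hypothetical placeholders:

    cwlVersion: v1.2
    class: CommandLineTool
    baseCommand: python
    hints:
      DockerRequirement:
        dockerPull: my-registry.example/rnaseq-analysis:1.0   # pinned image with all dependencies
    inputs:
      script:
        type: File
        inputBinding:
          position: 1      # python <script> ...
      counts:
        type: File
        inputBinding:
          position: 2      # ... <counts table>
    outputs:
      results:
        type: File
        outputBinding:
          glob: deg_results.csv

A CWL runner such as cwltool can then execute the analysis (for example, cwltool analysis.cwl --script analysis.py --counts counts.csv); because the container pins every dependency, the run behaves the same on any machine.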

Conclusion: The ARC is a Living FAIR Digital Object

The process of creating an ARC is gradual and evolving. Start simple, and focus on getting the basics in place. Over time, you can refine and enhance your ARC to improve its usefulness and functionality, making it a valuable tool for organizing, sharing, and reproducing your research.