
Linking inputs and outputs

A key objective of the ARC is to trace each finding or result back to its specific biological experiment. This requires linking dataset files to the individual samples they were generated from. To accomplish this, we describe the experiment as a sequence of processes, each with defined inputs and outputs.

Consider the example experiment from the Start Here guide where six Arabidopsis thaliana plants were exposed to cold stress, and the sugar content was measured as a response. The ARC structure for this experiment could look like this:

  • AthalianaColdStressSugar/
    • studies/
      • AthalianaColdStress/
        • protocols/
          • plant-sampling.md
    • assays/
      • SugarContent/
        • dataset/
          • sugar_result.csv
        • protocols/
          • sugar_extraction.md
          • sugar_measurement.md
        • isa.assay.xlsx
        • README.md

The ARC contains one study (AthalianaColdStress) and one assay (SugarContent). The study includes a protocol for plant sampling describing how the plants were grown and treated, while the assay contains protocols for sugar extraction and sugar measurement. The dataset file sugar_result.csv holds the measured sugar content data.

The following three annotation tables describe the three consecutive processes:

  • Plant Sampling (part of the Study AthalianaColdStress),
  • Sugar Extraction (part of the Assay SugarContent), and
  • Sugar Measurement (part of the Assay SugarContent).

Each table starts with an Input column specifying the input entity (sample, material or data) for the respective process, followed by a ProtocolUri column indicating the protocol used, and ends with an Output column specifying the output entity resulting from the process.
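As a minimal sketch of this column layout, the following Python snippet builds the rows of the Plant Sampling table from the six plant identifiers used in this example. The `ProcessRow` class and the `plant_sampling_rows` helper are hypothetical names chosen for illustration and are not part of any ARC tooling; any parameter columns between ProtocolUri and Output are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRow:
    """One annotation-table row: input entity, protocol used, output entity."""
    input_name: str
    protocol_uri: str
    output_name: str


def plant_sampling_rows(plants):
    """Build one row per plant; each leaf sample is named '<plant>_leaf'."""
    return [
        ProcessRow(plant, "./protocols/plant-sampling.md", f"{plant}_leaf")
        for plant in plants
    ]


rows = plant_sampling_rows(["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"])

# Print the rows tab-separated: Input, ProtocolUri, Output.
for row in rows:
    print(row.input_name, row.protocol_uri, row.output_name, sep="\t")
```

Keeping the input first and the output last in each row mirrors the table layout, so a row can be read left to right as "this entity went through this protocol and became that entity".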

The annotation tables (and effectively the studies and assays) are linked by reusing the identifiers of the Input and Output entities (samples, materials or dataset files) across the different processes, i.e. the Output of one process becomes the Input of the next process. For example, Cold1_leaf is an Output of Plant Sampling and an Input of Sugar Extraction.
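The linking rule can be checked mechanically. The sketch below reduces each table to (input, output) pairs, using the identifiers from the sample-level flowchart in this guide, and reports any input that no upstream process produced. The function name `unlinked_inputs` is hypothetical, and exact, case-sensitive string matching of identifiers is an assumption of this sketch.

```python
def unlinked_inputs(upstream, downstream):
    """Return downstream inputs that no upstream row produces as an output."""
    produced = {output for _, output in upstream}
    return [inp for inp, _ in downstream if inp not in produced]


plants = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]

# Each table is reduced to (input, output) pairs; protocol columns are omitted.
plant_sampling = [(p, f"{p}_leaf") for p in plants]
sugar_extraction = [(f"{p}_leaf", f"{p}_sugar-ext") for p in plants]
sugar_measurement = [(f"{p}_sugar-ext", "sugar_result.csv") for p in plants]

print(unlinked_inputs(plant_sampling, sugar_extraction))     # []
print(unlinked_inputs(sugar_extraction, sugar_measurement))  # []

# A misspelled identifier breaks the chain and is reported.
broken = [("Cold1_Leaf", "Cold1_sugar-ext")]
print(unlinked_inputs(plant_sampling, broken))               # ['Cold1_Leaf']
```

An empty list means every input of the downstream process is connected to an output of the upstream one. The last call shows why consistent naming matters: a single changed character is enough to disconnect a sample from its origin.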

In this example, we follow one highlighted line of samples (the one starting at Cold1) through the processes:

| Input [Source Name] | ProtocolUri | […] | Output [Sample Name] |
|---|---|---|---|
| Cold1 | ./protocols/plant-sampling.md | | Cold1_leaf |
| Cold2 | ./protocols/plant-sampling.md | | Cold2_leaf |
| Cold3 | ./protocols/plant-sampling.md | | Cold3_leaf |
| RT1 | ./protocols/plant-sampling.md | | RT1_leaf |
| RT2 | ./protocols/plant-sampling.md | | RT2_leaf |
| RT3 | ./protocols/plant-sampling.md | | RT3_leaf |

By simply reusing sample and data identifiers in different parts of the ARC, we can concisely link the samples through the lab processes in studies and assays to the data produced from those samples.

The tables above contain all the information visualized in the following flowchart, which shows how the study and assay processes are connected:

```mermaid
flowchart LR
  linkStyle default stroke:#2d3e50,stroke-width:2px;
  classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
  classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
  classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
  classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
  classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
  subgraph study1[Study:AthalianaColdStress]
    s1[Plants] ---p1[plant-sampling]--> s2[Leaves]
  end
  subgraph assay1[Assay:SugarContent]
    s2 ---p2[SugarExtraction]--> s3[Sugar extracts]
    s3 ---p3[SugarMeasurement]--> d1@{ shape: doc, label: sugar_result.csv}
  end
  class study1 studyStyle;
  class assay1 assayStyle;
  class p1,p2,p3 processStyle;
  class s1,s2,s3 sampleStyle;
  class d1 dataStyle;
```

Zooming in on the sample level, we can follow the samples through the processes from their biological origin to the data:

```mermaid
%%{init: {
  "flowchart": {
    "nodeSpacing": 40,
    "rankSpacing": 30
  }
}}%%
flowchart LR
  linkStyle default stroke:#2d3e50,stroke-width:2px;
  classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
  classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
  classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
  classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
  classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
  classDef cold1 fill:#fff59d,color:#2d3e50,font-weight:bold;
  subgraph study1["Study:AthalianaColdStress"]
    subgraph p1[Plant Sampling]
      subgraph in1[Plants]
        is1a[Cold1]
        is1b[Cold2]
        is1c[Cold3]
        is1d[RT1]
        is1e[RT2]
        is1f[RT3]
      end
      subgraph out1[Leaves]
        os2a[Cold1_leaf]
        os2b[Cold2_leaf]
        os2c[Cold3_leaf]
        os2d[RT1_leaf]
        os2e[RT2_leaf]
        os2f[RT3_leaf]
      end
      is1a --> os2a
      is1b --> os2b
      is1c --> os2c
      is1d --> os2d
      is1e --> os2e
      is1f --> os2f
    end
  end
  os2a[Cold1_leaf] --> is2a[Cold1_leaf]
  os2b[Cold2_leaf] --> is2b[Cold2_leaf]
  os2c[Cold3_leaf] --> is2c[Cold3_leaf]
  os2d[RT1_leaf] --> is2d[RT1_leaf]
  os2e[RT2_leaf] --> is2e[RT2_leaf]
  os2f[RT3_leaf] --> is2f[RT3_leaf]
  subgraph assay1[Assay: SugarContent]
    subgraph p2[Sugar Extraction]
      subgraph in2[Leaves]
        is2a
        is2b
        is2c
        is2d
        is2e
        is2f
      end
      subgraph out2[Sugar extracts]
        os3a[Cold1_sugar-ext]
        os3b[Cold2_sugar-ext]
        os3c[Cold3_sugar-ext]
        os3d[RT1_sugar-ext]
        os3e[RT2_sugar-ext]
        os3f[RT3_sugar-ext]
      end
      is2a --> os3a
      is2b --> os3b
      is2c --> os3c
      is2d --> os3d
      is2e --> os3e
      is2f --> os3f
    end
    os3a --> is3a[Cold1_sugar-ext]
    os3b --> is3b[Cold2_sugar-ext]
    os3c --> is3c[Cold3_sugar-ext]
    os3d --> is3d[RT1_sugar-ext]
    os3e --> is3e[RT2_sugar-ext]
    os3f --> is3f[RT3_sugar-ext]
    subgraph p3[Sugar Measurement]
      subgraph in3[Sugar extracts]
        is3a
        is3b
        is3c
        is3d
        is3e
        is3f
      end
      is3a --> d1@{ shape: doc, label: sugar_result.csv}
      is3b --> d1
      is3c --> d1
      is3d --> d1
      is3e --> d1
      is3f --> d1
    end
  end
  class study1 studyStyle;
  class assay1 assayStyle;
  class p1,p2,p3 processStyle;
  class in1,in2,in3,out1,out2,out3 sampleStyle;
  class d1 dataStyle;
  class is1a,os2a,is2a,os3a,is3a cold1;
```

Using this approach, we can trace the dataset file back to its specific biological origin. For example, the sugar content measurement for the sample Cold1_sugar-ext can be traced back through the processes to the original plant Cold1. Viewed from the other direction, starting from the data, all metadata recorded in the preceding annotation tables helps to contextualize the data in sugar_result.csv, such as the protocols used, conditions applied, and sample origins. This linkage is therefore crucial for understanding the context of the data and for ensuring its reliability and reproducibility in scientific research.
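The trace-back itself can be sketched in a few lines of Python. The snippet below is an illustration under stated assumptions rather than ARC tooling: `trace_back` is a hypothetical helper, and the (input, output) pairs are written out by hand from the identifiers in the sample-level flowchart instead of being read from the annotation tables.

```python
from collections import defaultdict


def trace_back(entity, rows):
    """Return all entities upstream of `entity`, nearest first."""
    parents = defaultdict(list)
    for inp, out in rows:
        parents[out].append(inp)

    lineage, queue, seen = [], [entity], {entity}
    while queue:
        current = queue.pop(0)
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                lineage.append(parent)
                queue.append(parent)
    return lineage


plants = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]

# All three processes as (input, output) pairs.
rows = (
    [(p, f"{p}_leaf") for p in plants]
    + [(f"{p}_leaf", f"{p}_sugar-ext") for p in plants]
    + [(f"{p}_sugar-ext", "sugar_result.csv") for p in plants]
)

print(trace_back("Cold1_sugar-ext", rows))  # ['Cold1_leaf', 'Cold1']
```

Calling `trace_back("sugar_result.csv", rows)` returns all six extracts, then all six leaf samples, then all six plants, because every extract in this example feeds into the same dataset file.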