Linking inputs and outputs
A key objective of the ARC is to trace each finding or result back to its specific biological experiment. Achieving this requires linking dataset files to their corresponding individual samples. To accomplish this, we follow a sequence of processes with defined inputs and outputs.
Consider the example experiment from the Start Here guide where six Arabidopsis thaliana plants were exposed to cold stress, and the sugar content was measured as a response. The ARC structure for this experiment could look like this:
- AthalianaColdStressSugar
  - studies
    - AthalianaColdStress
      - protocols
        - plant-sampling.md
  - assays
    - SugarContent
      - dataset
        - sugar_result.csv
      - protocols
        - sugar_extraction.md
        - sugar_measurement.md
      - isa.assay.xlsx
      - README.md
      - …
The ARC contains one study (AthalianaColdStress) and one assay (SugarContent). The study includes a protocol for plant sampling describing how the plants were grown and treated, while the assay contains protocols for sugar extraction and sugar measurement. The dataset file sugar_result.csv holds the measured sugar content data.
Annotation tables describe processes
The following three annotation tables describe the three consecutive processes:
- Plant Sampling (part of the Study AthalianaColdStress),
- Sugar Extraction (part of the Assay SugarContent), and
- Sugar Measurement (part of the Assay SugarContent).
Each table starts with an Input column specifying the input entity (sample, material or data) for the respective process, followed by a ProtocolUri column indicating the protocol used, and ends with an Output column specifying the output entity resulting from the process.
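The column layout described above can be checked mechanically. The following is a minimal sketch, assuming the header of an annotation table has already been read into a list of strings; the function name `check_table_layout` is hypothetical, and the column spellings are taken from the tables on this page rather than from any tool's API.

```python
def check_table_layout(header):
    """Return a list of layout problems found in an annotation table header.

    Assumption: `header` is a list of column names such as
    ["Input[Sample Name]", "ProtocolUri", "Output[Sample Name]"].
    """
    problems = []
    # The first column names the input entity of the process.
    if not header or not header[0].startswith("Input"):
        problems.append("first column should be an Input column")
    # The last column names the output entity of the process.
    if not header or not header[-1].startswith("Output"):
        problems.append("last column should be an Output column")
    # The protocol reference sits somewhere between Input and Output.
    if "ProtocolUri" not in header[1:-1]:
        problems.append("expected a ProtocolUri column between Input and Output")
    return problems


if __name__ == "__main__":
    header = ["Input[Source Name]", "ProtocolUri", "Output[Sample Name]"]
    print(check_table_layout(header) or "layout looks fine")
```

An empty result means the header follows the Input, ProtocolUri, Output pattern; any further columns between them are left unchecked in this sketch.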
The annotation tables (and effectively the studies and assays) are linked by reusing the respective identifiers of the Input and Output entities (samples, materials or dataset files) across the different processes – i.e. the Output of one process becomes the Input of the next process.
In this example we follow one line of samples (Cold1, Cold1_leaf, Cold1_sugar-ext) through the processes.

Plant Sampling (Study AthalianaColdStress):

| Input[Source Name] | ProtocolUri | […] | Output[Sample Name] |
|---|---|---|---|
| Cold1 | ./protocols/plant-sampling.md | … | Cold1_leaf |
| Cold2 | ./protocols/plant-sampling.md | … | Cold2_leaf |
| Cold3 | ./protocols/plant-sampling.md | … | Cold3_leaf |
| RT1 | ./protocols/plant-sampling.md | … | RT1_leaf |
| RT2 | ./protocols/plant-sampling.md | … | RT2_leaf |
| RT3 | ./protocols/plant-sampling.md | … | RT3_leaf |

Sugar Extraction (Assay SugarContent):

| Input[Sample Name] | ProtocolUri | […] | Output[Sample Name] |
|---|---|---|---|
| Cold1_leaf | ./protocols/sugar_extraction.md | … | Cold1_sugar-ext |
| Cold2_leaf | ./protocols/sugar_extraction.md | … | Cold2_sugar-ext |
| Cold3_leaf | ./protocols/sugar_extraction.md | … | Cold3_sugar-ext |
| RT1_leaf | ./protocols/sugar_extraction.md | … | RT1_sugar-ext |
| RT2_leaf | ./protocols/sugar_extraction.md | … | RT2_sugar-ext |
| RT3_leaf | ./protocols/sugar_extraction.md | … | RT3_sugar-ext |

Sugar Measurement (Assay SugarContent):

| Input[Sample Name] | ProtocolUri | […] | Output[Data] |
|---|---|---|---|
| Cold1_sugar-ext | ./protocols/sugar_measurement.md | … | ./assays/SugarContent/dataset/sugar_result.csv |
| Cold2_sugar-ext | ./protocols/sugar_measurement.md | … | ./assays/SugarContent/dataset/sugar_result.csv |
| Cold3_sugar-ext | ./protocols/sugar_measurement.md | … | ./assays/SugarContent/dataset/sugar_result.csv |
| RT1_sugar-ext | ./protocols/sugar_measurement.md | … | ./assays/SugarContent/dataset/sugar_result.csv |
| RT2_sugar-ext | ./protocols/sugar_measurement.md | … | ./assays/SugarContent/dataset/sugar_result.csv |
| RT3_sugar-ext | ./protocols/sugar_measurement.md | … | ./assays/SugarContent/dataset/sugar_result.csv |
Linking samples to data
By reusing sample and data identifiers across different parts of the ARC, we were able to concisely link the samples, through the different lab processes in studies and assays, to the data produced from those samples.
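To make the identifier reuse concrete, here is a minimal sketch that follows one source forward through the three processes. It assumes each annotation table has been reduced to a plain mapping from Input to Output; the dictionaries and the function `follow` are hypothetical illustrations, not part of any ARC tool, and the data path is the one implied by the directory layout of the example ARC.

```python
# Identifiers copied from the example experiment.
SAMPLES = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]
DATA_FILE = "./assays/SugarContent/dataset/sugar_result.csv"

# Each process as an Input -> Output mapping (assumed simplification of a table).
plant_sampling = {s: f"{s}_leaf" for s in SAMPLES}
sugar_extraction = {f"{s}_leaf": f"{s}_sugar-ext" for s in SAMPLES}
sugar_measurement = {f"{s}_sugar-ext": DATA_FILE for s in SAMPLES}

PROCESSES = [plant_sampling, sugar_extraction, sugar_measurement]


def follow(identifier, processes=PROCESSES):
    """Follow an identifier forward: each Output is looked up as the next Input."""
    path = [identifier]
    for table in processes:
        current = path[-1]
        if current not in table:
            # This process does not consume the current entity; skip it.
            continue
        path.append(table[current])
    return path


if __name__ == "__main__":
    print(" -> ".join(follow("Cold1")))
```

Because the link is nothing more than a string match between an Output and a later Input, a typo in either identifier silently breaks the chain, which is why consistent naming matters.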
The tables above contain all information visualized in the following flowchart to show how the study and assay processes are connected:
Flowchart source (Mermaid):

    flowchart LR
    linkStyle default stroke:#2d3e50,stroke-width:2px;
    classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
    classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
    classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
    classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
    classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;

    subgraph study1[Study:AthalianaColdStress]
      s1[Plants] ---p1[plant-sampling]--> s2[Leaves]
    end

    subgraph assay1[Assay:SugarContent]
      s2 ---p2[SugarExtraction]--> s3[Sugar extracts]
      s3 ---p3[SugarMeasurement]--> d1@{ shape: doc, label: sugar_result.csv}
    end

    class study1 studyStyle;
    class assay1 assayStyle;
    class p1,p2,p3 processStyle;
    class s1,s2,s3 sampleStyle;
    class d1 dataStyle;

Zooming in on the sample level, we can follow the samples through the processes from their biological origin to the data:
Flowchart source (Mermaid):

    %%{init: { "flowchart": { "nodeSpacing": 40, "rankSpacing": 30 }}}%%
    flowchart LR
    linkStyle default stroke:#2d3e50,stroke-width:2px;
    classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
    classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
    classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
    classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
    classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
    classDef cold1 fill:#fff59d,color:#2d3e50,font-weight:bold;

    subgraph study1["Study:AthalianaColdStress"]
      subgraph p1[Plant Sampling]
        subgraph in1[Plants]
          is1a[Cold1]
          is1b[Cold2]
          is1c[Cold3]
          is1d[RT1]
          is1e[RT2]
          is1f[RT3]
        end
        subgraph out1[Leaves]
          os2a[Cold1_leaf]
          os2b[Cold2_leaf]
          os2c[Cold3_leaf]
          os2d[RT1_leaf]
          os2e[RT2_leaf]
          os2f[RT3_leaf]
        end
        is1a --> os2a
        is1b --> os2b
        is1c --> os2c
        is1d --> os2d
        is1e --> os2e
        is1f --> os2f
      end
    end

    os2a[Cold1_leaf] --> is2a[Cold1_leaf]
    os2b[Cold2_leaf] --> is2b[Cold2_leaf]
    os2c[Cold3_leaf] --> is2c[Cold3_leaf]
    os2d[RT1_leaf] --> is2d[RT1_leaf]
    os2e[RT2_leaf] --> is2e[RT2_leaf]
    os2f[RT3_leaf] --> is2f[RT3_leaf]

    subgraph assay1[Assay: SugarContent]
      subgraph p2[Sugar Extraction]
        subgraph in2[Leaves]
          is2a
          is2b
          is2c
          is2d
          is2e
          is2f
        end
        subgraph out2[Sugar extracts]
          os3a[Cold1_sugar-ext]
          os3b[Cold2_sugar-ext]
          os3c[Cold3_sugar-ext]
          os3d[RT1_sugar-ext]
          os3e[RT2_sugar-ext]
          os3f[RT3_sugar-ext]
        end
        is2a --> os3a
        is2b --> os3b
        is2c --> os3c
        is2d --> os3d
        is2e --> os3e
        is2f --> os3f
      end
      os3a --> is3a[Cold1_sugar-ext]
      os3b --> is3b[Cold2_sugar-ext]
      os3c --> is3c[Cold3_sugar-ext]
      os3d --> is3d[RT1_sugar-ext]
      os3e --> is3e[RT2_sugar-ext]
      os3f --> is3f[RT3_sugar-ext]
      subgraph p3[Sugar Measurement]
        subgraph in3[Sugar extracts]
          is3a
          is3b
          is3c
          is3d
          is3e
          is3f
        end
        is3a --> d1@{ shape: doc, label: sugar_result.csv}
        is3b --> d1
        is3c --> d1
        is3d --> d1
        is3e --> d1
        is3f --> d1
      end
    end

    class study1 studyStyle;
    class assay1 assayStyle;
    class p1,p2,p3 processStyle;
    class in1,in2,in3,out1,out2,out3 sampleStyle;
    class d1 dataStyle;
    class is1a,os2a,is2a,os3a,is3a cold1;

Tracing back the data to its biological origin
Using this approach, we can trace the dataset file back to its specific biological origin. For example, the sugar content measurement for the sample Cold1_sugar-ext can be traced back through the processes to the original plant sample Cold1. Seen from the other direction (i.e. starting from the data), all metadata recorded in the preceding annotation tables, such as the protocols used, conditions applied, and sample origins, helps to contextualize the data in sugar_result.csv. This linkage is therefore crucial for understanding the context of the data and for ensuring its reliability and reproducibility in scientific research.
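The backward direction can be sketched in the same spirit. The example below assumes the three annotation tables have been flattened into `(input, protocol, output)` rows; the names `ROWS`, `trace_back` and `protocols_behind` are hypothetical, and the sketch further assumes the process chain contains no cycles.

```python
SAMPLES = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]
DATA_FILE = "./assays/SugarContent/dataset/sugar_result.csv"

# (input, protocol, output) rows, one per line of the three annotation tables.
ROWS = []
for s in SAMPLES:
    ROWS.append((s, "./protocols/plant-sampling.md", f"{s}_leaf"))
    ROWS.append((f"{s}_leaf", "./protocols/sugar_extraction.md", f"{s}_sugar-ext"))
    ROWS.append((f"{s}_sugar-ext", "./protocols/sugar_measurement.md", DATA_FILE))


def trace_back(identifier, rows=ROWS):
    """Return the original sources behind an entity.

    A source is an entity that never appears as the output of any row.
    """
    parents = [inp for inp, _protocol, out in rows if out == identifier]
    if not parents:
        return {identifier}
    origins = set()
    for parent in parents:
        origins |= trace_back(parent, rows)
    return origins


def protocols_behind(identifier, rows=ROWS):
    """Collect every protocol applied on the way to an entity."""
    found = set()
    for inp, protocol, out in rows:
        if out == identifier:
            found.add(protocol)
            found |= protocols_behind(inp, rows)
    return found


if __name__ == "__main__":
    print(sorted(trace_back("Cold1_sugar-ext")))
    print(sorted(protocols_behind(DATA_FILE)))
```

Note that tracing back from the dataset file itself yields all six plants, since every extract feeds the same file; resolving an individual measurement to one plant requires starting from the sample identifier, as in the Cold1_sugar-ext example.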