Linking inputs and outputs

A key objective of the ARC (Annotated Research Context) is to trace each finding or result back to the specific biological experiment it came from. This requires linking dataset files to the individual samples they were derived from. To accomplish this, we describe a sequence of processes, each with defined inputs and outputs.

Example

Consider the example experiment from the Start Here guide, in which six Arabidopsis thaliana plants (Cold1 to Cold3 and RT1 to RT3) were used to study the response to cold stress by measuring their sugar content. The ARC structure for this experiment could look like this:

AthalianaColdStressSugar/
- studies/
  - AthalianaColdStress/
    - protocols/
      - plant-sampling.md
- assays/
  - SugarContent/
    - dataset/
      - sugar_result.csv
    - protocols/
      - sugar_extraction.md
      - sugar_measurement.md
    - isa.assay.xlsx
    - README.md
- …

The ARC contains one study (AthalianaColdStress) and one assay (SugarContent). The study includes a protocol for plant sampling describing how the plants were grown and treated, while the assay contains protocols for sugar extraction and sugar measurement. The dataset file sugar_result.csv holds the measured sugar content data.

Annotation tables describe processes

The following three annotation tables describe the three consecutive processes:

- Plant Sampling (part of the Study AthalianaColdStress)
- Sugar Extraction (part of the Assay SugarContent)
- Sugar Measurement (part of the Assay SugarContent)

Each table starts with an Input column specifying the input entity (sample, material or data) of the respective process, followed by a ProtocolUri column indicating the protocol used, and ends with an Output column specifying the entity resulting from the process. Any further annotation columns between them are abbreviated as […] in the tables below.

The annotation tables (and, by extension, the studies and assays they belong to) are linked by reusing the identifiers of the Input and Output entities (samples, materials or dataset files) across the different processes: the Output of one process becomes the Input of the next.
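Because the linkage relies only on matching identifiers, it can be checked mechanically. The following is a minimal Python sketch, not part of any official ARC tooling: it assumes each annotation table has been loaded as a list of (input, protocol, output) tuples and that identifiers must match exactly. The helper name `unlinked` is hypothetical.

```python
# Hypothetical in-memory form of two annotation tables: each row is an
# (input, protocol_uri, output) tuple mirroring the Input, ProtocolUri
# and Output columns.
Row = tuple[str, str, str]

PLANTS = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]

plant_sampling: list[Row] = [
    (p, "./protocols/plant-sampling.md", f"{p}_leaf") for p in PLANTS
]
sugar_extraction: list[Row] = [
    (f"{p}_leaf", "./protocols/sugar_extraction.md", f"{p}_sugar-ext")
    for p in PLANTS
]


def unlinked(upstream: list[Row], downstream: list[Row]) -> tuple[set[str], set[str]]:
    """Compare the Outputs of one process with the Inputs of the next.

    Returns (outputs that no downstream row uses as input,
             inputs that no upstream row produced as output).
    Matching is exact, so case or punctuation differences count as breaks.
    """
    outputs = {out for _, _, out in upstream}
    inputs = {inp for inp, _, _ in downstream}
    return outputs - inputs, inputs - outputs


print(unlinked(plant_sampling, sugar_extraction))  # (set(), set())

# A single typo ("-" instead of "_") breaks the chain for Cold1:
with_typo = [("Cold1-leaf", *sugar_extraction[0][1:])] + sugar_extraction[1:]
print(unlinked(plant_sampling, with_typo))  # ({'Cold1_leaf'}, {'Cold1-leaf'})
```

Returning both directions separately makes it easy to tell an unused output (a sample that was never processed further) from an input that has no recorded origin.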

In this example, we follow each plant's samples through the three processes (for instance, Cold1 becomes Cold1_leaf and then Cold1_sugar-ext):

`Input`[Source Name]	`ProtocolUri`	[…]	`Output`[Sample Name]
Cold1	./protocols/plant-sampling.md	…	Cold1_leaf
Cold2	./protocols/plant-sampling.md	…	Cold2_leaf
Cold3	./protocols/plant-sampling.md	…	Cold3_leaf
RT1	./protocols/plant-sampling.md	…	RT1_leaf
RT2	./protocols/plant-sampling.md	…	RT2_leaf
RT3	./protocols/plant-sampling.md	…	RT3_leaf

`Input`[Sample Name]	`ProtocolUri`	[…]	`Output`[Sample Name]
Cold1_leaf	./protocols/sugar_extraction.md	…	Cold1_sugar-ext
Cold2_leaf	./protocols/sugar_extraction.md	…	Cold2_sugar-ext
Cold3_leaf	./protocols/sugar_extraction.md	…	Cold3_sugar-ext
RT1_leaf	./protocols/sugar_extraction.md	…	RT1_sugar-ext
RT2_leaf	./protocols/sugar_extraction.md	…	RT2_sugar-ext
RT3_leaf	./protocols/sugar_extraction.md	…	RT3_sugar-ext

`Input`[Sample Name]	`ProtocolUri`	[…]	`Output`[Data]
Cold1_sugar-ext	./protocols/sugar_measurement.md	…	./assays/SugarContent/dataset/sugar_result.csv
Cold2_sugar-ext	./protocols/sugar_measurement.md	…	./assays/SugarContent/dataset/sugar_result.csv
Cold3_sugar-ext	./protocols/sugar_measurement.md	…	./assays/SugarContent/dataset/sugar_result.csv
RT1_sugar-ext	./protocols/sugar_measurement.md	…	./assays/SugarContent/dataset/sugar_result.csv
RT2_sugar-ext	./protocols/sugar_measurement.md	…	./assays/SugarContent/dataset/sugar_result.csv
RT3_sugar-ext	./protocols/sugar_measurement.md	…	./assays/SugarContent/dataset/sugar_result.csv
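The tables above are shown as tab-separated text. Assuming a tab-separated export of an annotation table (in the ARC itself the tables are stored in the ISA workbooks, such as isa.assay.xlsx), reading it into (input, protocol, output) rows could be sketched in Python as follows. Columns are located by header prefix, so further annotation columns (the […] above) are skipped; the function name `parse_table` is hypothetical.

```python
import csv
import io

# Two rows of the Sugar Extraction table, tab-separated as shown above.
TABLE = (
    "Input [Sample Name]\tProtocolUri\tOutput [Sample Name]\n"
    "Cold1_leaf\t./protocols/sugar_extraction.md\tCold1_sugar-ext\n"
    "RT1_leaf\t./protocols/sugar_extraction.md\tRT1_sugar-ext\n"
)


def parse_table(text: str) -> list[tuple[str, str, str]]:
    """Read a tab-separated annotation table into (input, protocol, output) rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    # Drop backticks so headers written as `Input`[Sample Name] also match.
    header = [h.replace("`", "").strip() for h in next(reader)]

    def column(prefix: str) -> int:
        # Index of the first column whose header starts with `prefix`.
        return next(i for i, h in enumerate(header) if h.startswith(prefix))

    i_in, i_proto, i_out = column("Input"), column("ProtocolUri"), column("Output")
    return [(row[i_in], row[i_proto], row[i_out]) for row in reader if row]


for row in parse_table(TABLE):
    print(row)
# ('Cold1_leaf', './protocols/sugar_extraction.md', 'Cold1_sugar-ext')
# ('RT1_leaf', './protocols/sugar_extraction.md', 'RT1_sugar-ext')
```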

Linking samples to data

By reusing sample and data identifiers across different parts of the ARC, we can concisely link the samples, via the lab processes described in the study and the assay, to the data produced from them.
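Because each Output reappears as the next Input, the chain from a plant to the data can also be followed programmatically. A minimal Python sketch, assuming the three tables have been reduced to (input, output) pairs (the protocol column is not needed for linking) and that the dataset path follows the file tree above; `trace_forward` is a hypothetical helper:

```python
from collections import defaultdict

PLANTS = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]
DATA = "./assays/SugarContent/dataset/sugar_result.csv"

# (input, output) pairs from Plant Sampling, Sugar Extraction and Sugar Measurement.
edges = (
    [(p, f"{p}_leaf") for p in PLANTS]
    + [(f"{p}_leaf", f"{p}_sugar-ext") for p in PLANTS]
    + [(f"{p}_sugar-ext", DATA) for p in PLANTS]
)

# Map each entity to the entities produced from it.
downstream: dict[str, list[str]] = defaultdict(list)
for inp, out in edges:
    downstream[inp].append(out)


def trace_forward(entity: str) -> list[str]:
    """Follow Input -> Output links until an entity is not used as an Input.

    Takes the first output at each step, which suffices here because every
    entity in this example feeds exactly one process row.
    """
    path = [entity]
    while downstream.get(path[-1]):
        path.append(downstream[path[-1]][0])
    return path


print(trace_forward("Cold1"))
# ['Cold1', 'Cold1_leaf', 'Cold1_sugar-ext', './assays/SugarContent/dataset/sugar_result.csv']
```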

The tables above contain all the information shown in the following flowchart, which illustrates how the study and assay processes are connected:

Mermaid source of the flowchart:

flowchart LR
linkStyle default stroke:#2d3e50,stroke-width:2px;
classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;

subgraph study1["Study: AthalianaColdStress"]
    s1[Plants] --- p1[Plant Sampling] --> s2[Leaves]
end

subgraph assay1["Assay: SugarContent"]
    s2 --- p2[Sugar Extraction] --> s3[Sugar extracts]
    s3 --- p3[Sugar Measurement] --> d1@{ shape: doc, label: sugar_result.csv}
end
class study1 studyStyle;
class assay1 assayStyle;
class p1,p2,p3 processStyle;
class s1,s2,s3 sampleStyle;
class d1 dataStyle;

Zooming in on the sample level, we can follow the samples through the processes from their biological origin to the data:

Mermaid source of the sample-level flowchart:

%%{init: {
  "flowchart": {
    "nodeSpacing": 40,
    "rankSpacing": 30
  }
}}%%

flowchart LR
linkStyle default stroke:#2d3e50,stroke-width:2px;
classDef studyStyle fill:#dae7c1,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef assayStyle fill:#ffe080,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:bold;
classDef processStyle fill:#E08F9C,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef sampleStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef dataStyle fill:#FEFEFE,rx:.4em,ry:.4em,color:#2d3e50,stroke:#2d3e50,font-weight:normal;
classDef cold1 fill:#fff59d,color:#2d3e50,font-weight:bold;

subgraph study1["Study: AthalianaColdStress"]

  subgraph p1[Plant Sampling]

    subgraph in1[Plants]
      is1a[Cold1]
      is1b[Cold2]
      is1c[Cold3]
      is1d[RT1]
      is1e[RT2]
      is1f[RT3]
    end

    subgraph out1[Leaves]
      os2a[Cold1_leaf]
      os2b[Cold2_leaf]
      os2c[Cold3_leaf]
      os2d[RT1_leaf]
      os2e[RT2_leaf]
      os2f[RT3_leaf]
    end

    is1a --> os2a
    is1b --> os2b
    is1c --> os2c
    is1d --> os2d
    is1e --> os2e
    is1f --> os2f

  end
end

os2a[Cold1_leaf]  --> is2a[Cold1_leaf]
os2b[Cold2_leaf]  --> is2b[Cold2_leaf]
os2c[Cold3_leaf]  --> is2c[Cold3_leaf]
os2d[RT1_leaf]    --> is2d[RT1_leaf]
os2e[RT2_leaf]    --> is2e[RT2_leaf]
os2f[RT3_leaf]    --> is2f[RT3_leaf]

subgraph assay1["Assay: SugarContent"]

  subgraph p2[Sugar Extraction]

    subgraph in2[Leaves]
      is2a
      is2b
      is2c
      is2d
      is2e
      is2f
    end

    subgraph out2[Sugar extracts]
      os3a[Cold1_sugar-ext]
      os3b[Cold2_sugar-ext]
      os3c[Cold3_sugar-ext]
      os3d[RT1_sugar-ext]
      os3e[RT2_sugar-ext]
      os3f[RT3_sugar-ext]
    end

    is2a --> os3a
    is2b --> os3b
    is2c --> os3c
    is2d --> os3d
    is2e --> os3e
    is2f --> os3f

  end

  os3a --> is3a[Cold1_sugar-ext]
  os3b --> is3b[Cold2_sugar-ext]
  os3c --> is3c[Cold3_sugar-ext]
  os3d --> is3d[RT1_sugar-ext]
  os3e --> is3e[RT2_sugar-ext]
  os3f --> is3f[RT3_sugar-ext]

  subgraph p3[Sugar Measurement]

    subgraph in3[Sugar extracts]
      is3a
      is3b
      is3c
      is3d
      is3e
      is3f
    end

    is3a --> d1@{ shape: doc, label: sugar_result.csv}
    is3b --> d1
    is3c --> d1
    is3d --> d1
    is3e --> d1
    is3f --> d1

  end

end

class study1 studyStyle;
class assay1 assayStyle;
class p1,p2,p3 processStyle;
class in1,in2,in3,out1,out2 sampleStyle;
class d1 dataStyle;
class is1a,os2a,is2a,os3a,is3a cold1;
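The sample-level diagram above is written by hand, but its edges mirror the annotation tables one to one, so they could also be generated from the tables. A minimal Python sketch, assuming a table reduced to (input, output) pairs; the helper `mermaid_edges` and the generated node ids (n0, n1, ...) are hypothetical and only imitate the short ids such as is2a and os3a used above:

```python
PLANTS = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]

# (input, output) pairs of the Sugar Extraction table.
extraction = [(f"{p}_leaf", f"{p}_sugar-ext") for p in PLANTS]


def mermaid_edges(pairs: list[tuple[str, str]]) -> list[str]:
    """Render (input, output) pairs as Mermaid flowchart edge lines.

    Each identifier gets a generated node id and is emitted as a quoted
    label the first time it appears, so sample names containing '-' never
    have to be used as raw Mermaid node ids.
    """
    ids: dict[str, str] = {}

    def node(name: str) -> str:
        if name in ids:
            return ids[name]
        ids[name] = f"n{len(ids)}"
        return f'{ids[name]}["{name}"]'

    return [f"    {node(src)} --> {node(dst)}" for src, dst in pairs]


print("\n".join(["flowchart LR", *mermaid_edges(extraction)[:2]]))
# flowchart LR
#     n0["Cold1_leaf"] --> n1["Cold1_sugar-ext"]
#     n2["Cold2_leaf"] --> n3["Cold2_sugar-ext"]
```

Subgraphs, styling and the highlighting of the Cold1 line are left out of this sketch; they would still have to be added by hand or by a more elaborate generator.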

Tracing back the data to its biological origin

Using this approach, we can trace the dataset file back to its specific biological origin. For example, the sugar content measured for the sample Cold1_sugar-ext can be traced back through Cold1_leaf to the original plant Cold1. Conversely, starting from the data, all metadata recorded in the preceding annotation tables, such as the protocols used, the conditions applied and the sample origins, help to contextualize the data in sugar_result.csv. This linkage is therefore crucial for understanding the context of the data and for ensuring its reliability and reproducibility.
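The backward direction can be sketched the same way. Assuming again that the tables are available as (input, output) pairs and that the dataset path follows the file tree above, a hypothetical `origins` helper walks from any entity back to the entities that have no recorded input, which in this example are the source plants:

```python
from collections import defaultdict

PLANTS = ["Cold1", "Cold2", "Cold3", "RT1", "RT2", "RT3"]
DATA = "./assays/SugarContent/dataset/sugar_result.csv"

# (input, output) pairs from Plant Sampling, Sugar Extraction and Sugar Measurement.
edges = (
    [(p, f"{p}_leaf") for p in PLANTS]
    + [(f"{p}_leaf", f"{p}_sugar-ext") for p in PLANTS]
    + [(f"{p}_sugar-ext", DATA) for p in PLANTS]
)

# Map each entity to the entities it was derived from.
upstream: dict[str, list[str]] = defaultdict(list)
for inp, out in edges:
    upstream[out].append(inp)


def origins(entity: str) -> set[str]:
    """Walk Output -> Input links back to entities without recorded inputs."""
    parents = upstream.get(entity)
    if not parents:
        return {entity}
    return set().union(*(origins(p) for p in parents))


print(sorted(origins("Cold1_sugar-ext")))  # ['Cold1']
print(sorted(origins(DATA)))  # ['Cold1', 'Cold2', 'Cold3', 'RT1', 'RT2', 'RT3']
```

Because all six extracts point to the same sugar_result.csv, the walk from the file returns all six plants; attributing an individual row inside the file to one sample would require identifiers within the file itself, which this sketch does not model.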