Option 2 – CWL

In this case, the heatmap was actually produced using a Python script. It should be possible to document this in a reusable way.
Create a run and a workflow
If your data analysis is code-based, you likely aim to make it reusable and actionable in-place. To achieve this, we recommend wrapping and annotating your workflow using Common Workflow Language (CWL). Although CWL is beyond the scope of this starters' guide, we want to share the basic concept here.
- In the ARC, the computational workflow is placed in the workflows folder.
- The results produced from workflows are stored in runs. Every time a workflow is employed, it creates a new run result.
CWL abstract – documents to describe your computational workflows
The script file in workflows is accompanied by a CWL file, which contains the workflow metadata that make it reusable. The specific parameters of a run's execution are stored in a separate job file.
Use Common Workflow Language (CWL)
Here we wrap a minimal example script into a CWL document. We use a Python script (heatmap.py) that creates a heatmap file (heatmap.svg) based on an input table containing sugar abundance assay data (sugar_result.csv).
Wrapping the script in CWL makes it reusable in another ARC to generate the same type of heatmap based on another input table. For more details, check out the introduction to CWL.
Isolate run parameters and workflow
We add the following files to the ARC. You can download the files here.
heatmap.py

    import pandas as pd
    import plotly.express as px
    import sys

    # Read command line arguments
    MeasurementTableCSV = sys.argv[1]
    FigureFileName = sys.argv[2]

    # Read the CSV file
    data = pd.read_csv(MeasurementTableCSV, index_col=0, on_bad_lines='skip')

    # Create a heatmap
    fig = px.imshow(data, labels=dict(x="Columns", y="Rows", color="Value"), x=data.columns, y=data.index)

    # Save heatmap to file
    fig.write_image(FigureFileName + ".svg")

workflow.cwl

    #!/usr/bin/env cwl-runner

    cwlVersion: v1.2
    class: CommandLineTool
    requirements:
      - class: InitialWorkDirRequirement
        listing:
          - entryname: heatmap.py
            entry:
              $include: heatmap.py
      - class: NetworkAccess
        networkAccess: true
    baseCommand: [python3, heatmap.py]
    inputs:
      MeasurementTableCSV:
        type: File
        inputBinding:
          position: 1
      FigureFileName:
        type: string
        inputBinding:
          position: 2

    outputs:
      heatmapfile:
        type: File
        outputBinding:
          glob: "*.svg"

Dockerfile

    # Small Python runtime as image
    FROM python:3.9-slim

    # Install python dependencies
    RUN pip install --no-cache-dir pandas==2.2.3 plotly==6.0.1 kaleido==0.2.1

run.cwl

    #!/usr/bin/env cwl-runner
    cwlVersion: v1.2
    class: Workflow

    inputs:
      MeasurementTableCSV: File
      FigureFileName: string

    steps:
      heatmap:
        run: ../../workflows/heatmap/workflow.cwl
        in:
          MeasurementTableCSV: MeasurementTableCSV
          FigureFileName: FigureFileName
        out: [ heatmapfile ]

    outputs:
      output:
        type: File
        outputSource: heatmap/heatmapfile

run.yml

    MeasurementTableCSV:
      class: File
      path: ../../assays/SugarMeasurement/dataset/sugar_result.csv
    FigureFileName: heatmap

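To reuse the workflow with another input table, only the job file has to change. The following is a minimal sketch, not part of the ARC tooling, that writes the text of such a job file with plain string formatting. The function name `make_job_yaml`, the assay folder `OtherMeasurement` and the file `other_result.csv` are hypothetical and serve only as illustration.

```python
def make_job_yaml(csv_path: str, figure_name: str) -> str:
    """Return the text of a job file providing the two inputs of run.cwl."""
    return (
        "MeasurementTableCSV:\n"
        "  class: File\n"
        f"  path: {csv_path}\n"
        f"FigureFileName: {figure_name}\n"
    )


# Hypothetical second input table; only the path and the figure name differ
# from the run.yml of this guide.
other_job = make_job_yaml(
    "../../assays/OtherMeasurement/dataset/other_result.csv",
    "other_heatmap",
)
print(other_job)
```

The path is written relative to the folder that holds the job file, in the same way as in run.yml.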
Briefly summarized,
- the heatmap.py is the example data analysis script, which creates a heatmap based on a CSV table input
- the workflow.cwl is a CWL document that incorporates the heatmap.py
  - It requires two inputs
    - MeasurementTableCSV: the file name of the CSV table
    - FigureFileName: how the user wants to name the output file
  - And it generates one output: an .svg file named according to FigureFileName
- the Dockerfile handles the software dependencies
  - here: Python and the Python packages pandas, plotly, kaleido (including specific versions)
- the run.cwl connects the inputs to the workflow steps to be run
  - in this example only a single step heatmap/workflow.cwl is being run
- the run.yml provides the required input parameters for run.cwl
  - the relative path to the CSV table input: sugar_result.csv
  - the desired file name: e.g. heatmap
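
The way workflow.cwl turns the two inputs into a command line can be sketched in a few lines of Python. This is a simplified illustration of the position-based ordering, not how a CWL runner is implemented. A real runner also stages the input file and passes its staged path rather than the bare file name. The helper names `build_command` and `expected_output` are made up for this sketch.

```python
from fnmatch import fnmatch

# Taken from workflow.cwl: the base command and each input's binding position.
BASE_COMMAND = ["python3", "heatmap.py"]
POSITIONS = {"MeasurementTableCSV": 1, "FigureFileName": 2}


def build_command(job: dict) -> list:
    """Append the job values to the base command, ordered by position."""
    ordered_names = sorted(POSITIONS, key=POSITIONS.get)
    return BASE_COMMAND + [str(job[name]) for name in ordered_names]


def expected_output(job: dict) -> str:
    """heatmap.py appends '.svg' to the FigureFileName argument."""
    return job["FigureFileName"] + ".svg"


# Simplified job: the file input is reduced to its bare name here.
job = {"FigureFileName": "heatmap", "MeasurementTableCSV": "sugar_result.csv"}

print(" ".join(build_command(job)))
# The output glob "*.svg" in workflow.cwl picks up the file written by the script.
print(expected_output(job), fnmatch(expected_output(job), "*.svg"))
```

Because the arguments are ordered by position, the order of the entries in the job file does not matter.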
Flowchart source (Mermaid):

    flowchart LR
      subgraph "workflow.cwl"
        py@{ shape: doc, label: "heatmap.py" }
        dk@{ shape: doc, label: "Dockerfile" }
      end
      run.yml --o run.cwl
      workflow.cwl --o run.cwl
      run.cwl --> heatmap.svg
      sugar_result.csv -.- run.yml

Using the workflow in your ARC
- Open the ARC
- Add a folder "heatmap" to workflows
  - Import workflow.cwl into workflows/heatmap
  - Import heatmap.py into workflows/heatmap
  - Import Dockerfile into workflows/heatmap
- Add a folder "heatmap-run" to runs
  - Import run.cwl into runs/heatmap-run
  - Import run.yml into runs/heatmap-run
The ARC should now look like this:
- assays/
  - SugarMeasurement/
    - dataset/
      - sugar_result.csv
    - …
- isa.investigation.xlsx
- runs/
  - heatmap-run/
    - run.cwl
    - run.yml
- studies/
- …
- workflows/
  - heatmap/
    - Dockerfile
    - heatmap.py
    - workflow.cwl
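
The relative references in run.cwl, run.yml and workflow.cwl only resolve if the folder layout matches the tree above. As a rough sketch, the check below resolves each reference against the folder of the file that contains it and looks the result up in the expected file list. The `resolve` helper and the `ARC_FILES` set are illustrative and not part of any ARC tool.

```python
import posixpath

# Files expected in the ARC, relative to the ARC root (from the tree above).
ARC_FILES = {
    "assays/SugarMeasurement/dataset/sugar_result.csv",
    "runs/heatmap-run/run.cwl",
    "runs/heatmap-run/run.yml",
    "workflows/heatmap/Dockerfile",
    "workflows/heatmap/heatmap.py",
    "workflows/heatmap/workflow.cwl",
}


def resolve(referencing_file: str, relative_ref: str) -> str:
    """Resolve a relative reference against the folder of the referencing file."""
    folder = posixpath.dirname(referencing_file)
    return posixpath.normpath(posixpath.join(folder, relative_ref))


# (file containing the reference, reference as written in that file)
references = [
    ("runs/heatmap-run/run.cwl", "../../workflows/heatmap/workflow.cwl"),
    ("runs/heatmap-run/run.yml", "../../assays/SugarMeasurement/dataset/sugar_result.csv"),
    ("workflows/heatmap/workflow.cwl", "heatmap.py"),
]

for source, ref in references:
    target = resolve(source, ref)
    status = "found" if target in ARC_FILES else "MISSING"
    print(f"{source} -> {target}: {status}")
```

If a folder is named differently, for example "heatmap_run" instead of "heatmap-run", the references still resolve as long as the nesting depth stays the same. Moving a file to another level breaks them.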
- In the ARC, navigate to the heatmap-run folder:
    cd runs/heatmap-run
- Use cwltool to run the workflow:
    cwltool run.cwl run.yml
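
If the run should be started from a Python script instead of the terminal, the same two steps can be sketched with the standard library. This assumes that cwltool is installed and on the PATH and that the script is started from the ARC root. The function names are hypothetical.

```python
import shutil
import subprocess


def cwltool_command(workflow: str = "run.cwl", job: str = "run.yml") -> list:
    """Build the same command line as typed in the terminal."""
    return ["cwltool", workflow, job]


def run_heatmap(run_dir: str = "runs/heatmap-run") -> int:
    """Run the workflow inside run_dir and return cwltool's exit code."""
    if shutil.which("cwltool") is None:
        raise RuntimeError("cwltool was not found on PATH")
    # cwd replaces the 'cd runs/heatmap-run' step.
    completed = subprocess.run(cwltool_command(), cwd=run_dir)
    return completed.returncode


# Show the command without executing it.
print(" ".join(cwltool_command()))
```

Calling `run_heatmap()` from the ARC root starts the run. An exit code of 0 indicates that cwltool finished without error.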