
Option 2 – CWL


Ok, in this case the heatmap was actually produced using a Python script. It should be possible to document this in a reusable way.

If your data analysis is code-based, you likely want it to be reusable and actionable in place. To achieve this, we recommend wrapping and annotating your workflow using the Common Workflow Language (CWL). Although CWL is beyond the scope of this starters’ guide, we want to share the basic concept here.

  • In the ARC, the computational workflow is placed in the workflows folder.
  • The results produced by workflows are stored in runs. Each execution of a workflow creates a new run result.

CWL abstract – documents to describe your computational workflows


The script file in workflows is accompanied by a CWL file, which contains the workflow metadata that makes it reusable. The specific parameters of a run’s execution are stored in a separate job file.

Use Common Workflow Language (CWL)


Here we wrap a minimal example script into a CWL document. We use a Python script (heatmap.py) that creates a heatmap file (heatmap.svg) from an input table containing sugar abundance assay data (sugar_result.csv). Wrapping the script in CWL makes it reusable in another ARC to generate the same type of heatmap from another input table. For more details, check out the introduction to CWL.

Isolate run parameters and workflow


We add the following files to the ARC. You can download the files here.

import pandas as pd
import plotly.express as px
import sys

# Read command line arguments
MeasurementTableCSV = sys.argv[1]
FigureFileName = sys.argv[2]

# Read the CSV file
data = pd.read_csv(MeasurementTableCSV, index_col=0, on_bad_lines='skip')

# Create a heatmap
fig = px.imshow(data,
                labels=dict(x="Columns", y="Rows", color="Value"),
                x=data.columns,
                y=data.index)

# Save heatmap to file
fig.write_image(FigureFileName + ".svg")
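
The script reads its two inputs positionally from sys.argv, so a missing argument surfaces only as an IndexError. As a hedged sketch (not part of the original heatmap.py), the same two-argument interface could be expressed with the standard-library argparse module. The helper names parse_args and output_path are illustrative assumptions.

```python
import argparse


def parse_args(argv):
    """Parse the two positional arguments that heatmap.py expects."""
    parser = argparse.ArgumentParser(
        description="Create a heatmap (SVG) from a CSV table."
    )
    parser.add_argument("MeasurementTableCSV", help="path to the input CSV table")
    parser.add_argument("FigureFileName", help="output file name without extension")
    return parser.parse_args(argv)


def output_path(figure_file_name):
    """Mirror heatmap.py, which appends '.svg' to the given name."""
    return figure_file_name + ".svg"


# Example with the file names used in this guide.
args = parse_args(["sugar_result.csv", "heatmap"])
print(args.MeasurementTableCSV, "->", output_path(args.FigureFileName))
```

With argparse, calling the script with too few arguments prints a usage message instead of a traceback.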

Briefly summarized,

  • heatmap.py is the example data analysis script, which creates a heatmap from a CSV table input
  • workflow.cwl is a CWL document that wraps heatmap.py
    • It requires two inputs
      1. MeasurementTableCSV: the file name of the CSV table
      2. FigureFileName: the name the user wants to give the output file
    • It generates one output: an .svg file named according to FigureFileName
  • the Dockerfile handles the software dependencies
    • here: Python and the Python packages pandas, plotly, and kaleido (with specific versions)
  • run.cwl connects the inputs to the workflow steps to be run
    • in this example, only a single step, heatmap/workflow.cwl, is run
  • run.yml provides the required input parameters for run.cwl
    • the relative path to the CSV table input: sugar_result.csv
    • the desired file name, e.g. heatmap

Diagram source (Mermaid):
flowchart LR
subgraph "workflow.cwl"
py@{ shape: doc, label: "heatmap.py"}
dk@{ shape: doc, label: "Dockerfile" }
end
run.yml --o run.cwl
workflow.cwl --o run.cwl
run.cwl --> heatmap.svg
sugar_result.csv -.- run.yml
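
To make the role of run.yml more concrete, the following sketch builds the job parameters as a Python dictionary and serialises them to JSON, which CWL runners accept as a job-file format alongside YAML. It assumes that MeasurementTableCSV is declared as a File input and FigureFileName as a string in workflow.cwl, so the run.yml shipped with the example may look different. The paths follow the ARC layout used in this guide, and build_job is a hypothetical helper name.

```python
import json
import posixpath

# Locations inside the ARC, relative to the ARC root (taken from this guide).
RUN_DIR = "runs/heatmap-run"
CSV_IN_ARC = "assays/SugarMeasurement/dataset/sugar_result.csv"


def build_job(csv_in_arc, run_dir, figure_file_name):
    """Return job parameters with the CSV path made relative to the run folder."""
    relative_csv = posixpath.relpath(csv_in_arc, start=run_dir)
    return {
        # Assumption: the input is typed as a CWL File.
        "MeasurementTableCSV": {"class": "File", "path": relative_csv},
        "FigureFileName": figure_file_name,
    }


job = build_job(CSV_IN_ARC, RUN_DIR, "heatmap")
print(json.dumps(job, indent=2))
```

Keeping the path relative to the run folder means the job file stays valid when the whole ARC is moved or cloned elsewhere.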

Using the workflow in your ARC

  1. Open the ARC
  2. Add a folder “heatmap” to workflows
    • Import workflow.cwl into workflows/heatmap
    • Import heatmap.py into workflows/heatmap
    • Import Dockerfile into workflows/heatmap
  3. Add a folder “heatmap-run” to runs
    • Import run.cwl into runs/heatmap-run
    • Import run.yml into runs/heatmap-run

The ARC should now look like this:

  • assays/
    • SugarMeasurement/
      • dataset/
        • sugar_result.csv
  • isa.investigation.xlsx
  • runs/
    • heatmap-run/
      • run.cwl
      • run.yml
  • studies/
  • workflows/
    • heatmap/
      • Dockerfile
      • heatmap.py
      • workflow.cwl
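
Before running the workflow, it can help to confirm that the imported files ended up where expected. This is a minimal sketch using only the standard library. The function name missing_files is a hypothetical helper, and the expected paths are those listed above.

```python
from pathlib import Path

# Files this guide adds to the ARC, relative to the ARC root.
EXPECTED_FILES = [
    "workflows/heatmap/workflow.cwl",
    "workflows/heatmap/heatmap.py",
    "workflows/heatmap/Dockerfile",
    "runs/heatmap-run/run.cwl",
    "runs/heatmap-run/run.yml",
]


def missing_files(arc_root):
    """Return the expected files that do not exist below arc_root."""
    root = Path(arc_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Intended to be run from the ARC root.
    missing = missing_files(".")
    print("All files in place." if not missing else f"Missing: {missing}")
```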

  1. In the ARC, navigate to the heatmap-run folder:

    cd runs/heatmap-run
  2. Use cwltool to run the workflow:

    cwltool run.cwl run.yml
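
The two terminal steps can also be scripted, for example from a notebook or a test. The sketch below assembles the same call and checks whether cwltool is on the PATH before running it. The names cwltool_command and run_workflow are hypothetical helpers, and the code assumes it is started from the ARC root.

```python
import shutil
import subprocess

RUN_DIR = "runs/heatmap-run"  # relative to the ARC root


def cwltool_command(workflow="run.cwl", job="run.yml"):
    """Return the same command line as typed in the terminal."""
    return ["cwltool", workflow, job]


def run_workflow(run_dir=RUN_DIR):
    """Run cwltool inside run_dir, mirroring 'cd' followed by 'cwltool'."""
    if shutil.which("cwltool") is None:
        raise RuntimeError("cwltool was not found on the PATH")
    # check=True raises CalledProcessError if cwltool exits with an error.
    return subprocess.run(cwltool_command(), cwd=run_dir, check=True)
```

Calling run_workflow() from the ARC root would then correspond to the two commands shown in the steps.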