Step-by-step Example

Here we take you through a step-by-step CWL example

Example Data

This guide builds on the ARC created in the Start here guide. If you have not yet followed the guide, you can download the ARC (e.g. via ARCitect) from the DataHUB.

This is what the ARC looks like

Directorystudies
- DirectoryAthalianaColdStress
  - README.md
  - isa.study.xlsx
  - Directoryprotocols
    growth_protocol.md
  - resources
Directoryassays
- DirectorySugarMeasurement
  - README.md
  - Directorydataset
    sugar_result.csv
  - isa.assay.xlsx
  - Directoryprotocols
    sugar_extraction_protocol.md
isa.investigation.xlsx
Directoryworkflows
- …
Directoryruns
- …

Here we wrap a minimal example script into a CWL document. We use a python script (heatmap.py), that creates a heatmap file (heatmap.svg) based on an input table containing sugar abundance assay data (sugar_result.csv). Wrappig the script in CWL makes it reusable in another ARC to generate the same type of heatmap based on another input table. For more details, checkout the introduction to CWL.

Isolate run parameters and workflow

We add the following files to the ARC. You can download the files here.

import pandas as pd
import plotly.express as px
import sys

# Read command line arguments
MeasurementTableCSV=sys.argv[1]
FigureFileName=sys.argv[2]

# Read the CSV file
data = pd.read_csv(MeasurementTableCSV, index_col=0, on_bad_lines='skip')

# Create a heatmap
fig = px.imshow(data,
                labels=dict(x="Columns", y="Rows", color="Value"),
                x=data.columns,
                y=data.index)

# Save heatmap to file
fig.write_image(FigureFileName + ".svg")

#!/usr/bin/env cwl-runner

cwlVersion: v1.2
class: CommandLineTool
requirements:
  - class: InitialWorkDirRequirement
    listing:
      - entryname: heatmap.py
        entry:
          $include: heatmap.py
  - class: NetworkAccess
    networkAccess: true
baseCommand: [python3, heatmap.py]
inputs:
  MeasurementTableCSV:
    type: File
    inputBinding:
      position: 1
  FigureFileName:
    type: string
    inputBinding:
      position: 2

outputs:
  heatmapfile:
    type: File
    outputBinding:
      glob: "*.svg"

# Small Python runtime as image
FROM python:3.9-slim

# Install python dependencies
RUN pip install --no-cache-dir pandas==2.2.3 plotly==6.0.1 kaleido==0.2.1

#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: Workflow

inputs:
  MeasurementTableCSV: File
  FigureFileName: string

steps:
  heatmap:
    run: ../../workflows/heatmap/workflow.cwl
    in:
      MeasurementTableCSV: MeasurementTableCSV
      FigureFileName: FigureFileName
    out: [ heatmapfile ]

outputs:
  output:
    type: File
    outputSource: heatmap/heatmapfile

MeasurementTableCSV:
    class: File
    path: ../../assays/SugarMeasurement/dataset/sugar_result.csv
FigureFileName: heatmap

Briefly summarized,

the heatmap.py is the example data analysis script, which creates a heatmap based on a CSV table input
the workflow.cwl is a CWL document that incorporates the heatmap.py
- It requires two inputs
  1. MeasurementTableCSV: the file name of the CSV table
  2. FigureFileName: how the user wants to name the output file
- And it generates one output: an .svg file named according to FigureFileName
the Dockerfile handles the software dependencies
- here: Python and the python packages pandas, plotly, kaleido (including specific versions)
the run.cwl connects the inputs to the workflow steps to be run
- in this example only a single step heatmap/workflow.cwl is being run
the run.yml provides the required input parameters for run.cwl
- the relative path to the CSV table input: sugar_result.csv
- the desired file name: e.g. heatmap

Loading diagram...

Source

flowchart LR

subgraph "workflow.cwl"
  py@{ shape: doc, label: "heatmap.py"}
  dk@{ shape: doc, label: "Dockerfile" }
end

run.yml --o run.cwl
workflow.cwl --o run.cwl
run.cwl --> heatmap.svg

sugar_result.csv -.- run.yml

Using the workflow in your ARC

Open the ARC
Add a folder “heatmap” to workflows
- Import workflow.cwl into workflows/heatmap
- Import heatmap.py into workflows/heatmap
- Import Dockerfile into workflows/heatmap
Add a folder “heatmap-run” to runs
- Import run.cwl into runs/heatmap-run
- Import run.yml into runs/heatmap-run

The ARC should now look like this:

Directoryassays
- DirectorySugarMeasurement
  - Directorydataset
    sugar_result.csv
  - …
isa.investigation.xlsx
Directoryruns
- Directoryheatmap-run
  - run.cwl
  - run.yml
studies
…
Directoryworkflows
- Directoryheatmap
  - Dockerfile
  - heatmap.py
  - workflow.cwl

In the ARC, navigate to the heatmap-run folder:
Terminal window
```
cd runs/heatmap-run
```
Use the cwltool to run the workflow:
Terminal window
```
cwltool run.cwl run.yml
```
This requires that cwltool and Docker are installed on your system.