CWL Fragpipe

FragPipe

FragPipe is a software suite for the analysis of mass spectrometry-based proteomics data. It integrates multiple state-of-the-art software tools and algorithms into a unified, user-friendly graphical interface, allowing researchers to process raw MS data efficiently.

Setting up the FragPipe ARC-CWL workflow

The FragPipe ARC-CWL workflow requires several files and directories to be set up correctly. The following sections will guide you through the setup process.

ARC

You need an ARC to run this workflow, since necessary information about the experiment is retrieved from the assay and study files by the workflow. How to set up an ARC is described in this Getting Started Guide.

Special requirements for the ARC:

Assay and study files containing columns describing the data:
- Experiment (e.g., hot/cold)
- Replicate (e.g., 1, 2, 3, …)
- Acquisition mode (DDA or DIA)
One assay sheet named “MassSpectrometry,” which is linked to previous assays and studies containing the aforementioned columns through inputs and outputs (standard assay/study setup, this is described in the Getting Started Guide):
- The output of this sheet must be of type Data and contain the names of your mass spectrometry files or folders (just the name, not the path, e.g., MyRun.d).
One folder in the workflows directory named “Fragpipe.”

Dockerfile

The current official Docker image is missing curl. This can be circumvented by using your own Dockerfile based on the official one and adding the curl installation command to the file.

Download the Dockerfile here by clicking on the download button in the upper right corner.
Place the Dockerfile in the Fragpipe folder within the workflows directory.

ARC Parameter Collection Script

This workflow uses a script to collect the required parameters for the FragPipe ARC-CWL workflow.

Download the script here.
Place the script in the scripts folder within the Fragpipe directory in the workflows directory.

This script will create the required manifest file and adapt the workflow file for the run.

CommandLineTool and Workflow Descriptions

Download the FragPipe CommandLineTool.
Download the ManifestAndWorkflow CommandLineTool.
Download the Workflow.
Place all three files in the Fragpipe directory within the workflows directory.

The workflow executes both CommandLineTool descriptions in the correct order.

FragPipe Tools

The FragPipe tools must be downloaded manually due to licensing restrictions. The easiest way is to download all required tools through the FragPipe GUI and mount them into the Docker container. For this, the tools must be placed in the tools directory within the workflow directory.

Downloading the Tools with the FragPipe GUI

Download the FragPipe GUI from the FragPipe releases and unzip it.
Navigate to the bin folder and run the fragpipe.exe.
In the Config tab, click on the Download / Update button and follow the steps.
Copy the tools directory into your workflow directory.

Directory structure

After obtaining all required files, your workflows directory structure should look like this:

Directoryworkflows
- DirectoryFragpipe
  - Dockerfile
  - Fragpipe.cwl
  - ManifestAndWorkflow.cwl
  - Workflow.cwl
  - tools
  - Directoryscripts
    manifestAndWorkflow.fsx

Setting up the run

The run requires a run.cwl and a run.yml file. The run.cwl file content could look like this:

cwlVersion: v1.2
class: Workflow

requirements:
  - class: MultipleInputFeatureRequirement
  - class: SubworkflowFeatureRequirement
# you can specify resources here:
#  - class: ResourceRequirement
#    coresMin: 20
#    ramMin: 40960
inputs:
  arcDirectory: Directory
  runName: string
  assayName: string
  experimentColumn: string
  replicateColumn: string
  acquisitionColumn: string
  fastAPath: string
  ddaOnly: string
  headless: boolean
  workdir: string
  toolsFolder: string
  threads: int

steps:
  FragpipeAll:
    run: ../../workflows/Fragpipe/workflow.cwl
    in:
      arcDirectory: arcDirectory
      runName: runName
      assayName: assayName
      experimentColumn: experimentColumn
      replicateColumn: replicateColumn
      acquisitionColumn: acquisitionColumn
      fastAPath: fastAPath
      ddaOnly: ddaOnly
      headless: headless
      workdir: workdir
      toolsFolder: toolsFolder
      threads: threads
    out: [result]

outputs:
  FragpipeAllResult:
    type: Directory
    outputSource: FragpipeAll/result

# The following is metadata for the workflow

arc:author:
  - class: arc:Person
    arc:first name: "Caroline"
    arc:last name: "Ott"
    arc:email: "caroline.ott@rptu.de"
    arc:affiliation: "RPTU Kaiserslautern/Landau"

The run.yml file contains all the necessary information for your run. The run.yml file content could look like this:

An example run.yml file could look like this:

arcDirectory:
  class: Directory
  path: ../../
runName: MyRun
assayName: MyAssay
experimentColumn: "Condition, Parameter"
replicateColumn: "Replicate, Characteristic"
acquisitionColumn: "scan mode, Parameter"
fastAPath: ./arc/studies/MyOrganism/resources/ProteinsFastaWithDecoys.fasta
ddaOnly: "TRUE"
headless: true
workdir: ./arc/runs/MyRun
toolsFolder: ./arc/workflows/Fragpipe/tools
threads: 20

# The following is metadata for the workflow

arc:performer:
  - class: arc:Person
    arc:first name: "Your"
    arc:last name: "Name"
    arc:email: "your@email"
    arc:affiliation: "Your institution"

Copy the content for the run.cwl and run.yml blocks into a text editor (e.g., Notepad).
Save them as run.cwl and run.yml in the MyRun (example name) folder within the runs directory.
Update the paths and names in the run.yml file to match your ARC:
- arcDirectory doesn’t change.
- Paths always start with ./arc, since the ARC is mounted into the Docker container with that name.
- The runName is the name you want to give your run.
  - The workdir name is usually the same as the runName.
- The assayName is the name of the assay you want to analyze.
Update the experimentColumn, replicateColumn, and acquisitionColumn values to match your assay or study files:
- The first part is the column name, the second part is the column type (Characteristic, Parameter or Factor).
- These three columns must exist in your assay or study files.
- These columns are the ones you specified here.
ddaOnly can be set to “TRUE” or “FALSE”:
- If you are analyzing DIA files, set it to “FALSE”.
The number of threads can be set to the number of files you want to analyze in parallel:
- You also need to specify the number of threads in the run.cwl file under ResourceRequirement.
  - Remove the # in front of the lines and set the number of cores and RAM you want to use.

FragPipe Workflow

FragPipe requires a fragpipe.workflow file specifying the workflow settings. You can use a predesigned workflow from the FragPipe repository or create your own (this is easier with the GUI).

Using the GUI

Open the FragPipe GUI and navigate to the Workflow tab.
Select a workflow and press the Load workflow button.
Follow the instructions on the FragPipe website for your use case, or keep the preset workflow you selected.
Click on the Save to custom folder button in the Workflow tab and save it to the runs directory in the folder named MyRun (example name) under the name fragpipe.workflow.
If you have a protein FASTA withouth decoys, you can add them in the Database tab:
- Click on the Browse button and select your FASTA file.
- Click on the Add decoys button to add decoys.
- The FASTA file with decoys will be saved in the same folder as the original FASTA file.

Directory structure

After you obtained those files, the folder structure should look like this:

Directoryruns
- DirectoryMyRun
  - run.cwl
  - run.yml
  - fragpipe.workflow
Directoryworkflows
- DirectoryFragpipe
  - Dockerfile
  - Fragpipe.cwl
  - ManifestAndWorkflow.cwl
  - Workflow.cwl
  - tools
  - Directoryscripts
    manifestAndWorkflow.fsx

Running the Workflow

A CWL runner is required to run the workflow.

Install the CWL reference runner as described here: CWL Runner Installation.
Open the command-line tool.
Activate the environment in which you installed the CWL runner:
- Instructions for this can be found here.
Navigate to the runs folder:
- Use the cd command to navigate there:
  Terminal window
```
cd path/to/your/runs
```
Run the workflow with the following command:
Terminal window
```
cwltool ./MyRun/run.cwl ./MyRun/run.yml
```