Skip to content

CWL Fragpipe

FragPipe is a software suite for the analysis of mass spectrometry-based proteomics data. It integrates multiple state-of-the-art software tools and algorithms into a unified, user-friendly graphical interface, allowing researchers to process raw MS data efficiently.

Setting up the FragPipe ARC-CWL workflow

Section titled Setting up the FragPipe ARC-CWL workflow

The FragPipe ARC-CWL workflow requires several files and directories to be set up correctly. The following sections will guide you through the setup process.

You need an ARC to run this workflow, since necessary information about the experiment is retrieved from the assay and study files by the workflow. How to set up an ARC is described in this Getting Started Guide.

Special requirements for the ARC:

  • Assay and study files containing columns describing the data:
    • Experiment (e.g., hot/cold)
    • Replicate (e.g., 1, 2, 3, …)
    • Acquisition mode (DDA or DIA)
  • One assay sheet named “MassSpectrometry,” which is linked to previous assays and studies containing the aforementioned columns through inputs and outputs (standard assay/study setup, this is described in the Getting Started Guide):
    • The output of this sheet must be of type Data and contain the names of your mass spectrometry files or folders (just the name, not the path, e.g., MyRun.d).
  • One folder in the workflows directory named “Fragpipe.”

The current official Docker image is missing curl. This can be circumvented by using your own Dockerfile based on the official one and adding the curl installation command to the file.

  • Download the Dockerfile here by clicking on the download button in the upper right corner.
  • Place the Dockerfile in the Fragpipe folder within the workflows directory.

ARC Parameter Collection Script

Section titled ARC Parameter Collection Script

This workflow uses a script to collect the required parameters for the FragPipe ARC-CWL workflow.

  • Download the script here.
  • Place the script in the scripts folder within the Fragpipe directory in the workflows directory.

This script will create the required manifest file and adapt the workflow file for the run.

CommandLineTool and Workflow Descriptions

Section titled CommandLineTool and Workflow Descriptions

The workflow executes both CommandLineTool descriptions in the correct order.

The FragPipe tools must be downloaded manually due to licensing restrictions. The easiest way is to download all required tools through the FragPipe GUI and mount them into the Docker container. For this, the tools must be placed in the tools directory within the workflow directory.

Downloading the Tools with the FragPipe GUI

Section titled Downloading the Tools with the FragPipe GUI
  • Download the FragPipe GUI from the FragPipe releases and unzip it.
  • Navigate to the bin folder and run the fragpipe.exe.
  • In the Config tab, click on the Download / Update button and follow the steps.
  • Copy the tools directory into your workflow directory.

After obtaining all required files, your workflows directory structure should look like this:

  • Directoryworkflows
    • DirectoryFragpipe
      • Dockerfile
      • Fragpipe.cwl
      • ManifestAndWorkflow.cwl
      • Workflow.cwl
      • tools
      • Directoryscripts
        • manifestAndWorkflow.fsx

The run requires a run.cwl and a run.yml file. The run.cwl file content could look like this:

cwlVersion: v1.2
class: Workflow
requirements:
- class: MultipleInputFeatureRequirement
- class: SubworkflowFeatureRequirement
# you can specify resources here:
# - class: ResourceRequirement
# coresMin: 20
# ramMin: 40960
inputs:
arcDirectory: Directory
runName: string
assayName: string
experimentColumn: string
replicateColumn: string
acquisitionColumn: string
fastAPath: string
ddaOnly: string
headless: boolean
workdir: string
toolsFolder: string
threads: int
steps:
FragpipeAll:
run: ../../workflows/Fragpipe/workflow.cwl
in:
arcDirectory: arcDirectory
runName: runName
assayName: assayName
experimentColumn: experimentColumn
replicateColumn: replicateColumn
acquisitionColumn: acquisitionColumn
fastAPath: fastAPath
ddaOnly: ddaOnly
headless: headless
workdir: workdir
toolsFolder: toolsFolder
threads: threads
out: [result]
outputs:
FragpipeAllResult:
type: Directory
outputSource: FragpipeAll/result
# The following is metadata for the workflow
arc:author:
- class: arc:Person
arc:first name: "Caroline"
arc:last name: "Ott"
arc:email: "caroline.ott@rptu.de"
arc:affiliation: "RPTU Kaiserslautern/Landau"

The run.yml file contains all the necessary information for your run. The run.yml file content could look like this:

An example run.yml file could look like this:

arcDirectory:
class: Directory
path: ../../
runName: MyRun
assayName: MyAssay
experimentColumn: "Condition, Parameter"
replicateColumn: "Replicate, Characteristic"
acquisitionColumn: "scan mode, Parameter"
fastAPath: ./arc/studies/MyOrganism/resources/ProteinsFastaWithDecoys.fasta
ddaOnly: "TRUE"
headless: true
workdir: ./arc/runs/MyRun
toolsFolder: ./arc/workflows/Fragpipe/tools
threads: 20
# The following is metadata for the workflow
arc:performer:
- class: arc:Person
arc:first name: "Your"
arc:last name: "Name"
arc:email: "your@email"
arc:affiliation: "Your institution"
  • Copy the content for the run.cwl and run.yml blocks into a text editor (e.g., Notepad).
  • Save them as run.cwl and run.yml in the MyRun (example name) folder within the runs directory.
  • Update the paths and names in the run.yml file to match your ARC:
    • arcDirectory doesn’t change.
    • Paths always start with ./arc, since the ARC is mounted into the Docker container with that name.
    • The runName is the name you want to give your run.
      • The workdir name is usually the same as the runName.
    • The assayName is the name of the assay you want to analyze.
  • Update the experimentColumn, replicateColumn, and acquisitionColumn values to match your assay or study files:
    • The first part is the column name, the second part is the column type (Characteristic, Parameter or Factor).
    • These three columns must exist in your assay or study files.
    • These columns are the ones you specified here.
  • ddaOnly can be set to “TRUE” or “FALSE”:
    • If you are analyzing DIA files, set it to “FALSE”.
  • The number of threads can be set to the number of files you want to analyze in parallel:
    • You also need to specify the number of threads in the run.cwl file under ResourceRequirement.
      • Remove the # in front of the lines and set the number of cores and RAM you want to use.

FragPipe requires a fragpipe.workflow file specifying the workflow settings. You can use a predesigned workflow from the FragPipe repository or create your own (this is easier with the GUI).

  • Open the FragPipe GUI and navigate to the Workflow tab.
  • Select a workflow and press the Load workflow button.
  • Follow the instructions on the FragPipe website for your use case, or keep the preset workflow you selected.
  • Click on the Save to custom folder button in the Workflow tab and save it to the runs directory in the folder named MyRun (example name) under the name fragpipe.workflow.
  • If you have a protein FASTA withouth decoys, you can add them in the Database tab:
    • Click on the Browse button and select your FASTA file.
    • Click on the Add decoys button to add decoys.
    • The FASTA file with decoys will be saved in the same folder as the original FASTA file.

After you obtained those files, the folder structure should look like this:

  • Directoryruns
    • DirectoryMyRun
      • run.cwl
      • run.yml
      • fragpipe.workflow
  • Directoryworkflows
    • DirectoryFragpipe
      • Dockerfile
      • Fragpipe.cwl
      • ManifestAndWorkflow.cwl
      • Workflow.cwl
      • tools
      • Directoryscripts
        • manifestAndWorkflow.fsx

A CWL runner is required to run the workflow.

  • Install the CWL reference runner as described here: CWL Runner Installation.
  • Open the command-line tool.
  • Activate the environment in which you installed the CWL runner:
    • Instructions for this can be found here.
  • Navigate to the runs folder:
    • Use the cd command to navigate there:
      Terminal window
      cd path/to/your/runs
  • Run the workflow with the following command:
    Terminal window
    cwltool ./MyRun/run.cwl ./MyRun/run.yml