CWL Metadata
Metadata makes CWL files easier to understand. By embedding additional information about the performer and the process in the metadata, researchers can give a more complete and informative description of their workflows.
- Performer Metadata: In scientific research, it is essential to know who is responsible for a particular workflow. By including performer metadata in CWL files, researchers can specify the individuals or teams behind the development and execution of the workflow.
- Process Metadata: Annotating the CWL file with metadata about the underlying processes adds another layer of information, making the workflow easier to understand and reproduce. Process metadata may include:
  - Description of Processes: Detailed explanations of the steps involved in the workflow, providing context for each stage or individual process.
  - Input and Output Descriptions: Clarifications about the expected inputs and outputs of each step, helping users understand the data flow.
  - Parameter Descriptions: Descriptions of the parameters contained in the YAML job file.
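As a minimal sketch of process and input/output descriptions, the standard CWL `label` and `doc` fields can carry human-readable text on the tool and on each parameter. The tool below (a line counter built on `wc -l`) and all of its names are hypothetical and only serve as an illustration:

```yaml
cwlVersion: v1.2
class: CommandLineTool

# Description of the process as a whole
label: Count lines
doc: |
  Counts the lines of a text file with `wc -l` and writes the result to stdout.

baseCommand: [wc, -l]

inputs:
  text_file:                     # hypothetical input name
    type: File
    label: Input text file
    doc: Plain-text file whose lines are counted.
    inputBinding:
      position: 1

outputs:
  line_count:                    # hypothetical output name
    type: stdout
    label: Line count
    doc: Text file containing the number of lines and the input file name.

stdout: line_count.txt
```

Free-text fields like these complement, but do not replace, the ontology-based annotations described in the next section.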
Annotating a CWL or job file
CWL and job files can be annotated with ontology terms in YAML format. They support namespaces according to the Schema Salad specification. An example for the annotation with authorship metadata can be found here. The metadata concerning an executed run should be split between the CWL file and the job file, depending on what it describes. Metadata that describes a tool input specified in the job file should be placed in the job file. Metadata that describes the tool itself should be placed in the CWL file.
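A minimal sketch of namespaced authorship metadata, assuming the schema.org vocabulary bound to the prefix `s`. The tool itself, the person, and the ORCID are placeholders:

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo

inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []

# Prefixes declared here can be used in field names and class values.
$namespaces:
  s: https://schema.org/

# Authorship metadata expressed with schema.org terms (placeholder values).
s:author:
  - class: s:Person
    s:name: Jane Doe
    s:email: mailto:jane.doe@example.org
    s:identifier: https://orcid.org/0000-0000-0000-0000
```

Every metadata field carries a declared prefix, which keeps the annotations distinguishable from the fields defined by the CWL standard.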
In the case of a self-contained tool, the corresponding metadata section could look like this and would be located in the CWL file:
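The following is only an illustrative sketch. The `ex:` prefix, its namespace URI, and every `ex:` term are hypothetical placeholders and should be replaced by the terms of the ontology actually used in the project. The script name and file paths are likewise invented.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [python, analysis.py]   # hypothetical fixed script, no variable inputs
inputs: []
outputs: []

$namespaces:
  s: https://schema.org/
  ex: https://example.org/workflow-metadata#   # placeholder vocabulary

# Technology platform and the person executing the workflow
ex:technologyPlatform: Python
ex:performer:
  - class: s:Person
    s:givenName: Jane
    s:familyName: Doe
    s:email: mailto:jane.doe@example.org

# Input and output files and the operation applied to the data
ex:processSequence:
  - class: ex:Process
    ex:name: analysis.py
    ex:input:
      - class: ex:Data
        ex:name: data/raw_counts.csv
    ex:operation: normalization
    ex:output:
      - class: ex:Data
        ex:name: results/normalized_counts.csv
```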
This metadata section provides information about the technology platform and the person executing the workflow. It also describes the tool's input and output files, as well as the operations applied to the data. In this case, everything is encoded in the executed script and there are no variable inputs, so all metadata is written in the CWL file. An example of this can be found here.
Frequently, though, tools have input parameters that alter the tool's execution or its input and output files. In this case, each piece of metadata has to be written in the right location. For a tool with varying inputs and a specifiable output location, the CWL file could look as follows:
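As a hedged sketch of the CWL side, the file below keeps only metadata that describes the tool itself. The `ex:` vocabulary, the script, and the parameter names `input_file` and `output_dir` are hypothetical:

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [python, analysis.py]   # hypothetical script

inputs:
  input_file:                        # value is supplied by the job file
    type: File
    inputBinding:
      prefix: --input
  output_dir:                        # value is supplied by the job file
    type: string
    inputBinding:
      prefix: --outdir

outputs:
  results:
    type: Directory
    outputBinding:
      glob: $(inputs.output_dir)

$namespaces:
  ex: https://example.org/workflow-metadata#   # placeholder vocabulary

# Tool-level metadata only: platform and the operation the script performs
ex:technologyPlatform: Python
ex:processSequence:
  - class: ex:Process
    ex:name: analysis.py
    ex:operation: normalization
```

Nothing about concrete files appears here, because those values vary from run to run.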
The job file could look as follows:
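A matching sketch for the job side, under the assumption of a tool with the hypothetical inputs `input_file` and `output_dir`. It sets the input values and annotates them with placeholder `ex:` terms that stand in for a real ontology:

```yaml
# Input values for one run
input_file:
  class: File
  path: data/raw_counts.csv          # invented path
output_dir: results

$namespaces:
  ex: https://example.org/workflow-metadata#   # placeholder vocabulary

# Run-specific metadata: describes the values chosen above
ex:parameterValue:
  - class: ex:ParameterValue
    ex:parameter: input_file
    ex:description: Raw count table used as input for this run.
  - class: ex:ParameterValue
    ex:parameter: output_dir
    ex:description: Directory that receives the normalized results.
```

Keeping these annotations next to the values they describe means a second job file with different inputs can carry its own descriptions without touching the CWL file.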
Examples of this can be found here for the CWL file and here for the job file.
An application example including metadata can be found here. It contains a CWL file with the ARC mounted and a fixed script. The CWL file has two mandatory parameters and one optional parameter. There is one job file for execution without the optional parameter and one for execution with it. The two job files differ only in the metadata concerning the optional parameter.