Skip to content

Authoring ARC Validation Packages

In general, any script or tool can be used to validate aspects of an ARC. However, to be compatible with the Continuous Quality Control (CQC) pipelines on the DataHUB, validation packages must adhere to certain guidelines and conventions, both in terms of format and generated output.

In CQC pipelines, validation packages are pulled from the ARC Validation Package Registry (AVPR) and executed in isolated environments via the arc-validate CLI tool. Validation packages must adhere to the structure (e.g. programming language, metadata) and output requirements outlined below to be both publishable to AVPR and executable by arc-validate.

See also the relevant sections in the ARC specification

Supported validation package file formats

Section titled Supported validation package file formats

Validation packages must be self-contained, single-file scripts.

The following programming languages can currently be used to create validation packages:

  • F# (.fsx)
    • software package management in F# scripts MUST use #r nuget ... directives to reference any external dependencies.
    • F# scripts are executed via dotnet fsi.
  • Python (.py)
    • software package management in Python scripts MUST use uv inline script dependencies to reference any external dependencies. Ideally also specify the python language version.
    • Python scripts are executed via uv run.

Package metadata are used to display information about the validation package in the AVPR, keep track of versioning, release notes, etc.

The metadata MUST be the first thing occurring in the validation package file and can be either formatted as a standalone string or bound to a variable. It is formatted as YAML frontmatter, enclosed by triple-dashed lines (---).

In F# validation packages, the frontmatter MUST be enclosed in a multi-line comment ((* ... *)).

(*
---
<yaml frontmatter here>
---
*)

You can bind YAML frontmatter as a string inside your package. This is recommended because you can now re-use the metadata in your package code.

Binding must be placed at the start of the file to the name PACKAGE_METADATA with a [<Literal>] attribute exactly like this:

let [<Literal>] PACKAGE_METADATA = """(*
---
<yaml frontmatter here>
---
*)"""

Re-use in the package code:

// The F# library for writing ARC validation packages, adjust version!
#r "nuget: ARCExpect, 5.0.1"
Setup.ValidationPackage(
metadata = Setup.Metadata(PACKAGE_METADATA),
CriticalValidationCases = [...]
)
|> Execute.ValidationPipeline(
basePath = ...
)
FieldTypeDescription
Namestringthe name of the package
MajorVersionintthe major version of the package
MinorVersionintthe minor version of the package
PatchVersionintthe patch version of the package
Summarystringa single sentence description (less than 50 words) of the package
Descriptionstringan unconstrained free text description of the package

Example:

let [<Literal>] PACKAGE_METADATA = """(*
---
Name: my-package
MajorVersion: 1
MinorVersion: 0
PatchVersion: 0
Summary: My package does the thing.
Description: |
My package does the thing.
It does it very good, it does it very well.
It does it very fast, it does it very swell.
---
*)"""
FieldTypeDescription
Publishboola boolean value indicating whether the package should be published to the registry. If set to true, the package will be built and pushed to the registry. If set to false (or not present), the package will be ignored.
Authorsauthor[]the authors of the package. For more information about mandatory and optional fields in this object, see Objects > Author
Tagsstring[]a list of tags with optional ontology annotations that describe the package. For more information about mandatory and optional fields in this object, see Objects > Tag
ReleaseNotesstring[]a list of release notes for the package indicating changes from previous versions
CQCHookEndpointstringan optional URL to a CQC Hook endpoint that can be used for continuous quality control (CQC) integration. If provided, this endpoint will be called with validation results after each package execution.

Example:

let [<Literal>] PACKAGE_METADATA = """(*
---
Name: my-package
MajorVersion: 1
MinorVersion: 0
PatchVersion: 0
Summary: My package does the thing.
Description: |
My package does the thing.
It does it very good, it does it very well.
It does it very fast, it does it very swell.
Publish: true
Authors:
- FullName: John Doe
Email: j@d.com
Affiliation: University of Nowhere
AffiliationLink: https://nowhere.edu
- FullName: Jane Doe
Email: jj@d.com
Affiliation: University of Somewhere
AffiliationLink: https://somewhere.edu
Tags:
- Name: validation
- Name: my-tag
TermSourceREF: my-ontology
TermAccessionNumber: MO:12345
ReleaseNotes: |
- initial release
- does the thing
- does it well
CQCHookEndpoint: https://some-url.xd
---
*)"""

Author metadata about the people that create and maintain the package.

FieldTypeDescriptionMandatory
FullNamestringthe full name of the authoryes
Emailstringthe email address of the authorno
Affiliationstringthe affiliation (e.g. institution) of the authorno
AffiliationLinkstringa link to the affiliation of the authorno

Tags can be any string with an optional ontology annotation from a controlled vocabulary:

FieldTypeDescriptionMandatory
Namestringthe name of the tagyes
TermSourceREFstringReference to a controlled vocabulary sourceno
TermAccessionNumberstringAccession in the referenced controlled vocabulary sourceno

Packages SHOULD be versioned according to the semantic versioning standard. This means that the version number of a package should be incremented according to the following rules:

  • Major version: incremented when you make changes incompatible with previous versions
  • Minor version: incremented when you add functionality in a backwards-compatible manner
  • Patch version: incremented when you make backwards-compatible bug fixes

Validation packages MUST create a folder structure in the folder they are executed in that looks exactly like this:

  • Directory<execution_directory>
    • Directory.arc-validate-results
      • Directory<package-name>@<package_version>
        • validation_report.xml
        • validation_summary.json
        • badge.svg

Meaning for the package invenio with version 3.1.0, the following folder structure MUST be created:

  • Directory<execution_directory>
    • Directory.arc-validate-results
      • Directoryinvenio@3.1.0
        • validation_report.xml
        • validation_summary.json
        • badge.svg

A file in JUnit XML format named validation_report.xml MUST be created inside the package output folder. This file contains detailed information about all validation cases executed by the package, including their status (passed, failed,errored), execution time, and any error messages or stack traces.

A SVG badge named badge.svg MUST be created inside the package output folder. This badge summarizes the overall validation status of the package execution. The badge SHOULD indicate whether the validation passed or failed, and MAY include additional information such as the number of tests executed or the percentage of tests passed. This badge will be displayed on the ARC homepage.

A JSON file named validation_summary.json MUST be created inside the package output folder. This file contains a summary of the validation results, including the total number of tests executed, the number of tests passed, failed, and errored, as well as other information necessary to trigger downstream processes in CQC pipelines.

Current schema is available in the ARC specification

To submit a validation package to the ARC Validation Package Registry (AVPR) go to the AVPR repository in GitHub and file a PR as stated in the repository’s README.