ARC revision for publication using default issues

This guide provides a template issue list for ARC revision in the DataHUB. Importing the list into the DataHUB creates a default set of issues for the revision. The list is based on the typical review criteria for ARC publication in DataPLANT and can be adapted to the specific needs of an individual ARC revision.

review-arc-issues.csv

title	description
After publication: Cite ARC and DataPLANT and make ARC public	/confidential /label ~critical In your journal manuscript, please - [ ] Cite the DOI to your ARC in the 'data availability' statement - [ ] Cite the DataPLANT publication see https://nfdi4plants.org/articles/citing/ Once the manuscript is published, please - [ ] Set your ARC to `public` in the DataHUB via Settings → General → Visibility, project features, permissions - directly go to [➡ Visibility Settings](../../../edit#js-shared-permissions)
DataHUB: Move ARC into non-personal namespace	/confidential /label ~critical In order to guarantee that the ARC can (in the future) be accessed and maintained by others, please transfer it to a group namespace. - see https://nfdi4plants.org/nfdi4plants.knowledgebase/datahub/navigation-settings/datahub-arc-settings/ - or directly go to [➡ Advanced Settings](../../../edit#js-project-advanced-settings) and scroll down to 'Transfer project'
DataHUB: Add icon and description	/confidential /label ~enhancement Adding an icon (project avatar) and a description to your ARC makes it more attractive and easier to find for others. For instance, as icon you can use a typical image from your investigation, e.g. a plant or a plot; the description could be your manuscript title. - see https://nfdi4plants.org/nfdi4plants.knowledgebase/datahub/navigation-settings/datahub-arc-settings/ - or directly go to [➡ ARC Settings](../../../edit)
DataHUB: Add a LICENSE	/confidential /label ~critical Please add a LICENSE file to your ARC. This is important for others to know how they can use your data and to ensure that your data can be reused by others. - see https://nfdi4plants.org/nfdi4plants.knowledgebase/datahub/arc-features/datahub-arc-license/ - or directly [➡ add a LICENSE here](../../new/main?commit_message=Add+LICENSE&file_name=LICENSE)
DataHUB: Improve README	/confidential /label ~suggestion The README file is the first thing that other users see when they access your ARC. It should give a human-readable overview of the content of the ARC. DataPLANT provides a tool to automatically generate a nicely structured README from the ISA metadata as a basis, see https://nfdi4plants.org/nfdi4plants.knowledgebase/resources/arc-summary/
DataHUB: Check usage quota	/confidential /label ~critical Please check the `Project Storage` of your ARC. You can find this on the right side of the ARC's page. Most of the storage space should be under 'LFS'. As a rule of thumb, the usage under 'Repository' storage should be less than 500 MB. To see the project storage, one must at least have `maintainer` permission to the ARC. - directly go to [➡ usage quota](../../usage_quotas)
ISA Metadata: Double-check the investigation metadata	/confidential /label ~critical - [ ] the investigation identifier should not contain special characters - [ ] the title is clear – this could be the title of your manuscript - [ ] the description is clear and informative – this could be the abstract of your manuscript - [ ] all people who contributed to the project are listed in the investigation contacts – typically this includes all manuscript authors - [ ] if a publication (e.g. a pre-print) is associated with the investigation, it should be listed in the investigation publications
ISA Metadata: Double-check the use of annotation principles	/confidential /label ~critical In annotation tables, make sure to follow the [annotation principles](https://nfdi4plants.org/nfdi4plants.knowledgebase/core-concepts/isa-annotation-principles/), e.g. - dataset files are referenced as `Input [Data]` or `Output [Data]` - samples are referenced as `Input [Source Name]` or `Input [Sample Name]` or `Output [Sample Name]` - `Characteristic` columns describe inherent properties of samples or material - `Parameter` columns describe steps in your experimental workflow - `Factor` columns represent independent variables that are varied within the study design
ISA Metadata: Add annotation tables to every study and assay	/confidential /label ~critical Every study and assay must have an annotation table. This is important to ensure that the ARC is comprehensible, reusable, and machine-readable. A minimal annotation table should contain at least an `Input` and an `Output` column. One main goal of the ARC is to annotate raw measurement dataset files with the necessary metadata to make them comprehensible and reusable. So, please double-check the sample-to-dataset connections and protocol references. To achieve this, - [ ] dataset files are added to assay `dataset` folders - [ ] dataset files are linked in annotation tables as `Output [Data]` - Only those files are listed as part of the DOI registration, so missing annotation tables can lead to missing dataset files in the DOI registration. - [ ] the annotation tables themselves provide necessary metadata (columns) or 'link back' to preceding annotation tables (of the same or other studies or assays) via `Input [Sample Name]` or `Input [Source Name]` columns - [ ] free-text protocols can be linked to assay or study annotation tables via `ProtocolUri` or `Protocol REF` columns, respectively. - While this helps to provide a human-readable description of the experimental workflow, the most important metadata should be contained in the annotation tables themselves to ensure that the ARC is comprehensible, reusable, and machine-readable. See also https://nfdi4plants.org/nfdi4plants.knowledgebase/arc-use-cases/inputs-outputs
ISA Metadata: Double-check assay top-level metadata	/confidential /label ~suggestion All assays should contain 'top-level metadata'. This helps to find and understand the assay. - [ ] Short and concise title and description - [ ] Measurement Type - [ ] Technology Type - [ ] Technology Platform - [ ] Performers
Data from external sources or publications	/confidential /label ~suggestion If you have data from external sources that are relevant to your study (e.g. from a database, an online tool, or a publication's supplement), you can add them to your ARC. As described [here](https://nfdi4plants.github.io/nfdi4plants.knowledgebase/arc-use-cases/external-data), you can simply add a new study for such 'external data'. - [ ] add the data files to the `resources` folder of the study - [ ] add publications to relevant external data sources in the study 'top-level metadata' - [ ] add a protocol to describe how to retrieve – e.g. create or download – the data
ISA Metadata: Double-check study top-level metadata	/confidential /label ~suggestion All studies should contain 'top-level metadata'. This helps to find and understand the study. - [ ] Short and concise title and description - [ ] Add contacts to show who contributed to that specific study or experiment - [ ] Add publications to relevant external data sources
Annotation of data analysis	/confidential /label ~suggestion Description of data analysis is an important part of the ARC, as it helps others to understand how the data was processed and analyzed. This basically follows the same logic as annotation of experimental data: one can annotate the data analysis steps via annotation tables, link a free-text description ('protocol') or a script (e.g. an R or Python script or notebook) to define what data analysis was done on which `Input [Data]` and generating which `Output [Data]`. For details on different options, see https://nfdi4plants.org/nfdi4plants.knowledgebase/start-here/data-analysis/. Scripts or notebooks used for data processing and analysis should be (re)usable in the ARC, e.g. - [ ] define necessary dependencies (e.g. which packages to install) - [ ] remove absolute paths - [ ] adapt relative paths to the structure of the ARC, referencing suitable input and output paths in the ARC More advanced users may want to annotate the data analysis steps in more detail, e.g. using the CWL-based workflows & runs logic to define the data analysis workflow and its dependencies in a machine-readable way. For details, see https://nfdi4plants.org/nfdi4plants.knowledgebase/cwl/ Workflows & runs should - [ ] contain minimal metadata, e.g. short and concise title and description, version - [ ] reference a reusable container (e.g. a Docker image or local Dockerfile) that contains all necessary dependencies for the workflow - [ ] reference the input and output dataset files in the ARC, e.g. via `Input [Data]` and `Output [Data]` columns in annotation tables
Add supplemental data	/confidential /label ~enhancement While creating a journal manuscript, you may have aggregated and submitted 'supplemental data' to the journal. These data are often a collection of different files, e.g. raw data files, processed data files, scripts, and documentation. An ARC provides a suitable location for any of these files. - [ ] add supplemental datasets to make them accessible and reusable for others and show how they relate to the overall ARC
Add or reference all relevant raw data	/confidential /label ~critical The ARC should contain all raw data files relevant for the investigation. Raw data is considered the 'outcome' of an assay, e.g. a measurement. Typically, one would add the data files directly to an assay, e.g. - [ ] add a new assay for the measurement that generated the raw data - [ ] add all relevant raw data files to the assay's `dataset` folder - [ ] provide clear metadata of the data in the annotation tables Some journals require that raw data is deposited in defined repositories (e.g. at EBI or NCBI). In this case, the raw data files can be linked in an assay via annotation tables. - [ ] for every dataset file add a URL to the annotation table under the `Output [Data]` column - [ ] make sure the URL is stable and points to the correct file in the repository - [ ] provide clear metadata annotation of the data files via the annotation tables

Download review-arc-issues.csv

Using the review issue list during ARC revision

Navigate to the ARC to be revised in the DataHUB
Open the issue menu: left sidebar → Plan → Issues
In the three-dots menu in the top-right corner, select Import CSV
Select the review-arc-issues.csv file and import it
Wait a short moment. You will receive an email notification once the import has finished (or has failed due to parsing errors).
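Because parsing errors are only reported by email after the import, it can help to check the file locally first. The following is a minimal Python sketch and not part of any DataHUB tooling: the function name `check_issue_csv` and the comma as default delimiter are assumptions made for illustration (pass `delimiter="\t"` for a tab-separated file). It only checks that the header starts with `title` and `description` and that every record has a non-empty title.

```python
import csv
import io

REQUIRED_HEADERS = ["title", "description"]


def check_issue_csv(text, delimiter=","):
    """Return a list of problems found in the CSV text; an empty list means none."""
    # newline="" lets the csv module handle line breaks inside quoted cells itself.
    records = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not records:
        return ["file is empty"]

    problems = []
    header = [cell.strip().lower() for cell in records[0]]
    if header[:2] != REQUIRED_HEADERS:
        problems.append(f"header must start with {REQUIRED_HEADERS}, found {records[0]}")

    # Count records, not physical lines: a description may span several lines.
    for number, record in enumerate(records[1:], start=2):
        if not record or not record[0].strip():
            problems.append(f"record {number}: missing title")
        elif len(record) < 2:
            problems.append(f"record {number}: missing description column")
    return problems


sample = (
    "title,description\n"
    '"DataHUB: Add a LICENSE","/confidential\n/label ~critical\nPlease add a LICENSE file."\n'
)
print(check_issue_csv(sample))
```

An empty list only means that these basic checks passed; the import itself may still reject the file for other reasons.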

(Optional) Bundle issues in a milestone

Create a new milestone (left sidebar → Plan → Milestones), e.g. with the following title and description:

Title

ARC publication revision

Description

This milestone helps to bundle all issues (➡ [Issues](../issues)) related to the ongoing revision of your ARC for publication via DataPLANT's [ARChive](https://archive.nfdi4plants.org/).

### ARC publication criteria

[This article](https://nfdi4plants.github.io/nfdi4plants.knowledgebase/guides/review-arc/) in the knowledge base summarizes what DataPLANT typically expects during ARC publication.

The ➡ [Issues](../issues) are used to transparently discuss certain quality criteria for the ARCs. Feel free to get in touch with the reviewers and data stewards by responding directly via the issues.

### Issue names and labels

- Issues starting with `DataHUB:` can easily be addressed here in the DataHUB
- Issues starting with `ISA Metadata:` are most easily fixed using ARC tooling, e.g. ARCitect, Swate, ARC Commander
- Issues labeled as `critical` **must** be addressed for acceptance
- Issues labeled as `suggestion` **should** be addressed for acceptance – this also depends on the nature of the ARC
- Issues labeled as `enhancement` add a nice touch to the ARC, e.g. for (human) findability

Open the issue menu: left sidebar → Plan → Issues
Click on the “Bulk edit” button in the top-right corner to select some or all issues
In the right sidebar, select the “Milestone” dropdown and choose the milestone you just created
Click on “Update selected” to apply the milestone to the selected issues
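Since the DataHUB builds on GitLab, the same two steps can in principle be scripted against the GitLab REST API. The sketch below only builds the two HTTP requests (create a milestone, attach an issue to it) and does not send them. The host `https://datahub.example.org`, the project path `my-group/my-arc`, the token value and all function names are placeholders. It also assumes that API access with a personal access token is available to you.

```python
import json
import urllib.parse
import urllib.request


def _api_request(base_url, project_path, token, method, resource, payload):
    """Build (but do not send) a JSON request for a project-level API resource."""
    # The project path must be URL-encoded: "group/arc" becomes "group%2Farc".
    project = urllib.parse.quote(project_path, safe="")
    url = f"{base_url}/api/v4/projects/{project}/{resource}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


def create_milestone_request(base_url, project_path, token, title, description):
    """Request that would create a milestone in the project."""
    return _api_request(base_url, project_path, token, "POST", "milestones",
                        {"title": title, "description": description})


def assign_milestone_request(base_url, project_path, token, issue_iid, milestone_id):
    """Request that would attach one issue (by its project-internal IID) to a milestone."""
    return _api_request(base_url, project_path, token, "PUT", f"issues/{issue_iid}",
                        {"milestone_id": milestone_id})


# Placeholder values for illustration only.
request = create_milestone_request(
    "https://datahub.example.org", "my-group/my-arc", "<token>",
    "ARC publication revision", "Bundles all issues of the ongoing ARC revision.",
)
print(request.get_method(), request.full_url)
```

Sending the first request, e.g. with `urllib.request.urlopen(request)`, would create the milestone; the `id` in its JSON response is the value to pass as `milestone_id` in the second request. For a handful of issues, the bulk edit in the web interface is usually the simpler route.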

What’s next? – The actual revision process

The issue list is only a starting point for ARC revision. It gives ARC reviewers a clear overview of the most important issues that need to be addressed for ARC publication. However, the list is not exhaustive and may need to be adapted to the specific needs of the ARC revision. For example, some issues may not be relevant for a specific ARC, while other issues may need to be added.

To guide the ARC creators and relate the issues to their specific ARC, it usually makes sense to add comments to the issues. This can be done directly in the DataHUB by responding to the issues and including specifics, e.g. study or assay names and screenshots (e.g. from ARCitect).

Furthermore, because the issue list is generic and the import described above always creates all issues, it may be useful to close some issues immediately after import if they are not (or no longer) relevant for the specific ARC revision. Again, this highlights that certain criteria simply apply to all ARCs.

Adapting the issue list

The issue list is based on the CSV import feature of GitLab issues, which allows you to create multiple issues at once by uploading a CSV file with the issue details. For more details on the CSV format and the available options, please refer to the GitLab documentation.

Basics about the issue CSV file

The first two columns must be title and description.
The first row is used as a header and is ignored.
The order of the rows does not matter – by default the issues are sorted alphabetically in the DataHUB.
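To illustrate the layout described above, here is a minimal Python sketch that writes such a file with the standard `csv` module. The function name `build_issue_csv` and the two shortened example issues are made up for illustration; the comma delimiter is an assumption.

```python
import csv
import io


def build_issue_csv(issues):
    """Serialize (title, description) pairs into CSV text with the required header."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "description"])  # header row
    for title, description in issues:
        # csv.writer quotes cells containing commas, quotes or line breaks as needed.
        writer.writerow([title, description])
    return buffer.getvalue()


example = build_issue_csv([
    ("DataHUB: Add a LICENSE", "Please add a LICENSE file to your ARC."),
    ("DataHUB: Improve README", "The README is the first thing other users see."),
])
print(example)
```

Letting a CSV library handle the quoting avoids hand-escaping descriptions that contain commas or span several lines.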

Use of quick-actions

GitLab offers so-called “quick-actions” to perform certain actions on issues, e.g. to assign an issue to a user, to add a label or to mark an issue as confidential. These quick-actions can also be used in the CSV file to set up the issues in a more specific way during import.

Each action must be on a separate line.
For quick actions like /label and /milestone, the label or milestone must already exist in the project.
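As a small illustration of the "one action per line" rule, the following Python sketch assembles a description in which each quick action sits on its own line before the actual text. The helper name `with_quick_actions` and its parameters are hypothetical.

```python
def with_quick_actions(body, label=None, confidential=False):
    """Prefix an issue description with quick actions, one per line."""
    lines = []
    if confidential:
        lines.append("/confidential")
    if label:
        # The label must already exist in the project, otherwise it cannot be applied.
        lines.append(f"/label ~{label}")
    lines.append(body)
    return "\n".join(lines)


description = with_quick_actions(
    "Please add a LICENSE file to your ARC.",
    label="critical",
    confidential=True,
)
print(description)
```

When such a multi-line description is stored in the CSV file, the cell has to be quoted so that the line breaks stay inside the description column.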

Labels

The quick-action /label makes it possible to categorize the issues. Instead of typical journal review terminology (e.g. “required”, “recommended”, “major”, “minor”), this issue list uses the default GitLab labels critical, suggestion and enhancement. Because these labels already exist on every project (ARC) by default, they do not have to be created before the issue import.

Confidential

Marking review issues as “confidential” via the /confidential quick-action simply adds a bit more comfort for the ARC authors. Issues marked as confidential are only visible to project members with at least “Reporter” permissions.