review-arc-issues.csv
| title | description |
|---|---|
| After publication: Cite ARC and DataPLANT and make ARC public | /confidential /label ~critical In your journal manuscript, please - [ ] Cite the DOI of your ARC in the 'data availability' statement - [ ] Cite the DataPLANT publication, see https://nfdi4plants.org/articles/citing/ Once the manuscript is published, please - [ ] Set your ARC to `public` in the DataHUB via Settings → General → Visibility, project features, permissions - directly go to [➡ Visibility Settings](../../../edit#js-shared-permissions) |
| DataHUB: Move ARC into non-personal namespace | /confidential /label ~critical In order to guarantee that the ARC can (in the future) be accessed and maintained by others, please transfer it to a group namespace. - see https://nfdi4plants.org/nfdi4plants.knowledgebase/datahub/navigation-settings/datahub-arc-settings/ - or directly go to [➡ Advanced Settings](../../../edit#js-project-advanced-settings) and scroll down to 'Transfer project' |
| DataHUB: Add icon and description | /confidential /label ~enhancement Adding an icon (project avatar) and a description to your ARC makes it more attractive and easier for others to find. For instance, as an icon you can use a typical image from your investigation, e.g. a plant or a plot; the description could be your manuscript title. - see https://nfdi4plants.org/nfdi4plants.knowledgebase/datahub/navigation-settings/datahub-arc-settings/ - or directly go to [➡ ARC Settings](../../../edit) |
| DataHUB: Add a LICENSE | /confidential /label ~critical Please add a LICENSE file to your ARC. The license tells others how they may use your data and ensures that the data can be reused. - see https://nfdi4plants.org/nfdi4plants.knowledgebase/datahub/arc-features/datahub-arc-license/ - or directly [➡ add a LICENSE here](../../new/main?commit_message=Add+LICENSE&file_name=LICENSE) |
| DataHUB: Improve README | /confidential /label ~suggestion The README file is the first thing that other users see when they access your ARC. It should give a human-readable overview of the content of the ARC. DataPLANT provides a tool to automatically generate a nicely structured README from the ISA metadata as a basis, see https://nfdi4plants.org/nfdi4plants.knowledgebase/resources/arc-summary/ |
| DataHUB: Check usage quota | /confidential /label ~critical Please check the `Project Storage` of your ARC. You can find this on the right side of the ARC's page. Most of the storage space should be under 'LFS'. As a rule of thumb, the usage under 'Repository' storage should be less than 500 MB. To see the project storage, you need at least `maintainer` permission for the ARC. - directly go to [➡ usage quota](../../usage_quotas) |
| ISA Metadata: Double-check the investigation metadata | /confidential /label ~critical - [ ] the investigation identifier should not contain special characters - [ ] the title is clear – this could be the title of your manuscript - [ ] the description is clear and informative – this could be the abstract of your manuscript - [ ] all people who contributed to the project are listed in the investigation contacts – typically this includes all manuscript authors - [ ] if a publication (e.g. a pre-print) is associated with the investigation, it should be listed in the investigation publications |
| ISA Metadata: Double-check the use of annotation principles | /confidential /label ~critical In annotation tables, make sure to follow the [annotation principles](https://nfdi4plants.org/nfdi4plants.knowledgebase/core-concepts/isa-annotation-principles/), e.g. - dataset files are referenced as `Input [Data]` or `Output [Data]` - samples are referenced as `Input [Source Name]` or `Input [Sample Name]` or `Output [Sample Name]` - `Characteristic` columns describe inherent properties of samples or material - `Parameter` columns describe steps in your experimental workflow - `Factor` columns represent independent variables that are varied within the study design |
| ISA Metadata: Add annotation tables to every study and assay | /confidential /label ~critical Every study and assay must have an annotation table. This is important to ensure that the ARC is comprehensible, reusable, and machine-readable. A minimal annotation table should contain at least an `Input` and an `Output` column. One main goal of the ARC is to annotate raw measurement dataset files with the necessary metadata to make them comprehensible and reusable. So, please double-check the sample-to-dataset connections and protocol references. To achieve this, - [ ] dataset files are added to assay `dataset` folders - [ ] dataset files are linked in annotation tables as `Output [Data]` - Only those files are listed as part of the DOI registration, so missing annotation tables can lead to missing dataset files in the DOI registration. - [ ] the annotation tables themselves provide necessary metadata (columns) or 'link back' to preceding annotation tables (of the same or other studies or assays) via `Input [Sample Name]` or `Input [Source Name]` columns - [ ] free-text protocols can be linked to assay or study annotation tables via `ProtocolUri` or `Protocol REF` columns, respectively. - While this helps to provide a human-readable description of the experimental workflow, the most important metadata should be contained in the annotation tables themselves to ensure that the ARC is comprehensible, reusable, and machine-readable. See also https://nfdi4plants.org/nfdi4plants.knowledgebase/arc-use-cases/inputs-outputs |
| ISA Metadata: Double-check assay top-level metadata | /confidential /label ~suggestion All assays should contain 'top-level metadata'. This helps to find and understand the assay. - [ ] Short and concise title and description - [ ] Measurement Type - [ ] Technology Type - [ ] Technology Platform - [ ] Performers |
| Data from external sources or publications | /confidential /label ~suggestion If you have data from external sources that are relevant to your study (e.g. from a database, an online tool, or a publication's supplement), you can add them to your ARC. As described [here](https://nfdi4plants.github.io/nfdi4plants.knowledgebase/arc-use-cases/external-data), you can simply add a **new study** for such 'external data'. - [ ] add the data files to the `resources` folder of the study - [ ] add publications to relevant external data sources in the study 'top-level metadata' - [ ] add a protocol that describes how to retrieve (e.g. create or download) the data |
| ISA Metadata: Double-check study top-level metadata | /confidential /label ~suggestion All studies should contain 'top-level metadata'. This helps to find and understand the study. - [ ] Short and concise title and description - [ ] Add contacts to show who contributed to that specific study or experiment - [ ] Add publications to relevant external data sources |
| Annotation of data analysis | /confidential /label ~suggestion Description of data analysis is an important part of the ARC, as it helps others to understand how the data was processed and analyzed. This basically follows the same logic as annotation of experimental data: one can annotate the data analysis steps via annotation tables, or link a free-text description ('protocol') or a script (e.g. an R or Python script or notebook), to define what data analysis was done on which `Input [Data]` and which `Output [Data]` it generated. For details on different options, see https://nfdi4plants.org/nfdi4plants.knowledgebase/start-here/data-analysis/. Scripts or notebooks used for data processing and analysis should be (re)usable in the ARC, e.g. - [ ] define necessary dependencies (e.g. which packages to install) - [ ] remove absolute paths - [ ] adapt relative paths to the structure of the ARC, referencing suitable input and output paths in the ARC More advanced users may want to annotate the data analysis steps in more detail, e.g. using the CWL-based workflows & runs logic to define the data analysis workflow and its dependencies in a machine-readable way. For details, see https://nfdi4plants.org/nfdi4plants.knowledgebase/cwl/ Workflows & runs should - [ ] contain minimal metadata, e.g. short and concise title and description, version - [ ] reference a reusable container (e.g. a Docker image or local Dockerfile) that contains all necessary dependencies for the workflow - [ ] reference the input and output dataset files in the ARC, e.g. via `Input [Data]` and `Output [Data]` columns in annotation tables |
| Add supplemental data | /confidential /label ~enhancement While creating a journal manuscript, you may have aggregated and submitted 'supplemental data' to the journal. These data are often a collection of different files, e.g. raw data files, processed data files, scripts, and documentation. An ARC provides a suitable location for any of these files. - [ ] add supplemental datasets to make them accessible and reusable for others and show how they relate to the overall ARC |
| Add or reference all relevant raw data | /confidential /label ~critical The ARC should contain all raw data files relevant for the investigation. Raw data is considered the 'outcome' of an assay, e.g. a measurement. Typically, one would add the data files directly to an assay, e.g. - [ ] add a new assay for the measurement that generated the raw data - [ ] add all relevant raw data files to the assay's `dataset` folder - [ ] provide clear metadata of the data in the annotation tables Some journals require that raw data is deposited in defined repositories (e.g. at EBI or NCBI). In this case, the raw data files can be linked in an assay via annotation tables. - [ ] for every dataset file, add a URL to the annotation table under the `Output [Data]` column - [ ] make sure the URL is stable and points to the correct file in the repository - [ ] provide clear metadata annotation of the data files via the annotation tables |
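
The 'Annotation of data analysis' issue asks authors to remove absolute paths from scripts and to adapt relative paths to the structure of the ARC. One way to do this in a Python script is sketched below. It assumes that the ARC root can be recognised by a file named `isa.investigation.xlsx` and that datasets live under `assays/<assay>/dataset/`. The helper names `find_arc_root` and `dataset_path` are hypothetical and not part of any DataPLANT tool.

```python
from pathlib import Path

# Assumption: the ARC root folder is the one that holds this file.
ARC_MARKER = "isa.investigation.xlsx"


def find_arc_root(start: Path) -> Path:
    """Walk upwards from `start` until a folder containing ARC_MARKER is found."""
    start = start.resolve()
    for folder in [start, *start.parents]:
        if (folder / ARC_MARKER).is_file():
            return folder
    raise FileNotFoundError(f"No {ARC_MARKER} found at or above {start}")


def dataset_path(arc_root: Path, assay: str, filename: str) -> Path:
    """Build the path of a dataset file inside an assay's `dataset` folder."""
    return arc_root / "assays" / assay / "dataset" / filename


# Usage inside an analysis script (the assay and file names are made up):
#   arc_root = find_arc_root(Path(__file__).parent)
#   counts = dataset_path(arc_root, "rnaseq", "counts.csv")
```

Because every path is derived from the ARC root, the script does not depend on where the ARC was cloned or on the current working directory.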
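
The annotation-table issue states that a minimal annotation table should contain at least an `Input` and an `Output` column. A rough self-check on a table's header row could look like the sketch below. It assumes the headers have already been read into a list of strings and that column names follow the `Input [...]` / `Output [...]` pattern quoted in the issues. The function name `missing_io_columns` and the example headers are invented for illustration.

```python
def missing_io_columns(headers: list[str]) -> list[str]:
    """Return which of 'Input' / 'Output' have no matching column in `headers`."""
    missing = []
    for kind in ("Input", "Output"):
        # A column counts if its header looks like "Input [...]" or "Output [...]".
        if not any(h.strip().startswith(f"{kind} [") for h in headers):
            missing.append(kind)
    return missing


# Example header rows (made up for illustration)
complete = ["Input [Source Name]", "Characteristic [organism]", "Output [Sample Name]"]
incomplete = ["Input [Sample Name]", "Parameter [instrument]"]

print(missing_io_columns(complete))
print(missing_io_columns(incomplete))
```

This only inspects header names. It does not verify that the referenced samples or dataset files actually exist in the ARC.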