ARC from Publication
I have a publication with associated datasets. We published this before we knew about ARCs. Now I want to create an ARC based on this publication and its datasets to include relevant data and analysis steps that were not included in the original publication. How do I do this?
In this use-case we collect recommendations and thoughts on creating an ARC based on a publication and its associated published datasets. This is not the typical entry point into an ARC, but rather a retrospective one. It might, however, help to build community-tailored showcases, i.e. showing what a project could look like as an ARC.
- Create an ARC and name it e.g. “FirstAuthorLastName-PublicationYear”
- Add a `README.md` file to the ARC, e.g. with the following content:

      # Title of Publication

      ## Original Publication

      <citation as provided by publisher or exported from bibliography manager; ideally in a standard format including the DOI>

      ## Abstract

      <paper abstract>

      ## License

      <license / copyright as provided by publisher>

- Add original publication files into a folder (e.g. `_publication`) inside the ARC, e.g.
  - The publication pdf
  - Any supplemental files (as offered on the publication page)
  - You can add a `_publication/README.md` with a table of links to the files
- Add a `LICENSE` file to the ARC
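The steps above can be sketched as a small script. This is only a minimal illustration using plain folders and files: it does not initialize an ARC with dedicated ARC tooling, and the function name `scaffold_arc`, its parameters, and the placeholder values are assumptions made for this example.

```python
from pathlib import Path
import tempfile

# README layout taken from the recommendation above.
README_TEMPLATE = """# {title}

## Original Publication

{citation}

## Abstract

{abstract}

## License

{license_note}
"""


def scaffold_arc(parent: Path, first_author: str, year: int, title: str,
                 citation: str, abstract: str, license_note: str) -> Path:
    """Create the ARC root folder with README.md, LICENSE and _publication."""
    arc_root = parent / f"{first_author}-{year}"
    publication_dir = arc_root / "_publication"
    publication_dir.mkdir(parents=True, exist_ok=True)

    readme = README_TEMPLATE.format(
        title=title, citation=citation,
        abstract=abstract, license_note=license_note,
    )
    (arc_root / "README.md").write_text(readme, encoding="utf-8")

    # Placeholder only: replace with the license text provided by the publisher.
    (arc_root / "LICENSE").write_text(license_note + "\n", encoding="utf-8")

    # Empty table of links to the original publication files.
    (publication_dir / "README.md").write_text(
        "| File | Source |\n| --- | --- |\n", encoding="utf-8"
    )
    return arc_root


if __name__ == "__main__":
    # Hypothetical values, written to a temporary folder for demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        root = scaffold_arc(Path(tmp), "Doe", 2020, "Title of Publication",
                            "<citation>", "<paper abstract>", "<license>")
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```

The publication PDF and supplemental files would then be copied into `_publication` by hand or with `shutil.copy`.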
Translate the publication into the ARC’s Investigation
- Add Title: the publication title
- Add Description: the publication abstract
- Add Public Release Date: the publication online date
- Add Contacts: the authors in the same order as on the publication
  - Add First Name, Last Name, Affiliation
  - If possible, add Emails and ORCIDs
- Add Publication
  - DOI, Title, Authors, Status = Published
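To make the mapping concrete, the following sketch collects the investigation metadata in plain Python data classes. It is not the API of any ARC library: the class names, field names, and the helper `investigation_from_paper` are assumptions that simply mirror the fields listed above, and the values in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contact:
    first_name: str
    last_name: str
    affiliation: str
    email: Optional[str] = None  # add if available
    orcid: Optional[str] = None  # add if available


@dataclass
class Publication:
    doi: str
    title: str
    authors: str
    status: str = "Published"


@dataclass
class Investigation:
    title: str
    description: str
    public_release_date: str
    contacts: list[Contact] = field(default_factory=list)
    publications: list[Publication] = field(default_factory=list)


def investigation_from_paper(title, abstract, online_date, authors, doi):
    """Map publication details onto investigation metadata.

    `authors` is a list of tuples (first name, last name, affiliation,
    optionally email and ORCID) in the same order as on the publication.
    """
    contacts = [Contact(*author) for author in authors]
    author_list = ", ".join(f"{c.first_name} {c.last_name}" for c in contacts)
    return Investigation(
        title=title,                      # publication title
        description=abstract,             # publication abstract
        public_release_date=online_date,  # publication online date
        contacts=contacts,
        publications=[Publication(doi=doi, title=title, authors=author_list)],
    )
```

For example, `investigation_from_paper("Title", "Abstract", "2020-01-31", [("Jane", "Doe", "University A")], "10.1234/example")` returns an `Investigation` with one contact and one publication whose status is `Published`.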
Separate the publication into studies and assays
A publication collates most information in a highly aggregated form. In an ARC, we want to convert this into a more granular and modular structure. Hence, the major challenge is to decide how to decompose the sections and files above and how best to represent them in the ARC structure. This may require some iterations. As a starting point, try to reach a point where one assay covers one type of measurement and dataset, and move “backwards” from there.
- For each publication bit, try to categorize it into the ARC components, e.g. as follows:

  | Publication | ARC |
  | --- | --- |
  | Methods | Study or assay protocol |
  | Computational analysis (e.g. scripts) | Assay protocol or Workflow |
  | Results (e.g. Figures, Tables) | Assay dataset or Run |
  | Supplemental files | Assay dataset |

- Try to clearly separate different types of measurements and datasets and the methods leading to them.
  - For every measurement or result, create one `Assay`
    - Add the respective files (e.g. result tables, result figures, supplemental files) to the `Assay` dataset folder
    - Add the respective `protocol` (e.g. methods, computational analysis) used to generate the associated dataset as an `Assay` protocol
    - Add an annotation table to every assay and reference the `dataset` files in an `Output [Data]` column.
  - Identify the sample sets leading to the `Assay` datasets. Which samples were measured in which assay?
    - For every sample set, create one `Study` and add an annotation table to list sample IDs for these samples.
    - Add the respective `protocol` (e.g. materials, methods) used to generate the associated sample set as a `Study` protocol
    - Link the respective `Assay` dataset files to the `Study` samples.
    - If there are multiple measurements that share the same sample set you can link multiple `Assays` to this `Study`.
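As a rough planning aid, the categorization and the resulting folder layout can be sketched in a few lines. The lookup table repeats the publication-to-ARC mapping from this section; the folder names (`studies/<name>/protocols`, `assays/<name>/dataset`, `assays/<name>/protocols`) are assumed here to follow the usual ARC layout, and the study and assay names in the usage note are hypothetical.

```python
# Publication part -> ARC component, as listed in this section.
ARC_TARGETS = {
    "methods": "Study or assay protocol",
    "computational analysis": "Assay protocol or Workflow",
    "results": "Assay dataset or Run",
    "supplemental files": "Assay dataset",
}


def categorize(publication_part: str) -> str:
    """Return the ARC component suggested for a part of the publication."""
    return ARC_TARGETS[publication_part.strip().lower()]


def plan_layout(sample_sets: dict[str, list[str]]) -> list[str]:
    """List the folders needed for a decomposition.

    `sample_sets` maps one study name (one sample set) to the names of
    all assays that measured these samples. An assay shared by several
    studies is listed only once.
    """
    folders: list[str] = []
    for study, assays in sample_sets.items():
        candidates = [f"studies/{study}/protocols"]
        for assay in assays:
            candidates.append(f"assays/{assay}/dataset")
            candidates.append(f"assays/{assay}/protocols")
        for folder in candidates:
            if folder not in folders:
                folders.append(folder)
    return folders
```

Calling `plan_layout({"PlantGrowth": ["RNASeq", "Proteomics"]})` lists one study folder and two assay folders, reflecting that several assays can be linked to the same sample set.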
Annotate data and samples
- Reference the protocol file in the respective `Study` and `Assay` annotation table via `Protocol Uri` or `Protocol REF` columns.
- Extract the major `protocol` information and add this as metadata to the annotation tables.
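The following sketch shows what such an annotation table might look like as rows and columns. `Protocol REF` and `Output [Data]` are the column names mentioned in this guide; the `Input [Sample Name]` and `Parameter [...]` headers, the function names, and all values are assumptions for illustration. The tab-separated output is only a preview and is not the file format in which an ARC stores annotation tables.

```python
import csv
import io


def build_assay_table(samples_to_files, protocol_ref, parameters):
    """Build header and rows of a simple assay annotation table.

    samples_to_files: sample ID -> dataset file measured from that sample
    protocol_ref:     protocol file referenced for every row
    parameters:       key information extracted from the protocol
    """
    columns = (
        ["Input [Sample Name]", "Protocol REF"]
        + [f"Parameter [{name}]" for name in parameters]
        + ["Output [Data]"]
    )
    rows = [
        [sample, protocol_ref, *parameters.values(), data_file]
        for sample, data_file in samples_to_files.items()
    ]
    return columns, rows


def to_tsv(columns, rows) -> str:
    """Render the table as tab-separated text for a quick preview."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample IDs, file names and parameter values.
    columns, rows = build_assay_table(
        {"Sample1": "dataset/Sample1.raw", "Sample2": "dataset/Sample2.raw"},
        "protocols/extraction.md",
        {"Instrument model": "<instrument>"},
    )
    print(to_tsv(columns, rows))
```

Each row links one sample to the dataset file it produced, which is how the assay dataset files are tied back to the study samples.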
Additional recommendations
- Avoid spaces in file names. We recommend using camelCase or PascalCase for file names.
  - However, to keep track of links and data origin, keep the original name of data files, even if a publisher or repository stores files with spaces.
The publication may contain a section “data availability” or “data accession” or similar that references external links, e.g. large data files deposited to a public repository.
- Try to find and transfer all info (sample accessions, IDs, metadata, links, etc.) into the ARC.
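The file naming recommendation in this section can be sketched as a small helper. The function names and the choice to split on spaces, underscores, and hyphens are assumptions made for this example; files that keep their published name are returned unchanged.

```python
import re


def to_pascal_case(file_name: str) -> str:
    """Convert e.g. 'growth curve data.csv' into a PascalCase file name."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:  # no extension present
        stem, extension = file_name, ""
    words = re.split(r"[\s_\-]+", stem.strip())
    # Capitalize only the first letter so existing capitals are preserved.
    pascal = "".join(word[:1].upper() + word[1:] for word in words if word)
    return f"{pascal}.{extension}" if extension else pascal


def suggest_name(file_name: str, is_published_original: bool) -> str:
    """Suggest a file name for use inside the ARC.

    Files downloaded from a publisher or repository keep their original
    name so that links and data origin stay traceable.
    """
    if is_published_original or " " not in file_name:
        return file_name
    return to_pascal_case(file_name)
```

For instance, `suggest_name("growth curve data.csv", is_published_original=False)` suggests a name without spaces, while the same call with `is_published_original=True` leaves the name untouched.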