Metadata and ISA

What is metadata?

Viola's PhD Project

Exercise: Take 5 minutes to note down the metadata

Viola investigates the effect of the plant circadian clock on sugar metabolism in W. mirabilis. For her PhD project, which is part of an EU-funded consortium in Prof. Beetroot's lab, she acquires seeds from a South-African botanical society. Viola grows the plants under different light regimes, harvests leaves from a two-day time series experiment, extracts polar metabolites as well as RNA and submits the samples to nearby core facilities for metabolomics and transcriptomics measurements, respectively. After a few weeks of iterative consultation with the facilities' heads as well as technicians and computational biologists involved, Viola receives back a wealth of raw and processed data. From the data she produces figures and wraps everything up to publish the results in the Journal of Wonderful Plant Sciences.

Metadata everywhere

Project metadata

project design

  • researcher
  • institute and project
  • biological context
  • research question
  • purpose of data collection
  • ...

experimental processes

  • origin and nature of the biological material
  • lab protocols
  • instrument model
  • ...

data-analytical processes

  • algorithms
  • tools
  • software versions and dependencies employed
  • ...

Other types of metadata

bibliographic

  • Title
  • Publication date and title
  • Description
  • Author
  • Contacts
  • Keywords
  • ...

legal or administrative

  • data origin, ownership, provenance
  • licensing
  • ethical aspects
  • ...

technical

  • expected data volume
  • storage location
  • file formats
  • ...

Metadata from a FAIR perspective

Findable

  • metadata names the content of the data
  • basis for search engines
  • makes it categorizable for people and machines

Accessible

  • information about origin
  • location of storage
  • access rights

Interoperable

  • metadata identifies software and file formats
  • required conversions between file formats

Reusable

  • obtain and reuse research data according to clear rules described in licenses

Metadata "Standards"

Examples from Minimum Information for Biological and Biomedical Investigations (MIBBI):

💡 Check out https://fairsharing.org/ for more examples

Metadata standards ≈ Checklists

  • Determine (minimal) required information
  • Usually do not determine the format (i.e. shape or file type)
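The checklist idea can be sketched in a few lines of Python. This is only an illustration: the field names in `REQUIRED_FIELDS` and the example record are hypothetical and not taken from any actual MIBBI standard.

```python
# Hypothetical minimal checklist: these field names are illustrative only,
# not taken from any real metadata standard.
REQUIRED_FIELDS = ["organism", "growth_conditions", "instrument_model"]


def missing_fields(metadata: dict, required: list[str]) -> list[str]:
    """Return the required fields that are absent or empty in the metadata."""
    return [field for field in required if not metadata.get(field)]


# Invented example record. The check is independent of where the record came
# from (spreadsheet, JSON, XML): the checklist constrains content, not format.
record = {
    "organism": "W. mirabilis",
    "growth_conditions": "12 h light / 12 h dark",
}

print(missing_fields(record, REQUIRED_FIELDS))  # ['instrument_model']
```

The function only reports which required information is missing; it deliberately says nothing about how the record is stored.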

A small interactive detour

-> favorite movie

How does Google "know"?!

Schemas and machine-readability

Structured data and the internet

Schema.org

  • create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, ...
  • Structured data can be used to mark up all kinds of items from products to events to recipes
  • Communicate with search engines (-> SEO, search engine optimization)
  • Enhance findability from search engine results
  • Provide context to an ambiguous webpage
  • Metadata interoperability and standardization across all websites using schema.org

Structured data and the internet: Schema.org

https://schema.org/Person

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Seattle",
    "addressRegion": "WA",
    "postalCode": "98052",
    "streetAddress": "20341 Whitworth Institute 405 N. Whitworth"
  },
  "colleague": [
    "http://www.xyz.edu/students/alicejones.html",
    "http://www.xyz.edu/students/bobsmith.html"
  ],
  "email": "mailto:jane-doe@xyz.edu",
  "image": "janedoe.jpg",
  "jobTitle": "Professor",
  "name": "Jane Doe",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}
</script>

JSON-LD

JSON-LD = JavaScript Object Notation for Linked Data

<script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "SportsTeam",
    "name": "San Francisco 49ers",
    "member": {
      "@type": "OrganizationRole",
      "member": {
        "@type": "Person",
        "name": "Joe Montana"
      },
      "startDate": "1979",
      "endDate": "1992",
      "roleName": "Quarterback"
    }
  }
</script>
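Because JSON-LD is ordinary JSON, it can be produced from any data structure that serializes to JSON. The following Python sketch rebuilds the SportsTeam example using only the standard library; the helper name `to_script_tag` is made up for this illustration.

```python
import json

# Same content as the SportsTeam example, expressed as a plain Python dict.
team = {
    "@context": "https://schema.org",
    "@type": "SportsTeam",
    "name": "San Francisco 49ers",
    "member": {
        "@type": "OrganizationRole",
        "member": {"@type": "Person", "name": "Joe Montana"},
        "startDate": "1979",
        "endDate": "1992",
        "roleName": "Quarterback",
    },
}


def to_script_tag(data: dict) -> str:
    """Serialize a dict as JSON and wrap it in a JSON-LD script element."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


html_snippet = to_script_tag(team)
print(html_snippet)

# Reading it back: drop the two wrapper lines and parse the JSON payload.
payload = "\n".join(html_snippet.splitlines()[1:-1])
parsed = json.loads(payload)
print(parsed["member"]["member"]["name"])  # Joe Montana
```

The round trip shows why this format is convenient for machines: the markup sits next to the page content but can be parsed without touching the visible HTML.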

RDFa

RDFa = Resource Description Framework in Attributes

<div vocab="http://schema.org/" typeof="SportsTeam">
  <span property="name">San Francisco 49ers</span>
  <div property="member" typeof="OrganizationRole">
    <div property="member" typeof="http://schema.org/Person">
      <span property="name">Joe Montana</span>
    </div>
    <span property="startDate">1979</span>
    <span property="endDate">1992</span>
    <span property="roleName">Quarterback</span>
  </div>
</div>

Standards

Dublin Core

https://www.dublincore.org/schemas/

DataCite Schema

DataCite Schema: Simple Example

...
  <identifier identifierType="DOI">10.5072/D3P26Q35R-Test</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Fosmire, Michael</creatorName>
      <givenName>Michael</givenName>
      <familyName>Fosmire</familyName>
    </creator>
    <creator>
      <creatorName nameType="Personal">Wertz, Ruth</creatorName>
      <givenName>Ruth</givenName>
      <familyName>Wertz</familyName>
    </creator>
    <creator>
      <creatorName nameType="Personal">Purzer, Senay</creatorName>
      <givenName>Senay</givenName>
      <familyName>Purzer</familyName>
    </creator>
  </creators>
  <titles>
    <title xml:lang="en">Critical Engineering Literacy Test (CELT)</title>
  </titles>
  <publisher xml:lang="en">Purdue University Research Repository (PURR)</publisher>
  <publicationYear>2013</publicationYear>
  <subjects>
    <subject xml:lang="en">Assessment</subject>
    <subject xml:lang="en">Information Literacy</subject>
    <subject xml:lang="en">Engineering</subject>
    <subject xml:lang="en">Undergraduate Students</subject>
    <subject xml:lang="en">CELT</subject>
    <subject xml:lang="en">Purdue University</subject>
  </subjects>
  <language>en</language>
  <resourceType resourceTypeGeneral="Dataset">Dataset</resourceType>
...

https://schema.datacite.org/meta/kernel-4.3/example/datacite-example-dataset-v4.xml
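A record like this can be read with any XML parser. The sketch below uses Python's standard library on a shortened copy of the fragment; as a simplifying assumption, the `<resource>` root is written without the XML namespace that a real DataCite record declares, so the element lookups would need adjusting for real files.

```python
import xml.etree.ElementTree as ET

# Shortened fragment modelled on the example above. Assumption: no XML
# namespace on the root element, which keeps the lookups short.
FRAGMENT = """
<resource>
  <identifier identifierType="DOI">10.5072/D3P26Q35R-Test</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Fosmire, Michael</creatorName>
    </creator>
    <creator>
      <creatorName nameType="Personal">Wertz, Ruth</creatorName>
    </creator>
  </creators>
  <titles>
    <title>Critical Engineering Literacy Test (CELT)</title>
  </titles>
  <publicationYear>2013</publicationYear>
</resource>
"""

root = ET.fromstring(FRAGMENT.strip())

# Pull out the fields needed for a simple citation string.
doi = root.findtext("identifier")
creators = [c.findtext("creatorName") for c in root.iter("creator")]
title = root.findtext("titles/title")
year = int(root.findtext("publicationYear"))

authors = "; ".join(creators)
print(f"{authors} ({year}). {title}. doi:{doi}")
```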

Ontologies

Ontology

(Sometimes also referred to as a "semantic model")

An ontology combines features of

  • a dictionary,
  • a taxonomy, and
  • a thesaurus

Dictionary

Alphabetically lists terms and their definitions

Pizza: "a dish made typically of flattened bread dough spread with a savory mixture usually including tomatoes and cheese and often other toppings and baked"

Taxonomy

Hierarchy or classification

Thesaurus

Dictionary of synonyms and relations

Pizza ≈ Lahmacun ≈ Focaccia ≈ Flammkuchen

Ontology

  • Structures a set of concepts in a particular area and the relations between them in a graph-like manner
  • Can be used for disambiguation, for defining hierarchies, and as a standard for defining terms
  • Defines a common vocabulary of concepts and their relationships to model a particular domain while making it machine-understandable

The semantic triple

Modeling a pizza menu

Predicates have two directions

Looking at the menu from a different perspective

The object of one triple can be the subject of another
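A minimal Python sketch of these two ideas, with triples stored as plain tuples. The pizza, topping, and predicate names (`hasTopping`, `isToppingOf`, `isA`) are hypothetical and chosen only for illustration.

```python
# Hypothetical menu triples (subject, predicate, object); names are illustrative.
triples = [
    ("Margherita", "hasTopping", "Mozzarella"),
    ("Funghi", "hasTopping", "Mushroom"),
    ("Mushroom", "isA", "VegetarianTopping"),
]

# Each predicate can be read in the opposite direction under an inverse name.
INVERSE = {"hasTopping": "isToppingOf"}


def with_inverses(facts):
    """Return the facts plus one inverse triple for every known inverse predicate."""
    extra = [(o, INVERSE[p], s) for s, p, o in facts if p in INVERSE]
    return facts + extra


def follow(facts, start, predicates):
    """Walk from `start` along a chain of predicates; return all nodes reached."""
    current = {start}
    for predicate in predicates:
        current = {o for s, p, o in facts if s in current and p == predicate}
    return current


graph = with_inverses(triples)

# The same fact, seen from the topping's perspective.
print(("Mushroom", "isToppingOf", "Funghi") in graph)  # True

# "Mushroom" is the object of the first hop and the subject of the second.
print(follow(graph, "Funghi", ["hasTopping", "isA"]))  # {'VegetarianTopping'}
```

Chaining triples this way is what turns a list of isolated statements into a graph.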

(Towards) a knowledge graph

Searching the menu

An ontology can be queried:

  • "name all pizzas with topping mushrooms"
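Such a query can be sketched as pattern matching over triples. The Python example below uses an invented menu, and the helper `match` is hypothetical; real ontologies are typically queried with a dedicated query language such as SPARQL, which works on a similar pattern principle.

```python
# Hypothetical menu; pizza and topping names are illustrative only.
triples = [
    ("Margherita", "hasTopping", "Tomato"),
    ("Margherita", "hasTopping", "Mozzarella"),
    ("Funghi", "hasTopping", "Tomato"),
    ("Funghi", "hasTopping", "Mushroom"),
    ("Capricciosa", "hasTopping", "Mushroom"),
    ("Capricciosa", "hasTopping", "Ham"),
]


def match(facts, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in facts
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Name all pizzas with topping mushrooms": subject is the unknown.
pizzas_with_mushrooms = sorted(
    s for s, _, _ in match(triples, predicate="hasTopping", obj="Mushroom")
)
print(pizzas_with_mushrooms)  # ['Capricciosa', 'Funghi']
```

Leaving a different position open answers a different question, e.g. `match(triples, subject="Funghi")` lists everything known about one pizza.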

The Pizza Ontology

Example ontologies

EDAM ontology

PECO ontology

Explore more examples

ARC builds on ISA

https://isa-tools.org/format/specification.html

isa.<>.xlsx files within ARCs

Study and assay files are registered in the investigation file

The output of a study or assay file can function as input for a new isa.assay.xlsx

Output building blocks:

  • Sample Name
  • Raw Data File
  • Derived Data File
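The chaining of outputs to inputs can be illustrated with a small Python sketch. It is a simplification under stated assumptions: each table is reduced to generic `input` and `output` keys, and all sample and file names are invented. It does not read or write actual isa.study.xlsx or isa.assay.xlsx files.

```python
# Hypothetical, simplified tables: every row maps one input to one output.
study = [
    {"input": "plant_1", "output": "leaf_1"},  # Source Name -> Sample Name
    {"input": "plant_2", "output": "leaf_2"},
]
assay = [
    {"input": "leaf_1", "output": "leaf_1.raw"},  # Sample Name -> Raw Data File
    {"input": "leaf_2", "output": "leaf_2.raw"},
]


def trace(output_name, *tables):
    """Trace an output back through the given tables (last process first)."""
    lineage = [output_name]
    for table in tables:
        by_output = {row["output"]: row["input"] for row in table}
        if lineage[-1] not in by_output:
            break  # the chain ends where no table produced this item
        lineage.append(by_output[lineage[-1]])
    return lineage


# The assay consumes what the study produced, so a raw data file
# can be followed back to the plant it came from.
print(trace("leaf_1.raw", assay, study))  # ['leaf_1.raw', 'leaf_1', 'plant_1']
```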

Swate

Annotation by flattening the knowledge graph

  • Low-friction metadata annotation
  • Familiar spreadsheet, row/column-based environment

Annotation principle

  • Low-friction metadata annotation
  • Familiar spreadsheet, row/column-based environment

Adding new building blocks (columns)

  • Swate can be used for the annotation of isa.study.xlsx and isa.assay.xlsx files

Annotation Building Block types

  • Source Name (Input)
  • Protocol Columns
    • Protocol Type, Protocol Ref
  • Characteristic
  • Parameter
  • Factor
  • Component
  • Output Columns
    • Sample Name, Raw Data File, Derived Data File

Let's take a detour on Annotation Principles | slides

Ontology term search

Enable related-term directed search to fill cells directly with child terms

Fill your table with ontology terms

Hierarchical combination of ontologies

Swate templates

Checklists and Templates

Metadata standards or repository requirements can be represented as templates

Realization of lab-specific metadata templates

Facilities can define their most common workflows as templates

Directly import templates via Swate

  • DataPLANT curated
  • Community templates

Contributors

Slides presented here include contributions by

Exercise: Association map

  • Online: Let participants annotate (via video conference tool)
  • Presence: Draw map on (white) board

Interactive detour

  • let a participant name a movie
  • how do you find out the actors, director, release year, etc.?
  • => google.com
  • google the movie
  • see the knowledge graph to the right
  • how does Google know all that?!
  • ===> schema.org

TODO:

  • This is actually not a proper ontology(!), but rather a knowledge graph (= ontology + data)

LIVE-Demo

  • Search an "interesting" term from PECO in browser (EBI OLS)
  • Example: plant exposure > abiotic plant exposure > physical plant exposure > water environment exposure > drought environment exposure
  • Show the graph view (and expand it interactively)
  • Mention that terms (subjects, objects) and properties (predicates) have "URIs", "PIDs"
  • Show that terms can have alternative / external IDs and link to "outdated" ontologies


Combining ISA (Characteristic, Parameter, Factor) with a biological or technological ontology (e.g. temperature, strain, instrument model) gives the flexibility to use an ontology term, e.g. temperature, either as a regular process parameter or as the factor your study is based on (Parameter [temperature] or Factor [temperature]).
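The header convention can be sketched as a small Python helper. The function `column_header` and the set `BLOCK_TYPES` are hypothetical; the sketch covers only the visible header text and is not how Swate itself is implemented.

```python
# Building block types that take an ontology term in square brackets.
BLOCK_TYPES = {"Characteristic", "Parameter", "Factor", "Component"}


def column_header(block_type: str, term: str) -> str:
    """Combine an ISA building block type with an ontology term name."""
    if block_type not in BLOCK_TYPES:
        raise ValueError(f"unknown building block type: {block_type}")
    return f"{block_type} [{term}]"


# The same ontology term plays two different roles depending on the block type.
print(column_header("Parameter", "temperature"))  # Parameter [temperature]
print(column_header("Factor", "temperature"))     # Factor [temperature]
```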