codemetar: generate codemeta metadata for R packages

The ‘Codemeta’ Project defines a ‘JSON-LD’ format for describing software metadata, as detailed at https://codemeta.github.io. This package provides utilities to generate, parse, and modify codemeta.jsonld files automatically for R packages, as well as tools and examples for working with codemeta json-ld more generally.

It has three main goals:

  • Quickly generate a valid codemeta.json file from any valid R package To do so, we automatically extract as much metadata as possible using the DESCRIPTION file, as well as extracting metadata from other common best-practices such as the presence of Travis and other badges in README, etc.
  • Facilitate the addition of further metadata fields into a codemeta.json file, as well as general manipulation of codemeta files.
  • Support the ability to crosswalk between terms used in other metadata standards, as identified by the Codemeta Project Community, see https://codemeta.github.io/crosswalk

For more general information about the CodeMeta Project for defining software metadata, see https://codemeta.github.io. In particular, new users might want to start with the User Guide, while those looking to learn more about JSON-LD and consuming existing codemeta files should see the Developer Guide.

A brief intro to common terms we’ll use:

  • Linked data: We often use different words to mean the same thing. And sometimes the same word to mean different things. Linked data seeks to address this issue by using URIs (i.e. URLs) to make this explcit.

  • context: No one likes typing out long URLs all the time. So instead, the context of a JSON-LD file ("@context" element) gives us the context for the terms we use, that is, the root URL. Usually schema.org but domain specific ones also (eg codemeta)

  • Schema.org: A major initiative led by Google and other search engines to define a simple and widely used context to link data on the web through a catalogue of standard metadata fields

  • The CodeMeta Project: an academic led community initiative to formalise the metadata fields included in typical software metadata records and introduce important fields that did not have clear equivalents. The codemeta crosswalk provides an explicit map between the metadata fields used by a broad range of software repositories, registries and archives

  • JSON-LD: While ‘linked data’ can be represented in many different formats, these have consistently proven a bit tricky to use, either for consumers or developers or both. JSON-LD provides a simple adaptation of the JSON format, which has proven much more popular with both audiences, that allows it to express (most) linked-data concepts. It is now the format of choice for expressing linked data by Google and many others. Any JSON-LD file is valid JSON, and any JSON file can be treated as JSON-LD.

  • codemetar: The CodeMeta Project has created tools in several languages to impelement the CodeMeta Crosswalk (using JSON-LD) and help extract software metadata into codemeta.json records. codemetar is one such tool, focused on R and R packages.

Installation

You can install codemetar from GitHub with:

# install.packages("devtools")
devtools::install_github("codemeta/codemetar")
library("codemetar")

Example

This is a basic example which shows you how to generate a codemeta.json for an R package (e.g. for testthat):

write_codemeta("testthat")

codemetar can take the path to the package root instead. This may allow codemetar to glean some additional information that is not available from the description file alone.

{
  "@context": [
    "http://purl.org/codemeta/2.0",
    "http://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "testthat",
  "description": "Software testing is important, but, in part because it is \n    frustrating and boring, many of us avoid it. 'testthat' is a testing framework \n    for R that is easy learn and use, and integrates with your existing 'workflow'.",
  "name": "testthat: Unit Testing for R",
  "issueTracker": "https://github.com/r-lib/testthat/issues",
  "datePublished": "2017-12-13 09:30:12 UTC",
  "license": "https://spdx.org/licenses/MIT",
  "version": "2.0.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "version": "3.4.3",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 3.4.3 (2017-11-30)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Central R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Hadley",
      "familyName": "Wickham",
      "email": "hadley@rstudio.com"
    }
  ],
  "contributor": [
    {
      "@type": "Organization",
      "name": "R Core team"
    }
  ],
  "copyrightHolder": [
    {
      "@type": "Organization",
      "name": "RStudio"
    }
  ],
  "maintainer": {
    "@type": "Person",
    "givenName": "Hadley",
    "familyName": "Wickham",
    "email": "hadley@rstudio.com"
  },
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "covr",
      "name": "covr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "devtools",
      "name": "devtools",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "xml2",
      "name": "xml2",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    }
  ],
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "cli",
      "name": "cli",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "crayon",
      "name": "crayon",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "digest",
      "name": "digest",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "methods",
      "name": "methods"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "praise",
      "name": "praise",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "R6",
      "name": "R6",
      "version": "2.2.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rlang",
      "name": "rlang",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "withr",
      "name": "withr",
      "version": "2.0.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Central R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      }
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": "3.1"
    }
  ]
}

Modifying or enriching CodeMeta metadata

The best way to ensure codemeta.json is as complete as possible is to begin by making full use of the fields that can be set in an R package DESCRIPTION file, such as BugReports and URL. Using the Authors@R notation allows a much richer specification of author roles, correct parsing of given vs family names, and email addresses.

In the current implementation, developers may specify an ORCID url for an author in the optional comment field of Authors@R, e.g.

Authors@R: person("Carl", "Boettiger", role=c("aut", "cre", "cph"), email="cboettig@gmail.com", comment="http://orcid.org/0000-0002-1642-628X")

which will allow codemetar to associate an identifier with the person. This is clearly something of a hack since R’s person object lacks an explicit notion of id, and may be frowned upon.

Using the DESCRIPTION file

The DESCRIPTION file is the natural place to specify any metadata for an R package. The codemetar package can detect certain additional terms in the CodeMeta context. Almost any additional codemeta field can be added to and read from the DESCRIPTION into a codemeta.json file (see codemetar:::additional_codemeta_terms for a list).

CRAN requires that you prefix any additional such terms to indicate the use of schema.org explicitly, e.g. keywords would be specified in a DESCRIPTION file as:

X-schema.org-keywords: metadata, codemeta, ropensci, citation, credit, linked-data

Where applicable, these will override values otherwise guessed from the source repository. Use comma-separated lists to separate multiple values to a property, e.g. keywords.

See the DESCRIPTION file of the codemetar package for an example.

Going further

Check out all the codemetar vignettes for tutorials on other cool stuff you can do with codemeta and json-ld.