The ‘Codemeta’ Project defines a ‘JSON-LD’ format for describing software metadata, as detailed at https://codemeta.github.io. This package provides utilities to generate, parse, and modify codemeta.jsonld files automatically for R packages, as well as tools and examples for working with codemeta json-ld more generally.
It has three main goals:
For more general information about the CodeMeta Project for defining software metadata, see https://codemeta.github.io. In particular, new users might want to start with the User Guide, while those looking to learn more about JSON-LD and consuming existing codemeta files should see the Developer Guide.
Linked data: We often use different words to mean the same thing, and sometimes the same word to mean different things. Linked data seeks to address this issue by using URIs (typically URLs) to make the intended meaning explicit.
context: No one likes typing out long URLs all the time. Instead, the context of a JSON-LD file (the "@context" element) gives us the context for the terms we use, that is, the root URL. This is usually schema.org, though domain-specific contexts (e.g. codemeta) are also used.
Schema.org: A major initiative led by Google and other search engines to define a simple and widely used context to link data on the web through a catalogue of standard metadata fields.
The CodeMeta Project: An academic-led community initiative to formalise the metadata fields included in typical software metadata records and to introduce important fields that did not have clear equivalents. The codemeta crosswalk provides an explicit map between the metadata fields used by a broad range of software repositories, registries and archives.
JSON-LD: While ‘linked data’ can be represented in many different formats, these have consistently proven a bit tricky to use, either for consumers or developers or both. JSON-LD provides a simple adaptation of the JSON format, which has proven much more popular with both audiences, that allows it to express (most) linked-data concepts. It is now the format of choice for expressing linked data by Google and many others. Any JSON-LD file is valid JSON, and any JSON file can be treated as JSON-LD.
codemetar: The CodeMeta Project has created tools in several languages to implement the CodeMeta Crosswalk (using JSON-LD) and help extract software metadata into codemeta.json records. codemetar is one such tool, focused on R and R packages.
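The role of the context described above can be illustrated with a toy sketch in base R. It is not a JSON-LD processor: the `context` vector and the hypothetical helper `expand_terms()` only mimic how a short term is resolved to a full URI by prefixing it with the root URL of its vocabulary. The two root URLs are assumed here to stand for the schema.org and CodeMeta vocabularies and should be treated as illustrative.

```r
# Toy context: maps each short term to the root URL of its vocabulary.
# (Illustrative values; a real context is defined by the vocabulary itself.)
context <- c(
  name         = "http://schema.org/",
  version      = "http://schema.org/",
  issueTracker = "https://codemeta.github.io/terms/"
)

# A small metadata record that uses short terms only.
record <- list(
  name         = "testthat",
  version      = "2.0.0",
  issueTracker = "https://github.com/r-lib/testthat/issues"
)

# Hypothetical helper: replace each short term with its full URI.
expand_terms <- function(record, context) {
  terms <- names(record)
  unknown <- setdiff(terms, names(context))
  if (length(unknown) > 0) {
    stop("terms not defined in context: ", paste(unknown, collapse = ", "))
  }
  names(record) <- paste0(context[terms], terms)
  record
}

expanded <- expand_terms(record, context)
print(names(expanded))
```

Real JSON-LD expansion handles much more (nested objects, typed values, multiple contexts), so a dedicated JSON-LD library is the better choice for actual processing.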
You can install codemetar from GitHub with:
# install.packages("devtools")
devtools::install_github("codemeta/codemetar")
library("codemetar")
This is a basic example which shows you how to generate a codemeta.json for an R package (e.g. for testthat):
write_codemeta("testthat")
codemetar can take the path to the package root instead. This may allow codemetar to glean some additional information that is not available from the DESCRIPTION file alone.
write_codemeta(".")
{
"@context": [
"http://purl.org/codemeta/2.0",
"http://schema.org"
],
"@type": "SoftwareSourceCode",
"identifier": "testthat",
"description": "Software testing is important, but, in part because it is \n frustrating and boring, many of us avoid it. 'testthat' is a testing framework \n for R that is easy learn and use, and integrates with your existing 'workflow'.",
"name": "testthat: Unit Testing for R",
"issueTracker": "https://github.com/r-lib/testthat/issues",
"datePublished": "2017-12-13 09:30:12 UTC",
"license": "https://spdx.org/licenses/MIT",
"version": "2.0.0",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
"version": "3.4.3",
"url": "https://r-project.org"
},
"runtimePlatform": "R version 3.4.3 (2017-11-30)",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
},
"author": [
{
"@type": "Person",
"givenName": "Hadley",
"familyName": "Wickham",
"email": "hadley@rstudio.com"
}
],
"contributor": [
{
"@type": "Organization",
"name": "R Core team"
}
],
"copyrightHolder": [
{
"@type": "Organization",
"name": "RStudio"
}
],
"maintainer": {
"@type": "Person",
"givenName": "Hadley",
"familyName": "Wickham",
"email": "hadley@rstudio.com"
},
"softwareSuggestions": [
{
"@type": "SoftwareApplication",
"identifier": "covr",
"name": "covr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "devtools",
"name": "devtools",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "knitr",
"name": "knitr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "rmarkdown",
"name": "rmarkdown",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "xml2",
"name": "xml2",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
}
],
"softwareRequirements": [
{
"@type": "SoftwareApplication",
"identifier": "cli",
"name": "cli",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "crayon",
"name": "crayon",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "digest",
"name": "digest",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "magrittr",
"name": "magrittr",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "methods",
"name": "methods"
},
{
"@type": "SoftwareApplication",
"identifier": "praise",
"name": "praise",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "R6",
"name": "R6",
"version": "2.2.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "rlang",
"name": "rlang",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "withr",
"name": "withr",
"version": "2.0.0",
"provider": {
"@id": "https://cran.r-project.org",
"@type": "Organization",
"name": "Central R Archive Network (CRAN)",
"url": "https://cran.r-project.org"
}
},
{
"@type": "SoftwareApplication",
"identifier": "R",
"name": "R",
"version": "3.1"
}
]
}
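Because the result is ordinary JSON, it can be read back with any JSON parser. The following is a minimal sketch, assuming the jsonlite package is installed; it parses an abbreviated inline copy of the record above (rather than a file on disk) and lists the identifiers of the declared requirements.

```r
library(jsonlite)

# Abbreviated copy of the record shown above, kept inline so the
# example does not depend on a file being present.
json <- '{
  "@type": "SoftwareSourceCode",
  "identifier": "testthat",
  "version": "2.0.0",
  "softwareRequirements": [
    {"@type": "SoftwareApplication", "identifier": "R6", "version": "2.2.0"},
    {"@type": "SoftwareApplication", "identifier": "rlang"}
  ]
}'

# simplifyVector = FALSE keeps the nested structure as plain R lists.
meta <- fromJSON(json, simplifyVector = FALSE)

# Collect the identifier of every required package.
deps <- vapply(
  meta$softwareRequirements,
  function(x) x$identifier,
  character(1)
)
print(deps)
```

To read a generated file instead, the JSON string can be replaced by the path to codemeta.json.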
The best way to ensure codemeta.json is as complete as possible is to begin by making full use of the fields that can be set in an R package DESCRIPTION file, such as BugReports and URL. Using the Authors@R notation allows a much richer specification of author roles, correct parsing of given vs family names, and email addresses.
In the current implementation, developers may specify an ORCID URL for an author in the optional comment field of Authors@R, e.g.
Authors@R: person("Carl", "Boettiger", role=c("aut", "cre", "cph"), email="cboettig@gmail.com", comment="http://orcid.org/0000-0002-1642-628X")
which will allow codemetar to associate an identifier with the person. This is clearly something of a hack since R’s person object lacks an explicit notion of id, and may be frowned upon.
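To see what an Authors@R entry carries, the sketch below builds the same person object with base R's utils::person() and maps it to a CodeMeta-style Person record. The helper `person_to_record()` is hypothetical and written only for illustration; it is not codemetar's internal implementation, and treating a comment that contains orcid.org as the "@id" is an assumption modelled on the description above.

```r
# The same author entry as in the Authors@R example above.
p <- utils::person(
  "Carl", "Boettiger",
  role    = c("aut", "cre", "cph"),
  email   = "cboettig@gmail.com",
  comment = "http://orcid.org/0000-0002-1642-628X"
)

# Hypothetical helper: turn a single person object into a named list
# resembling a CodeMeta "Person" record.
person_to_record <- function(p) {
  out <- list(
    "@type"    = "Person",
    givenName  = p$given,
    familyName = p$family,
    email      = p$email
  )
  # Assumption: a comment that looks like an ORCID URL is used as the id.
  comment <- p$comment
  if (!is.null(comment) && grepl("orcid\\.org", comment[1])) {
    out[["@id"]] <- unname(comment[1])
  }
  out
}

entry <- person_to_record(p)
str(entry)
```

The roles remain available as `p$role`, which is what allows authors, maintainers and copyright holders to be told apart.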
The DESCRIPTION file is the natural place to specify any metadata for an R package. The codemetar package can detect certain additional terms in the CodeMeta context. Almost any additional codemeta field can be added to and read from the DESCRIPTION into a codemeta.json file (see codemetar:::additional_codemeta_terms for a list).
CRAN requires that any such additional terms be prefixed to indicate the use of schema.org explicitly, e.g. keywords would be specified in a DESCRIPTION file as:
X-schema.org-keywords: metadata, codemeta, ropensci, citation, credit, linked-data
Where applicable, these will override values otherwise guessed from the source repository. Use comma-separated lists to give multiple values for a property, e.g. keywords.
See the DESCRIPTION file of the codemetar package for an example.
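As a rough illustration of how such prefixed fields can be read, the sketch below uses base R's read.dcf() on a small temporary DESCRIPTION file. The package name examplepkg and the splitting logic are assumptions made for this example and do not reproduce codemetar's own parsing.

```r
# Write a minimal DESCRIPTION file to a temporary location.
# "examplepkg" is a made-up package name used only for this sketch.
desc_lines <- c(
  "Package: examplepkg",
  "Version: 0.1.0",
  "X-schema.org-keywords: metadata, codemeta, ropensci, citation, credit, linked-data"
)
path <- tempfile("DESCRIPTION")
writeLines(desc_lines, path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(path)

# Find the fields carrying the schema.org prefix and strip the prefix
# to recover the plain term names.
prefixed <- grep("^X-schema\\.org-", colnames(desc), value = TRUE)
terms <- sub("^X-schema\\.org-", "", prefixed)

# Split the comma-separated value into individual keywords.
raw <- unname(desc[1, "X-schema.org-keywords"])
keywords <- trimws(strsplit(raw, ",")[[1]])

print(terms)
print(keywords)
```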
Check out all the codemetar vignettes for tutorials on other cool stuff you can do with codemeta and json-ld.