Framing with JSON-LD

library(codemetar)
library(jsonlite)
library(jsonvalidate)
library(jsonld)

We begin with an example of a relatively thorough codemeta.json file. Note that the original file validates against our schema, meaning that the data follow the expected tree structure.

ex <- system.file("examples/full-codemeta.json", package = "codemetar")
schema <- system.file("schema/codemeta_schema.json", package = "codemetar")


json_validate(ex, schema)

## [1] TRUE

A predictable tree structure is very useful in programmatic applications of the data. For example, we can find the email of the first author as:

codemeta <- read_json(ex)
codemeta$agents[[1]]$email

## [1] "slaughter@nceas.ucsb.edu"
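Because the tree shape is fixed, the same path-based access generalizes beyond the first author. The sketch below uses a small hypothetical document (the names and emails are invented for illustration) that mimics the agents array of codemeta.json, and collects every agent's email:

```r
library(jsonlite)

# Hypothetical document mimicking the codemeta.json layout (values are invented)
txt <- '{
  "title": "example",
  "agents": [
    {"name": "Alice Example", "email": "alice@example.org"},
    {"name": "Bob Example",   "email": "bob@example.org"}
  ]
}'

meta <- fromJSON(txt, simplifyVector = FALSE)

# Assuming the schema's layout, "agents" is an array of objects,
# so we can map over it and pull out one field per agent
emails <- vapply(meta$agents, function(agent) agent$email, character(1))
emails
```

This only works because the consumer can rely on "agents" being an array of objects with an "email" field, which is exactly the guarantee the schema provides.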

However, this puts the burden on the data provider to be familiar with and conform to our schema. It is important to bear in mind that some of these restrictions are, in a sense, arbitrary: the data could be formatted in some different manner without any change to its information content. A different format may be more convenient for the user to provide, and a different application may prefer a different tree structure from the one imposed on the data by our schema. With JSON-LD framing, we can avoid putting this burden on the data provider and instead allow the developer/consumer to request the data in whatever tree structure is most convenient.

To illustrate this, let’s first transform the data into an equivalent alternative representation with a flat structure:

doc <- paste(readLines(ex), collapse = "\n")
flat <- jsonld_flatten(doc)
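To see what flattening does on a document small enough to inspect, here is a minimal sketch using a hypothetical nested document. It assumes an inline "@context" with "@vocab" set to schema.org so that no remote context has to be fetched; the IRIs and values are invented:

```r
library(jsonld)
library(jsonlite)

# Hypothetical nested document; the inline context avoids fetching a remote one
nested <- '{
  "@context": {"@vocab": "http://schema.org/"},
  "@type": "SoftwareSourceCode",
  "name": "example",
  "author": {
    "@id": "http://example.org/alice",
    "@type": "Person",
    "email": "alice@example.org"
  }
}'

# Flattening without a context returns a plain array of node objects
flat_small <- jsonld_flatten(nested)
nodes <- fromJSON(flat_small, simplifyVector = FALSE)

# The nested author has been pulled up to the top level; the software node
# has been given a blank-node identifier (e.g. "_:b0") and now refers to
# the author only by "@id"
ids <- vapply(nodes, function(n) n[["@id"]], character(1))
ids
```

Every node ends up as a sibling in one flat list, linked by identifiers rather than by nesting, which is why the original tree-based paths no longer apply.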

Note that although this is a lossless conversion (no information is lost, and we can transform back to the original format if requested), the resulting flat structure does not conform to our schema:

json_validate(flat, schema)

## [1] FALSE

and consequently, we can no longer rely on the prior structure to extract information of interest:

codemeta <- fromJSON(flat, simplifyVector = FALSE) # same as writeLines(flat, "flat.json"); read_json("flat.json")
codemeta$agents[[1]]$email

## NULL
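The NULL arises because the flattened output is a top-level array of nodes whose property names are full IRIs, so there is no "agents" field at the root to look up. The sketch below uses a hypothetical, hand-written fragment in that shape (the IRIs are illustrative, not the actual codemeta terms) and shows that the information is still recoverable by following "@id" links by hand:

```r
library(jsonlite)

# Hypothetical, hand-written fragment in the flattened, expanded shape:
# a top-level array of nodes with full-IRI property names
flat_txt <- '[
  {"@id": "_:b0",
   "http://schema.org/author": [{"@id": "http://example.org/alice"}]},
  {"@id": "http://example.org/alice",
   "http://schema.org/email": [{"@value": "alice@example.org"}]}
]'
nodes <- fromJSON(flat_txt, simplifyVector = FALSE)

# The old path fails: the top level is an unnamed list, not an object
is.null(nodes$agents)

# Follow the author's "@id" reference to the node that holds the email
author_id <- nodes[[1]][["http://schema.org/author"]][[1]][["@id"]]
author <- Filter(function(n) identical(n[["@id"]], author_id), nodes)[[1]]
author[["http://schema.org/email"]][[1]][["@value"]]
```

Doing this lookup by hand for every field quickly becomes tedious, which is the motivation for letting a JSON-LD processor rebuild the tree.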

This looks like bad news. Fortunately, we can reconstruct the data in whatever format we need using the appropriate transformations:

Exploring the transformations

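Expanding the flat document doesn’t change its structure at all, because jsonld_flatten() called without a context already returns JSON-LD in expanded form. Expanding the original document, by contrast, gives a different tree, since expansion alone does not flatten nested nodes: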

expanded <- jsonld_expand(flat)
identical(flat, expanded)

## [1] TRUE

expanded2 <- jsonld_expand(doc)
identical(expanded, expanded2)

## [1] FALSE
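The difference between the two expanded documents can be seen on a small hypothetical document, again assuming an inline "@context" so that nothing is fetched over the network. Expansion replaces short property names with full IRIs and wraps values in arrays, but it leaves nested nodes where they are and does not assign them identifiers:

```r
library(jsonld)
library(jsonlite)

# Hypothetical nested document with an inline context (values are invented)
nested <- '{
  "@context": {"@vocab": "http://schema.org/"},
  "@type": "SoftwareSourceCode",
  "author": {"@type": "Person", "email": "alice@example.org"}
}'

expanded_small <- fromJSON(jsonld_expand(nested), simplifyVector = FALSE)

# Expansion drops the context and spells out full IRIs as property names
names(expanded_small[[1]])

# ...but the author stays nested inside its parent node, without an "@id"
author <- expanded_small[[1]][["http://schema.org/author"]][[1]]
author[["http://schema.org/email"]][[1]][["@value"]]
is.null(author[["@id"]])
```

Flattening, unlike expansion, would lift the author to the top level and give the parent a blank-node identifier, which is why the two expanded results above are not identical.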

Using a simple frame:

ex_frame <- '{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@type": "SoftwareSourceCode",
  "@explicit": "true",
  "title": {},
  "description": {},
  "agents": {
    "@explicit": "true",
    "name": {},
    "email": {}
  }
}'
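To see how a frame rebuilds a tree, here is a minimal sketch on a hypothetical flattened document with two top-level nodes linked by "@id". It assumes an inline "@context" (no network access) and a frame that matches only SoftwareSourceCode nodes, so that the referenced author node is embedded beneath the match. Depending on the version of the bundled JSON-LD processor, a single framed result may be wrapped in "@graph" or placed at the top level, so the code handles both:

```r
library(jsonld)
library(jsonlite)

# Hypothetical flattened input: two sibling nodes linked by "@id"
flat_doc <- '{
  "@context": {"@vocab": "http://schema.org/"},
  "@graph": [
    {"@id": "http://example.org/pkg", "@type": "SoftwareSourceCode",
     "name": "example", "author": {"@id": "http://example.org/alice"}},
    {"@id": "http://example.org/alice", "@type": "Person",
     "name": "Alice Example", "email": "alice@example.org"}
  ]
}'

# Ask for SoftwareSourceCode nodes at the root; referenced nodes get embedded
frame <- '{
  "@context": {"@vocab": "http://schema.org/"},
  "@type": "SoftwareSourceCode"
}'

out <- fromJSON(jsonld_frame(flat_doc, frame), simplifyVector = FALSE)

# A single result may sit inside "@graph" or at the top level
root <- if (!is.null(out[["@graph"]])) out[["@graph"]][[1]] else out
root$author$email
```

The author, which was a separate top-level node in the input, comes back nested under the software node, so simple paths such as root$author$email work again.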

Framing the expanded flat document and the expanded original document gives the same result:

framed <- jsonld_frame(expanded, ex_frame) 
framed2 <- jsonld_frame(expanded2, ex_frame) 

identical(framed, framed2)

## [1] TRUE

Likewise, we get the same result whether we frame the flat file or the original document directly:

identical(framed, jsonld_frame(flat, ex_frame))

## [1] TRUE

identical(framed, jsonld_frame(doc, ex_frame))

## [1] TRUE

Note that the resulting document contains only the information explicitly requested in the frame above (since we used "@explicit": "true"):

framed

## {
##   "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
##   "@graph": [
##     {
##       "@id": "_:b0",
##       "@type": "SoftwareSourceCode",
##       "title": "CodeMeta, a minimal convention for software metadata",
##       "agents": [
##         {
##           "@id": "http://orcid.org/0000-0002-2192-403X",
##           "@type": "person",
##           "email": "slaughter@nceas.ucsb.edu",
##           "name": "Peter Slaughter"
##         },
##         {
##           "@id": "http://orcid.org/0000-0002-3957-2474",
##           "@type": "organization",
##           "email": "info@ucop.edu",
##           "name": "University of California, Santa Barbara"
##         }
##       ],
##       "description": "Codemeta is a metadata content standard for software.  It includes schemas in JSON-LD and XML Schema for providing semantics and validation."
##     }
##   ]
## }

As a result of the framing, our desired codemeta document appears as the first (and only) node in the "@graph" array ([["@graph"]][[1]]) of the resulting document. We must specify which node we want because we could have begun with a flat file containing triples from a collection of several codemeta.json files, which would have produced several nodes in "@graph", all sharing the same context. Importantly, the codemeta document now has the tree structure we requested:

codemeta <- fromJSON(framed, simplifyVector = FALSE)[["@graph"]][[1]]
codemeta$agents[[1]]$email

## [1] "slaughter@nceas.ucsb.edu"
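Since this extraction step (pick the framed node, then reattach the context that the framed output keeps separately) recurs whenever we frame, it can be wrapped in a small helper. The function below, first_framed_node(), is hypothetical and not part of codemetar; it only uses jsonlite, and it assumes the framed output either wraps results in "@graph" or, as JSON-LD 1.1 processors may do for a single result, places the node at the top level:

```r
library(jsonlite)

# Hypothetical helper (not part of codemetar): return the first framed node
# as an R list, with the "@context" of the framed output reattached
first_framed_node <- function(framed_json) {
  obj <- fromJSON(framed_json, simplifyVector = FALSE)
  node <- if (!is.null(obj[["@graph"]])) obj[["@graph"]][[1]] else obj
  node[["@context"]] <- obj[["@context"]]
  node
}

# Usage on a hypothetical framed string (the context URL is invented)
framed_txt <- '{"@context": "http://example.org/context.jsonld",
  "@graph": [{"@id": "_:b0", "agents": [{"email": "alice@example.org"}]}]}'
node <- first_framed_node(framed_txt)
node$agents[[1]]$email
```

Reattaching the context keeps the result a valid JSON-LD document on its own, which is what we need before writing it back out as codemeta.json.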

A proposed frame for matching the codemeta schema:

# Load the frame from the package. Alternatively, we might use URLs to define both the schema and the frame
frame_schema <- paste(readLines(system.file("schema/frame_schema.json", package = "codemetar")), collapse = "\n")

# Apply the frame and parse the resulting JSON into an R list
framed <- jsonld_frame(flat, frame_schema)
obj <- fromJSON(framed, simplifyVector = FALSE)

# The framed output keeps the context separate from the node(s) in "@graph";
# we add the context back to our node to get codemeta.json
codemeta <- obj[["@graph"]][[1]]
codemeta$`@context` <- obj$`@context`

# Write out as JSON and validate
write_json(codemeta, "ex.json", pretty = TRUE, auto_unbox = TRUE)
json_validate("ex.json", schema, verbose = TRUE)

## [1] FALSE
## attr(,"errors")
##   field                   message
## 1  data has additional properties

Currently, validation fails because all nodes get an "@id" in the course of being framed (more precisely, in the course of being flattened, which happens implicitly anyway, since framing always operates on flattened data). The schema does not permit additional properties, so these extra fields trigger the validation error shown above.
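The failure mode can be reproduced with a hypothetical miniature schema that, like the codemeta schema described above, sets "additionalProperties" to false. The sketch uses jsonvalidate with inline JSON strings; the schema and documents are invented for illustration. Listing "@id" among the allowed properties is shown as one possible remedy, not as the fix adopted by codemetar:

```r
library(jsonvalidate)

# Hypothetical miniature schema that rejects any property it does not list
schema_txt <- '{
  "type": "object",
  "properties": {"name": {"type": "string"}},
  "additionalProperties": false
}'

json_validate('{"name": "example"}', schema_txt)                 # TRUE
json_validate('{"name": "example", "@id": "_:b0"}', schema_txt)  # FALSE: extra "@id"

# One possible remedy (an assumption, not codemetar's approach):
# explicitly allow "@id" as a string-valued property
schema_with_id <- '{
  "type": "object",
  "properties": {"name": {"type": "string"}, "@id": {"type": "string"}},
  "additionalProperties": false
}'
json_validate('{"name": "example", "@id": "_:b0"}', schema_with_id)  # TRUE
```

Alternatively, the "@id" fields could be stripped from the framed output before validation, at the cost of losing the node identifiers.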

Carl Boettiger

2017-04-06
