library(codemetar)
library(jsonlite)
library(jsonvalidate)
library(jsonld)
We begin with an example of a relatively thorough codemeta.json file. Note that the original file conforms to our schema, meaning that the data has the expected tree structure.
ex <- system.file("examples/full-codemeta.json", package = "codemetar")
schema <- system.file("schema/codemeta_schema.json", package = "codemetar")
json_validate(ex, schema)
## [1] TRUE
A predictable tree structure is very useful in programmatic applications of the data, e.g. we can find the email of the first author as:
codemeta <- read_json(ex)
codemeta$agents[[1]]$email
## [1] "slaughter@nceas.ucsb.edu"
However, this puts the burden on the data provider to be familiar with and conform to our schema. It is important to bear in mind that some of these restrictions are, in a sense, arbitrary: the data could be formatted in a different manner without any change to its information content. A different format may be more convenient for the user to provide, and a different application may prefer a different tree structure from the one imposed on the data by our schema. With JSON-LD framing, we can avoid putting this burden on the data provider and instead allow the developer/consumer to request the data in whatever tree structure is most convenient.
To illustrate this, let’s first transform the data into an equivalent alternative representation with a flat structure:
doc <- paste(readLines(ex), collapse = "\n")
flat <- jsonld_flatten(doc)
Note that though this is a lossless conversion (no information is lost, and we can transform back to the original format if requested), the resulting flat structure does not conform to our schema:
json_validate(flat, schema)
## [1] FALSE
and consequently, we can no longer assume the prior structure to extract information of interest:
codemeta <- fromJSON(flat, simplifyVector = FALSE) # same as writeLines(flat, "flat.json"); read_json("flat.json")
codemeta$agents[[1]]$email
## NULL
This looks like bad news. Fortunately, we can reconstruct the data in whatever format we need using the appropriate transformations.
Expanding the flat document doesn’t change its structure at all, but expanding the original document gives a different tree:
expanded <- jsonld_expand(flat)
identical(flat, expanded)
## [1] TRUE
expanded2 <- jsonld_expand(doc)
identical(expanded, expanded2)
## [1] FALSE
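One way to check the claim that flattening is lossless is to compare canonical serializations of the two representations. The sketch below is not part of the original analysis: it uses a small toy document with an inline context (hypothetical data, so no remote context has to be fetched) and assumes that jsonld_normalize() with its default arguments returns canonical N-Quads, in which node labels are assigned deterministically.

```r
library(jsonld)

# Toy document with an inline context (hypothetical data, not the codemeta example)
toy <- '{
  "@context": {
    "name": "http://schema.org/name",
    "author": "http://schema.org/author"
  },
  "@id": "http://example.org/software",
  "name": "Example software",
  "author": {
    "@id": "http://example.org/alice",
    "name": "Alice"
  }
}'

# Same data, flat structure
toy_flat <- jsonld_flatten(toy)

# Canonical serialization of each representation
# (assumption: the defaults produce canonical N-Quads)
nq_tree <- jsonld_normalize(toy)
nq_flat <- jsonld_normalize(toy_flat)

# Compare the serialized triples, ignoring any class attributes
same_triples <- identical(as.character(nq_tree), as.character(nq_flat))
same_triples
```

If the comparison returns TRUE, both documents encode the same set of triples even though their tree structures differ.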
Using a simple frame:
ex_frame <- '{
"@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
"@type": "SoftwareSourceCode",
"@explicit": "true",
"title": {},
"description": {},
"agents": {
"@explicit": "true",
"name": {},
"email": {}
}
}'
Framing the expanded flat document and the expanded original document gives the same result:
framed <- jsonld_frame(expanded, ex_frame)
framed2 <- jsonld_frame(expanded2, ex_frame)
identical(framed, framed2)
## [1] TRUE
Likewise, we get the same result whether we start from the flat file or the original doc:
identical(framed, jsonld_frame(flat, ex_frame))
## [1] TRUE
identical(framed, jsonld_frame(doc, ex_frame))
## [1] TRUE
Note that the resulting document contains only the information explicitly requested in the above frame (since we used "@explicit": "true"):
framed
## {
## "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
## "@graph": [
## {
## "@id": "_:b0",
## "@type": "SoftwareSourceCode",
## "title": "CodeMeta, a minimal convention for software metadata",
## "agents": [
## {
## "@id": "http://orcid.org/0000-0002-2192-403X",
## "@type": "person",
## "email": "slaughter@nceas.ucsb.edu",
## "name": "Peter Slaughter"
## },
## {
## "@id": "http://orcid.org/0000-0002-3957-2474",
## "@type": "organization",
## "email": "info@ucop.edu",
## "name": "University of California, Santa Barbara"
## }
## ],
## "description": "Codemeta is a metadata content standard for software. It includes schemas in JSON-LD and XML Schema for providing semantics and validation."
## }
## ]
## }
As a result of the framing, our desired codemeta document appears as the first (and only) graph ([["@graph"]][[1]]) in the resulting document. (We could have begun with a flat file containing triples from a collection of multiple codemeta.json files, which would have corresponded to multiple graphs all sharing the same context, hence we must specify which graph here.) Importantly, the codemeta file now has the tree structure we requested:
codemeta <- fromJSON(framed, FALSE)[["@graph"]][[1]]
codemeta$agents[[1]]$email
## [1] "slaughter@nceas.ucsb.edu"
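Because the frame, rather than the stored file, determines the tree, a different consumer could request a different root from the same data. The following is a minimal sketch on toy data: the document, the context terms, and the helper first_node() are all hypothetical and not part of codemetar. It flattens a small software record and then frames it by Person rather than by SoftwareSourceCode, so that the email sits at the top level of the result.

```r
library(jsonld)
library(jsonlite)

# Inline context (hypothetical terms; avoids fetching a remote context)
ctx <- '{
  "name": "http://schema.org/name",
  "email": "http://schema.org/email",
  "author": "http://schema.org/author",
  "Person": "http://schema.org/Person",
  "SoftwareSourceCode": "http://schema.org/SoftwareSourceCode"
}'

# Toy record: a software entry with one embedded author
toy <- sprintf('{
  "@context": %s,
  "@id": "http://example.org/software",
  "@type": "SoftwareSourceCode",
  "name": "Example software",
  "author": {
    "@id": "http://example.org/alice",
    "@type": "Person",
    "name": "Alice",
    "email": "alice@example.org"
  }
}', ctx)
toy_flat <- jsonld_flatten(toy)

# Frame that asks for Person nodes at the root
person_frame <- sprintf('{"@context": %s, "@type": "Person"}', ctx)

# Hypothetical helper: parse framed output and return the first matching node.
# Some JSON-LD processors omit "@graph" when only one node matches.
first_node <- function(framed) {
  res <- fromJSON(framed, simplifyVector = FALSE)
  if (is.null(res[["@graph"]])) res else res[["@graph"]][[1]]
}

person <- first_node(jsonld_frame(toy_flat, person_frame))
person$email
```

The data provider supplied a software-centred tree, yet the consumer reads the email directly from a person-centred one without the provider changing anything.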
# Load the frame from the package. Alternatively, we might want to use only URLs for defining the schema and frame schema
frame_schema <- paste(readLines(system.file("schema/frame_schema.json", package = "codemetar")), collapse = "\n")
## Apply the frame and render as JSON
framed <- jsonld_frame(flat, frame_schema)
obj <- fromJSON(framed, FALSE)
## Output of frame always contains the context and 1 or more graphs separately; we add the context back to our graph to get codemeta.json:
codemeta <- obj[["@graph"]][[1]]
codemeta$`@context` <- obj$`@context`
## Write out as JSON and validate
write_json(codemeta, "ex.json", pretty = TRUE, auto_unbox = TRUE)
json_validate("ex.json", schema, verbose = TRUE)
## [1] FALSE
## attr(,"errors")
## field message
## 1 data has additional properties
Currently, validation fails because all nodes get ids in the course of being framed (actually in the course of being flattened, which would happen implicitly anyway since framing always works on flattened data). The schema file does not permit additional fields, so validation reports the error shown above.
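One possible workaround, sketched below, is to strip the generated identifiers before validating. This is an assumption on our part rather than something the schema or codemetar prescribes. The helper drop_blank_ids() is hypothetical, and it assumes that the offending additional properties are "@id" fields whose values are blank node labels (strings beginning with "_:", such as the "_:b0" in the framed output shown earlier), while genuine identifiers such as ORCID URLs should be kept.

```r
# Hypothetical helper: recursively remove "@id" entries whose value is a
# blank node label (a single string starting with "_:"); real IRIs are kept.
drop_blank_ids <- function(x) {
  if (!is.list(x)) return(x)
  id <- x[["@id"]]
  if (is.character(id) && length(id) == 1 && startsWith(id, "_:")) {
    x[["@id"]] <- NULL
  }
  lapply(x, drop_blank_ids)
}

# Toy input shaped like a framed node; the second agent is made up
# purely to show a nested blank node identifier being removed.
node <- list(
  `@id` = "_:b0",
  `@type` = "SoftwareSourceCode",
  agents = list(
    list(`@id` = "http://orcid.org/0000-0002-2192-403X", name = "Peter Slaughter"),
    list(`@id` = "_:b1", name = "Hypothetical agent")
  )
)

clean <- drop_blank_ids(node)
names(clean)
```

Applied to the codemeta list before write_json(), this would remove the top-level "_:b0" identifier. Whether validation then passes depends on the schema and was not verified here. Note also that a node consisting solely of a blank node reference would be reduced to an empty list by this helper.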