Generate RDF quads from a MS Access database
RDF – a.k.a. linked data, a.k.a. linked open data, a.k.a. the technology underbedding knowledge graphs, a.k.a. data meshes – is the best way to integrate data from different source.
Sometimes, such a source is Microsoft Access database.
Therefore I’ve made @rdmr-eu/rdfjs-source-msaccess
to enable MS Access databases as an RDF source in NodeJS code, as well as generate quads from the command line.
That way, Access databases can be used in the wider ecosystem of RDF-JS libraries, like Comunica, which allows us to make federative queries also over Access databases.
I’ve made this for my job where a information management tool initially was developed inside Access, but as the product evolved into a fully-fledged ontology, I needed to find a way to export the information contained therein.
The package builds upon mdb-reader
and provides a RDF-JS Source to be compatible .
The package can be used both as a library in NodeJS, or as a command line tool.
You can try out the latter with npx @rdmr-eu/rdfjs-source-msaccess -i DATABASE.accdb
.
Password-protected databases, alternative base IRIs, output to stdout and a .nq-file are all supported, as well as two different output modes.
Two different modelling modes
In the transformation pipeline we used at work, we initially used an easy quad-structure: the table name was the graph name, the row number the triple subject, the column name the triple predicate and the value was its object. As an example, the following table:
Id (Long) | DateAdded (Date/Time) | ContentBody (Text with markup) | Published (Boolean) |
---|---|---|---|
1 | 10-3-2023 | This is content | Ja |
2 | 9-3-2023 | Old | Nee |
… would generate the following RDF data:
<csv:table/ContentTable> {
<csv:table/ContentTable/row/1> <csv:Id> "1"^^xsd:long ;
<csv:DateAdded> "2023-03-10T00:00:00.000Z"^^xsd:dateTime ;
<csv:ContentBody> "<div>This is <strong>content</strong></div>" ;
<csv:Published> "true"^^xsd:boolean .
<csv:table/ContentTable/row/2> <csv:Id> "2"^^xsd:long ;
<csv:DateAdded> "2023-03-09T00:00:00.000Z"^^xsd:dateTime ;
<csv:ContentBody> "<div><em>Old</em></div>" ;
<csv:Published> "false"^^xsd:boolean .
}
Lately, we’ve been using SPARQL-Anything (if I could write Java and there’d be a good MS Access library for Java, I’d’ve written this library for SPARQL-Anything!) which is very useful to create quads from all kinds of non-RDF data sources.
The modelling paradigm called Facade-X is somewhat different and based on the spreadsheet format, I’ve added another output mode for @rdmr-eu/rdfjs-source-msaccess
to make it resemble Facade-X.
<https://example.org/data#ContentTable> {
[] rdf:type fx:root ;
rdf:_1 _:ContentTable1 ;
rdf:_2 _:ContentTable2 .
_:ContentTable1 xyz:Id "1"^^xsd:long ;
xyz:DateAdded "2023-03-10T00:00:00.000Z"^^xsd:dateTime ;
xyz:ContentBody "<div>This is <strong>content</strong></div>" ;
xyz:Published "true"^^xsd:boolean .
_:ContentTable2 xyz:Id "2"^^xsd:long ;
xyz:DateAdded "2023-03-09T00:00:00.000Z"^^xsd:dateTime ;
xyz:ContentBody "<div><em>Old</em></div>" ;
xyz:Published "false"^^xsd:boolean .
}
I hope @rdmr-eu/rdfjs-source-msaccess
will prove to be as useful for others as it has been for us.
It’s open source licensed under MPL-2 and open for feedback and pull requests at GitHub.