A Linked data view on blog posts

Tuesday 29 September 2020

With Jekyll and the Liquid templating language, we can easily make a Linked data view on the posts of the blog.

This post gives an example of how Linked data also works with the Liquid templating language, one of the technologies underlying this website. Because of the larger audience that uses the publishing software Jekyll, it is written in English.

One item on my endless list of issues with my website is the lack of structured and Linked data in its architecture. Sure, as a static CMS it works fine, and I'm really proud of the trilingual design I hacked onto Jekyll, but as a Linked data expert in my day-to-day job, I should incorporate some of it on my private website.

So I decided to start with a simple metadata export that embraces Linked data standards. Of the several serializations of Linked data, I find Turtle the easiest for us humans to read and write. Other formats, like JSON-LD and RDF/XML, are perhaps more widely supported, but with important caveats: JSON is tedious to write, XML more so, and RDF/XML has some semantic issues (oh, irony!). Because the different serializations encode exactly the same information, a data consumer can always convert the resulting Turtle file to their favorite syntax.

As for the information modelling, I chose the Schema.org ontology. Founded by several search engine companies for SEO purposes, it is a great generic ontology for many knowledge projects. Its key concepts for this exercise are schema:Blog for the blog itself and schema:BlogPosting for the individual posts.

---
---
@prefix schema: <http://schema.org/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

@base <{{site.url}}>.

<https://rdmr.eu/> a schema:Blog ;
    schema:name "Rdmr.eu" ;
    .

I started with a new file called posts.ttl, which can be put anywhere in the Jekyll project. Note the two initial lines with triple dashes (---). This empty front matter tells Jekyll to process the file, so that its Liquid tags are rendered before the file is written to the output.
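Jekyll only renders Liquid in files that start with front matter, and an empty block of two `---` lines is enough. As a rough illustration of that check, here is a simplified Python approximation loosely modelled on the front-matter pattern Jekyll uses; the function name `split_front_matter` is hypothetical and this is not Jekyll's actual Ruby code.

```python
import re
from typing import Optional, Tuple

# Simplified approximation of Jekyll's front-matter pattern (assumption: not
# Jekyll's actual code). A block opens with "---" on the first line and
# closes with a later "---" (or "...") line.
FRONT_MATTER = re.compile(
    r"\A(---\s*\n.*?\n?)^(---|\.\.\.)\s*$\n?",
    re.MULTILINE | re.DOTALL,
)


def split_front_matter(text: str) -> Tuple[Optional[str], str]:
    """Return (front_matter, body); front_matter is None when there is none."""
    match = FRONT_MATTER.match(text)
    if match is None:
        # No front matter: Jekyll would copy such a file as-is, Liquid untouched.
        return None, text
    return match.group(1), text[match.end():]


template = "---\n---\n@prefix schema: <http://schema.org/>.\n"
front, body = split_front_matter(template)
print(repr(front))  # '---\n'
print(repr(body))   # '@prefix schema: <http://schema.org/>.\n'
```

Without those two lines, the `{{site.url}}` and `{% for %}` tags would end up verbatim in the published file.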

The @prefix directives declare short names for the ontologies or schemas the file refers to: here, the aforementioned Schema.org and the XML Schema Definition (XSD), which is explained later on. The @base directive provides a base against which relative URIs are resolved. Its actual URI is rendered by Liquid from the site settings (site.url). That keeps the template portable: it can be reused in other Jekyll sites as well.
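To show what the two directives do, the following minimal Python sketch expands terms the way a Turtle parser conceptually does: a prefixed name such as `schema:BlogPosting` becomes the namespace IRI plus the local part, and a bracketed relative IRI is resolved against @base. It assumes `site.url` renders to `https://rdmr.eu`, uses a hypothetical post URL, and ignores the rest of the Turtle grammar (for example the `a` keyword).

```python
from urllib.parse import urljoin

# Prefixes as declared in posts.ttl; BASE is an assumed value for {{site.url}}.
PREFIXES = {
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
BASE = "https://rdmr.eu"


def expand(term: str, prefixes=PREFIXES, base=BASE) -> str:
    """Expand a Turtle IRI term to a full IRI (simplified sketch)."""
    if term.startswith("<") and term.endswith(">"):
        # <...> may be relative: resolve it against @base (RFC 3986 style).
        return urljoin(base, term[1:-1])
    # Otherwise treat it as prefix:local and substitute the namespace.
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


print(expand("schema:BlogPosting"))            # http://schema.org/BlogPosting
print(expand("</2020/09/29/example.html>"))    # https://rdmr.eu/2020/09/29/example.html
```

An absolute IRI such as `<https://rdmr.eu/>` passes through unchanged, which is why the hand-written triples and the Liquid-generated `<{{p.url}}>` end up in the same namespace.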

The first triples are static information: in this example, just the blog itself, typed as Schema.org's Blog. I prefer to start hand-written Turtle files with some basic statements that declare the file's origin or purpose. So far we have described the blog, but where are the posts?

# Metadata on the blogposts
{% for p in site.posts -%}
<{{p.url}}> a schema:BlogPosting ;
    schema:author <https://rdmr.eu/#me> ;
    schema:name "{{p.title}}"@nl ;
    schema:datePublished "{{p.date | date_to_xmlschema}}"^^xsd:dateTime 
    .
{% endfor %}

Now for the actual data: we use Schema.org's BlogPosting class to type the individual posts. They are generated by the for p in site.posts loop, in whatever order site.posts provides them (Jekyll lists posts newest first). Usually I would sort such loops explicitly, but since Linked data has no concept of order (a graph is a set of triples), that extra complexity was not needed.
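To make the loop concrete, here is a rough Python equivalent of the Liquid template; the post dictionaries and the helpers `render_posts` and `turtle_string` are hypothetical. It adds one thing the Liquid version as written does not do: escaping backslashes, quotes and line breaks in titles, since a title containing a double quote would otherwise end the Turtle literal early.

```python
def turtle_string(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        text.replace("\\", "\\\\")  # backslash first, so later escapes stay intact
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def render_posts(posts: list) -> list:
    """Emit one schema:BlogPosting block per post, like the Liquid for-loop."""
    blocks = []
    for p in posts:
        blocks.append(
            f"<{p['url']}> a schema:BlogPosting ;\n"
            "    schema:author <https://rdmr.eu/#me> ;\n"
            f"    schema:name {turtle_string(p['title'])}@nl ;\n"
            f"    schema:datePublished \"{p['date']}\"^^xsd:dateTime\n"
            "    ."
        )
    return blocks


# Hypothetical posts; in Jekyll these come from site.posts.
posts = [
    {"url": "/2020/09/29/example.html", "title": 'A "Linked data" view',
     "date": "2020-09-29T00:00:00+02:00"},
    {"url": "/2020/01/01/other.html", "title": "Another post",
     "date": "2020-01-01T12:00:00+01:00"},
]

# Reordering the input changes the text order, not the set of statements.
assert set(render_posts(posts)) == set(render_posts(list(reversed(posts))))
print("\n".join(render_posts(posts)))
```

The final assertion is the point about ordering: reversing the posts shuffles the output text, but the statements, and therefore the graph, stay the same.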

Each post gets some author information (static, in this example) and its title, retrieved dynamically. Note that the post's date needs Liquid's date_to_xmlschema filter to render it as an XML Schema-compatible timestamp. Also note that, although not obligatory, I've typed the literal as xsd:dateTime. That helps data consumers process the statement correctly.
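For readers outside Ruby, a minimal Python sketch of the same formatting step might look like this. It assumes a UTC+2 site time zone for the example date; the helper names `to_xmlschema` and `typed_literal` are hypothetical. xsd:dateTime allows the offset to be left out, but then the moment is ambiguous, so as a design choice this sketch rejects datetimes without a time zone.

```python
from datetime import datetime, timedelta, timezone


def to_xmlschema(moment: datetime) -> str:
    """Format an aware datetime as an xsd:dateTime value (YYYY-MM-DDThh:mm:ss+hh:mm)."""
    if moment.tzinfo is None:
        raise ValueError("expected a datetime with a UTC offset")
    return moment.isoformat(timespec="seconds")


def typed_literal(value: str, datatype: str = "xsd:dateTime") -> str:
    """Wrap a lexical value as a typed Turtle literal."""
    return f'"{value}"^^{datatype}'


cest = timezone(timedelta(hours=2))  # assumed site time zone: UTC+2
published = datetime(2020, 9, 29, 0, 0, tzinfo=cest)
print(typed_literal(to_xmlschema(published)))
# "2020-09-29T00:00:00+02:00"^^xsd:dateTime
```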

# Linking the blogposts to the blog
{% for p in site.posts -%}
<https://rdmr.eu/> schema:blogPost <{{p.url}}> .
{%- endfor -%}

Finally, the blog postings are linked to the previously defined blog with schema:blogPost.
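Since the graph has no order or nesting, these schema:blogPost triples are what let a consumer navigate from the blog to its posts. The small Python sketch below treats the graph as a plain set of (subject, predicate, object) tuples, which is a simplification (a real consumer would use an RDF library). It follows those links and adds a hypothetical sanity check that every BlogPosting is actually linked. The post IRIs are made up.

```python
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what Turtle's "a" means
BLOG = "https://rdmr.eu/"


def link_triples(post_iris):
    """One schema:blogPost triple per post, mirroring the second Liquid loop."""
    return {(BLOG, SCHEMA + "blogPost", iri) for iri in post_iris}


def posts_of(graph, blog=BLOG):
    """Follow schema:blogPost from the blog to its posts (sorted for stable output)."""
    return sorted(o for s, p, o in graph if s == blog and p == SCHEMA + "blogPost")


def unlinked_posts(graph, blog=BLOG):
    """BlogPosting resources the blog does not point to (should be empty)."""
    typed = {s for s, p, o in graph if p == RDF_TYPE and o == SCHEMA + "BlogPosting"}
    return typed - set(posts_of(graph, blog))


# Hypothetical post IRIs, as they would look after @base resolution.
post_iris = [
    "https://rdmr.eu/2020/09/29/example.html",
    "https://rdmr.eu/2020/01/01/other.html",
]
graph = {(BLOG, RDF_TYPE, SCHEMA + "Blog")}
graph |= {(iri, RDF_TYPE, SCHEMA + "BlogPosting") for iri in post_iris}
print(unlinked_posts(graph))  # both posts: nothing links them yet
graph |= link_triples(post_iris)
print(posts_of(graph))
```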

I hope this post gives some pointers towards implementing data views on Jekyll sites. If you have any comments, drop me a line on Twitter @redmer.