
FYI on Jena from PowerShell

Wednesday, 9 June 2021. By: Redmer Kronemeijer.

Beware of this encoding caveat when running the Jena command-line tools from PowerShell.

When my colleague used the incomparable Jena command-line utilities to convert an RDF/XML file to Turtle, processing the resulting TTL file further failed with the following error message:

CRITICAL:
  Parsing failed.
  Exception: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

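The byte 0xff is invalid in UTF-8 (and, contrary to what I first thought, it is not part of the UTF-8 BOM, which is 0xEF 0xBB 0xBF). It turned out to be caused by PowerShell: Windows PowerShell writes redirected output (`>`) as UTF-16 LE, and a UTF-16 LE file starts with the byte order mark 0xFF 0xFE, hence the 0xff at position 0. My colleague had run the conversion like this: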

$ rdfxml --formatted=TTL abc123.rdf > abc123.ttl
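The failure is easy to reproduce without Jena. Below is a minimal Python sketch, assuming that Windows PowerShell's `>` writes a UTF-16 LE byte order mark followed by UTF-16 LE text. `powershell_redirect_bytes` is a hypothetical helper that only approximates this, and the Turtle snippet is made-up sample data.

```python
import codecs


def powershell_redirect_bytes(text: str) -> bytes:
    """Approximate the bytes Windows PowerShell's '>' writes: a UTF-16 LE BOM, then UTF-16 LE text.

    Hypothetical helper for illustration; real PowerShell output may also differ in line endings.
    """
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# Made-up stand-in for the converted abc123.ttl.
turtle = "@prefix ex: <http://example.org/> .\nex:a ex:b ex:c .\n"
data = powershell_redirect_bytes(turtle)

print(data[:4])  # b'\xff\xfe@\x00': the BOM 0xFF 0xFE, then '@' as 0x40 0x00

try:
    data.decode("utf-8")  # roughly what a UTF-8-only Turtle parser does first
except UnicodeDecodeError as err:
    # The offending byte is at position 0 and is 0xff, the first byte of the BOM.
    print(err.start, hex(data[err.start]), err.reason)  # 0 0xff invalid start byte
```

Decoding the same bytes with Python's `utf-16` codec, which uses the BOM to pick the byte order, gives back the original text: the Turtle itself was fine, only the file encoding was wrong.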

The solution was to run the same command in cmd.exe, whose `>` redirection writes the program's output bytes to the file unchanged. Another Windows shell that does the same, such as Git Bash or a WSL shell, should work as well.
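If a file has already been written from PowerShell and rerunning the conversion is inconvenient, it can also be repaired after the fact. A minimal Python sketch, assuming the damaged file is UTF-16 with a BOM (as in the case above), UTF-8 with a BOM, or plain UTF-8; `to_utf8_bytes` and `reencode_file` are hypothetical helper names, not part of Jena:

```python
import codecs
from pathlib import Path


def to_utf8_bytes(raw: bytes) -> bytes:
    """Return raw re-encoded as UTF-8 without a BOM.

    Assumes the input is UTF-16 with a BOM (what Windows PowerShell's '>' produces),
    UTF-8 with a BOM, or plain UTF-8.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The utf-16 codec reads the BOM to choose the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # utf-8-sig strips a UTF-8 BOM if present and otherwise decodes plain UTF-8.
        text = raw.decode("utf-8-sig")
    return text.encode("utf-8")


def reencode_file(src: Path, dst: Path) -> None:
    """Write a BOM-less UTF-8 copy of src to dst (hypothetical helper)."""
    dst.write_bytes(to_utf8_bytes(src.read_bytes()))
```

For example, `reencode_file(Path("abc123.ttl"), Path("abc123-utf8.ttl"))` writes a copy that a UTF-8 parser can read. Working on bytes rather than relying on a guessed default encoding keeps the BOM check explicit; when the source RDF/XML is still at hand, rerunning the command in cmd.exe remains the simpler fix.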

There is a lot more you can do with the Apache Jena command-line tools. Bob DuCharme recently wrote two blog posts about them as well (but for the JavaScript SPARQL extensions, mind your Java version: they need a JavaScript script engine, and Nashorn was removed from the JDK in Java 15).


Redmer Kronemeijer is a data architect at CROW, where he works with Linked Data.

© 2016-2023. All rights reserved.