FYI on Jena from PowerShell
Beware of this encoding caveat when working with Jena.
When working with the incomparable Jena command line utilities, e.g. to output an RDF/XML file as Turtle, my colleague got the following error message when processing the resulting TTL file further:
CRITICAL:
Parsing failed.
Exception: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
The 0xff byte is invalid in UTF-8. It is not, as I originally thought, part of the UTF-8 BOM (that's 0xEF,0xBB,0xBF). It is, however, the first byte of the UTF-16 LE byte order mark, 0xFF,0xFE.
It turned out to be caused by PowerShell writing redirected output as UTF-16 LE by default, since they had called it thus:
$ rdfxml --formatted=TTL abc123.rdf > abc123.ttl
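To see why the parser chokes on the very first byte, here is a minimal Python sketch. I'm assuming the downstream tool is Python-based, since the message looks like a Python codec error. Both function names are hypothetical, and `powershell_style_bytes` only approximates what the `>` operator in Windows PowerShell 5.1 does: write UTF-16 LE text preceded by the byte order mark FF FE. (PowerShell 7 defaults to UTF-8 without a BOM.)

```python
import codecs


def powershell_style_bytes(text: str) -> bytes:
    """Approximate Windows PowerShell 5.1 '>' output (an assumption for
    illustration): the UTF-16 LE byte order mark FF FE, then UTF-16 LE text."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def try_decode_utf8(data: bytes) -> str:
    """Decode as UTF-8, like the downstream parser; return the error text on failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"decode failed: {err}"


if __name__ == "__main__":
    # A one-line Turtle snippet standing in for abc123.ttl (hypothetical content).
    data = powershell_style_bytes("@prefix ex: <http://example.org/> .\n")
    print(data[:4].hex(" "))  # ff fe 40 00
    print(try_decode_utf8(data))
```

Python 3 reports the codec as 'utf-8' rather than 'utf8', but otherwise this reproduces the "byte 0xff in position 0: invalid start byte" complaint.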
The solution was to run the same command in cmd.exe.
But you can also use another command line interface on Windows that leaves the UTF-8 output intact, perhaps Git-Bash.exe or a WSL shell.
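If you already have a UTF-16 file and can't easily rerun the command, you could also convert it after the fact. This is a minimal Python sketch, assuming the file starts with a UTF-16 BOM as the PowerShell output did. `utf16_to_utf8` is a hypothetical helper, not part of Jena. It writes raw bytes so that the existing line endings are preserved unchanged.

```python
import codecs
from pathlib import Path


def utf16_to_utf8(path: Path) -> bool:
    """Hypothetical helper: if the file starts with a UTF-16 BOM (LE or BE),
    rewrite it in place as UTF-8 without a BOM.
    Returns True if converted, False if the file was left untouched."""
    data = path.read_bytes()
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and strips it.
        text = data.decode("utf-16")
        # write_bytes (not write_text) so CRLF line endings are not translated again.
        path.write_bytes(text.encode("utf-8"))
        return True
    return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        converted = utf16_to_utf8(Path(name))
        print(f"{name}: {'converted' if converted else 'unchanged'}")
```

Run it as, e.g., `python fix_ttl.py abc123.ttl` (the script name is arbitrary).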
There is a lot more you can do with the CLI extras from Apache Jena. Bob du Charme recently wrote two blog posts about them as well (but for the JavaScript SPARQL extensions, beware of your Java version).