Beware of this encoding caveat when working with Jena.
A colleague used the incomparable Jena command line utilities to convert an RDF/XML file to Turtle, and got the following error message when processing the resulting TTL file further:
CRITICAL: Parsing failed. Exception: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
The byte 0xff is never valid in UTF-8 (and it is not, as I originally thought, part of the UTF-8 BOM, which is 0xEF,0xBB,0xBF).
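Both statements are easy to check in Python, which is presumably where the error message came from. This is only a small sketch; the helper name utf8_error is made up for illustration.

```python
import codecs
from typing import Optional


def utf8_error(data: bytes) -> Optional[str]:
    """Return the decoder's error message, or None if data is valid UTF-8.

    Hypothetical helper, not part of Jena or any library.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return str(err)
    return None


# The UTF-8 byte order mark is the three bytes EF BB BF.
print(codecs.BOM_UTF8.hex())

# A lone 0xFF is rejected right away as an invalid start byte.
print(utf8_error(b"\xff"))

# A file that starts with the real UTF-8 BOM decodes without an error.
print(utf8_error(codecs.BOM_UTF8 + b"<a> <b> <c> ."))
```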
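It turned out to be caused by PowerShell, whose output redirection wrote the file as UTF-16 LE. The UTF-16 LE byte order mark is 0xFF,0xFE, which explains the 0xff at position 0. They had called the tool like this: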
$ rdfxml --formatted=TTL abc123.rdf > abc123.ttl
The solution was to run the same command in
But you can also use another command line interface on Windows that uses UTF-8, for example Git-Bash.exe or a WSL shell.
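If a file has already been written in the wrong encoding, it can also be repaired after the fact. The following is a minimal sketch, not something from the Jena tooling: the function name reencode_to_utf8 and the file names are made up, and it assumes the file starts with a UTF-16 byte order mark, as a file written by a PowerShell redirect like the one above would.

```python
import codecs
from pathlib import Path


def reencode_to_utf8(src: Path, dst: Path) -> bool:
    """Write src to dst as UTF-8 if src starts with a UTF-16 BOM.

    Returns True if the content was converted, False if it was copied as-is.
    Hypothetical helper for illustration only.
    """
    raw = src.read_bytes()
    # Note: a UTF-32 LE BOM also begins with FF FE; that case is not handled here.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
        dst.write_bytes(text.encode("utf-8"))
        return True
    dst.write_bytes(raw)
    return False


if __name__ == "__main__":
    source = Path("abc123.ttl")  # assumed to be the file from the example above
    if source.exists():
        converted = reencode_to_utf8(source, Path("abc123.utf8.ttl"))
        print("converted" if converted else "copied unchanged")
```

Writing to a second file rather than overwriting keeps the original around in case the detection guessed wrong.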