Locate an Unknown non-ASCII Multi-byte Character in a File


Some program or API is failing because of a rogue multi-byte character in a file. So how do you locate the unknown character without examining every character in the file?

Answer (annoying, but correct)

Every search points to grep's -P option, but that's a GNU grep feature, and the BSD grep that macOS ships with doesn't support it. Get over it and install goddamn GNU grep. 1

$ brew install grep

When brew is done, it will inform you that every command (binary) it installed has the prefix “g” for GNU.

All commands have been installed with the prefix "g".

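Thus, to use GNU grep on macOS, type ggrep. To find all occurrences of multi-byte characters in badfile.txt, run the following. The pattern [^\x00-\x7F] matches anything outside the ASCII range, and -n prints the line number of each match.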

$ ggrep --color='auto' -P -n '[^\x00-\x7F]' badfile.txt
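
If installing GNU grep is not an option, the same byte-range test can be sketched with perl. This is only a rough alternative, assuming a perl binary is already on the PATH; the function name find_non_ascii is made up for this example. Without any Unicode switches, perl reads the file as raw bytes, so the pattern flags every line containing a byte above 0x7F.

```shell
# Hypothetical helper (not part of any tool): print "line_number:line"
# for every line that contains at least one byte outside 0x00-0x7F.
find_non_ascii() {
  # -n loops over the input line by line; $. is the current line number.
  perl -ne 'print "$.:$_" if /[^\x00-\x7F]/' "$1"
}
```

Running find_non_ascii badfile.txt prints each offending line prefixed with its line number. Unlike the ggrep command it does not highlight the character itself, so you still have to eyeball the reported line.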

  1. There definitely has to be a better way (something with non-GNU stuff). ↩︎