Introducing uu: a tool for inspecting Unicode text
I wrote a small tool called uu, which can be used to examine streams of Unicode text.
The subcommand uu inspect reads from STDIN, parses the input as UTF-8, and prints a line for each code point it finds, with details about that code point.
$ echo 'V = ⁴⁄₃πr³ 🤔' | uu inspect
GLYPH  CODE POINT  UTF-8 BYTES  NAME                    BLOCK                                 CATEGORY
V      U+0056      56           LATIN CAPITAL LETTER V  Basic Latin                           Uppercase Letter
       U+0020      20           SPACE                   Basic Latin                           Space
=      U+003D      3d           EQUALS SIGN             Basic Latin                           Math Symbol
       U+0020      20           SPACE                   Basic Latin                           Space
⁴      U+2074      e2 81 b4     SUPERSCRIPT FOUR        Superscripts and Subscripts           Other Numeric
⁄      U+2044      e2 81 84     FRACTION SLASH          General Punctuation                   Math Symbol
₃      U+2083      e2 82 83     SUBSCRIPT THREE         Superscripts and Subscripts           Other Numeric
π      U+03C0      cf 80        GREEK SMALL LETTER PI   Greek and Coptic                      Lowercase Letter
r      U+0072      72           LATIN SMALL LETTER R    Basic Latin                           Lowercase Letter
³      U+00B3      c2 b3        SUPERSCRIPT THREE       Latin-1 Supplement                    Other Numeric
       U+0020      20           SPACE                   Basic Latin                           Space
🤔     U+1F914     f0 9f a4 94  THINKING FACE           Supplemental Symbols and Pictographs  Other Symbol
^J     U+000A      0a           <LINE FEED>             Basic Latin                           Control
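For readers curious how a per-code-point listing like this can be produced, here is a minimal Python sketch. It is not uu's actual implementation (uu is written in Rust); the function name inspect and the "<unnamed>" placeholder are made up for illustration. It relies only on Python's standard unicodedata module, which reports two-letter category abbreviations (such as "Lu") and has no block information, so the BLOCK column is omitted.

```python
import unicodedata


def inspect(text):
    """Yield (glyph, code point, UTF-8 bytes, name, category) for each code point.

    Hypothetical helper for illustration only; not part of uu.
    """
    for ch in text:
        code_point = f"U+{ord(ch):04X}"
        utf8_bytes = ch.encode("utf-8").hex(" ")
        # Control characters such as line feed have no name in the Unicode
        # database, so fall back to a placeholder instead of raising.
        name = unicodedata.name(ch, "<unnamed>")
        # Two-letter general category abbreviation, e.g. "Ll" or "No".
        category = unicodedata.category(ch)
        yield ch, code_point, utf8_bytes, name, category


if __name__ == "__main__":
    for row in inspect("πr³"):
        print(*row, sep="\t")
```

Iterating over a Python str visits one code point at a time, which is what makes the loop this short; a byte-oriented language would need an explicit UTF-8 decoding step first.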
The subcommand uu lookup takes a code point as a command line argument and prints a table of information about it.
$ uu lookup U+203D
Glyph: ‽
Code point: U+203D
Name: INTERROBANG
Block: General Punctuation
Category: Other Punctuation (Po)
Bidirectional Class: OtherNeutral (ON)
Added in version: 1.1.0
UTF-8: e2 80 bd
UTF-16BE: 20 3d
UTF-16LE: 3d 20
UTF-32BE: 00 00 20 3d
UTF-32LE: 3d 20 00 00
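The five encoding rows at the bottom of that table all derive from the single code point value. As a rough illustration (again in Python, and again not how uu itself is implemented), the sketch below reproduces them with the standard codecs. The helper names parse_code_point and encodings are hypothetical.

```python
def parse_code_point(text):
    """Convert a string such as 'U+203D' to an integer code point."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {text!r}")
    return int(text[2:], 16)


def encodings(code_point):
    """Return the byte sequences for one code point as space-separated hex."""
    ch = chr(code_point)
    # The explicit -be / -le codec names emit no byte order mark.
    codecs = {
        "UTF-8": "utf-8",
        "UTF-16BE": "utf-16-be",
        "UTF-16LE": "utf-16-le",
        "UTF-32BE": "utf-32-be",
        "UTF-32LE": "utf-32-le",
    }
    return {label: ch.encode(codec).hex(" ") for label, codec in codecs.items()}


if __name__ == "__main__":
    for label, hex_bytes in encodings(parse_code_point("U+203D")).items():
        print(f"{label}: {hex_bytes}")
```

For a code point outside the Basic Multilingual Plane, such as U+1F914, the UTF-16 rows become a four-byte surrogate pair (d8 3e dd 14 in big-endian order) rather than two bytes.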
I wrote an early version of this tool in 2018, while working on a project to pre-process human language text to make it suitable for input into a text-to-speech ML model. I was using Tim Whitlock’s Unicode character inspector app to examine the sample inputs I was working with, and wishing for a command line tool that offered similar features, so I hacked together a quick Python script to do the job.
I recently got around to rewriting the program and releasing it under the ISC license. It’s now a stand-alone executable with no dependencies. You can get the source code on GitHub (you’ll need a Rust toolchain installed to build it). Alternatively, if you’re on macOS you can install it with Homebrew:
brew install jake-low/tools/uu
I hope someone else out there will find it useful.