unibits | Reveal the Unicode

Ruby library and CLI command that visualizes various Unicode and ASCII/single-byte encodings in the terminal:

  • Makes analyzing encodings easier
  • Helps you with debugging strings
  • Highlights invalid/special/blank bytes/characters/codepoints
  • Supports UTF-8, UTF-16LE/UTF-16BE, UTF-32LE/UTF-32BE, ISO-8859-X, Windows-125X, IBMX, CP85X, macX, TIS-620/Windows-874, KOI8-R/KOI8-U, 7-Bit ASCII/GB1988, and arbitrary BINARY data

Color Coding

Each byte of the given string is highlighted using the following mechanism (characters -> codepoints):

  • Red for invalid bytes
  • Light blue for blanks
  • Blue for control characters
  • Pink for non-control formatting characters
  • Green for marks (Unicode only)
  • Orange for unassigned codepoints
  • Lighter orange for unassigned codepoints which are also ignorable
  • Random color for all other codepoints

The same colors are used in the higher-level companion tool uniscribe.
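
The category order above can be approximated in plain Ruby with Unicode property regexes. The following is only a rough sketch of the idea, not how unibits itself classifies characters: `classify_codepoint` is a hypothetical helper, the category symbols are made up for illustration, and treating `[[:space:]]` as "blank" is an assumption.

```ruby
# Hypothetical helper, not part of unibits: maps a single character to a
# rough category, checking in the same order as the color list above.
def classify_codepoint(char)
  # Invalid bytes must be caught first; regex matching on them is unsafe.
  return :invalid unless char.valid_encoding?

  case char
  when /\A[[:space:]]\z/ then :blank      # assumption: "blank" ~ whitespace
  when /\A\p{Cc}\z/      then :control    # C0/C1 control characters
  when /\A\p{Cf}\z/      then :format     # non-control formatting characters
  when /\A\p{M}\z/       then :mark       # combining marks
  when /\A\p{Cn}\z/      then :unassigned # depends on Ruby's Unicode version
  else                        :other
  end
end

# each_char yields invalid bytes as separate strings, so they can be
# classified like any other character.
"a \u0301\x80".each_char do |char|
  puts "#{char.bytes.map { |b| format('%02X', b) }.join(' ')}: #{classify_codepoint(char)}"
end
```

The sketch does not distinguish ignorable from non-ignorable unassigned codepoints, which would need additional Unicode data.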

Setup

Make sure Ruby is installed and that installing gems works properly. Then run:

$ gem install unibits

Usage

Pass the string you want to debug to unibits:

From CLI

$ unibits " Idiosyncrtic "

From Ruby

require 'unibits/kernel_method'
unibits " Idiosyncrtic "

Advanced Options

unibits accepts the following options:

  • encoding (e): The encoding of the given string (uses the string's default encoding if none given)
  • convert (c): An encoding the string should be converted to before visualizing it
  • stats: Whether to show a short stats header (default: true). On the CLI, you can deactivate it with --no-stats
  • wide-ambiguous: Treat characters of ambiguous width as 2 columns wide instead of 1
  • width (w): Set a custom column width. If not set, unibits will retrieve it from the terminal or fall back to 80
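
The width fallback described in the last option can be illustrated with Ruby's standard `io/console` library. This is a minimal sketch of the general pattern under the assumption that the terminal size is read via `IO.console`; `column_width` is a hypothetical helper and is not taken from the unibits source.

```ruby
require "io/console"

# Hypothetical helper, not part of unibits: prefer an explicit width,
# then the terminal's column count, then a default of 80.
def column_width(custom = nil)
  return custom if custom

  # IO.console is nil when no terminal is attached (e.g. in a pipe or CI).
  _rows, cols = IO.console&.winsize
  cols && cols > 0 ? cols : 80
rescue StandardError
  # Some environments raise when querying the window size; use the default.
  80
end

puts column_width      # terminal width, or 80 without a terminal
puts column_width(120) # an explicit width always wins
```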

Examples of Valid Encodings

UTF-8

CLI: $ unibits -e utf-8 -c utf-8 " Idiosyncrtic "

Ruby: unibits " Idiosyncrtic ", encoding: 'utf-8', convert: 'utf-8'

Screenshot UTF-8

UTF-16LE

CLI: $ unibits -e utf-8 -c utf-16le " Idiosyncrtic "

Ruby: unibits " Idiosyncrtic ", encoding: 'utf-8', convert: 'utf-16le'

Screenshot UTF-16LE

UTF-32BE

CLI: $ unibits -e utf-8 -c utf-32be " Idiosyncrtic "

Ruby: unibits " Idiosyncrtic ", encoding: 'utf-8', convert: 'utf-32be'

Screenshot UTF-32BE

BINARY

CLI: $ unibits -e binary " Idiosyncrtic "

Ruby: unibits " Idiosyncrtic ", encoding: 'binary'

Screenshot BINARY

ASCII

CLI: $ unibits -e utf-8 -c ascii "ascii"

Ruby: unibits "ascii", encoding: 'utf-8', convert: 'ascii'

Screenshot ASCII
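
To see why the same string looks different in each of these visualizations, it can help to inspect the raw bytes with plain Ruby. The sketch below does not use unibits at all; it relies only on `String#encode` and `String#bytes`, and `hex_bytes` is a hypothetical helper introduced for illustration.

```ruby
# Hypothetical helper, not part of unibits: formats a string's bytes as hex.
def hex_bytes(string)
  string.bytes.map { |byte| format("%02X", byte) }.join(" ")
end

# U+00E4 ("ä"), written as an escape so the source file encoding is irrelevant.
sample = "\u00E4"

puts hex_bytes(sample.encode("UTF-8"))    # C3 A4
puts hex_bytes(sample.encode("UTF-16LE")) # E4 00
puts hex_bytes(sample.encode("UTF-32BE")) # 00 00 00 E4
```

A single codepoint occupies two bytes in UTF-8, two in UTF-16LE (low byte first), and four in UTF-32BE (high byte first), which is the difference the convert option makes visible.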

Examples of Invalid Encodings

UTF-8

Example in Ruby: unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"

Screenshot invalid UTF-8
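
Each byte sequence in the example above is rejected by Ruby's own UTF-8 validation, which can be checked without unibits using `String#valid_encoding?`. The sketch below is only a quick sanity check; the constant name and the labels are chosen for illustration.

```ruby
# The five malformed sequences from the example above, keyed by a short label.
# Name and labels are illustrative, not part of unibits.
INVALID_UTF8_SAMPLES = {
  "unexpected continuation byte" => "\x80",
  "not enough bytes"             => "\xF0\x9F\x8C",
  "overlong encoding"            => "\xE0\x81\x81",
  "surrogate"                    => "\xED\xA0\x80",
  "too large"                    => "\xF5\x8F\xBF\xBF",
}.freeze

INVALID_UTF8_SAMPLES.each do |label, bytes|
  puts "#{label}: valid_encoding? => #{bytes.valid_encoding?}"
end
```

Every line reports `false`. `valid_encoding?` only says that a string is broken, not where or why, which is the gap the visualization fills.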

ASCII

Example in Ruby: unibits " Idiosyncrtic ", encoding: 'ascii'

Screenshot invalid ASCII
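
Interpreting a string as ASCII marks every byte above 0x7F as invalid. A minimal sketch in plain Ruby, independent of unibits, that locates those bytes; `non_ascii_offsets` is a hypothetical helper introduced for illustration.

```ruby
# Hypothetical helper, not part of unibits: returns the byte offsets that
# fall outside the 7-bit ASCII range.
def non_ascii_offsets(string)
  string.bytes.each_index.select { |index| string.getbyte(index) > 0x7F }
end

sample = "a\u00E4b" # "aäb"; the "ä" occupies two bytes in UTF-8

# dup first so the original string keeps its UTF-8 encoding label.
as_ascii = sample.dup.force_encoding(Encoding::US_ASCII)

puts as_ascii.valid_encoding?           # false
puts non_ascii_offsets(sample).inspect  # [1, 2]
```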

Notes

More info

Related gems

Lots of thanks to @damienklinnert for the motivation and inspiration required to build this!

Copyright (C) 2017-2021 Jan Lelis https://janlelis.com. Released under the MIT license.
