Display Unicode in Arduino
This Instructable shows how to display Unicode text on an Arduino.
Supplies
Any Arduino dev board with a display supported by Arduino_GFX.
Ref.:
https://github.com/moononournation/Arduino_GFX
Unicode & UTF-8
Unicode defines 144k+ characters covering 159 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.
Unicode can be implemented by different character encodings. The Unicode standard defines Unicode Transformation Formats (UTF): UTF-8, UTF-16, and UTF-32, and several other encodings.
For better backward compatibility, the Arduino IDE, most recent operating systems, and most web pages use UTF-8 encoding.
UTF-8 was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well.
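To make the ASCII compatibility concrete, here is a minimal sketch of a UTF-8 encoder in plain C++. The helper name encodeUtf8 is hypothetical and written only for illustration; it is not part of Arduino_GFX or the Arduino core.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Writes up to 4 bytes into out[] and returns the number of bytes written,
// or 0 if cp is not encodable (a UTF-16 surrogate or beyond U+10FFFF).
size_t encodeUtf8(uint32_t cp, uint8_t out[4]) {
  if (cp < 0x80) {
    // ASCII range: a single byte with the same value as ASCII.
    out[0] = static_cast<uint8_t>(cp);
    return 1;
  }
  if (cp < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    out[0] = static_cast<uint8_t>(0xC0 | (cp >> 6));
    out[1] = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    // Surrogates U+D800..U+DFFF are not valid code points to encode.
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out[0] = static_cast<uint8_t>(0xE0 | (cp >> 12));
    out[1] = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    out[2] = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    out[0] = static_cast<uint8_t>(0xF0 | (cp >> 18));
    out[1] = static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F));
    out[2] = static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F));
    out[3] = static_cast<uint8_t>(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;
}
```

With this sketch, 'A' (U+0041) encodes to the single byte 0x41, the degree sign (U+00B0) to the two bytes 0xC2 0xB0, and ℃ (U+2103) to the three bytes 0xE2 0x84 0x83.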
Ref.:
https://en.wikipedia.org/wiki/Unicode
https://en.wikipedia.org/wiki/UTF-8
https://en.wikipedia.org/wiki/ASCII
Why Need UTF-8?
Some projects work well with ASCII characters alone.
Others require Unicode for multi-language support, e.g.:
- A Bluetooth device that shows your mobile notifications
- An RSS reader display that feeds the latest news
- A domestic weather report panel
- A social network comments dashboard
- An ebook reader
- And other text display projects
Extended ASCII
Arduino_GFX is derived from Adafruit_GFX and, by default, uses a classic fixed-space bitmap font dating from Adafruit_GFX 1.0. This font, called glcdfont, is 5 x 7 pixels and contains 128 ASCII characters and 128 Extended ASCII characters. You can view all the characters in the AsciiTable example.
That is the behavior before UTF-8 encoding is enabled. Arduino_GFX can toggle UTF-8 encoding with this function:
gfx->setUTF8Print(true);
Once UTF-8 encoding is enabled, the extended ASCII characters can no longer be used, but you can use the corresponding UTF-8 encoded characters instead.
For example, printing the degree Celsius sign in extended ASCII looks like this:
gfx->print("\xF8""C");
Since the Arduino IDE can use UTF-8 encoded strings directly, printing the same sign in UTF-8 is:
gfx->print("°C");
Or:
gfx->print("℃");
Which one to use depends on which glyph is included in the selected UTF-8 font file.
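Conceptually, printing in UTF-8 mode means turning multi-byte sequences back into code points before a glyph can be looked up. The sketch below illustrates that idea in plain C++. It is not Arduino_GFX's actual implementation, and nextCodePoint is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode the UTF-8 sequence starting at s[i] and
// advance i past the bytes consumed. The caller must stop at the
// terminating NUL. Returns U+FFFD (replacement character) for malformed
// input. Minimal sketch: overlong encodings are not rejected.
uint32_t nextCodePoint(const char *s, size_t &i) {
  uint8_t b0 = static_cast<uint8_t>(s[i++]);
  if (b0 < 0x80) return b0;  // plain ASCII byte

  int extra;    // number of continuation bytes expected
  uint32_t cp;  // code point accumulated so far
  if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
  else return 0xFFFD;  // stray continuation byte or invalid lead byte

  while (extra-- > 0) {
    uint8_t b = static_cast<uint8_t>(s[i]);
    // A continuation byte must look like 10xxxxxx. This check also
    // stops at the terminating NUL without consuming it.
    if ((b & 0xC0) != 0x80) return 0xFFFD;
    cp = (cp << 6) | (b & 0x3F);
    ++i;
  }
  return cp;
}
```

Fed the bytes of "°C" (0xC2 0xB0 0x43), this returns U+00B0 and then U+0043. Fed the extended ASCII string "\xF8""C", the first call returns U+FFFD because 0xF8 is not a valid UTF-8 lead byte, which illustrates why the extended ASCII byte is no longer usable once the text is interpreted as UTF-8.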
Font Data Size
As mentioned, Unicode contains over 144k characters, so it is not easy to pack them all into an Arduino program.
Unifont is a font that contains glyphs for most of the commonly defined Unicode characters. The latest version at the time of writing, unifont_jp-14.0.02, contains 57,389 glyphs, and the BDF format font file is 9.4 MB.
Common AVR family dev boards have only 32 KB of flash to store the program; ESP8266 has 4 MB of flash but still limits the program to around 1 MB; RTL8720DN can store a 2 MB program; ESP32 in Huge APP mode can store a 3 MB program; Raspberry Pi Pico can store a 2 MB program (some variants can store up to 16 MB).
Ref.:
https://en.wikipedia.org/wiki/GNU_Unifont
http://unifoundry.com/pub/unifont/unifont-14.0.02/font-builds/
U8g2 Font
Arduino_GFX adopted the U8g2 font format as its UTF-8 solution. U8g2 fonts support UTF-8 encoding, and U8g2 also provides tools to convert font files to Arduino source files.
bdfconv is one of the tools provided by U8g2. It can convert the Unifont BDF font file to an Arduino source file. The output binary is in a compressed format, and bdfconv can also select the encoding range to output; both features reduce the data size.
Ref.:
https://github.com/olikraus/u8g2/wiki/u8g2fontformat
https://github.com/olikraus/u8g2/tree/master/tools/font/bdfconv
Select Font Subset
Since we cannot squeeze the full set of Unifont glyphs into limited program space, we need to select the subset of glyphs that a specific project will use.
U8g2 already provides many Unifont subsets for various languages, e.g.:
- u8g2_font_unifont_t_polish
- u8g2_font_unifont_t_vietnamese1
- u8g2_font_unifont_t_chinese2
- u8g2_font_unifont_t_japanese1
- u8g2_font_unifont_t_korean1
Some languages still cannot fit all their glyphs on an Arduino, so U8g2 offers subsets of different sizes for different requirements, e.g. the Chinese font has 3 subsets:
- u8g2_font_unifont_t_chinese1 - sized 14,178 bytes
- u8g2_font_unifont_t_chinese2 - sized 20,225 bytes
- u8g2_font_unifont_t_chinese3 - sized 37,502 bytes
Refer to the U8g2 GitHub wiki for more details:
https://github.com/olikraus/u8g2/wiki/fntgrpunifont
Arduino_GFX Prepared Font Files
As mentioned in previous steps, some MCUs can store a program of 1-3 MB. We can tailor a font file to display as many glyphs as possible. Here are some extra font files prepared in Arduino_GFX:
- u8g2_font_unifont_h_utf8
- u8g2_font_unifont_t_chinese
- u8g2_font_unifont_t_chinese4
- u8g2_font_unifont_t_cjk
The source BDF bitmap font is unifont_jp-14.0.02, and the conversion tool is bdfconv, provided by U8g2.
Custom Font: u8g2_font_unifont_h_utf8
This font includes all glyphs in unifont_jp-14.0.02.
Number of glyphs: 57,389
Data size: 2,250,360 bytes
Converting script:
bdfconv -v -f 1 -b 1 -m "0-1114111" unifont_jp-14.0.02.bdf -o u8g2_font_unifont_h_utf8.h -n u8g2_font_unifont_h_utf8
Note:
Since the font data itself is over 2 MB, only the ESP32 family in Huge APP mode can store the program. Some versions of the Raspberry Pi Pico have more than 2 MB of flash, but I have not tested them yet.
Custom Font: u8g2_font_unifont_t_chinese
This font includes all glyphs in the Chinese character ranges.
Number of glyphs: 22,145
Data size: 979,557 bytes
Converting script:
bdfconv -v -f 1 -m "32-127,11904-12351,19968-40959,63744-64255,65280-65376" unifont_jp-14.0.02.bdf -o u8g2_font_unifont_t_chinese.h -n u8g2_font_unifont_t_chinese
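The range list passed with -m can be sanity-checked before running the conversion. The sketch below is a host-side helper in standard C++ (meant to run on a PC, not on the board) that counts how many code points a range list covers. The name countCodePoints is hypothetical, and the parser handles only plain decimal values and ranges, not the rest of bdfconv's map syntax. Assuming bdfconv emits one glyph per selected code point that exists in the BDF file, the result is an upper bound on the glyph count.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: count the code points covered by a comma-separated
// list of decimal values and ranges such as "32-127,65280-65376".
// Overlapping ranges are not merged, so they would be counted twice.
uint32_t countCodePoints(const std::string &map) {
  uint32_t total = 0;
  size_t pos = 0;
  while (pos < map.size()) {
    size_t comma = map.find(',', pos);
    if (comma == std::string::npos) comma = map.size();
    std::string item = map.substr(pos, comma - pos);
    pos = comma + 1;
    if (item.empty()) continue;  // skip empty entries like ",,"

    // "first-last" is a range; a bare number is a single code point.
    size_t dash = item.find('-');
    unsigned long first = std::strtoul(item.c_str(), nullptr, 10);
    unsigned long last = first;
    if (dash != std::string::npos) {
      last = std::strtoul(item.c_str() + dash + 1, nullptr, 10);
    }
    if (last >= first) total += static_cast<uint32_t>(last - first + 1);
  }
  return total;
}
```

For the range list in this step, "32-127,11904-12351,19968-40959,63744-64255,65280-65376", the helper returns 22,145 (96 + 448 + 20,992 + 512 + 97), which equals the glyph count quoted above.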
Custom Font: u8g2_font_unifont_t_chinese4
Since the ESP8266 has a program size limit of about 1 MB, the full set of Chinese characters still cannot fit. Another subset is required, narrowed down to commonly used characters only.
The list of commonly used characters comes from 常用國字標準字體表 in 字集 and 字表:中国常用字 in GlyphWiki.
Number of glyphs: 7,199
Data size: 298,564 bytes
Converting script:
bdfconv -v -f 1 -M common.txt unifont_jp-14.0.02.bdf -o u8g2_font_unifont_t_chinese4.h -n u8g2_font_unifont_t_chinese4
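This step does not show how common.txt was produced. As one possible approach, the host-side sketch below (standard C++, run on a PC) turns a set of characters into a compact comma-separated range list. It assumes the map file accepts the same decimal range syntax as the -m option, which should be checked against the bdfconv documentation. The name toRangeList is hypothetical, and the characters are passed as a UTF-32 string so that no UTF-8 decoding is needed here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: build a range list such as "65-67,69" from the
// characters in `chars`. Duplicates are removed and consecutive code
// points are merged into "first-last" ranges.
std::string toRangeList(const std::u32string &chars) {
  std::vector<char32_t> cps(chars.begin(), chars.end());
  std::sort(cps.begin(), cps.end());
  cps.erase(std::unique(cps.begin(), cps.end()), cps.end());

  std::string out;
  size_t i = 0;
  while (i < cps.size()) {
    // Extend j while the next code point directly follows the current one.
    size_t j = i;
    while (j + 1 < cps.size() && cps[j + 1] == cps[j] + 1) ++j;

    if (!out.empty()) out += ',';
    out += std::to_string(static_cast<unsigned long>(cps[i]));
    if (j > i) {
      out += '-';
      out += std::to_string(static_cast<unsigned long>(cps[j]));
    }
    i = j + 1;
  }
  return out;
}
```

For example, toRangeList(U"ABCE") yields "65-67,69", and toRangeList(U"\u4F60\u597D") (你好) yields "20320,22909". The resulting text could be saved to a file and passed to bdfconv, subject to the assumption about the map file format stated above.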
Custom Font: u8g2_font_unifont_t_cjk
This font covers the Chinese, Japanese and Korean character ranges listed in the converting script. These 3 languages share 92,865 CJK Unified Ideographs, so it is handy to use one font file to display all 3 languages.
Number of glyphs: 41,364
Data size: 1,704,862 bytes
Converting script:
bdfconv -v -f 1 -m "32-127,4352-4607,11904-12255,12288-19903,19968-40943,43360-43391,44032-55203,55216-55295,63744-64255,65072-65103,65280-65519" unifont_jp-14.0.02.bdf -o u8g2_font_unifont_t_cjk.h -n u8g2_font_unifont_t_cjk
Ref.:
https://en.wikipedia.org/wiki/CJK_Unified_Ideographs
https://stackoverflow.com/questions/56310609/what-the-chinese-japanese-and-korean-characters-are-in-unicode
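Whether a given font fits a given board comes down to simple arithmetic on the numbers quoted in the steps above. The sketch below makes that check explicit in plain C++. The font sizes and program space figures are taken from this Instructable; treating 1 MB as 1,048,576 bytes and reserving 256 KB for the sketch code itself are assumptions, and fontFits is a hypothetical helper name.

```cpp
#include <cstdint>

constexpr uint32_t KB = 1024;
constexpr uint32_t MB = 1024 * KB;  // assumption: 1 MB = 1,048,576 bytes

// Font data sizes in bytes, as listed in the custom font steps.
constexpr uint32_t kFontFullUnifont = 2250360;  // u8g2_font_unifont_h_utf8
constexpr uint32_t kFontCjk = 1704862;          // u8g2_font_unifont_t_cjk
constexpr uint32_t kFontChinese = 979557;       // u8g2_font_unifont_t_chinese
constexpr uint32_t kFontChinese4 = 298564;      // u8g2_font_unifont_t_chinese4

// Approximate program space, as quoted in the Font Data Size step.
constexpr uint32_t kSpaceEsp8266 = 1 * MB;
constexpr uint32_t kSpaceRtl8720dn = 2 * MB;
constexpr uint32_t kSpaceEsp32HugeApp = 3 * MB;

// Hypothetical allowance for the program code and libraries.
constexpr uint32_t kReserve = 256 * KB;

// True if the font data plus the reserve fits into the program space.
constexpr bool fontFits(uint32_t fontBytes, uint32_t programSpace,
                        uint32_t reserveBytes = kReserve) {
  return fontBytes <= programSpace &&
         reserveBytes <= programSpace - fontBytes;
}

// Compile-time checks that mirror the statements in the text.
static_assert(fontFits(kFontChinese4, kSpaceEsp8266),
              "chinese4 should fit on ESP8266");
static_assert(!fontFits(kFontChinese, kSpaceEsp8266),
              "full Chinese set should not fit on ESP8266");
static_assert(fontFits(kFontFullUnifont, kSpaceEsp32HugeApp),
              "full Unifont should fit on ESP32 Huge APP");
static_assert(!fontFits(kFontFullUnifont, kSpaceRtl8720dn),
              "full Unifont should not fit in 2 MB");
```

Under these assumptions the results agree with the text: the full Unifont font only fits the 3 MB Huge APP partition, and the ESP8266 needs the reduced chinese4 subset. The real limit depends on the partition scheme and on how large your own sketch is, so adjust kReserve to match your project.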
Software Preparation
Arduino IDE
Download and install the Arduino IDE if you have not done so yet:
https://www.arduino.cc/en/main/software
Arduino_GFX Library
Open the Arduino IDE Library Manager by selecting "Tools" menu -> "Manage Libraries...". Search for "GFX for various displays" and press the "Install" button.
You may refer to my previous Instructables for more information about Arduino_GFX.
Unicode Example
Arduino_GFX provides various Unicode examples in the U8g2Font subfolder. In the Arduino IDE, select "File" menu -> "Examples" -> "GFX Library for Arduino" -> "U8g2Font". The Unicode examples are:
- U8g2FontPrintUTF8 - print Hello World in various languages with U8g2 built-in fonts
- U8g2FontUTF8Chinese - print a sample Chinese article with the font file u8g2_font_unifont_t_chinese
- U8g2FontUTF8FullCJK - print a simple greeting message in Chinese, Japanese and Korean with the font file u8g2_font_unifont_t_cjk
- U8g2FontUTF8FullUnifont - print Hello World in 74 languages with the font file u8g2_font_unifont_h_utf8
- U8g2RssReader - print an online RSS feed message with font file u8g2_font_unifont_t_chinese4
Happy Texting!
Now your Arduino projects have just broken the ASCII text limit! Enjoy!