global unicode
There are four string-like tables of functions:
ascii, latin1, utf8 and grapheme.
fields
unicode.ascii
ascii operates on single bytes, like string, but uses the Unicode table for upper/lower conversion and character classes.
ascii leaves bytes > 127 untouched in upper/lower conversion.
ascii can be used as a locale-independent replacement for string.
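The "does not touch bytes > 127" rule can be illustrated with a small conceptual sketch. It is written in Python purely for illustration and is not the library's implementation; the helper name `ascii_upper` is hypothetical.

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase ASCII letters only; every other byte passes through unchanged."""
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:      # 'a'..'z'
            out.append(b - 0x20)   # shift to 'A'..'Z'
        else:
            out.append(b)          # digits, punctuation, and all bytes > 127
    return bytes(out)


sample = "café".encode("latin-1")  # b'caf\xe9'; 0xE9 is above 127
print(ascii_upper(sample))         # b'CAF\xe9'
```

The byte 0xE9 is left alone whatever the current locale is, which is what makes this style of conversion locale-independent.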
unicode.latin1
latin1 operates on single bytes, like string, but uses the Unicode table for upper/lower conversion and character classes.
latin1 can be used as a locale-independent replacement for string.
unicode.grapheme
grapheme takes care of grapheme clusters, which are characters followed by
"grapheme extension" characters (Mn+Me) like combining diacritical marks.
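As a rough illustration of the clustering rule described above (a character followed by Mn or Me characters), here is a conceptual sketch in Python. It is not the library's implementation, and the helper name `grapheme_clusters` is hypothetical; it only applies the Mn+Me rule, not full Unicode text segmentation.

```python
import unicodedata


def grapheme_clusters(text: str) -> list:
    """Group each character with the Mn/Me characters that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch   # combining mark: attach to the previous cluster
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


word = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT (category Mn), then 'a'
print(len(word))                     # 3 code points
print(len(grapheme_clusters(word)))  # 2 clusters
```

Counting clusters rather than code points gives the length a reader would expect for text that uses combining diacritical marks.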
unicode.utf8
utf8 operates on UTF-8 sequences as defined in RFC 3629:
1 byte 0-7F, 2 bytes 80-7FF, 3 bytes 800-FFFF, 4 bytes 10000-10FFFF
(not excluding UTF-16 surrogate characters).
Any byte not part of such a sequence is treated as its (Latin-1) value.
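The sequence lengths in RFC 3629 follow directly from the code point value: up to 7F takes one byte, up to 7FF two, up to FFFF three, and 10000 to 10FFFF four. The sketch below, in Python and only for illustration, computes that length and checks it against Python's own encoder; the helper name `utf8_length` is hypothetical and not part of the library.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for a code point (RFC 3629 ranges)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4


# Cross-check a few non-surrogate code points against Python's encoder.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

Surrogate code points (D800-DFFF) are left out of the cross-check because Python's strict encoder rejects them, whereas the text above says utf8 does not exclude them.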