global unicode
There are four string-like tables of functions: ascii, latin1, utf8 and grapheme.
fields
unicode.ascii
ascii is single-byte like string, but uses the Unicode table for upper/lower case and character classes.
ascii does not touch bytes > 127 on upper/lower.
ascii can be used as a locale-independent string replacement.
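The upper/lower rule above can be modeled in a few lines. This is a minimal Python sketch of the described behavior, not the library's code; ascii_upper and ascii_lower are hypothetical names and are not part of the unicode.ascii API. (Python's own bytes.upper() and bytes.lower() follow the same ASCII-only rule.)

```python
def ascii_upper(data: bytes) -> bytes:
    # Only a-z (0x61-0x7A) change; every other byte, including bytes > 127, is copied as-is.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


def ascii_lower(data: bytes) -> bytes:
    # Only A-Z (0x41-0x5A) change; bytes > 127 are left untouched.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in data)


print(ascii_upper(b"caf\xe9"))  # the trailing 0xE9 byte is not modified
```

Because the mapping never depends on the process locale, the result is the same on every system, which is what makes ascii usable as a locale-independent replacement.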
unicode.latin1
latin1 is single-byte like string, but uses the Unicode table for upper/lower case and character classes.
latin1 can be used as a locale-independent string replacement.
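Since Latin-1 bytes 0x00-0xFF correspond directly to code points U+0000-U+00FF, a single-byte table can look each byte up in the Unicode data. The Python sketch below models that idea; latin1_upper, latin1_lower and latin1_isalpha are hypothetical names. One assumption is labeled loudly: a few characters (ß, µ, ÿ) have uppercase forms that are not a single Latin-1 character, and this sketch simply leaves such bytes unchanged; the library's exact handling of them is not stated here.

```python
from typing import Callable


def _map_bytes(data: bytes, convert: Callable[[str], str]) -> bytes:
    out = bytearray()
    for b in data:
        mapped = convert(chr(b))  # treat the byte as code point U+00XX
        # ASSUMPTION: keep the byte when the result is not a single Latin-1 character
        if len(mapped) == 1 and ord(mapped) <= 0xFF:
            out.append(ord(mapped))
        else:
            out.append(b)
    return bytes(out)


def latin1_upper(data: bytes) -> bytes:
    return _map_bytes(data, str.upper)


def latin1_lower(data: bytes) -> bytes:
    return _map_bytes(data, str.lower)


def latin1_isalpha(byte: int) -> bool:
    # Character class taken from the Unicode data for U+00XX, not from the C locale
    return chr(byte).isalpha()
```

For example, latin1_upper(b"caf\xe9") yields b"CAF\xc9" (é becomes É), whereas the ascii model would leave the 0xE9 byte alone.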
unicode.grapheme
grapheme takes care of grapheme clusters, which are characters followed by
"grapheme extension" characters (general categories Mn and Me, i.e. nonspacing and enclosing marks) such as combining diacritical marks.
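The cluster definition above (a character plus any following Mn or Me characters) can be sketched with Python's standard unicodedata module. This is only an illustration of that simplified rule, not the full Unicode text-segmentation algorithm and not the library's code; grapheme_clusters is a hypothetical name.

```python
import unicodedata


def grapheme_clusters(text: str) -> list[str]:
    clusters: list[str] = []
    for ch in text:
        # Nonspacing (Mn) and enclosing (Me) marks attach to the preceding cluster
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)  # any other character starts a new cluster
    return clusters


print(grapheme_clusters("e\u0301a"))  # "e" + combining acute accent, then "a"
```

A mark at the very start of the text has no base to attach to, so this sketch lets it form its own cluster.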
unicode.utf8
utf8 operates on UTF-8 sequences as defined in RFC 3629:
1 byte 0-7F, 2 bytes 80-7FF, 3 bytes 800-FFFF, 4 bytes 10000-10FFFF
(UTF-16 surrogate code points D800-DFFF are not excluded, although RFC 3629 forbids them).
Any byte that is not part of such a sequence is treated as its (Latin-1) value.
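The decoding rules above can be modeled as a small lenient decoder. This Python sketch is an illustration under stated assumptions, not the library's implementation: decode_lenient_utf8 is a hypothetical name, and it ASSUMES that an overlong or truncated sequence, or one above 10FFFF, counts as "not part of such a sequence", so its lead byte falls back to its Latin-1 value. Unlike Python's strict bytes.decode("utf-8"), it accepts encoded surrogates.

```python
# (lead-byte mask, lead-byte tag, sequence length, smallest code point for that length)
FORMS = ((0xE0, 0xC0, 2, 0x80), (0xF0, 0xE0, 3, 0x800), (0xF8, 0xF0, 4, 0x10000))


def decode_lenient_utf8(data: bytes) -> list[int]:
    code_points: list[int] = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        cp, length = b, 1  # default: the byte stands for its Latin-1 value
        for mask, tag, size, minimum in FORMS:
            if (b & mask) != tag:
                continue
            if i + size <= n:
                value = b & ~mask  # payload bits of the lead byte
                for c in data[i + 1:i + size]:
                    if (c & 0xC0) != 0x80:  # not a continuation byte
                        break
                    value = (value << 6) | (c & 0x3F)
                else:
                    # ASSUMPTION: reject overlong forms and values above 10FFFF;
                    # surrogates D800-DFFF are deliberately accepted.
                    if minimum <= value <= 0x10FFFF:
                        cp, length = value, size
            break  # a lead byte matches at most one form
        code_points.append(cp)
        i += length
    return code_points


print([hex(cp) for cp in decode_lenient_utf8(b"A\xc3\xa9\xed\xa0\x80\xe9")])
```

In the example, C3 A9 decodes to E9, ED A0 80 decodes to the surrogate D800, and the final stray E9 byte (a 3-byte lead with no continuation bytes) is taken as its Latin-1 value E9.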