global unicode

There are four string-like tables of functions: ascii, latin1, utf8 and grapheme.

😱 Types incomplete or incorrect? 🙏 Please contribute!


fields


unicode.ascii


unicode.ascii: table

ascii is single-byte, like string, but uses the Unicode table for upper/lower and character classes. On upper/lower, ascii does not touch bytes > 127.

ascii can be used as a locale-independent replacement for string.
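
The rule that upper/lower leaves bytes > 127 alone can be illustrated with a small pure-Lua stand-in. This is only a sketch of the behaviour described above, not the library's implementation; `ascii_upper` is a hypothetical helper name introduced for this example.

```lua
-- Hypothetical helper: mimics the documented ascii upper-casing rule.
-- It is not part of the unicode library.
local function ascii_upper(s)
  -- "[a-z]" is a byte range (97..122), so the C locale has no influence.
  -- The outer parentheses drop gsub's second return value (the match count).
  return (s:gsub("[a-z]", function(c)
    return string.char(c:byte() - 32)
  end))
end

-- "\223" is byte 0xDF, which is above 127 and therefore left untouched.
local input = "stra\223e"
print(ascii_upper(input) == "STRA\223E")
```

With the library loaded, `unicode.ascii.upper` is assumed to treat such a string the same way.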


unicode.latin1


unicode.latin1: table

latin1 is single-byte, like string, but uses the Unicode table for upper/lower and character classes.

latin1 can be used as a locale-independent replacement for string.
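
As a rough illustration of what Unicode-table case mapping means for single bytes, the sketch below upper-cases a Latin-1 string in pure Lua. It assumes the standard Unicode mapping in which U+00E0 to U+00FE (except the division sign U+00F7) map to the character 0x20 below them. `latin1_upper` is a hypothetical helper, not a library function, and how the library handles characters whose uppercase form lies outside Latin-1 (such as U+00FF) is not stated here, so the sketch simply leaves them unchanged.

```lua
-- Hypothetical helper: a pure-Lua approximation of Latin-1 upper-casing.
-- It is not part of the unicode library.
local function latin1_upper(s)
  return (s:gsub(".", function(c)
    local b = c:byte()
    -- ASCII lowercase letters a..z
    if b >= 97 and b <= 122 then
      return string.char(b - 32)
    end
    -- Latin-1 lowercase letters 0xE0..0xFE, skipping 0xF7 (division sign)
    if b >= 224 and b <= 254 and b ~= 247 then
      return string.char(b - 32)
    end
    -- Everything else (including 0xDF and 0xFF) is returned unchanged.
    return c
  end))
end

-- "\228" is 0xE4 (a with diaeresis); "\196" is 0xC4, its uppercase form.
print(latin1_upper("b\228r") == "B\196R")
```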


unicode.grapheme


unicode.grapheme: table

grapheme takes care of grapheme clusters, which are characters followed by “grapheme extension” characters (Mn+Me) like combining diacritical marks.
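
To make the idea of a cluster concrete, here is a minimal sketch that counts clusters in a list of code points. It is deliberately simplified: `is_extension` only recognises the Combining Diacritical Marks block (U+0300 to U+036F), which is a small subset of the Mn and Me categories, and both function names are hypothetical rather than part of the library.

```lua
-- Simplified assumption: only U+0300..U+036F count as grapheme extensions.
-- The real Mn+Me set is far larger.
local function is_extension(cp)
  return cp >= 0x0300 and cp <= 0x036F
end

-- Counts clusters in an array of code points (numbers).
-- A cluster starts at every non-extension character, and also at the very
-- first code point even if it is an extension with no base before it.
local function count_clusters(codepoints)
  local n = 0
  for i, cp in ipairs(codepoints) do
    if i == 1 or not is_extension(cp) then
      n = n + 1
    end
  end
  return n
end

-- "e" + combining acute accent + "x": three code points, two clusters.
print(count_clusters({0x65, 0x0301, 0x78}))
```

A length function that works on clusters would report 2 for this input, where a code-point count reports 3.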


unicode.utf8


unicode.utf8: table

utf8 operates on UTF-8 sequences as defined in RFC 3629: 1 byte for 0-7F, 2 bytes for 80-7FF, 3 bytes for 800-FFFF, 4 bytes for 10000-10FFFF (UTF-16 surrogate characters are not excluded). Any byte that is not part of such a sequence is treated as its Latin-1 value.
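
The byte ranges above can be written down as a small lookup. This is a sketch of the RFC 3629 length rule only, not the library's decoder; `utf8_sequence_length` is a hypothetical name used for illustration.

```lua
-- Hypothetical helper: number of bytes a code point occupies in UTF-8.
-- Returns nil for values outside 0..0x10FFFF.
local function utf8_sequence_length(cp)
  if cp < 0 or cp > 0x10FFFF then
    return nil
  end
  if cp <= 0x7F then
    return 1
  end
  if cp <= 0x7FF then
    return 2
  end
  -- Surrogates (0xD800..0xDFFF) are not excluded, so they fall here too.
  if cp <= 0xFFFF then
    return 3
  end
  return 4
end

print(utf8_sequence_length(0x41))     -- LATIN CAPITAL LETTER A
print(utf8_sequence_length(0xE9))     -- LATIN SMALL LETTER E WITH ACUTE
print(utf8_sequence_length(0x20AC))   -- EURO SIGN
print(utf8_sequence_length(0x1F631))  -- an emoji outside the BMP
```

The four calls print 1, 2, 3 and 4 respectively, one for each range.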
