mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
synced 2025-01-07 22:42:04 +00:00
44594c2fbf
Supporting functions for UTF-8 normalization are in utf8norm.c with the
header utf8norm.h. Two normalization forms are supported: nfdi and
nfdicf.

  nfdi:
   - Apply unicode normalization form NFD.
   - Remove any Default_Ignorable_Code_Point.

  nfdicf:
   - Apply unicode normalization form NFD.
   - Remove any Default_Ignorable_Code_Point.
   - Apply a full casefold (C + F).

For the purposes of the code, a string is valid UTF-8 if:

 - The values encoded are 0x1..0x10FFFF.
 - The surrogate codepoints 0xD800..0xDFFF are not encoded.
 - The shortest possible encoding is used for all values.

The supporting functions work on null-terminated strings (utf8 prefix)
and on length-limited strings (utf8n prefix).

From the original SGI patch and for conformity with coding standards,
the utf8data_t typedef was dropped, since it was just masking the struct
keyword. On other occasions, namely utf8leaf_t and utf8trie_t, I
decided to keep it, since they are simple pointers to memory buffers,
and using uchars here wouldn't provide any more meaningful information.

From the original submission, we also converted from the compatibility
form to canonical.

Changes made by Gabriel:
  Rebase to Mainline
  Fix up checkpatch.pl warnings
  Drop typedefs
  move out of libxfs
  Convert from NFKD to NFD

Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Kconfig
Makefile
README.utf8data
utf8-norm.c
utf8data.h
utf8n.h

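The three validity rules listed in the commit message above can be illustrated with a small standalone check. This is only a sketch under stated assumptions: the names sketch_utf8n_valid() and sketch_utf8_valid() are hypothetical and are not functions from utf8-norm.c or utf8n.h. They only mirror the convention of a length-limited variant (utf8n prefix) and a null-terminated variant (utf8 prefix).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the kernel sources: returns true if the
 * length-limited buffer satisfies the three rules from the commit message
 * (values 0x1..0x10FFFF, no surrogates, shortest possible encoding).
 */
static bool sketch_utf8n_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned long cp;	/* decoded value */
		unsigned long min;	/* smallest value needing this length */
		size_t n;		/* number of continuation bytes */
		size_t k;

		if (c < 0x80) {
			cp = c; n = 0; min = 0x1;	/* 0x0 is not allowed */
		} else if ((c & 0xE0) == 0xC0) {
			cp = c & 0x1F; n = 1; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			cp = c & 0x0F; n = 2; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			cp = c & 0x07; n = 3; min = 0x10000;
		} else {
			return false;	/* stray continuation or bad lead byte */
		}

		if (n > len - i - 1)
			return false;	/* sequence runs past the buffer */

		for (k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;	/* not a continuation byte */
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF)
			return false;	/* overlong encoding or out of range */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return false;	/* surrogate codepoint */

		i += n + 1;
	}
	return true;
}

/* Null-terminated variant, built on the length-limited one. */
static bool sketch_utf8_valid(const char *s)
{
	return sketch_utf8n_valid((const unsigned char *)s, strlen(s));
}
```

Comparing each decoded value against the minimum for its sequence length enforces the shortest-encoding rule without a separate table. The same comparison rejects an embedded 0x0 byte in the length-limited case.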
The utf8data.h file in this directory is generated from the Unicode
Character Database for version 11.0.0 of the Unicode standard.

The full set of files can be found here:

  http://www.unicode.org/Public/11.0.0/ucd/

Individual source links:

  http://www.unicode.org/Public/11.0.0/ucd/CaseFolding.txt
  http://www.unicode.org/Public/11.0.0/ucd/DerivedAge.txt
  http://www.unicode.org/Public/11.0.0/ucd/extracted/DerivedCombiningClass.txt
  http://www.unicode.org/Public/11.0.0/ucd/DerivedCoreProperties.txt
  http://www.unicode.org/Public/11.0.0/ucd/NormalizationCorrections.txt
  http://www.unicode.org/Public/11.0.0/ucd/NormalizationTest.txt
  http://www.unicode.org/Public/11.0.0/ucd/UnicodeData.txt

md5sums (verify by running "md5sum -c README.utf8data"):

  414436796cf097df55f798e1585448ee  CaseFolding.txt
  6032a595fbb782694456491d86eecfac  DerivedAge.txt
  3240997d671297ac754ab0d27577acf7  DerivedCombiningClass.txt
  2a4fe257d9d8184518e036194d2248ec  DerivedCoreProperties.txt
  4e7d383fa0dd3cd9d49d64e5b7b7c9e0  NormalizationCorrections.txt
  c9500c5b8b88e584469f056023ecc3f2  NormalizationTest.txt
  acc291106c3758d2025f8d7bd5518bee  UnicodeData.txt

sha1sums (verify by running "sha1sum -c README.utf8data"):

  9184727adf7bd20e36312a68581d12ba3ffb9854  CaseFolding.txt
  86c55b3eb89de61704da16af9c3f22854f61b57d  DerivedAge.txt
  b615703f62b1dbc5110e91acc3ff8b3789a067cf  DerivedCombiningClass.txt
  f8b07ef116d7dc21a94f26e70178ed2acf8713e9  DerivedCoreProperties.txt
  a5fafb8998c0b8153a2a58430b8a35c811db0abc  NormalizationCorrections.txt
  070cdcb00cd4f0860e476750e404c59c2ebe9b25  NormalizationTest.txt
  0e060fafb08d6722fbec56d9f9ebe8509f01d0ee  UnicodeData.txt

To update to a newer version of the Unicode standard, the latest
released version of the UCD can be found here:

  http://www.unicode.org/Public/UCD/latest/

To build the utf8data.h file, from a kernel tree that has been built,
cd to this directory (fs/unicode) and run this command:

	make C=../.. objdir=../.. utf8data.h.new

After sanity checking the newly generated utf8data.h.new file (the
version generated from the 11.0.0 UCD should be 13,834 lines long, and
have a total size of 1104k) and/or comparing it with the older version
of utf8data.h, rename it to utf8data.h.

If you are a kernel developer updating to a newer version of the
Unicode Character Database, please update this README.utf8data file
with the version of the UCD that was used and the md5sums and sha1sums
of the *.txt files before checking in the new versions of the
utf8data.h and README.utf8data files.