The utf8data.c file in this directory is generated from the Unicode
Character Database for version 12.1.0 of the Unicode standard.

The full set of files can be found here:

  http://www.unicode.org/Public/12.1.0/ucd/

Individual source links:

  https://www.unicode.org/Public/12.1.0/ucd/CaseFolding.txt
  https://www.unicode.org/Public/12.1.0/ucd/DerivedAge.txt
  https://www.unicode.org/Public/12.1.0/ucd/extracted/DerivedCombiningClass.txt
  https://www.unicode.org/Public/12.1.0/ucd/DerivedCoreProperties.txt
  https://www.unicode.org/Public/12.1.0/ucd/NormalizationCorrections.txt
  https://www.unicode.org/Public/12.1.0/ucd/NormalizationTest.txt
  https://www.unicode.org/Public/12.1.0/ucd/UnicodeData.txt

md5sums (verify by running "md5sum -c README.utf8data"):

  900e76da1d822a160fd6b8c0b1d70094  CaseFolding.txt
  131256380bff4fea8ad4a851616f2f10  DerivedAge.txt
  e731a4089b30002144e107e3d6f8d1fa  DerivedCombiningClass.txt
  a47c9fbd7ff92a9b261ba9831e68778a  DerivedCoreProperties.txt
  fcab6dad15e440879d92f315978f93d3  NormalizationCorrections.txt
  f9ff1c55a60decf436100f791b44aa98  NormalizationTest.txt
  755f6af699f8c8d2d958da411f78f6c6  UnicodeData.txt

sha1sums (verify by running "sha1sum -c README.utf8data"):

  dc9245f6803c4ac99555c361f5052e0b13eb779b  CaseFolding.txt
  3281104f237184cdb5d869e86eb8573678ada7da  DerivedAge.txt
  2f5f995ccb96e0fa84b15151b35d5e2681535175  DerivedCombiningClass.txt
  5b8698a3fcd5018e1987f296b02e2c17e696415e  DerivedCoreProperties.txt
  cd83935fbc012345d8792d2c704f69497e753835  NormalizationCorrections.txt
  ea419aae505b337b0d99a83fa83fe58ddff7c19f  NormalizationTest.txt
  dc973c0fc93d6f09d9ab9f70d1c9f89c447f0526  UnicodeData.txt
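
The verification commands quoted above can be combined into one step.
The following is only a sketch: it assumes GNU coreutils md5sum and
sha1sum, and that the *.txt files sit in the same directory as the
README. The helper name verify_ucd is made up for this example and is
not part of the kernel tree.

```shell
#!/bin/sh
# Sketch: check UCD *.txt files against both checksum lists in a README.
# verify_ucd is a hypothetical helper name used only for illustration.
# Usage: verify_ucd DIR [README]   (README is resolved relative to DIR)
verify_ucd() {
    dir=$1
    readme=${2:-README.utf8data}
    (
        cd "$dir" || exit 1
        # The README mixes prose, md5 lines and sha1 lines. Each tool
        # skips the lines it cannot parse and prints a warning, so
        # --strict is deliberately not used here.
        md5sum -c "$readme" && sha1sum -c "$readme"
    )
}
```

For example, "verify_ucd fs/unicode" returns 0 only if every listed
file matches both its md5sum and its sha1sum. Warnings about
improperly formatted lines are expected and can be ignored.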

To update to a newer version of the Unicode standard, the latest
released version of the UCD can be found here:

  http://www.unicode.org/Public/UCD/latest/

Put the *.txt files listed above into fs/unicode/, then build under
fs/unicode/ with REGENERATE_UTF8DATA=1:

	make REGENERATE_UTF8DATA=1 fs/unicode/

After sanity checking the newly generated utf8data.c file (the
version generated from the 12.1.0 UCD should be 4,109 lines long, and
have a total size of 324k) and/or comparing it with the older version
of utf8data.c_shipped, rename it to utf8data.c_shipped.
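
One possible way to do the size check described above is sketched
below. It prints the line count and byte size of each file given, so
the generated file can be compared with the shipped one. The function
name report_utf8data is an assumption made for this example.

```shell
#!/bin/sh
# Sketch: print line count and size in bytes for each file given.
# report_utf8data is a hypothetical helper name.
report_utf8data() {
    for f in "$@"; do
        # Arithmetic expansion strips any padding that wc may emit.
        lines=$(($(wc -l < "$f")))
        bytes=$(($(wc -c < "$f")))
        printf '%s: %d lines, %d bytes\n' "$f" "$lines" "$bytes"
    done
}
```

A typical call would be
"report_utf8data fs/unicode/utf8data.c fs/unicode/utf8data.c_shipped".
For the 12.1.0 UCD the generated file should come out at 4,109 lines
and roughly 324k.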

If you are a kernel developer updating to a newer version of the
Unicode Character Database, please update this README.utf8data file
with the version of the UCD that was used and the md5sums and sha1sums
of the *.txt files before checking in the new versions of the
utf8data.c and README.utf8data files.