Originally posted by ebizs
[B]
What about Japanese, Korean, and Arabic? I don't know any of those. [/B]
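You can use the escape function to get a character's Unicode code (for characters above U+00FF it comes back as %uXXXX, where the four X's are four hexadecimal digits; lower characters come back as %XX or unchanged). Then look the value up in the Unicode allocation table to see which script the character belongs to (just check the first two hex digits, i.e. the row). A simplified allocation table, taken from an early Unicode version, follows: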
_______ ___________________________________________________________________
Row(s) Content (script, other groups of characters, reserved area)
_______ ___________________________________________________________________
======= A-ZONE (alphabetical characters and symbols) =======================
00 (Control characters,) Basic Latin, Latin-1 Supplement (=ISO/IEC 8859-1)
01 Latin Extended-A, Latin Extended-B
02 Latin Extended-B, IPA Extensions, Spacing Modifier Letters
03 Combining Diacritical Marks, Basic Greek, Greek Symbols and Coptic
04 Cyrillic
05 Armenian, Hebrew
06 Basic Arabic, Arabic Extended
07--08 (Reserved for future standardization)
09 Devanagari, Bengali
0A Gurmukhi, Gujarati
0B Oriya, Tamil
0C Telugu, Kannada
0D Malayalam
0E Thai, Lao
0F (Reserved for future standardization)
10 Georgian
11 Hangul Jamo
12--1D (Reserved for future standardization)
1E Latin Extended Additional
1F Greek Extended
20 General Punctuation, Super/subscripts, Currency, Combining Symbols
21 Letterlike Symbols, Number Forms, Arrows
22 Mathematical Operators
23 Miscellaneous Technical Symbols
24 Control Pictures, OCR, Enclosed Alphanumerics
25 Box Drawing, Block Elements, Geometric Shapes
26 Miscellaneous Symbols
27 Dingbats
28--2F (Reserved for future standardization)
30 CJK Symbols and Punctuation, Hiragana, Katakana
31 Bopomofo, Hangul Compatibility Jamo, CJK Miscellaneous
32 Enclosed CJK Letters and Months
33 CJK Compatibility
34--4D Hangul (Unicode 1.x only; since Unicode 2.0, Hangul Syllables occupy rows AC--D7)
======= I-ZONE (ideographic characters) ===================================
4E--9F CJK Unified Ideographs
======= O-ZONE (open zone) ================================================
A0--DF (Reserved for future standardization)
======= R-ZONE (restricted use zone) ======================================
E0--F8 (Private Use Area)
F9--FA CJK Compatibility Ideographs
FB Alphabetic Presentation Forms, Arabic Presentation Forms-A
FC--FD Arabic Presentation Forms-A
FE Combining Half Marks, CJK Compatibility Forms, Small Forms, Arabic-B
FF Halfwidth and Fullwidth Forms, Specials
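Here is a rough sketch of the row lookup described above. It reads the code with charCodeAt instead of parsing the output of escape, since escape only produces the %uXXXX form for characters above U+00FF. The function names unicodeRow and guessScript are made up for this example, and only a handful of rows are covered. One deliberate difference from the table: Hangul is checked at rows AC to D7, where Unicode 2.0 and later place Hangul Syllables, not at rows 34 to 4D.

```javascript
// Row = high byte of the UTF-16 code unit, i.e. the first two hex
// digits of the %uXXXX form that escape() would produce.
function unicodeRow(ch) {
  return ch.charCodeAt(0) >> 8;
}

// Hypothetical helper: a coarse script guess for a single character.
// Only a few rows are handled; everything else falls through to "other".
function guessScript(ch) {
  var code = ch.charCodeAt(0);
  var row = unicodeRow(ch);
  if (row === 0x00) return "Latin";
  if (row === 0x04) return "Cyrillic";
  if (row === 0x06) return "Arabic";
  // Row 30 mixes CJK punctuation (3000-303F) with Hiragana and Katakana.
  if (code >= 0x3040 && code <= 0x30ff) return "Kana";
  if (row >= 0x4e && row <= 0x9f) return "CJK ideograph";
  // Assumption: modern Unicode layout (2.0+), not the rows in the table.
  if (row >= 0xac && row <= 0xd7) return "Hangul";
  return "other";
}
```

For example, guessScript("あ") gives "Kana", guessScript("한") gives "Hangul", and guessScript("ب") gives "Arabic". A CJK ideograph by itself does not tell you whether the text is Chinese or Japanese, so for Japanese you would normally look for kana somewhere in the string.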