Changes between Version 2 and Version 3 of CharacterSets
- Timestamp: 04/15/05 10:46:29 (8 years ago)
CharacterSets
 v2  v3
  4   4
  5   5
  6      - [wiki:CharacterSets#stopwaffle Skip to the advice.]
      6  + [http://xinha.python-hosting.com/wiki/CharacterSets#stopwaffle Skip to the advice.]
  7   7
  8   8
  …   …
 41  41    == UTF-8 ==
 42  42
 43      - UTF-8 is just a character encoding, it says "take a string of bytes, do this algorithm over them, and you'll get a list of numbers which represent characters in the UNICODE character set". The special thing about UNICODE is that it leaves the lower 128 characters (codes 0 to 127) of ASCII intact (remembering that these characters are unchanged in UNICODE), so for most English text, UTF-8 encoded UNICODE is just the same as ASCII (which as you might expect is quite useful).
     43  + UTF-8 is just a character encoding, it says "take a string of bytes, do this algorithm over them, and you'll get a list of numbers which represent characters in the UNICODE character set". The special thing about UTF-8 is that it leaves the lower 128 characters (codes 0 to 127) of ASCII intact (remembering that these characters are unchanged in UNICODE), so for most English text, UTF-8 encoded UNICODE is identical to plain old 7-bit ASCII (which as you might expect is quite useful).
 44  44
 45  45    Slowly but surely the world is progressing to ONE character set (UNICODE) and ONE encoding (UTF-8), gone will be ASCII, BIG-5, SHIFT-JIS, and all those other character sets and encodings, never to darken our doorstep again.
 46  46
 47      - '''The important thing is''' - UTF-8 is ONLY used to get characters INTO Javascript, once it's there, that's it, it's just a list of numbers, nothing more, nothing less, just a list of numbers which represent characters in the UNICODE character set. Not BIG-5, not ASCII, just UNICODE.
     47  + '''The important thing is''' - UTF-8, and any other character encoding, is ONLY used to get characters INTO Javascript, once it's there, that's it, it's just a list of numbers which are indexes into the big UNICODE character tables, nothing more, nothing less. Not BIG-5, not ASCII, not even UTF-8 anymore, it's just UNICODE index numbers.
 48  48
 49  49    {{{
