Encoding Basics
August 8, 2020
ASCII
- Character to number mapping
- 0-32 are non-written symbols: 0-31 are control codes (line breaks, the bell sound, etc.) and 32 is the space
- 33 is the first written character and is the exclamation point (!)
- There are 128 total ASCII characters
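
The character-to-number mapping above can be checked with Python's built-in ord() and chr(). This is only a small sketch: ascii_code and is_visible are hypothetical helper names made up for these notes, and treating 33-126 as the "written" range assumes that 127 (DEL) counts as a control code.

```python
def ascii_code(ch: str) -> int:
    """Return the ASCII number for a single character (hypothetical helper)."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 128-character ASCII range")
    return code


def is_visible(code: int) -> bool:
    """True for codes 33-126, the characters that draw a visible glyph.

    Assumption: 127 (DEL) is treated as a control code, like 0-31.
    """
    return 33 <= code <= 126


print(ascii_code("!"))   # 33
print(ascii_code("A"))   # 65
print(repr(chr(10)))     # '\n' (line feed, a control code)
print(is_visible(7), is_visible(32), is_visible(33))  # False False True
```

chr() goes from number to character and ord() goes back the other way, so chr(ascii_code("A")) returns "A" again.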
UTF-8
- UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte code units
- As of November 2020, UTF-8 accounts on average for 95.8% of all web pages
- The first 128 characters correspond one-to-one with ASCII characters, so plain ASCII text is already valid UTF-8, a big reason for UTF-8's popularity
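
A few of the UTF-8 points above can be sanity-checked in Python. This is a rough sketch, not a reference implementation: utf8_length is a hypothetical helper name, the sample characters are arbitrary picks, and the code point count assumes the usual definition (everything from U+0000 to U+10FFFF except the 2,048 surrogates U+D800 to U+DFFF).

```python
def utf8_length(ch: str) -> int:
    """Number of one-byte code units UTF-8 uses for one character (hypothetical helper)."""
    return len(ch.encode("utf-8"))


# Arbitrary sample characters, written as escapes so the file stays ASCII.
samples = [
    "A",           # U+0041, Latin capital A
    "\u00e9",      # U+00E9, e with acute accent
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, grinning face emoji
]
for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F600 -> 4 byte(s)

# ASCII compatibility: ASCII-only text gives the same bytes in both encodings.
text = "Hello!"
same_bytes = text.encode("ascii") == text.encode("utf-8")
print(same_bytes)  # True

# Valid code points: the full range minus the surrogate block.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)
print(valid_code_points)  # 1112064
```

The byte counts grow with the code point value, which is why ASCII-heavy text stays compact in UTF-8.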