Encoding Basics

August 8, 2020

ASCII

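  • A character-to-number mapping: each character is assigned a numeric code
  • 0-31 are non-printing control characters such as line breaks and the bell (which makes a sound), 32 is the space, and 127 (DEL) is also a control character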
  • 33 is the first written character and is the exclamation point (!)
  • There are 128 ASCII characters in total (codes 0-127), so each fits in 7 bits
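Python's built-in `ord()` and `chr()` expose this mapping directly, so the numbers above are easy to check. The following is a small sketch. The variable names are only for illustration, and it relies on Python's `str.isprintable()`, which treats the space as printable and codes 0-31 and 127 as non-printable.

```python
# ord() turns a character into its code; chr() turns a code back into a character.
first_visible = chr(33)   # the exclamation point
space_code = ord(" ")     # the space character

# Among the 128 ASCII codes, Python reports the control characters
# (0-31 and DEL at 127) as non-printable.
non_printable = [code for code in range(128) if not chr(code).isprintable()]

print(first_visible, space_code)     # ! 32
print(len(non_printable))            # 33
print(repr(chr(7)), repr(chr(10)))   # '\x07' '\n'  (bell, line feed)

# Anything outside 0-127 cannot be encoded as ASCII.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.object[err.start])  # not ASCII: é
```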

UTF-8

  • UTF-8 can encode all 1,112,064 valid Unicode code points, using one to four one-byte (8-bit) code units per code point
  • As of November 2020, UTF-8 accounts for 95.8% of all web pages on average
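  • The first 128 code points (U+0000 to U+007F) are encoded as the same single bytes as in ASCII, so any ASCII text is already valid UTF-8; this backward compatibility is a big reason for its popularity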
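To make the "one to four code units" point concrete, here is a minimal Python sketch of how UTF-8 spreads a code point's bits across bytes. `utf8_encode` is a hypothetical helper written only for illustration. In real code you would call `str.encode("utf-8")`, and the sketch checks itself against that. The sample characters are arbitrary picks that need 1, 2, 3 and 4 bytes.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical hand-rolled UTF-8 encoder for one code point (illustration only)."""
    # Reject values outside Unicode and the surrogate range U+D800-U+DFFF.
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:      # up to 7 bits  -> 0xxxxxxx (identical to ASCII)
        return bytes([code_point])
    if code_point < 0x800:     # up to 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # up to 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# A, é, €, and an emoji: 1, 2, 3 and 4 bytes respectively.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    ours = utf8_encode(ord(ch))
    assert ours == ch.encode("utf-8")  # agrees with Python's built-in codec
    print(f"U+{ord(ch):04X}: {len(ours)} byte(s) {ours.hex(' ')}")

# ASCII compatibility: code points 0-127 encode to the identical single byte.
ascii_compatible = all(utf8_encode(c) == bytes([c]) for c in range(128))

# 1,112,064 = every code point up to U+10FFFF minus the 2,048 surrogates.
valid_code_points = 0x110000 - (0xDFFF - 0xD800 + 1)
print(ascii_compatible, valid_code_points)  # True 1112064
```

The single-byte branch is exactly why ASCII text passes through UTF-8 unchanged. Every multi-byte sequence starts with a byte of 0xC0 or above and continues with bytes in 0x80-0xBF, so it can never be mistaken for an ASCII byte.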