Things worth sharing

UTF-8 is a text encoding whose name stands for 8-bit UCS Transformation Format. (Here UCS stands for Universal Coded Character Set, the ISO/IEC 10646 standard that defines the same character set as Unicode.)

The key idea of UTF-8 is that it can represent every Unicode character while still reading text files that were originally encoded in ASCII. It does this by being a superset of the ASCII encoding. If a byte starts with a 0 bit, it is interpreted as a plain ASCII character. If it starts with a 1 bit, it is part of a multi-byte sequence that represents a non-ASCII character. Such sequences are two to four bytes long, because Unicode code points currently need up to 21 bits. This way, although Unicode and ASCII use a different number of bits per character, UTF-8 makes them compatible. As an added benefit, space is not wasted: the most common Western characters are stored using fewer bytes.
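The leading-bit rule can be made visible with a small Python sketch. It is only an illustration: the helper name describe_utf8 and the sample characters are chosen here for the example and are not part of any standard API. It relies solely on Python's built-in str.encode.

```python
def describe_utf8(text):
    """Return (character, byte count, bit pattern) for each character."""
    rows = []
    for ch in text:
        encoded = ch.encode("utf-8")  # bytes of this single character
        bits = " ".join(f"{b:08b}" for b in encoded)
        rows.append((ch, len(encoded), bits))
    return rows


# Sample characters needing 1, 2, 3 and 4 bytes respectively.
for ch, count, bits in describe_utf8("A\u00e9\u20ac\U0001F600"):
    print(f"U+{ord(ch):04X}  {count} byte(s)  {bits}")

# Pure ASCII text yields identical bytes in both encodings.
print("plain".encode("ascii") == "plain".encode("utf-8"))
```

In the printed bit patterns, the single ASCII byte starts with 0, the first byte of each longer sequence starts with 110, 1110 or 11110 (announcing two, three or four bytes), and every following byte starts with 10.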

References

https://manderc.com/concepts/ascii/index_eng.php
