In the last lesson we showed you how to use numbers in your programs. Now we want to talk about letters and words.
In computer science the term word relates to the CPU. A word is the number of bits that the CPU can process in a single operation.
In programming, a word or a sentence like the ones you write in English is called a string.
You use strings to print words, letters, numbers and symbols to the screen. A string contains the data that the computer should print to the screen. It does not tell the computer what size to print strings or which font to use.
Before we talk about strings we have to talk about characters. Characters are the letters, numbers, punctuation and mathematical symbols on your keyboard that a computer can print to the screen.
Each character is represented by a unique number in a table. The numbers are assigned by the Unicode Consortium.
The Unicode table website contains the complete table. Open the table in your web browser.
The table is very large so it may be slow to load.
If you find the capital letter A and hover over it with your mouse, you will see U+0041 | Dec: 65. The important part is Dec: 65, which tells you that the capital letter A is number 65 in the table. The capital letter B is 66. The digit 1 is at number 49. An exclamation mark, !, is at number 33 in the table. A space is one before it, at number 32.
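You can also check these numbers without the website. The lesson does not name a programming language, so the following is only a small sketch that assumes Python, where the built-in ord() function returns the table number of a character and chr() turns a number back into a character. The variable names are made up for this example.

```python
# ord() gives the number of a single character in the Unicode table.
characters = ["A", "B", "1", "!", " "]
numbers = {character: ord(character) for character in characters}

for character, number in numbers.items():
    # Print the character, its decimal number, and the U+ (hexadecimal) form.
    print(repr(character), "Dec:", number, "U+%04X" % number)

# chr() goes the other way, from a number back to a character.
letter = chr(65)
print(letter)  # A
```

The first line printed is 'A' Dec: 65 U+0041, which matches what the table shows when you hover over the capital letter A.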
Using the Unicode table, can you find the numbers that are used to represent these characters?
X is the number 88
Y is the number 89
z is the number 122
4 is the number 52
& is the number 38
? is the number 63
So when you type the letter A, the computer knows you really mean the number 65. Remember that computers can only process numbers. This is why we have to convert what you see as a letter to a number. This is called an encoding: the letters are encoded, or represented, by numbers.
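To see this conversion for a whole string rather than one letter, here is a short sketch, again assuming Python. The text "Hi!" and the variable names are only illustrations. It turns each character into its table number and then turns the numbers back into the original string.

```python
text = "Hi!"

# Turn each character into its number in the Unicode table.
numbers = [ord(character) for character in text]
print(numbers)  # [72, 105, 33]

# Turn the numbers back into characters and join them into a string.
decoded = "".join(chr(number) for number in numbers)
print(decoded)  # Hi!
```

Going from numbers back to letters gives exactly the string you started with, so nothing is lost by storing the letters as numbers.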
You might be wondering why this table is so large. It is because the table has to work for all of the world’s languages.
The Chinese language has around 3,500 characters, more precisely called logograms, in common use, but as many as 109,000 in total. A great many of these are in the table. To make them easy to find, group the table by “block”. Once you do that, find the block called “CJK Unified Ideographs” to see just some of them.
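The same lookup works for characters in this block. The sketch below assumes Python and uses the character 中 purely as an example chosen for this illustration. The "CJK Unified Ideographs" block covers the numbers from U+4E00 to U+9FFF, so the code also checks that the character falls inside that range.

```python
import unicodedata

# First and last numbers of the "CJK Unified Ideographs" block.
BLOCK_START = 0x4E00
BLOCK_END = 0x9FFF

character = "中"  # example character, picked only for illustration
number = ord(character)

print(number)                       # 20013
print("U+%04X" % number)            # U+4E2D
print(unicodedata.name(character))  # CJK UNIFIED IDEOGRAPH-4E2D

# Check whether the number lies inside the block.
in_block = BLOCK_START <= number <= BLOCK_END
print(in_block)  # True
```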
Using the Unicode table, can you find the numbers that are used to represent these letters from other languages?