How do table mappings work in C?

Hope this question makes sense! I'm currently learning C, and I'm interested in how table mappings work.

I am using the extended ASCII table ( http://www.ascii-code.com ) as an experiment.

For example, I can create a char and set its value to a tilde:

char charSymbol = '~';

And I can also specify the exact value like so:

char charDec = 126;
char charHex = 0x7E;
char charOct = 0176;
char charBin = 0b01111110;

No matter which of the above declarations I choose (if I understand things correctly), the value stored in memory for each of these variables will always be the same: the binary representation 01111110.
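For example, compiling a quick check like this with gcc (needed for the 0b... literal, which is an extension) should print the same number five times:

#include <stdio.h>

int main(void) {
    char charSymbol = '~';
    char charDec = 126;
    char charHex = 0x7E;
    char charOct = 0176;
    char charBin = 0b01111110;   /* gcc extension, not standard C */

    /* every variable should hold the same value, 126 */
    printf("%d %d %d %d %d\n",
           charSymbol, charDec, charHex, charOct, charBin);
    return 0;
}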

My question is: does the compiler hold the extended ASCII table and look up the binary value at compile time? And if so, does the system the program runs on also keep an extended ASCII table, so that when the program is asked to print 01111110, it knows to display "~"?

+3




4 answers


Most of the code in your question doesn't require an ASCII lookup table.

Note that in C, char is an integer type, like int, but narrower. A character constant like 'x' has type int (for historical reasons), but on an ASCII system 'x' is essentially identical to 120.
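A small sketch, assuming an ASCII system, shows both points: the constant has int's size and the value 120:

#include <stdio.h>

int main(void) {
    /* A character constant has type int, so it has int's size;
       on an ASCII system its value is 120. */
    printf("sizeof 'x' = %zu, sizeof(char) = %zu\n", sizeof 'x', sizeof(char));
    printf("'x' = %d\n", 'x');
    return 0;
}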

char charDec = 126;
char charHex = 0x7E;
char charOct = 0176;
char charBin = 0b01111110;

(The C standard does not support binary constants like 0b01111110; that is a gcc extension.)

When the compiler sees an integer constant like 126, it computes the integer value from it. To do that, it needs to know that 1, 2, and 6 are decimal digits, and what their values are.

char charSymbol = '~';

For this, the compiler simply has to recognize that ~ is a valid character.

The compiler reads all of these characters from a text file, your C source. Each character in this file is stored as a sequence of 8 bits that represent a number between 0 and 255.
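To see that, here is a small sketch (the file name is just a placeholder) that dumps a source file byte by byte; on an ASCII system, every ~ in the file shows up simply as the number 126:

#include <stdio.h>

int main(void) {
    FILE *f = fopen("example.c", "rb");   /* hypothetical file name */
    if (f == NULL)
        return 1;

    int byte;
    while ((byte = fgetc(f)) != EOF)
        printf("%d ", byte);   /* the compiler sees only these numbers */
    putchar('\n');

    fclose(f);
    return 0;
}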

So, if your C source code contains:



putchar('~');

(and ~ has the value 126), then all the compiler needs to know is that 126 is a valid character value. It generates code that passes the value 126 to the function putchar(). At run time, putchar sends that value to standard output. If standard output goes to a file, the value 126 is stored in that file. If it goes to a terminal, the terminal software does some sort of lookup to map the number 126 to the glyph displayed as a tilde character.
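In other words, assuming an ASCII execution character set, these two calls produce exactly the same output byte; whether it shows up as a tilde is up to the terminal, not the program:

#include <stdio.h>

int main(void) {
    putchar('~');    /* the compiler passes the value 126 */
    putchar(126);    /* the same value, written out explicitly */
    putchar('\n');
    return 0;
}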

Compilers do have to recognize specific character values. They have to recognize that + is the plus character used to represent the addition operator. But no ASCII lookup table is needed for input and output, since each ASCII character is represented by its number in all processing steps, from compilation to execution.

So how does the compiler recognize the character '+'? C compilers are usually written in C. Somewhere in the compiler's own source, there might be something like:

switch (c) {
    ...
    case '+':
        /* code to handle the + character */
    ...
}

So, the compiler recognizes a + in its input because there is a + in its own source code, and that + (stored in the compiler's source code as the 8-bit number 43) resulted in the number 43 being stored in the compiler's own executable machine code.
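You can see the same identity yourself on an ASCII system; this is just an illustration, not the compiler's actual source:

#include <stdio.h>

int main(void) {
    /* On an ASCII system, '+' and 43 are the same value; the "lookup"
       happened when this source file was typed and saved. */
    printf("'+' == %d\n", '+');
    printf("'+' == 43 ? %s\n", '+' == 43 ? "yes" : "no");
    return 0;
}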

Obviously the first C compiler was not written in C, because there was nothing to compile it with. Early C compilers may have been written in B, or BCPL, or assembly language, each handled by a compiler or assembler that probably recognizes + because there is a + in its own source code. Each generation of C compiler passes the "knowledge" of what + is on to the next C compiler it compiles. The "knowledge" that + is 43 is not necessarily written down in the source code; it is propagated every time a new compiler is compiled with an old one.

You can read about this in Ken Thompson's article "Reflections on Trusting Trust".

On the other hand, you can also have, for example, an ASCII-based compiler that generates code for an EBCDIC machine, or vice versa. Such a compiler would have to have a lookup table mapping from one character set to the other.

+3




In fact, technically speaking, it is your text editor that has an ASCII (or Unicode) table. The file is saved simply as a sequence of bytes; the compiler doesn't actually need to have an ASCII table, it just needs to know which bytes do what. (Yes, the compiler logically interprets the bytes as ASCII, but if you look at the compiler's machine code, all you would see is a bunch of comparisons of bytes against fixed byte values.)



On the flip side, the executing computer has an ASCII table somewhere to map the bytes output by the program to readable characters. This table is probably in the terminal emulator.
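For example, a program can bypass characters entirely and emit raw bytes; which glyph you see is decided by the terminal emulator's table, not by anything in the program (a sketch, assuming a POSIX system for write()):

#include <unistd.h>

int main(void) {
    /* 126 is '~' and 10 is '\n' in ASCII; the program only knows the numbers. */
    unsigned char bytes[2] = { 126, 10 };
    write(1, bytes, 2);   /* file descriptor 1 is standard output */
    return 0;
}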

+3




The C language has rather weak type safety, so you can always assign an integer to a character variable.

You have used different representations of an integer to assign to a char variable, and this is supported in the C programming language.
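For example, all of these assignments are accepted, because a char is just a narrow integer (a minimal sketch):

#include <stdio.h>

int main(void) {
    char c = 126;    /* an integer assigned to a char variable */
    int  n = '~';    /* a character constant assigned to an int */

    printf("%c %d\n", c, c);   /* prints: ~ 126 */
    printf("%c %d\n", n, n);   /* prints: ~ 126 */
    return 0;
}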

When you type ~ into the text file of your C program, your text editor actually converts the keystroke and stores its ASCII equivalent. So when the compiler parses the C code, it never needs to realize that what was written was ~ (a tilde). While parsing, when the compiler encounters the ASCII equivalent of ' (a single quote), it switches to reading the next byte as a value belonging to a char, followed by another ' (single quote). Since a char can hold 256 different values, it covers the entire ASCII set, including the extended character set.

This is the same when you are using assembler.

Printing to the screen is a completely different game; it is part of the I/O system. When you press a particular key on the keyboard, the corresponding mapped integer is sent and stored in the reading program's memory. Likewise, when you print a particular integer to a printer or screen, that integer is rendered as the corresponding character.

So if you want to print an integer held in an int variable, there are routines that convert each of its digits and send the ASCII code for each one, and the I/O system turns them into characters.
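A minimal sketch of such a routine (non-negative values only, just to show the idea): each digit is turned into its ASCII code by adding it to '0' and sent with putchar:

#include <stdio.h>

/* Print a non-negative integer by sending the ASCII code of each digit. */
static void print_uint(unsigned n) {
    if (n >= 10)
        print_uint(n / 10);
    putchar('0' + (n % 10));   /* '0' is 48 in ASCII, so digit d becomes 48 + d */
}

int main(void) {
    print_uint(126);   /* prints: 126 */
    putchar('\n');
    return 0;
}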

+1




All these values are exactly equal to each other; they are just different representations of the same value, so the compiler sees them all exactly the same after translating your written text into a byte value.

0

