What is char size in a computer architecture?


This Wikipedia article on word sizes provides a table of word sizes in different computer architectures. It has columns like 'integer size', 'floating point size', etc. I suppose integer size is the size of the operands for the ALU, floating point size is the size of the operands for the FPU, and the unit of address resolution is the number of bits/trits/digits represented by a single address. Word size is given as the natural size of data used by the processor (which is still somewhat confusing).

But I'm wondering what the char size column in the table represents. Is it the smallest object size theoretically possible? Is it the smallest possible alignment? What are the common operations defined over data of char size? On x86, x86-64, and ARM, the char size is 8 bits, which is the same as the smallest integer size. But on some other architectures the char size is 5/6/7 bits, which is very different from the integer size on those architectures.


There is 1 answer below.

BEST ANSWER

In modern C, a char is guaranteed to be independently modifiable, without disturbing surrounding data. It's usually chosen to be the width of the narrowest load/store instruction. So on Alpha or on word-addressable CPUs, a char either had to be the word size, or every char store would have to compile to an atomic RMW of the containing word. (Rather than a much cheaper non-atomic RMW like some early compilers actually used, before C11 introduced a thread-aware memory model to the language.) See Can modern x86 hardware not store a single byte to memory? (which covers modern ISAs in general) and C++ memory model and race conditions on char arrays for the requirements C++11 and C11 place on char.
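
To make the "independently modifiable" guarantee concrete, here is a minimal C11 sketch (the two-element buf and the writer_lo/writer_hi functions are purely illustrative, and <threads.h> is an optional part of C11, so your toolchain may need pthreads instead): two threads store to adjacent char elements with no synchronization. C11 says this is not a data race, so a conforming implementation cannot fake each byte store with a non-atomic read-modify-write of the whole containing word.

    #include <stdio.h>
    #include <threads.h>            /* optional C11 threads; may be absent */

    static char buf[2];             /* adjacent chars, likely in one word */

    /* Each thread writes only its own element.  Per C11 this is not a
       data race, so the generated code must not clobber the neighbour
       byte with a non-atomic RMW of the containing word. */
    static int writer_lo(void *arg) { (void)arg; buf[0] = 'A'; return 0; }
    static int writer_hi(void *arg) { (void)arg; buf[1] = 'B'; return 0; }

    int main(void)
    {
        thrd_t t1, t2;
        thrd_create(&t1, writer_lo, NULL);
        thrd_create(&t2, writer_hi, NULL);
        thrd_join(t1, NULL);
        thrd_join(t2, NULL);
        printf("%c%c\n", buf[0], buf[1]);   /* must print "AB" */
        return 0;
    }

On a machine with byte stores (x86, modern ARM, Alpha with BWX) each store is a single instruction; on a word-addressable machine the compiler would have to emit an atomic RMW, or make char the word size, to keep this program correct.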

But that Wikipedia table of word and char sizes in historical machines is clearly not about that, given the sizes (e.g. smaller than a word on some word-addressable machines, I'm pretty sure).

It's about how software (and character I/O hardware like terminals) packed multiple characters of the machine's native character encoding (e.g. a subset of ASCII, EBCDIC, or something earlier) into machine words.

Unicode, and variable-length character encodings like UTF-8 and UTF-16, are recent inventions compared to that history (https://en.wikipedia.org/wiki/Character_encoding#History). Many systems used fewer than 8 bits per character: e.g. 6 bits (64 unique codes) is enough for a Latin alphabet (typically upper-case only in real 6-bit codes), decimal digits, and some special characters and control codes.

These historical character sets motivated some of the choices programming languages made about which special characters to use (or avoid), because those languages were developed on systems with a particular character set.

Historical machines really did do things like pack 3 characters of text into an 18-bit word.
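
As a concrete illustration of that packing, here is a small C sketch (the 6-bit code values and the most-significant-character-first layout are hypothetical; real machines differed in both the character set and the ordering) that packs three 6-bit codes into the low 18 bits of a word and pulls them back out:

    #include <stdint.h>
    #include <stdio.h>

    /* Pack three 6-bit character codes into the low 18 bits of a word,
       first character in the most-significant position (hypothetical
       layout; historical machines varied). */
    static uint32_t pack3(unsigned c0, unsigned c1, unsigned c2)
    {
        return ((c0 & 077u) << 12) | ((c1 & 077u) << 6) | (c2 & 077u);
    }

    /* Extract character i (0 = leftmost) from an 18-bit word. */
    static unsigned unpack(uint32_t word, int i)
    {
        return (word >> (12 - 6 * i)) & 077u;
    }

    int main(void)
    {
        uint32_t w = pack3(001, 002, 003);      /* made-up 6-bit codes */
        printf("word = %06o  chars = %02o %02o %02o\n",
               (unsigned)w, unpack(w, 0), unpack(w, 1), unpack(w, 2));
        return 0;
    }

Shifting and masking like this is exactly why string handling on such machines was more awkward than on byte-addressable ones: getting at a single character always means arithmetic within a word.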

You might want to search on https://retrocomputing.stackexchange.com/, or even ask a question there after doing some more reading.