
Are PHP string characters signed or unsigned, and why does ord("Ø") not match the extended ASCII table?


Trying the code below:

<?php

echo "ORD ~ = ".ord("~");

Based on the extended ASCII table at http://www.ascii-code.com/, the output is

ORD ~ = 126

which is correct. But when I output a character from the extended part of the table, like Ø:

<?php

echo "ORD Ø = ".ord("Ø");

Gives:

ORD Ø = 195

In the linked extended ASCII table, however, the code for 'Ø' is 216. The same happens with √: ord("√") outputs 226, but the extended ASCII character for 226 is â, and √ is not even in the table.

So my question is: PHP strings are basically arrays of characters ($str[0] for the first character, $str[1] for the second, etc., like in C), and PHP doesn't have a char type. How does PHP handle a single 1-byte char when it treats it separately, e.g. with the ord() function above or with pack() and unpack()?
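A minimal sketch of how single bytes can be inspected and rebuilt with unpack() and pack(). The strings are written with escapes ("\xC3\x98" is "Ø" encoded as UTF-8) so the result does not depend on how the file is saved. The "C*" format reads or writes every byte as an unsigned integer; the helper names bytesOf() and stringFromBytes() are hypothetical.

```php
<?php

// Hypothetical helper: return every byte of $s as an unsigned integer (0..255).
function bytesOf(string $s): array
{
    // unpack('C*') returns a 1-based array; array_values() reindexes it from 0.
    return array_values(unpack('C*', $s));
}

// Hypothetical helper: rebuild a byte string from a list of integers.
function stringFromBytes(array $bytes): string
{
    return pack('C*', ...$bytes);
}

$s = "\xC3\x98"; // "Ø" encoded as UTF-8

echo implode(' ', bytesOf($s)), "\n"; // 195 152
// $s[0] is itself a one-byte string, not a separate char type.
echo $s[0] === "\xC3" ? "one byte\n" : "other\n";
echo stringFromBytes([195, 152]) === $s ? "same\n" : "different\n";
```

Indexing a string with $s[0] gives another string of length 1, so PHP needs no separate char type: ord(), pack() and unpack() all work on those bytes directly.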

Are PHP chars unsigned or signed? What's the difference?
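To illustrate the difference, the same byte can be read both ways with unpack(): the "c" format interprets a byte as a signed char (-128..127), and "C" interprets it as an unsigned char (0..255). ord() always behaves like the unsigned reading. This is a minimal sketch; the byte 0xC3 is simply the first byte of "Ø" in UTF-8.

```php
<?php

$byte = "\xC3"; // first byte of "Ø" in UTF-8

$signed   = unpack('c', $byte)[1]; // signed char reading
$unsigned = unpack('C', $byte)[1]; // unsigned char reading

echo "signed:   $signed\n";             // -61
echo "unsigned: $unsigned\n";           // 195
echo "ord():    " . ord($byte) . "\n";  // 195, ord() never returns a negative value

// The readings differ only for bytes >= 128, where signed = unsigned - 256.
echo ($signed === $unsigned - 256) ? "consistent\n" : "mismatch\n";
```

For bytes below 128 (plain ASCII such as "~"), both readings give the same number, which is why the first example in the question looked correct.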

How should I interpret this phrase from the PHP manual: "A string is series of characters, where a character is the same as a byte. This means that PHP only supports a 256-character set"?

Does "256-character" mean that it supports extended ASCII? If so, why do I get these differences when calling ord() on extended ASCII characters?

Thanks for the attention!

Best answer:

The PHP core, as it stands right now, has no notion of character encoding. Strings are just, as the manual states, series of bytes (unsigned 8-bit). How the output medium interprets those bytes is beyond PHP.
In your example, the Ø might have been UTF-8 encoded, i.e. stored as the two bytes 195 and 152.
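Those two bytes also explain where the table's 216 comes from. A minimal sketch, assuming a well-formed two-byte UTF-8 sequence (lead byte 110xxxxx, continuation byte 10xxxxxx): combining the payload bits gives the Unicode code point, and for Ø that is U+00D8 = 216, the number the linked table lists. The helper decodeTwoByteUtf8() is hypothetical; where the mbstring extension is installed, mb_ord() returns the code point directly.

```php
<?php

// Hypothetical helper: decode one two-byte UTF-8 sequence into its code point.
// Assumes well-formed input: 110xxxxx 10xxxxxx.
function decodeTwoByteUtf8(string $s): int
{
    $lead = ord($s[0]); // 0xC3 = 195 for "Ø"
    $cont = ord($s[1]); // 0x98 = 152 for "Ø"

    // Keep the 5 payload bits of the lead byte and the 6 of the continuation byte.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

echo decodeTwoByteUtf8("\xC3\x98"), "\n"; // 216, i.e. U+00D8 "Ø"

// In a file saved as ISO-8859-1, Ø would be the single byte 0xD8 instead.
echo ord("\xD8"), "\n"; // 216
```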
PHP, not being aware of the encoding, treats those two bytes as two separate single-byte "characters". ord() only takes the first "character" (byte) of a string into account, so you get 195.
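A minimal sketch of that behaviour for both characters from the question: strlen() counts bytes rather than characters, and ord() reads only the first byte. The strings are written with escapes ("\xC3\x98" is UTF-8 "Ø", "\xE2\x88\x9A" is UTF-8 "√") so the output does not depend on the file encoding.

```php
<?php

$samples = [
    'Ø' => "\xC3\x98",     // U+00D8, two bytes in UTF-8
    '√' => "\xE2\x88\x9A", // U+221A, three bytes in UTF-8
];

foreach ($samples as $label => $s) {
    // strlen() reports the byte count; ord() looks at $s[0] only.
    echo $label, ': strlen = ', strlen($s), ', ord = ', ord($s), "\n";
}
// Ø: strlen = 2, ord = 195
// √: strlen = 3, ord = 226
```

This reproduces both numbers from the question: 195 for Ø and 226 for √ are just the lead bytes of their UTF-8 encodings.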
So the answer is: unsigned, and no charset at all, just bytes with a length indicator.