How can I figure out which code page I'm looking at?
I have a device with some documentation on how to send it text. It uses 0x00-0x7F to send special characters such as accented characters, euro signs, ...
I assume they copied an existing code page and made some changes, but I have no idea how to determine which code page is closest to the one in my documentation.
In theory, this should be easy to do. For example, they map Á to 0x41, so if I could find a way to go through all the code pages and pick out the ones that have that character at that position, it would be a piece of cake.
However, all I can find on the internet are links to code page dumps like the one I'm looking at, or software that uses heuristics to read text and guess the most likely code page. Surely someone out there has been able to see what code page they are looking at?
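The lookup described above can be sketched in Python, assuming the codecs bundled with the standard library are an acceptable stand-in for "all the code pages". The names `candidate_codecs` and `rank_codecs` and the `documented` table are made up for illustration; the table would be filled in from the device documentation. This is only a rough sketch, not a definitive tool.

```python
import encodings.aliases


def candidate_codecs():
    """Return the codec names listed in Python's alias table."""
    return sorted(set(encodings.aliases.aliases.values()))


def rank_codecs(known):
    """Rank codecs by how many documented byte -> character pairs they reproduce.

    `known` maps a byte value to the character the documentation assigns to it.
    """
    scores = []
    for name in candidate_codecs():
        hits = 0
        for byte, char in known.items():
            try:
                if bytes([byte]).decode(name) == char:
                    hits += 1
            except (LookupError, ValueError):
                # Not a text codec, not available on this platform,
                # or the byte is undefined or incomplete in this codec.
                continue
        if hits:
            scores.append((name, hits))
    # Best match first, ties broken alphabetically.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical table: replace with the pairs from the documentation.
    documented = {0x41: "Á"}
    for name, hits in rank_codecs(documented):
        print(name, hits)
```

The more pairs go into the table, the more the ranking narrows down. A single pair will usually match many code pages or none at all.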
If it uses 0x00 to 0x7F for "special" characters, how does it encode regular ASCII characters?
Most common Western character encodings put Á at code 193 (0xC1). If you subtract 128 from that, you get 65 (0x41). Perhaps your "code page" is just the upper half of one of the standard encodings such as ISO-8859-1 or windows-1252, with the most significant bit set to zero instead of one (i.e., 128 subtracted from each value).
If that is the case, I would expect to find a flag that you can set to tell the device whether to interpret the next run of code points using the "upper" or "lower" half of the encoding. I don't know of any system that uses this scheme, but it is the most reasonable explanation I can think of for this situation.
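The arithmetic above can be tried out with a few lines of Python. This is a minimal sketch that assumes the hypothesis holds (every device byte is an upper-half code with 128 subtracted); the helper name `device_to_text` and the default of ISO-8859-1 are my own choices, not anything taken from the device documentation.

```python
def device_to_text(payload: bytes, encoding: str = "latin_1") -> str:
    """Decode device bytes, assuming each is an upper-half code minus 128."""
    # Put the most significant bit back, then decode with the guessed code page.
    restored = bytes(b | 0x80 for b in payload)
    return restored.decode(encoding)


if __name__ == "__main__":
    assert 193 - 128 == 65  # 0xC1 - 0x80 == 0x41
    print(device_to_text(b"\x41"))  # Á
```

If the documentation puts the euro sign at 0x00, that would hint at windows-1252 (where the euro sign is 0x80) rather than ISO-8859-1. Pass `encoding="cp1252"` to try it, keeping in mind that a few positions are undefined in that code page and will raise a decoding error.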
In most code pages, 0x41 is just a normal "A"; I don't think any standard code page has "Á" in that position. There might be a control character somewhere before the A that adds an accent, or the device uses a non-standard code page.
I see no point in looking for the "closest code page". You just need to use the documentation you received with the device.
Your last sentence is puzzling. What do you mean by being able to "see what code page they are looking at"?
If you post your entire code page, people here on SO could be more helpful and give you more information on this issue, since the single data point 0x41 = Á doesn't help much.
A bit of a random idea, but if you can capture a significant amount of text output from the device, you can try running it through something like the detect function at http://chardet.feedparser.org/ .
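A minimal sketch of that idea, assuming the third-party `chardet` package is installed (the wrapper name `guess_encoding` is made up for illustration). `chardet.detect` takes raw bytes and returns a dictionary that includes an `encoding` guess and a `confidence` value.

```python
def guess_encoding(raw: bytes):
    """Return chardet's guess for `raw`, or None if chardet is not installed."""
    try:
        import chardet  # third-party package, not part of the standard library
    except ImportError:
        return None
    return chardet.detect(raw)


if __name__ == "__main__":
    # Hypothetical capture: replace with real bytes read from the device.
    sample = "Áé captured text goes here".encode("latin_1")
    print(guess_encoding(sample))
```

The detector works statistically, so it needs a reasonably long sample. It also only knows standard encodings, so bytes confined to 0x00-0x7F will most likely just be reported as ASCII.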