GooRoo
Administrator
Owner Administrator
I luv Gruntz!
Posts: 7,425
Display Name: GooRoo
|
Post by GooRoo on Feb 13, 2019 10:57:24 GMT -8
Here is the Windows version of the ASCII 256 character table. Printed, it is 7 pages.
Can anyone supply a similar table for other character sets?
|
|
|
Post by swietymiki on Feb 13, 2019 13:52:13 GMT -8
Couldn't think of a more relevant character set than Unicode (ISO/IEC 10646). However it's way too extensive to put in one file and print - it's about 137 thousand characters with a full list available on unicode.org/charts/
|
|
Post by GooRoo on Feb 13, 2019 14:07:23 GMT -8
My understanding is that for deviations from ASCII, there is a preceding string (perhaps beginning and ending tag strings for voluminous verbiage) that selects the specific replacement(s), for example to shift into Cyrillic. Without scanning the entire Unicode presentation, and perhaps learning the 'rules' of Unicode, I wanted to know how to specify the Hebrew character set, for example. We have several European members, with Poland and Romania being well represented, and an administrator (largely inactive) from Israel ... I am curious.
|
|
Tomalla
Designer
General Modder
Posts: 525
|
Post by Tomalla on Feb 13, 2019 14:29:57 GMT -8
But what for? If you want to write something in Cyrillic or Hebrew, you simply use the UTF-8 encoding and you're done. That way you can use any glyphs from all around the world in the same place/document. Now, converting from extended ASCII to UTF-8 is indeed quite a hassle, as each extended ASCII code page requires a different translation table, but it can be automated and there are programs for that. Converting between two different extended ASCII code pages, however, may not be possible at all, as the two may contain completely different sets of glyphs.
The ProBoards forums work in UTF-8, so you can easily write Hebrew (עִבְרִית) or even Chinese (汉语) here. Their usage here is limited, however, as you also have to provide a translation in order not to violate the terms of use.
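To make the code page point concrete, here is a minimal Python sketch. It relies on Python's built-in `cp1250` (Central European), `cp1251` (Cyrillic) and `cp1255` (Hebrew) codecs; the helper names `legacy_to_utf8` and `convertible` and the sample byte are made up for this illustration and are not taken from any particular conversion program.

```python
def legacy_to_utf8(raw: bytes, code_page: str) -> bytes:
    """Decode bytes with the given legacy code page, then re-encode as UTF-8."""
    text = raw.decode(code_page)   # the per-code-page translation table is applied here
    return text.encode("utf-8")


def convertible(raw: bytes, source_page: str, target_page: str) -> bool:
    """Return True if every character of raw also exists in the target code page."""
    try:
        raw.decode(source_page).encode(target_page)
        return True
    except UnicodeError:
        return False


# One and the same byte value stands for a different character in each code page.
sample = bytes([0xE0])
for page in ("cp1250", "cp1251", "cp1255"):
    ch = sample.decode(page)
    # Print the Unicode code point and the UTF-8 bytes (ASCII-only output).
    print(page, f"U+{ord(ch):04X}", legacy_to_utf8(sample, page).hex())

# Going from one legacy code page to another can fail outright:
# the Cyrillic letter behind 0xE0 in cp1251 has no slot in the Hebrew cp1255.
print(convertible(sample, "cp1251", "cp1255"))
```

Going through Unicode as the middle step is a design choice of the sketch: it needs only one table per code page, instead of one table per pair of code pages.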
|
|
Post by GooRoo on Feb 13, 2019 16:05:56 GMT -8
In quoting your reply, I cannot see how different character sets are being specified. I will have to look up "UTF-8" to find out more.
My interest is in seeing if Microsoft file names, which appear to be hexadecimal strings (0123456789ABCDEF) will decode to something useful. When I do a virus scan, there are thousands of those strings displayed (faster than anyone human could possibly read them), so I was going to go into a few of the folders containing them to see if there was some obscure meaning to them. Otherwise, the people responsible for maintaining the files have a horrendous job to keep things organized.
|
|
Post by Tomalla on Feb 13, 2019 16:32:14 GMT -8
Quote (GooRoo): In quoting your reply, I cannot see how different character sets are being specified. I will have to look up "UTF-8" to find out more.
That's because BBCode operates in UTF-8 by default. When you edit your post, the editor still works in UTF-8, not ASCII (even though the font displayed may be monospace and may suggest raw input), so all characters are written as-is. To see the byte representation of text, you'd have to view it in a hex editor. In ASCII every character is stored in one byte (8 bits). In UTF-8, however, a character may be stored in 1, 2, 3 or even 4 bytes, so the length is variable. For example, if you forcibly interpret the text "Chinese (汉语)" as single-byte characters (for example via a hex editor), you'd see the following string (provided you interpret it in the same code page as I did): "Chinese (ć±‰čŻ)". These two Chinese characters are actually stored in 6 bytes, 3 bytes each (the last byte maps to an invisible soft hyphen in that code page), and read one byte at a time they look like unrelated gibberish.
Quote (GooRoo): My interest is in seeing if Microsoft file names, which appear to be hexadecimal strings (0123456789ABCDEF) will decode to something useful. When I do a virus scan, there are thousands of those strings displayed (faster than anyone human could possibly read them), so I was going to go into a few of the folders containing them to see if there was some obscure meaning to them. Otherwise, the people responsible for maintaining the files have a horrendous job to keep things organized.
Do you mean files and folders named like {ED7BA470-8E54-465E-825C-99712043E01C}? These numbers are not a representation of any human-readable text, in any encoding. It's actually a randomly generated GUID (Globally Unique ID), and its only purpose is to assign a unique identifier to a resource so that it can be referenced unambiguously.
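For anyone who wants to check this without a hex editor, here is a small Python sketch. It assumes the code page used for the gibberish was Windows-1250 (`cp1250`), which is a guess based on the characters that came out. Under that assumption 汉语 is six bytes in UTF-8, and the last one (0xAD) is a soft hyphen, which normally stays invisible. The variable names are only for illustration.

```python
import uuid

text = "\u6c49\u8bed"  # the two Chinese characters from the post
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "), "->", len(utf8_bytes), "bytes")

# UTF-8 is variable-length: these four characters take 1, 2, 3 and 4 bytes.
per_char = [len(ch.encode("utf-8")) for ch in "A\u00e9\u6c49\U0001F600"]
print(per_char)

# Reading the UTF-8 bytes as if each byte were one cp1250 character
# (assumed code page) produces the gibberish; print the code points only.
mojibake = utf8_bytes.decode("cp1250")
print([f"U+{ord(c):04X}" for c in mojibake])

# No information is lost: re-encoding and decoding correctly restores the text.
restored = mojibake.encode("cp1250").decode("utf-8")
print(restored == text)

# The folder name parses as a UUID/GUID; uuid.UUID accepts the curly braces.
guid = uuid.UUID("{ED7BA470-8E54-465E-825C-99712043E01C}")
# Version 4 is the variant of the scheme that is built from random bits.
print(guid.version, guid.variant == uuid.RFC_4122)
```

The version field only says which generation scheme the identifier claims to follow; it confirms there is no hidden text in the name, not where the folder came from.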
|
|