Xah Lee, 20051101
There's a lot of confusion about the display of non-ASCII characters such as the bullet “•”. The confusion is understandable, because the underlying computing technologies are, from a layman's perspective, extremely complex.
Being able to type the bullet character, post it to a newsgroup, retrieve the posted material, and have that bullet display as a bullet as intended truly requires several technologies to be available: on the sender's computer, on the receiver's computer, on the network that received the posting, and on the network from which the post was retrieved. It also requires correct configuration of both the sender's and the receiver's computers. Cross your fingers that everything goes well; but unfortunately, because of fucking-ass criminals in the computing industry such as Larry Wall, most likely things will not go well.
[Disclaimer: all mentions of real persons are opinion only and/or in jest.]
Here's a quick rundown:
• There needs to be an agreed-upon character set, that is, the set of symbols to be used on the computer (e.g. ABCabc123.,+-=...). Many such character sets include the bullet symbol.
• There needs to be a code map that maps the character set to numbers, because at the core computers deal only with numbers.
There are various standards bodies that standardize these character sets and code maps. (Usually, but not always, the two come together as one.)
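To make the character set and code map idea concrete, here is a minimal sketch in Python, whose built-in `ord` and `chr` expose Unicode's code map: each character corresponds to exactly one number, its code point. Python and Unicode are just a convenient illustration here; other character sets assign different numbers.

```python
import unicodedata


def code_point(ch: str) -> str:
    """Return a character's Unicode code point in the conventional U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "a", "1", "*", "•"]:
    # ord() maps the character to its number; unicodedata.name() gives its official name.
    print(ch, code_point(ch), ord(ch), unicodedata.name(ch))

# chr() is the inverse map: from number back to character.
assert chr(0x2022) == "•"
```

The last line printed is `• U+2022 8226 BULLET`: in Unicode the bullet is character number 8226, while the asterisk is number 42.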
• Now, more technically, once each character has an associated number, that number needs to be turned into binary. This is the _encoding_ part. There are various standards for encoding text in a particular character set, that is, for turning a sequence of numbers into 1s and 0s. (The issue involves not just turning integers into binary, but also, for example, marking combining characters such as the umlaut, or starting and ending right-to-left writing as in Arabic.) Usually, but not always, the encoding business is intertwined with the character set/code map specification, even though they are entirely separate concepts.
• Now, on your computer, say you are using the Microsoft Windows operating system and the email program Microsoft Outlook Express; somewhere there's a menu or option labeled text encoding or character set. That's where you tell the computer which of these standardized character set/encoding schemes to use to represent what you type on the keyboard. (For Chinese, for example, you can't type directly; you need another technology, Input Methods, to type.)
• One of these standards is called Unicode. Its character set encompasses practically all the world's written symbols, including all Chinese characters, Japanese phonetics, the Korean alphabet, and the Arabic alphabet. (i.e. those hateful Islamic twists the WASPs see)
• Once you've typed your letter and sent it through a particular encoding in your email/newsreader software, the message goes to the network's “news” servers. For the ride around the internet, more protocols are needed: a way to tell, within a string of binary digits, where your Subject actually starts, where the From field is, where the To address starts, where your message content is, ..., among other things.
• Now we are getting really complex... because in the history of software and the internet, in the beginning there was really no support for any character set or all that complex stuff except ASCII (among others), that is to say, only the characters you can see on the keyboard. There wasn't much in the way of standards. Things basically went on on an as-it-currently-works basis. Later, these common practices were written into documents called RFCs (Request for Comments, aka Really Fucking Common). Later still, these protocols were improved in a patchy and haphazard way to allow the use of non-ASCII characters or foreign languages, or to include pictures or other files such as sound & video as attachments.
• Remember that we are bypassing the whole technology of the internet transport protocols themselves, i.e. IP addresses, the various layers... down to the physics of wiring, copper or fiber optics, etc.
• OK, now the news server has received your message, and it distributes it to other news servers like spam. More protocols.
• When you wake up, you open your newsreader, hungrily anticipating news. Your newsreader software (called the client) contacts the particular server and downloads the message (all through decoding the many various protocols).
• For the bullet character to display on your screen, you are assuming: (1) your computer supports the whole charset/encoding scheme the sender used; (2) your computer has the proper font to display it (suppose I write Chinese to you using Unicode: although your computer supports, i.e. understands, Unicode, if you don't have a Chinese font it cannot help but display gibberish); and (3) most importantly, nothing has been screwed up during the message's journey across the net.
• Chances are, things did fuck up somewhere. That is why you see things like “=E2=80=A2” (the quoted-printable form of the bullet's UTF-8 bytes) instead of the bullet, which is due to the message being fucked up around the news servers. Or, you may see gibberish or an empty square instead of the bullet, because your software didn't use the right charset/decoding or you don't have the right font.
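The encoding step, and where the “E2=80=A2” residue comes from, can be reproduced with a small Python sketch. It assumes the sender used UTF-8 (one of Unicode's encodings) and that the message body was then put into quoted-printable form, one of the transfer encodings the mail/news protocols adopted so that non-ASCII bytes can pass through ASCII-only servers. Python's standard `quopri` module performs that conversion. A client that shows the body without undoing it displays the bullet as `=E2=80=A2`.

```python
import quopri

text = "• item one"

# Encoding: the bullet (code point U+2022) becomes bytes. In UTF-8 it takes three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\xa2 item one'

# Transfer: quoted-printable rewrites each non-ASCII byte as "=" plus two hex digits,
# so the body consists only of ASCII characters while it travels between servers.
qp = quopri.encodestring(utf8_bytes)
print(qp)  # b'=E2=80=A2 item one'

# A client that honors the message headers reverses both steps and gets the bullet back.
restored = quopri.decodestring(qp).decode("utf-8")
assert restored == text
```

If any link in the chain drops or ignores the header announcing the quoted-printable transfer encoding, the reader sees the middle stage instead of the last one.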
Now, many of you are actually using groups.google.com to post and read. There, Google's website acts as your newsreader software. Google is pretty good on the whole; it won't fuck up the encoding. However, your computer still needs to support Unicode and have a font that can show the bullet. If you have Windows XP and use Internet Explorer, you are all fine. If you have the latest Mac OS X, you are all fine too. If you have an older Windows/Mac, or Linux, Solaris, or another Unix, you are quite fucked and nobody can help you. Look in the menu for an encoding/charset/language option, and see whether it has an item called Unicode, UTF-8, or “universal alphabet”. Use that.
Now, with all the trouble, why would someone use a bullet • that requires some “advanced” technology, rather than simply using the asterisk *?
It basically comes down to choice. If you really want maximum compatibility, you'd go with the universally available asterisk. If you truly care, you really should write on paper with a pen instead, making checkmarks by moving your pen downward and upward in one stroke. Remember, folks, not everyone on this earth has a computer. But if you have an advanced compatibility or purity obsession, then perhaps all dingbats must be done away with, in favor of explicit itemization using the English word “Item”.
“O brave new worlds, That have such characters in them!”
Page created: 2005-11. © 2005 by Xah Lee. (excluding mirrored pages or images.)