The Confusion of Emacs's Keystroke Representation

Xah Lee, 2007-05-29

Someone wrote:

Hi, 
how can I find the an overview on how to enter meta-characters 
(e.g. esc, return, linefeed, tab, ...) 
a) in a regular buffer 
b) in the minibuffer when using standard search/replace-functions 
c) in the minibuffer when using search/replace-functions using regular 
expressions 
d) in the .emacs file when defining keybindings 

As far as I can see in all those situations entering meta-characters is 
addressed in a different way which I find confusing, e.g.: 
a) <key> _or_ C-q <key> 
b) C-q C-[, C-q C-m, C-q C-j, C-q C-i 
c) \e, \r, \n, \t 
d) (define-key [(meta c) (control c) (tab c)] "This is confusing!") 

Furthermore, they are displayed in a different way,e.g. 
- actual, visible layout 
- ^E, ^M, ^L, ^I 
- Octals 

I would be happy about pages summarizing such information. 
Any references available? 

The issues involve non-printable characters: their representation, their input methods, the representation of those input methods, the suppression of a key's normal function, and a programming language's need to represent non-printable characters in strings.

Here's a short summary:

The following is a detailed explanation.

Suppressing Normal Function of a Key; Literal Data Entry

Your first item:

Ctrl+q <key>

Ctrl+q (that is, holding down the Control key and typing q) is the keyboard shortcut that invokes the command quoted-insert. After this command is invoked, the next key you press makes emacs insert the character represented by that key, and suppresses that key's normal function.

For example, suppose you are doing string replacement, and you want to replace tabs by returns. When emacs prompts you to type a string to replace, you can't just press the tab key, because the normal function of the tab key in emacs is to try a command completion (and in other applications, it usually switches you to the next input field). So, here you can do Ctrl+q first, then press the tab key. Similarly, you can't press the return key and expect it to insert a return character, because normally the return key activates the OK button or signals “end of input”.

This input mechanism usually doesn't exist in other text editors. In other text editors, when you want to enter the ASCII Tab character or Carriage Return character in some pop-up dialogue, you often use a special representation such as “\t” or “\r” instead. Or, sometimes, you hold down the mouse button, then press the key. Often, they simply provide a graphical menu or check box to represent special characters. The need to input a character as is comes up frequently in key remapping applications (e.g. QuicKeys, KeyboardMaestro, IntelliType).

Data Entry for Non-printable Chars

Ctrl+q Ctrl+[, Ctrl+q Ctrl+m, Ctrl+q Ctrl+j, Ctrl+q Ctrl+i

In this, the Ctrl+q is the keyboard shortcut to invoke the command quoted-insert, which will insert a literal character of whatever character you can type on your keyboard. So, for example, Ctrl+q followed by the tab key will insert the non-printable character “tab”.

The Ctrl+[, Ctrl+m, Ctrl+j etc key-press combinations (Holding down Control key while pressing “[”, “m”, “j”), are methods to input non-printable characters that may not have a corresponding key on the keyboard.

For example, suppose you want to do string replacement, replacing Carriage Return (ASCII 13) by Line Feed (ASCII 10). Depending on your operating system and keyboard, your keyboard usually has a key that corresponds to just one of these characters. But with the special method to input non-printable characters, you can type any of the non-printable characters directly.
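The same replacement can also be done from Lisp, where the two characters are written as escapes instead of being typed with Ctrl+q. Here's a minimal sketch. The function name my-cr-to-lf is made up for illustration and is not part of emacs:

```elisp
;; Hypothetical helper (not part of Emacs): replace every
;; Carriage Return (ASCII 13) in STRING with a Line Feed (ASCII 10).
(defun my-cr-to-lf (string)
  "Return a copy of STRING with each CR replaced by LF."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    ;; "\r" is the one-character string holding ASCII 13, the same
    ;; character that Ctrl+q Ctrl+m inserts interactively.
    (while (search-forward "\r" nil t)
      ;; The two t arguments mean: keep case as is, insert "\n" literally.
      (replace-match "\n" t t))
    (buffer-string)))

(my-cr-to-lf "one\rtwo\rthree")   ; "one\ntwo\nthree"
```

Interactively, the equivalent is query-replace with Ctrl+q Ctrl+m as the search string and Ctrl+q Ctrl+j as the replacement.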

Representation of Non-printable Chars

When speaking of non-printable characters, implied in the context is some standard character set. Implicitly, we are talking about ASCII, and this applies to emacs. Now, in ASCII, there are 33 non-printable characters (codes 0 to 31, plus 127). Each of these is given a standard abbreviation, and several representations for different purposes. For example, ASCII 13 is the “Carriage return” character, with standard abbreviation CR, and ^M as its control-key-input representation. M is the 13th letter of the English alphabet, Control-m is the conventional means to input the character, and the conventional way to indicate a control key combination is the caret “^” followed by the character.

For the full detail, look at the table in the Wikipedia article on ASCII. Here's an excerpt of the non-printable ASCII chars table.

Dec  Hex  Abbr  PR†1  CS†2  CEC†3  Description
0    00   NUL   ␀     ^@    \0     Null character
1    01   SOH   ␁     ^A           Start of Header
2    02   STX   ␂     ^B           Start of Text
3    03   ETX   ␃     ^C           End of Text
4    04   EOT   ␄     ^D           End of Transmission
5    05   ENQ   ␅     ^E           Enquiry
6    06   ACK   ␆     ^F           Acknowledgment
7    07   BEL   ␇     ^G    \a     Bell
8    08   BS    ␈     ^H    \b     Backspace
9    09   HT    ␉     ^I    \t     Horizontal Tab
10   0A   LF    ␊     ^J    \n     Line feed
11   0B   VT    ␋     ^K    \v     Vertical Tab
12   0C   FF    ␌     ^L    \f     Form feed
13   0D   CR    ␍     ^M    \r     Carriage return
14   0E   SO    ␎     ^N           Shift Out
15   0F   SI    ␏     ^O           Shift In
16   10   DLE   ␐     ^P           Data Link Escape
17   11   DC1   ␑     ^Q           Device Control 1 (oft. XON)
18   12   DC2   ␒     ^R           Device Control 2
19   13   DC3   ␓     ^S           Device Control 3 (oft. XOFF)
20   14   DC4   ␔     ^T           Device Control 4
21   15   NAK   ␕     ^U           Negative Acknowledgment
22   16   SYN   ␖     ^V           Synchronous Idle
23   17   ETB   ␗     ^W           End of Trans. Block
24   18   CAN   ␘     ^X           Cancel
25   19   EM    ␙     ^Y           End of Medium
26   1A   SUB   ␚     ^Z           Substitute
27   1B   ESC   ␛     ^[    \e     Escape
28   1C   FS    ␜     ^\           File Separator
29   1D   GS    ␝     ^]           Group Separator
30   1E   RS    ␞     ^^           Record Separator
31   1F   US    ␟     ^_           Unit Separator
127  7F   DEL   ␡     ^?           Delete

†1 PR = Printable Representation (a glyph in Unicode). †2 CS = Control key sequence, also known as Caret Notation. †3 CEC = Character Escape Codes in the C programming language (adopted by many other languages).
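The caret notation in the CS column is not arbitrary: ^X names the character whose code is the code of X with bit 64 flipped. Here's a small sketch of that arithmetic in elisp. The function name my-caret-to-code is made up for illustration; text-char-description is a standard emacs function.

```elisp
;; Hypothetical helper (not part of Emacs).
;; ^X stands for the character whose code is (code of X) XOR 64.
;; So ^M is 77 XOR 64 = 13, and ^? is 63 XOR 64 = 127.
(defun my-caret-to-code (letter)
  "Return the code of the control character written as ^LETTER.
LETTER is a character such as ?M or ?J; lower case is accepted."
  (logxor (upcase letter) 64))

(my-caret-to-code ?M)    ; 13, Carriage Return
(my-caret-to-code ?j)    ; 10, Line Feed
(my-caret-to-code ?\[)   ; 27, Escape
(my-caret-to-code ?\?)   ; 127, Delete

;; Going the other way, Emacs can produce the caret form itself.
(text-char-description 13)    ; "^M"
(text-char-description 127)   ; "^?"
```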

In general, the practical issues involved for a non-printable character, in the context of a programming language for text editing, are: its display representation, its input method, and the display representation of its input method.

(Note: Emacs also has a general way to input non-printable or non-typable characters of the Unicode standard. See Emacs and Unicode Tips.)

String Representation of Non-printable Chars in Programming Languages

\e, \r, \n, \t

This is an ad hoc set of input and display representations for a few non-printable characters, used primarily in programming languages. This set was started by the unix tech geeking morons, and by its free and speedy nature, like cigarettes given to children, has today spread to many languages (Perl, Java, C++, C#, Python, JavaScript ...) and is a de facto standard. The damage is to such a degree that the general concept of unprintable characters, their representation, and their method of input, all treated in one systematic, simple way, is not in the consciousness of average industrial programmers.
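Elisp strings accept these escapes too. As a quick illustration (a sketch you can evaluate in the *scratch* buffer, not anything special to this article), each escape stands for exactly one character, and the caret form can be written inside a string as well:

```elisp
;; Each backslash escape in a string stands for one character.
(string-to-list "\e\r\n\t")    ; (27 13 10 9)

;; "\t" is a single Tab, not a backslash followed by the letter t.
(length "\t")                  ; 1
(string= "\t" (string 9))      ; t

;; The caret form names the same character inside a string.
(string= "\^I" "\t")           ; t
```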

There are good reasons that these are preferred over a literal character or the more systematic caret notation. Here are some reasons:

Representation of Keystrokes

In the above, we discussed non-printable chars, their representation, their input methods, and the representation of those input methods. We also discussed a representation of a subset of these non-printable chars as an “escape mechanism” that arose from the C language's strings.

However, emacs also needs a system to represent keystrokes (as used in its keyboard macro system, and in keybinding).

Note here that a keystroke combination or sequence is not the same as, and cannot be mapped to, a character's input or representation in a character set such as ASCII. For example, the F1 key on the vast majority of keyboards isn't a character. The Alt modifier key isn't a character, nor is it a function of one of ASCII's non-printable characters. The keys on the number keypad need a different representation than the ones on the main keyboard section.
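You can see the difference from inside emacs. The following sketch uses the standard function kbd to turn a key description into emacs's internal form, then looks at the first element of the result. A letter key comes out as an integer, a function key or keypad key as a symbol, and Meta as a modifier bit set on top of the character:

```elisp
;; A letter key is a character, which in elisp is an integer.
(aref (kbd "a") 0)                 ; 97

;; A function key has no character code; Emacs names it with a symbol.
(aref (kbd "<f1>") 0)              ; f1
(symbolp (aref (kbd "<f1>") 0))    ; t

;; A keypad key gets its own symbol, distinct from the main-row digit.
(aref (kbd "<kp-1>") 0)            ; kp-1
(aref (kbd "1") 0)                 ; 49

;; Meta is a modifier bit (2 to the 27th power) added to the character.
(- (aref (kbd "M-b") 0) ?b)        ; 134217728
```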

Emacs today has several rather confusing ways to represent keystrokes, mostly for historical reasons. Emacs Lisp started in about the mid 1980s. At the time, computer hardware was limited, and compiler technology was also limited. Thus, Emacs Lisp has some peculiarities.

Here are examples of multiple representation for the same keystroke:

 ; equivalent code for a single keystroke
 (global-set-key "b" 'cmd)
 (global-set-key [98] 'cmd)
 (global-set-key [?b] 'cmd)
 (global-set-key [(?b)] 'cmd)
 (global-set-key (kbd "b") 'cmd)
 
 ; code for a named special key: Enter
 ; these bind the character RET (ASCII 13, same as Ctrl+m)
 (global-set-key "\r" 'cmd)
 (global-set-key [?\r] 'cmd)
 (global-set-key [13] 'cmd)
 (global-set-key [(13)] 'cmd)
 (global-set-key [?\^M] 'cmd)
 (global-set-key [?\^m] 'cmd)
 (global-set-key [?\C-M] 'cmd)
 (global-set-key [?\C-m] 'cmd)
 (global-set-key [(?\C-m)] 'cmd)
 (global-set-key (kbd "RET") 'cmd)
 ; these two bind the function key <return>, which emacs translates
 ; to RET only when <return> itself has no binding
 (global-set-key [return] 'cmd)
 (global-set-key (kbd "<return>") 'cmd)
 
 ; equivalent code for binding 1 mod key + 1 letter key: Meta+b
 (global-set-key "\M-b" 'cmd)
 (global-set-key [?\M-b]  'cmd)
 (global-set-key [(meta 98)]    'cmd)
 (global-set-key [(meta b)]    'cmd)
 (global-set-key [(meta ?b)]    'cmd)
 (global-set-key (kbd "M-b") 'cmd)

 ; equivalent code for binding 1 mod key + 1 special key: Meta+Enter
 (global-set-key [M-return]    'cmd)
 (global-set-key [\M-return]    'cmd)
 (global-set-key [(meta return)]    'cmd)
 (global-set-key (kbd "M-<return>") 'cmd)

 ; equivalent code for binding Meta + cap letter key: Meta Shift b
 (global-set-key (kbd "M-B") 'kill-this-buffer)
 (global-set-key "\M-\S-b" 'backward-word)
 (global-set-key "\S-\M-b" 'backward-word)
 (global-set-key "\M-B" 'forward-word)

 (global-set-key [?\M-S-b] 'backward-word) ; invalid-read-syntax
 (global-set-key [?\M-?\S-b] 'forward-word) ; invalid-read-syntax
 (global-set-key [?\M-\S-b] 'forward-word) ; compile but no effect

 (global-set-key [?\M-B] 'forward-word)
 (global-set-key [\M-B] 'backward-word) ; compile but no effect

 (global-set-key [(meta shift b)] 'cmd)
 (global-set-key [(shift meta b)] 'cmd)

 (global-set-key (kbd "M-B") 'backward-word)
 (global-set-key (kbd "M-S-b") 'forward-word) ; compile but no effect

 ; Meta + shifted symbol key.
 (global-set-key (kbd "M-@") 'backward-word) ; good
 (global-set-key (kbd "M-S-2") 'forward-word) ; compile but no effect

; show examples of key sequences

This is the only part of the complexity discussed in this article that we can call an obvious flaw in emacs.
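One way to check which of these notations really name the same key is to ask emacs to describe them, or to bind a key with one notation and look it up with another. The following is a sketch using a throwaway keymap, so nothing in your global keymap is changed. All functions used are standard emacs functions; the command ignore is just a convenient placeholder.

```elisp
;; Different notations, same description.
(key-description "b")        ; "b"
(key-description [98])       ; "b"
(key-description [?b])       ; "b"
(key-description "\r")       ; "RET"
(key-description [?\C-m])    ; "RET"
(key-description [?\M-b])    ; "M-b"

;; The function key <return> is described differently from the char RET.
(key-description [return])   ; "<return>"

;; The list notation is converted to a single event.
(event-convert-list '(meta b))        ; 134217826, same as ?\M-b
(event-convert-list '(meta return))   ; M-return

;; Bind with one notation, look up with two others.
(let ((map (make-sparse-keymap)))
  (define-key map (kbd "M-b") 'ignore)
  (list (lookup-key map [?\M-b])
        (lookup-key map "\M-b")))     ; (ignore ignore)
```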

(Note: keystroke representation (aka keycode) using ASCII chars is not a new concept. For example, the xmodmap utility of X11 reads a keycode file for its keymaps. Apple's OS X has its own keycode. (sample X11 keycode file: dvorakKeymap.txt; for a sample of OS X keycode syntax, see: How To Create Your Own Keybinding In Mac Os X))

Char as Integers

One of emacs's quirks is that its characters are simply integers. So, the character “c” is just the integer 99 in elisp. Now, elisp has a special read syntax for chars, so that the integer 99 can also be written as “?c”. This way, it is easier for programmers to put character data in their programs, and the code is much clearer to read. A backslash can be added in front of the char, so that “?'” can be written as “?\'”. This syntax was introduced in part so that emacs's lisp editing commands don't get confused. Many of the control characters in ASCII also have a backslash representation. Here's a table from the Elisp Manual: Character-Type:

     ?\a ⇒ 7                 ; control-g, C-g
     ?\b ⇒ 8                 ; backspace, <BS>, C-h
     ?\t ⇒ 9                 ; tab, <TAB>, C-i
     ?\n ⇒ 10                ; newline, C-j
     ?\v ⇒ 11                ; vertical tab, C-k
     ?\f ⇒ 12                ; formfeed character, C-l
     ?\r ⇒ 13                ; carriage return, <RET>, C-m
     ?\e ⇒ 27                ; escape character, <ESC>, C-[
     ?\s ⇒ 32                ; space character, <SPC>
     ?\\ ⇒ 92                ; backslash character, \
     ?\d ⇒ 127               ; delete character, <DEL>
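Since a character is just an integer, ordinary arithmetic works on it. A short sketch you can evaluate to convince yourself:

```elisp
;; The read syntax ?c is just another way to write the integer 99.
(= ?c 99)            ; t
(integerp ?c)        ; t

;; Arithmetic on characters is plain integer arithmetic.
(+ ?c 1)             ; 100
(string (+ ?c 1))    ; "d"

;; The backslash form of a punctuation char is the same integer.
(= ?\' 39)           ; t, the apostrophe
(= ?\\ 92)           ; t, the backslash itself
```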

So, now, the character tab (ASCII 9) can be written in elisp as “9” or “?\t”. (Note that “?t” without the backslash is the letter t, integer 116, not the tab.)

Here's more from the manual:

Control characters may be represented using yet another read syntax. This consists of a question mark followed by a backslash, caret, and the corresponding non-control character, in either upper or lower case. For example, both `?\^I' and `?\^i' are valid read syntax for the character C-i, the character whose value is 9.

Instead of the `^', you can use `C-'; thus, `?\C-i' is equivalent to `?\^I' and to `?\^i':

     ?\^I ⇒ 9     ?\C-I ⇒ 9

... The read syntax for meta characters uses `\M-'. For example, `?\M-A' stands for M-A. You can use `\M-' together with octal character codes (see below), with `\C-', or with any other syntax for a character. Thus, you can write M-A as `?\M-A', or as `?\M-\101'. Likewise, you can write C-M-b as `?\M-\C-b', `?\C-\M-b', or `?\M-\002'.

So now, the tab char can be any of:

9    ?\t    ?\^i    ?\^I    ?\C-i    ?\C-I
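These claims are easy to check by evaluation. A sketch, using only read syntax described in the manual passages quoted above:

```elisp
;; Every backslash spelling of the tab char reads as the integer 9.
(list 9 ?\t ?\^i ?\^I ?\C-i ?\C-I)   ; (9 9 9 9 9 9)

;; Without the backslash, ?t is the plain letter t.
?t                                   ; 116

;; Meta sets bit 2 to the 27th power on top of the base character.
(= ?\M-A (+ (expt 2 27) ?A))         ; t
(= ?\M-A ?\M-\101)                   ; t

;; The three spellings of C-M-b from the manual agree.
(= ?\M-\C-b ?\C-\M-b ?\M-\002)       ; t
```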

THE FOLLOWING PASSAGE IS INCOMPLETE

Key Sequence Data Type

(info "(elisp)Key Sequences")



Page created: 2007-05.
© 2007 by Xah Lee.