Xah Lee
This page gives some tips about using emacs with Unicode. If you work in two languages, or type a lot of math symbols, you'll find this page useful.
This page covers emacs 23 (released in 2009-07). You should use emacs 23 if Unicode is important to you, because emacs 23 uses Unicode as its internal encoding and also supports OS fonts.
How to type this character é ?
Here's a table on how to type these chars:
| Character | Key Press |
|---|---|
| é | 【Ctrl+x 8 ' e】 |
| à | 【Ctrl+x 8 ` a】 |
| î | 【Ctrl+x 8 ^ i】 |
| ñ | 【Ctrl+x 8 ~ n】 |
| ü | 【Ctrl+x 8 " u】 |
To see all characters you can type this way, press 【Ctrl+x 8 Ctrl+h】. Examples: ¿ ¡ ¢ £ ¥ ¤ § ¶ ® © ª «» × ÷ ¬ ° ± µ ÀÁÂÃÄÅÆ Ç ÈÉÊË ÌÍÎÏ ÐÑ ÒÓÔÕÖ ØÙÚÛÜÝÞß àáâãäåæç èéêë ìíîï ðñòóôõö øùúûüýþÿ.
If you need to type these chars often, you can set your input method to “latin-9-prefix”. (type 【Alt+x set-input-method】). That will allow you to type these chars without typing 【Ctrl+x 8】 first.
(Emacs's “latin-9-prefix” corresponds to the char set ISO 8859-15, also known as Latin-9.)
If you are on a Mac, these characters can be typed by holding down the Option key or by using the Character Palette. On Windows, you can use Windows Alt keycodes or Charmap.
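If you want the input method switched on automatically, a minimal sketch for your emacs init file might look like the following. The use of text-mode-hook is an assumption made for illustration; pick whichever mode hook fits the files you edit.

```elisp
;; Sketch: turn on latin-9-prefix automatically in text-mode buffers.
;; text-mode-hook is only an example choice, not something this page prescribes.
(add-hook 'text-mode-hook
          (lambda ()
            (set-input-method "latin-9-prefix")))
```

In a buffer where it is active, 【Ctrl+\】 turns the input method off again.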
How to insert a Unicode character by name?
Call ucs-insert 【Ctrl+x 8 Enter】, then type the name of the Unicode character. For example, try inserting “λ”. Its name is “GREEK SMALL LETTER LAMDA”.
You can use an asterisk * as a wildcard. For example, call ucs-insert, type *arrow, then press Tab ⇆, and emacs will show all chars with “arrow” in their names.
How to insert a Unicode character by its hexadecimal value?
Call ucs-insert 【Ctrl+x 8 Enter】, then type the hex value of the character. For example, try inserting “λ”. Its hex value is “3bb”.
How to insert a Unicode character by its decimal value?
Call eval-expression, then type (ucs-insert 955).
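If you only know one form of the number, emacs lisp can convert between decimal and hex for you. The lines below are a small sketch using λ (955, hex 3bb) from the examples above; evaluate them one at a time with eval-expression or in the *scratch* buffer.

```elisp
;; 955 is the decimal code point of λ; #x3bb is the same number in hex.
(format "%x" 955)            ; decimal to hex string: "3bb"
(string-to-number "3bb" 16)  ; hex string to decimal: 955
(char-to-string 955)         ; code point to a one-char string: "λ"
(insert #x3bb)               ; insert λ at point, using the hex read syntax
```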
How to open a Unicode character palette?
You can put frequently used Unicode chars into a file and save it, and define a keystroke to open this file, so that you can copy and paste the chars you want. Here's how you can define a keystroke to open a file. Put the following in your emacs init file.
;; open my Unicode template with F8 key
(global-set-key (kbd "<f8>")
  (lambda () (interactive) (find-file "~/emacs.d/my_unicode_template.txt")))
Here's an example of a template: unicode.txt.
You can also install the xub Unicode Browser mode. It lets you easily browse a file of Unicode chars.
How to set a keystroke to insert a Unicode char?
For example, put the following code in your emacs init file.
(global-set-key (kbd "<f11>") "λ") ; make F11 key insert lambda
You can also bind a key sequence, like this:
(global-unset-key (kbd "M-i")) ; M-i runs tab-to-tab-stop by default; free it for use as a prefix
(global-set-key (kbd "M-i a") "α")
(global-set-key (kbd "M-i b") "β")
With the above, typing 【Alt+i a】 will insert α. This way you can set a whole collection of Unicode chars.
Alternatively, you can use “key-translation-map”:
(define-key key-translation-map (kbd "M-i a") (kbd "α"))
(define-key key-translation-map (kbd "M-i b") (kbd "β"))
For the difference, see: Emacs: Remapping Keys Using key-translation-map. For OS-wide, see: How to Create a APL or Math Symbols Keyboard Layout.
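To set up a whole collection of chars under one prefix with global-set-key, a loop saves repetition. This is a minimal sketch: the letters chosen are examples only, and it assumes you are happy to give up the default M-i binding (tab-to-tab-stop).

```elisp
;; Sketch: bind a batch of Greek letters under the M-i prefix.
;; The key/char pairs below are examples only.
(global-unset-key (kbd "M-i")) ; free M-i so it can act as a prefix key
(dolist (pair '(("a" . "α")
                ("b" . "β")
                ("g" . "γ")
                ("l" . "λ")))
  ;; (car pair) is the key typed after M-i, (cdr pair) is the char to insert
  (global-set-key (kbd (concat "M-i " (car pair))) (cdr pair)))
```

After evaluating this, 【Alt+i l】 inserts λ, and adding a char is one more line in the list.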
How to use abbrev to input Unicode chars?
Put the following in your emacs init file:
(define-abbrev-table 'global-abbrev-table '(
("alpha" "α" nil 0)
("beta" "β" nil 0)
("gamma" "γ" nil 0)
("theta" "θ" nil 0)
("inf" "∞" nil 0)
("ar1" "→" nil 0)
("ar2" "⇒" nil 0)
))
(setq-default abbrev-mode t) ; turn on abbrev mode in all buffers
Select the code above and call eval-region 【Alt+x eval-region】.
Now, type alpha followed by a space, and it will become “α ”.
If you do math a lot, use Emacs Math Symbols Input Mode (xmsi-mode).
How to type Chinese?
Regardless of what text editor you are using, you need to do two things: ① Set your editor's character encoding to one that supports your language. ② Set your input method to one suitable for your language.
Character encoding tells your computer how to map characters into binary code. An input method allows you to type languages that are not based on the Latin alphabet. (For example, in Chinese, you cannot just type a character by pressing a key; instead, you must use an input method.) For English and most European languages, you don't need to worry about input methods.
To set your file encoding in emacs, use the menu 〖Options▸Mule (Multilingual Environment)▸Set Language Environment〗.
To set your input method, use the menu 〖Options▸Mule (Multilingual Environment)▸Select Input Method…〗.
After you've made your choices in these menus, be sure to also choose the menu command 〖Options▸Save Options〗 so that emacs remembers your settings.
For me, I type Chinese sometimes. There are several encoding systems that support Chinese, for example GB 18030 (used in China), Big5 (popular in Taiwan), UTF-8 and UTF-16. I use the UTF-8 encoding system. Among the Chinese input methods, I use the Pinyin method. Here's how to set them in emacs without using the menu: 【Alt+x set-language-environment UTF-8】 and 【Alt+x set-input-method chinese-py】.
Here's an example of actually typing the Chinese char 美 (meaning beautiful). Type 【Alt+x set-input-method Enter chinese-py】, then type “mei”. Emacs will show you a list of characters with the pronunciation of mei. Type “2” to pick the correct character. Then, emacs will insert the character. To turn off the input method, press 【Ctrl+\】.
An in-depth tutorial on using the Mac with Chinese is at: http://www.yale.edu/chinesemac/. It includes comprehensive info and resources on Chinese fonts, complete tutorials on several Chinese input methods, etc.
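The encoding and input method settings described above can also go in your emacs init file, so you don't need the menu or 【Alt+x】 each session. A minimal sketch, assuming you want UTF-8 and the Pinyin method as in the example above:

```elisp
;; Sketch: UTF-8 language environment, with chinese-py as the default input method.
;; Set the language environment first, because it can reset the default input method.
(set-language-environment "UTF-8")
(setq default-input-method "chinese-py") ; the method that Ctrl+\ switches on
```

With this in place, 【Ctrl+\】 toggles chinese-py on and off in the current buffer.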
How to find out what's the current input method?
Call describe-variable 【F1 v】 then type “current-input-method”.
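The same variable can be read from lisp code. As a small sketch, evaluating the following with eval-expression shows the input method of the current buffer in the echo area:

```elisp
;; current-input-method is nil when no input method is active in this buffer.
(message "Input method: %s" (or current-input-method "none"))
```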
I have this character λ on the screen. How to find out its Unicode hex value or name?
You can find out a char's info by placing your cursor on the character, then calling describe-char.
Following is the output of describe-char on char “λ” in Emacs 23:
character: λ (955, #o1673, #x3bb)
preferred charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
code point: 0x03BB
syntax: w which means: word
category: .:Base, G:2-byte Greek, c:Chinese, g:Greek, h:Korean, j:Japanese
buffer code: #xCE #xBB
file code: #xCE #xBB (encoded by coding system utf-8-unix)
display: by this font (glyph code)
uniscribe:-outline-DejaVu Sans Mono-normal-normal-normal-mono-13-*-*-*-c-*-iso10646-1 (#x301)
Unicode data:
Name: GREEK SMALL LETTER LAMDA
Category: Letter, Lowercase
Combining class: Ll
Bidi category: Ll
Old name: GREEK SMALL LETTER LAMBDA
Uppercase: Λ
Titlecase: Λ
Character code properties: customize what to show
name: GREEK SMALL LETTER LAMDA
old-name: GREEK SMALL LETTER LAMBDA
general-category: Ll (Letter, Lowercase)
canonical-combining-class: 0 (Spacing, split, enclosing, reordrant, and Tibetan subjoined)
bidi-class: L (Left-to-Right)
mirrored: N
uppercase: 923 (Λ)
titlecase: 923 (Λ)
There are text properties here:
fontified t
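If you only want the code point and name, without the full report, a short command can show them in the echo area. This is a sketch under assumptions: the command name my-show-char-code is made up for this example, and the name lookup relies on get-char-code-property, which returns nil for chars emacs has no data on.

```elisp
;; Sketch: show the char at point, its code point, and its Unicode name.
;; my-show-char-code is a hypothetical name; rename it as you like.
(defun my-show-char-code ()
  "Show code point and Unicode name of the char at point in the echo area."
  (interactive)
  (let ((c (char-after))) ; nil at end of buffer
    (if c
        (message "%c  U+%04X  decimal %d  %s"
                 c c c
                 (or (get-char-code-property c 'name) "(no name data)"))
      (message "No character at point"))))
```

Call it with 【Alt+x my-show-char-code】 while the cursor is on a char.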
Unicode version 6 (released in 2010-10) added about 1k more symbols. Emacs 23.2 does not have info on these new symbols. (e.g. 😸 GRINNING CAT FACE WITH SMILING EYES) (➲ Unicode 6 Emoticons)
You can get this info by downloading the Unicode data file and letting emacs know where it is. Download it at: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt, then place the following code in your “.emacs”.
;; set Unicode data file location. (used by what-cursor-position and describe-char)
(let ((x "~/emacs.d/UnicodeData.txt"))
  (when (file-exists-p x)
    (setq describe-char-unicodedata-file x)))
Select the above code, then call eval-region. Then, you will have full Unicode char info when calling describe-char.
See also: xub Unicode Browser mode for Emacs.