Xah Lee, 2005-09-01
The following is a letter frequency table for Chinese written in pinyin. The purpose of this study is to find out whether the Dvorak Keyboard Layout is also efficient for inputting Chinese with pinyin.
Tones (tone number, count):
| 4 9714 | 2 7137 | 1 6805 | 3 5125 | 5 1547 |

Letters (letter, count):
| i 12620 | n 11269 | a 9314 | u 7075 | g 6922 | e 6851 | h 6815 | o 5519 | z 3545 |
| d 3363 | s 2585 | y 2571 | j 2299 | l 1522 | b 1422 | x 1361 | c 1150 | w 1097 |
| r 1073 | m 930 | f 925 | t 881 | q 717 | k 448 | p 255 | v 12 |

(v is u umlaut, as in nv (woman), etc.)
This table was compiled by Dylan Sung, taken from his post to the newsgroup sci.lang on 2005-08-27, subject: “Letter frequency of Chinese pinyin”. (Source)
Originally, i was curious about the letter frequency of pinyin because i wondered whether the Dvorak keyboard is also more efficient than QWERTY for typing pinyin.
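One rough way to apply Dylan Sung's letter counts to that question is to total them per keyboard row on each layout. The following is a minimal sketch, not part of the original post: the names `COUNTS`, `QWERTY_ROWS`, `DVORAK_ROWS`, and `row_shares` are made up for illustration, and it assumes that home-row share is a fair proxy for typing efficiency and that tone numbers and punctuation can be ignored.

```python
# Letter counts copied from Dylan Sung's table (v stands for u umlaut).
COUNTS = {
    "i": 12620, "n": 11269, "a": 9314, "u": 7075, "g": 6922, "e": 6851,
    "h": 6815, "o": 5519, "z": 3545, "d": 3363, "s": 2585, "y": 2571,
    "j": 2299, "l": 1522, "b": 1422, "x": 1361, "c": 1150, "w": 1097,
    "r": 1073, "m": 930, "f": 925, "t": 881, "q": 717, "k": 448,
    "p": 255, "v": 12,
}

# Letter keys only; punctuation keys are left out (an assumption of this sketch).
QWERTY_ROWS = {"top": "qwertyuiop", "home": "asdfghjkl", "bottom": "zxcvbnm"}
DVORAK_ROWS = {"top": "pyfgcrl", "home": "aoeuidhtns", "bottom": "qjkxbmwvz"}


def row_shares(rows, counts):
    """Return the fraction of all letter keystrokes that falls on each row."""
    total = sum(counts.values())
    return {
        name: sum(counts[key] for key in keys) / total
        for name, keys in rows.items()
    }


for layout, rows in (("QWERTY", QWERTY_ROWS), ("Dvorak", DVORAK_ROWS)):
    shares = row_shares(rows, COUNTS)
    print(layout, {name: f"{share:.2%}" for name, share in shares.items()})
```

A higher home-row share means less finger travel, which is the usual argument for Dvorak in English; the same comparison can be read off for pinyin from the printed shares.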
For the list of letter frequencies of English text, see Wikipedia: Letter frequencies.
The following data are from http://fatduck.org/dvorak/, accessed on 2010-09-22. The author is 潘永之.
The following tables show the letter distribution on QWERTY and Dvorak. The input is a 403-word Chinese blog post written in pinyin.
QWERTY (top, home, and bottom rows; the last cell of each row is the row total):
| q 0.56% | w 2.01% | e 6.51% | r 0.32% | t 1.77% | y 2.81% | u 7.40% | i 12.70% | o 6.51% | p 0.16% | 40.76% |
| a 13.75% | s 1.53% | d 3.54% | f 0.72% | g 4.18% | h 6.27% | j 1.77% | k 0.80% | l 2.41% | 34.97% |
| z 1.93% | x 1.61% | c 3.22% | v 0.00% | b 2.01% | n 8.52% | m 1.45% | , 1.61% | . 3.94% | 24.28% |

Dvorak (top, home, and bottom rows; the last cell of each row is the row total):
| ' 0.00% | , 1.61% | . 3.94% | p 0.16% | y 2.81% | f 0.72% | g 4.18% | c 3.22% | r 0.32% | l 2.41% | 19.37% |
| a 13.75% | o 6.51% | e 6.51% | u 7.40% | i 12.70% | d 3.54% | h 6.27% | t 1.77% | n 8.52% | s 1.53% | 68.49% |
| q 0.56% | j 1.77% | k 0.80% | x 1.61% | b 2.01% | m 1.45% | w 2.01% | v 0.00% | z 1.93% | 12.14% |
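How these percentages were computed is not described in the source. The following is a minimal sketch of one way to produce tables of this shape from a pinyin text: count each key, divide by the total number of counted keystrokes, and sum each row. The function name `key_distribution`, the choice to count only letters plus `'`, `,` and `.`, and the `sample` sentence are all assumptions made for illustration.

```python
from collections import Counter

# Keys per row, in the same order as the tables above.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm,."]
DVORAK_ROWS = ["',.pyfgcrl", "aoeuidhtns", "qjkxbmwvz"]

# Count the same set of keys for both layouts so the totals match.
COUNTED_KEYS = set("".join(QWERTY_ROWS + DVORAK_ROWS))


def key_distribution(text, rows):
    """Return [(cells, row_total)] where cells is [(key, percent), ...]."""
    counts = Counter(ch for ch in text.lower() if ch in COUNTED_KEYS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no counted keys")
    table = []
    for row in rows:
        cells = [(key, 100 * counts[key] / total) for key in row]
        table.append((cells, sum(percent for _, percent in cells)))
    return table


# A made-up sample sentence, not the blog post used in the source.
sample = "ni hao, wo shi zhong guo ren."

for name, rows in (("QWERTY", QWERTY_ROWS), ("Dvorak", DVORAK_ROWS)):
    print(name)
    for cells, row_total in key_distribution(sample, rows):
        body = " | ".join(f"{key} {percent:.2f}%" for key, percent in cells)
        print(f"| {body} | {row_total:.2f}% |")
```

Spaces, digits, and other characters are skipped, so tone numbers in numbered pinyin would not affect the result.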
✻ ✻ ✻
The following is the letter distribution on QWERTY and Dvorak for a second input. The input file is all characters in GB2312, a total of 6727 characters (chinese_characters_GB2312.txt).
QWERTY (top, home, and bottom rows; the last cell of each row is the row total):
| q 1.54% | w 0.97% | e 4.94% | r 0.53% | t 1.29% | y 2.78% | u 9.94% | i 13.26% | o 6.11% | p 1.10% | 42.46% |
| a 11.80% | s 2.29% | d 1.54% | f 0.97% | g 6.53% | h 6.25% | j 2.49% | k 0.94% | l 2.25% | 35.06% |
| z 2.63% | x 1.93% | c 2.06% | v 0.12% | b 1.52% | n 12.88% | m 1.35% | 22.48% |

Dvorak (top, home, and bottom rows; the last cell of each row is the row total):
| ' 0.00% | , 0.00% | . 0.00% | p 1.10% | y 2.78% | f 0.97% | g 6.53% | c 2.06% | r 0.53% | l 2.25% | 16.22% |
| a 11.80% | o 6.11% | e 4.94% | u 9.94% | i 13.26% | d 1.54% | h 6.25% | t 1.29% | n 12.88% | s 2.29% | 70.30% |
| q 1.54% | j 2.49% | k 0.94% | x 1.93% | b 1.52% | m 1.35% | w 0.97% | v 0.12% | z 2.63% | 13.48% |