Xah Lee
This page gives a practical example of writing an Emacs major mode that does syntax coloring for your own language. You should have a few months of experience coding Emacs Lisp. If you don't know elisp, first take a look at Emacs Lisp Basics.
Your company uses its own in-house language. You want to write a major mode for that language, so that the keywords of the language will be highlighted.
Suppose your language source code looks like this:
Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]
You want the words “Sin”, “Cos”, and “Sum” colored as functions, and “Pi” and “Infinity” colored as constants.
Here's the code:
(setq myKeywords
      '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
        ("Pi\\|Infinity" . font-lock-constant-face)))

(define-derived-mode math-lang-mode fundamental-mode "math lang"
  (setq font-lock-defaults '(myKeywords)))

The string "Sin\\|Cos\\|Sum" is a regex. The “font-lock-function-name-face” is a predefined variable that holds the default face used for function names.
The define-derived-mode form defines your mode, named “math-lang-mode”, based on fundamental-mode (which is the most basic mode). Its third argument, "math lang", is an easy name displayed on the mode line, so users know what mode they are in. The line
(setq font-lock-defaults '(myKeywords))
sets up the syntax highlighting for your mode.
That's all there is to it. Select the above code and call eval-region to let emacs know about it. Then, when you call “math-lang-mode”, emacs will syntax color the buffer's text. (You must have font-lock-mode on; if it is not, call font-lock-mode.) Here's what it looks like:
Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]
OMG, Emacs is beautiful!
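If you want to check the coloring without looking at a window, one option is to fontify a temporary buffer and read back the face property. The following is only a sketch: the names my-demo-math-keywords, my-demo-math-mode, and my-demo-face-at are hypothetical stand-ins for the code above, and it assumes an Emacs recent enough to have font-lock-ensure (Emacs 25 or later).

```elisp
;; Hypothetical demo names (my-demo-...), mirroring the math mode above.
(defvar my-demo-math-keywords
  '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
    ("Pi\\|Infinity" . font-lock-constant-face))
  "Font-lock rules for the demo math mode.")

(define-derived-mode my-demo-math-mode fundamental-mode "math demo"
  "Demo major mode that highlights a few math keywords."
  (setq font-lock-defaults '(my-demo-math-keywords)))

(defun my-demo-face-at (text pos)
  "Return the face at position POS after fontifying TEXT in the demo mode."
  (with-temp-buffer
    (insert text)
    (my-demo-math-mode)
    (font-lock-ensure)                 ; fontify the whole buffer right now
    (get-text-property pos 'face)))

;; Position 1 is the "S" of "Sin"; position 4 is the "[" after it.
(my-demo-face-at "Sin[x]^2 + Cos[y]^2 == 1" 1)
(my-demo-face-at "Sin[x]^2 + Cos[y]^2 == 1" 4)
```

The first call should report the function-name face and the second should report no face, since “[” is not matched by either regex.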
Here's another simple example: Emacs Lisp: html6-mode.
Typically, a language has hundreds of keywords. Elisp has a way to generate regex for your keywords.
Suppose you are writing a mode for the Linden Scripting Language (LSL). LSL has about 553 keywords. First, here's a sample of LSL source code so you get some idea of how we want it colored.
// comment starts with two slashes

// Examples of variable declaration and assignment:
integer score = 0;
string mySay = "i ♥ you";
vector v = <3,4,5>;
list myList= [2,4,7,3];

// Example of defining a function.
// built-in function's names start with “ll” (Linden Library).
integer sum(integer a, integer b) {
    integer result = a + b;
    return result;
}

default
{
    state_entry()
    {
        llSay(0, mySay);
    }

    touch_start(integer total_number)
    {
        if (score == 1) {
            llSay(0, mySay);
        }
        else {
            llWhisper(0, "Ouch!");
        }
    }
}
Each type of keyword should get a different color.
First, we define the groups of words to be colored differently.
;; define several classes of keywords
(defvar mylsl-keywords
  '("break" "default" "do" "else" "for" "if" "return" "state" "while")
  "LSL keywords.")

(defvar mylsl-types
  '("float" "integer" "key" "list" "rotation" "string" "vector")
  "LSL types.")

(defvar mylsl-constants
  '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK")
  "LSL constants.")

(defvar mylsl-events
  '("at_rot_target" "at_target" "attach")
  "LSL events.")

(defvar mylsl-functions
  '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList")
  "LSL functions.")
In the above, we defined several variables that hold lists. Each list is a category of keywords in the LSL language. (For a real LSL mode, each list may have hundreds of elements.)
Now we generate the regex for each keyword group:
;; create the regex string for each class of keywords
(defvar mylsl-keywords-regexp (regexp-opt mylsl-keywords 'words))
(defvar mylsl-type-regexp (regexp-opt mylsl-types 'words))
(defvar mylsl-constant-regexp (regexp-opt mylsl-constants 'words))
(defvar mylsl-event-regexp (regexp-opt mylsl-events 'words))
(defvar mylsl-functions-regexp (regexp-opt mylsl-functions 'words))
In the above, we generate the regex for each keyword group using the built-in function regexp-opt. We give regexp-opt the optional second argument 'words. This creates a regex that matches only a complete word, so that a keyword contained inside a longer word will not be highlighted. (For example, “for” is usually a keyword for looping, but if you have a user-defined function named “inform”, you don't want part of the word colored as “for”.)
(info "(elisp) Regexp Functions")
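To see the whole-word behavior for yourself, you can try the generated regex on a couple of strings. This is a small sketch with a shortened keyword list; my-demo-words-regexp and my-demo-keyword-position are hypothetical names, not part of the LSL mode.

```elisp
;; Hypothetical demo regex built from a shortened keyword list.
(defvar my-demo-words-regexp (regexp-opt '("for" "if" "while") 'words)
  "Matches the listed keywords, but only as whole words.")

(defun my-demo-keyword-position (text)
  "Return the start index of the first whole-word keyword in TEXT, or nil."
  (let ((case-fold-search nil))          ; match case-sensitively
    (string-match my-demo-words-regexp text)))

(my-demo-keyword-position "inform(x)")   ; "for" is buried inside "inform"
(my-demo-keyword-position "x = 1; for")  ; "for" stands alone
```

The first call finds nothing, because “for” is in the middle of a longer word. The second call finds the stand-alone “for” and returns its index in the string.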
;; clear memory
(setq mylsl-keywords nil)
(setq mylsl-types nil)
(setq mylsl-constants nil)
(setq mylsl-events nil)
(setq mylsl-functions nil)
In the above, we clear the lists to save memory, because we don't need them anymore.
;; create the list for font-lock.
;; each class of keyword is given a particular face
(setq mylsl-font-lock-keywords
      `((,mylsl-type-regexp . font-lock-type-face)
        (,mylsl-constant-regexp . font-lock-constant-face)
        (,mylsl-event-regexp . font-lock-builtin-face)
        (,mylsl-functions-regexp . font-lock-function-name-face)
        (,mylsl-keywords-regexp . font-lock-keyword-face)
        ;; note: order above matters. “mylsl-keywords-regexp” goes last because
        ;; otherwise the keyword “state” in the function “state_entry”
        ;; would be highlighted.
        ))
In the above, we create a list in preparation to feed it to “font-lock-defaults”.
Note that the highlighting mechanism of “font-lock-defaults” works on a first-come, first-served basis. Once a piece of text gets its coloring, it won't be changed. So, the order of your list is important. Make sure the shortest text goes last. (This won't fix all cases where a keyword matches part of other keywords. If your language has a lot of such keywords, you need to use other forms to solve this problem. See (info "(elisp) Search-based Fontification").)
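The effect of rule order can be checked with a small experiment. The sketch below fontifies the text “state_entry” twice, once with the longer pattern first and once with the shorter pattern first. The names my-demo-order-rules and my-demo-faces-with-rules are hypothetical, and the plain (unanchored) regexes are chosen only to make the overlap obvious.

```elisp
(defvar my-demo-order-rules nil
  "Hypothetical variable holding the font-lock rules for this demo.")

(defun my-demo-faces-with-rules (rules text positions)
  "Fontify TEXT with RULES and return the list of faces at POSITIONS."
  (with-temp-buffer
    (insert text)
    (setq my-demo-order-rules rules)
    (setq-local font-lock-defaults '(my-demo-order-rules))
    (font-lock-ensure)
    (mapcar (lambda (pos) (get-text-property pos 'face)) positions)))

;; Longer pattern first: all of "state_entry" gets the builtin face,
;; and the later "state" rule leaves it alone.
(my-demo-faces-with-rules
 '(("state_entry" . font-lock-builtin-face)
   ("state" . font-lock-keyword-face))
 "state_entry" '(1 7))

;; Shorter pattern first: "state" takes the keyword face, and the
;; "state_entry" rule is then skipped because part of its match is
;; already colored. Position 7 (the "_") ends up with no face.
(my-demo-faces-with-rules
 '(("state" . font-lock-keyword-face)
   ("state_entry" . font-lock-builtin-face))
 "state_entry" '(1 7))
```

Position 1 is the “s” and position 7 is the “_”, so the two results show which rule won for each part of the word.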
The `( ,a ,b …) form is special lisp syntax (called backquote) for evaluating selected elements inside a list. Inside the parens, only elements preceded by a , are evaluated; everything else is kept as written.
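Here is a tiny illustration of the backquote syntax next to the same list built by hand. The function names are hypothetical and exist only for this comparison.

```elisp
(defun my-demo-rules-with-backquote (regexp)
  "Build a one-rule font-lock list from REGEXP using backquote."
  ;; Only REGEXP (after the comma) is evaluated; the face symbol is
  ;; kept as written.
  `((,regexp . font-lock-type-face)))

(defun my-demo-rules-with-list (regexp)
  "Build the same list from REGEXP with explicit `list' and `cons'."
  (list (cons regexp 'font-lock-type-face)))

(my-demo-rules-with-backquote "\\<integer\\>")
;; ⇒ (("\\<integer\\>" . font-lock-type-face))
```

Both functions return equal lists; the backquote version just reads closer to the final shape of the data.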
Finally, we define our mode like this:
;; define the mode
(define-derived-mode mylsl-mode fundamental-mode "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)…"

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mylsl-font-lock-keywords)))

  ;; clear memory
  (setq mylsl-keywords-regexp nil)
  (setq mylsl-type-regexp nil)
  (setq mylsl-constant-regexp nil)
  (setq mylsl-event-regexp nil)
  (setq mylsl-functions-regexp nil))
In the above, we based our mode on fundamental-mode, which is the most basic mode. If you are actually writing a mode for LSL, it makes sense to base it on c-mode, because the syntax is similar. Basing your mode on a similar language's mode will save you time in coding many features, such as handling comments and indentation.
At the end of your mode, you should add a provide, like this:
(provide 'mylsl-mode)
When a file with this line is loaded, emacs will add the symbol 'mylsl-mode to the variable named “features” (its value is a list). When some file calls (require 'mylsl-mode), emacs will first check whether that symbol is in the “features” list; if it is not, emacs proceeds to load the file.
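You can watch this happen in a scratch buffer. The feature name my-demo-feature below is hypothetical and used only for this illustration.

```elisp
;; `my-demo-feature' is a made-up feature name for this demo.
(provide 'my-demo-feature)           ; adds the symbol to `features'
(featurep 'my-demo-feature)          ; non-nil once the feature is provided
(memq 'my-demo-feature features)     ; the symbol is in the list
(require 'my-demo-feature)           ; already provided, so no file is loaded
```

Because the feature is already in the list, the final require returns right away without searching load-path for a file.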
For detail, see: Emacs Lisp's Library System: What's require, load, load-file, autoload, feature?.
For comment syntax coloring, you need to use a syntax table.
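As a rough sketch of what that looks like for a language with // line comments (such as LSL), the following builds a syntax table and asks emacs whether a given position is inside a comment. The names my-demo-comment-syntax-table and my-demo-in-comment-p are hypothetical; in a real mode defined with define-derived-mode, a table stored in a variable named after the mode plus “-syntax-table” is picked up automatically.

```elisp
(defvar my-demo-comment-syntax-table
  (let ((table (make-syntax-table)))
    ;; "/" is punctuation, and also the first and second character
    ;; of the two-character comment starter "//".
    (modify-syntax-entry ?/ ". 12" table)
    ;; A newline ends the comment.
    (modify-syntax-entry ?\n ">" table)
    table)
  "Hypothetical syntax table for a language with // line comments.")

(defun my-demo-in-comment-p (text pos)
  "Return non-nil if position POS in TEXT is inside a // comment."
  (with-temp-buffer
    (set-syntax-table my-demo-comment-syntax-table)
    (insert text)
    (nth 4 (syntax-ppss pos))))        ; element 4 of the parse state

(my-demo-in-comment-p "x = 1; // note" 12)  ; inside the comment
(my-demo-in-comment-p "x = 1; // note" 3)   ; in ordinary code
```

Once the syntax table knows where comments are, font-lock can color them with the comment face, provided you have not told it to use keywords only.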
To have a command that does commenting and uncommenting, you'll need to write your own function. See: Emacs Lisp: Implementing Comment Handling in a Major Mode.
There are several names associated with a major mode:
Also, all the symbols in your source code should start with some prefix such as “mylsl-”, because elisp does not have namespaces or a module system. You need to understand the basics of these issues. See: How to Name Your Emacs Major Mode.
In this tutorial, we only covered syntax coloring of fixed strings. For many languages, the text to be colored is not a fixed set of strings. For example, in XML, you have the <xyz>…</xyz> pattern, where the “xyz” can be anything.
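For a pattern like this, a font-lock rule can use a regex with a group and color only that group. The sketch below colors whatever name follows “<” or “</”. The regex is a simplification (it ignores attributes, namespaces, and so on), and the names my-demo-tag-rules and my-demo-tag-face-at are hypothetical.

```elisp
(defvar my-demo-tag-rules
  ;; The rule has the form (REGEX SUBEXP FACE): only subexpression 1,
  ;; the tag name, receives the face.
  '(("</?\\([[:alpha:]][[:alnum:]]*\\)" 1 font-lock-function-name-face))
  "Hypothetical font-lock rule that colors any XML-like tag name.")

(defun my-demo-tag-face-at (text pos)
  "Return the face at position POS after fontifying TEXT with the tag rule."
  (with-temp-buffer
    (insert text)
    (setq-local font-lock-defaults '(my-demo-tag-rules))
    (font-lock-ensure)
    (get-text-property pos 'face)))

(my-demo-tag-face-at "<xyz>text</xyz>" 2)   ; the "x" of the opening tag name
(my-demo-tag-face-at "<xyz>text</xyz>" 6)   ; the first "t" of the enclosed text
```

The tag name is colored no matter what it is, while the angle brackets and the enclosed text are left alone.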
In many languages, you have both fixed strings as keywords as well as complex patterns. For example, in HTML, when you have <b>important</b>, you also want to color the enclosed text. Look at this rendering of “html-mode”:
<h1>Some String Inside Pattern Needs Color Depending On Tag</h1>
<a href="http://example.org/">complexity in coloring links</a>
<p>Note nesting issues <b>here</b>.</p>
Note the bold and large text inside the <h1> tag. Even though the text isn't a keyword of the HTML language, it needs to be syntax colored in a particular way. The same goes for the other parts above.
Besides syntax coloring, a full-featured language mode should also handle comments, indentation, keyword completion, function documentation lookup, function template insertion, graphical menus, support for emacs's customize-group scheme, and any other features that may be useful for coding the language your mode is designed for.
The following will help you implement other features for a major mode:
(info "(elisp) Major Mode Conventions")