Xah Lee, 2008-06, …, 2012-02-13
This page shows some common programming patterns of emacs lisp for batch text processing: the kind of tasks one would typically do with unix shell tools or Perl. For example: find & replace on a list of given files or dirs, processing (small) log files, compiling a bunch of files, or generating a report.
If you don't know elisp, see: Emacs Lisp Basics.
Open a file, process it, save, close it.
;; open a file, process it, save, close it
(defun my-process-file (fPath)
  "Process the file at FPATH …"
  (let (myBuffer)
    (setq myBuffer (find-file fPath))
    ;; in case the buffer was already open (and narrowed)
    (widen)
    (goto-char (point-min))
    ;; do something
    (save-buffer)
    (kill-buffer myBuffer)))
When processing hundreds of files, you don't need emacs to keep undo info or do fontification (syntax coloring). It is much faster to insert the file content into a temp buffer, like this:
(defun my-process-file (fPath)
  "Process the file at path FPATH …"
  ;; create a temp buffer and process it; when done,
  ;; the buffer content is written to fPath
  (with-temp-file fPath
    (insert-file-contents fPath)
    ;; process it …
    ))
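As one concrete instance of the “process it …” step, here is a minimal sketch of find & replace on a single file, assuming a literal, case-sensitive string replacement is what you want. The name my-replace-in-file is made up for illustration.

```elisp
(defun my-replace-in-file (fPath from to)
  "Replace every occurrence of string FROM with TO in the file at FPATH.
FROM must be non-empty. Return the number of replacements made."
  (let ((count 0))
    ;; with-temp-file writes the temp buffer back to fPath when the body ends
    (with-temp-file fPath
      (insert-file-contents fPath)
      (goto-char (point-min))
      ;; case-sensitive, literal (non-regex) search;
      ;; the final t means "return nil instead of signaling an error"
      (let ((case-fold-search nil))
        (while (search-forward from nil t)
          ;; FIXEDCASE = t, LITERAL = t: insert TO exactly as given
          (replace-match to t t)
          (setq count (1+ count)))))
    count))
```

Note that with-temp-file rewrites the file even when nothing was replaced.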
If you don't need the result written to the file, use with-temp-buffer.
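For example, a read-only job such as counting matches only needs with-temp-buffer. This is a sketch; the function name is hypothetical, and it assumes REGEX cannot match the empty string (an empty match would not move point, so the loop would never end).

```elisp
(defun my-count-matches-in-file (fPath regex)
  "Return how many times REGEX matches in the file at FPATH.
The file itself is not modified."
  (with-temp-buffer
    (insert-file-contents fPath)
    (goto-char (point-min))
    (let ((count 0)
          (case-fold-search nil)) ; make the match case-sensitive
      ;; re-search-forward moves point past each match;
      ;; t = return nil instead of signaling an error when no match is left
      (while (re-search-forward regex nil t)
        (setq count (1+ count)))
      count)))
```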
To read a whole file into a list of lines, you can use this code:
(defun read-lines (fPath)
  "Return a list of lines of a file at FPATH."
  (with-temp-buffer
    (insert-file-contents fPath)
    ;; the t argument (OMIT-NULLS) drops empty lines
    (split-string (buffer-string) "\n" t)))
Once you have a list, you can use mapcar to process each element in the list. If you don't need the resulting list, use mapc.
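A small sketch of both, using a literal list in place of a file's lines. The function names and the “key=value” line format are hypothetical, chosen just for illustration.

```elisp
(defun my-parse-lines (lines)
  "Turn a list of \"key=value\" strings into an alist of (KEY . VALUE)."
  ;; mapcar: we want the list of results
  (mapcar (lambda (line)
            (let ((parts (split-string line "=")))
              (cons (car parts) (cadr parts))))
          lines))

(defun my-print-lines (lines)
  "Print each string in LINES on its own line. Return nil."
  ;; mapc: the function is called only for its side effect
  (mapc (lambda (line) (princ (format "%s\n" line))) lines)
  nil)
```

For example, (my-parse-lines '("a=1" "b=2")) returns (("a" . "1") ("b" . "2")).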
Note: in elisp, it's more efficient to process text in a buffer than to do complicated string manipulation with the string data type. But if your lines are all short and you don't need to know the line before or after, a list of lines can be easier to work with. For an example of line-by-line processing in a buffer, see: Process a File line-by-line in Emacs Lisp.
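For a quick idea of what the buffer version looks like, here is a minimal sketch of the usual loop: go to point-min, then step with forward-line until eobp. The function name and the “long line” criterion are hypothetical.

```elisp
(defun my-collect-long-lines (min-len)
  "Return the lines in the current buffer that are at least MIN-LEN chars long."
  (let (result)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        ;; text of the current line, without the newline or text properties
        (let ((line (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position))))
          (when (>= (length line) min-len)
            (push line result)))
        (forward-line 1)))
    ;; push built the list backwards
    (nreverse result)))
```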
Commonly used functions to manipulate file names.
(file-name-directory f)      ; get dir path
(file-name-nondirectory f)   ; get file name
(file-name-extension f)      ; get suffix
(file-name-sans-extension f) ; remove suffix
(file-relative-name f)       ; get relative path (relative to default-directory)
(expand-file-name f)         ; get full path
default-directory            ; get the current dir (this is a variable)
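To see what these return, here is a sketch applied to a made-up path (it is only parsed as a string, so it need not exist). The results shown are for a Unix-like system, and my-file-name-parts is a hypothetical helper.

```elisp
(defun my-file-name-parts (f)
  "Return an alist describing the parts of file name F.
F is only parsed as a string; the file need not exist."
  (list (cons 'dir  (file-name-directory f))
        (cons 'name (file-name-nondirectory f))
        (cons 'ext  (file-name-extension f))
        (cons 'base (file-name-sans-extension f))))

;; (my-file-name-parts "/home/joe/doc/notes.txt")
;; → ((dir . "/home/joe/doc/") (name . "notes.txt")
;;    (ext . "txt") (base . "/home/joe/doc/notes"))
;; (file-relative-name "/home/joe/doc/notes.txt" "/home/joe/") → "doc/notes.txt"
;; (expand-file-name "notes.txt" "/home/joe/doc/") → "/home/joe/doc/notes.txt"
```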
Commonly used functions to manipulate files and dirs.
(file-exists-p FILENAME)
(rename-file FILE NEWNAME &optional OK-IF-ALREADY-EXISTS)
(copy-file FILE NEWNAME &optional OK-IF-ALREADY-EXISTS KEEP-TIME PRESERVE-UID-GID)
(delete-file FILE)
(set-file-modes FILE MODE)
;; get list of file names
(directory-files DIR &optional FULL MATCH NOSORT)

;; create a dir. If PARENTS is non-nil, non-existent parent dirs are created too
(make-directory DIR &optional PARENTS)

;; delete/copy a whole dir
(delete-directory DIRECTORY &optional RECURSIVE) ; RECURSIVE option new in emacs 23.2
(copy-directory DIR NEWNAME &optional KEEP-TIME PARENTS) ; new in emacs 23.2
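Here is a sketch that exercises several of these inside a throwaway temp dir, so it is safe to run. The function name and file names are hypothetical; unwind-protect makes sure the dir is deleted even if something fails halfway.

```elisp
(defun my-file-ops-demo ()
  "Exercise common file and dir functions inside a fresh temp dir.
Return the list of .txt file names left in the dir before cleanup."
  (let* ((dir (make-temp-file "elisp-demo-" t)) ; t = create a directory
         (a (expand-file-name "a.txt" dir))
         (b (expand-file-name "b.txt" dir))
         (sub (expand-file-name "x/y" dir))
         result)
    (unwind-protect
        (progn
          (write-region "hello\n" nil a)             ; create a.txt
          (copy-file a b t)                           ; t = overwrite if exists
          (rename-file b (expand-file-name "c.txt" dir) t)
          (make-directory sub t)                      ; t = also create parent "x"
          (when (file-exists-p a)
            ;; nil = relative names; only names matching the regex
            (setq result (directory-files dir nil "\\.txt\\'"))))
      ;; always clean up (RECURSIVE arg needs emacs 23.2 or later)
      (delete-directory dir t))
    result))
```

Calling (my-file-ops-demo) should return ("a.txt" "c.txt"): the subdir x is filtered out by the regex.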
How to find the current elisp script's name programmatically?
(or load-file-name buffer-file-name)
Explanation: If your elisp script needs to know its own file name at run time, use (or load-file-name buffer-file-name). If the user runs your script with eval-buffer, the value of “load-file-name” is nil, so the expression falls back to “buffer-file-name”. Using both {load-file-name, buffer-file-name} gets you the script name regardless of whether the script is executed by load or by eval-buffer.
The result is already a full path. If you want the script's directory, call file-name-directory on the result. See also: Emacs Lisp Scripting Quirk: Relative Paths.
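A common use is to compute the script's dir once, while the file is being loaded (load-file-name is only set during loading), then build paths relative to it. A minimal sketch; my-script-dir and data.txt are hypothetical names.

```elisp
(defconst my-script-dir
  (let ((this-file (or load-file-name buffer-file-name)))
    ;; nil when neither is set, e.g. code evaluated in *scratch*
    (and this-file (file-name-directory this-file)))
  "Directory of this script, computed when the file is loaded or evaluated.")

;; Example use: a path next to the script. If my-script-dir is nil,
;; expand-file-name falls back to default-directory.
;; (expand-file-name "data.txt" my-script-dir)
```

Using defconst rather than defvar means the value is recomputed each time the file is loaded again.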
Example: make a backup file.
(defun make-backup ()
  "Make a backup copy of current buffer's file.
The new file name is the old file name postfixed with “~”, in the same dir.
If such a file already exists, append more “~”.
If the current buffer is not associated with a file, it's an error."
  (interactive)
  (let ((cfile (buffer-file-name))
        bfilename)
    ;; without this check, (concat nil "~") would silently give "~"
    (unless cfile
      (error "Current buffer is not associated with a file"))
    (setq bfilename (concat cfile "~"))
    ;; keep appending "~" until the name is not taken
    (while (file-exists-p bfilename)
      (setq bfilename (concat bfilename "~")))
    (copy-file cfile bfilename t)
    (message "Backup saved as: %s" (file-name-nondirectory bfilename))))
;; idiom for calling a shell command
(shell-command "cp /somepath/myfile.txt /somepath")

;; idiom for calling a shell command and getting its output
(shell-command-to-string "ls")
Both shell-command and shell-command-to-string will wait for the shell process to finish before continuing. To not wait, use start-process or start-process-shell-command.
(info "(elisp) Asynchronous Processes")
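A minimal sketch of the non-waiting version with start-process, assuming you want the output collected in a buffer and a note when the process ends. The function name and the buffer name *my-async* are hypothetical.

```elisp
(defun my-run-async (program &rest args)
  "Start PROGRAM with ARGS without waiting; output goes to buffer *my-async*.
Return the process object immediately."
  (let ((proc (apply #'start-process "my-async" "*my-async*" program args)))
    ;; the sentinel is called when the process changes state (e.g. exits)
    (set-process-sentinel
     proc
     (lambda (p event)
       (message "%s: %s" (process-name p)
                (replace-regexp-in-string "\n\\'" "" event))))
    proc))

;; In a batch script, where emacs would otherwise exit right away,
;; you can still wait explicitly when you need the result:
;; (let ((p (my-run-async "ls" "-l")))
;;   (while (process-live-p p)
;;     (accept-process-output p 0.1)))
```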
In the following, “my-process-file” is a function that takes a file's full path as input. “find-lisp-find-files” generates a list of full paths of the files under a dir (recursively) whose names match a regex. mapc applies the function to each element of the list.
;; idiom for traversing a directory
(require 'find-lisp)
(mapc #'my-process-file
      (find-lisp-find-files "~/web/emacs/" "\\.html$"))
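The same idiom works for generating a report. Here is a sketch that lists the files under a dir whose content contains a given string, a bit like grep -rl. The function name is hypothetical, and the search is literal and case-sensitive by choice.

```elisp
(require 'find-lisp)

(defun my-files-containing (dir name-regex needle)
  "Return a sorted list of files under DIR whose names match NAME-REGEX
and whose content contains the literal string NEEDLE."
  (let (hits)
    ;; find-lisp-find-files recurses into subdirs and returns full paths
    (dolist (f (find-lisp-find-files dir name-regex))
      (with-temp-buffer
        (insert-file-contents f)
        (goto-char (point-min))
        (let ((case-fold-search nil))
          (when (search-forward needle nil t)
            (push f hits)))))
    (sort hits #'string<)))
```

For example, (my-files-containing "~/web/emacs/" "\\.html$" "TODO") would list the html files that still contain “TODO”.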
You can run an elisp program from the operating system's command line interface (shell) using the --script option. For example:
emacs --script process_log.el
Emacs has a few other options and variations to control how you run an elisp script. Here's a table of the main options:
| full option name | short key | meaning |
|---|---|---|
| --no-init-file | -q | Do not load your init files {〔~/.emacs〕, 〔~/.emacs.el〕, 〔~/.emacs.d/init.el〕} nor the site-wide 〔default.el〕. |
| --no-site-file | ◇ | Do not load the site-wide 〔site-start.el〕. |
| --batch | ◇ | Do not launch emacs as an editor. Use it together with --load to specify a lisp file. This implies --no-init-file but not --no-site-file. |
| --load="‹path›" | -l ‹path› | Execute the elisp file at ‹path›. |
| --script ‹path› | ◇ | Run emacs like --batch with --load set to ‹path›. |
〔site-start.el〕 is an init file for site-wide running of emacs; it pretty much means an init file for all users of this emacs installation. It may be added by a sys admin, or it may be part of a particular emacs distribution (e.g. Carbon Emacs, Aquamacs Emacs, ErgoEmacs …). If it exists, you can usually find it in the directory where emacs is installed. Normally, you shouldn't worry about this file. The only time you need to disable it is when you want a pure GNU Emacs experience (without loading any packages added by a third party).
When you write an elisp script to run in batch, make sure your elisp file:

- is self-contained, and doesn't call functions defined in your emacs init file.
- loads all the libraries it needs (using require or load).
- sets the necessary load path in the script (e.g. (add-to-list 'load-path ‹lib path›)) if it needs libs that are not part of the standard GNU Emacs install, just like you would with a Perl or Python script.

If you've done a clean job in your elisp script, then all you need is emacs --script ‹elisp file path›.
If your elisp program requires functions that you've defined in your emacs init file, then you should explicitly load that file in your script with (load ‹emacs init file path›), or add the command line option to load it, like this: --user=xah. (Better still: copy the functions you need into the script.)
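Putting that advice together, here is a sketch of a self-contained script skeleton. The file name and the library dir are hypothetical; the dir is only added to load-path if it exists, and the “report” is just a placeholder.

```elisp
;;; my-batch.el --- skeleton for an emacs --script job (hypothetical name)

;; Make non-standard libs findable (hypothetical dir, added only if present).
(let ((lib-dir (expand-file-name "~/.emacs.d/lisp/")))
  (when (file-directory-p lib-dir)
    (add-to-list 'load-path lib-dir)))

;; Load every library the script uses explicitly; don't rely on an init file.
(require 'find-lisp)

(defun my-batch-main ()
  "Entry point: print a one-line report."
  ;; In batch mode, princ writes to stdout, while message writes to stderr.
  (princ (format "Emacs %s, %d dirs in load-path\n"
                 emacs-version (length load-path))))

(my-batch-main)
```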
If you are on a Mac with Carbon Emacs or Aquamacs, call the emacs binary inside the application bundle from the command line. For example, for an Emacs.app installed in /Applications:
/Applications/Emacs.app/Contents/MacOS/Emacs --script process_log.el
To get the arguments passed from the command line, use the built-in variable “argv”.
See: Getting Command Line Arguments
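A minimal sketch of a script that treats its command line arguments as file names and prints a line count for each. The script name count_lines.el and the function name are hypothetical; arguments that aren't existing regular files are skipped.

```elisp
;; Hypothetical usage: emacs --script count_lines.el a.txt b.txt

(defun my-count-lines-report (files)
  "Return a report string with the number of lines in each of FILES."
  (mapconcat
   (lambda (f)
     (with-temp-buffer
       (insert-file-contents f)
       (format "%d %s" (count-lines (point-min) (point-max)) f)))
   files
   "\n"))

(let (files)
  ;; keep only arguments that name existing regular files
  (dolist (arg argv)
    (when (file-regular-p arg)
      (push arg files)))
  (when files
    (princ (concat (my-count-lines-report (nreverse files)) "\n"))))
```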
2010-06-04 Thanks to Rubén Berenguel for a correction.
For some practical examples of batch style text processing, see: