Xah Lee
This page shows an example of writing an Emacs Lisp function that cleans up a file's content by repeated application of find & replace operations.
I want to write a command that does find & replace on the current file, for several pairs of {regex string, replace string}.
For example, this text:
Graphics3D[{
Polygon[{{0, -0.00004000, 2.000},
{0, -0.00003978, 2.000},
{-0.01043, -0.09920, 1.995},
{0, -0.09975, 1.995}}],
Polygon[{{0, -0.00003978, 2.000},
{0, -0.00003913, 2.000},
{-0.02074, -0.09757, 1.995},
{-0.01043, -0.09920, 1.995}}],
Polygon[{{0, -0.00003913, 2.000},
{-0.00001236, -0.00003804, 2.000},
{-0.03083, -0.09486, 1.995},
{-0.02074, -0.09757, 1.995}}]
}]
Should become this:
Graphics3D[{Polygon[{{0,-0.000,2.000},{0,-0.000,2.000},{-0.010,-0.099,1.995},{0,-0.099,1.995}}],Polygon[{{0,-0.000,2.000},{0,-0.000,2.000},{-0.020,-0.097,1.995},{-0.010,-0.099,1.995}}],Polygon[{{0,-0.000,2.000},{-0.000,-0.000,2.000},{-0.030,-0.094,1.995},{-0.020,-0.097,1.995}}]}]
Spaces and newline chars are removed, and numbers are truncated to 3 decimal places, all by find & replace with regex patterns.
I have a website of Math Surface Gallery, which contains a Java applet called JavaView that allows people to view 3D objects with real-time rotation by the mouse. For example, this is one of the Java applet pages: Costa surface applet. There are about 70 such surfaces. Each of these surfaces has a raw data file that the Java applet reads. For example, for the Costa surface above, the raw data file is: costa.mgs.gz. These files are just Mathematica graphics in plain text, compressed with gzip.
The content of the file looks like this:
Graphics3D[{{
Polygon[{{3.552, -0.001061, 2.689}, {3.552, 0.03079, 2.689},
{3.025, 0.02634, 2.524}, {3.025, -0.001061, 2.524}}],
Polygon[{{3.552, 0.03079, 2.689}, {3.550, 0.1250, 2.689},
{3.023, 0.1074, 2.524}, {3.025, 0.02634, 2.524}}],
Polygon[{…}],
…
}}]
Because each file contains tens of thousands of polygons, it can take a while for the Java applet to load it from the net. One way to reduce file size is to reduce the number of polygons. But for a given file, spaces and newline characters can also be deleted, and the decimal numbers can be safely truncated to 3 decimal places. So, typically, I open the file, call query-replace to replace ", " (comma followed by a space) with ",", delete newline chars (replacing \n with the empty string), and collapse runs of multiple spaces. To truncate decimals to 3 places, I call query-replace-regexp with pattern \([0-9]\)\.\([0-9][0-9][0-9]\)[0-9]+ and replace it with \1.\2.
For each file, I have to do multiple replacements. This process gets repetitious. It would be nice to have an Emacs command, so I can just press a button and have all these replacements done. This would reduce some 50 keystrokes and eye-balling into a single brainless button punch.
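To see what the truncation pattern does before running it on a whole file, it can be tried on a single string. The following is a minimal sketch; the helper name my-truncate-decimals is made up for this illustration and is not part of the original code. Note that the pattern truncates, it does not round, and that a number with exactly 3 decimals is left alone because the trailing [0-9]+ needs at least one extra digit.

```elisp
;; Hypothetical helper for illustration only.
(defun my-truncate-decimals (str)
  "Return STR with every decimal number cut to 3 decimal places."
  (replace-regexp-in-string
   ;; one digit, a literal dot, 3 digits to keep, then the digits to drop
   "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+"
   "\\1.\\2"
   str))

(my-truncate-decimals "{-0.01043, -0.09975, 1.995}")
;; returns "{-0.010, -0.099, 1.995}"
```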
Here's the solution:
(defun clean-mgs-buffer ()
  "Reduce size of a mgs file by removing whitespace and truncating numbers.
This command does several find & replace operations on the current buffer:
removing spaces, removing newlines, truncating numbers to 3 decimals, etc.
The goal of these replacements is to reduce the file size of a
Mathematica Graphics file (.mgs) that is read over the net by JavaView."
  (interactive)
  ;; delete all newline chars
  (goto-char 1)
  (while (search-forward "\n" nil t) (replace-match "" nil t))
  ;; collapse each run of spaces into one space
  (goto-char 1)
  (while (search-forward-regexp " +" nil t) (replace-match " " nil t))
  ;; delete the space after each comma
  (goto-char 1)
  (while (search-forward ", " nil t) (replace-match "," nil t))
  ;; truncate decimal numbers to 3 places
  (goto-char 1)
  (while (search-forward-regexp "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" nil t)
    (replace-match "\\1.\\2" t nil)))
This function is very simple. It does a series of replacements using the “while” loop, each time first moving the cursor to the beginning of the file. The core is the following 3 functions: {search-forward, search-forward-regexp, replace-match}.
The search-forward function takes a string, and moves the cursor to the end of the next piece of text that matches it. search-forward-regexp does the same, but takes a regex pattern. When the third argument is t, both return nil instead of signaling an error when nothing is found, and that ends the “while” loop. The replace-match function simply replaces the text matched by the last search.
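Here is the same search-and-replace loop on a small scale, run in a temporary buffer so it can be tried without touching any file. This is just a sketch; the function name my-strip-comma-space is made up for the example.

```elisp
;; Hypothetical helper for illustration only.
(defun my-strip-comma-space (text)
  "Return TEXT with every \", \" replaced by \",\"."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; With third argument t, search-forward returns nil when nothing
    ;; more is found, which ends the loop.
    (while (search-forward ", " nil t)
      ;; Point is now just after the matched ", ".
      (replace-match "," nil t))
    (buffer-string)))

(my-strip-comma-space "{0, -0.099, 1.995}")
;; returns "{0,-0.099,1.995}"
```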
One interesting aspect of search-forward-regexp is that you must write 2 backslashes in the pattern string to represent one regex backslash. This is because a backslash inside an Emacs Lisp string must itself be escaped by a backslash. Emacs first reads "\\(" as a string made of the 2 characters \(, and then that string is passed to Emacs's regex engine. (➲ Emacs regex tutorial)
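The backslash doubling can be checked directly by evaluating a few small expressions, for example in the *scratch* buffer. This is only an illustration; the sample string "3.14" is made up.

```elisp
;; "\\." in Lisp source is a 2-character string: a backslash and a dot.
(length "\\.")                          ; 2

;; The regex engine sees \. and matches only a literal dot.
(string-match "\\." "3.14")             ; 1 (position of the dot)

;; Without the backslash, the dot matches any character.
(string-match "." "3.14")               ; 0

;; Capture groups need the same doubling: \( \) is written "\\(" "\\)".
(string-match "\\([0-9]\\)\\." "3.14")  ; 0
(match-string 1 "3.14")                 ; "3"
```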
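Another thing of interest is that the first 2 optional parameters of the replace-match function are “fixedcase” and “literal”. Both are booleans. When fixedcase is true, the letter case of the replacement text is left exactly as written. When literal is true, the replacement text is inserted as is, without interpreting \1 and the like. That is why the last replacement in the code above passes nil for literal. (➲ Emacs Functions Documentation Lookup)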
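The effect of the two parameters is easiest to see side by side. Below is a small sketch; the helper my-replace-first and the sample strings are made up for this illustration.

```elisp
;; Hypothetical helper for illustration only.
(defun my-replace-first (text pattern newtext fixedcase literal)
  "Replace the first match of regex PATTERN in TEXT with NEWTEXT.
FIXEDCASE and LITERAL are passed on to `replace-match'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))         ; search ignores letter case
      (when (search-forward-regexp pattern nil t)
        (replace-match newtext fixedcase literal)))
    (buffer-string)))

;; fixedcase nil: the replacement follows the case of the matched text.
(my-replace-first "FOO" "foo" "bar" nil t)              ; "BAR"
;; fixedcase t: the replacement is inserted as written.
(my-replace-first "FOO" "foo" "bar" t t)                ; "bar"
;; literal nil: \1 stands for the first captured group.
(my-replace-first "x=42" "\\([0-9]+\\)" "<\\1>" t nil)  ; "x=<42>"
;; literal t: the backslash and the 1 are inserted unchanged.
(my-replace-first "x=42" "\\([0-9]+\\)" "<\\1>" t t)    ; "x=<\\1>"
```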
You can use this code as a template whenever you need a command that replaces multiple pairs in the current file.
PS: Note that in this tutorial, each replacement pair is done using a while loop, and each starts with (goto-char 1). What if you have lots of pairs?
Wouldn't it be great if you could simply write:
'( ["alpha" "α"] ["beta" "β"] ["gamma" "γ"] )
instead of writing a while loop for each pair? For a solution, see: Elisp Package: Multi-Pair String Replacement: xfrp_find_replace_pairs.el.
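To give a rough idea of how such a list of pairs could be processed, here is a minimal sketch. It is not the code or the API of the package mentioned above; the function name my-replace-pairs-in-buffer is made up, and it handles only literal strings, not regex patterns.

```elisp
;; Hypothetical sketch, not the package's actual implementation.
(defun my-replace-pairs-in-buffer (pairs)
  "Do literal find & replace in the current buffer for each pair in PAIRS.
PAIRS is a list of vectors of the form [FIND-STRING REPLACE-STRING]."
  (dolist (pair pairs)
    (goto-char (point-min))
    (while (search-forward (aref pair 0) nil t)
      ;; fixedcase and literal are both t: insert the text exactly as given
      (replace-match (aref pair 1) t t))))

(with-temp-buffer
  (insert "alpha beta gamma")
  (my-replace-pairs-in-buffer '(["alpha" "α"] ["beta" "β"] ["gamma" "γ"]))
  (buffer-string))
;; returns "α β γ"
```

The pairs are applied one after another, so text inserted by an earlier pair can be matched again by a later pair. Keep that in mind when the replacement strings overlap with the search strings.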
Addendum: Here's the Mathematica code to export graphics into a text file, forcing all numbers to be printed in a simple d.dddd format.
Otherwise, Mathematica may print numbers in various forms such as
2.25`*^-9,
\(7.2389`\),
3.141592653589793238462643383279503`20.
writeToFileRounded[expr_Graphics3D, fileName_?StringQ, prec_:4] := Module[{},
  OpenWrite[fileName];
  WriteString[fileName, "Graphics3D["];
  WriteString[fileName,
    StringReplace[
      ToString@
        NumberForm[First@SetPrecision[Chop[expr, 10^-(prec+1)], prec],
          ExponentFunction -> (If[-Infinity<#<Infinity, Null, #]&)],
      "]," -> "],\n"]];
  WriteString[fileName, "]"];
  Close[fileName]
];
writeToFileRounded[surf,"helicoid.ma",4]
(*the first argument is a Graphics3D object, the second is a name to
save to, the third is number of decimal places for the coordinate
values.*)