Xah Lee, 2005-02
The following script lets you do find and replace of multiple find/replace pairs of strings in one shot. The string themselves can include line line ending characters such as “\n”.
I have many online version of classical literatures. Often, they use plain double quotes " " instead of the better curly quotes “ ”. I wrote this script to replace double quotes by curly ones, based on ajacent characters. (See the result here: Arabian Nights)
For example, here's the find strings and replace strings:
<p>" → <p>“ " " → ” “ !" → !” ?" → ?” \n" → \n“ , " → , “ : " → : “ ." → .” ," → ,”
Note: The business of turning double quotes into bracketing curly quotes is not a mechanical process. Partly because, in the convention of novel printing, the ending quotes are sometimes omitted when quotation is paragraph long. (therefore, one cannot assume that quotes comes in matching pairs) Also, even if quotes are supposed to come in matching pairs, but if the program relies on that, a typographical missing quote in the text will screw up the rest of the text. So, instead, we use a heuristic approach. For example, if the text contain «!"» then we are pretty sure that it is a closing quote there, so we replace it with «!”». Similarly for other cases. This method of turning quotes to curly quotes requires proof reading.
# -*- coding: utf-8 -*- # Python import os,sys mydir= '/Users/t/web/p/arabian_nights' findreplace = [ ('<p>"' , '<p>“'), ('" "' , u'” “'), ('!"' , u'!”'), ('?"' , u'?”'), ('\n"' , u'\n“'), (', "' , u', “'), (': "' , u': “'), ('."' , u'.”'), (',"' , u',”'), ('<p>' , '\n<p>') ] def replaceStringInFile(filePath): "replaces all findStr by repStr in file filePath" print filePath tempName=filePath+'~~~' input = open(filePath) output = open(tempName,'w') s=input.read() for couple in findreplace: outtext=s.replace(couple[0],couple[1]) s=outtext output.write(outtext) output.close() input.close() os.rename(tempName,filePath) def myfun(dummy, dirr, filess): for child in filess: if '.html' == os.path.splitext(child)[1] and os.path.isfile(dirr+'/'+child): replaceStringInFile(dirr+'/'+child) print child os.path.walk(mydir, myfun, 3)
You can use this multi-pair find and replace script on other tasks. For example, you can use it to replace greek letter names by their actual letter alpha α, beta β, gamma γ, Pi π, and Infinity ∞ etc.
For a full-featured script that does find-replace in Perl, see: Find & Replace on Multiple Files with Perl
See also:
Page created: 2005-02. © 2005 by Xah Lee.