Find and Replace Multiple Pairs of Strings

Xah Lee, 2005-02

Python

The following script lets you do find and replace of multiple find/replace pairs of strings in one shot. The strings themselves can include line-ending characters such as “\n”.

I have online versions of many works of classical literature. Often, they use plain double quotes " " instead of the better curly quotes “ ”. I wrote this script to replace double quotes with curly ones, based on adjacent characters. (See the result here: Arabian Nights)

For example, here are the find strings and replace strings:

<p>"    →     <p>“
" "     →     ” “
!"      →     !”
?"      →     ?”
\n"     →     \n“
, "     →     , “
: "     →     : “
."      →     .”
,"      →     ,”

Note: Turning double quotes into bracketing curly quotes is not a mechanical process. In the convention of novel printing, the ending quote is sometimes omitted when a quotation runs for a whole paragraph, so one cannot assume that quotes come in matching pairs. Also, even where quotes are supposed to come in matching pairs, a program that relies on that would be thrown off by a single missing quote, and every quote after it would come out wrong. So, instead, we use a heuristic approach. For example, if the text contains «!"» then we are pretty sure that it is a closing quote, so we replace it with «!”». Similarly for the other cases. Text converted this way still requires proofreading.
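
To see why proofreading is still needed, here is a minimal sketch (Python 3) that applies the quote pairs listed above to a string in memory instead of a file. The names `pairs`, `curl_quotes`, and `sample` are made up for this illustration, and the sample sentence is invented.

```python
# Python 3. A sketch of the heuristic on an in-memory string.
# `pairs`, `curl_quotes`, and `sample` are hypothetical names for illustration.

pairs = [
    ('<p>"', '<p>“'),
    ('" "', '” “'),
    ('!"', '!”'),
    ('?"', '?”'),
    ('\n"', '\n“'),
    (', "', ', “'),
    (': "', ': “'),
    ('."', '.”'),
    (',"', ',”'),
]


def curl_quotes(text):
    """Apply each (find, replace) pair in list order and return the new text."""
    for find_str, rep_str in pairs:
        text = text.replace(find_str, rep_str)
    return text


sample = '<p>"Who is there?" he asked. She said, "It is I," and opened the door "wide".'
result = curl_quotes(sample)
print(result)
# prints: <p>“Who is there?” he asked. She said, “It is I,” and opened the door "wide".
```

The quotes around «wide» are left straight because no pair matches « "w» or «".». Cases like this are what the proofreading pass has to catch.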

# -*- coding: utf-8 -*-
# Python 3

import os

mydir = '/Users/t/web/p/arabian_nights'

# (find string, replace string) pairs, applied in the order listed
findreplace = [
    ('<p>"', '<p>“'),
    ('" "', '” “'),
    ('!"', '!”'),
    ('?"', '?”'),
    ('\n"', '\n“'),
    (', "', ', “'),
    (': "', ': “'),
    ('."', '.”'),
    (',"', ',”'),
    ('<p>', '\n<p>'),  # put a newline before each <p>
]


def replaceStringInFile(filePath):
    "apply every pair in findreplace to the file at filePath"
    print(filePath)
    tempName = filePath + '~~~'
    # assumes the files are UTF-8 encoded
    with open(filePath, encoding='utf-8') as infile:
        s = infile.read()
    for findStr, repStr in findreplace:
        s = s.replace(findStr, repStr)
    # write to a temp file first, then move it over the original
    with open(tempName, 'w', encoding='utf-8') as outfile:
        outfile.write(s)
    os.replace(tempName, filePath)


# visit every .html file under mydir, including subdirectories
for dirPath, dirNames, fileNames in os.walk(mydir):
    for child in fileNames:
        if os.path.splitext(child)[1] == '.html':
            replaceStringInFile(os.path.join(dirPath, child))

You can use this multi-pair find and replace script for other tasks. For example, you can use it to replace Greek letter names and similar words with the actual characters: alpha α, beta β, gamma γ, Pi π, Infinity ∞, etc.
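
Here is a rough sketch of the Greek letter case in Python 3. Note that it is a variation, not the script above: a plain string replace of «alpha» would also change «alphabet» into «αbet», so this sketch swaps in a regular expression with word boundaries. The names `greek`, `pattern`, and `replace_greek_names` are made up for this illustration.

```python
# Python 3. A sketch only; uses a regex instead of str.replace.
# `greek`, `pattern`, and `replace_greek_names` are hypothetical names.
import re

greek = {
    'alpha': 'α',
    'beta': 'β',
    'gamma': 'γ',
    'Pi': 'π',
    'Infinity': '∞',
}

# \b restricts matches to whole words, so 'alphabet' is left alone
pattern = re.compile(r'\b(' + '|'.join(map(re.escape, greek)) + r')\b')


def replace_greek_names(text):
    """Replace each whole-word name found in `greek` with its symbol."""
    return pattern.sub(lambda m: greek[m.group(1)], text)


print(replace_greek_names('alpha and beta are in the alphabet; Pi < Infinity'))
# prints: α and β are in the alphabet; π < ∞
```

The match is case-sensitive, so «pi» and «Alpha» are not replaced unless you add them to the table.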

For a full-featured script that does find-replace in Perl, see: Find & Replace on Multiple Files with Perl


Page created: 2005-02.
© 2005 by Xah Lee.