Python: Find & Replace Multiple Pairs of Strings


2005-02, 2011-02-08

Python

The following script applies multiple find/replace pairs to your files in one pass.

I have many classic novels on my website. Often, they use straight double quotes instead of curly quotes. I wrote this script to replace the straight double quotes with curly ones. The algorithm is a heuristic based on adjacent characters.

For example, here are the find strings and their replacement strings:

find string      replacement string
<p>"             <p>“
" "              ” “
!"               !”
?"               ?”
\n"              \n“
, "              , “
: "              : “
."               .”
,"               ,”

# -*- coding: utf-8 -*-
# Python 3

import os

mydir = '/Users/t/web/p/arabian_nights'

# (find string, replacement string) pairs, applied in this order
findreplace = [
    ('<p>"', '<p>“'),
    ('" "', '” “'),
    ('!"', '!”'),
    ('?"', '?”'),
    ('\n"', '\n“'),
    (', "', ', “'),
    (': "', ': “'),
    ('."', '.”'),
    (',"', ',”'),
    ('<p>', '\n<p>'),
]


def replaceStringInFile(filePath):
    "replaces every find string by its replacement string in file filePath"
    print(filePath)
    tempName = filePath + '~~~'
    # assumes the files are UTF-8; the curly quotes must survive the round trip
    with open(filePath, encoding='utf-8') as infile:
        s = infile.read()
    for findStr, repStr in findreplace:
        s = s.replace(findStr, repStr)
    with open(tempName, 'w', encoding='utf-8') as outfile:
        outfile.write(s)
    # swap in the new file only after it has been written completely
    os.replace(tempName, filePath)


# walk the directory tree and process every .html file
for dirr, subdirs, filess in os.walk(mydir):
    for child in filess:
        if os.path.splitext(child)[1] == '.html':
            replaceStringInFile(os.path.join(dirr, child))
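
Before running the script over a whole directory, it can help to try the pairs on a short string in memory. The following is a minimal sketch of that idea in Python 3. The names `replace_pairs` and `sample` are made up for illustration and are not part of the script, and the final `<p>` pair is left out to keep the output on one line.

```python
# -*- coding: utf-8 -*-
# Python 3

# same quote pairs as in the table, applied in order
pairs = [
    ('<p>"', '<p>“'),
    ('" "', '” “'),
    ('!"', '!”'),
    ('?"', '?”'),
    ('\n"', '\n“'),
    (', "', ', “'),
    (': "', ': “'),
    ('."', '.”'),
    (',"', ',”'),
]


def replace_pairs(text, pairs):
    "apply each (find, replacement) pair to text, one after another"
    for find_str, rep_str in pairs:
        text = text.replace(find_str, rep_str)
    return text


sample = '<p>"Who is there?" he said, "Speak."</p>'
print(replace_pairs(sample, pairs))
# prints: <p>“Who is there?” he said, “Speak.”</p>
```

A quote that matches none of the pairs, such as the ones in `the word "yes" here`, is left straight, which is one reason the result still needs proofreading.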

You can use this multi-pair find and replace script for other tasks. For example, you can use it to replace the names of Greek letters and math symbols with the actual characters: alpha α, beta β, gamma γ, Pi π, Infinity ∞, etc.
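
Here is a sketch of the Greek letter case in Python 3. It is a variation, not the script's own method: it uses `re.sub` with `\b` word boundaries instead of plain `str.replace`, so that a name inside a longer word (the "alpha" in "alphabet") is left alone. The names `greek_pairs` and `replace_words` are made up for this example.

```python
# -*- coding: utf-8 -*-
# Python 3

import re

greek_pairs = [
    ('alpha', 'α'),
    ('beta', 'β'),
    ('gamma', 'γ'),
    ('Pi', 'π'),
    ('Infinity', '∞'),
]


def replace_words(text, pairs):
    "replace each find string by its replacement, whole words only"
    for find_str, rep_str in pairs:
        # \b matches at a word boundary; re.escape keeps the find string literal
        text = re.sub(r'\b' + re.escape(find_str) + r'\b', rep_str, text)
    return text


print(replace_words('alpha and beta in the alphabet; Pi is below Infinity', greek_pairs))
# prints: α and β in the alphabet; π is below ∞
```

With plain `str.replace`, as in the quote script, "alphabet" would become "αbet", so the word boundary is worth the extra line for this kind of task.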

Note: turning straight quotes into curly quotes is not a mechanical process. This is partly because, in the convention of novel printing, the closing quote is sometimes omitted when a quotation is a paragraph long. Therefore, you cannot assume that the quotes are matched. Even if they are, it's bad to make this assumption, because a single missing quote would throw off every quote after it. So, instead, we use a heuristic approach: look at the adjacent characters and guess whether a straight quote is an opening or a closing quote. Proofreading still needs to be done afterwards.
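
To see why matched quotes cannot be assumed, here is a small Python 3 sketch that compares two approaches on a passage whose first paragraph has no closing quote. Both functions are hypothetical and written only for this comparison. `adjacent_quotes` is a simplified stand-in for the heuristic, looking only at the one character before the quote, and is not the pair list the script uses.

```python
# -*- coding: utf-8 -*-
# Python 3


def alternate_quotes(text):
    "naive: assume quotes strictly alternate open, close, open, close"
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append('“' if opening else '”')
            opening = not opening
        else:
            out.append(ch)
    return ''.join(out)


def adjacent_quotes(text):
    "heuristic: opening quote after space, newline, '>' or at the very start"
    out = []
    for i, ch in enumerate(text):
        if ch == '"':
            prev = text[i - 1] if i > 0 else '\n'
            out.append('“' if prev in ' \n>' else '”')
        else:
            out.append(ch)
    return ''.join(out)


text = ('"A long speech whose paragraph has no closing quote.\n'
        '"It goes on here," she said.')

print(alternate_quotes(text))  # second line comes out as: ”It goes on here,“ she said.
print(adjacent_quotes(text))   # second line comes out as: “It goes on here,” she said.
```

The alternating version gets every quote after the missing one wrong, while the adjacent-character guess treats each quote on its own, so one odd spot does not affect the rest of the text.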

See also: Perl: Find & Replace on Multiple Files.
