Perl-Python Tutorial: Traverse A Directory

2005-01-27

Python

Suppose you want to walk a directory tree, say, to apply a string replacement to all html files. The os.path.walk() function rises to the occasion.

# Python
import os
mydir = '/Users/xah/Documents/unix_cilre/python'
def myfun(s1, s2, s3):
    # s1 is the extra argument given to os.path.walk (unused here)
    print s2 # current dir
    print s3 # names of the files and dirs there
    print '------==(^_^)==------'
os.path.walk(mydir, myfun, 'somenull')

The os.path.walk(base_dir,f,arg) will walk a dir tree starting at base_dir, and whenever it sees a directory (including base_dir), it will call f(arg,current_dir,children), where arg is whatever you passed as the third argument, current_dir is the path of the current directory as a string, and children is a list of all children of the current directory. Specifically: a list of strings that are the names (not full paths) of the files and directories in it.
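A note for readers on newer Pythons: os.path.walk() exists only in Python 2. It was removed in Python 3 in favor of os.walk(). The calling convention described above can be sketched on top of os.walk() as follows. The helper name walk_with_callback and the demo tree are made up for illustration, and this sketch does not try to reproduce every detail of the original function.

```python
import os
import tempfile

def walk_with_callback(base_dir, f, arg):
    """Call f(arg, current_dir, children) for every directory under base_dir.

    Hypothetical helper: it mimics the calling convention of Python 2's
    os.path.walk() using os.walk(), which is available in Python 3.
    """
    for current_dir, subdirs, files in os.walk(base_dir):
        # children mixes directory names and file names, as plain strings
        children = sorted(subdirs + files)
        f(arg, current_dir, children)

def collect(visited, current_dir, children):
    # here arg is a list that accumulates what was seen
    visited.append((current_dir, children))

def demo():
    # Build a tiny throwaway tree: root/a.html and root/sub/b.txt
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, 'sub'))
        for rel in ('a.html', os.path.join('sub', 'b.txt')):
            with open(os.path.join(root, rel), 'w') as fh:
                fh.write('x')
        visited = []
        walk_with_callback(root, collect, visited)
        # Report paths relative to root so the result is reproducible
        return [(os.path.relpath(d, root), c) for d, c in visited]

if __name__ == '__main__':
    for current_dir, children in demo():
        print(current_dir, children)
```

The callback is called once for the base dir itself and once for each subdirectory, and each time the children list holds bare names, so you have to join them with current_dir yourself to get a usable path.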

Now, suppose for each file ending in .html we want to apply function g to it. So, whenever myfun is called, we need to loop through the children list, find the files ending in .html, then call g on each. Here's the code.

import os
mydir= '/Users/xah/web/SpecialPlaneCurves_dir'
def g(s): print "g touched:", s
def myfun(dummy, dirr, filess):
     for child in filess:
         if '.html' == os.path.splitext(child)[1] and os.path.isfile(dirr+'/'+child):
             g(dirr+'/'+child)
os.path.walk(mydir, myfun, 3)

Note that “os.path.splitext()” splits a string into two parts: the portion before the last period, and the rest (the last period included). Effectively it is used for getting the file suffix. The “os.path.isfile()” makes sure that this is an actual file and not a dir with a “.html” suffix.
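To make the suffix test concrete, here is a small sketch for Python 3, where os.walk() takes the place of os.path.walk(). The function name html_files and the throwaway demo tree are my own inventions for illustration.

```python
import os
import tempfile

# splitext() splits at the last period; the suffix keeps its leading dot
print(os.path.splitext('index.html'))      # ('index', '.html')
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
print(os.path.splitext('README'))          # ('README', '')

def html_files(base_dir):
    """Return sorted paths of regular files under base_dir ending in .html."""
    found = []
    for current_dir, subdirs, files in os.walk(base_dir):
        for child in subdirs + files:
            path = os.path.join(current_dir, child)
            # isfile() filters out directories that happen to end in .html
            if os.path.splitext(child)[1] == '.html' and os.path.isfile(path):
                found.append(path)
    return sorted(found)

def demo():
    # Tree: root/a.html, root/notes.txt, root/fake.html/ (a dir), root/fake.html/c.html
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, 'fake.html'))
        for rel in ('a.html', 'notes.txt', os.path.join('fake.html', 'c.html')):
            with open(os.path.join(root, rel), 'w') as fh:
                fh.write('x')
        return [os.path.relpath(p, root) for p in html_files(root)]

if __name__ == '__main__':
    print(demo())
```

In the demo, the directory named fake.html passes the suffix test but fails the isfile() test, so it is skipped, while the file inside it is still found.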

One important thing to note: the path in mydir must not end in a slash. One'd think Python'd take care of such trivia but no. This took me a while to debug. (As of Python 2.4.2, this is fixed.)
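I can't say exactly what went wrong inside the older Python versions, but one way to make your own path building insensitive to a trailing slash is to use join() instead of gluing strings with '/'. A small sketch, using posixpath (the Unix flavor of os.path) so the results are the same on any platform:

```python
import posixpath  # os.path is this module on Unix-like systems

base_with_slash = '/Users/xah/web/SpecialPlaneCurves_dir/'
base_without = '/Users/xah/web/SpecialPlaneCurves_dir'

# Plain concatenation doubles the slash when the base already ends in one
glued = base_with_slash + '/' + 'index.html'
print(glued)   # /Users/xah/web/SpecialPlaneCurves_dir//index.html

# join() inserts a separator only when one is needed
joined_a = posixpath.join(base_with_slash, 'index.html')
joined_b = posixpath.join(base_without, 'index.html')
print(joined_a)  # /Users/xah/web/SpecialPlaneCurves_dir/index.html
print(joined_b)  # /Users/xah/web/SpecialPlaneCurves_dir/index.html

# normpath() strips a trailing slash and collapses doubled ones
cleaned = posixpath.normpath(base_with_slash)
print(cleaned)   # /Users/xah/web/SpecialPlaneCurves_dir
```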

Also, the semantics of “os.path.walk()” are nice. The myfun can be a recursive function, calling itself, crystallizing a program's semantics.

Reference: Python Doc↗.

Perl

In Perl, use the package “File::Find”'s “find” function to traverse a dir. Example:

# perl
use File::Find qw(find);
$mydir= '/Users/xah/web/SpecialPlaneCurves_dir';

sub wanted {
    if ($_ =~/\.html$/ && -T $File::Find::name)
      { print $File::Find::name, "\n";}
}

find(\&wanted, $mydir);

The line “use File::Find qw(find);” imports the “find” function. The “find” function is a directory walker. It will visit every file and subdirectory in a given directory. For each, it sets the variable “$_” to the name of the file, sets the variable “$File::Find::name” to the full path of the current file, and sets the variable “$File::Find::dir” to the full path of the current dir.

The “find” function has 2 parameters. The first is a reference to a function that will be called each time “find” visits a file. The second is the path you want to traverse.

Note: The name “wanted” is just a convention used by the “File::Find” package. When your function “wanted” is called, nothing is passed to it as argument. This means you cannot write your “wanted” function in a functional programming style, as a function that takes a file path as its parameter. Instead, you must read the variable “$File::Find::name” or “$_” inside the body of “wanted” to know the current file name.

Note: also, “wanted” cannot be written as a recursive function that calls itself to descend into subdirs.

Reference: perldoc File::Find↗.



Page created: 2005-01.
© 2005 by Xah Lee.