Development

Every so often you will find yourself needing to write code that traverses a directory. In my experience these tend to be one-off scripts or cleanup scripts that run from cron. Python provides several very useful ways of walking a directory structure. We cover the best of them.

Testing directory structure

Here is my test filesystem tree, rooted at /test:

~] tree -a /test
/test
├── A
│   ├── AA
│   │   └── aa.png
│   ├── a.png
│   └── a.txt
├── B
│   ├── BB
│   └── b.txt
├── broken_symlink -> /aaa
├── symlink -> /etc
├── .test
├── test.png
└── test.txt
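
To experiment without touching a real /test, the same layout can be recreated under a temporary directory. The following is only a minimal sketch: the helper name build_test_tree is hypothetical, .test is assumed to be a regular file (the listing does not say), and the symlink calls assume a Unix-like system.

```python
import os
import tempfile


def build_test_tree(root):
    """Recreate the sample layout under `root` (hypothetical helper)."""
    # Directories, including the empty B/BB.
    for d in ("A/AA", "B/BB"):
        os.makedirs(os.path.join(root, d), exist_ok=True)
    # Empty regular files; `.test` is assumed to be a file, not a directory.
    for f in ("A/AA/aa.png", "A/a.png", "A/a.txt", "B/b.txt",
              ".test", "test.png", "test.txt"):
        open(os.path.join(root, f), "w").close()
    # Symlinks with the same targets as in the listing:
    # /aaa (expected to be dangling) and /etc (a real directory).
    os.symlink("/aaa", os.path.join(root, "broken_symlink"))
    os.symlink("/etc", os.path.join(root, "symlink"))


root = tempfile.mkdtemp(prefix="walk_test_")
build_test_tree(root)
top_level = sorted(os.listdir(root))
print(top_level)
```

The tree lives in a fresh temporary directory, so it can be removed with shutil.rmtree(root) once you are done.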

Python os.walk()

os.walk()
os.walk(top, topdown=True, onerror=None, followlinks=False)
  • Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
  • dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name). Whether or not the lists are sorted depends on the file system. If a file is removed from or added to the dirpath directory while the lists are being generated, it is unspecified whether a name for that file is included.
  • If optional argument topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up). No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.
  • By default, walk() will not walk down into symbolic links that resolve to directories. Set followlinks to True to visit directories pointed to by symlinks, on systems that support them.
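
The points above can be sketched in a few lines. This example builds its own small throwaway tree (the directory and file names are illustrative, loosely modelled on the test tree), then shows a top-down walk that joins dirpath with each file name, in-place pruning of dirnames (which os.walk() honours when topdown is True), and a bottom-up walk.

```python
import os
import tempfile

# Build a small throwaway tree so the example is self-contained.
top = tempfile.mkdtemp(prefix="walk_demo_")
os.makedirs(os.path.join(top, "A", "AA"))
os.makedirs(os.path.join(top, "B", "BB"))
for rel in ("A/AA/aa.png", "A/a.png", "A/a.txt", "B/b.txt", "test.txt"):
    open(os.path.join(top, rel), "w").close()

# Top-down walk: collect the full path of every .png file.
png_files = []
for dirpath, dirnames, filenames in os.walk(top):
    # Sorting dirnames in place fixes the order in which subdirectories are visited.
    dirnames.sort()
    for name in sorted(filenames):
        if name.endswith(".png"):
            png_files.append(os.path.join(dirpath, name))

# Top-down walk with pruning: removing a name from dirnames in place
# stops os.walk() from descending into that subdirectory.
visited = []
for dirpath, dirnames, filenames in os.walk(top):
    dirnames[:] = sorted(d for d in dirnames if d != "B")
    visited.append(os.path.relpath(dirpath, top))

# Bottom-up walk: a directory is yielded after all of its subdirectories.
bottom_up = [os.path.relpath(d, top) for d, _, _ in os.walk(top, topdown=False)]

print(png_files)
print(visited)
print(bottom_up)
```

Pruning only works in top-down mode: with topdown=False the list of subdirectories has already been read by the time a directory is yielded, so editing dirnames has no effect on the traversal.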

The following is a list of variables that awk sets automatically on certain occasions in order to provide information to your program. The variables that are specific to gawk are marked with a pound sign (#). These variables are gawk extensions. In other awk implementations or if gawk is in compatibility mode (see section Command-Line Options), they are not special.

VARIABLE NAME   DESCRIPTION
FS              Input field separator
OFS             Output field separator
RS              Input record separator
ORS             Output record separator
NR              Number of records
NF              Number of fields in a record
FILENAME        Name of the current input file
FNR             Number of records relative to the current input file
RLENGTH         Length of the substring matched by the match() function
RSTART          First position in the string matched by the match() function

FS - input field separator variable

FS holds the input field separator. Its default value is a single space, which makes awk split each record on runs of spaces and tabs. You can change it with the -F command-line option.
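
As a rough analogy only (this is Python, not awk), the effect of FS can be mimicked with str.split(). The sample strings below are made up for illustration.

```python
# Sample input lines (illustrative values only).
record = "  alpha   beta\tgamma  "
line = "root:x:0:0:root:/root:/bin/bash"

# With the default FS, awk splits on runs of whitespace and ignores
# leading/trailing whitespace; str.split() with no argument behaves similarly.
default_fields = record.split()

# With -F: awk uses ':' as the separator; str.split(":") is the rough equivalent.
colon_fields = line.split(":")

print(default_fields)
print(len(colon_fields), colon_fields[0], colon_fields[-1])
```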
