Every so often you will find yourself needing to write code that traverses a directory. In my experience these tend to be one-off scripts or cleanup scripts that run from cron. Python provides several useful ways of walking a directory structure. We cover the best of them.
Testing directory structure
Here is my test filesystem tree. Its root is /test.
~] tree -a /test
/test
├── A
│   ├── AA
│   │   └── aa.png
│   ├── a.png
│   └── a.txt
├── B
│   ├── BB
│   └── b.txt
├── broken_symlink -> /aaa
├── symlink -> /etc
├── .test
├── test.png
└── test.txt
Python os.walk()
os.walk(top, topdown=True, onerror=None, followlinks=False)
- Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
- dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name). Whether or not the lists are sorted depends on the file system. If a file is removed from or added to the dirpath directory during generating the lists, whether a name for that file be included is unspecified.
- If optional argument topdown is True or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up). No matter the value of topdown, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.
- By default, walk() will not walk down into symbolic links that resolve to directories. Set followlinks to True to visit directories pointed to by symlinks, on systems that support them.
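The description above can be sketched as a small file finder. This is only an illustration, not a definitive implementation: the names find_files, build_sample_tree and skip_dirs are made up for this example, and only os.walk() and os.path.join() come from the standard library. The sample tree mirrors part of /test but leaves out the symlinks so that it runs on any system. Pruning by editing dirnames in place relies on documented os.walk() behavior that applies only when topdown is True.

```python
import os
import tempfile


def find_files(top, suffix, skip_dirs=()):
    """Return sorted full paths under `top` whose names end with `suffix`.

    `find_files` and `skip_dirs` are hypothetical names used for illustration.
    """
    matches = []
    # Defaults: topdown=True, followlinks=False (symlinked dirs are not entered).
    for dirpath, dirnames, filenames in os.walk(top):
        # With topdown=True, editing dirnames in place prunes the walk.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(suffix):
                # Names carry no path components, so join them with dirpath.
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def build_sample_tree(root):
    """Recreate part of the /test layout (without symlinks) under `root`."""
    os.makedirs(os.path.join(root, "A", "AA"))
    os.makedirs(os.path.join(root, "B", "BB"))
    for rel in ("A/AA/aa.png", "A/a.png", "A/a.txt", "B/b.txt",
                "test.png", "test.txt"):
        # Create an empty file at each relative path.
        with open(os.path.join(root, *rel.split("/")), "w"):
            pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_sample_tree(root)
        for path in find_files(root, ".png"):
            print(os.path.relpath(path, root))
```

Calling find_files(root, ".png", skip_dirs=("A",)) would skip the whole A subtree and leave only test.png, because the walk never descends into a directory removed from dirnames.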
The following is a list of variables that awk sets automatically on certain occasions in order to provide information to your program. The variables that are specific to gawk are marked with a pound sign (#). These variables are gawk extensions. In other awk implementations or if gawk is in compatibility mode (see section Command-Line Options), they are not special.
Variable name | Description |
---|---|
FS | Input field separator |
OFS | Output field separator |
RS | Input record separator |
ORS | Output record separator |
NR | Number of records read so far |
NF | Number of fields in the current record |
FILENAME | Name of the current input file |
FNR | Number of records read from the current input file |
RLENGTH | Length of the substring matched by the match() function |
RSTART | Position where the substring matched by the match() function begins |
FS - input field separator variable
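It represents the input field separator. Its default value is a single space, which awk treats specially: fields are separated by runs of spaces, tabs, and newlines, and leading and trailing whitespace is ignored. You can change the separator with the -F command-line option or by assigning to FS, for example in a BEGIN block.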
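For readers who know Python better than awk, the splitting rule can be approximated as below. This is a rough sketch, not a description of how awk is implemented: split_fields is a hypothetical helper, and it models only the default single-space FS and a single-character literal FS. Real awk also accepts regular expressions as FS, which this sketch ignores.

```python
def split_fields(record, fs=" "):
    """Split one record roughly the way awk does for a simple separator.

    `split_fields` is a hypothetical helper, not part of awk or Python.
    """
    if record == "":
        # awk reports NF == 0 for an empty record.
        return []
    if fs == " ":
        # Default FS: split on runs of whitespace and ignore
        # leading and trailing blanks.
        return record.split()
    # Single-character literal FS: every occurrence separates two fields,
    # so adjacent separators produce an empty field.
    return record.split(fs)


line = "root:x:0:0:root:/root:/bin/bash"
fields = split_fields(line, fs=":")  # comparable to: awk -F: '{ print $1, NF }'
print(fields[0], len(fields))        # first field and the field count (NF)
```

The length of the returned list plays the role of NF, and fields[0] corresponds to $1, since awk numbers fields from 1.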
In this awk tutorial, let us review awk conditional if statements with practical examples.
Normally, a conditional statement checks a condition before performing any action. If the condition is true, the action(s) are performed. Similarly, an action can be performed if the condition is false.
A conditional statement starts with the keyword "if". Awk supports three different kinds of if statement.
awk If Statement
Single Action: A simple if statement checks a condition. If the condition is true, the corresponding action(s) are performed.