AWK as Grep
Structure of AWK programs
AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.
— Alfred V. Aho[13]
An AWK program is a series of pattern-action pairs, written as:
BEGIN {
# init code goes here
}
# "body" of the script follows:
condition 1 or /pattern-1/ {
# action1 - what to do with the line matching the pattern?
}
condition n or /pattern-n/ {
# action n - what to do with the line matching the pattern? ...
}
END {
# finalizing
}
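Here is a minimal runnable sketch of this structure. The input lines and the helper name skeleton_demo are made up for illustration. The program uses a BEGIN block, one regex pattern, one field condition and an END block. It sticks to POSIX awk features, so it should behave the same in gawk, mawk or BSD awk.

```shell
# skeleton_demo is a hypothetical helper name; the input lines are made up.
skeleton_demo() {
    printf '%s\n' 'ERROR disk full' 'INFO started' 'WARN low memory' 'ERROR timeout' |
    awk '
    BEGIN {
        # init code: runs once, before the first line is read
        errors = 0
    }
    /^ERROR/ {
        # action 1: runs for every line matching the regex
        errors++
    }
    $1 == "WARN" {
        # action n: a condition (field comparison) instead of a regex
        print "warning:", $2, $3
    }
    END {
        # finalizing: runs once, after the last line has been read
        print "errors:", errors
    }'
}

skeleton_demo
# warning: low memory
# errors: 2
```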
Here pattern is a regular expression, numeric expression, string expression or a combination of these, and action is executable code, similar to C.
Figure: awk program structure
The condition is typically an expression and the action is a series of commands. The input is split into records (lines/rows); by default records are separated by newline characters, so each record is one line. The program tests each record against each of the conditions in turn and executes the action for every condition that is true.
Either the condition or the action may be omitted. An omitted condition matches every record, and the default action is to print the matching record. This is the same pattern-action structure that sed uses.
Each line is split into columns (fields) based on the field separator, which by default is any run of consecutive whitespace characters; leading and trailing whitespace is ignored. You can change the separator with the -F switch or by assigning the FS variable in the BEGIN block.
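For example, take a colon-separated line in the style of /etc/passwd. The line is made up here, as are the helper names via_flag and via_fs. Both ways of setting the separator select the same fields. The last command shows the default whitespace splitting:

```shell
# Hypothetical colon-separated record in /etc/passwd style.
line='alice:x:1000:1000:Alice:/home/alice:/bin/sh'

# Field separator given on the command line with -F ...
via_flag() { printf '%s\n' "$line" | awk -F: '{ print $1, $6 }'; }

# ... or assigned to FS in BEGIN, before the first line is split.
via_fs() { printf '%s\n' "$line" | awk 'BEGIN { FS = ":" } { print $1, $6 }'; }

via_flag   # alice /home/alice
via_fs     # alice /home/alice

# Default FS: runs of blanks count as one separator; leading blanks are ignored.
printf '   one \t  two   three\n' | awk '{ print $2 }'   # two
```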
The columns (fields) of the current line can be accessed through the special field variables:
$0 # the whole line
$1 # first column
$2 # second column
...
$n # nth column
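Here is a short sketch of field access on one made-up line (fields_demo is a hypothetical helper name). It also uses NF, the standard awk variable holding the number of fields in the current record, so $NF is the last column:

```shell
# fields_demo is a hypothetical helper name; the input line is made up.
fields_demo() {
    printf 'GET /index.html 200 512\n' | awk '{
        print "whole line:", $0
        print "first:", $1
        print "second:", $2
        print "count:", NF
        print "last:", $NF
        n = 3
        print "nth (n=3):", $n   # $ accepts any numeric expression
    }'
}

fields_demo
# whole line: GET /index.html 200 512
# first: GET
# second: /index.html
# count: 4
# last: 512
# nth (n=3): 200
```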
BEGIN and END
The special patterns BEGIN and END cause their actions to be executed before the first record is read and after all records (lines) have been read, respectively. A range pattern, written pattern1, pattern2, matches every record from one that matches pattern1 up to and including the next record that matches pattern2; after that, awk again tries to match pattern1 on the following lines.
BEGIN and END are optional.
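Here is a small sketch of BEGIN, END and a range pattern. It uses made-up input with START and STOP marker lines, and range_demo is a hypothetical helper name. The second range is never closed, so it runs to the end of the input. Inside END, NR still holds the number of the last record:

```shell
# range_demo is a hypothetical helper name; the input lines are made up.
range_demo() {
    printf '%s\n' a START b c STOP d START e |
    awk '
    BEGIN { print "start of input" }                # before the first record
    /^START$/, /^STOP$/ { print "in range:", $0 }   # range pattern
    END { print "end of input, lines read:", NR }   # after the last record
    '
}

range_demo
# start of input
# in range: START
# in range: b
# in range: c
# in range: STOP
# in range: START
# in range: e
# end of input, lines read: 8
```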
AWK as Linux grep
The table below is a basic cheat sheet for using awk as Linux grep:
awk command | Description |
---|---|
awk '{print $1}' file | Print first field for each record in file |
awk '/regex/' file | Print only lines that match regex in file |
awk '!/regex/' file | Print only lines that do not match regex in file |
awk '$2 == "foo"' file | Print any line where field 2 is equal to "foo" in file |
awk '$2 != "foo"' file | Print lines where field 2 is NOT equal to "foo" in file |
awk '$1 ~ /regex/' file | Print line if field 1 matches regex in file |
awk '$1 !~ /regex/' file | Print line if field 1 does NOT match regex in file |
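To run one or more actions on each matching line instead of just printing it, use the general form:
awk '/search_pattern/ { action_to_take_on_matches; another_action; }' file_to_parse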
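You can try the cheat sheet without a real file. The sketch below feeds a made-up three-column sample to awk through a hypothetical helper, cheat, which takes the awk program as its only argument:

```shell
# Hypothetical whitespace-separated sample standing in for "file".
sample='alpha foo 1
beta bar 2
gamma foo 3
delta baz 4'

# cheat is a hypothetical helper: runs the awk program "$1" on the sample.
cheat() { printf '%s\n' "$sample" | awk "$1"; }

cheat '{print $1}'      # alpha, beta, gamma, delta (one per line)
cheat '/ba/'            # beta bar 2, delta baz 4
cheat '!/ba/'           # alpha foo 1, gamma foo 3
cheat '$2 == "foo"'     # alpha foo 1, gamma foo 3
cheat '$2 != "foo"'     # beta bar 2, delta baz 4
cheat '$1 ~ /^[ab]/'    # alpha foo 1, beta bar 2
cheat '$1 !~ /^[ab]/'   # gamma foo 3, delta baz 4

# Pattern with actions: number each match, then count them in END.
cheat '/foo/ { n++; print NR ": " $1 } END { print n, "matches" }'
# 1: alpha
# 3: gamma
# 2 matches
```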
Basic search
For most straightforward use cases you can simply use grep to match one or more strings or patterns, but for more complex cases awk is a useful alternative. The basic syntax to match a single PATTERN with awk is:
awk '/PATTERN/' FILE
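As a quick check of this equivalence, the sketch below runs the same search with awk and with grep on made-up lines. awk_search and grep_search are hypothetical helper names. awk patterns are POSIX extended regular expressions, so grep -E is the closest grep counterpart. For a plain string like this one, plain grep gives the same result:

```shell
# Made-up sample lines; awk_search and grep_search are hypothetical helpers.
log_lines='kernel: eth0 link up
sshd: session opened
kernel: eth0 link down'

awk_search()  { printf '%s\n' "$log_lines" | awk '/eth0/'; }
grep_search() { printf '%s\n' "$log_lines" | grep -E 'eth0'; }

awk_search    # the two "kernel: eth0 ..." lines, same as grep_search
```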
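Case-insensitive search
To perform a case-insensitive search for a string or pattern, set IGNORECASE=1 in the BEGIN block. IGNORECASE is a GNU awk (gawk) extension; other implementations such as mawk or the BSD/macOS awk treat it as an ordinary variable, so there the match stays case-sensitive: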
awk 'BEGIN{IGNORECASE=1} /PATTERN1|PATTERN2|PATTERN3/' FILE
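IGNORECASE only takes effect in GNU awk. A portable alternative is to lowercase each line with the standard tolower() function and match it against a lowercase pattern. Here is a sketch on made-up input (ci_search is a hypothetical helper name):

```shell
# ci_search is a hypothetical helper name; the input lines are made up.
ci_search() {
    printf '%s\n' 'Disk ERROR on sda' 'all good' 'minor Warning' |
        awk 'tolower($0) ~ /error|warning/'   # pattern must be written in lowercase
}

ci_search
# Disk ERROR on sda
# minor Warning
```

The original line ($0) is left untouched, so the printed output keeps its original capitalization.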
Match multiple patterns with OR condition
To print lines that match any of several patterns:
awk '/PATTERN1|PATTERN2|PATTERN3/' FILE
In regular expressions, | means logical OR (alternation).
For example, to find all lines containing Error or warning in /var/log/messages we can use:
awk '/Error|warning/' /var/log/messages
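/var/log/messages differs from system to system, so here is a sketch on a few made-up log lines (or_search is a hypothetical helper name). Warning with a capital W is not printed, because the match is case-sensitive:

```shell
# or_search is a hypothetical helper name; the log lines are made up.
or_search() {
    printf '%s\n' 'Error: disk full' 'warning: low memory' 'Warning: fan speed' 'info: ok' |
        awk '/Error|warning/'
}

or_search
# Error: disk full
# warning: low memory
```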
To make the same search case-insensitive, we use IGNORECASE:
awk 'BEGIN{IGNORECASE=1} /Error|warning/' /var/log/messages
Search for multiple patterns with AND condition
In the examples above we searched with an OR condition, i.e. a line is printed if any of the given patterns is found. To print only the lines in which all of the given patterns match, use the && (AND) operator. The syntax is:
awk '/PATTERN1/ && /PATTERN2/ && /PATTERN3/' FILE
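Each /regex/ is tested against the whole line on its own, so the AND form does not care in which order the words appear. The sketch below contrasts it with a single regex joined by .*, which only matches when Success comes first. The lines are made up, and and_any_order and and_in_order are hypothetical helper names:

```shell
# Made-up lines; and_any_order and and_in_order are hypothetical helpers.
lines='Successfully activated sshd service
Successfully reloaded service
activated after Success check'

# Each regex is tested against the whole line, so word order does not matter.
and_any_order() { printf '%s\n' "$lines" | awk '/Success/ && /activated/'; }

# One regex with .* only matches when "Success" comes before "activated".
and_in_order() { printf '%s\n' "$lines" | awk '/Success.*activated/'; }

and_any_order   # lines 1 and 3
and_in_order    # line 1 only
```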
Now we will use this syntax to search for lines containing both "Success" and "activated" in our /tmp/somefile:
~] awk '/Success/ && /activated/' /tmp/somefile
Successfully activated sshd service
Successfully activated httpd service
To perform a case-insensitive search, we use the syntax below:
awk 'BEGIN{IGNORECASE=1} /PATTERN1/ && /PATTERN2/ && /PATTERN3/' FILE
Now we use this syntax in our example:
~] awk 'BEGIN{IGNORECASE=1}; /success/ && /activated/' /tmp/somefile
Successfully activated sshd service
Successfully activated httpd service
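IGNORECASE is honoured only by GNU awk. A portable way to get the same case-insensitive AND is to lowercase the line once and test that copy against every pattern. This sketch relies on awk applying its rules to each record in program order, so the variable line is set before the second rule is tested. The input lines and the name ci_and_search are made up:

```shell
# ci_and_search is a hypothetical helper name; the input lines are made up.
ci_and_search() {
    printf '%s\n' 'SUCCESSFULLY ACTIVATED sshd service' 'successfully reloaded service' 'Successfully activated httpd service' |
    awk '
    # Rule 1 has no pattern, so it runs for every record and
    # stores a lowercased copy of the line.
    { line = tolower($0) }
    # Rule 2 is tested afterwards on the same record; its default
    # action prints the original, unmodified $0.
    line ~ /success/ && line ~ /activated/
    '
}

ci_and_search
# SUCCESSFULLY ACTIVATED sshd service
# Successfully activated httpd service
```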
Exclude multiple patterns with awk
We can also exclude lines that match certain patterns. The general syntax is:
awk '!/PATTERN1/ && !/PATTERN2/ && !/PATTERN3/' FILE
This prints only the lines that match none of the three patterns. You can add or remove patterns as required.
For example, to print all lines except the ones containing "activated":
~] awk '!/activated/' /tmp/somefile
Successfully reloaded service
Successfully stopped service
Successfully enabled service
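The sketch below reproduces this on an assumed copy of /tmp/somefile, reconstructed from the outputs shown in this section. The real file may contain other lines or a different order. exclude_one, exclude_two and exclude_alt are hypothetical helper names. The last one shows that excluding several patterns can also be written as a single negated alternation:

```shell
# Assumed contents of /tmp/somefile, reconstructed from the outputs above;
# the real file may differ.
somefile_lines='Successfully activated sshd service
Successfully reloaded service
Successfully stopped service
Successfully activated httpd service
Successfully enabled service'

exclude_one() { printf '%s\n' "$somefile_lines" | awk '!/activated/'; }
exclude_two() { printf '%s\n' "$somefile_lines" | awk '!/activated/ && !/stopped/'; }
# Same result as exclude_two: "not (A or B)" equals "not A and not B".
exclude_alt() { printf '%s\n' "$somefile_lines" | awk '!/activated|stopped/'; }

exclude_one   # reloaded, stopped and enabled lines
exclude_two   # reloaded and enabled lines only
```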