Walker News

Using Linux Awk Regular Expressions To Read A Big Log File

Before I knew about db2diag, a DB2 utility for analyzing the contents of the DB2 diagnostic log file, I used the Linux awk command and its regular expression support to extract the diagnostic entries logged on a particular day within a specific hour.

Although I’m not a certified awk programmer, I consider myself familiar with this scripting language, which is efficient at processing text files.

So, when a DB2 server crashes, it’s quite easy for me to analyze an unmanaged db2diag.log file that has grown too big to open in a general-purpose text editor (e.g. the Vi or Vim editor in Linux).

However, db2diag only works with the db2diag.log file. If you need to open and read a huge log file from another application, the Linux awk and tail commands might work (or save your back)!

How to use Linux awk programming and regular expressions to read a big log file?

Use the Linux tail command to examine the end of the log file and understand the pattern of its log entries.
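A minimal sketch of this first step, assuming a small hypothetical file named sample-db2diag.log with illustrative content in the db2diag.log style: tail prints only the last few lines, so the file never has to be loaded into an editor.

```shell
#!/bin/sh
# Hypothetical sample log in the db2diag.log style (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
EOF

# Show only the last 3 lines instead of opening the whole file
tail -n 3 sample-db2diag.log
```

On a live system, tail -f db2diag.log keeps printing new lines as they are appended, which is another quick way to see what a typical entry looks like.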

Using db2diag.log as an example, each event or incident starts with a line that contains the date and time:
2008-01-02- I1840G300          LEVEL: Event

Then, I use awk and its regular expression support to extract all log entries that match the day and hour of interest:

First, find the record number of the first log entry that matches the date and time pattern, using awk’s regular expression (RegEx) matching:
awk '{if ($1 ~ /2008-01-16-17/){print NR}}' < db2diag.log | head -1
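On a very big file, a variation of the command above can stop reading as soon as the first match is found, instead of scanning the whole file and discarding everything after the first line with head. This is a sketch assuming the same hypothetical sample-db2diag.log (illustrative content); the ^ anchor is an addition to the original pattern that requires the date to appear at the start of the first field.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

# "pattern { action }" is the idiomatic form of "{ if (pattern) { action } }".
# Print the record number of the first 17:00-17:59 entry, then exit so the
# rest of the file is never read.
awk '$1 ~ /^2008-01-16-17/ { print NR; exit }' sample-db2diag.log
```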

Next, find the record number of the last log entry that matches the date and time pattern:
awk '{if ($1 ~ /2008-01-16-17/){print NR}}' < db2diag.log | tail -1
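The two commands above each read the whole file. As a sketch of an alternative (not the method used in this article), one awk pass can remember both the first and the last matching record numbers and print them at the end. The sample file name and contents are hypothetical, and the ^ anchor is an addition to the original pattern.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

# "first" is empty until the first match, so !first is true only once;
# "last" is overwritten on every match. END prints both (here: 3 6).
awk '$1 ~ /^2008-01-16-17/ { if (!first) first = NR; last = NR }
     END { if (first) print first, last }' sample-db2diag.log
```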

Finally, use awk again to extract all log entries between the first and last record numbers found in the previous two steps:
awk '{if (NR >= 7529 && NR <= 8382){print $0}}' < db2diag.log
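A shorter sketch of the same range extraction, assuming hypothetical shell variables first and last that hold the record numbers from the previous steps (the values below match the hypothetical sample file, not the 7529/8382 from the article). Passing them with -v avoids editing the awk program each time, and exiting once past the last record saves reading the rest of a big file.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

first=3   # hypothetical: first matching record number
last=6    # hypothetical: last matching record number

# A pattern with no action prints the whole record ($0).
# Once NR passes "last", exit stops awk from reading further.
awk -v first="$first" -v last="$last" 'NR > last { exit } NR >= first' sample-db2diag.log
```

For a plain line range, sed -n '7529,8382p' db2diag.log is an equivalent one-liner.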

Because of the nature of db2diag.log, the last record number I get from awk points to the header line of the last matching entry, so the extracted range does not include the details of the DB2 event or incident that follow that header. Thus, I purposely increase the “last record number” (e.g. if awk reports 8382 as the last record number, I reset it to 8390):
awk '{if (NR >= 7529 && NR <= 8390){print $0}}' < db2diag.log >tempfile
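Instead of guessing how many extra lines to add, another approach (an alternative, not the method described in this article) is to treat every line that begins with a date as the start of a new entry and keep printing until the next header that falls outside the hour of interest. This sketch assumes that only entry header lines start with a YYYY-MM-DD- date; the sample file name and contents are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

# On each header line (assumed to start with a date), set "keep" to 1 if the
# entry belongs to 2008-01-16 hour 17, else 0. Detail lines do not change
# "keep", so they are printed together with their own header.
awk '/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-/ { keep = ($1 ~ /^2008-01-16-17/) }
     keep' sample-db2diag.log > tempfile
```

Here tempfile ends with the PID line of the 17:45 entry, because printing stops only at the 18:00 header.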

To save the extracted log entries to a temporary file, redirect the standard output of the awk command to a file of your choice, as the sample with record numbers 7529 to 8390 does by adding >tempfile to the end of the command.
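A small sketch of the redirection options, assuming the same hypothetical sample file: mktemp creates a uniquely named temporary file, > overwrites it, and >> appends to it, which is handy when collecting several ranges into one file. The variable name out is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

out=$(mktemp)                                          # unique temp file name
awk 'NR >= 3 && NR <= 6' sample-db2diag.log > "$out"   # > overwrites the file
awk 'NR >= 8 && NR <= 9' sample-db2diag.log >> "$out"  # >> appends to it
echo "Extracted entries saved to $out"
```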

Brief notes about the awk programming syntax used in the samples above:

$1 ~ /2008-01-16-17/ checks whether the text of the 1st field/column matches the regular expression 2008-01-16-17 (anywhere within the field, since the pattern is not anchored).
Unless the field separator (FS) is specified, awk splits fields on runs of spaces and tabs by default.
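A quick sketch of the default separator versus a custom one set with -F, using illustrative input lines in the db2diag.log style. When FS is longer than one character, awk treats it as a regular expression; here ' *: *' (a colon with optional surrounding spaces) is an assumed choice for splitting "name : value" detail lines.

```shell
#!/bin/sh
# Default FS: runs of spaces/tabs separate fields, so $3 is "LEVEL:"
echo '2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error' | awk '{ print $3 }'

# Regex FS: split on a colon and any spaces around it, printing "PID=1234"
echo 'PID     : 1234' | awk -F ' *: *' '{ print $1 "=" $2 }'
```

Setting FS inside the program, as in awk 'BEGIN { FS = ":" } ...', has the same effect as -F.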

The first field (a.k.a. column) of a line (awk treats each line as a record) is denoted as $1, the 2nd field as $2, and so forth. $0 simply means the whole line/record, with all of its fields/columns.
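A sketch showing the field variables on one illustrative header line: $1 is the timestamp, $NF is the last field (NF is the field count, so $NF indexes the final field), and $0 is the unchanged line with its original spacing.

```shell
#!/bin/sh
echo '2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error' |
  awk '{
    print "first field:", $1     # the timestamp
    print "last field:", $NF     # "Error"
    print "whole record:", $0    # the line exactly as read
  }'
```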

Thus, the combination of awk programming and organized text files can form a simple database system!
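To illustrate the "simple database" idea, here is a sketch that treats each header line as a row and counts entries per LEVEL value, similar to a GROUP BY query. It assumes the hypothetical sample file and that header lines have "LEVEL:" as their third field; awk does not guarantee the order of a for-in loop, so the result is piped through sort.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

# count[] is an associative array keyed by the level name in field 4
awk '$3 == "LEVEL:" { count[$4]++ }
     END { for (level in count) print level, count[level] }' sample-db2diag.log | sort
```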

An awk regular expression constant is enclosed in a pair of slash characters (/).
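Besides the /.../ constant form, awk also accepts a string as a "dynamic" regular expression on the right of ~. This sketch, assuming a hypothetical shell variable named hour and the hypothetical sample file, passes the day and hour in with -v so the awk program does not need editing for each search.

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

hour='2008-01-16-17'   # hypothetical: the day and hour to search for

# ("^" hour) builds the string "^2008-01-16-17" at run time and uses it as a regex
awk -v hour="$hour" '$1 ~ ("^" hour) { print NR }' sample-db2diag.log
```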

The awk RegEx match operator is the tilde/swung dash character (~), and its negation is !~ (refer to the GNU awk notes on regular expressions).
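A sketch of the match operators on the hypothetical sample file: ~ tests one field against a regex, !~ selects the lines that do not match, and a bare /regex/ pattern with no operator is matched against the whole line ($0).

```shell
#!/bin/sh
# Hypothetical sample log (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-16.59.58.000000-480 I1000G300          LEVEL: Event
PID     : 1234
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
MESSAGE : something
2008-01-16-17.45.00.000000-480 I1600G300          LEVEL: Warning
PID     : 5678
2008-01-16-18.00.01.000000-480 I1900G300          LEVEL: Event
PID     : 9999
EOF

# Count lines whose first field matches / does not match (n + 0 prints 0, not "", if none)
awk '$1 ~ /^2008-01-16-17/ { n++ } END { print n + 0 }' sample-db2diag.log    # 2
awk '$1 !~ /^2008-01-16-17/ { n++ } END { print n + 0 }' sample-db2diag.log   # 7

# A bare regex pattern tests the whole line and prints matching lines
awk '/LEVEL: Error/' sample-db2diag.log
```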

print NR prints the record number (NR), i.e. the line number in the log file. To print the number of fields/columns in a line/record, use NF.
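A sketch printing NR and NF side by side, assuming a hypothetical three-line sample file with illustrative content. In the END block, NR still holds the number of the last record read, so it gives the total line count, much like wc -l.

```shell
#!/bin/sh
# Hypothetical three-line sample (illustrative content only)
cat > sample-db2diag.log <<'EOF'
2008-01-16-17.02.11.000000-480 I1300G300          LEVEL: Error
PID     : 1234
TID     : 1
EOF

# Record number and field count per line: 1 4, 2 3, 3 3
awk '{ print NR, NF }' sample-db2diag.log

# Total number of records (lines) in the file: 3
awk 'END { print NR }' sample-db2diag.log
```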


  1. Mark Sanborn 07-02-09@02:18

    I definitely need to learn more awk. This helps, thanks.

2018  •  Privacy Policy