awk

The awk language is powerful, and entire books have been written on this utility alone. Even so, you cannot talk about serious shell programming without talking about awk.

Named after its principal authors, Al Aho, Peter Weinberger, and Brian Kernighan, awk offers a more flexible way than grep to combine regular expressions with custom actions.

grep, as its name implies (Global Regular Expression Print), is good at printing either a whole line or the matching portion of a line when one or more regular expressions are matched. What it doesn't do so well is:

  • Print an unmatched portion of a line

  • Perform an action other than print

And without piping one invocation of grep into another, you cannot:

  • AND two or more regular expressions (only OR)

  • Include lines matching one regular expression while excluding lines matching another

awk, on the other hand, makes those things easy.
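As a minimal sketch of the first two points (the sample input and its two-field layout are invented for illustration), awk can print a part of the line that the regular expression did not match, or run an action other than print:

```shell
#!/bin/sh
# Print only the first field of lines matching /house/,
# i.e. a portion of the line that the regex itself did not match.
printf 'dog house\nbird house\ncat tree\n' | awk '/house/ { print $1 }'

# Count the matching lines instead of printing them.
# "n + 0" forces a numeric 0 if nothing matched.
printf 'dog house\nbird house\ncat tree\n' | awk '/house/ { n++ } END { print n + 0 }'
```

The first command prints the first field of each matching line; the second prints only the number of matching lines.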

Commonly when you need to find a line that matches more than one regular expression, you will see:

#!/bin/sh
echo abc | grep a | grep b | grep c

This produces the expected output of abc but it is inefficient to send the line through three separate invocations of grep when we can send the line to a single invocation of awk to achieve the same results:

#!/bin/sh
echo abc | awk '/a/ && /b/ && /c/'

An awk program is made of rules of the form PREDICATE { ACTION }. If { ACTION } is missing and PREDICATE evaluates to true (non-zero), the line is printed. This makes it easy to translate most grep commands into awk. In the above example, awk is told to print the line (the default { ACTION }) when the line contains at least one a, one b, and one c.
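The following sketch illustrates the rule forms described above on made-up input: a predicate alone, a predicate with an explicit action, and a numeric predicate where any non-zero value counts as true.

```shell
#!/bin/sh
# Predicate only: the default action prints the whole line.
printf 'abc\nxyz\n' | awk '/b/'

# Predicate with an explicit action: print the length of each matching line.
printf 'abc\nxyz\n' | awk '/b/ { print length($0) }'

# Numeric predicate: NR % 2 is non-zero (true) on odd-numbered lines only.
printf 'one\ntwo\nthree\n' | awk 'NR % 2'
```

Here NR is awk's built-in count of the lines read so far, so the last command selects lines 1 and 3.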

Commonly when you need to find a line that matches one regular expression but excludes another, you will see:

#!/bin/sh
printf 'doghouse\nbirdhouse\n' | grep house | grep -v dog

This produces the expected output of birdhouse but is inefficient because it sends both lines to each grep when a single invocation of awk can process the stream once to produce the same results:
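One way to write that single invocation (a sketch, not necessarily the author's original example) is to negate the second expression with the ! operator:

```shell
#!/bin/sh
# Keep lines that match /house/ AND do not match /dog/.
printf 'doghouse\nbirdhouse\n' | awk '/house/ && !/dog/'
```

As with the grep pipeline, only birdhouse is printed, but the stream is read once by a single process.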

This table will help you translate grep regular expression syntax to awk regex syntax:

Element              Portable grep    Extended grep    awk
-------------------  ---------------  ---------------  ----------------
Grouping             \( and \)        ( and )          ( and )
Quantity 1 or more   \+ or \{1,\}     + or {1,}        +
Quantity 0 or 1      \? or \{0,1\}    ? or {0,1}       ?
Quantity N           \{N\}            {N}              Not portable *
Quantity N or fewer  \{,N\}           {,N}             Not portable *
OR                   \|               |                |
Word bounding        \< and \>        \< and \>        Not portable *

* Portable awk solution offered below.

The syntax of regular expressions in awk is closest to that of egrep (or grep -E), except that numeric quantifiers are not portably supported beyond the basic + and ? for quantities "1 or more" and "0 or 1" respectively.

Although {N}, {,N}, {N,}, and {N,M} are unsupported in many flavors of awk, they can be implemented with a function.
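As one possible sketch of such a function (the names quant and quant_demo are hypothetical, not taken from the original text), a numeric quantifier can be expanded into an equivalent regular expression that uses only grouping, ?, and *, then applied as a dynamic regex:

```shell
#!/bin/sh
# quant_demo is a hypothetical wrapper so the awk program can be reused.
quant_demo() {
    awk '
    # quant(re, min, max): expand re{min,max} using only ( ) ? and *.
    # A negative max means "no upper bound", as in re{min,}.
    function quant(re, min, max,    out, tail, i) {
        out = ""
        for (i = 0; i < min; i++)
            out = out "(" re ")"          # the mandatory copies
        if (max < 0)
            return out "(" re ")*"        # unbounded tail
        tail = ""
        for (i = min; i < max; i++)
            tail = "((" re ")" tail ")?"  # nested optional copies
        return out tail
    }
    BEGIN { pattern = "^" quant("a", 2, 3) "b$" }   # behaves like ^a{2,3}b$
    $0 ~ pattern
    '
}

printf 'ab\naab\naaab\naaaab\n' | quant_demo
```

Of the four input lines, only aab and aaab are printed. Calling quant(re, 0, N) covers the "N or fewer" case and quant(re, N, N) the exact-count case. The expansion is built once in BEGIN so it is not recomputed for every line.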

Although \< and \> are not portable across flavors of awk (GNU awk accepts them as an extension, but many other implementations do not), they can be implemented with a function.
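A minimal sketch of such a function follows, assuming a "word character" means an ASCII letter, digit, or underscore; the names has_word and word_demo are hypothetical. It requires the word to be preceded by the start of the line or a non-word character, and followed by a non-word character or the end of the line:

```shell
#!/bin/sh
# word_demo is a hypothetical wrapper so the awk program can be reused.
word_demo() {
    awk '
    # has_word(str, word): true when word appears in str with no letter,
    # digit, or underscore directly before or after it.
    function has_word(str, word) {
        return str ~ ("(^|[^A-Za-z0-9_])" word "([^A-Za-z0-9_]|$)")
    }
    has_word($0, "cat")
    '
}

printf 'the cat sat\nconcatenate\ncat\n' | word_demo
```

This prints "the cat sat" and "cat" but not "concatenate". Unlike true \< and \> anchors, this emulation consumes the neighboring character as part of the match, which matters if the same expression is later used with sub() or gsub().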
