Pre-declaring Arrays

Applying an array operation in your awk BEGIN block to each variable you intend to use as an array can make larger scripts easier to debug, especially if multiple people are working on the same code. Shorter scripts and one-liners probably don't need pre-declared arrays.

Once a name is in use as an array, attempting to assign a scalar value to it produces an error message, as does attempting to use an existing scalar as an array.

On Linux:

$ awk --version | awk NR==1
GNU Awk 3.1.7

$ awk 'BEGIN { foo = ""; foo["bar"] = 1 }'
awk: fatal: attempt to use scalar `foo' as array

$ awk 'BEGIN { split("", foo); foo = 1 }'
awk: fatal: attempt to use array `foo' in a scalar context

$ awk 'BEGIN { delete foo; foo = 1 }'
awk: fatal: attempt to use array `foo' in a scalar context

All of these error messages are to be expected and make for quicker diagnosis of problems.

FreeBSD's awk does even better: it reports line numbers and pre-scans the script.

NOTE: I've separated the two statements with a literal newline (Ctrl-V, Ctrl-J) to demonstrate the line numbers below.

$ awk --version | awk NR==1
awk version 20070501 (FreeBSD)

$ awk 'BEGIN { foo = ""
foo["bar"] = 1 }'
awk: can't assign to foo; it's an array name.
source line number 1

$ awk 'BEGIN { split("", foo)
foo = 1 }'
awk: can't assign to foo; it's an array name.
source line number 2

$ awk 'BEGIN { delete foo
foo = 1 }'
awk: can't assign to foo; it's an array name.
source line number 2

Note that for the first command, unlike Linux's awk, the error is not about line 2 trying to use an existing scalar as an array. Instead, it is about line 1 assigning a scalar value to a name that is used as an array later in the script. This is because FreeBSD's awk [https://svnweb.freebsd.org/base/head/contrib/one-true-awk/] pre-scans all namespaces across the entire script before executing. The same error appears when the array use sits in a later pattern-action block, a function, or an END block:

$ awk 'BEGIN { foo = "" }
{ foo["bar"] = 1 }'
awk: can't assign to foo; it's an array name.
source line number 1

$ awk 'BEGIN { foo = "" }
function baz() { foo["bar"] = 1 }'
awk: can't assign to foo; it's an array name.
source line number 1

$ awk 'BEGIN { foo = "" }
END { foo["bar"] = 1 }'
awk: can't assign to foo; it's an array name.
source line number 1

Attempting to separate the namespace by using a function doesn't work, because awk variables are global unless they are declared as function parameters: the foo inside baz() is the same variable as the scalar assigned in the BEGIN { ... } block. Neither one-true-awk nor gawk allows this (a good thing), and both prevent the script from executing. Both terminate early and leave the input(s) untouched (which may be important if you've programmed retries and are able to recover somehow).
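For contrast, here is a minimal sketch of the one case where a function does get its own name. Awk treats names listed in a function's parameter list as local, so an extra, unused parameter can hold a local array even when a global scalar has the same name. The names foo and baz follow the examples above; the key parameter and the out shell variable are made up for this illustration.

```shell
out=$(awk '
# foo is listed as an extra parameter, so inside baz() it is a local
# array and no longer refers to the global scalar of the same name.
function baz(key,    foo) {
    foo[key] = 1
    return (key in foo)
}
BEGIN {
    foo = "global scalar"    # global scalar, untouched by baz()
    print baz("bar"), foo
}')
echo "$out"
```

The extra whitespace before foo in the parameter list is only a convention for marking parameters that callers are not expected to pass.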

Using delete to pre-declare an array is supported by both FreeBSD's one-true-awk and Linux's GNU awk (aka gawk):

BEGIN {
    delete myarray
}

If you have to support operating systems with different awk implementations (for example, nawk and mawk), you should be able to use this more portable, though perhaps less obvious, idiom for pre-declaring an array:

BEGIN {
    split("", myarray)
}
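As a slightly larger sketch of the same idiom, the script below pre-declares an array in BEGIN and then fills it while reading input. The array name count, the helper array keys, the sample input, and the counts shell variable are all made up for this illustration.

```shell
counts=$(printf 'a b a\nc a b\n' | awk '
BEGIN {
    split("", count)    # pre-declare count as an empty array
}
{
    # tally every whitespace-separated field on the line
    for (i = 1; i <= NF; i++)
        count[$i]++
}
END {
    # walk a fixed key list so the output order is deterministic
    n = split("a b c", keys, " ")
    for (i = 1; i <= n; i++)
        print keys[i], count[keys[i]]
}')
echo "$counts"
```

With the pre-declaration in place, a stray assignment such as count = 0 elsewhere in the script should be reported as an error rather than silently changing what count means.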
