pcre
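Perl Compatible Regular Expressions (pcre) are not part of traditional `grep` on either FreeBSD or Linux. The separate `pcregrep` utility supports them, and GNU `grep` accepts them through its `-P` option when it is built with PCRE support, but neither can be relied on across both platforms.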
This table compares the single-letter pcre character classes to their POSIX equivalents:
| Description | POSIX | PCRE Single-Letter |
| --- | --- | --- |
| Word boundaries | See the list below for support | `\b` |
| Digits (0-9) | `[[:digit:]]` or `[0-9]` | `\d` |
| Non-digits | `[^[:digit:]]` or `[^0-9]` | `\D` |
| Whitespace characters | `[[:space:]]` or `[ \t\r\n\v\f]` | `\s` |
| Non-whitespace characters | `[^[:space:]]` or `[^ \t\r\n\v\f]` | `\S` |
| Alpha-numeric and underscore | `[[:alnum:]_]` or `[A-Za-z0-9_]` | `\w` |
| Non-word characters | `[^[:alnum:]_]` or `[^A-Za-z0-9_]` | `\W` |
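As a quick illustration of the POSIX column, the sketch below uses two of the bracket expressions directly in `awk`, where the single-letter forms are not available. The sample string and the variable names `line`, `digits`, and `words` are made up for this example.

```shell
line='id_42: ok'

# POSIX spelling of \D: delete every non-digit
digits=$(printf '%s\n' "$line" | awk '{ gsub(/[^[:digit:]]/, ""); print }')

# POSIX spelling of \W+: squeeze each run of non-word characters to one space
words=$(printf '%s\n' "$line" | awk '{ gsub(/[^[:alnum:]_]+/, " "); print }')

printf '%s\n%s\n' "$digits" "$words"
```

This should print `42` followed by `id_42 ok`; the underscore survives because it is listed inside the word-character bracket expression.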
Word boundary support:
- `awk` can use `(^|[^_[:alnum:]]|$)`
- `grep` can use `\(^\|[^_[:alnum:]]\|$\)` or `\(\<\|\>\)`
- `egrep` and `grep -E` can use `(^|[^_[:alnum:]]|$)` or `(\<|\>)`
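A minimal sketch of the extended-regex form from the list above, used with `grep -E` to select lines that contain `cat` as a whole word. The sample lines and the variable name `matches` are invented for illustration, and each side of the word keeps only the alternatives that can match there.

```shell
# Left side: start of line or a non-word character.
# Right side: a non-word character or end of line.
matches=$(printf '%s\n' 'cat' 'concatenate' 'a cat sat' 'cat_food' |
    grep -E '(^|[^_[:alnum:]])cat([^_[:alnum:]]|$)')

printf '%s\n' "$matches"
```

Only `cat` and `a cat sat` should be printed: `concatenate` has letters on both sides of `cat`, and `cat_food` is rejected because the underscore counts as a word character.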
In pcre, the word boundary test (`\b`) works on either the left or the right side of a word. This differs from the `\<` and `\>` word boundary sequences supported by `grep`, each of which must be used on its own side of the word: `\<` at the start and `\>` at the end. This chapter focuses on pcre `\b` word bounding for `awk`. For information on `\<` and `\>` support for `awk`, see the previous chapter.
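The sketch below places the same `awk` expression on both sides of a word, the way `\b` would be used. The shell variable `b` and the sample lines are assumptions made for this example. It also shows one difference worth keeping in mind: unlike `\b`, the stand-in is not zero-width, so the matched text includes the neighbouring characters.

```shell
b='(^|[^_[:alnum:]]|$)'   # awk stand-in for \b, usable on either side

# Whole-word filter: the same expression sits before and after the word.
whole=$(printf '%s\n' 'cat' 'concatenate' 'a cat sat' |
    awk -v b="$b" '$0 ~ (b "cat" b)')

# Show what was actually matched: the surrounding spaces are consumed.
span=$(printf '%s\n' 'a cat sat' |
    awk -v b="$b" 'match($0, b "cat" b) { print "[" substr($0, RSTART, RLENGTH) "]" }')

printf '%s\n%s\n' "$whole" "$span"
```

The filter should keep `cat` and `a cat sat`, and the last line should read `[ cat ]`. This matters mainly for `sub()` and `gsub()`, where the consumed characters would be replaced along with the word.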
Single-letter pcre support can be implemented in `awk` for all platforms using:
    #!/usr/bin/awk -f
    # Map one single-letter pcre sequence to its POSIX equivalent
    function expand(seq)
    {
        return seq == "\\b" ? "(^|[^_[:alnum:]]|$)" : \
               seq == "\\d" ? "[[:digit:]]" : \
               seq == "\\D" ? "[^[:digit:]]" : \
               seq == "\\s" ? "[[:space:]]" : \
               seq == "\\S" ? "[^[:space:]]" : \
               seq == "\\w" ? "[[:alnum:]_]" : \
               seq == "\\W" ? "[^[:alnum:]_]" : \
               seq
    }

    # Translate each unescaped single-letter sequence in re;
    # head, repl, tail, and rstr are local variables
    function pcre(re,    head, repl, tail, rstr)
    {
        tail = re
        while (match(tail, "\\\\[bdDsSwW]"))
        {
            head = substr(tail, 1, RSTART - 1)    # text before match
            repl = substr(tail, RSTART, RLENGTH)  # match to replace
            tail = substr(tail, RSTART + RLENGTH) # text after match
            # Expand only if the backslash run ending at the letter is odd;
            # an even run means the last backslash is itself escaped
            if ((match(head, /\\+$/) ? RLENGTH + 1 : 1) % 2 == 1)
                repl = expand(repl)
            rstr = rstr head repl
        }
        return rstr tail
    }

    # Test code for processing sample regex from stdin or file argument
    { print $0 " -> " pcre($0) }
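One way the translator might be put to work is as a line filter. The sketch below wraps the two functions from the listing above in a shell function; the name `pcre_grep` and the environment variable `PCRE_PATTERN` are hypothetical, chosen only for this example. The pattern is passed through the environment rather than `-v`, because `awk` applies escape processing to `-v` assignments.

```shell
# pcre_grep PATTERN [FILE...]  (hypothetical helper name)
# Prints the lines that match PATTERN, which may use \b \d \D \s \S \w \W.
pcre_grep() {
    pcre_pattern=$1
    shift
    PCRE_PATTERN=$pcre_pattern awk '
    function expand(seq)
    {
        return seq == "\\b" ? "(^|[^_[:alnum:]]|$)" : \
               seq == "\\d" ? "[[:digit:]]" : \
               seq == "\\D" ? "[^[:digit:]]" : \
               seq == "\\s" ? "[[:space:]]" : \
               seq == "\\S" ? "[^[:space:]]" : \
               seq == "\\w" ? "[[:alnum:]_]" : \
               seq == "\\W" ? "[^[:alnum:]_]" : \
               seq
    }
    function pcre(re,    head, repl, tail, rstr)
    {
        tail = re
        while (match(tail, "\\\\[bdDsSwW]"))
        {
            head = substr(tail, 1, RSTART - 1)
            repl = substr(tail, RSTART, RLENGTH)
            tail = substr(tail, RSTART + RLENGTH)
            if ((match(head, /\\+$/) ? RLENGTH + 1 : 1) % 2 == 1)
                repl = expand(repl)
            rstr = rstr head repl
        }
        return rstr tail
    }
    BEGIN { re = pcre(ENVIRON["PCRE_PATTERN"]) }   # translate once
    $0 ~ re                                        # print matching lines
    ' "$@"
}

printf '%s\n' 'x9' 'max9' 'a x1 b' | pcre_grep '\bx\d'
```

With the sample input this should print `x9` and `a x1 b`, skipping `max9` because the `x` there follows a word character. An escaped backslash is left alone, so `pcre_grep '\\d'` looks for a literal backslash followed by `d`.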