DESCRIPTION
The re_comp() function converts a regular expression string (RE) into an
internal form suitable for pattern matching. The re_exec() function compares
the string pointed to by the string argument with the last regular expression
passed to re_comp().
If re_comp() is called with a null pointer argument, the current regular
expression remains unchanged.
Strings passed to both re_comp() and re_exec() must be terminated by a null
byte, and may include newline characters.
The re_comp() and re_exec() functions support simple regular expressions,
which are defined below.
The following one-character REs match a single character:
- 1.1
An ordinary character (not one of those discussed in 1.2 below) is a
one-character RE that matches itself.
- 1.2
A backslash (\) followed by any special character is a one-character RE that
matches the special character itself. The special characters are:
- a.
., *, [, and
\ (period, asterisk, left square bracket,
and backslash, respectively), which are always special, except
when they appear within square brackets ([]; see 1.4 below).
- b.
^ (caret or circumflex), which is special at the
beginning of an entire RE (see 3.1 and 3.2 below), or when it immediately
follows the left bracket of a pair of square brackets ([]) (see 1.4
below).
- c.
$ (dollar symbol), which is special at the end of an entire RE (see
3.2 below).
- d.
The character used to bound (delimit) an entire RE, which is special for that
RE.
- 1.3
A period (.) is a one-character RE that matches any character except
new-line.
- 1.4
A non-empty string of characters enclosed in square brackets ([])
is a one-character RE that matches any one
character in that string.
If, however, the first character of the string is a circumflex
(^), the one-character RE matches any character except
new-line and the remaining characters in the string.
The ^ has this special meaning only if it occurs first in the string.
The minus (-) may be used to indicate a range of consecutive
ASCII characters;
for example, [0-9] is equivalent to [0123456789].
The - loses this special meaning if it occurs first (after
an initial ^, if any) or last in the string.
The right square bracket (]) does not terminate such a string when it
is the first character within it (after an initial ^, if any);
for example, []a-f] matches either a right square
bracket (]) or one of the letters a through f inclusive.
The four characters listed in 1.2.a above stand for themselves
within such a string of characters.
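The one-character REs in 1.1 to 1.4 can be tried out with a short C sketch.
Because re_comp() is not available on every system, the sketch uses regcomp()
and regexec() with basic regular expressions as a stand-in; their syntax is
assumed to agree with the rules above for these cases. The helper name
bre_matches() is hypothetical, and REG_NEWLINE is an assumption made so that
the period and a bracket string starting with ^ do not match new-line, as
described in 1.3 and 1.4. In C source, each backslash in an RE is written as
two backslashes.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of re_comp.h): returns 1 if some part of
 * text matches the basic RE in pattern, 0 if nothing matches, and -1 if
 * the pattern cannot be compiled.
 */
static int bre_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed.
       REG_NEWLINE: assumption, keeps . and [^...] from matching new-line. */
    if (regcomp(&re, pattern, REG_NOSUB | REG_NEWLINE) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this helper, bre_matches("a\\.b", "axb") returns 0 because the escaped
period matches only a literal period, while bre_matches("a.b", "axb") returns
1. Likewise, bre_matches("[]a-f]", "]") returns 1 and bre_matches("[.*]", "a")
returns 0, since a period inside square brackets stands for itself.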
The following rules may be used to construct REs from one-character REs:
- 2.1
A one-character RE is a RE that matches whatever the one-character RE matches.
- 2.2
A one-character RE followed by an asterisk (*) is a RE that matches
zero or more occurrences of the one-character RE. If there is any choice, the
longest leftmost string that permits a match is chosen.
- 2.3
A one-character RE followed by \{m\}, \{m,\}, or \{m,n\} is a RE that matches a
range of occurrences of the one-character RE. The values of m and n must be
non-negative integers less than 256; \{m\} matches exactly m occurrences;
\{m,\} matches at least m occurrences; \{m,n\} matches any number of
occurrences between m and n inclusive. Whenever a choice exists, the RE matches
as many occurrences as possible.
- 2.4
The concatenation of REs is a RE that matches the concatenation of the strings
matched by each component of the RE.
- 2.5
A RE enclosed between the character sequences \( and \) is a
RE that matches whatever the unadorned RE matches.
- 2.6
The expression \n matches the same string of characters as was matched by an
expression enclosed between \( and \) earlier in the same RE. Here n is a
digit; the sub-expression specified is that beginning with the n-th occurrence
of \( counting from the left. For example, the expression ^\(.*\)\1$ matches a
line consisting of two repeated appearances of the same string.
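The construction rules 2.2, 2.3 and 2.6 can be illustrated by measuring how
much of a string a RE consumes. The following is a minimal sketch that again
uses regcomp() and regexec() as a stand-in for re_comp() and re_exec(), on the
assumption that basic regular expressions follow the same rules here. The
helper name bre_match_len() and its return conventions are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the length of the leftmost match of the
 * basic RE in pattern within text, -1 if there is no match, and -2 if the
 * pattern cannot be compiled.
 */
static int bre_match_len(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t whole[1];   /* whole[0] receives the span of the entire match */
    int rc;

    if (regcomp(&re, pattern, 0) != 0)   /* cflags 0 selects basic REs */
        return -2;

    rc = regexec(&re, text, 1, whole, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    return (int)(whole[0].rm_eo - whole[0].rm_so);
}
```

For example, bre_match_len("ab*", "xabbbc") returns 4, because the asterisk
takes as many occurrences of b as it can, and bre_match_len("a\\{1,3\\}",
"aaaa") returns 3. The back-reference example from 2.6, written in C as
"^\\(.*\\)\\1$", yields 6 for "abcabc" and -1 for "abcabd".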
Finally, an entire RE may be constrained to match only an initial segment or
final segment of a line (or both).
- 3.1
A circumflex (^) at the beginning of an entire RE constrains that RE to match
an initial segment of a line.
- 3.2
A dollar symbol ($) at the end of an entire RE constrains that RE to match a
final segment of a line. The construction ^entire RE$ constrains the
entire RE to match the entire line.
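The effect of the anchors in 3.1 and 3.2 can be checked with a small sketch.
It uses regcomp() and regexec() in place of re_comp() and re_exec(), assuming
that basic regular expressions anchor in the same way, and it treats the whole
input string as a single line. The helper name line_matches() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: treats text as one line and returns 1 if the basic
 * RE in pattern matches somewhere in it, 0 if not, and -1 if the pattern
 * cannot be compiled.
 */
static int line_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Here line_matches("^abc", "abcdef") returns 1 but line_matches("^abc", "xabc")
returns 0; line_matches("abc$", "xyzabc") returns 1 but line_matches("abc$",
"abcx") returns 0; and line_matches("^abc$", "abcd") returns 0 because the RE
must account for the entire line.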
The null RE (that is, //) is equivalent to the last RE encountered.
The behaviour of re_comp() and re_exec() in locales other than the POSIX locale
is unspecified.
RETURN VALUE
The re_comp() function returns a null pointer when the string pointed to by the
string argument is successfully converted. Otherwise, a pointer to an
unspecified error message string is returned.
Upon successful completion, re_exec() returns 1 if string matches the last
compiled regular expression. Otherwise, re_exec() returns 0 if string fails to
match the last compiled regular expression, and -1 if the compiled regular
expression is invalid (indicating an internal error).
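The return conventions above can be sketched as a thin layer over regcomp()
and regexec(). This is an illustration only, not the implementation of
re_comp() or re_exec(): the names my_re_comp() and my_re_exec() are
hypothetical, the error text comes from regerror(), and two behaviours are
assumptions made for the sketch, namely that a null pointer with no previous
RE yields an error string and that a failed compilation discards the previous
RE.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

static regex_t current_re;      /* the RE most recently compiled */
static int have_re = 0;         /* non-zero while current_re is valid */
static char errbuf[128];        /* holds the last error message */

/* Hypothetical stand-in for re_comp(): null pointer on success,
   otherwise a pointer to an error message string. */
static char *my_re_comp(const char *pattern)
{
    int rc;

    if (pattern == NULL) {
        /* Null pointer argument: the current RE remains unchanged. */
        if (have_re)
            return NULL;
        strcpy(errbuf, "no previous regular expression");  /* assumption */
        return errbuf;
    }

    if (have_re) {              /* release the RE being replaced */
        regfree(&current_re);
        have_re = 0;
    }

    rc = regcomp(&current_re, pattern, REG_NOSUB);
    if (rc != 0) {
        regerror(rc, &current_re, errbuf, sizeof errbuf);
        return errbuf;
    }
    have_re = 1;
    return NULL;
}

/* Hypothetical stand-in for re_exec(): 1 on a match, 0 on no match,
   -1 if there is no valid compiled RE. */
static int my_re_exec(const char *text)
{
    if (!have_re)
        return -1;
    return regexec(&current_re, text, 0, NULL, 0) == 0 ? 1 : 0;
}
```

A caller would test the result of my_re_comp() against a null pointer before
calling my_re_exec(), and would treat a result of -1 from my_re_exec() as a
failure rather than as "no match".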
ERRORS
No errors are defined.
APPLICATION USAGE
For portability to implementations conforming to earlier versions of this
document, regcomp() and regexec() are preferred to these functions.
SEE ALSO
regcomp(3C),
<re_comp.h>.
CHANGE HISTORY
First released in Issue 4, Version 2.