regcmp(3X)

TO BE OBSOLETED
HP-UX 11i Version 2: December 2007 Update
NAME

regcmp(), regex() — compile and execute regular expression

SYNOPSIS

#include <libgen.h>

char *regcmp(const char *string1, /* string2, */ ... /*, (char *)0 */);

char *regex(const char *re, const char *subject, ...);

extern char *__loc1;

Remarks

The ANSI C ", ... " construct denotes a variable length argument list whose optional [or required] members are given in the associated comment (/* */).

Features documented in this manual entry are obsolescent and may be removed in a future HP-UX release. Use of regcomp(3C) instead is recommended.

DESCRIPTION

regcmp() compiles a regular expression and returns a pointer to the compiled form. Space for the compiled form is obtained with malloc(3C). The caller is responsible for releasing that space with free(3C) when it is no longer needed. A NULL return from regcmp() indicates an incorrect argument.

regex() executes a compiled pattern against the subject string. Any additional arguments receive the substrings matched by parenthesized subexpressions (see (...)$n below). regex() returns NULL on failure, or a pointer to the next unmatched character on success. The global character pointer __loc1 points to where the match began.

regcmp() and regex() were largely borrowed from the editor, ed(1); however, the syntax and semantics have been changed slightly. The following are the valid symbols and their associated meanings:

[]*.^

These symbols retain the meaning they have in ed(1).

$

Matches the end of the string; \n matches a new-line.

-

Within brackets, the hyphen signifies a character range. For example, [a-z] is equivalent to [abcd...xyz]. The - can represent itself only if it is the first or last character within the brackets. For example, the character class expression []-] matches the characters ] and -.

+

A regular expression followed by + means one or more times. For example, [0-9]+ is equivalent to [0-9][0-9]*.

{m} {m,} {m,u}

Integer values enclosed in { } indicate the number of times the preceding regular expression can be applied. The value m is the minimum number and u is a maximum number, which must be no greater than 256. The syntax {m} indicates the exact number of times the regular expression can be applied. The syntax {m,} is analogous to {m,infinity}. The plus (+) and asterisk (*) operations are equivalent to {1,} and {0,} respectively.

(...)$n

The value of the enclosed regular expression is returned. The value is stored in the (n+1)th argument following the subject argument, so $0 refers to the first argument after the subject. A maximum of ten enclosed regular expressions are allowed. regex() makes its assignments unconditionally.

(...)

Parentheses are used for grouping. An operator, such as *, +, or { }, can work on a single character or a regular expression enclosed in parentheses. For example, (a*(cb+)*)$0.

Since all of the above defined symbols are special characters, they must be escaped to be used as themselves.
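The repetition and grouping operators listed above can be tried out with a small helper. Because regcmp() is not available on most other systems, this sketch uses the POSIX regcomp(3C) interface with REG_EXTENDED instead, on the assumption that +, { } and ( ) behave the same way there. The (...)$n suffix is specific to regcmp() and is not used. The helper name span_len is hypothetical.

```c
#include <regex.h>

/* Hypothetical helper, not part of any library: returns the length of
 * the first (leftmost) match of an extended regular expression in
 * subject, or -1 if nothing matches or the pattern does not compile.
 * Uses POSIX regcomp()/regexec() rather than regcmp()/regex(). */
int span_len(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, m, 0) == 0)
        len = (int)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);   /* release the storage held by the compiled pattern */
    return len;
}
```

Under these assumptions, span_len("[0-9]+", "ab123") and span_len("[0-9][0-9]*", "ab123") both return 3, and span_len("a{2,3}", "aaaa") returns 3. An escaped operator matches itself: span_len("\\+", "1+1") returns 1.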

EXAMPLES

Match a leading new-line in the subject string to which the cursor points.

char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);

Match through the string Testing3 and return the address of the character after the last matched character, that is, 11 characters past the start of the subject string. The string Testing3 is copied to the character array ret0.

char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
newcursor = regex(name, "123Testing321", ret0);
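For comparison, the Testing3 example might look as follows with the recommended regcomp(3C) interface. This is a sketch, not a drop-in replacement: the $0 suffix has no POSIX counterpart, so the parenthesized group is read back from the match array instead, and the helper name first_identifier and its size check are assumptions made for illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the first identifier of at most eight
 * characters found in subject into out, and returns a pointer to the
 * character after the match. Returns NULL if nothing matches, if the
 * pattern does not compile, or if out is too small. */
const char *first_identifier(const char *subject, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: parenthesized group */
    const char *end = NULL;

    if (regcomp(&re, "([A-Za-z][A-Za-z0-9_]{0,7})", REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outsz) {
            memcpy(out, subject + m[1].rm_so, len);
            out[len] = '\0';
            end = subject + m[0].rm_eo;
        }
    }
    regfree(&re);
    return end;
}
```

Called with the subject "123Testing321" and a nine-byte buffer, the helper stores Testing3 in the buffer and returns the subject address plus 11.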

WARNINGS

User programs that use regcmp() might run out of memory if regcmp() is called iteratively without freeing vectors that are no longer required.
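The same discipline applies when compiling patterns in a loop with the recommended regcomp(3C) interface, where regfree() plays the role that free() plays for regcmp(). The following is a minimal sketch; the helper name count_matching is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: counts how many of the n patterns match subject.
 * Each compiled pattern is released before the next one is compiled,
 * so memory use does not grow with the number of iterations. */
size_t count_matching(const char *const patterns[], size_t n,
                      const char *subject)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        regex_t re;

        if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
            continue;        /* skip patterns that do not compile */
        if (regexec(&re, subject, 0, NULL, 0) == 0)
            hits++;
        regfree(&re);        /* release this iteration's storage */
    }
    return hits;
}
```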

Obsolescent Interfaces

regcmp() and regex() are to be obsoleted at a future date.
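Code that depends on the return value of regex() and on __loc1 could be migrated to regcomp(3C) along the following lines. This is a sketch under stated assumptions: the helper name match_end is hypothetical, the pattern is compiled as a POSIX extended regular expression, and the start of the match is returned through a parameter instead of a global.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first character after
 * the match, or NULL if there is no match or the pattern does not
 * compile. On success, *start (if start is non-NULL) is set to where
 * the match began, which is the role __loc1 plays for regex(). */
const char *match_end(const char *pattern, const char *subject,
                      const char **start)
{
    regex_t re;
    regmatch_t m[1];
    const char *end = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, subject, 1, m, 0) == 0) {
        if (start != NULL)
            *start = subject + m[0].rm_so;
        end = subject + m[0].rm_eo;
    }
    regfree(&re);
    return end;
}
```

Compiling on every call keeps the sketch short. A program that applies one pattern to many subjects would more likely compile once, call regexec() repeatedly, and call regfree() at the end.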