HP-UX Reference Volume 4 of 5

regcmp(3X)

NAME

regcmp(), regex() — compile and execute regular expression

SYNOPSIS

#include <libgen.h>

char *regcmp(const char *string1, /* string2, */ ... /*, (char *)0 */);

char *regex(const char *re, const char *subject, ...);

extern char *__loc1;

Remarks

The ANSI C ", ... " construct denotes a variable length argument list whose optional [or required] members are given in the associated comment (/* */).

Features documented in this manual entry are obsolescent and may be removed in a future HP-UX release. Use of regcomp(3C) instead is recommended.

DESCRIPTION

regcmp() compiles a regular expression and returns a pointer to the compiled form. malloc(3C) is used to create space for the vector. It is the user's responsibility to free unneeded space so allocated. A NULL return from regcmp() indicates an incorrect argument.

regex() executes a compiled pattern against the subject string. Additional arguments receive the values returned by (...)$n subexpressions. regex() returns NULL on failure, or a pointer to the next unmatched character on success. A global character pointer __loc1 points to where the match began.

regcmp() and regex() were largely borrowed from the editor, ed(1); however, the syntax and semantics have been changed slightly. The following are the valid symbols and their associated meanings:

[]*.^

These symbols retain the meanings they have in ed(1).

$

Matches the end of the string; \n matches a new-line.

-

Used within brackets, the hyphen signifies a character range. For example, [a-z] is equivalent to [abcd...xyz]. The - can represent itself only if it is the first or last character within the brackets. For example, the character class expression []-] matches the characters ] and -.

+

A regular expression followed by + matches one or more occurrences of that expression. For example, [0-9]+ is equivalent to [0-9][0-9]*.

{m} {m,} {m,u}

Integer values enclosed in { } indicate the number of times the preceding regular expression can be applied. The value m is the minimum number and u is a maximum number, which must be no greater than 256. The syntax {m} indicates the exact number of times the regular expression can be applied. The syntax {m,} is analogous to {m,infinity}. The plus (+) and asterisk (*) operations are equivalent to {1,} and {0,} respectively.

(...)$n

The value of the enclosed regular expression is returned. The value is stored in the (n+1)th argument following the subject argument. A maximum of ten enclosed regular expressions are allowed. regex() makes its assignments unconditionally.

(...)

Parentheses are used for grouping. An operator, such as *, +, or { }, can work on a single character or a regular expression enclosed in parentheses. For example, (a*(cb+)*)$0.

Since all of the above defined symbols are special characters, they must be escaped to be used as themselves.
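The counting rules for {m}, {m,}, and {m,u} can be tried out with a small sketch. Because regcmp() is obsolescent, the sketch uses regcomp(3C) with POSIX extended regular expressions, whose interval syntax follows the same minimum/maximum rules; it is an illustration under that assumption, not a description of how regcmp() is implemented. The helper name matches() is hypothetical, and the limits on interval values in regcomp(3C) may differ from the one given above.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): returns 1 if the POSIX
 * extended regular expression `pattern` matches somewhere in `s`, 0 if it
 * does not, and -1 if the pattern fails to compile. Callers anchor the
 * pattern with ^ and $ when the whole string must match.
 */
int matches(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);               /* release the compiled form */

    return rc == 0 ? 1 : 0;
}
```

With this helper, matches("^[0-9]{2,3}$", "12") and matches("^[0-9]{2,3}$", "123") succeed, while the same pattern rejects "1" and "1234". Likewise, "^[0-9]{1,}$" and "^[0-9]+$" accept and reject the same strings.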

EXAMPLES

Match a leading new-line in the subject string to which the cursor points.

char *cursor, *newcursor, *ptr;
...
newcursor = regex((ptr = regcmp("^\n", (char *)0)), cursor);
free(ptr);

Match through the string Testing3 and return the address of the character after the last matched character (offset 11 in the subject string, the 2 that follows Testing3). The string Testing3 will be copied to the character array ret0.

char ret0[9];
char *newcursor, *name;
...
name = regcmp("([A-Za-z][A-Za-z0-9_]{0,7})$0", (char *)0);
newcursor = regex(name, "123Testing321", ret0);
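Since regcomp(3C) is the recommended replacement, the following is a minimal sketch of how the second example might look with that interface. The helper name match_and_copy() and its parameters are hypothetical. The sketch assumes POSIX extended regular expression syntax, where the first parenthesized subexpression is reported in element 1 of the regmatch_t array rather than through a $n suffix.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any library): search `subject` for the
 * POSIX extended regular expression `pattern`, which must contain at least
 * one parenthesized subexpression. On success the text of the first
 * subexpression is copied to `buf`, *start (if not NULL) receives the
 * address where the match began (the role __loc1 plays for regex()), and
 * the address of the next unmatched character is returned. Returns NULL
 * if the pattern is invalid, nothing matches, or `buf` is too small.
 */
const char *match_and_copy(const char *pattern, const char *subject,
                           const char **start, char *buf, size_t buflen)
{
    regex_t re;
    regmatch_t m[2];            /* m[0]: whole match, m[1]: first group */
    const char *next = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    if (regexec(&re, subject, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t n = (size_t)(m[1].rm_eo - m[1].rm_so);

        if (n < buflen) {       /* leave room for the terminating NUL */
            memcpy(buf, subject + m[1].rm_so, n);
            buf[n] = '\0';
            if (start != NULL)
                *start = subject + m[0].rm_so;
            next = subject + m[0].rm_eo;
        }
    }

    regfree(&re);               /* release the compiled form */
    return next;
}
```

Called with the pattern "([A-Za-z][A-Za-z0-9_]{0,7})", the subject "123Testing321", and a 9-byte buffer, this sketch copies Testing3 into the buffer and returns the subject address plus 11. Unlike regex(), which makes its assignments unconditionally, the helper checks the buffer size before copying.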

WARNINGS

User programs that use regcmp() might run out of memory if regcmp() is called iteratively without freeing vectors that are no longer required.
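One way to avoid this is to compile a pattern once, reuse the compiled form for every subject string, and release it when it is no longer needed. The sketch below shows that discipline with the recommended regcomp(3C) interface, where regfree() plays the role that free() plays for a vector returned by regcmp(). The helper name count_matching() is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any library): count how many of the
 * `n` strings in `lines` contain a match for the POSIX extended regular
 * expression `pattern`. The pattern is compiled once before the loop and
 * freed once after it, so repeated calls do not accumulate memory.
 * Returns -1 if the pattern fails to compile.
 */
int count_matching(const char *pattern, const char *const *lines, size_t n)
{
    regex_t re;
    size_t i;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    for (i = 0; i < n; i++) {
        if (regexec(&re, lines[i], 0, NULL, 0) == 0)
            count++;
    }

    regfree(&re);               /* one release for the one compilation */
    return count;
}
```

The same structure applies to regcmp(): call it outside the loop, pass the returned pointer to regex() inside the loop, and call free() on it afterward.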

© Hewlett-Packard Development Company, L.P.