eyBuildLib API Reference : eyBuildLib
regex - regular expression extension module
regcomp( ) - compile regular expression
regexec( ) - match a null-terminated string against the precompiled pattern
regfree( ) - free regular expression resources
regerror( ) - get error message string by error code
regcomp2( ) - compile regular expression, advanced version
preg_match( ) - perform a regular expression match
preg_match_all( ) - perform a regular expression match over all occurrences in the target
preg_replace( ) - perform a regular expression search and replace
preg_split( ) - split string by a regular expression
preg_split_free( ) - free split list
This module supports POSIX Basic, POSIX Extended, and PERL-Like (see PREG EXTENDED) regular expression syntax. To use the PERL-Like extended syntax, call regcomp2( ) instead of regcomp( ) before calling regexec( ) to execute a regular expression. The other PERL-Like routines are prefixed with preg_, such as preg_match( ) and preg_replace( ).
The cflags argument of regcomp( ) may be the bitwise-or of one or more of the following:
REG_EXTENDED
    Use POSIX Extended Regular Expression syntax when interpreting
    regex. If not set, POSIX Basic Regular Expression syntax is
    used.
REG_ICASE
    Do not differentiate case. Subsequent regexec searches using
    this pattern buffer will be case insensitive.
REG_NOSUB
    Support for substring addressing of matches is not required.
    The nmatch and pmatch parameters to regexec are ignored if the
    pattern buffer supplied was compiled with this flag set.
REG_NEWLINE
    Match-any-character operators don't match a newline.
    A non-matching list ([^...]) not containing a newline does not
    match a newline.
    Match-beginning-of-line operator (^) matches the empty string
    immediately after a newline, regardless of whether eflags, the
    execution flags of regexec, contains REG_NOTBOL.
    Match-end-of-line operator ($) matches the empty string
    immediately before a newline, regardless of whether eflags
    contains REG_NOTEOL.
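The effect of these flags can be checked with the standard POSIX routines alone. The following is a minimal sketch, not part of eyBuildLib: the helper name regex_matches is hypothetical, and it uses only regcomp( ), regexec( ) and regfree( ) as declared in regex.h.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches pattern under the given cflags, 0 if it does
 * not, and -1 if the pattern fails to compile. */
int regex_matches(const char *pattern, const char *text, int cflags)
{
    regex_t reg;
    int rc;

    if (regcomp(&reg, pattern, cflags) != 0)
        return -1;
    /* nmatch = 0 and pmatch = NULL: only a yes/no answer is needed. */
    rc = regexec(&reg, text, 0, NULL, 0);
    regfree(&reg);
    return rc == 0 ? 1 : 0;
}
```

For instance, regex_matches("csp", "Hello CSP", REG_EXTENDED | REG_ICASE | REG_NOSUB) returns 1, and the same call without REG_ICASE returns 0. Likewise "^b" matches "a\nb" only when REG_NEWLINE is set.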
Unless REG_NOSUB was set for the compilation of the pattern buffer, it is possible to obtain substring match addressing information. pmatch must be dimensioned to have at least nmatch elements. These are filled in by regexec with substring match addresses. Any unused structure elements will contain the value -1.
The regmatch_t structure which is the type of pmatch is defined in regex.h.
typedef struct
{
    regoff_t rm_so;
    regoff_t rm_eo;
} regmatch_t;
Each rm_so element that is not -1 indicates the start offset of the next largest substring match within the string. The relative rm_eo element indicates the end offset of the match.
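To illustrate how the rm_so and rm_eo offsets are used, here is a small sketch built on the standard POSIX calls. The helper copy_group and the constant MAX_GROUPS are hypothetical names chosen for this example; they are not part of the library.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS 8

/* Copy capture group `group` of the first match of `pattern` in `text`
 * into `out` (at most outsz - 1 characters plus a terminating null).
 * Returns the group length, or -1 on compile error, no match, an
 * out-of-range group, or a group that did not take part in the match. */
long copy_group(const char *pattern, const char *text, size_t group,
                char *out, size_t outsz)
{
    regex_t reg;
    regmatch_t pmatch[MAX_GROUPS];
    long len = -1;

    if (group >= MAX_GROUPS || outsz == 0)
        return -1;
    if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Unused pmatch entries are set to -1 by regexec. */
    if (regexec(&reg, text, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {
        size_t n = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);
        size_t copy = n < outsz - 1 ? n : outsz - 1;

        memcpy(out, text + pmatch[group].rm_so, copy);
        out[copy] = '\0';
        len = (long)n;
    }
    regfree(&reg);
    return len;
}
```

Group 0 is the whole match and groups 1, 2, ... are the parenthesized subexpressions, counted by their opening parenthesis from left to right.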
regerror is used to turn the error codes that can be returned by both regcomp and regexec into error message strings.
regerror is passed the error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer, errbuf_size. It returns the size of the errbuf required to contain the null-terminated error message string. If both errbuf and errbuf_size are non-zero, errbuf is filled in with the first errbuf_size - 1 characters of the error message and a terminating null.
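Because regerror returns the required buffer size, a common pattern is to call it twice: once to learn the size and once to fill the buffer. The sketch below assumes the standard POSIX behaviour described above; the helper name regex_error_message is hypothetical.

```c
#include <regex.h>
#include <stdlib.h>

/* Return a malloc'ed copy of the message for errcode, or NULL if out of
 * memory. The caller frees the result. */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* First call: no buffer, just ask how many bytes are required
     * (the count includes the terminating null). */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    if (msg != NULL)
        regerror(errcode, preg, msg, need);  /* second call fills it */
    return msg;
}
```

A fixed-size buffer, as in regerror(errcode, preg, errbuf, sizeof (errbuf)), is simpler when truncated messages are acceptable.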
Supplying regfree with a precompiled pattern buffer, preg, will free the memory allocated to the pattern buffer by the compiling process, regcomp.
regcomp returns zero for a successful compilation or an error code for failure. regexec returns zero for a successful match or REG_NOMATCH for failure.
The pattern parameter of the PERL-Like routines should have the following format:
    "/regular expression/option"
The two delimiting slashes "/" cannot be omitted. "option" may be empty or a combination of the following:
/i - case insensitive, same meaning as option REG_ICASE
/m - multiline, same meaning as option REG_NEWLINE
/g - all; this option is currently ignored
Example:
(1) match "CSP", case insensitive: "/CSP/i"
(2) match a word, skipping the spaces before it: "/\s+(\w+)/m"
PERL-Like routines always work together with the macro _REGEX( ), which saves you from escaping the C escape characters. In the following call you do not need to write a double backslash (\\) for (\b) and (\w); the macro _REGEX( ) adds it for you. Note that you cannot write double quotation marks inside _REGEX( ); write a backquote (`) instead.
    ret = regcomp2(&reg, _REGEX("/\bregex[\w]?/i"), 0);
In PERL-Like regular expressions you can use the following extended characters:
     | Extended character    | Means
-----|-----------------------|-----------------------------------------
  1  | \b                    | beginning or end of a word
  2  | \B                    | *not* at the beginning or end of a word
  3  | \w                    | matches A-Z, a-z, 0-9 and _
  4  | \W                    | matches *not* A-Z, a-z, 0-9 and _
  5  | \d                    | digit 0-9
  6  | \D                    | *not* a digit 0-9
  7  | \s                    | white space character ('\x20' and '\t')
  8  | \S                    | *not* a white space character
  9  | \\                    | the backslash character '\'
 10  | \t, \n, \r, \f, \v    | control characters (tab, newline, carriage return, form feed, vertical tab)
 11  | \cx                   | matches CTRL + x, where x is A-Z
 12  | \xhh                  | the character with hexadecimal code hh
 13  | \[other character]    | the character itself; the '\' is simply removed
So to match a Unicode word, you can write directly:
    _REGEX("/[\x80-\xFF]{2}/");
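When only the POSIX routines are used, some of the extended characters listed above can be spelled as bracket expressions. The sketch below shows a POSIX Extended counterpart of "/\s+(\w+)/". The macros ERE_WORD and ERE_SPACE and the function first_word_after_space are hypothetical names for this illustration, and the mapping is an assumption based on the table; it says nothing about how regcomp2( ) translates the pattern internally.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE spellings assumed equivalent to \w and \s in the table
 * (\s covers space and tab only). */
#define ERE_WORD  "[A-Za-z0-9_]"
#define ERE_SPACE "[ \t]"

/* Find the first word that follows white space. Stores the offsets of
 * the word in *so and *eo and returns 0 on success, -1 otherwise. */
int first_word_after_space(const char *text, regoff_t *so, regoff_t *eo)
{
    regex_t reg;
    regmatch_t pmatch[2];
    int rc;

    /* Adjacent string literals are concatenated by the compiler. */
    if (regcomp(&reg, ERE_SPACE "+(" ERE_WORD "+)", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&reg, text, 2, pmatch, 0);
    regfree(&reg);
    if (rc != 0)
        return -1;
    *so = pmatch[1].rm_so;   /* group 1 is the captured word */
    *eo = pmatch[1].rm_eo;
    return 0;
}
```

The bracket expressions are longer than \w and \s, but they need no backslashes and therefore no extra escaping inside a C string literal.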
To match all items in a string, you can use the following macros instead of calling preg_match_all( ).
    /* match foreach */
    void preg_match_foreach
        (
        pattern,    /* PERL-Like regular expression */
        string,     /* string to match */
        mlist,      /* where to store the match result */
        max,        /* max to store */
        ret         /* return code, same as preg_match() */
        );
    preg_match_next(string, mlist, ret);    /* match next */
    preg_match_break(ret);                  /* like the break keyword */
    preg_match_clean();                     /* clean up */
Example:
    /* this acts as a for-loop statement */
    preg_match_foreach(pattern, string, mlist, 5, ret)
    {
        if (cnt >= 3)
            preg_match_break(ret);          /* break foreach */
        for (i=0; i<=ret; i++)
        {
            len = mlist[i].rm_eo - mlist[i].rm_so;
            printf ("%.*s\n", len, string + mlist[i].rm_so);
        }
        preg_match_next(string, mlist, ret);    /* continue with next */
    }
PERL-Like routines always return a value less than zero on failure. You can get the error message with the following call:
    regerror(REG_GETERR(ret), NULL, errbuf, sizeof (errbuf));
The macro REG_GETERR( ) extracts the correct error number.
To capture matches, enclose the targets in a pair of parentheses "()". You can then get the values one by one, from left to right. Nested parentheses follow the same rule.
Match but don't store the value:
(?: exp) - match exp, but don't store the value
Laziness instead of greediness:
*? - match zero or more times, but as few times as possible
+? - match one or more times, but as few times as possible
regex.h, sys/types.h
regcomp( ) - compile regular expression
int regcomp ( regex_t * preg, const char * regex, int cflags )
This routine compiles a regular expression into a form that is suitable for subsequent regexec searches. regcomp is supplied with preg, a pointer to a pattern buffer storage area; regex, a pointer to the null-terminated string; and cflags, flags used to determine the type of compilation.
zero for a successful compilation or an error code for failure
regexec( ) - match a null-terminated string against the precompiled pattern
int regexec ( const regex_t * preg, const char * string, size_t nmatch, regmatch_t pmatch[], int eflags )
regexec matches a null-terminated string against the precompiled pattern buffer, preg. nmatch and pmatch are used to provide information regarding the location of any matches.
returns zero for a successful match or REG_NOMATCH for failure
regfree( ) - free regular expression resources
void regfree ( regex_t * preg )
This routine is the same as the standard regfree( ): it frees the memory that the compiling process allocated to the pattern buffer preg.
NONE
regerror( ) - get error message string by error code
size_t regerror ( int errcode, const regex_t * preg, char * errbuf, size_t errbuf_size )
This routine turns the error codes that can be returned by both regcomp and regexec into error message strings. For the PERL-Like routines, obtain errcode with the macro REG_GETERR( ).
OK
regcomp2( ) - compile regular expression, advanced version
int regcomp2 ( regex_t * preg, const char * pattern, int cflags )
This routine compiles a regular expression into a form that is suitable for subsequent regexec searches.
You can call regcomp( ) with the option "REG_EXTENDED | REG_PERL_LIKE" to get the same behavior.
OK, or a non-zero error number
preg_match( ) - perform a regular expression match
int preg_match
    (
    const char * pattern,   /* pattern */
    const char * string,    /* string to match */
    regmatch_t   mlist[],   /* match list */
    int          max        /* max to store */
    )
This routine searches string for a match to the regular expression given in pattern. If a match list (mlist) is provided, it is filled with the results of the search. mlist[0] will contain the range that matched the full pattern, mlist[1] will have the range that matched the first captured parenthesized subpattern, and so on. max limits the number of entries stored into the match list (mlist).
    /* test whether "CSP" exists or not */
    ret = preg_match(_REGEX("/CSP/i"), "Hello CSP/eybuild", NULL, 0);
    printf("CSP %s\n", ret > 0 ? "exist" : "not exist");
number of matches if ret >= 0, otherwise the match failed.
preg_match_all( ) - perform a regular expression match over all occurrences in the target
int preg_match_all
    (
    const char * pattern,   /* pattern */
    const char * string,    /* string to match */
    regmatch_t * pmlist[],  /* to return match list */
    size_t       max        /* max to store */
    )
This routine searches string for all matches to the regular expression given in pattern. If the address of a match list pointer (pmlist) is provided, the list of results is returned through it. In the returned list, element 0 will contain the range that matched the full pattern, element 1 will have the range that matched the first captured parenthesized subpattern, and so on. max limits the number of entries stored into the match list.
    regmatch_t * plist = NULL;
    if ((ret=preg_match_all(pattern, string, &plist, -1)) < 0)
        return ERROR;
    /* output each match */
    for (i=1; i<=ret; i++)
    {
        len = plist[i].rm_eo - plist[i].rm_so;
        printf("%.*s\n", len, string+plist[i].rm_so);
    }
    free(plist);
number of matches if ret >= 0, otherwise the match failed.
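The idea of searching a string for every match can also be expressed with the POSIX routines by restarting regexec( ) after each hit. The sketch below is an illustration under that assumption, not the implementation of preg_match_all( ); the name find_all is hypothetical, and it records only the full match of each hit, not the subpatterns.

```c
#include <regex.h>
#include <stddef.h>

/* Collect up to max non-overlapping matches of pattern in text.
 * Offsets stored in out[] are relative to the start of text.
 * Returns the number of matches stored, or -1 if the pattern does not
 * compile. */
int find_all(const char *pattern, const char *text,
             regmatch_t out[], size_t max)
{
    regex_t reg;
    regmatch_t m;
    size_t pos = 0;
    int count = 0;

    if (regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return -1;

    while ((size_t)count < max) {
        /* After the first search the scan no longer starts at the
         * beginning of the line, so tell regexec with REG_NOTBOL. */
        int eflags = pos > 0 ? REG_NOTBOL : 0;

        if (regexec(&reg, text + pos, 1, &m, eflags) != 0)
            break;
        out[count].rm_so = (regoff_t)pos + m.rm_so;
        out[count].rm_eo = (regoff_t)pos + m.rm_eo;
        count++;

        if (m.rm_eo > m.rm_so) {
            pos += (size_t)m.rm_eo;
        } else {
            /* Empty match: step over one character to avoid looping
             * forever, and stop at the end of the string. */
            if (text[pos + (size_t)m.rm_so] == '\0')
                break;
            pos += (size_t)m.rm_so + 1;
        }
    }
    regfree(&reg);
    return count;
}
```

With the pattern "[0-9]+" and the text "a1 b22 c333", this stores three ranges: 1 to 2, 4 to 6, and 8 to 11.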
preg_replace( ) - perform a regular expression search and replace
int preg_replace
    (
    const char * pattern,   /* pattern */
    char *       string,    /* string to replace */
    const char * repstr[],  /* replace rule */
    char * *     ppdest,    /* dest buffer address */
    size_t       size,      /* max size of dest buffer */
    size_t       max        /* max to replace */
    )
This routine performs a regular expression search and replace; the parameter max limits the number of search-and-replace operations. The parameter repstr is a NULL-terminated array of character pointers that specifies the replacement rules. repstr[0] replaces the first match, and so on.
The parameter ppdest must be the address of a pointer. If the pointer value is NULL, preg_replace( ) allocates memory to hold the replacement result; otherwise preg_replace( ) writes the replacement result into *ppdest.
Each replacement rule in the NULL-terminated pointer array can include the following references:
     | Reference Rule        | Means
-----|-----------------------|-----------------------------------------
  1  | $[0]                  | the first match string
  2  | $[1]                  | the second match string
  3  | ...                   | ...
  4  | $[*]                  | the match string with the same index
  5  | $[*-1]                | the match string with the previous index
  6  | $[*+1]                | the match string with the next index
  7  | ...                   | ...
  8  | $[-]                  | don't replace this match
  9  | $[#]                  | replace with the sequence number (1 ~ n)
If the index is out of the available range, it is treated as an empty string. Example:
    char * pattern = _REGEX("/(\w+)\s+(\w+)\s+(\w+)/i");
    char string[256] = "\tabc 123 def ";
    char * pdst = NULL;
    char * repstr[] = {
        "__$[0]",   /* add "__" before the first match */
        "**",       /* replace the second match with "**" */
        "($[*])",   /* add "()" around the third match */
        NULL        /* remove all others */
    };
    printf("before: %s\n", string);
    if (preg_replace(pattern, string, repstr, &pdst, 256, 1) > 0)
    {
        printf("result: %s\n", pdst);
        free(pdst);
    }
The $[-] rule is very useful when you want to replace the outer parentheses. Because the replacement rules always replace the inner parentheses first, you can forbid the replacement of the inner match when you want to replace the outer one. The routine will then try to replace the outer parentheses.
    char * pattern = _REGEX("/(<(\d{3})>)/i");
    char * repstr[] = {
        "[$[*+1]]", /* replace <\d{3}> with [\d{3}] */
        "$[-]",     /* forbid replacing this one */
        NULL        /* remove all others */
    };
When you want to replace the whole source string, you can write it in the following form. Assume you want to turn the string "This car is blue." into "I have a blue car.":
    char * pattern = _REGEX("/(This (\w+) is (\w+))/");
    char * string = "This car is blue.";
    char * pdst = NULL;
    char * repstr[10];
    preg_srep_init(repstr, 10, "I have a $[2] $[1].");
    if (preg_replace(pattern, string, repstr, &pdst, 256, 1) > 0)
    {
        printf("result: %s\n", pdst);
        free(pdst);
    }
number of replacements if ret >= 0, otherwise the replacement failed.
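For comparison, a much simpler search and replace can be sketched with the POSIX routines only. This is not how preg_replace( ) works: it has no replacement rules such as $[0] or $[-], it replaces each whole match with one literal string, and the caller supplies the destination buffer. The name replace_all is hypothetical.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Replace every non-empty match of pattern in src with the literal
 * string rep. On success dst (capacity dstsz) holds the null-terminated
 * result. Returns the number of replacements, or -1 if the pattern does
 * not compile or the result does not fit. */
int replace_all(const char *pattern, const char *src, const char *rep,
                char *dst, size_t dstsz)
{
    regex_t reg;
    regmatch_t m;
    size_t used = 0, replen = strlen(rep);
    int count = 0;

    if (dstsz == 0 || regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&reg, src, 1, &m, count ? REG_NOTBOL : 0) == 0 &&
           m.rm_eo > m.rm_so) {
        size_t before = (size_t)m.rm_so;

        if (used + before + replen >= dstsz)
            goto too_long;
        memcpy(dst + used, src, before);          /* text before the match */
        memcpy(dst + used + before, rep, replen); /* the replacement */
        used += before + replen;
        src += m.rm_eo;                           /* continue after the match */
        count++;
    }
    if (used + strlen(src) >= dstsz)
        goto too_long;
    strcpy(dst + used, src);                      /* unmatched tail */
    regfree(&reg);
    return count;

too_long:
    dst[0] = '\0';
    regfree(&reg);
    return -1;
}
```

Replacing "[0-9]+" with "#" in "a1 b22 c333" yields "a# b# c#" and a return value of 3.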
preg_split( ) - split string by a regular expression
int preg_split
    (
    const char * pattern,   /* pattern */
    const char * string,    /* string to match */
    char *       plist[],   /* to return split result */
    size_t       max        /* max to store, should be > 1 */
    )
This routine splits string by a regular expression. If a result list (plist) is provided, it is filled with the results of the split. plist[0] will hold the first substring produced by the split, and so on. max limits the number of entries stored into the list.
This routine stores NULL after the last entry of plist, so plist must have room for one extra pointer.
    char * plist[32];
    if ((ret=preg_split(pattern, string, plist, 32)) < 0)
        return ERROR;
    /* output each piece */
    for (i=0; NULL != plist[i]; i++)
        printf("%s\n", plist[i]);
    preg_split_free(plist);
number of pieces if ret >= 0, otherwise the split failed.
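The NULL-terminated result list can be illustrated with a POSIX-only sketch. It is an assumption-laden approximation of the behaviour described above, not the code of preg_split( ): the names dup_range and split_by_regex are hypothetical, each piece is allocated with malloc( ), and the caller releases the pieces with free( ).

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate the n bytes starting at s into a fresh null-terminated string. */
static char *dup_range(const char *s, size_t n)
{
    char *p = malloc(n + 1);

    if (p != NULL) {
        memcpy(p, s, n);
        p[n] = '\0';
    }
    return p;
}

/* Split text at each match of pattern. At most max - 1 pieces are stored
 * in parts[], followed by a NULL terminator, so parts[] needs room for
 * max pointers. Returns the number of pieces, or -1 on error. */
int split_by_regex(const char *pattern, const char *text,
                   char *parts[], size_t max)
{
    regex_t reg;
    regmatch_t m;
    size_t n = 0;

    if (max < 2 || regcomp(&reg, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Cut a piece before every non-empty separator match, leaving one
     * slot for the final piece and one for the NULL terminator. */
    while (n + 2 < max &&
           regexec(&reg, text, 1, &m, n ? REG_NOTBOL : 0) == 0 &&
           m.rm_eo > m.rm_so) {
        parts[n] = dup_range(text, (size_t)m.rm_so);
        if (parts[n] == NULL)
            break;
        n++;
        text += m.rm_eo;
    }
    /* Whatever remains is the last piece. */
    parts[n] = dup_range(text, strlen(text));
    if (parts[n] != NULL)
        n++;
    parts[n] = NULL;
    regfree(&reg);
    return (int)n;
}
```

Splitting "a, b;c" on "[,;] *" with room for eight pointers gives the pieces "a", "b" and "c", followed by NULL. With max set to 3 the result is "a" and "b;c".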
preg_split_free( ) - free split list
int preg_split_free ( char * plist[] )
This routine frees the list produced by preg_split( ).
OK/ERROR