eyBuildLib API Reference : eyBuildLib

regex

NAME

regex - regular expression extension module

ROUTINES

regcomp( ) - compile a regular expression
regexec( ) - match a null-terminated string against the precompiled pattern
regfree( ) - free regular expression resources
regerror( ) - get an error message string from an error code
regcomp2( ) - compile a regular expression (advanced version)
preg_match( ) - perform a regular expression match
preg_match_all( ) - perform a regular expression match for all occurrences in the target
preg_replace( ) - perform a regular expression search and replace
preg_split( ) - split a string by a regular expression
preg_split_free( ) - free a split list

DESCRIPTION

This module supports POSIX Basic, POSIX Extended, and PERL-Like (see PREG EXTENDED) regular expression syntax. To use the PERL-Like extended syntax, compile the pattern with regcomp2( ) instead of regcomp( ) before calling regexec( ) to execute it. The other PERL-Like routines are prefixed with preg_, for example preg_match( ) and preg_replace( ).

REGULAR FLAGS

The cflags argument of regcomp( ) may be the bitwise OR of one or more of the following:

REG_EXTENDED
      Use POSIX Extended Regular Expression syntax when interpreting
      regex. If not set, POSIX Basic Regular Expression syntax is
      used.

REG_ICASE
      Do not differentiate case. Subsequent regexec searches using
      this pattern buffer will be case insensitive.

REG_NOSUB
      Support for substring addressing of matches is not required.
      The nmatch and pmatch parameters to regexec are ignored if the
      pattern buffer supplied was compiled with this flag set.

REG_NEWLINE
      Match-any-character operators don't match a newline.

      A non-matching list ([^...]) not containing a newline does not
      match a newline.

      Match-beginning-of-line operator (^) matches the empty string
      immediately after a newline, regardless of whether eflags, the
      execution flags of regexec, contains REG_NOTBOL.

      Match-end-of-line operator ($) matches the empty string
      immediately before a newline, regardless of whether eflags
      contains REG_NOTEOL.
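
As an illustration of these flags, here is a minimal sketch written against the standard POSIX <regex.h> interface (for example glibc's), assuming the regcomp( )/regexec( ) documented here behave the same way. The helper name matches( ) is hypothetical; it adds REG_NOSUB because only a yes/no answer is needed.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if `string` matches `pattern` compiled
 * with `cflags`, 0 if it does not, and -1 if the pattern is invalid.
 * REG_NOSUB is added because no substring offsets are needed, so
 * regexec() is called with nmatch = 0 and pmatch = NULL.
 */
int matches(const char *pattern, const char *string, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, matches("^b", "a\nb", REG_EXTENDED) fails, while adding REG_NEWLINE lets ^ match right after the newline. Without REG_EXTENDED, "a+b" is a Basic pattern in which + is an ordinary character.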

BYTE OFFSETS

Unless REG_NOSUB was set for the compilation of the pattern buffer, it is possible to obtain substring match addressing information. pmatch must be dimensioned to have at least nmatch elements. These are filled in by regexec with substring match addresses. Any unused structure elements will contain the value -1.

The regmatch_t structure, which is the type of pmatch, is defined in regex.h.

    typedef struct
    {
        regoff_t rm_so;
        regoff_t rm_eo;
    } regmatch_t;

Each rm_so element that is not -1 indicates the start offset of the next largest substring match within the string. The corresponding rm_eo element indicates the end offset of the match, that is, the offset of the first character after it.
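
The sketch below reads one capture group out of pmatch using rm_so and rm_eo. It uses the system's POSIX <regex.h>; the helper get_group( ) and its fixed limit of 10 pmatch entries are illustrative assumptions, not part of this library.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copy capture group `group` of the first match of
 * `pattern` in `string` into `out` (truncated to outsz - 1 bytes).
 * Returns the copied length, or -1 if there is no match, the pattern is
 * invalid, or the group did not take part in the match (rm_so == -1).
 */
int get_group(const char *pattern, const char *string, size_t group,
              char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[10];
    size_t len;

    if (group >= 10 || outsz == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, 10, pm, 0) != 0 || pm[group].rm_so == -1) {
        regfree(&re);
        return -1;
    }
    /* rm_eo is one past the last character, so the length is eo - so */
    len = (size_t)(pm[group].rm_eo - pm[group].rm_so);
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, string + pm[group].rm_so, len);
    out[len] = '\0';
    regfree(&re);
    return (int)len;
}
```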

POSIX ERROR REPORTING

regerror is used to turn the error codes that can be returned by both regcomp and regexec into error message strings.

regerror is passed the error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer, errbuf_size. It returns the size of the errbuf required to contain the null-terminated error message string. If both errbuf and errbuf_size are non-zero, errbuf is filled in with the first errbuf_size - 1 characters of the error message and a terminating null.
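
Because regerror( ) returns the required size, a caller can size the buffer exactly with two calls. A minimal sketch against POSIX <regex.h>; the helper compile_error( ) is hypothetical:

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: compile `pattern`; on failure return a malloc'ed
 * error message (the caller frees it), on success return NULL.
 * regerror() is called first with a NULL buffer of size 0 to learn the
 * required size, then again with a buffer of exactly that size.
 */
char *compile_error(const char *pattern, int cflags)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);
    size_t need;
    char *msg;

    if (rc == 0) {
        regfree(&re);
        return NULL;
    }
    need = regerror(rc, &re, NULL, 0);   /* includes the terminating null */
    msg = malloc(need);
    if (msg != NULL)
        regerror(rc, &re, msg, need);
    return msg;
}
```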

POSIX PATTERN BUFFER FREEING

Supplying regfree with a precompiled pattern buffer, preg, frees the memory allocated to the pattern buffer by the compiling process, regcomp.

RETURN VALUE

regcomp returns zero for a successful compilation or an error code for failure. regexec returns zero for a successful match or REG_NOMATCH for failure.
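
A caller usually needs to tell "no match" apart from a genuine error. The sketch below, using POSIX <regex.h> and a hypothetical search( ) helper, maps the return codes described above onto three outcomes:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical three-way result of a one-shot search. */
enum search_result { SEARCH_ERROR = -1, SEARCH_NOMATCH = 0, SEARCH_MATCH = 1 };

enum search_result search(const char *pattern, const char *string)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return SEARCH_ERROR;            /* regcomp: non-zero error code */
    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return SEARCH_MATCH;
    /* only REG_NOMATCH means "no match"; anything else is an error */
    return rc == REG_NOMATCH ? SEARCH_NOMATCH : SEARCH_ERROR;
}
```

POSIX allows regexec( ) to return codes other than REG_NOMATCH (for example REG_ESPACE), which is why the last line does not treat every non-zero value as "no match".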

PREG EXTENDED

The pattern parameter of the PERL-Like routines has the following format:
    "/regular expression/option"

The two "/" delimiters cannot be omitted. "option" may be empty or a combination of the following:
    /i - case insensitive, same meaning as REG_ICASE
    /m - multiline, same meaning as REG_NEWLINE
    /g - global; this option is currently ignored

Example:
  (1) matches "CSP" case insensitively:  "/CSP/i"
  (2) matches a word that follows one or more white-space characters:  "/\s+(\w+)/m"
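
To make the "/regular expression/option" format concrete, here is a hypothetical sketch of how such a string could be split into a bare pattern plus regcomp( ) flags. It is not this library's implementation: the helper split_preg( ) is invented for illustration, and it assumes REG_EXTENDED is always implied (as the regcomp2( ) NOTE suggests) and that the last '/' closes the expression.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical sketch: split "/regex/options" into the bare regex (in out)
 * and a cflags value: i -> REG_ICASE, m -> REG_NEWLINE, g accepted but
 * ignored. Returns 0 on success, or -1 if a delimiter is missing, an option
 * is unknown, or out is too small.
 */
int split_preg(const char *pattern, char *out, size_t outsz, int *cflags)
{
    const char *end;
    size_t len;

    if (pattern == NULL || pattern[0] != '/')
        return -1;
    end = strrchr(pattern, '/');        /* the last '/' closes the regex */
    if (end == pattern)
        return -1;                      /* only the opening '/' was found */
    len = (size_t)(end - pattern - 1);
    if (len >= outsz)
        return -1;
    memcpy(out, pattern + 1, len);
    out[len] = '\0';

    *cflags = REG_EXTENDED;             /* assumed always on */
    for (const char *p = end + 1; *p != '\0'; p++) {
        switch (*p) {
        case 'i': *cflags |= REG_ICASE;   break;
        case 'm': *cflags |= REG_NEWLINE; break;
        case 'g':                         break;  /* accepted, ignored */
        default:  return -1;                      /* unknown option */
        }
    }
    return 0;
}
```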

PERL-Like routines are normally used together with the macro _REGEX( ), which saves you from escaping C escape characters. In the following call, for example, you need not write a double backslash (\\) for \b and \w; _REGEX( ) adds it for you. Note that you cannot put double quotation marks inside _REGEX( ); use a backquote (`) instead.

  ret = regcomp2(&reg, _REGEX("/\bregex[\w]?/i"), 0);
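
One plausible way a macro such as _REGEX( ) can avoid C escape processing is the preprocessor's # (stringizing) operator, which re-escapes every backslash in a string-literal argument. The sketch below shows that effect; REGEX_LITERAL and unquote( ) are hypothetical names, and the real _REGEX( ) definition in regex.h may work differently.

```c
#include <string.h>

/* Hypothetical: stringizing keeps every backslash of the argument. */
#define REGEX_LITERAL(x) #x

/*
 * Copy a stringized literal such as "\"/\\bx/\"" into buf without its
 * surrounding double quotes. Returns buf, or NULL if it does not fit.
 */
const char *unquote(const char *s, char *buf, size_t size)
{
    size_t len = strlen(s);

    if (len < 2 || len - 2 >= size)
        return NULL;
    memcpy(buf, s + 1, len - 2);
    buf[len - 2] = '\0';
    return buf;
}
```

With a plain literal, each \b becomes a single backspace byte; after stringizing and removing the outer quotes, the pattern keeps the two-character sequence \b that the regular expression engine expects.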

The PERL-Like syntax supports the following extended characters:
     | Extended character    | Meaning
-----|-----------------------|-----------------------------------------
 1   | \b                    | beginning or end of a word
 2   | \B                    | *not* at the beginning or end of a word
 3   | \w                    | matches A-Z, a-z, 0-9 and _
 4   | \W                    | matches anything *except* A-Z, a-z, 0-9 and _
 5   | \d                    | digit 0-9
 6   | \D                    | *not* a digit 0-9
 7   | \s                    | white-space character ('\x20' and '\t')
 8   | \S                    | *not* a white-space character
 9   | \\                    | the backslash character '\'
 10  | \t, \n, \r, \f, \v    | tab, newline, carriage return, form feed, vertical tab
 11  | \cx                   | the control character CTRL+x, x in [A-Z]
 12  | \xhh                  | the character with hexadecimal code hh
 13  | \[other character]    | the character itself (the '\' is removed)

So to match a double-byte (non-ASCII) character, you can write: _REGEX("/[\x80-\xFF]{2}/");

PREG EXTENDED MACRO

To iterate over all matches in a string, you can use the following macros instead of calling preg_match_all( ).

    /* match foreach */
    void preg_match_foreach
        (
        pattern,            /* PERL-Like regular expression */
        string,             /* string to match */
        mlist,              /* where to store the match results */
        max,                /* max number to store */
        ret                 /* return code, same as preg_match() */
        );

    preg_match_next(string, mlist, ret);    /* match next */
    preg_match_break(ret);  /* like the break keyword */
    preg_match_clean();     /* clean up */

Example:
    regmatch_t mlist[5];
    int        ret, i, len, cnt = 0;

    /* the loop body runs once for each match */
    preg_match_foreach(pattern, string, mlist, 5, ret)
    {
        if (cnt++ >= 3)
            preg_match_break(ret);       /* break out of the foreach loop */

        for (i=0; i<=ret; i++) {
            len = mlist[i].rm_eo - mlist[i].rm_so;
            printf ("%.*s\n", len, string + mlist[i].rm_so);
        }

        preg_match_next(string, mlist, ret);    /* continue with the next match */
    }

PREG RETURN VALUE

The PERL-Like routines return a negative value on failure. You can get the error message with the following call:

    regerror(REG_GETERR(ret), NULL, errbuf, sizeof (errbuf));

The macro REG_GETERR( ) converts the return value into the correct error code for regerror( ).

NOTE FOR MATCHES

To obtain submatches, enclose each target in a pair of parentheses "()". The captured values are then numbered from left to right; nested parentheses follow the same rule, each group being counted at the position of its opening parenthesis.
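
POSIX <regex.h> numbers groups by the position of their opening parenthesis. The sketch below, assuming this library follows the same convention, lists every group of the first match; describe_groups( ) is a hypothetical helper.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the groups of the first match as
 * "0:...;1:...;" into out. Returns the number of groups (re_nsub),
 * or -1 on error or when there is no match.
 */
int describe_groups(const char *pattern, const char *string,
                    char *out, size_t outsz)
{
    regex_t re;
    regmatch_t pm[16];
    int n;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (re.re_nsub >= 16 || regexec(&re, string, 16, pm, 0) != 0) {
        regfree(&re);
        return -1;
    }
    n = (int)re.re_nsub;
    out[0] = '\0';
    for (int i = 0; i <= n; i++) {
        size_t used = strlen(out);

        if (pm[i].rm_so == -1)
            continue;                   /* group did not take part */
        snprintf(out + used, outsz - used, "%d:%.*s;", i,
                 (int)(pm[i].rm_eo - pm[i].rm_so), string + pm[i].rm_so);
    }
    regfree(&re);
    return n;
}
```

For the pattern "((a)(b))c" matched against "xabc", the groups come out as 0:abc, 1:ab, 2:a, 3:b; the outer group is numbered before the groups nested inside it.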

USEFUL RULE

Match without storing the value:
  (?:exp)  - matches exp, but does not store the matched value

Lazy instead of greedy quantifiers:
  *?       - matches zero or more times, but as few times as possible
  +?       - matches one or more times, but as few times as possible
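
POSIX <regex.h> has neither (?:...) nor lazy quantifiers, and it always reports the leftmost-longest match, so it can show the greedy behavior that *? and +? avoid. In the sketch below (hypothetical helper match_length( )), "<.*>" swallows both tags, while the usual POSIX workaround "<[^>]*>" stops at the first '>', which is what the PERL-Like "<.*?>" would be expected to match here.

```c
#include <regex.h>

/*
 * Hypothetical helper: return the length of the leftmost match of
 * `pattern` in `string`, or -1 if there is none or the pattern is
 * invalid. POSIX regexec() reports the longest match at that position.
 */
int match_length(const char *pattern, const char *string)
{
    regex_t re;
    regmatch_t pm[1];
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, string, 1, pm, 0) == 0)
        len = (int)(pm[0].rm_eo - pm[0].rm_so);
    regfree(&re);
    return len;
}
```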

INCLUDE

regex.h, sys/types.h


eyBuildLib : Routines

regcomp( )

NAME

regcomp( ) - compile a regular expression

SYNOPSIS

int regcomp
    (
    regex_t *    preg,
    const char * regex,
    int          cflags
    )

DESCRIPTION

This routine compiles a regular expression into a form suitable for subsequent regexec searches. regcomp is supplied with preg, a pointer to a pattern buffer storage area; regex, a pointer to the null-terminated pattern string; and cflags, flags used to determine the type of compilation.

RETURNS

zero for a successful compilation or an error code for failure

SEE ALSO

regex



regexec( )

NAME

regexec( ) - match a null-terminated string against the precompiled pattern

SYNOPSIS

int regexec
    (
    const regex_t * preg,
    const char *    string,
    size_t          nmatch,
    regmatch_t      pmatch[],
    int             eflags
    )

DESCRIPTION

regexec matches a null-terminated string against the precompiled pattern buffer preg. nmatch and pmatch receive information about the location of any matches. eflags may contain REG_NOTBOL and REG_NOTEOL.
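
regexec( ) reports only the first match. A common way to find every match with the POSIX interface is to resume searching after each match and pass REG_NOTBOL so that ^ does not match in the middle of the string. Sketch using the system's <regex.h>; count_matches( ) is hypothetical:

```c
#include <regex.h>

/*
 * Hypothetical helper: count non-overlapping matches of `pattern` in
 * `string`. Returns the count, or -1 if the pattern does not compile.
 */
int count_matches(const char *pattern, const char *string)
{
    regex_t re;
    regmatch_t pm[1];
    const char *p = string;
    int count = 0;
    int eflags = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (regexec(&re, p, 1, pm, eflags) == 0) {
        count++;
        if (pm[0].rm_eo > pm[0].rm_so) {
            p += pm[0].rm_eo;           /* resume after the match */
        } else {
            if (p[pm[0].rm_eo] == '\0')
                break;                  /* empty match at the very end */
            p += pm[0].rm_eo + 1;       /* step past an empty match */
        }
        eflags = REG_NOTBOL;            /* p no longer starts the string */
    }
    regfree(&re);
    return count;
}
```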

RETURNS

returns zero for a successful match or REG_NOMATCH for failure

SEE ALSO

regex



regfree( )

NAME

regfree( ) - free regular expression resources

SYNOPSIS

void regfree
    (
    regex_t * preg
    )

DESCRIPTION

This routine frees the memory that was allocated to the pattern buffer preg when it was compiled.

RETURNS

NONE

SEE ALSO

regex



regerror( )

NAME

regerror( ) - get an error message string from an error code

SYNOPSIS

size_t regerror
    (
    int             errcode,
    const regex_t * preg,
    char *          errbuf,
    size_t          errbuf_size
    )

DESCRIPTION

This routine turns the error codes that can be returned by both regcomp and regexec into error message strings. For the PERL-Like routines, obtain errcode with the macro REG_GETERR( ).

RETURNS

the size of the buffer needed to hold the complete null-terminated error message

SEE ALSO

regex



regcomp2( )

NAME

regcomp2( ) - compile a regular expression (advanced version)

SYNOPSIS

int regcomp2
    (
    regex_t *    preg,
    const char * pattern,
    int          cflags
    )

DESCRIPTION

This routine compiles a regular expression written in the PERL-Like "/regular expression/option" syntax (see PREG EXTENDED) into a form suitable for subsequent regexec searches.

NOTE

You can call regcomp( ) with the flags "REG_EXTENDED | REG_PERL_LIKE" to get the same behavior.

RETURNS

OK, or a non-zero error code

SEE ALSO

regex



preg_match( )

NAME

preg_match( ) - perform a regular expression match

SYNOPSIS

int preg_match
    (
    const char * pattern,     /* pattern */
    const char * string,      /* string to match */
    regmatch_t   mlist[],     /* match list */
    int          max          /* max to store */
    )

DESCRIPTION

This routine searches string for a match to the regular expression given in pattern. If the match list (mlist) is provided, it is filled with the results of the search: mlist[0] contains the range that matched the full pattern, mlist[1] the range that matched the first captured parenthesized subpattern, and so on. max limits the number of entries stored in mlist.

  /* test whether "CSP" exists */
  ret = preg_match(_REGEX("/CSP/i"), "Hello CSP/eybuild", NULL, 0);
  printf("CSP %s\n", ret > 0 ? "exists" : "does not exist");

RETURNS

the number of matches if the return value is >= 0; a negative value on failure

SEE ALSO

regex



preg_match_all( )

NAME

preg_match_all( ) - perform a regular expression match for all occurrences in the target

SYNOPSIS

int preg_match_all
    (
    const char * pattern,     /* pattern */
    const char * string,      /* string to match */
    regmatch_t * pmlist[],    /* to return match list */
    size_t       max          /* max to store */
    )

DESCRIPTION

This routine searches string for all matches to the regular expression given in pattern. If pmlist is provided, *pmlist is set to a newly allocated match list holding the results of the search, which the caller releases with free( ). (*pmlist)[0] contains the range that matched the full pattern, (*pmlist)[1] the range that matched the first captured parenthesized subpattern, and so on. max limits the number of entries stored in the list.

  regmatch_t * plist = NULL;
  int          ret, i, len;

  if ((ret=preg_match_all(pattern, string, &plist, (size_t)-1)) < 0)
      return ERROR;

  /* output each match */
  for (i=1; i<=ret; i++) {
      len = plist[i].rm_eo - plist[i].rm_so;
      printf("%.*s\n", len, string+plist[i].rm_so);
  }

  free(plist);

RETURNS

the number of matches if the return value is >= 0; a negative value on failure

SEE ALSO

regex



preg_replace( )

NAME

preg_replace( ) - perform a regular expression search and replace

SYNOPSIS

int preg_replace
    (
    const char * pattern,     /* pattern */
    char *       string,      /* string to replace */
    const char * repstr[],    /* replace rule */
    char * *     ppdest,      /* dest buffer address */
    size_t       size,        /* max size of dest buffer */
    size_t       max          /* max to replace */
    )

DESCRIPTION

This routine performs a regular expression search and replace; the parameter max limits the number of searches and replacements. The parameter repstr is a NULL-terminated array of character pointers that specifies the replacement rules: repstr[0] replaces the first match, and so on.

The parameter ppdest must be the address of a pointer. If that pointer is NULL, preg_replace( ) allocates memory to hold the result (release it with free( )); otherwise preg_replace( ) writes the result into *ppdest, a buffer of at most size bytes.

Each replacement rule may include the following references:

     | Reference rule        | Meaning
-----|-----------------------|-----------------------------------------
 1   | $[0]                  | the first match string
 2   | $[1]                  | the second match string
 3   | ...                   | ...
 4   | $[*]                  | the match string with the same index as this rule
 5   | $[*-1]                | the match string at the previous index
 6   | $[*+1]                | the match string at the next index
 7   | ...                   | ...
 8   | $[-]                  | do not replace this match
 9   | $[#]                  | replace with the sequence number (1 ~ n)

An index that is out of range is treated as an empty string. Example:
  const char * pattern = _REGEX("/(\w+)\s+(\w+)\s+(\w+)/i");
  char         string[256] = "\tabc 123 def ";
  char *       pdst = NULL;
  const char * repstr[] = {
          "__$[0]",       /* add "__" before the first match */
          "**",           /* replace the second match with "**" */
          "($[*])",       /* add "()" around the third match */
          NULL            /* remove all others */
  };

  printf("before: %s\n", string);
  if (preg_replace(pattern, string, repstr, &pdst, 256, 1) > 0) {
      printf("result: %s\n", pdst);
      free(pdst);
  }

$[-] is useful when you want to replace the outer parentheses. Because the replacement rules always replace the inner parentheses first, you must forbid replacement of the inner match when you want to replace the outer one; this routine then replaces the outer parentheses instead.
  const char * pattern = _REGEX("/(<(\d{3})>)/i");
  const char * repstr[] = {
          "[$[*+1]]",     /* replace <\d{3}> with [\d{3}] */
          "$[-]",         /* do not replace the inner match */
          NULL            /* remove all others */
  };

To rewrite the whole source string, use the following form. Assume you want to turn the string "This car is blue." into "I have a blue car.":
  const char * pattern = _REGEX("/(This (\w+) is (\w+))/");
  char         string[256] = "This car is blue.";
  char *       pdst = NULL;
  char *       repstr[10];

  preg_srep_init(repstr, 10, "I have a $[2] $[1].");
  if (preg_replace(pattern, string, repstr, &pdst, 256, 1) > 0) {
      printf("result: %s\n", pdst);
      free(pdst);
  }
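
The $[n] references resemble back-references into the capture groups. The sketch below is hypothetical and deliberately narrow: it uses POSIX <regex.h>, handles only single-digit $[n] (not $[*], $[-] or $[#]), and expands a template rather than splicing text into the source string. Following the examples above, it assumes $[0] names the first parenthesized group, that is pmatch[1]. The helper expand_template( ) is not part of this library.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: match `pattern` once against `string` and expand
 * `tmpl`, replacing each "$[k]" (k = 0..9) with parenthesized group k,
 * where $[0] is the first group (pmatch[1]). Missing groups expand to "".
 * Returns a malloc'ed string, or NULL on error or no match.
 */
char *expand_template(const char *pattern, const char *string,
                      const char *tmpl)
{
    regex_t re;
    regmatch_t pm[10];
    size_t tlen = strlen(tmpl), slen = strlen(string);
    const char *p;
    char *out, *q;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;
    if (regexec(&re, string, 10, pm, 0) != 0) {
        regfree(&re);
        return NULL;
    }
    /* each 4-byte "$[k]" expands to at most slen bytes */
    out = malloc(tlen + (tlen / 4 + 1) * slen + 1);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }
    for (p = tmpl, q = out; *p != '\0'; ) {
        if (p[0] == '$' && p[1] == '[' && p[2] >= '0' && p[2] <= '9'
            && p[3] == ']') {
            size_t g = (size_t)(p[2] - '0') + 1;   /* $[0] -> pmatch[1] */
            if (g < 10 && g <= re.re_nsub && pm[g].rm_so != -1) {
                size_t n = (size_t)(pm[g].rm_eo - pm[g].rm_so);
                memcpy(q, string + pm[g].rm_so, n);
                q += n;
            }
            p += 4;                                /* skip "$[k]" */
        } else {
            *q++ = *p++;                           /* copy literal text */
        }
    }
    *q = '\0';
    regfree(&re);
    return out;
}
```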

RETURNS

the number of replacements if the return value is >= 0; a negative value on failure

SEE ALSO

regex



preg_split( )

NAME

preg_split( ) - split a string by a regular expression

SYNOPSIS

int preg_split
    (
    const char * pattern,     /* pattern */
    const char * string,      /* string to match */
    char *       plist[],     /* to return split result */
    size_t       max          /* max to store, should > 1 */
    )

DESCRIPTION

This routine splits string by a regular expression. If the list (plist) is provided, it is filled with the results of the split: plist[0] holds the first substring, plist[1] the second, and so on. max limits the number of entries stored in plist.

NOTE

This routine stores NULL after the last piece in plist, so plist must have room for one more pointer than the number of pieces.

  char *    plist[32];
  int       ret, i;

  if ((ret=preg_split(pattern, string, plist, 32)) < 0)
      return ERROR;

  /* output each piece */
  for (i=0; NULL != plist[i]; i++)
      printf("%s\n", plist[i]);
  preg_split_free(plist);
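
For comparison, here is a hypothetical split routine built on POSIX <regex.h> that follows the same list convention (a NULL entry after the last piece, one slot reserved for it). split_posix( ) and split_free( ) are illustrative names, not this library's API, and the sketch assumes the separator pattern cannot match the empty string.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counterpart of preg_split_free( ): free each piece. */
void split_free(char *list[])
{
    for (size_t i = 0; list[i] != NULL; i++)
        free(list[i]);
}

/*
 * Hypothetical sketch: split `string` at each match of `pattern`, storing
 * at most max - 1 malloc'ed pieces plus a terminating NULL in list.
 * Returns the number of pieces, or -1 on error.
 */
int split_posix(const char *pattern, const char *string,
                char *list[], size_t max)
{
    regex_t re;
    regmatch_t pm[1];
    const char *p = string;
    size_t n = 0;
    int eflags = 0;

    if (max < 2 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    while (n < max - 1) {
        int found = regexec(&re, p, 1, pm, eflags) == 0
                    && pm[0].rm_eo > pm[0].rm_so;
        size_t len = found ? (size_t)pm[0].rm_so : strlen(p);

        list[n] = malloc(len + 1);
        if (list[n] == NULL) {          /* out of memory: undo */
            split_free(list);
            regfree(&re);
            return -1;
        }
        memcpy(list[n], p, len);
        list[n][len] = '\0';
        n++;
        if (!found)
            break;                      /* the rest was the last piece */
        p += pm[0].rm_eo;               /* resume after the separator */
        eflags = REG_NOTBOL;
    }
    list[n] = NULL;
    regfree(&re);
    return (int)n;
}
```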

RETURNS

the number of pieces if the return value is >= 0; a negative value on failure

SEE ALSO

regex



preg_split_free( )

NAME

preg_split_free( ) - free a split list

SYNOPSIS

int preg_split_free
    (
    char * plist[]
    )

DESCRIPTION

This routine frees the list returned by preg_split( ).

RETURNS

OK/ERROR

SEE ALSO

regex