Transformation Language Reference > Functions > REG_EXTRACT
  

REG_EXTRACT

Extracts subpatterns of a regular expression within an input value. For example, from a regular expression pattern for a full name, you can extract the first name or last name.

Syntax

REG_EXTRACT( subject, 'pattern', subPatternNum, match_from_start )
The following table describes the arguments for this command:
Argument
Required/
Optional
Description
subject
Required
String datatype. Passes the value you want to compare against the regular expression pattern.
pattern
Required
String datatype. Regular expression pattern that you want to match. You must use perl compatible regular expression syntax. Enclose the pattern in single quotation marks. Enclose each subpattern in parentheses.
subPatternNum
Optional
Integer value. Subpattern number of the regular expression you want to match. Use the following guidelines to determine the subpattern number:
  • - no value or 1. Extracts the first regular expression subpattern.
  • - 2. Extracts the second regular expression subpattern.
  • - n. Extracts the nth regular expression subpattern.
Default is 1.
match_from_start
Optional
Numeric value. Returns the substring if a match is found from the start of the string. Use the following guidelines to determine the match from start value:
  • - 0. Matches pattern with subject string from the starting index or any index.
  • - Non-zero. Matches pattern with subject string from the starting index.

Using perl Compatible Regular Expression Syntax

You must use perl compatible regular expression syntax with REG_EXTRACT, REG_MATCH and REG_REPLACE functions.
The following table provides perl compatible regular expression syntax guidelines:
Syntax
Description
. (a period)
Matches any one character.
[a-z]
Matches one instance of a character in lower case. For example, [a-z] matches ab. Use [A-Z] to match characters in upper case.
\d
Matches one instance of any digit from 0-9.
\s
Matches a whitespace character.
\w
Matches one alphanumeric character, including underscore (_)
()
Groups an expression. For example, the parentheses in (\d-\d-\d\d) groups the expression \d\d-\d\d, which finds any two numbers followed by a hyphen and any two numbers, as in 12-34.
{}
Matches the number of characters. For example, \d{3} matches any three numbers, such as 650 or 510. Or, [a-z]{2} matches any two letters, such as CA or NY.
?
Matches the preceding character or group of characters zero or one time. For example, \d{3}(-{d{4})? matches any three numbers, which can be followed by a hyphen and any four numbers.
* (an asterisk)
Matches zero or more instances of the values that follow the asterisk. For example, *0 is any value that precedes a 0.
+
Matches one or more instances of the values that follow the plus sign. For example, \w+ is any value that follows an alphanumeric character.
For example, the following regular expression finds 5-digit U.S.A. zip codes, such as 93930, and 9-digit zip codes, such as 93930-5407:
\d{5}(-\d{4})?
\d{5} refers to any five numbers, such as 93930. The parentheses surrounding -\d{4} group this segment of the expression. The hyphen represents the hyphen of a 9-digit zip code, as in 93930-5407. \d{4} refers to any four numbers, such as 5407. The question mark states that the hyphen and last four digits are optional or can appear one time.

Converting COBOL Syntax to perl Compatible Regular Expression Syntax

If you are familiar with COBOL syntax, you can use the following information to write perl compatible regular expressions.
The following table shows examples of COBOL syntax and their perl equivalents:
COBOL Syntax
perl Syntax
Description
9
\d
Matches one instance of any digit from 0-9.
9999
\d\d\d\d
or
\d{4}
Matches any four digits from 0-9, as in 1234 or 5936.
x
[a-z]
Matches one instance of a letter.
9xx9
\d[a-z][a-z]\d
Matches any number followed by two letters and another number, as in 1ab2.

Converting SQL Syntax to perl Compatible Regular Expression Syntax

If you are familiar with SQL syntax, you can use the following information to write perl compatible regular expressions.
The following table shows examples of SQL syntax and their perl equivalents:
SQL Syntax
perl Syntax
Description
%
. *
Matches any string.
A%
A.*
Matches the letter “A” followed by any string, as in Area.
_
. (a period)
Matches any one character.
A_
A.
Matches “A” followed by any one character, such as AZ.

Return Value

Returns the value of the nth subpattern that is part of the input value. The nth subpattern is based on the value you specify for subPatternNum.
NULL if the input is a null value or if the pattern is null.

Example

You might use REG_EXTRACT in an expression to extract middle names from a regular expression that matches first name, middle name, and last name. For example, the following expression returns the middle name of a regular expression:
REG_EXTRACT( Employee_Name, '(\w+)\s+(\w+)\s+(\w+)',2)
Employee_Name
Return Value
Stephen Graham Smith
Graham
Juan Carlos Fernando
Carlos