You are viewing an older version of this section. View current production version.
Regular Expression Commands
Suppose you want to find the starting position of a string within another string. Call the former the search string and the latter the target string. To find dog
within The dog chased the cat
, you run the query SELECT INSTR('The dog chased the cat','dog');
. This query returns 5
, indicating the position where it found dog
.
But INSTR
does not allow search strings that contain placeholders. The following examples show search strings containing placeholders, denoted by the ?
symbol.
Search String | Placeholder |
---|---|
The dog chased the ? |
Any string |
The ? chased the cat |
Any three letter string |
The dog ? the cat |
The string chased or found |
Regular expressions enable you to define search strings with placeholders. You specify these strings using patterns. For instance, the pattern The .{3,5} chased the mouse|frog
matches many target strings. Two of them are The dog chased the frog
and The rat chased the mouse
.
Patterns contain literals and metacharacters. Literals are characters that a pattern matcher attempts to locate in the target string exactly as the characters are written in the search string. Metacharacters describe the placeholders that the pattern uses.
The pattern The .{3,5} chased the mouse|frog
contains the literals The
,chased
,the
,mouse
and frog
. It contains the metacharacters .
,{3,5}
and |
.
Regular Expression Formats
MemSQL supports extended regular expressions (ERE) as defined in the POSIX standard and advanced regular expressions (ARE) as defined in PostgreSQL. ARE are nearly a superset of ERE and support common Perl regular expression extensions.
You can use both formats with the built-in MemSQL functions RLIKE
, REGEXP
,REGEXP_INSTR
and REGEXP_REPLACE
, which are described in the next section.
Set the global variable regexp_format
to specify the regular expression format to be used by the built-in functions you call.
To use ERE globally, run:
SET GLOBAL regexp_format = 'extended';
To use ARE globally, run:
SET GLOBAL regexp_format = 'advanced';
By default, regexp_format
is set to 'extended'
.
You can only set regexp_format
globally; all nodes in the cluster will use the global setting.
You should set regexp_format
to 'advanced'
if you are migrating regular expression logic from another database system and this logic uses advanced regular expressions or common Perl regular expression extensions.
You should set regexp_format
to 'advanced'
if you are developing new regular expression logic.
Regular Expression Functions
MemSQL has four built-in functions that allow you to work with regular expressions.
-
REGEXP
: Searches a string for a regular expression pattern and returns1
if the string is found,0
otherwise. -
RLIKE
: Has the same functionality asREGEXP
. -
REGEXP_INSTR
: Searches a string for a regular expression pattern and returns an integer indicating the beginning position or ending position of the matched substring. -
REGEXP_REPLACE
: Searches a string for the first occurrence of a regular expression pattern and replaces that occurrence with a replacement string. Can also perform a search and replace of all occurrences.
Extended Regular Expression (ERE) Support
MemSQL supports ERE as defined in the POSIX standard. This support is summarized below.
Common metacharacters used in ERE are listed in the following table.
Metacharacter | Meaning | Examples |
---|---|---|
. |
Matches any single character. | a.bc matches aabc , abbc , apbc , azbc , a8bc , a)bc , etc. |
? |
Matches the previous element zero or one time. | xy?z matches xz , xyz |
* |
Matches the previous element zero or more times. | xy*z matches xz , xyz , xyyz , xyyyz etc. |
+ |
Matches the previous element one or more times. | xy+z matches xyz , xyyz , xyyyz etc. |
^ |
Matches the beginning of a string. | ^hi matches hibye but not ibye |
$ |
Matches the end of a string. | ye$ matches hibye but not y |
abc|def |
Matches the characters to the left or right of | . |
abc|def matches abc , def |
[ab] |
Matches either a or b . |
[xyz] matches x , y , or z , but not e |
^[ab] |
Matches any single character except a or b . |
^[xyz] matches p or q , but not x |
{n} |
Matches the previous element n times. |
ab{3}z matches abbbz |
{m,} |
Matches the previous element m or more times. |
ab{3,}z matches abbbbbz |
{n,m} |
Matches the previous element n to m times, where n < m . |
ab{1,3}z matches abz , abbz , abbbz |
[:range:] |
Matches a range of characters, also called a character class. | [:digit:] matches 5 . [:alpha:] matches b |
Patterns may contain multiple metacharacters, as shown in the following examples.
Pattern | Example Matches |
---|---|
a.b.c |
aybzc , a4b6c |
ab?c?d. |
abcdt , ads , abdr |
Metacharacters often match elements. An element is a single character or a group of characters enclosed in parentheses. A group is matched as a unit. Groups can be nested. The following examples show patterns that use groups.
Pattern | Example Matches |
---|---|
a([:digit:]ef){2}g |
a3ef5efg ,a1ef8efg |
a(bcd)|(efg)h |
abcdh , aefgh |
z(ab)*(cd)+ |
zcd , zabcd , zababcd |
z((ab)|(cd){2}e){2}y |
zababy , zabcdcdey , zcdcdecdcdey |
The |
metacharacter can match a series of characters not enclosed in the parentheses. For example, the pattern abc|def
matches abc
or def
. But |
has the lowest precedence of all metacharacters. For more complicated patterns involving |
, you will need to group the characters you want to match. For example, to match abc
or two occurences of def
, use the pattern abc|(def){2}
.
Advanced Regular Expression (ARE) Support
MemSQL supports ARE as defined in PostgreSQL. This support is summarized below.
Backreferences
A pattern containing a backreference matches the same string that it matched earlier in the string. The earlier string must be enclosed in parentheses. A pattern may contain multiple backreferences that are denoted by \\n
, where n
is the number of the backreference.
For example, the pattern a(bc)d(ef)g\\1\\2
matches abcdefgbcef
.
Escapes
Escapes make it easier to write some types of patterns. Common escapes are listed below. See section 9.7.3.3 of the PostgreSQL pattern matching documentation for a full explanation of the supported escapes.
Metacharacter | Meaning |
---|---|
\d |
Matches any digit. |
\s |
Matches a space. |
\w |
Matches any alpha-numeric character including the underscore. |
\D |
Matches any character except a digit. |
\S |
Matches any character except a space. |
\W |
Matches any non-alpha-numeric character. Does not match the underscore. |
\m |
Matches the beginning of a word. |
\M |
Matches the end of a word. |
In PostgreSQL, escapes in patterns begin with one \
character.
In MemSQL, escapes in patterns also begin with one \
character. Additionally, a \
literal in an MemSQL string needs to escaped with a leading \
. Hence, pattern strings containing escapes use two \
characters. For example, SELECT REGEXP_INSTR('dog58','\\d');