Outdated Version

You are viewing an older version of this section. View current production version.

REGEXP_SUBSTR

Searches a string for a regular expression pattern and returns the matching substring.

REGEXP_SUBSTR supports the extended and advanced regular expression formats. You can set the global variable regexp_format to specify which format you want to use.

Syntax

REGEXP_SUBSTR ( source_string, pattern [, position [, occurrence [, parameters ] ] ] )

Arguments

source_string

A string expression, such as a column name, to be searched.

pattern

A string literal representing a regular expression.

position

Optional if occurrence and parameters are not specified.

A positive integer indicating the position within source_string to begin searching. The position is based on the number of characters, not bytes, so that multibyte characters are counted as single characters. The default is 1. If position is less than 1, the search begins at the first character of source_string. If position is greater than the number of characters in source_string, the result is NULL.

occurrence

Optional if parameters is not specified.

A positive integer indicating which occurrence of pattern in source_string to return. The search for the occurrence begins at position. If occurrence is not present, defaults to 1. If occurrence is less than 1 or greater than the number of characters in source_string, the search is ignored and REGEXP_SUBSTR returns NULL.

parameters

Optional. If present, has one of the following values:

c: Perform case-sensitive matching.

i: Perform case-insensitive matching.

If c or i is not specified, performs case-sensitive matching.

Remarks

REGEXP_SUBSTR returns NULL if no match is found, source_string is NULL, or pattern is NULL.

Examples

The following examples demonstrate how REGEXP_SUBSTR works when the global variable regexp_format is set to 'extended'.

Call to REGEXP_SUBSTR Result Comments
SELECT REGEXP_SUBSTR('adog','o'); o Returns the substring o.
SELECT REGEXP_SUBSTR('adog','.o'); do Returns the character preceding the first occurrence of ‘o’, followed by the substring ‘o’.
SELECT REGEXP_SUBSTR('dog58','[[:digit:]]'); 5 Returns the first digit.
SELECT REGEXP_SUBSTR('1dogbat','[[:alpha:]]'); d Returns the first alphabet.
SELECT REGEXP_SUBSTR('rat','b|c'); NULL Looks for b or c. Since neither character is found, returns NULL.
SELECT REGEXP_SUBSTR('mousecat','cat$'); cat Returns the substring cat that appears at the end of the string.
SELECT REGEXP_SUBSTR('2dogcat','^dog'); NULL Searches for dog at the beginning of the string. Since dog is not found at the beginning of the string, returns NULL.
SELECT REGEXP_SUBSTR('catcat','ca.',1); cat Returns the first occurrence of ca followed by any character.
SELECT REGEXP_SUBSTR('catcat','cat',1,2); cat Returns the second occurrence of cat.
SELECT REGEXP_SUBSTR('bat3BaT','BaT',1,1,'c'); BaT Returns the first occurrence of BaT (case sensitive search).
SELECT REGEXP_SUBSTR('bat3BaT','BaT',1,1,'i'); bat Returns the first occurrence of BaT (case-insensitive search).

The following examples demonstrate how REGEXP_SUBSTR works when the global variable regexp_format is set to 'advanced'.

Call to REGEXP_SUBSTR Result Comments
SELECT REGEXP_SUBSTR('dog58','\\d'); 5 Returns the first digit.
SELECT REGEXP_SUBSTR('dog58','\\D'); d Returns the first non-digit.
SELECT REGEXP_SUBSTR('%26dog','\\w'); 2 Returns the first alpha-numeric character.
SELECT REGEXP_SUBSTR('%26dog','\\W'); % Returns the first non-alpha-numeric character.
SELECT REGEXP_SUBSTR('cat dog','\\mdog'); dog Returns the first occurrence of the word dog.

Using REGEXP_SUBSTR With a Table

The following example shows how to call REGEXP_SUBSTR once per table row. This example uses extended regular expression matching. Advanced regular expression matching could be used as well.

SET GLOBAL regexp_format = 'extended';

DROP DATABASE IF EXISTS memsql_docs_example;
CREATE DATABASE memsql_docs_example;
use memsql_docs_example;
CREATE TABLE example_table(id INT, text_field TEXT);

INSERT INTO example_table VALUES (1, 'the'), (2, 'athe'), (3, 'abthe'),
(4, 'abcthe'), (5, 'thethe');

SELECT id, text_field, REGEXP_SUBSTR(text_field,'..the') FROM example_table
ORDER BY id;
****
+------+------------+-----------------------------------+
| id   | text_field | REGEXP_SUBSTR(text_field,'..the') |
+------+------------+-----------------------------------+
|    1 | the        | NULL                              |
|    2 | athe       | NULL                              |
|    3 | abthe      | abthe                             |
|    4 | abcthe     | bcthe                             |
|    5 | thethe     | hethe                             |
+------+------------+-----------------------------------+