Outdated Version

You are viewing an older version of this section. View current production version.

Performing Upserts min read


Info

Upserts can only be performed on rowstore tables.

An upsert inserts a row r into a table if the table does not already contain an existing row with the same primary key or unique index as r. Otherwise, the upsert updates the existing row. An upsert can be performed by specifying the ON DUPLICATE KEY UPDATE clause of the INSERT statement.

  • If the ON DUPLICATE KEY UPDATE clause is specified, and a row is to be inserted that would result in a duplicate value in a PRIMARY KEY or UNIQUE index, MemSQL will instead perform an UPDATE of the old row.
  • When using ON DUPLICATE KEY UPDATE, the affected row count will be 0 if no existing row is changed, 1 if a new row is inserted, and 2 if an existing row is updated. In Example 1 below, the ON DUPLICATE KEY UPDATE clause updates the existing row instead of inserting a new row to avoid duplicate entry in a PRIMARY KEY field, and the query result shows that 2 rows were affected.

Examples

The following examples use the cust table.

DESC cust;
****
+--------+-------------+------+------+---------+-------+
| Field  | Type        | Null | Key  | Default | Extra |
+--------+-------------+------+------+---------+-------+
| NAME   | varchar(32) | YES  |      | NULL    |       |
| ID     | int(11)     | NO   | PRI  | 0       |       |
| ORDERS | int(11)     | YES  |      | NULL    |       |
+--------+-------------+------+------+---------+-------+

The following example demonstrates the use of INSERT with ON DUPLICATE KEY UPDATE.

SELECT * FROM cust;
****
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      2 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 3)
ON DUPLICATE KEY UPDATE ORDERS=3;
****
Query OK, 2 rows affected
SELECT * FROM cust;
****
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      3 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+

The VALUES() function can be used to refer to column values from the INSERT clause of the INSERT ... ON DUPLICATE KEY UPDATE statement. For example:

SELECT * FROM cust;
****
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      2 |
| Adam  | 3412 |      5 |
| Elen  | 8301 |      4 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 4) ON DUPLICATE KEY UPDATE ORDERS = VALUES(ORDERS) + ORDERS;
****
Query OK, 2 rows affected
SELECT * FROM cust;
****
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Adam  | 3412 |      5 |
| Elen  | 8301 |      4 |
+-------+------+--------+

Here, the VALUES() function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.

The following code block demonstrates the use of SELECT with ON DUPLICATE KEY UPDATE:

SELECT * FROM cust_new;
****
+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Sam  |   22 |      2 |
+------+------+--------+
SELECT * FROM cust;
****
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+
INSERT INTO cust (NAME, ID, ORDERS) SELECT * FROM cust_new ON DUPLICATE KEY UPDATE NAME = VALUES(NAME), ORDERS =  VALUES (ORDERS);
****
Query OK, 4 rows affected
Records: 3  Duplicates: 1  Warnings: 0
SELECT * FROM cust;
****
+------+------+--------+
| NAME | ID   | ORDERS |
+------+------+--------+
| Bill |   21 |      5 |
| Gwen | 7214 |      3 |
| Elen | 8301 |      4 |
| Sam  |   22 |      2 |
| Adam | 3412 |      5 |
+------+------+--------+

Here, the VALUES() function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.

The above examples are functionally equivalent to this common two-step operation.

Note: While this works, it is much less performant than the single-steps above.

SELECT * FROM cust;
****
+-------+------+--------+
| NAME  | ID   | ORDERS |
+-------+------+--------+
| Chris | 7214 |      6 |
| Elen  | 8301 |      4 |
| Adam  | 3412 |      5 |
+-------+------+--------+
SELECT * FROM cust_new;
****
+--------+------+--------+
| NAME   | ID   | ORDERS |
+--------+------+--------+
| Chris  | 7214 |      8 |
| Rachel | 9125 |      2 |
+--------+------+--------+

Copy all the unique values from the cust to cust_new table.

INSERT IGNORE INTO cust_new SELECT * FROM cust;

SELECT * FROM cust_new;
****
+--------+------+--------+
| NAME   | ID   | ORDERS |
+--------+------+--------+
| Chris  | 7214 |      8 |
| Elen   | 8301 |      4 |
| Rachel | 9125 |      2 |
| Adam   | 3412 |      5 |
+--------+------+--------+

Now, perform an update to copy all the non-unique (duplicate) values.

UPDATE cust_new
JOIN cust ON ( cust.ID=cust_new.ID )
SET cust_new.NAME = cust.NAME,
    cust_new.ORDERS = cust.ORDERS;

SELECT * FROM cust_new;
****
+--------+------+--------+
| NAME   | ID   | ORDERS |
+--------+------+--------+
| Chris  | 7214 |      6 |
| Elen   | 8301 |      4 |
| Rachel | 9125 |      2 |
| Adam   | 3412 |      5 |
+--------+------+--------+

The following example demonstrates an upsert on a Pipeline.

Assume the directory /order_files has one file orders1.txt, which contains the following data:

Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9

Create a Pipeline with an ON DUPLICATE KEY UPDATE clause:

CREATE PIPELINE p AS LOAD DATA FS '/order_files/orders*.txt' INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = 3;

Start the Pipeline:

START PIPELINE p;

As the Pipeline ingests the data from orders1.txt into the cust table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301 and 3412. The second and third records containing those duplicate keys (which have already been imported into cust), are updated with 3, because orders = 3 is specified in the ON DUPLICATE KEY UPDATE clause.

SELECT * FROM cust ORDER BY name;
****
+--------+------+--------+
| name   | id   | orders |
+--------+------+--------+
| Adam   | 3412 |      3 |
| Chris  | 7214 |      6 |
| Elen   | 8301 |      3 |
| Rachel | 9125 |      2 |
+--------+------+--------+