Upserts can only be performed on rowstore tables.
An upsert inserts a row r
into a table if the table does not already contain an existing row with the same primary key or unique index as r
. Otherwise, the upsert updates the existing row. An upsert can be performed by specifying the ON DUPLICATE KEY UPDATE
clause of the INSERT
statement.
- If the
ON DUPLICATE KEY UPDATE
clause is specified, and a row is to be inserted that would result in a duplicate value in aPRIMARY KEY
orUNIQUE
index, MemSQL will instead perform anUPDATE
of the old row. - When using
ON DUPLICATE KEY UPDATE
, the affected row count will be 0 if no existing row is changed, 1 if a new row is inserted, and 2 if an existing row is updated. In Example 1 below, theON DUPLICATE KEY UPDATE
clause updates the existing row instead of inserting a new row to avoid duplicate entry in aPRIMARY KEY
field, and the query result shows that 2 rows were affected.
Examples
The following examples use the cust
table.
DESC cust;
****
+--------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+------+---------+-------+
| NAME | varchar(32) | YES | | NULL | |
| ID | int(11) | NO | PRI | 0 | |
| ORDERS | int(11) | YES | | NULL | |
+--------+-------------+------+------+---------+-------+
The following example demonstrates the use of INSERT
with ON DUPLICATE KEY UPDATE
.
SELECT * FROM cust;
****
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 2 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 3)
ON DUPLICATE KEY UPDATE ORDERS=3;
****
Query OK, 2 rows affected
SELECT * FROM cust;
****
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 3 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
The VALUES()
function can be used to refer to column values from the INSERT
clause of the INSERT ... ON DUPLICATE KEY UPDATE
statement. For example:
SELECT * FROM cust;
****
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 2 |
| Adam | 3412 | 5 |
| Elen | 8301 | 4 |
+-------+------+--------+
INSERT INTO cust (ID, ORDERS) VALUES (7214, 4) ON DUPLICATE KEY UPDATE ORDERS = VALUES(ORDERS) + ORDERS;
****
Query OK, 2 rows affected
SELECT * FROM cust;
****
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 6 |
| Adam | 3412 | 5 |
| Elen | 8301 | 4 |
+-------+------+--------+
Here, the VALUES()
function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.
The following code block demonstrates the use of SELECT
with ON DUPLICATE KEY UPDATE
:
SELECT * FROM cust_new;
****
+------+------+--------+
| NAME | ID | ORDERS |
+------+------+--------+
| Bill | 21 | 5 |
| Gwen | 7214 | 3 |
| Sam | 22 | 2 |
+------+------+--------+
SELECT * FROM cust;
****
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 6 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
INSERT INTO cust (NAME, ID, ORDERS) SELECT * FROM cust_new ON DUPLICATE KEY UPDATE NAME = VALUES(NAME), ORDERS = VALUES (ORDERS);
****
Query OK, 4 rows affected
Records: 3 Duplicates: 1 Warnings: 0
SELECT * FROM cust;
****
+------+------+--------+
| NAME | ID | ORDERS |
+------+------+--------+
| Bill | 21 | 5 |
| Gwen | 7214 | 3 |
| Elen | 8301 | 4 |
| Sam | 22 | 2 |
| Adam | 3412 | 5 |
+------+------+--------+
Here, the VALUES()
function refers to the value that would have been inserted, if there were no duplicate-key conflicts, and should refer to a column name of the destination table.
The above examples are functionally equivalent to this common two-step operation.
Note: While this works, it is much less performant than the single-steps above.
SELECT * FROM cust;
****
+-------+------+--------+
| NAME | ID | ORDERS |
+-------+------+--------+
| Chris | 7214 | 6 |
| Elen | 8301 | 4 |
| Adam | 3412 | 5 |
+-------+------+--------+
SELECT * FROM cust_new;
****
+--------+------+--------+
| NAME | ID | ORDERS |
+--------+------+--------+
| Chris | 7214 | 8 |
| Rachel | 9125 | 2 |
+--------+------+--------+
Copy all the unique values from the cust
to cust_new
table.
INSERT IGNORE INTO cust_new SELECT * FROM cust;
SELECT * FROM cust_new;
****
+--------+------+--------+
| NAME | ID | ORDERS |
+--------+------+--------+
| Chris | 7214 | 8 |
| Elen | 8301 | 4 |
| Rachel | 9125 | 2 |
| Adam | 3412 | 5 |
+--------+------+--------+
Now, perform an update to copy all the non-unique (duplicate) values.
UPDATE cust_new
JOIN cust ON ( cust.ID=cust_new.ID )
SET cust_new.NAME = cust.NAME,
cust_new.ORDERS = cust.ORDERS;
SELECT * FROM cust_new;
****
+--------+------+--------+
| NAME | ID | ORDERS |
+--------+------+--------+
| Chris | 7214 | 6 |
| Elen | 8301 | 4 |
| Rachel | 9125 | 2 |
| Adam | 3412 | 5 |
+--------+------+--------+
The following example demonstrates an upsert on a Pipeline.
Assume the directory /order_files
has one file orders1.txt
, which contains the following data:
Chris,7214,6
Elen,8301,4
Adam,3412,5
Rachel,9125,2
Susan,8301,7
George,3412,9
Create a Pipeline with an ON DUPLICATE KEY UPDATE
clause:
CREATE PIPELINE p AS LOAD DATA FS '/order_files/orders*.txt' INTO TABLE cust FIELDS TERMINATED BY ',' ON DUPLICATE KEY UPDATE orders = 3;
Start the Pipeline:
START PIPELINE p;
As the Pipeline ingests the data from orders1.txt
into the cust
table, it encounters the fifth and sixth records in the file, which contain the duplicate keys 8301
and 3412
. The second and third records containing those duplicate keys (which have already been imported into cust
), are updated with 3
, because orders = 3
is specified in the ON DUPLICATE KEY UPDATE
clause.
SELECT * FROM cust ORDER BY name;
****
+--------+------+--------+
| name | id | orders |
+--------+------+--------+
| Adam | 3412 | 3 |
| Chris | 7214 | 6 |
| Elen | 8301 | 3 |
| Rachel | 9125 | 2 |
+--------+------+--------+