Monday, December 22, 2014

Using the MERGE Statement to Perform an UPSERT

http://www.databasejournal.com/features/mssql/using-the-merge-statement-to-perform-an-upsert.html

The term UPSERT has been coined to refer to an operation that inserts rows into a table if they don’t exist, otherwise they are updated.   To perform the UPSERT operation Microsoft introduced the MERGE statement.  The MERGE statement was included into the set of TSQL statements when SQL Server 2008 was introduced.  Not only does the MERGE statement support the UPSERT concept, but it will also support deleting records.   In this article I discuss how to use the MERGE statement to UPDATE, INSERT and DELETE records from a target table.

What is the Value of the MERGE Statement?

Prior to the introduction of SQL Server 2008 if you needed to write logic that inserted rows into a target table if they didn’t exist, or updated them if they did exist you needed a series of “if then else” logic and needed to perform both the UPDATE and INSERT statement.  With the introduction of the MERGE statement with SQL Server 2008 you can perform either an INSERT or UPDATE statement using a single MERGE statement.   Being able to use the MERGE statement to perform inserts or updates to a table makes it easier to code your UPSERT logic in TSQL. 

Performing UPSERT Using MERGE Statement

The MERGE statement supports inserting and updating rows in a table with a single operation.  In order to accomplish this the MERGE statement requires both a Source and Target table.  The Source table is used to identify the rows that needed be inserted or update, and the Target table is the table that rows will be inserted or updated. 
In order to demo the MERGE statement I will need a Source and Target table, which will be used in my MERGE example.   I will use the following code to create my Source and Target table:
USE tempdb;
GO
SET NOCOUNT ON;
CREATE TABLE dbo.Product (
       Id int identity, 
       ProductName varchar(100),
       Qty int);
INSERT INTO dbo.Product (ProductName, Qty) VALUES('Magic Stripper', 5);
CREATE TABLE dbo.NewInventory(
       Id int identity, 
       ProductName varchar(100), 
       Qty int);
INSERT INTO dbo.NewInventory (ProductName, Qty) VALUES
       ('Sandpaper',20), 
       ('Paint Brush 1 inch',15),
       ('Magic Stripper',5);   
With this code I created two tables.   The first table I created was the Product table, which will be my Target table.  This is the table that I will be updating or inserting rows using the MERGE statement.  The other table, NewInventory, is the Source table.  My Source table identifies the records that will be used to determine if a new record needs to be inserted into my Product table.  The Source table is also used to determine if an UPDATE needs to be performed against any existing records in my Product table.  These two tables will be used in the following MERGE statement, which performs an UPSERT operation:
USE tempdb;
GO
MERGE dbo.Product ASUSING dbo.NewInventory AS S 
ON T.ProductName = S.ProductName
WHEN MATCHED THEN  
  UPDATE SET T.Qty = T.Qty + S.Qty
WHEN NOT MATCHED THEN  
  INSERT (ProductName,Qty) VALUES (S.ProductName,S.Qty);
SELECT * FROM dbo.Product;
 
When I run this code I get the following output:
Id          ProductName                                                     Qty
----------- --------------------------------------------------------------- -----------
1           Magic Stripper                                                  10
2           Sandpaper                                                       20
3           Paint Brush 1 inch                                              15
 
Let me review the above code and explain how it worked.  The table identified right after the MERGE statement, the one with the table alias of T is known as the Target.  The Target table is the table in which I will be performing an UPDATE, or an INSERT.  The table following the USING clause, with the table alias of S is known as the Source table.  The Source table identifies the potential records that will be inserted or updated in the Target table.  The ON clause identifies how to join the Target and Source tables.  In my example above I’m joining the Source and Target table based on the ProductName column.  Following the ON clause are two WHEN clauses.  These clauses identify the conditions when a UPDATE or an INSERT will be performed based on the results of the join between the Target and Source tables. 
The first WHEN clause says “WHEN MATCHED”.  The “WHEN MATCHED” conditions means when the Target and Source tables are joined based on the “ON” clause if there is a match then the UPDATE statement will be performed.  In the UPDATE statement I take the existing Qty value in the Target table and add to it the Qty value in the Source table to increase the amount of inventory I have on hand for a given ProductName.
The second WHEN clause has “WHEN NOT MATCHED”.  The “WHEN NOT MATCH” condition means when joining the Target and Source if there is a ProductName in the Source table that is not found in the Target table then this condition will be met.  When this condition is met based on the Source and Target table join operation the Source row will be inserted into the Product table.  This condition allows me to insert a new row into the Product table when new ProductName’s are found in the Source table.
If you review the original rows I had in my Product table you will see that I only had the “Magic Stripper” product in my Product table and that product had a Qty of 5.  After the MERGE statement ran, the “Magic Stripper” product now has a Qty of 10.  This happened because the MERGE statement’s WHEN MATCH condition was met and therefore the Qty value from the Product table, which was 5 and the Qty value from the New Inventory, which was also 5 were summed together to UPDATE the matched row in the Target table. The other two Products “Sandpaper” and “Paint Brush 1 inch” where inserted into the Product table because these two products didn’t already exist in the Product table so they met the “WHEN NOT MATCHED” condition. When the “WHEN NOT MATCHED” condition is met the unmatched rows in the Source table were inserted into the Target table.

What about Deleting Rows Using the Merge Statement?

Can the MERGE statement be used to delete records from a table?  Yes it can!   To see how this works let’s suppose we have a business requirement to delete rows from the Target table of the MERGE statement when they don’t exist in the Source table.  In order to show how this works I’m going to using my exiting Product table, which now after running the prior examples has records for the following three products:
  • Magic Stripper
  • Sandpaper
  • Paint Brush 1 Inch
But this time I’m going to use the NewInventory2 table that is created with the following TSQL code.  This new table will be the Source table for the MERGE statement:
USE tempdb;
GO
CREATE TABLE dbo.NewInventory2(
       Id int identity, 
       ProductName varchar(100), 
       Qty int);
INSERT INTO dbo.NewInventory2 (ProductName, Qty) VALUES
       ('Sandpaper',5), 
       ('Paint Brush 1 inch',10);
 
To use the MERGE statement to perform a delete from my Product table I will use the following code:
USE tempdb;
GO
MERGE dbo.Product ASUSING dbo.NewInventory2 AS S
ON T.ProductName = S.ProductName 
WHEN NOT MATCHED BY SOURCE THEN  
  DELETE;
SELECT * FROM dbo.Product;
When I run this code I get the following output:
Id          ProductName                                                      Qty
----------- ---------------------------------------------------------------- -----------
2           Sandpaper                                                        20
3           Paint Brush 1 inch                                               15
By reviewing the result set above you can see that the “Magic Stripper” product got deleted from the Product table.   This was accomplished by using the “WHEN NOT MATCHED BY SOURCE” condition of the above MERGE statement.  This condition is met when there is no row in the Source table that matches a Target table row.  When this occurs SQL Server deletes rows in the Target table that don’t have a corresponding matching row in the Source table.   Note you can include the WHEN NOT MATCHED BY SOURCE, WHEN MATCHED and WHEN NOT MATCHED is the same MERGE statement.

Beware of Filtering Rows Using ON Clause

If you read Books Online (BOL) there is a cautionary note on Filtering Rows by associating additional constraints with the ON clause of the MERGE statement.  Here is the actual text regarding trying to perform additional filtering using the ON clause that was found in the BOL:
It is important to specify only the columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results.
To show you an example of how adding additional filtering using the ON clause of a MERGE statement can cause unexpected results. I’m going to first run this code to recreate my Target and Source tables:
USE tempdb;
GO
DROP table Product, NewInventory
SET NOCOUNT ON;
CREATE TABLE dbo.Product (
       Id int identity, 
       ProductName varchar(100),
       Qty int);
INSERT INTO dbo.Product (ProductName, Qty) VALUES('Magic Stripper', 5);
CREATE TABLE dbo.NewInventory(
       Id int identity, 
       ProductName varchar(100), 
       Qty int);
INSERT INTO dbo.NewInventory (ProductName, Qty) VALUES
       ('Sandpaper',20), 
       ('Paint Brush 1 inch',15),
       ('Magic Stripper',5); 
            
This code just sets my Target and Source back to what it was originally, where there is only 1 row in the Product table and 3 rows in the NewInventory table.
Suppose I only want to add a subset of the rows in my NewInventory table to my Product table.  For this example, suppose I only want to add rows to my Product table if the Qty value in my NewInventory table is greater than 5.
To demonstrate how a MERGE statement can produce unexpected results if you add additional filtering criteria using the ON clause, let me run the following code: 
USE tempdb;
GO
MERGE dbo.Product ASUSING dbo.NewInventory AS S 
ON T.ProductName = S.ProductName
   AND S.QTY > 5
WHEN MATCHED THEN  
  UPDATE SET T.Qty = T.Qty + S.Qty
WHEN NOT MATCHED THEN  
  INSERT (ProductName,Qty) VALUES (S.ProductName,S.Qty);
SELECT * FROM dbo.Product;
When I run this code I get this output:
Id          ProductName                                                   Qty
----------- ------------------------------------------------------------- -----------
1           Magic Stripper                                                5
2           Sandpaper                                                     20
3           Paint Brush 1 inch                                            15
4           Magic Stripper                                                5
 
If you look at this output you can see it doesn’t meet my requirement by only performing an UPSERT operation of the records in my Source table that have a Qty value greater than 5.   Why did it do this?  It did this because I added an additional filter rule that wasn’t an equality comparison between a column in the Target table and a column in the Source tables.  This additional filter criteria is the “S.Qty > 5” condition.   By adding this additional condition that only identified a source column I told SQL Server this was part of the matching criteria.  This additional condition was used to determine which records match between the Source and Target tables.   In this particular example no row in the Source table matched any rows in the Target table therefore all three records in my Source table where inserted into the Target table.  This is why I now have two product records with a ProductName of “Magic Stripper”.
In order to meet my requirements I can write my MERGE statement like below:
USE tempdb;
GO
MERGE dbo.Product ASUSING dbo.NewInventory AS S 
ON T.ProductName = S.ProductName
WHEN MATCHED AND S.Qty > 5 THEN  
  UPDATE SET T.Qty = T.Qty + S.Qty
WHEN NOT MATCHED AND S.Qty > 5 THEN  
  INSERT (ProductName,Qty) VALUES (S.ProductName,S.Qty);
SELECT * FROM dbo.Product;
In this code I place the additional filter rule on the WHEN clause.  Now SQL Server first matches the rows between the Source and Target tables based on the ON condition and then when SQL Server executes the WHEN conditions it excludes the Source rows that have a Qty less than 6 from being inserted or updated.   To test this for yourself you will first need to run the first code block in this section that creates the original Source and Target rows, and then run this code.

Duplicates in Source Table

Another condition that causes problems with the MERGE statement is when the Source table contains multiple records with the same column values associated with the ON clause.  When this occurs SQL Server will throw a duplicate error message.  To demonstrate let me create a new Source table “NewInventory3” and then run it through my MERGE statement.  Here is the code that will throw the MERGE error:
USE tempdb;
GO
CREATE TABLE NewInventory3 (
       Id int identity, 
       ProductName varchar(100), 
       Qty int);
INSERT INTO NewInventory3 values 
       ('Magic Stripper',15),
       ('Magic Stripper',5); 
       USE tempdb;
GO
MERGE dbo.Product ASUSING dbo.NewInventory3 AS S 
ON T.ProductName = S.ProductName
WHEN MATCHED THEN  
  UPDATE SET T.Qty = T.Qty + S.Qty
WHEN NOT MATCHED THEN  
  INSERT (ProductName,Qty) VALUES (S.ProductName,S.Qty);
SELECT * FROM dbo.Product;   
When I run this code I get this error:
Msg 8672, Level 16, State 1, Line 12
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
This error was caused because I had two duplicate rows in my Source table that matched to a single Target row.   To resolve this problem I need to eliminate the duplicate rows based on the matching criteria.  Suppose I wanted both of those rows to actually add to my inventory of “Magic Stripper”.  In this case, I could run the following MERGE statement to accomplish this:
MERGE dbo.Product ASUSING (SELECT ProductName, SUM(Qty) as Qty
      FROM dbo.NewInventory3
         GROUP by ProductName) AS S 
ON T.ProductName = S.ProductName
WHEN MATCHED THEN  
  UPDATE SET T.Qty = T.Qty + S.Qty
WHEN NOT MATCHED THEN  
  INSERT (ProductName,Qty) VALUES (S.ProductName,S.Qty);
SELECT * FROM dbo.Product;  
Here I removed the duplicate row in my Source table by first aggregating all the similar ProductName rows in my Source table by using a sub-query when I identified the Source rows for my MERGE statement.

Summary

The MERGE statement allows you to write a single TSQL statement that allows you to INSERT, UPDATE, and/or DELETE records from a Target table. The MERGE statement controls whether an INSERT, UPDATE, or DELETE clause is executed by matching a set of rows in the Source table against the Target table.  When a Source row matches a Target row the Target row is updated, with information from the Source row.  If the Source row does not match a Target row then the Source row is inserted into the Target table.  The MERGE statement can also DELETE rows from the Target table when Target table rows are not found within the Source table. Using the MERGE statement greatly simplifies the amount of code you would need to write using “if then else” logic to perform INSERT, UPDATE, and/or DELETE operations against a Target table.   Next time you need to perform an UPSERT operation look into using the MERGE statement if you are on SQL Server 2008 and above.

No comments:

Post a Comment