https://sqlstudies.com/
Friday, February 17, 2017
Using EXCEPT to Determine Row Differences
http://www.sqlservercentral.com/articles/EXCEPT/107520/?utm_source=SSC&utm_medium=pubemail
Using EXCEPT to Determine Row Differences
By M. Douglas Bowan II, 2017/02/17 (first published: 2014/03/25)
Introduction
I was tasked with the data transmission, synchronization and balancing for a large project at work, so I began scrutinizing the data sources and noticed that several of our sources did not have any mechanism to signify when a row had been modified.
This presented the problem of how to effectively determine when a row had changed. Some of the staging tables would be hundreds of columns wide and inevitably some hand-coded field comparisons would be missed, duplicated or otherwise mangled due to NULL comparisons – all which would then lead to synchronicity issues.
It was distasteful to me, but I began with the knee-jerk approach of performing column-by-column comparisons, and in the process I wondered if I could somehow leverage the EXCEPT statement.
The EXCEPT Statement
In the simplest sense, the EXCEPT statement is used to compare the differences between two sets of data. As long as the number of columns are the same, order, data types and nullability, it quickly returns all rows from the first set of data that do not lie within the second.
Common examples look similar to the following:
SELECT [t1].[ID] , [t1].[Col1] , [t1].[Col2] FROM [table1] [t1] EXCEPT SELECT [t2].[ID] , [t2].[Col1] , [t2].[Col2] FROM [table2] [t2];
Which as stated, would return all rows from table1 that do not have an exact match in table2.
Perhaps the most attractive quality about the EXCEPT statement is the fact that it treats null values as equal to each other. Typically this is not the case and this means that they must be handled with special logic on a column-by-column basis to perform a comparison and can impose logic issues when handled inadequately.
Comparing Row Differences
The EXCEPT statement had a lot going for it: I would not have to specify a comparison for each column, it handled null values desirably and, as long as its rules are met there was nothing complicated that would yield unexpected results.
This really did not help me with the task at hand though, and in mulling the problem over I realized that a slight change in approach would let me leverage the EXCEPT statement.
If we constrain the keys between the two sets to be equal, then any product from the EXCEPT statement will be rows where at least one of the other columns is different between them.
In other words, by joining the two datasets together by their keys, we no longer have the dataset differences between them, but the row differences.
Applying this slight modification to the example above, we have the following:
SELECT [t1].[ID] , [t1].[Col1] , [t1].[Col2] FROM [table1] [t1] INNER JOIN [table2] [t2] ON [t1].[ID] = [t2].[ID] EXCEPT SELECT [t2].[ID] , [t2].[Col1] , [t2].[Col2] FROM [table2] [t2] INNER JOIN [table1] [t1] ON [t2].[ID] = [t1].[ID];
This will cause the EXCEPT statement to compare each equivalent row between the two sets.
A Demonstration
In order to set up the scenario, we will create two temporary tables that have the imaginative names of #Source and #Target and identical column definitions (with the noted difference of an identity attribute on the source table’s primary key.)
CREATE TABLE [#Source] ( [ID] INT NOT NULL PRIMARY KEY IDENTITY(1, 1) , [Item] VARCHAR(100) NOT NULL UNIQUE , [Price] MONEY NOT NULL , [OrderDate] DATE NOT NULL , [Units] NUMERIC(10, 4) NULL , [ShipmentDate] DATE NULL ); CREATE TABLE [#Target] ( [ID] INT NOT NULL PRIMARY KEY , [Item] VARCHAR(100) NOT NULL UNIQUE , [Price] MONEY NOT NULL , [OrderDate] DATE NOT NULL , [Units] NUMERIC(10, 4) NULL , [ShipmentDate] DATE NULL );
Let’s insert some initial values into our source table to set up our scenario. We will then duplicate the data by bulk-copying them to our target so that we can start with equivalent datasets.
INSERT INTO [#Source] ( [Item], [Price], [OrderDate], [Units], [ShipmentDate] ) VALUES ( 'Apple', 2.49, '1/1/2001', NULL, '1/02/2001' ) , ( 'Coconut', 0.99, '3/3/2003', 1.35, '3/4/2003' ) , ( 'Eggplant', 1.19, '5/5/2005', NULL, '5/6/2005' ) , ( 'Fig', 0.49, '6/6/2006', NULL, '6/7/2006' ) , ( 'Kiwi', 0.69, '11/11/2011', NULL, '11/12/2011' ) , ( 'Lychee', 0.29, '12/12/2012', NULL, '12/14/2012' ); INSERT INTO [#Target] ( [ID] , [Item] , [Price] , [OrderDate] , [Units] , [ShipmentDate] ) SELECT [S].[ID] , [S].[Item] , [S].[Price] , [S].[OrderDate] , [S].[Units] , [S].[ShipmentDate] FROM [#Source] [S];
Now that we have some initial test data in our tables; if we ran our query against the tables at this point it would produce no results since the data is identical. So let’s begin by running a couple of updates against our source to mimic production changes that might occur and would require detection so that they could be applied to our target table.
We will need to test: setting a value to a null, modifying a value to another value, and finally setting a null value to a value.
UPDATE [#Source] SET [ShipmentDate] = NULL WHERE [Item] = 'Coconut'; UPDATE [#Source] SET [Price] = 22.44 WHERE [Item] = 'Eggplant'; UPDATE [#Source] SET [Units] = 1000.55 WHERE [Item] = 'Lychee';
After we have run our updates, we can now run the modified EXCEPT statement query against our source and target tables.
SELECT [S].[ID] , [S].[Item] , [S].[Price] , [S].[OrderDate] , [S].[Units] , [S].[ShipmentDate] FROM [#Source] [S] INNER JOIN [#Target] [T] ON [S].[ID] = [T].[ID] EXCEPT SELECT [T].[ID] , [T].[Item] , [T].[Price] , [T].[OrderDate] , [T].[Units] , [T].[ShipmentDate] FROM [#Target] [T] INNER JOIN [#Source] [S] ON [T].[ID] = [S].[ID];
We will see that the query produces the three rows that we have made changes to as expected.
Conclusion
By joining two datasets together by their common keys, we cause the EXCEPT statement to compare the remaining columns. This approach to check row differences can reduce errors by removing the tedious hand-coding of column-by-column comparisons as well as required null-handling logic.
Wednesday, February 15, 2017
Find currently executing Query
https://blog.sqlauthority.com/2009/07/19/sql-server-get-last-running-query-based-on-spid/
We often need to find the last running query or based on SPID need to know which query was executed. SPID is returns sessions ID of the current user process. The acronym SPID comes from the name of its earlier version, Server Process ID.
To know which sessions are running currently, run the following command:
SELECT @@SPIDGO
In our case, we got SPID 57, which means the session that is running this command has ID of 57.
Now, let us open another session and run the same command. Here we get different IDs for different sessions.
In our case, we got SPID 61. Please note here that this ID may or may not be sequential.
In session with SPID 61, we will run any query. In session with SPID 57, we will see which query was run in session with SPID 61.
Let us run a simple SELECT statement in session with SPID 61 and in session with SPID 57 run the following command.
DBCC INPUTBUFFER(61)GO
Now, here in DBCC command we have passed the SPID of previous session; we will see the text below. The following image illustrates that we get the latest run query in our input buffer.
There are several ways to find out what is the latest run query from system table sys.sysprocesses.
DECLARE @sqltext VARBINARY(128)SELECT @sqltext = sql_handleFROM sys.sysprocessesWHERE spid = 61SELECT TEXT
FROM sys.dm_exec_sql_text(@sqltext)GO
The following image portrays that we get the latest run query in our input buffer.
Please note that in any session there may be multiple running queries, but the buffer only saves the last query and that is what we will be able to retrieve.
There is one more way to achieve the same thing – using function fn_get_sql
DECLARE @sqltext VARBINARY(128)SELECT @sqltext = sql_handleFROM sys.sysprocessesWHERE spid = 61SELECT TEXT
FROM ::fn_get_sql(@sqltext)GO
All the three methods are same but I always prefer method 2 where I have used sys.sysprocesses.
Today, we have explored a very simple topic. Let me know if you find it useful.
SQL SERVER – Get Query Running in Session
I was recently looking for syntax where I needed a query running in any particular session. I always remembered the syntax and ha d actually written it down before, but somehow it was not coming to mind quickly this time. I searched online and I ended up on my own article written last year SQL SERVER – Get Last Running Query Based on SPID. I felt that I am getting old because I forgot this really simple syntax.
This post is a refresher to me. I knew it was something so familiar since I have used this syntax so many times during my performance tuning project.
Run the following query to find out what the latest query that was executed in the session. There are various methods mentioned in my earlier post, so here I am picking only the one that I use most of the time.
Please use Shortcut – CTRL+T or enable “result to text” in the resultset to get formatted output.
DECLARE @sqltext VARBINARY(128)SELECT @sqltext = sql_handleFROM sys.sysprocessesWHERE spid = (YourSessionID)SELECT TEXT
FROM sys.dm_exec_sql_text(@sqltext)GO
You can find a range of active session IDs in your system by running system stored procedure sp_who2.
The following is the resultset where I have selected the session id that is the same as from where I am running above statement.
Additionally, you can use following T-SQL script as well.
SELECT TEXT
FROM sys.dm_exec_connectionsCROSS APPLY sys.dm_exec_sql_text(most_recent_sql_handle)WHERE session_id = (yoursessionID)GO
Subscribe to:
Posts (Atom)