Thursday, August 27, 2015

Change Tracking in SQL Server

https://www.mssqltips.com/sqlservertip/1819/using-change-tracking-in-sql-server-2008/

ProblemIn the tip "Using Change Data Capture (CDC) in SQL Server 2008", Ray Barley beautifully described what Change Data Capture is, how to configure it and how it works. On a similar note, there is a new feature in SQL Server 2008 called Change Tracking.
Change Tracking is a light-weight feature which provides a synchronization mechanism between two applications. In other words, it tracks a table for net DML (INSERT, UPDATE and DELETE) changes that occur on a table, so that an application (like a caching application) can refresh itself with just the changed dataset.

In this tip, I am going to discuss in detail about what Change Tracking is, how it works, how to configure it, an application scenario and how it differs from Change Data Capture.

SolutionChange Tracking is a new feature in SQL Server 2008, which allows an application to refresh itself with the latest/changed data from other sources. In other words, an application can pull only changed data since the last pull. When you enable change tracking for a table, an internal table is created by SQL Server to store the change information. Please note it will maintain only the net change information, which means if a record is changed more than once the tracking table will have information about the last change only.

Though it was possible to have a customized application (using triggers, timestamp columns and writing procedure for cleanup) for change tracking purposes in prior versions of SQL Server, you had to write hundreds of lines code to create such an application. With the built-in Change Tracking feature of SQL Server 2008, you just have to enable it and you are all set to use the change tracking functions. The advantages of using this feature are; you have reduced development time, no need for schema creation and modification, built-in tracking table cleanup and reduced overhead as compared to other alternatives like using triggers.
How it works
To enable Change Tracking for your table, you need to first enable it at a database level. So once you have enabled it on a database level, then you can individually choose which tables you want to be tracked and enable these tables (other than that you are not required to create triggers or tracking tables on your own).
SQL Server creates an internal table (the naming convention used is: change_tracking_<object_id>) which stores the net DML change information for that table, it means any DML operation occuring on that table will be recorded in this internal table (again only the net change). Since it's an internal table you cannot directly query it, hence SQL Server has provided change tracking functions (discussed below) for that purpose. Gradually the data in the tracking tables will grow, so SQL Server also provides settings for doing automatic clean up of these tables.
In order to enable Change Tracking on a table, it needs to have a primary key on it as this is required to identify the row in the tracked user table. When a DML operation happens on a Change Tracking enabled table, SQL server stores the primary key value along with the changed information (columns impacted and type of command issued for example INSERT, UPDATE and DELETE) in the internal tracking table. You can use these primary key values from the internal table using change tracking functions to join and pull records from the tracked/user table to get the latest data.
Change Tracking Functions
SQL Server provides a few change tracking functions to get change tracking information as given below:
  • CHANGETABLE - This function is one of the most important and frequently used functions for Change Tracking. It returns change tracking information either for a specific row or for all the rows. It accepts two parameter, the first parameter could be either CHANGE or VERSION along with table name. The second parameter could be either <PreviousSyncVersion> if you are using CHANGE or primary key values if you are using VERSION. This function returns a couple of different columns as discussed below:
Column NameDescription
SYS_CHANGE_VERSIONIt represents the last version number when a particular row was updated.
SYS_CHANGE_CREATION_VERSIONIt represents the version number when a record was inserted. It will not be over-written as in case of SYS_CHANGE_VERSION.
SYS_CHANGE_OPERATIONIt represents the DML operations (I=INSERT, U=UPDATE and D=DELETE)
SYS_CHANGE_COLUMNSIt represents all the columns impacted since last baseline version. This column will have values only for UPDATE operations and if columns are not impacted during update it will have NULL.
<primary key columns>It represents the primary key columns of your tracked user table. You can join your tracked user table with this columns to get only changed data from the tracked user table.
  • CHANGE_TRACKING_MIN_VALID_VERSION - This function returns the minimum valid version for a table after the change tracking information is available or retained. If this returns a higher value than your <PreviousSyncVersion>, you need to do a full pull from your tracking table.
  • CHANGE_TRACKING_CURRENT_VERSION - This function returns the current version after the last committed transaction at the database level. You will get NULL if you have not turned on Change Tracking at the database level.
  • CHANGE_TRACKING_IS_COLUMN_IN_MASK - The SYS_CHANGE_COLUMNS column in the table returned by CHANGETABLE function contains information in the binary format. CHANGE_TRACKING_IS_COLUMN_IN_MASK function takes two parameter, first one is the table name and second one is column name and will return 1 or 0 depending on whether that column was updated or not.
  • WITH CHANGE_TRACKING_CONTEXT - If you want to differentiate the changes being done by your own application compared to others you can use this statement with your DML operations.
How it differs from Change Data Capture
The basic differences are,
  • Change Tracking is a light-weight synchronous process designed to have very minimal impact on DML operations whereas Change Data Capture is an asynchronous process which reads the transaction log asynchronously in the background (also minimal impact on the system) to track and record the DML changes.
  • Since Change Tracking is a synchronous process, change information is available immediately after a commit whereas since Change Data Capture is an asynchronous process there might be some delay in recording.
  • Change Tracking only stores the fact that some DML operations have happened (no recording of changed data) whereas Change Data Capture stores both the fact that DML operations have happened and also its related data along with complete history.
  • Change Tracking has far less storage requirements than Change Data Capture.
  • Change Tracking does not require SQL Server Agent to be running whereas Change Data Capture requires SQL Server Agent to be running.
Usage Scenario
Consider a scenario where you have an ASP .Net application. You use caching to store master information, which are changed infrequently at your application layer, but you need to make sure you get only changed data when you query your database to refresh your cache. Another scenario would be in data warehousing application where you need to pull incremental data (net changes only) from your source. In these kind of scenarios, Change Tracking is very helpful.

Examples
Script #1 is a very simple script to set up an environment for Change Tracking testing. In this script I am creating a database and a table. Then I am inserting three records in the table.
Script #1
USE master
GO
CREATE DATABASE LearningCT
GO
USE LearningCT
GO
CREATE TABLE Employee
(
EmployeeID INT IDENTITY PRIMARY KEY,
FirstName VARCHAR(100),
LastName VARCHAR(100),
CurrentPayScale DECIMAL
)
GO
INSERT INTO Employee(FirstName, LastName, CurrentPayScale)
VALUES
('Steve', 'Savage', 10000),
('Ranjit', 'Srivastava', 12000),
('Akram', 'Haque', 12000)
GO
Script #2 enables Change Tracking at the database level and then at the table level. When enabling Change Tracking at the database level, you can use the CHANGE_RETENTION (default 2 days) clause to specify a timeframe or period for which net DML change information for tracked user tables will be maintained in the tracking table and the AUTO_CLEANUP (default ON) clause to instruct SQL Server to cleanup the tracking table, keeping only net DML change information for the period specified with CHANGE_RETENTION.
You can also change this setting later if required after enabling Change Tracking. If you disable it, the tracking table will keep on growing. At the table level, if you set TRACK_COLUMNS_UPDATED (default OFF) to ON, additional information about the impacted columns will be stored in an internal tracking table which improves performance by allowing applications to pull data for impacted columns only, though it has additional storage requirements to store this information.
Script #2
--Enabling Change Tracking at Database Level
ALTER DATABASE LearningCT
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
--AUTO_CLEANUP -> With this option you can switch ON or OFF automatic 
--tracking table clean up process
--CHANGE_RETENTION -> With this option, you can specify the time frame 
--for which tracked information will be maintained
--Enabling Change Tracking at Table Level
ALTER TABLE Employee
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON)
--TRACK_COLUMNS_UPDATED -> With this option, you can include columns 
--also whose values were changed
Script #3 demonstrates the system tables which provide information about Change Tracking. Tablesys.change_tracking_databases shows a row for each database if Change Tracking is enabled for it whereassys.change_tracking_tables shows a row for each table if it has Change Tracking enabled in the current database only.
As I mentioned before SQL Server creates an internal tracking table, if you want to know more about it you can query the sys.internal_tables system table. Note the naming convention for internal tracking tables, the naming convention is: change_tracking_<table object id> as shown below.
Script #3
SELECT * FROM sys.change_tracking_databases 
SELECT * FROM sys.change_tracking_tables
SELECT * FROM sys.internal_tables
WHERE parent_object_id = OBJECT_ID('Employee')
Query result:
Script #4 shows you, how you can use the CHANGE_TRACKING_CURRENT_VERSION function to get the current version number at the database level, possibly the higher boundary for retained change information. The other function, CHANGE_TRACKING_MIN_VALID_VERSION gives the minimum version after the information for a table change has been retained or lower boundary for a table change information. Further I am using the CHANGETABLE function to retrieve change information after version 0. Since I have not performed any DML operations yet after enabling Change Tracking, you can see this function returns no records.
Script #4
SELECT CHANGE_TRACKING_CURRENT_VERSION ()
SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Employee'))
SELECT * FROM CHANGETABLE 
(CHANGES Employee,0) as CT ORDER BY SYS_CHANGE_VERSION
Query result:
Script #5, I am performing some DML operations. Now the CHANGE_TRACKING_CURRENT_VERSION function returns 3, because 3 DML operations have been performed after enabling Change Tracking.
CHANGE_TRACKING_MIN_VALID_VERSION still returns 0 indicating the change information has been retained after version 0 for table Employee.
Next I am querying from the CHANGETRACKING function by passing the minimum valid version and this time this function returns all the change information after that specified version. You don't need to pass minimum valid version always, you can pass any version number between the minimum valid version and current version and this function will give you change information if there is any changes between these boundaries. You can see the last statement where I am passing 1 as the previous version number to get all changes after that version.
Script #5
INSERT INTO Employee(FirstName, LastName, CurrentPayScale)
VALUES('Ahmad', 'Jamal', 10000)
GO
DELETE FROM Employee
WHERE EmployeeID = 2
GO
UPDATE Employee
SET CurrentPayScale = 15000, FirstName = 'Akramul'
WHERE EmployeeID = 3
GO
SELECT CHANGE_TRACKING_CURRENT_VERSION ()
SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('Employee'))
SELECT * FROM CHANGETABLE 
(CHANGES Employee,0) as CT ORDER BY SYS_CHANGE_VERSION
Query result:
SELECT * FROM CHANGETABLE 
(CHANGES Employee,1) as CT ORDER BY SYS_CHANGE_VERSION
GO
Query result:
Script #6 shows you how you can retrieve all DML changes and the latest data after a specified version number. I am assuming, my caching application has been refreshed up to version 1 and hence I need only changes that happened after version 1. That's the reason I have assigned 1 to @PreviousVersion to pull change information after this version.
Script #6
-- Get all DML changes (Inserts, Updates and Deletes) after the previous synchronized version 
DECLARE @PreviousVersion bigint
SET @PreviousVersion = 1
SELECT CTTable.EmployeeID, CTTable.SYS_CHANGE_OPERATION, 
Emp.FirstName, Emp.LastName, Emp.CurrentPayScale,
CTTable.SYS_CHANGE_VERSION, CTTable.SYS_CHANGE_COLUMNS, CTTable.SYS_CHANGE_CONTEXT 
FROM CHANGETABLE (CHANGES Employee, @PreviousVersion) AS CTTable
LEFT OUTER JOIN Employee AS Emp
ON emp.EmployeeID = CTTable.EmployeeID
GO
Query result:
So far I have shown you how change tracking information is stored and how you can retrieve all the DML changes (and its related data from the tracking user table) after the specified version, as you know an update can impact one or more columns of a table. So once you are aware a record has been updated, you would like to know which columns of the row have been updated.
Script #7 shows you the use of the CHANGE_TRACKING_IS_COLUMN_IN_MASK function to check which columns have been impacted in the row update. This way your caching application will refresh only impacted columns and not all the columns of the changed row.
Script #7
-- Get column information impacted during updates
DECLARE @PreviousVersion bigint
SET @PreviousVersion = 1
SELECT CTTable.EmployeeID, CTTable.SYS_CHANGE_OPERATION, 
Emp.FirstName, Emp.LastName, Emp.CurrentPayScale,
[FirstNameChanged?] = 
CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID('Employee'), 
'FirstName', 'ColumnId'), SYS_CHANGE_COLUMNS),
[LastNameChanged?] = 
CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID('Employee'), 
'LastName', 'ColumnId'), SYS_CHANGE_COLUMNS),
[CurrentPayScaleChanged?] = 
CHANGE_TRACKING_IS_COLUMN_IN_MASK(COLUMNPROPERTY(OBJECT_ID('Employee'), 
'CurrentPayScale', 'ColumnId'), SYS_CHANGE_COLUMNS)
FROM CHANGETABLE (CHANGES Employee, @PreviousVersion) AS CTTable
LEFT OUTER JOIN Employee AS Emp
ON emp.EmployeeID = CTTable.EmployeeID
WHERE CTTable.SYS_CHANGE_OPERATION = 'U'
GO
Query result:
You would like to refresh your caching application data only if it has been updated by other applications. In other words, you want to differentiate the originator changing your source data.  For this you can use the WITH CHANGE_TRACKING_CONTEXT statement.
Script #8 shows you how you can specify the change context while performing DML operations.
Script #8
-- specifying a context while changing the records
DECLARE @RequesterAppID varbinary(128) = CAST('MyCachingAppID' AS varbinary(128));
WITH CHANGE_TRACKING_CONTEXT (@RequesterAppID)
UPDATE Employee
SET CurrentPayScale = 20000
WHERE EmployeeID = 1 
GO
--The internal tracking table will store, associted context as well
SELECT CTTable.EmployeeID, CTTable.SYS_CHANGE_VERSION, 
CAST(CTTable.SYS_CHANGE_CONTEXT AS VARCHAR(128)) AS RequesterAppID
FROM CHANGETABLE(CHANGES Employee, 1) AS CTTable;
GO
Query result:
Script #9 shows you how you can disable Change Tracking. Note: before disabling Change Tracking on a database level, you need to disable it first on all tables of the database if it has been enabled. To see if a database or a table has Change Tracking enabled, refer to Script #3.
Script #9
--Disabling Change Tracking at Table Level
ALTER TABLE Employee
DISABLE CHANGE_TRACKING;
--Disabling Change Tracking at Database Level
ALTER DATABASE LearningCT
SET CHANGE_TRACKING = OFF

Apart from using T-SQL to enable Change Tracking, you can also use SSMS to enable it on a database level as well as on table level. For a database, go to Database-> [user-database] -> Properties -> click on Change Tracking page on left side and you will see options similar to the options shown below.
For a table, go to Database-> [user-database] -> Tables -> [user-table] -> Properties -> click on Change Tracking page on the left side and you will see options similar to the options below.
Notes
  • You need to have a primary key on a table in order to enable Change Tracking. If you try to drop a primary key constraint after enabling Change Tracking, you will get an error.
  • Primary key update is not treated as update, but rather it will be treated as deletion of old value and insertion of new value in case of Change Tracking.
  • You need to enable Change Tracking on a database level first then you can enable it on a table level.
  • In order to disable Change Tracking on a database level, you need to disable it on all tables, within specified database, if they have Change Tracking enabled.
  • Change Tracking works properly only if a database has a compatibility level 90 or higher.
  • Similar to the Filtered Index, you cannot specify a filter predicate on a table for Change Tracking.
  • Column adds and drops will not be tracked (no DDL), only the updates to the column are tracked.
  • If you perform TRUNCATE on Change Tracking enabled table, no tracking is done for the records deleted, but rather a minimum valid version is updated and your application data will need to be re-initialized.
  • Change Tracking puts extra overhead on DML operations as additional information is also recorded as part of the DML operation. Though it has been designed to have minimal impact compared to other alternatives to track changes.
  • As discussed above, an internal table is created (created in the same file group as the user table) for each Change Tracking enabled table. Also one internal transaction table is created in the database which stores one row for each committed transaction. Hence you will have additional storage requirements for these internal tables.
Use of snapshot isolation level is highly recommended to have consistent change tracking information.

No comments:

Post a Comment