Wednesday, October 5, 2016

Find text/string in Stored Procedures, View, Trigger, and Function

http://www.codeproject.com/Tips/603593/Find-text-in-Stored-Procedures-View-Trigger-and-Fu

DECLARE @SEARCHSTRING varchar(255), @notcontain varchar(255)

SELECT @SEARCHSTRING = 'HID', @notcontain = ''

SELECT DISTINCT sysobjects.name AS [Object Name],
       CASE WHEN sysobjects.xtype = 'P'  THEN 'Stored Proc'
            WHEN sysobjects.xtype = 'TF' THEN 'Function'
            WHEN sysobjects.xtype = 'TR' THEN 'Trigger'
            WHEN sysobjects.xtype = 'V'  THEN 'View'
       END AS [Object Type]
FROM sysobjects, syscomments
WHERE sysobjects.id = syscomments.id
AND sysobjects.type IN ('P','TF','TR','V')
AND sysobjects.category = 0
AND CHARINDEX(@SEARCHSTRING, syscomments.text) > 0
-- only apply the exclusion filter when @notcontain is non-empty
AND (@notcontain = '' OR CHARINDEX(@notcontain, syscomments.text) = 0)
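One caveat with syscomments: its text column stores the definition in 4000-character chunks, so a match that straddles a chunk boundary is missed. On SQL Server 2005 and later, a sketch of the same search against sys.sql_modules (which holds the full definition) might look like this:

DECLARE @SearchString nvarchar(255);
SET @SearchString = N'HID';

SELECT o.name      AS [Object Name],
       o.type_desc AS [Object Type]
FROM sys.sql_modules AS m
INNER JOIN sys.objects AS o
        ON o.[object_id] = m.[object_id]
WHERE o.[type] IN ('P','FN','IF','TF','TR','V')  -- procs, functions, triggers, views
  AND m.[definition] LIKE N'%' + @SearchString + N'%';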

Tuesday, September 6, 2016

Minimizing impact of widening an IDENTITY column – part 1, posted by Aaron Bertrand on January 29, 2016

http://sqlperformance.com/2016/01/sql-indexes/widening-identity-column-1

A problem that I have seen crop up a few times recently is the scenario where you have created an IDENTITY column as an INT, and now are nearing the upper bound and need to make it larger (BIGINT). If your table is large enough that you're hitting the upper bound of an integer (over 2 billion), this is not an operation you can pull off between lunch and your coffee break on a Tuesday. This series will explore the mechanics behind such a change, and different ways to make it happen with varying impacts on uptime. In the first part, I wanted to take a close look at the physical impact of changing an INT to a BIGINT without any of the other variables.

What Actually Happens When You Widen an INT?

INT and BIGINT are fixed-size data types, so a conversion from one to the other has to touch the page, making this a size-of-data operation. This is counter-intuitive, because it seems like a data type change from INT to BIGINT could not possibly require the additional space on the page immediately (and for an IDENTITY column, ever). Thinking logically, this is space that could not possibly be needed until later, when an existing INT value was changed to a value wider than 4 bytes. But this isn't how it works today. Let's create a simple table and see:
CREATE TABLE dbo.FirstTest
(
  RowID  int         IDENTITY(1,1), 
  Filler char(2500)  NOT NULL DEFAULT 'x'
);
GO
 
INSERT dbo.FirstTest WITH (TABLOCKX) (Filler)
SELECT TOP (20) 'x' FROM sys.all_columns AS c;
GO
A simple query can tell me the low and high page allocated to this object, as well as the total page count:
SELECT 
  lo_page    = MIN(allocated_page_page_id), 
  hi_page    = MAX(allocated_page_page_id), 
  page_count = COUNT(*)
FROM sys.dm_db_database_page_allocations
(
  DB_ID(), OBJECT_ID(N'dbo.FirstTest'), NULL, NULL, NULL
);
Now if I run that query before and after changing the data type from INT to BIGINT:
ALTER TABLE dbo.FirstTest ALTER COLUMN RowID bigint;
I see these results:
-- before:
 
lo_page    hi_page    page_count
-------    -------    ----------
243        303        17
 
-- after:
 
lo_page    hi_page    page_count
-------    -------    ----------
243        319        33
It is clear that 16 new pages were added to make room for the additional space required (even though we know none of the values in the table actually require 8 bytes). But this wasn't actually accomplished the way you might think – rather than widen the column on the existing pages, the rows were moved to new pages, with pointers left behind in their place. Looking at page 243 before and after (with the undocumented DBCC PAGE):
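(The dumps below come from the undocumented DBCC PAGE; a sketch of the calls used, assuming data file 1:)

DBCC TRACEON(3604);              -- route DBCC PAGE output to the client
DBCC PAGE (DB_ID(), 1, 243, 3);  -- database, file, page, print option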
-- ******** Page 243, before: ********
 
Slot 0 Offset 0x60 Length 12
 
Record Type = PRIMARY_RECORD        Record Attributes =  NULL_BITMAP    Record Size = 12
 
Memory Dump @0x000000E34B9FA060
 
0000000000000000:   10000900 01000000 78020000                    .. .....x...
 
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
 
RowID = 1                           
 
Slot 0 Column 2 Offset 0x8 Length 1 Length (physical) 1
 
filler = x                          
 
 
-- ******** Page 243, after: ********
 
Slot 0 Offset 0x60 Length 9
 
Record Type = FORWARDING_STUB       Record Attributes =                 Record Size = 9
 
Memory Dump @0x000000E34B9FA060
 
0000000000000000:   04280100 00010078 01                          .(.....x.
Forwarding to  =  file 1 page 296 slot 376
Then if we look at the target of the pointer, page 296, slot 376, we see:
Slot 376 Offset 0x8ca Length 34
 
Record Type = FORWARDED_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 34                    
Memory Dump @0x000000E33BBFA8CA
 
0000000000000000:   32001100 01000000 78010000 00000000 00030000  2.......x...........
0000000000000014:   01002280 0004f300 00000100 0000               .."...ó.......
Forwarded from  =  file 1 page 243 slot 0                                
 
Slot 376 Column 67108865 Offset 0x4 Length 0 Length (physical) 4
 
DROPPED = NULL                      
 
Slot 376 Column 2 Offset 0x8 Length 1 Length (physical) 1
 
filler = x                          
 
Slot 376 Column 1 Offset 0x9 Length 8 Length (physical) 8
 
RowID = 1
This is a very disruptive change to the structure of the table, obviously. (And an interesting side observation: the physical order of the columns, RowID and filler, has been flipped on the page.) Reserved space jumps from 136 KB to 264 KB, and average fragmentation bumps up modestly from 33.3% to 40%. This space does not get recovered by a rebuild, online or not, or a reorg, and – as we'll see shortly – this is not because the table is too small to benefit.
Note: this is true even in the most recent builds of SQL Server 2016 – while more and more operations like this have been improved to become metadata-only operations in modern versions, this one hasn't been fixed yet, though clearly it could be – again, especially in the case where the column is an IDENTITY column, which can't be updated by definition.
Performing the operation with the new ALTER COLUMN / ONLINE syntax, which I talked about last year, yields some differences:
-- drop / re-create here
ALTER TABLE dbo.FirstTest ALTER COLUMN RowID bigint WITH (ONLINE = ON);
Now the before and after becomes:
-- before:
 
lo_page    hi_page    page_count
-------    -------    ----------
243        303        17
 
-- after:
 
lo_page    hi_page    page_count
-------    -------    ----------
307        351        17
In this case, it was still a size-of-data operation, but the existing pages were copied and re-created due to the ONLINE option. You might wonder how, when we changed the column size as an ONLINE operation, the table is able to cram more data into the same number of pages. Each page is now denser (fewer rows but more data per page), at the cost of scatter – fragmentation doubles from 33.3% to 66.7%. Space used shows more data in the same reserved space (from 72 KB / 136 KB to 96 KB / 136 KB).

And at Larger Scale?

Let's drop the table, re-create it, and populate it with a lot more data:
CREATE TABLE dbo.FirstTest
(
  RowID INT IDENTITY(1,1), 
  filler CHAR(1) NOT NULL DEFAULT 'x'
);
GO
 
INSERT dbo.FirstTest WITH (TABLOCKX) (filler) 
SELECT TOP (5000000) 'x' FROM sys.all_columns AS c1
  CROSS JOIN sys.all_columns AS c2;
From the outset, we now have 8,657 pages, a fragmentation level of 0.09%, and space used is 69,208 KB / 69,256 KB.
If we change the data type to bigint, we jump to 25,630 pages, fragmentation is reduced to 0.06%, and space used is 205,032 KB / 205,064 KB. An online rebuild changes nothing, nor does a reorg. The whole process, including a rebuild, takes about 97 seconds on my machine (the data population took all of 2 seconds).
If we change the data type to bigint using ONLINE, the bump is only to 11,140 pages, fragmentation goes to 85.5%, and space used is 89,088 KB / 89,160 KB. Online rebuilds and reorgs still change nothing. This time, the whole process only takes about a minute. So the new syntax definitely leads to faster operations and less additional disk space, but high fragmentation. I'll take it.
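For reference, the page counts, fragmentation, and space-used numbers quoted above can be gathered with documented tools; a sketch, assuming the dbo.FirstTest heap from the demo:

SELECT ips.page_count,
       ips.avg_fragmentation_in_percent,
       ips.record_count
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID(N'dbo.FirstTest'), NULL, NULL, 'DETAILED') AS ips
WHERE ips.index_level = 0;   -- leaf level of the heap

EXEC sys.sp_spaceused N'dbo.FirstTest';  -- reserved vs. data, as quoted above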

Up Next

I am sure you are looking at my tests above, and wondering about a few things. Most importantly, why is the table a heap? I wanted to investigate what actually happens to the page structure and page count with no indexes, keys, or constraints fuzzying up the details. You might also wonder why this change was so easy – in a scenario where you have to change a true IDENTITY column, it is probably also the clustered primary key, and has foreign key dependencies in other tables. This definitely introduces some hiccups to the process. We'll take a closer look at these things in the next post in the series.

[ Part 1 | Part 2 | Part 3 | Part 4 ]

Monday, July 25, 2016

SQL SERVER – 2008 – Configure Database Mail – Send Email From SQL Database

http://blog.sqlauthority.com/2008/08/23/sql-server-2008-configure-database-mail-send-email-from-sql-database/

Today, in this article, I will discuss Database Mail, which is used to send email using SQL Server. Previously I discussed SQL SERVER – Difference Between Database Mail and SQLMail. Database Mail is the replacement for SQLMail, with many enhancements, so one should stop using SQLMail and upgrade to Database Mail. Special thanks to Software Developer Monica, who helped with all the images and extensive testing of the subject matter of this article.
In order to send mail using Database Mail in SQL Server, there are three basic steps that need to be carried out: 1) Create Profile and Account, 2) Configure Email, and 3) Send Email.
Step 1) Create Profile and Account:
You need to create a profile and account using the Configure Database Mail Wizard, which can be accessed from the context menu of the Database Mail node under Management in SQL Server Management Studio. This wizard is used to manage accounts, profiles, and Database Mail global settings, as shown below:
[Screenshots: open the Configure Database Mail wizard, create a profile, add an SMTP account, manage profile security, set system parameters, and complete the wizard successfully.]
Step 2) Configure Email:
After the Account and the Profile are created successfully, we need to configure the Database Mail. To configure it, we need to enable the Database Mail XPs parameter through the sp_configure stored procedure, as shown here:
sp_CONFIGURE 'show advanced options', 1
GO
RECONFIGURE
GO
sp_CONFIGURE 'Database Mail XPs', 1
GO
RECONFIGURE
GO
Step 3) Send Email:
After all configurations are done, we are now ready to send an email. To send mail, we need to execute a stored procedure sp_send_dbmail and provide the required parameters as shown below:
USE msdb
GO
EXEC sp_send_dbmail
    @profile_name = 'PinalProfile',
    @recipients = 'test@Example.com',
    @subject = 'Test message',
    @body = 'This is the body of the test message. Congrats, Database Mail was received by you successfully.'
After all validations of the entered parameters are done, certain stored procedures are executed and the mail is queued by Service Broker; read more at SQL SERVER – Introduction to Service Broker.
Database Mail keeps copies of outgoing e-mail messages and displays them in the sysmail_allitems, sysmail_sentitems, sysmail_unsentitems, and sysmail_faileditems views. The status of sent mail can be seen in the sysmail_mailitems table: when the mail is sent successfully, the sent_status field of sysmail_mailitems is set to 1, and the row can also be seen in sysmail_sentitems. Failed mails have a sent_status value of 2, and unsent mails have a value of 3.
The log can be checked in sysmail_log table as shown below:
SELECT * FROM sysmail_mailitems
GO
SELECT * FROM sysmail_log
GO
Status can be verified using sysmail_sentitems table.
After sending the mail, you can check your inbox for the received message.

Details:

https://msdn.microsoft.com/en-in/library/ms190307.aspx

sp_send_dbmail (Transact-SQL)

 
THIS TOPIC APPLIES TO: SQL Server (starting with 2008). Does not apply to: Azure SQL Database, Azure SQL Data Warehouse, Parallel Data Warehouse.
Sends an e-mail message to the specified recipients. The message may include a query result set, file attachments, or both. When mail is successfully placed in the Database Mail queue, sp_send_dbmail returns the mailitem_id of the message. This stored procedure is in the msdb database.
  
sp_send_dbmail [ [ @profile_name = ] 'profile_name' ]  
    [ , [ @recipients = ] 'recipients [ ; ...n ]' ]  
    [ , [ @copy_recipients = ] 'copy_recipient [ ; ...n ]' ]  
    [ , [ @blind_copy_recipients = ] 'blind_copy_recipient [ ; ...n ]' ]  
    [ , [ @from_address = ] 'from_address' ]  
    [ , [ @reply_to = ] 'reply_to' ]   
    [ , [ @subject = ] 'subject' ]   
    [ , [ @body = ] 'body' ]   
    [ , [ @body_format = ] 'body_format' ]  
    [ , [ @importance = ] 'importance' ]  
    [ , [ @sensitivity = ] 'sensitivity' ]  
    [ , [ @file_attachments = ] 'attachment [ ; ...n ]' ]  
    [ , [ @query = ] 'query' ]  
    [ , [ @execute_query_database = ] 'execute_query_database' ]  
    [ , [ @attach_query_result_as_file = ] attach_query_result_as_file ]  
    [ , [ @query_attachment_filename = ] query_attachment_filename ]  
    [ , [ @query_result_header = ] query_result_header ]  
    [ , [ @query_result_width = ] query_result_width ]  
    [ , [ @query_result_separator = ] 'query_result_separator' ]  
    [ , [ @exclude_query_output = ] exclude_query_output ]  
    [ , [ @append_query_error = ] append_query_error ]  
    [ , [ @query_no_truncate = ] query_no_truncate ]   
    [ , [ @query_result_no_padding = ] query_result_no_padding ]   
    [ , [ @mailitem_id = ] mailitem_id ] [ OUTPUT ]  

[ @profile_name = ] 'profile_name'
Is the name of the profile to send the message from. The profile_name is of type sysname, with a default of NULL. The profile_name must be the name of an existing Database Mail profile. When no profile_name is specified, sp_send_dbmail uses the default private profile for the current user. If the user does not have a default private profile, sp_send_dbmail uses the default public profile for the msdb database. If the user does not have a default private profile and there is no default public profile for the database, @profile_name must be specified.
[ @recipients = ] 'recipients'
Is a semicolon-delimited list of e-mail addresses to send the message to. The recipients list is of type varchar(max). Although this parameter is optional, at least one of @recipients, @copy_recipients, or @blind_copy_recipients must be specified, or sp_send_dbmail returns an error.
[ @copy_recipients = ] 'copy_recipients'
Is a semicolon-delimited list of e-mail addresses to carbon copy the message to. The copy recipients list is of type varchar(max). Although this parameter is optional, at least one of @recipients, @copy_recipients, or @blind_copy_recipients must be specified, or sp_send_dbmail returns an error.
[ @blind_copy_recipients = ] 'blind_copy_recipients'
Is a semicolon-delimited list of e-mail addresses to blind carbon copy the message to. The blind copy recipients list is of type varchar(max). Although this parameter is optional, at least one of @recipients, @copy_recipients, or @blind_copy_recipients must be specified, or sp_send_dbmail returns an error.
[ @from_address = ] 'from_address'
Is the value of the 'from address' of the email message. This is an optional parameter used to override the settings in the mail profile. This parameter is of type varchar(max). SMTP security settings determine if these overrides are accepted. If no parameter is specified, the default is NULL.
[ @reply_to = ] 'reply_to'
Is the value of the 'reply to address' of the email message. It accepts only one email address as a valid value. This is an optional parameter used to override the settings in the mail profile. This parameter is of type varchar(max). SMTP security settings determine if these overrides are accepted. If no parameter is specified, the default is NULL.
[ @subject = ] 'subject'
Is the subject of the e-mail message. The subject is of type nvarchar(255). If no subject is specified, the default is 'SQL Server Message'.
[ @body = ] 'body'
Is the body of the e-mail message. The message body is of type nvarchar(max), with a default of NULL.
[ @body_format = ] 'body_format'
Is the format of the message body. The parameter is of type varchar(20), with a default of NULL. When specified, the headers of the outgoing message are set to indicate that the message body has the specified format. The parameter may contain one of the following values:
  • TEXT
  • HTML
Defaults to TEXT.
[ @importance = ] 'importance'
Is the importance of the message. The parameter is of type varchar(6). The parameter may contain one of the following values:
  • Low
  • Normal
  • High
Defaults to Normal.
[ @sensitivity = ] 'sensitivity'
Is the sensitivity of the message. The parameter is of type varchar(12). The parameter may contain one of the following values:
  • Normal
  • Personal
  • Private
  • Confidential
Defaults to Normal.
[ @file_attachments = ] 'file_attachments'
Is a semicolon-delimited list of file names to attach to the e-mail message. Files in the list must be specified as absolute paths. The attachments list is of type nvarchar(max). By default, Database Mail limits file attachments to 1 MB per file.
[ @query = ] 'query'
Is a query to execute. The results of the query can be attached as a file, or included in the body of the e-mail message. The query is of type nvarchar(max), and can contain any valid Transact-SQL statements. Note that the query is executed in a separate session, so local variables in the script calling sp_send_dbmail are not available to the query.
[ @execute_query_database = ] 'execute_query_database'
Is the database context within which the stored procedure runs the query. The parameter is of type sysname, with a default of the current database. This parameter is only applicable if @query is specified.
[ @attach_query_result_as_file = ] attach_query_result_as_file
Specifies whether the result set of the query is returned as an attached file. attach_query_result_as_file is of type bit, with a default of 0.
When the value is 0, the query results are included in the body of the e-mail message, after the contents of the @body parameter. When the value is 1, the results are returned as an attachment. This parameter is only applicable if @query is specified.
[ @query_attachment_filename = ] query_attachment_filename
Specifies the file name to use for the result set of the query attachment. query_attachment_filename is of type nvarchar(255), with a default of NULL. This parameter is ignored when @attach_query_result_as_file is 0. When @attach_query_result_as_file is 1 and this parameter is NULL, Database Mail creates an arbitrary filename.
[ @query_result_header = ] query_result_header
Specifies whether the query results include column headers. The query_result_header value is of type bit. When the value is 1, query results contain column headers. When the value is 0, query results do not include column headers. This parameter defaults to 1. This parameter is only applicable if @query is specified.
[ @query_result_width = ] query_result_width
Is the line width, in characters, to use for formatting the results of the query. The query_result_width is of type int, with a default of 256. The value provided must be between 10 and 32767. This parameter is only applicable if @query is specified.
[ @query_result_separator = ] 'query_result_separator'
Is the character used to separate columns in the query output. The separator is of type char(1). Defaults to ' ' (space).
[ @exclude_query_output = ] exclude_query_output
Specifies whether to return the output of the query execution in the e-mail message. exclude_query_output is of type bit, with a default of 0. When this parameter is 0, the execution of the sp_send_dbmail stored procedure prints the message returned as the result of the query execution on the console. When this parameter is 1, the execution of the sp_send_dbmail stored procedure does not print any of the query execution messages on the console.
[ @append_query_error = ] append_query_error
Specifies whether to send the e-mail when an error returns from the query specified in the @query argument. append_query_error is of type bit, with a default of 0. When this parameter is 1, Database Mail sends the e-mail message and includes the query error message in the body of the e-mail message. When this parameter is 0, Database Mail does not send the e-mail message, and sp_send_dbmail ends with return code 1, indicating failure.
[ @query_no_truncate = ] query_no_truncate
Specifies whether to execute the query with the option that avoids truncation of large variable length data types (varchar(max), nvarchar(max), varbinary(max), xml, text, ntext, image, and user-defined data types). When set, query results do not include column headers. The query_no_truncate value is of type bit. When the value is 0 or not specified, columns in the query truncate to 256 characters. When the value is 1, columns in the query are not truncated. This parameter defaults to 0.
Note

When used with large amounts of data, the @query_no_truncate option consumes additional resources and can slow server performance.
[ @query_result_no_padding = ] query_result_no_padding
The type is bit. The default is 0. When you set it to 1, the query results are not padded, possibly reducing the file size. If you set @query_result_no_padding to 1 and you set the @query_result_width parameter, the @query_result_no_padding parameter overwrites the @query_result_width parameter.
In this case no error occurs.
If you set the @query_result_no_padding to 1 and you set the @query_no_truncate parameter, an error is raised.
[ @mailitem_id = ] mailitem_id [ OUTPUT ]
Optional output parameter that returns the mailitem_id of the message. The mailitem_id is of type int.
A return code of 0 means success. Any other value means failure. The error code for the statement that failed is stored in the @@ERROR variable.
On success, returns the message "Mail queued."
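A minimal sketch of capturing that mailitem_id via the OUTPUT parameter and checking the queued item's status (the profile and recipient names are the ones used earlier in this post):

DECLARE @id int;

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'PinalProfile',
    @recipients   = 'test@Example.com',
    @subject      = 'Test message',
    @body         = 'Testing the @mailitem_id OUTPUT parameter.',
    @mailitem_id  = @id OUTPUT;

-- sent_status here is 'sent', 'unsent', 'retrying', or 'failed'
SELECT mailitem_id, sent_status
FROM msdb.dbo.sysmail_allitems
WHERE mailitem_id = @id;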
Before use, Database Mail must be enabled using the Database Mail Configuration Wizard, or sp_configure.
sysmail_stop_sp stops Database Mail by stopping the Service Broker objects that the external program uses. sp_send_dbmail still accepts mail when Database Mail is stopped using sysmail_stop_sp. To start Database Mail, use sysmail_start_sp.
When @profile_name is not specified, sp_send_dbmail uses a default profile. If the user sending the e-mail message has a default private profile, Database Mail uses that profile. If the user has no default private profile, sp_send_dbmail uses the default public profile. If there is no default private profile for the user and no default public profile, sp_send_dbmail returns an error.
sp_send_dbmail does not support e-mail messages with no content. To send an e-mail message, you must specify at least one of @body, @query, @file_attachments, or @subject. Otherwise, sp_send_dbmail returns an error.
Database Mail uses the Microsoft Windows security context of the current user to control access to files. Therefore, users who are authenticated with SQL Server Authentication cannot attach files using @file_attachments. Windows does not allow SQL Server to provide credentials from a remote computer to another remote computer. Therefore, Database Mail may not be able to attach files from a network share in cases where the command is run from a computer other than the computer that SQL Server runs on.
If both @query and @file_attachments are specified and the file cannot be found, the query is still executed but the e-mail is not sent.
When a query is specified, the result set is formatted as inline text. Binary data in the result is sent in hexadecimal format.
The parameters @recipients, @copy_recipients, and @blind_copy_recipients are semicolon-delimited lists of e-mail addresses. At least one of these parameters must be provided, or sp_send_dbmail returns an error.
When executing sp_send_dbmail without a transaction context, Database Mail starts and commits an implicit transaction. When executing sp_send_dbmail from within an existing transaction, Database Mail relies on the user to either commit or roll back any changes. It does not start an inner transaction.
Execute permissions for sp_send_dbmail default to all members of the DatabaseMailUser database role in the msdb database. However, when the user sending the message does not have permission to use the profile for the request, sp_send_dbmail returns an error and does not send the message.

A. Sending an e-mail message

This example sends an e-mail message to your friend using the e-mail address yourfriend@Adventure-Works.com. The message has the subject Automated Success Message. The body of the message contains the sentence 'The stored procedure finished successfully'.
EXEC msdb.dbo.sp_send_dbmail  
    @profile_name = 'Adventure Works Administrator',  
    @recipients = 'yourfriend@Adventure-Works.com',  
    @body = 'The stored procedure finished successfully.',  
    @subject = 'Automated Success Message' ;  

B. Sending an e-mail message with the results of a query

This example sends an e-mail message to your friend using the e-mail address yourfriend@Adventure-Works.com. The message has the subject Work Order Count, and executes a query that shows the number of work orders with a DueDate less than two days after April 30, 2004. Database Mail attaches the result as a text file.
EXEC msdb.dbo.sp_send_dbmail  
    @profile_name = 'Adventure Works Administrator',  
    @recipients = 'yourfriend@Adventure-Works.com',  
    @query = 'SELECT COUNT(*) FROM AdventureWorks2012.Production.WorkOrder  
                  WHERE DueDate > ''2004-04-30''  
                  AND  DATEDIFF(dd, ''2004-04-30'', DueDate) < 2' ,  
    @subject = 'Work Order Count',  
    @attach_query_result_as_file = 1 ;  

C. Sending an HTML e-mail message

This example sends an e-mail message to your friend using the e-mail address yourfriend@Adventure-Works.com. The message has the subject Work Order List, and contains an HTML document that shows the work orders with a DueDate less than two days after April 30, 2004. Database Mail sends the message in HTML format.
DECLARE @tableHTML  NVARCHAR(MAX) ;  
  
SET @tableHTML =  
    N'<H1>Work Order Report</H1>' +  
    N'<table border="1">' +  
    N'<tr><th>Work Order ID</th><th>Product ID</th>' +  
    N'<th>Name</th><th>Order Qty</th><th>Due Date</th>' +  
    N'<th>Expected Revenue</th></tr>' +  
    CAST ( ( SELECT td = wo.WorkOrderID,       '',  
                    td = p.ProductID, '',  
                    td = p.Name, '',  
                    td = wo.OrderQty, '',  
                    td = wo.DueDate, '',  
                    td = (p.ListPrice - p.StandardCost) * wo.OrderQty  
              FROM AdventureWorks.Production.WorkOrder as wo  
              JOIN AdventureWorks.Production.Product AS p  
              ON wo.ProductID = p.ProductID  
              WHERE DueDate > '2004-04-30'  
                AND DATEDIFF(dd, '2004-04-30', DueDate) < 2   
              ORDER BY DueDate ASC,  
                       (p.ListPrice - p.StandardCost) * wo.OrderQty DESC  
              FOR XML PATH('tr'), TYPE   
    ) AS NVARCHAR(MAX) ) +  
    N'</table>' ;  
  
EXEC msdb.dbo.sp_send_dbmail @recipients='yourfriend@Adventure-Works.com',  
    @subject = 'Work Order List',  
    @body = @tableHTML,  
    @body_format = 'HTML' ;  

Thursday, June 16, 2016

SQL Server Statistics Questions

https://www.simple-talk.com/sql/performance/sql-server-statistics-questions-we-were-too-shy-to-ask/

Try as I might, I find it hard to over-emphasize the importance of statistics to SQL Server. Bad or missing statistics leads to poor choices by the optimizer: The result is horrific performance. Recently, for example, I watched a query go from taking hours to taking seconds just because we got a good set of statistics on the data. The topic of statistics and their maintenance is not straightforward. Lots of questions occur to people when I’m doing a presentation about statistics, and some get asked in front of the rest of the audience. Then there are other questions that get asked later on, in conversation. Here are some of those other questions….

What’s the difference between statistics on a table and statistics on an index?

There is no essential difference between the statistics on an index and the statistics on a table. They’re created at different points and, unless you’re creating the statistics manually yourself, they’re created slightly differently. The statistics on an index are created with the index. So, for an index created on pre-existing data, you’ll get a full scan against that data as part of the task of creating the index which is also then used to create the statistics for the index. The automatically-created statistics on columns are also usually created against existing data when the column is referenced in one of the filtering statements (such as WHERE or JOIN .. ON). But these are created using sampled data, not a full scan. Other than the source and type of creation, these two types of statistics are largely the same.

How can the Query Optimizer work out how many rows will be returned from looking at the statistics?

This is the purpose of the histogram within the statistics. Here's an example histogram from the Person.Address table in AdventureWorks2012:
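(A histogram like the one discussed here can be displayed with DBCC SHOW_STATISTICS; the index name below is the AddressLine1 index that ships with AdventureWorks2012, which is an assumption about the exact statistics object shown:)

DBCC SHOW_STATISTICS
    ('Person.Address', 'IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode')
WITH HISTOGRAM;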
If I were doing a search for the address ‘1313 Mockingbird Lane’ then the query optimizer is going to look at the histogram to determine two things:
  • Is it possible that this set of statistics contains this value?
  • If it does contain the value, how many rows are likely to be returned?
The RANGE_HI_KEY column shows the top of a set of data within the histogram, so a search for '1313' would place it within the step represented by row ten, which has a RANGE_HI_KEY value of '137 Lancelot Dr.' So the first question is answered. The optimizer will then look at the AVG_RANGE_ROWS to determine the average number of rows that match any given value within the step. So in this case, the optimizer will assume that there are 1.623188 rows returned for the value. But these are all estimates. In reality, the database doesn't contain the value '1313 Mockingbird Lane'.

When data changes, SQL Server will automatically maintain the statistics on indexes that I explicitly create, if that setting is enabled. Does it also maintain the statistics automatically created on columns?

As data changes in your tables, the statistics - all the statistics - will be updated based on the following thresholds:
  • When a table with no rows gets a row
  • When 500 rows are changed in a table that has 500 rows or fewer
  • When 20% of the rows + 500 are changed in a table with more than 500 rows
By ‘change’ we mean if a row is inserted, updated or deleted. So, yes, even the automatically-created statistics get updated and maintained as the data changes.
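On builds that support it (SQL Server 2008 R2 SP2 and 2012 SP1 onward), you can watch how close a statistics object is getting to these thresholds; a sketch against Person.Address:

SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter   -- changes since the last statistics update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.[object_id], s.stats_id) AS sp
WHERE s.[object_id] = OBJECT_ID(N'Person.Address');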

Where are the statistics actually stored? Do they take up much space? Can I save space by only having the essential ones?

The statistics themselves are stored within your database in a series of internal tables that include sysindexes. You can view some of the information about them using the system views sys.stats and sys.indexes, but the most detail is gleaned using DBCC SHOW_STATISTICS. The statistics themselves take very little space. The header is a single row of information. The density is a set of rows with only three columns, equal in the number of rows to the number of columns defining the key columns of the statistic. Then you have the histogram. The histogram is up to 200 rows and never exceeds that amount. This means statistics do not require much room at all. While you can save a little bit of space by removing unneeded statistics, the space savings are too small to ever be worthwhile.

How do I know when stats were last updated?

You can look at the header using DBCC SHOW_STATISTICS. This contains a bunch of general information about the statistics in question. This example is from the Person.Address table:
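(A sketch of the command that produces that header; the index name is an assumption about which statistics object the example used:)

DBCC SHOW_STATISTICS
    ('Person.Address', 'IX_Address_AddressLine1_AddressLine2_City_StateProvinceID_PostalCode')
WITH STAT_HEADER;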
As you can see, the last time the statistics were updated was January 4, 2013 at 7:01AM.

How reliable is the automated update of statistics in SQL Server?

You’d have to define what you mean by reliable. They are very reliable. They’re also sampled and automated to update on the criteria that we outlined in the first question. If the data in your system is fairly well distributed, as is usually the case, then the sampled statistics will work well for you. By ‘well distributed’ I mean that you’ll get a consistent view of all the available data by pulling just a sample of the data. Most systems, most of the time, will have reasonably well distributed data. But, almost every system I’ve worked with has exceptions. There always seems to be that one rogue table or that one index that’s got a very weird distribution of data, or gets updated very frequently, but not frequently enough to trigger an automatic update of the statistics. In this situation, the statistics can get stale or be inaccurate. But the problem isn’t the automated process. It’s that this data is skewed. These are the situations where you’ll need to manually take control of your statistics. This should be an exceptional event in most systems.

Are there any scripts or tools that will help me maintain statistics?

SQL Server provides two basic commands to help you to maintain your statistics: sp_updatestats and UPDATE STATISTICS. sp_updatestats will look at all the statistics within a database to see if any rows have been modified in the table that the statistics support. If one or more rows have been modified, then you'll get a sampled update of the statistics. UPDATE STATISTICS will update the statistics in the way that you specify, against the object that you specify; whether it is a table, an index, or a specific set of statistics.
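For example (the table name is just an illustration):

-- Sampled update of any statistics in the database whose tables have changed:
EXEC sys.sp_updatestats;

-- Targeted update of all statistics on one table, with a full scan:
UPDATE STATISTICS Person.Address WITH FULLSCAN;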

Do we update statistics before or after we rebuild/reorganize indexes?

Just remember that, when you create an index, part of the task of creating that index is to create the statistics using a full scan of the data. Since a rebuild of an index is effectively a ‘drop and recreate’ of the index, the statistics are also recreated. I’ve frequently seen maintenance scripts that rebuild indexes and then update statistics using sp_updatestats. This basically replaces a good set of statistics based on a full scan with a less-accurate set of statistics based on random sampling. So if you are rebuilding indexes, I would not recommend updating the statistics since this is extra work for a less effective statistic.
Now when you reorganize an index, no modifications of any kind are made to the statistics. So if you’ve had enough modifications to the data that you feel you should also update the statistics, go ahead and do it for indexes that you’ve reorganized because you won’t be hurting any other work.
If you absolutely must update the statistics and rebuild the indexes and you’re not going to try to tightly control exactly which indexes and tables you do this on, then the best practice would be to update the statistics first and then rebuild the indexes second.

Is using UPDATE STATISTICS WITH FULL SCAN the same as the statistics update that happens during an index rebuild?

Yes. The keyword and tricky phrase that you have to look at is WITH FULL SCAN. That indicates that the full data set is used to create/update the statistics. Since rebuilding an index is, to a degree, almost the same as recreating the index, you’re getting a full scan of the data set to put the index back together. This full scan also updates the statistics. If you run UPDATE STATISTICS WITH FULL SCAN, once again, it’s looking at all the data.

How do you determine when you need to manually update statistics?

This is one of the hardest questions about statistics to answer, because there is no hard-and-fast formula. The short answer is that you need to determine whether the statistics accurately enough represent the distribution of the data, and update them if it doesn’t. You need to update your statistics manually when the automatic processes are either not occurring frequently enough to provide you with a good set of statistics or because the sampled nature of the automatic updates is causing your statistics to be inaccurate. The only way to know whether you need to do these things is to track the behavior of queries within your system in order to spot when the optimizer starts making bad choices. This is probably because the statistics are not reflecting the data in a way that leads to good execution plans. It’s very easy to say, but a lot of work to do.

How often should I run a manual update of my statistics?

The answer to that depends on many circumstances. I’ve worked on systems that never needed any manual intervention on the statistics at all. At other times I’ve experienced a problem where we were running a manual update of the statistics every two minutes (that was a horrifically broken database and indexing design, not a pattern to be emulated). More usually, you’ll be dealing with a table that gets lots of inserts, maybe with an index on a datetime column, or an identity column, so that the data is outside the range of the statistics, but, there’s not enough activity to cause the automatic update to fire. Then, you’ll need to manually update statistics on a scheduled basis.
But how often? Often enough. You’ll need to determine that period based on your system and your circumstances. There’s not a formula I can provide that will precisely tell you when to run a manual update on your statistics. Further, I can’t tell you how to sample your statistics either. You may be experiencing random sampling that is perfect, or you may need a more specific degree of sampling right up to a full scan. You’ll need to experiment to understand what the right answer is in your circumstances.

Is there a way to change the sample rate for particular tables in SQL Server?

You can define how statistics sample the table when you create them or update them. You can specify either the number of rows to be sampled or the percentage of the table to be sampled. You can even specify that 0 percent or 0 rows be sampled. This will update your statistics, but with no actual statistics data. It’s probably a dangerous choice. If you use sp_updatestats or UPDATE STATISTICS, you can specify, by using the RESAMPLE option, that the sample rate you specified when you created the statistics should be reused. If you controlled the sample rate directly, the next time you use RESAMPLE, that sample rate will be used again. This includes statistics created on columns and on indexes.
One point worth noting is that the automatic update of statistics will be a sampled update. If you have a very specific need for a particular sample rate on statistics and there’s a good chance that the automatic maintenance could tax your system too much, you might consider turning off the automatic update for that table or set of statistics. You can do this by specifying the NORECOMPUTE option when you either UPDATE or CREATE STATISTICS. You have to use the STATISTICS_NORECOMPUTE option when you create an index. You can specify NORECOMPUTE with a manual update of the statistics. But, if you do this, and the data is changing, you need to plan for manual updates to those statistics.
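A few sketches of those options against a single table (the table name comes from AdventureWorks2012; pick rates to suit your data):

-- Explicit sample rate:
UPDATE STATISTICS Person.Address WITH SAMPLE 10 PERCENT;

-- Reuse the most recent sample rate:
UPDATE STATISTICS Person.Address WITH RESAMPLE;

-- Full scan, and opt out of automatic updates (plan for manual maintenance):
UPDATE STATISTICS Person.Address WITH FULLSCAN, NORECOMPUTE;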

Can you create a set of statistics in SQL Server like you do in Oracle?

Oracle allows you to create custom statistics all the way down to creating your own histogram. SQL Server doesn’t give you that much control. However, you can create something in SQL Server that doesn’t exist in Oracle: filtered statistics. These are extremely useful when dealing with partitioned data or data that is wildly skewed due to wide ranging data or lots of nulls. Using AdventureWorks2012 as an example, I could create a set of statistics on multiple columns such as TaxAmt and CurrencyRateID in order to have a denser, more unique, value for the statistics than would be created by the optimizer on each of the columns separately. The code to do that looks like this:
CREATE STATISTICS TaxAmtFiltered
ON Sales.SalesOrderHeader (TaxAmt,CurrencyRateID)
WHERE TaxAmt > 1000 AND CurrencyRateID IS NOT NULL
WITH FULLSCAN;
This may help the optimizer to make better choices when creating the execution plan, but you’ll need to test it in any given setting.

Can you have statistics on a View?

No, and yes. No, your basic view is nothing but a query. That query doesn’t have statistics. It’s the same as a SELECT query inside a stored procedure or a batch statement. But, you can create a construct called an indexed view, or materialized view, which is a clustered index based on the query that defines the view. That’s an index, so it gets statistics just like any other index. Further, if you run queries that reference the columns in a filtering clause in the new clustered index, statistics can be created on those columns. While this is, strictly speaking, not the same as statistics on a view, it’s as close as you can get within SQL Server.
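A minimal sketch of such an indexed view (the view and index names here are hypothetical):

CREATE VIEW Sales.vOrderQty
WITH SCHEMABINDING   -- required for an indexed view
AS
SELECT sod.SalesOrderID,
       SUM(sod.OrderQty) AS TotalQty,
       COUNT_BIG(*)      AS NumRows   -- COUNT_BIG(*) is required with GROUP BY
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.SalesOrderID;
GO
-- The unique clustered index materializes the view; its statistics
-- are then created and maintained like any other index statistics.
CREATE UNIQUE CLUSTERED INDEX IX_vOrderQty ON Sales.vOrderQty (SalesOrderID);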

Are statistics created on temporary tables?

Yes. The major difference between a table variable and a temporary table is that a temporary table has statistics. The rules for the creation and maintenance of these statistics are exactly the same as for a regular table within SQL Server. So if you reference a column in a temporary table in a filtering command in T-SQL such as WHERE or JOIN, then a set of statistics will get created. Unfortunately, the creation of the statistics causes a statement recompile. This is a potential disadvantage of temporary tables: for small statements this is a cheap operation; for larger queries this can be very expensive. That’s a reason why you have to be careful about how you work with your temporary tables.
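A quick sketch (assuming AUTO_CREATE_STATISTICS is on, as it is by default):

CREATE TABLE #Orders (OrderID int NOT NULL, OrderDate datetime NOT NULL);
-- ...populate #Orders, then filter on a column:
SELECT OrderID FROM #Orders WHERE OrderDate >= '20160101';
-- The WHERE clause can trigger auto-created statistics on OrderDate:
SELECT name, auto_created
FROM tempdb.sys.stats
WHERE [object_id] = OBJECT_ID(N'tempdb..#Orders');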

How does partitioning affect the statistics created by SQL Server?

Partitioning within SQL Server is defined through the creation of a clustered index. This clustered index has a full set of statistics based on all the data in the partition. If we’re talking about very large partitions and very large amounts of data, then there’s a good chance that the set of statistics may not be terribly accurate for the queries within your system. Oh, it will help you determine which partition to use very accurately, but within the partition you may still be seeing scans where a seek is possible. In order to help the optimizer, it’s a very good idea to create a set of manual statistics on each partition. This ensures that the data distribution of the partition is available to the optimizer to help it with making a good choice of query plan when executing your queries. For some additional details, read this overview from the SQL Server Customer Advisory Team (SQLCAT).
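A sketch of one such manual statistics object, filtered to a single partition's range (the table, column, and boundary values are hypothetical):

CREATE STATISTICS st_Orders_2016
ON dbo.Orders (OrderDate)
WHERE OrderDate >= '20160101' AND OrderDate < '20170101'
WITH FULLSCAN;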

What kind of statistics are provided for SQL Server through a linked server?

None. The data on a linked server will have whatever statistics are provided by the database you’re connecting to through the linked server, on that server, but it won’t pass any statistics back to your system. If you need statistics on the data in the linked server then you’ll need to load that data into a table or temporary table (not a table variable) and create indexes and/or statistics on that table.

Can statistics be imported/exported?

Yes. If you look at the “Generate and Publish Scripts” wizard for a database it is possible to set up a situation where you not only script out the database, but the statistics that define that database as well. Within SQL Server Management Studio (SSMS), right click on the database in question and select “Tasks” from the context menu and then “Generate Scripts” from the sub-menu. This will launch the wizard. You can choose the objects you’re interested in and then click Next until you get to the “Set Scripting Options” step. Here, you want to click the ‘Advanced’ button. Scroll down and you’ll find the option “Script Statistics.” You can select to script out just the base statistics or include the histogram as you can see selected below.
You can then output the scripts. All the objects you selected are then generated out to a script which you can use to create a new database. If you take a look at the script, you can see one of two commands, either an UPDATE STATISTICS command for objects like indexes where a set of statistics are automatically created, or CREATE STATISTICS for sets of statistics that you created manually or were created automatically for you by SQL Server. Each of these has an additional option defined WITH STATS_STREAM and then a binary value:
While SQL Server can generate this binary information, you can’t. If you look for the documentation for STATS_STREAM you won’t find it. Obviously, you can use this feature since it’s supplied by Microsoft. You can even generate your own STATS_STREAM value by using DBCC SHOW_STATISTICS WITH STATS_STREAM. But the documentation there reminds us: Not supported. Future compatibility is not guaranteed. So exercise caution when using this.

Conclusion

Dealing with statistics is definitely one of the more frustrating tasks of maintaining your SQL Server instances. However, as you can see, you have quite a large number of options which will enable you to get your statistics optimized on your server. Just remember that, just as your data is not always generic, there is no generic solution for the maintenance of statistics. You’ll need to conform the solution you use to the data and the queries that are running on your system.