Tuesday, November 3, 2015

Missing Date Ranges- the SQL and dates between missing period

https://www.simple-talk.com/sql/t-sql-programming/missing-date-ranges--the-sequel/

Some time ago, I published the article "Find Missing Date Ranges in SQL": . In that article, I showed how to find missing dates, using different SQL techniques.
A few months later, I received a message from a Simple-Talk reader, who thought he had a similar problem, and asked me for help. Here is the reader's problem:
'There is a table, where two of the columns are DateFrom and DateTo. Both columns contain date and time values. How does one find the missing date ranges or, in other words, all the date ranges that are not covered by any of the entries in the table'.
At first glance, the reader's problem and the one on my article look similar; they both involve finding missing, or unused, date ranges. However, they are not the same. In the article, a hypothetical application registers the moments (time) of events and stores them in a database (table). In the reader's problem, some imaginary application records the beginning and the end of the processes, representing them as continuous date/time ranges.
The second task may become more complicated, if the processes can overlap each other in time.
The following diagram illustrates the reader's problem (see Fig.1):

Fig.1 The reader's problem
As shown on the diagram, there is only one missing date range: from Jan 24 to Jan 31.
Without the process that started on Jan 01 and completed on Jan 22, shown in green in the diagram, there would be three missing date ranges:
  1. Jan 04 - Jan 04;
  2. Jan 11 - Jan 19;
  3. Jan 24 - Jan 31;
Notice, that a special case, when the process started, but did not finish yet, can be treated as a process with the end time equal to the current time.
For example, if the process started on June 10 and still running on June 14 (when you execute the query), you can consider that process as a process that started on June 10 and finished on June 14. In other words, dates in the range from June 10 to June 14 are covered (exist).

Solution with Auxiliary Table

Before we explore the fist solution, let us create and load sample table (see Listing1):
IF EXISTS(SELECT FROM sysobjects 
   
WHERE ID (OBJECT_ID('dateRanges')) AND xtype 'U')  DROP TABLE dateRanges;GOCREATE TABLE dateRanges(dateFrom DATETIMEdateTo DATETIME)GO
INSERT INTO dateRanges VALUES('Jan 01, 2000','Jan 22, 2000')INSERT INTO dateRanges VALUES('Jan 01, 2000','Jan 03, 2000')INSERT INTO dateRanges VALUES('Jan 05, 2000','Jan 10, 2000')INSERT INTO dateRanges VALUES('Jan 20, 2000','Jan 23, 2000')INSERT INTO dateRanges VALUES('Feb 01, 2000','Feb 06, 2000')INSERT INTO dateRanges VALUES('Feb 02, 2000','Feb 05, 2000')INSERT INTO dateRanges VALUES('Mar 01, 2000','Mar 02, 2000')INSERT INTO dateRanges VALUES('Mar 20, 2000','Mar 23, 2000')INSERT INTO dateRanges VALUES('Mar 25, 2000','Mar 28, 2000')INSERT INTO dateRanges VALUES('Mar 29, 2000','Apr 03, 2000')INSERT INTO dateRanges VALUES('Apr 01, 2000','Apr 05, 2000')INSERT INTO dateRanges VALUES('Apr 20, 2000','Apr 23, 2000')INSERT INTO dateRanges VALUES('Apr 25, 2000','Apr 28, 2000')INSERT INTO dateRanges VALUES('May 19, 2000','May 23, 2000')INSERT INTO dateRanges VALUES('May 20, 2000','May 23, 2000')INSERT INTO dateRanges VALUES('May 24, 2000','May 25, 2000')INSERT INTO dateRanges VALUES('May 28, 2000','May 29, 2000')INSERT INTO dateRanges VALUES('May 28, 2000','Jun 10, 2000')INSERT INTO dateRanges VALUES('May 29, 2000','Jun 10, 2000')INSERT INTO dateRanges VALUES('Jun 05, 2000','Jun 15, 2000')INSERT INTO dateRanges VALUES('Jun 20, 2000','Jun 28, 2000')INSERT INTO dateRanges VALUES('Jul 02, 2000','Jul 28, 2000')
Listing1. Create and load sample table

Fig.2 The Test data
Then, using an auxiliary table you can easily find missing dates (see Listing2):
SET NOCOUNT ON;
DECLARE @minDateFrom DATETIME@maxDateTo DATETIME;SELECT @minDateFrom MIN(dateFrom),
    
@maxDateTo MAX(dateTo)
   
FROM dateRanges;
-- create auxiliary table dateSequenceIF EXISTS(SELECT FROM sysobjects 
   
WHERE ID (OBJECT_ID('dateSequence')) AND xtype 'U')  DROP TABLE dateSequenceCREATE TABLE dateSequence(dt DATETIME NOT NULL PRIMARY KEY);
-- load table dateSequenceWITH dates AS SELECT @minDateFrom AS initDate UNION ALL SELECT DATEADD(dd1initDateFROM dates WHERE initDate <@maxDateToINSERT INTO dateSequence SELECT initDate FROM dates  OPTION (MAXRECURSION 0);
-- get missing datesSELECT CONVERT(VARCHAR(20),dt107AS missingDates
   
FROM dateSequence t1
   
WHERE NOT EXISTS
     (
SELECT FROM dateRanges t2
       
WHERE dt BETWEEN t2.dateFrom AND t2.dateTo)
Results:

missingDates
--------------------
Jan 24, 2000
Jan 25, 2000
Jan 26, 2000
Jan 27, 2000
Jan 28, 2000
Jan 29, 2000
Jan 30, 2000
Jan 31, 2000
Feb 07, 2000
Feb 08, 2000
. . . . . . .

Feb 28, 2000
Feb 29, 2000
Mar 03, 2000
Mar 04, 2000
. . . . . . .

Mar 18, 2000
Mar 19, 2000
Mar 24, 2000
Apr 06, 2000
Apr 07, 2000
. . . . . . .

Apr 18, 2000
Apr 19, 2000
Apr 24, 2000
Apr 29, 2000
Apr 30, 2000
May 01, 2000
May 02, 2000
. . . . . . .

May 17, 2000
May 18, 2000
May 26, 2000
May 27, 2000
Jun 16, 2000
Jun 17, 2000
Jun 18, 2000
Jun 19, 2000
Jun 29, 2000
Jun 30, 2000
Jul 01, 2000
Listing2. Find missing dates, using an auxiliary table
The same approach can be used, if you need to find missing hours, minutes or seconds. For example, for missing minutes, you need to make a small change in DATEADD() function in common table expression (see Listing3):
SET NOCOUNT ON;
DECLARE @minDateFrom DATETIME@maxDateTo DATETIME;SELECT @minDateFrom MIN(dateFrom),
       
@maxDateTo MAX(dateToFROM dateRanges;
-- create table dateSequenceIF EXISTS(SELECT FROM sysobjects 
   
WHERE ID (OBJECT_ID('dateSequence')) AND xtype 'U')  DROP TABLE dateSequenceCREATE TABLE dateSequence(dt DATETIME NOT NULL PRIMARY KEY);
-- load table dateSequenceWITH dates AS SELECT @minDateFrom AS initDate UNION ALL SELECT DATEADD(mi1initDateFROM dates
       
WHERE initDate <@maxDateToINSERT INTO dateSequence SELECT initDate FROM dates  OPTION (MAXRECURSION 0);
-- get missing datesSELECT dt AS missingDates
   
FROM dateSequence t1
   
WHERE NOT EXISTS
     (
SELECT FROM dateRanges t2
       
WHERE dt BETWEEN t2.dateFrom AND t2.dateTo)
Listing3. Solution for missing minutes with auxiliary table
However, the solutions, as shown in  Listing2 and Listing3, are not the answers for the reader's problem. This will  require finding the missing, or unused, date ranges, but not missing, or unused,  dates.
One possible way to solve that problem is to transform already found missing dates into missing date ranges.
In order to test that solution, you will need to:
  1. Create and load an auxiliary table dateSequence as shown in the Listing2
  2. Run following query (see Listing 4)
WITH datesCTE(missingDatesAS  (SELECT missingDates FROM
   
(SELECT dt AS missingDates FROM dateSequence t1
     
WHERE NOT EXISTS (SELECT FROM dateRanges t2
   
WHERE dt BETWEEN t2.dateFrom AND t2.dateTo)) tbl1SELECT CONVERT(VARCHAR(20), t1.missingDates107missingFrom,
       
CONVERT(VARCHAR(20), MIN(t2.missingDates), 107missingTo FROM (SELECT missingDates FROM datesCTE tbl1
   
WHERE NOT EXISTS(SELECT FROM datesCTE tbl2 WHERE DATEDIFF(ddtbl2.missingDatestbl1.missingDates1)) t1INNER JOIN (SELECT missingDates FROM datesCTE tbl1
    
WHERE NOT EXISTS(SELECT FROM datesCTE tbl2 WHERE DATEDIFF(ddtbl1.missingDatestbl2.missingDates1)) t2ON DATEDIFF(ddt1.missingDatest2.missingDates) >= 0GROUP BY t1.missingDatesORDER BY t1.missingDates

Results:

missingFrom missingTo
------------ ------------
Jan 24, 2000 Jan 31, 2000
Feb 07, 2000 Feb 29, 2000
Mar 03, 2000 Mar 19, 2000
Mar 24, 2000 Mar 24, 2000
Apr 06, 2000 Apr 19, 2000
Apr 24, 2000 Apr 24, 2000
Apr 29, 2000 May 18, 2000
May 26, 2000 May 27, 2000
Jun 16, 2000 Jun 19, 2000
Jun 29, 2000 Jul 01, 2000 
Listing4. Find missing date ranges, using an auxiliary table
There is, however, at least one way to solve the reader's problem in one query and without an auxiliary table.

The 'Missing Date Ranges' Puzzle

Try to find the solution for the reader's problem. Consider it as a puzzle with the following requirements:
  • You have a sample table dateRanges with two date/time columns dateFrom and dateTo (see Listing1)
  • You need to find the missing date ranges (that were not used in any rows), where the format of result and the result should be the same as in the Listing4.
  • The solution should be one-query solution without an auxiliary table.
  • It should work in SQL Server 2005 and in SQL Server 2000.
The best solution will be awarded with a $50 Amazon voucher.
I will post my solution as comments in a couple of weeks.
Alex Kozak
Author profile: Alex Kozak
Alex Kozak is a Senior DBA/Analyst working for SAP Canada. He has more than 15 years of database and programming experience. Microsoft has included some of his articles in the MSDN Library.
Rate this article:   Avg rating:  from a total of 12 votes.


Poor

OK

Good

Great

Must read
Have Your Say
Do you have an opinion on this article? Then add your comment below:
You must be logged in to post to this forum

Click here to log in.


Subject:Solution
Posted by:George Palacean (view profile)
Posted on:Friday, June 20, 2008 at 3:28 AM
Message:Hi Alex,
each interval will represent the start of a gap, unless there is another interval to start before this one's end and finish after (in this case the latter interval is a potential gap starter).
The end of a gap is the first interval start.

Expressing this into SQL, the solution to find the missing dates is:
select convert (varchar (15), DateTo, 107) as missingFrom,
  convert (varchar (15), NextDateFrom - 1, 107) as missingTofrom (
  select distinct DateFrom, DateTo + 1 as DateTo, 
     ( select MIN (dateFrom) 
        from dateranges 
        where DateTo &amp;gt; D.DateTo
     ) as NextDateFrom
  from dateranges D) Xwhere DateTo < NextDateFrom
To find the missing date ranges (including minutes, seconds or milliseconds):
select DateTo as missingFrom,
  NextDateFrom as missingTofrom (
  select distinct DateFrom, DateTo as DateTo, 
     (select MIN (dateFrom) 
        from dateranges 
        where DateTo &amp;gt; D.DateTo
     ) as NextDateFrom
  from dateranges D) Xwhere DateTo < NextDateFrom



Subject:A rather complex solution
Posted by:Phil Factor (view profile)
Posted on:Friday, June 20, 2008 at 6:54 AM
Message:This is a bit awkward, as it is really designed to use a self-join on a temporary table, or table variable.... but it gives the same results as yours. I have an awful feeling that it could be done a lot more simply.
SELECT 
 CONVERT(VARCHAR(20), DATEADD(DAY, 1, c.[to]), 107), 
 CONVERT(VARCHAR(20), DATEADD(DAY, -1, MIN(d.[from])), 107)
 FROM
 (SELECT  [from], [to] = MAX([to])
        FROM
           (SELECT [from] = MIN(b.dateFrom), [to] = MAX(b.DateTo)
              FROM    dateranges a INNER JOIN dateranges b
                  ON a.dateFrom BETWEEN b.datefrom AND b.DateTo 
          OR a.dateTo BETWEEN b.datefrom AND b.DateTo
              GROUP BY a.datefrom
           ) f
        GROUP BY [from]--get the widest range from every start-date
      ) c 
   CROSS JOIN 
     (SELECT [from], [to] = MAX([to])
         FROM    
            (SELECT [from] = MIN(b.dateFrom), [to] = MAX(b.DateTo)
                FROM    dateranges a INNER JOIN dateranges b
                     ON a.dateFrom BETWEEN b.datefrom AND b.DateTo 
                         OR a.dateTo BETWEEN b.datefrom AND b.DateTo
                GROUP BY a.datefrom
              ) f
          GROUP BY [from] )dWHERE d.[from] > c.[to]GROUP BY c.[to]HAVING DATEDIFF( DAY, c.[to], MIN(d.[from]))>1

Subject:Interesting
Posted by:Alex Kuznetsov (view profile)
Posted on:Friday, June 20, 2008 at 11:25 AM
Message:In my experience, a cursor or a while loop may actually perform much better - and may even take less time to develop!

Subject:re: interesting
Posted by:Phil Factor (view profile)
Posted on:Friday, June 20, 2008 at 11:33 AM
Message:Why not supply a solution and we can do some timings. I'll soon get a million-row sample table using you-know-what. The race for the fastest-running solution is on!

Subject:on Monday
Posted by:Alex Kozak (view profile)
Posted on:Friday, June 20, 2008 at 11:42 AM
Message:Sorry guys, I will be out of town on the weekends and I don’t have time to check the solutions today.
I will answer on Monday.

Subject:re:Solution
Posted by:Alex Kozak (view profile)
Posted on:Monday, June 23, 2008 at 3:13 PM
Message:Hi George,

Very neat solution, I like it.

Subject:re:Interesting
Posted by:Alex Kozak (view profile)
Posted on:Monday, June 23, 2008 at 9:24 PM
Message:I might have made a mistake asking to solve the reader’s problem in one query.
I probably needed to widen the range of SQL Server tools and languages that could be used.
However, even in that case, I personally would not use the cursors.
I do not like them and I try to avoid them.

Saying all that, I will not decline a simple and fast solution with the cursors or while loop(s). You can try.

Subject:Generating sample data
Posted by:Alex Kozak (view profile)
Posted on:Monday, June 23, 2008 at 9:31 PM
Message:I actually tried to use “you-know-what” to generate some more data for testing.

I found that for my particular case that tool is not helpful (though it is a great data generator).

The main problem is that “you-know-what” does not understand any logic (unless I miss something).

For instance, I would like to generate dates for two columns, where:

- dateTo will be greater than dateFrom;
- Dates in both columns will be random, but I should be able to set the range, where I want dates to be generated.
For instance, I would like dates to be generated in the range 1998 - 2000, but not in 1956 – 1978 as it does now.

Subject:Solution
Posted by:Srinivasan R (not signed in)
Posted on:Wednesday, June 25, 2008 at 7:07 AM
Message:

SELECT
    DateTo + 1,
    (
        SELECT
            MIN (DateFrom) - 1
        FROM
            DateRanges D4
        WHERE
            D4.DateTo &amp;gt; D1.DateTo
        AND
            D4.DateFrom &amp;gt; D1.DateTo
    )FROM
    DateRanges D1WHERE
    NOT EXISTS (SELECT * FROM DateRanges D2 
   WHERE D1.DateTo + 1 BETWEEN D2.DateFrom AND D2.DateTo) 
     AND DateTo &amp;lt; (SELECT MAX (DateTo) FROM DateRanges)

Subject:Solution
Posted by:Saharafrog (not signed in)
Posted on:Wednesday, June 25, 2008 at 2:28 PM
Message:Clumsy, cumbersome, and blatantly ignores that it should work in 2000:
;WITHL0 AS (SELECT 1 AS C UNION ALL SELECT 1),L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),Q1 AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L4),Q2 AS (SELECT L.N FROM dateRanges dR
   JOIN Q1 L ON L.N 
          BETWEEN CAST(dR.dateFrom AS INT) 
          AND CAST(dR.dateTo AS INT)),Q3 AS (SELECT y.N FROM Q1 y
   LEFT JOIN Q2 z ON y.N = z.N
   WHERE z.N IS NULL
   AND y.N BETWEEN 
                 (SELECT MIN(CAST(dateFrom AS INT)) 
                    FROM dateRanges)
          AND 
                  (SELECT MAX(CAST(dateTo AS INT)) 
                    FROM dateRanges)),Q4 AS (SELECT f.N Whither, e.N Thither, e.N-f.N diff
   FROM (
       SELECT a.N
       FROM Q3 a
       LEFT JOIN Q3 b ON a.N = b.N-1
       WHERE b.N IS NULL
   ) e
   JOIN (
       SELECT c.N
       FROM Q3 c
       LEFT JOIN Q3 d ON c.N = d.N+1
       WHERE d.N IS NULL
   ) f ON 1=1
   WHERE e.N-f.N >=0),Q5 AS (SELECT Whither, 
             Thither, 
             diff, 
             ROW_NUMBER() OVER(PARTITION 
                BY Whither ORDER BY diff) RN FROM Q4
       )SELECT CONVERT(VARCHAR(12), CONVERT(DATETIME, Whither), 107)             AS missingFrom, 
       CONVERT(VARCHAR(12), CONVERT(DATETIME, Whither), 107)             AS missingToFROM Q5WHERE RN = 1ORDER BY Whither

Subject:Can you break this?
Posted by:puzsol (view profile)
Posted on:Wednesday, June 25, 2008 at 7:31 PM
Message:Ok, so I'm a 2005 guy, but someone could do a translation for 2000... I have an uneasy feeling that this might break somehow, but so far I can't figure out how... can anyone else?
SELECT 
   CONVERT(VARCHAR(20), mf.missDateFrom, 107), 
   CONVERT(VARCHAR(20), mt.missDateTo, 107) FROM (
  SELECT 
  row_number() OVER (ORDER BY dateFrom) - 1 AS idx, 
  DATEADD(DAY, -1, dateFrom) AS missDateTo 
  FROM dateRanges) AS mtINNER JOIN (
  SELECT 
  row_number() OVER (ORDER BY dateTo) AS idx,
  DATEADD(DAY, 1, dateTo) AS missDateFrom
  FROM dateRanges) AS mf ON mf.idx = mt.idxWHERE mf.missDateFrom <= mt.missDateTo

Subject:How to generate the data in SQL Data Generator
Posted by:Lionel (view profile)
Posted on:Thursday, June 26, 2008 at 9:18 AM
Message:Hi Alex

You should be able to get SQL Data Generator(SDG) to generate the date although I guess we have not made it as obvious to do as we should have.

If you click on the dateFrom column in SDG and set the range to be "min max" you should be able to set the min date to be '1/1/1998' and the max date to be '1/1/2007'. You should now get the correctdate in the dateFrom column.

Next to always make sure the dateTo column is greater than dateFrom set the range drop down to be 'Offset from column'. If you now select the dateFrom column in the column dropdown the value generated into the dateTo column will be the date in the dateFrom column plus a randomly chosen offset in the range specified. The defaults are between 0 and 7 days.

Hope that explains it.

Lionel

Subject:Thanks for the puzzle!
Posted by:Umfriend (not signed in)
Posted on:Thursday, June 26, 2008 at 9:47 AM
Message:Uhm, stricly 2005 here.

Hope the pasting works. Nice puzzle!
SELECT DISTINCT * FROM (
   SELECT        CONVERT(VARCHAR(20), DATEADD(d, 1, t1.dateto), 107) AS MissingFrom,
      CONVERT(VARCHAR(20), DATEADD(d, -1, t2.datefrom), 107) AS MissingTO
   FROM    DateRanges t1
   INNER JOIN DateRanges t2
   ON      DATEDIFF(d, t1.DateTo, t2.DateFrom) > 1
   AND     NOT EXISTS (SELECT * FROM DateRanges t3 
                       WHERE (t3.DateFrom < t2.DateFrom 
                       AND t3.DateTo > t1.DateTo))
                       ) AS t


Subject:re: How to generate the data in SQL Data Generator
Posted by:Alex Kozak (view profile)
Posted on:Thursday, June 26, 2008 at 12:32 PM
Message:Hi Lionel,

Thanks a lot - it works!

I knew I miss something.
As always, I was in a hurry and didn’t read the help.
In addition, the UI was so obvious and straightforward… :)

Great product!

Alex

Subject:re: Can you break this?
Posted by:Alex Kozak (view profile)
Posted on:Thursday, June 26, 2008 at 2:32 PM
Message:Here's the problem, puzsol
--Truncate table dateRanges and insert the next three rows:
INSERT INTO dateRanges VALUES('Apr 01, 2000' ,'Apr 06, 2000')INSERT INTO dateRanges VALUES('Apr 02, 2000' ,'Apr 04, 2000')INSERT INTO dateRanges VALUES('Apr 08, 2000' ,'Apr 13, 2000')
--, your query will retrieve the correct result

Apr 07, 2000 Apr 07, 2000

--Truncate table dateRanges again and insert the next three rows:
INSERT INTO dateRanges VALUES('Apr 01 2000 00:00:01','Apr 06 2000 19:36:00')INSERT INTO dateRanges VALUES('Apr 02 2000 00:00:02','Apr 04 2000 19:30:00')INSERT INTO dateRanges VALUES('Apr 08 2000 00:00:04','Apr 13 2000 19:00:00')
--This time the query doesn’t retrieve any result, even though 
--Apr 07 is still missing.

Subject:Can't break it yet...
Posted by:puzsol (view profile)
Posted on:Friday, June 27, 2008 at 1:14 AM
Message:Ok, So I didn't cater for times (which the problem is a bit vague on, unless I missed something)... but there's still nothing wrong with the method as such, just the location of the math....

This version works for the dates entered:
SELECT CONVERT(VARCHAR(20), mf.missDateFrom, 107) AS dateFrom, CONVERT(VARCHAR(20), DATEADD(DAY, -1, mt.missDateTo), 107) dateToFROM (
  SELECT row_number() OVER (ORDER BY dateFrom) -1 AS idx
  , dateFrom AS missDateTo 
  FROM dateRanges) AS mtINNER JOIN (
  SELECT row_number() OVER (ORDER BY dateTo) AS idx
  , DATEADD(DAY, 1, dateTo) AS missDateFrom 
  FROM dateRanges) AS mf ON mf.idx = mt.idxWHERE mf.missDateFrom < mt.missDateTo

The other way you could do it is to strip the time - but that would probably take a bit longer as it would have to be done before the comparison.... I'm still searching for a pattern of dates that would break it...

Has anyone performed any time comparisons on a large database yet? I'm wondering if the sub-query approach is faster than the merging of two datasets.

Subject:Can't try it
Posted by:Umfriend (not signed in)
Posted on:Friday, June 27, 2008 at 7:38 AM
Message:I'm elsewhere today so I won;t be able to check whether my solution above works on the data with times but I strongly suspect it will.

Stripping time would be easy, I find DATEADD(d, DATEDIFF(d, 0, datefrom), 0) an elegant solution, and if it is implemented in the earlier solution at lines 8 and 15 then it would work, e.g. Line 8:
DATEADD(DAY, -1, DATEDIFF(d, 0, datefrom), 0)) AS missDateTo

Subject:my solution
Posted by:Ian Blackburn (not signed in)
Posted on:Wednesday, July 2, 2008 at 10:20 AM
Message:Here is my solution which upon inspection, is the same solution as R. Srinivasan. I could not understand how he managed do to it without using a DISTINCT clause but I discovered his solution will return duplicate values if you add the following record
INSERT INTO dateRanges VALUES('May 22, 2000','May 25, 2000')
SELECT DISTINCT 
      (SELECT DATEADD(MS,3,MAX(dateTo)) 
          FROM dateRanges c 
          WHERE c.dateTo < a.dateFrom) AS missingFrom,
       DATEADD(MS,-3,a.dateFrom
       ) AS missingToFROM   dateRanges aWHERE  NOT EXISTS  
       (SELECT * FROM dateRanges b 
            WHERE DATEADD(MS,-3,a.dateFrom) BETWEEN b.dateFrom AND b.dateTo)
             AND a.dateFrom >(SELECT MIN(dateFrom) FROM dateRanges
       )

Subject:Displaying date ranges
Posted by:Mats G Friberg (not signed in)
Posted on:Thursday, July 3, 2008 at 2:22 AM
Message:I wonder what tool you used to display the date-ranges so neatly. :-)

Subject:re: Displaying Date Ranges
Posted by:Andrew Clarke (view profile)
Posted on:Thursday, July 3, 2008 at 2:58 AM
Message:We don't normally like giving away our trade secrets but in this case....
.. We just used Microsoft Project. It is great for this sort of diagram.

Subject:re: Displaying Date Ranges
Posted by:Ben Lloyd (view profile)
Posted on:Thursday, July 3, 2008 at 3:33 AM
Message:Here's my answer, using ANSI style joins

SELECT DATEADD( DAY, 1, a.dateTo ) missingDateFrom,
       DATEADD( DAY, -1, MIN( c.dateFrom ) ) missingDateToFROM   DateRanges a
       LEFT JOIN DateRanges b 
         ON  DATEADD( DAY, 1, a.dateTo ) BETWEEN b.dateFrom AND b.dateTo
       LEFT JOIN DateRanges c
         ON DATEADD( DAY, 1, a.dateTo ) < c.dateFromWHERE  b.dateTo IS NULL AND c.DateFrom IS NOT NULLGROUP BY DATEADD( DAY, 1, a.dateTo )
If the source data used something more granular than days, just change the DateAdd parameters slightly, e.g.

SELECT DATEADD( ms, 3, a.dateTo ) missingDateFrom,
       DATEADD( ms, -3, MIN( c.dateFrom ) ) missingDateToFROM   DateRanges a
       LEFT JOIN DateRanges b 
         ON  DATEADD( ms, 3, a.dateTo ) BETWEEN b.dateFrom AND b.dateTo
       LEFT JOIN DateRanges c
         ON DATEADD( ms, 3, a.dateTo ) < c.dateFromWHERE  b.dateTo IS NULL AND c.DateFrom IS NOT NULLGROUP BY DATEADD( ms, 3, a.dateTo )

Subject:Missing Date Ranges
Posted by:Mike Toole (view profile)
Posted on:Friday, July 4, 2008 at 8:38 AM
Message:select Convert(Varchar(12),MissingFrom,107) MissingFrom,
(Select top 1 Convert(Varchar(12),MissingTo,107) MissingTo From
(select distinct x.dateFrom - 1 MissingTo
from dateranges x
Where not exists (select * from dateranges y where x.dateFrom - 1 between y.datefrom and y.dateto)
and DateFrom > (select Min(datefrom) from dateranges)) a
Where MissingTo > MissingFrom
) MissingTo
From
(select distinct x.dateTo + 1 MissingFrom
from dateranges x
Where not exists (select * from dateranges y where x.dateTo + 1 between y.datefrom and y.dateto)
And DateTo < (Select Max(Dateto) From dateranges)
) b

Subject:Missing Date Ranges
Posted by:Mike Toole (view profile)
Posted on:Friday, July 4, 2008 at 9:38 AM
Message:select Convert(Varchar(12),MissingFrom,107) MissingFrom,
(Select top 1 Convert(Varchar(12),MissingTo,107) MissingTo From
(select distinct x.dateFrom - 1 MissingTo
from dateranges x
Where not exists (select * from dateranges y where x.dateFrom - 1 between y.datefrom and y.dateto)
and DateFrom > (select Min(datefrom) from dateranges)) a
Where MissingTo > MissingFrom
) MissingTo
From
(select distinct x.dateTo + 1 MissingFrom
from dateranges x
Where not exists (select * from dateranges y where x.dateTo + 1 between y.datefrom and y.dateto)
And DateTo < (Select Max(Dateto) From dateranges)
) b

Subject:missing dates excluding weekend
Posted by:Anonymous (not signed in)
Posted on:Monday, July 7, 2008 at 11:41 PM
Message:Responses
hi

i have a problem.
i have a table named IOData.Fields are
name ,id,date ,gatename.
i have to find absent names,dates excluding weekends for all employees for a month group by name. anyone pls help me

regards
sarika

Subject:excluding weekends
Posted by:Anonymous (not signed in)
Posted on:Monday, July 7, 2008 at 11:49 PM
Message:i tried the query below.but not working properly.i mean for one person for one month it is working.when consider all employees for one month it is not working.

Select distinct Name,convert(varchar,dateadd(day,number+1,'6/1/2008'),104) as Date
from
master.dbo.spt_values,IOData
where
((datepart(dw,IODate) not in (7,1))and master.dbo.spt_values.type='p' and dateadd(day,number+1,'6/1/2008')<='6/30/2008'and dateadd(day,number+1,'6/1/2008') not in (select IODate from IOData where((IODate BETWEEN '6/1/2008'AND '6/30/2008')and((GateName='main entry(I/O)')OR (GateName='MAINENTRY 1 FLOOR(I/O)')) )) )order by Date


pls give me one solution


thanx & regards

Subject:Solution
Posted by:Shilpa M (not signed in)
Posted on:Tuesday, July 8, 2008 at 5:01 AM
Message:Extracted the Next Date after each of the DateTo values. Determined which of these did not fall within any of the date ranges(would be the MissingFrom Values). For these, determined the Minimum DateFrom value in DateRanges that is greater than the MissingFrom value. 1 less than this would provide the MissingTo value for the corresponding MissingFrom value.

SQL:
SELECT
CONVERT(VARCHAR(20),MissingFrom,107),
CONVERT(VARCHAR(20),DATEADD(dd,-1,MIN(DateFrom)),107) MissingTo
FROM
(
SELECT
DateToNext MissingFrom
FROM
(SELECT DATEADD(dd,1,DateTo) DateToNext
From DateRanges)t
WHERE NOT Exists
(Select 1 from DateRanges where
t.DateToNext between DateFrom and
DateTo)
)t1,
DateRanges
where DateFrom>MissingFrom
Group By MissingFrom

Subject:Solution
Posted by:Shilpa M (not signed in)
Posted on:Tuesday, July 8, 2008 at 6:10 AM
Message:Extracted the Next Date after each of the DateTo values. Determined which of these did not fall within any of the date ranges(would be the MissingFrom Values). For these, determined the Minimum DateFrom value in DateRanges that is greater than the MissingFrom value. 1 less than this would provide the MissingTo value for the corresponding MissingFrom value.

SQL:
SELECT
CONVERT(VARCHAR(20),MissingFrom,107),
CONVERT(VARCHAR(20),DATEADD(dd,-1,MIN(DateFrom)),107) MissingTo
FROM
(
SELECT
DateToNext MissingFrom
FROM
(SELECT DATEADD(dd,1,DateTo) DateToNext
From DateRanges)t
WHERE NOT Exists
(Select 1 from DateRanges where
t.DateToNext between DateFrom and
DateTo)
)t1,
DateRanges
where DateFrom>MissingFrom
Group By MissingFrom

Subject:About the solutions
Posted by:Alex Kozak (view profile)
Posted on:Thursday, July 10, 2008 at 1:14 AM
Message:OK, it is time to summarize…

First, I was surprised and glad to see so many different solutions.

However, even though the solutions were different, everybody who tried to break that puzzle had to solve a couple of smaller problems:

1) You needed to come with the idea (algorithm).
For instance, it is obvious that the beginning of the gap will be dateTo plus one time unit (ms or sec. or min. or hour or day)
Similarly, only dateFrom minus one time unit can be the end of the gap.

It is also obvious, that in the pair dateTo + 1 and dateFrom – 1, which is identifying the gap, dateTo should be less than dateFrom at least by two time units (in the case of only one time unit, we have a covered cluster of two adjacent date ranges).

That last postulate can be clearly seen in the solution of umfriend:
. . . . . . . . . . . . . . . . .
SELECT CONVERT(VARCHAR(20), DATEADD(d, 1, t1.dateto), 107) AS MissingFrom,
CONVERT(VARCHAR(20), DATEADD(d, -1, t2.datefrom), 107) AS MissingTO
FROM DateRanges t1 INNER JOIN DateRanges t2
ON DATEDIFF(d, t1.DateTo, t2.DateFrom) > 1
. . . . . . . . . . . . . . . . . .
Doing that, he identified all possible gaps and then checked if the existing date ranges can be found within those gaps-candidates.

The solution is interesting, but quite heavy.

2) Many of you tried to find the beginning (the end) of the gaps first.
For example, you might have noticed that the beginning of the gap dateTo + 1 cannot be within any existing date range.
Following this or similar logic, you could get all the beginnings (ends) of the gaps.
Then, in order to find the matching parts (ends or beginnings) of the gaps, you needed to locate the nearest date range (because pair dateTo + 1 and dateFrom – 1 comes from the adjacent covered date ranges).
Since, identifying the adjacent ranges (rows) without ordering numbers is quite difficult, you needed to use joins, correlated sub-quires and grouping.
All that made your solutions heavier.

Trying to solve that last problem, some of you came up with another idea.

Puzsol, for example assigned numeric index (order number) to each existing date range. That helped him to solve the problem of adjacent rows in a simple and elegant manner and made whole query much faster.

However, his solution needs to be corrected in order to work properly with date and time values.
In addition, he uses the ROW_NUMBER() function that doesn’t work in SQL Server versions older than 2005.
(Yes, the order numbers can be produced in the same one query, using ANSI SQL, but that will slow down the overall performance of the query.)

One more solution that looks very different from any other is the solution of Saharafrog.

In the beginning, he found all missing days for dateRanges table (I implemented the same idea, using an auxiliary table.)
As a result, he got the islands of missing days.
Next, he got all possible pairs "beginning of the island" – "end of the island" and calculated the difference between the members of each pair (in days).

After that, he did a nice trick. Using a ROW_NUMBER() function, he partitioned the set of "beginning of island" – "end of island" pairs by "beginning" and ordered by difference.
Then, the first in order (lowest in partition) difference, pointed to the correct value of the partition's end (the "beginning" values are the same within the partition).

This is interesting solution; very unusual; masterpiece of CTE virtuoso. :)

However, the solution is very heavy, nonflexible and looks complicated.(and can be used in SQL Server 2005 only).

I may have to split my comments into two parts, because it takes too much space.
The second part will follow.

Subject:In my defence
Posted by:puzsol (view profile)
Posted on:Saturday, July 12, 2008 at 8:17 AM
Message:The query (minus the date-add-subrtract) can find any gap down to the minute... but then it wouldn't have matched the date solution originally specified (it found two extra time ranges). As I mentioned before, I felt that the criteria for when a day started/finished, or what was considered to be a gap was not well specified. So I did what I had to to match the original expected output. (I'm eagerly waiting for the second posting....)

I'd still like to know if there is a better way to 'zip' two results together... ie row 1 from query 1 with row 1 from query 2 etc... I'm sure there would be another use for it... I mean if I query from table a, and table b you get the cross product... can you join on a blank? What I wanted was the result for the ordered row 1 of table a, listed next to the ordered row 1 of table b. Can you do this without having any key to join them (without using a row_number() function)?

Subject:re: in my defence
Posted by:Alex Kozak (view profile)
Posted on:Sunday, July 13, 2008 at 11:46 PM
Message:There is no magic, puzsol.
If you don’t have the indexes (order numbers) that specify the row position or order, you will need to compare the rows one by one using correlated subqueries or (and) joins or (and) GROUP BY clause.

Your idea allows comparing the adjacent rows without heavy iterations and that makes your solution the fastest…but again, you used the ROW_NUMBER() (and solution should work in SQL Server 2000 and 2005).

You asked whether it possible to order the rows in the table without having any key or without using a ROW_NUMBER() function.

Yes, it is possible. Here’s the example:

SELECT CONVERT(VARCHAR(20), mf.missDateFrom, 107) AS dateFrom
, CONVERT(VARCHAR(20), DATEADD(DAY, -1, mt.missDateTo), 107) dateTo
FROM
(
SELECT COUNT(G2.dateFrom) - 1as idx, G2.dateFrom AS missDateTo
FROM
(SELECT MIN(dateFrom) as dateFrom, dateTo
FROM (SELECT dateFrom, MAX(dateTo) dateTo
FROM dateRanges GROUP BY dateFrom) t
GROUP BY dateTo) AS G1
INNER JOIN
(SELECT MIN(dateFrom) as dateFrom, dateTo
FROM (SELECT dateFrom, MAX(dateTo) dateTo
FROM dateRanges GROUP BY dateFrom) t
GROUP BY dateTo) AS G2
ON G1.dateFrom <= G2.dateFrom
GROUP BY G2.dateFrom, G2.dateTo
) AS mt
INNER JOIN
(
SELECT COUNT(G2.dateTo) as idx, DATEADD(dd,1,G2.dateTo) AS missDateFrom
FROM (SELECT MIN(dateFrom) as dateFrom, dateTo
FROM (SELECT dateFrom, MAX(dateTo) dateTo
FROM dateRanges GROUP BY dateFrom) t
GROUP BY dateTo) AS G1
INNER JOIN
(SELECT MIN(dateFrom) as dateFrom, dateTo
FROM (SELECT dateFrom, MAX(dateTo) dateTo
FROM dateRanges GROUP BY dateFrom) t
GROUP BY dateTo) AS G2
ON G1.dateTo <= G2.dateTo
GROUP BY G2.dateFrom, G2.dateTo
) AS mf ON mf.idx = mt.idx
WHERE mf.missDateFrom < mt.missDateTo


The solution first, eliminates the duplicates in dateFrom and dateTo columns.
Second, it builds the order numbers for dateFrom and dateTo columns exactly as the ROW_NUMBER() function does.
Finally, it implements the logic from your solution.

You can see that without having a ROW_NUMBER() function, the solution looks ugly and complicated, but that is not the main problem.
The problem is that your idea, being implemented without ROW_NUMBER(), loses the power.
All the performance gains will be spent now to produce the order numbers.

However, the solution is still quite fast (one of the fastest among the valid solutions for the puzzle).

Subject:About the solutions (Part 2)
Posted by:Alex Kozak (view profile)
Posted on:Wednesday, July 16, 2008 at 12:08 AM
Message:My solution is similar to the solution of Shilpa M.

The idea of that solution is to find the beginning of the gaps (dateTo + 1) that lie outside of any existing date range. In addition, the beginning of the gap cannot be immediately followed by the beginning of the date range.

The purpose of JOIN is to locate the closest dateFrom (which is greater than dateTo) for each dateTo, found in the first step.

There are two variants of the solution:

Variant #1:

SELECT CONVERT(varchar(20), t1.col1, 107) AS startOfMissingRange,
CONVERT(varchar(20), MIN(t2.col1), 107) AS endOfMissingRange
FROM
(SELECT col1 = DATEADD(dd, 1, dateTo) FROM dateRanges tbl1
WHERE NOT EXISTS
(SELECT * FROM dateRanges tbl2
WHERE DATEDIFF(dd, tbl1.dateTo, tbl2.dateFrom) = 1
OR (tbl1.dateTo <> tbl2.dateTo
AND tbl1.dateTo BETWEEN tbl2.dateFrom AND tbl2.dateTo)
)) t1
INNER JOIN
(SELECT col1 = DATEADD(ss, -1, dateFrom) FROM dateRanges tbl1) t2
ON t1.col1 <= t2.col1
GROUP BY t1.col1;

Variant #2

SELECT CONVERT(varchar(20), t1.col1, 120) AS startOfMissingRange,
CONVERT(varchar(20), MIN(t2.col1), 120) AS endOfMissingRange
FROM
(SELECT col1 = DATEADD(ss, 1, dateTo) FROM dateRanges tbl1
WHERE NOT EXISTS (SELECT * FROM dateRanges tbl2
WHERE tbl1.dateTo <> tbl2.dateTo
AND tbl1.dateTo BETWEEN tbl2.dateFrom AND tbl2.dateTo
)) t1
INNER JOIN
(SELECT col1 = DATEADD(ss, -1, dateFrom) FROM dateRanges tbl1) t2
ON DATEDIFF(ss, t1.col1, t2.col1) > 1
GROUP BY t1.col1;

Both variants perform pretty well.

Here are a few words about my performance tests.

The fastest solution for SQL Server 2005 is the solution of puzsol.

The fastest solution for both versions of SQL Server is the solution of George Palacean.

The slow solutions are the solutions of Umfriend, Ben Lloyd and Saharafrog.
All other solutions with small differences perform somewhere between the fastest and the slowest solutions (closer to fastest).

Finally, I should admit that all the solutions are based on interesting idea or (and) have interesting implementation.

Thank you to everybody who contributed.

The winner is to be announced.

Subject:Go 2005
Posted by:puzsol (view profile)
Posted on:Wednesday, July 23, 2008 at 6:21 PM
Message:All I can say is.... Go SQL Server 2005!

To me this shows just how powerful, and useful the row_number function really is... makes you wonder why it took so long to get something like that into the implementation...

Thanks Alex for the (ugly) SQL 2000 version - makes me appreciate I don't have to go back to the dark ages (hahaha).

My vote would be for George:
a) George was first
b) George's solution is fast, clean, elegant and works the best in the environments it was supposed to.
c) I'm disqualified due to the constraints of the problem - I knew that when I entered the solution.

Once again, thanks for the fun!

Puzsol

Subject:SQL 2005
Posted by:puzsol (view profile)
Posted on:Wednesday, July 23, 2008 at 6:22 PM
Message:You do know that Microsoft have also released SQL 2008 don't you? (lol)

Subject:re: SQL 2005
Posted by:Alex Kozak (view profile)
Posted on:Wednesday, July 23, 2008 at 11:38 PM
Message:I’m glad that we both like George’s solution, Puzsol.
As you said, his solution is clean, elegant and fast.
(You should also agree that his solution would work in any version of any RDBMS.)

I can regret about the constraints of the problem, but I cannot change anything now.

If I choose the SQL 2005 solution as the best solution, it wouldn’t be fair to other guys, who also could come with SQL 2005 solutions, but didn’t do that, because of the constraints of the problem. (And I still like George’s solution the most.)

Yes, I know about SQL 2008.

Subject:re: About the solutions (Part 2)
Posted by:Shilpa M (not signed in)
Posted on:Sunday, August 3, 2008 at 9:35 AM
Message:Hi Alex,

Good to know my solution is similar to the one suggested by you. Thanks for the puzzle!

Subject:How to...
Posted by:Jeff Moden (view profile)
Posted on:Wednesday, December 31, 2008 at 6:35 PM
Message:Hi Alex,

I've gotta ask... how do you make those nice code blocks on this forum? I know how to do the colorizing and the indenting, but it all seems to go away when I try to paste it to comments like this...

Subject:Help with added twist to puzzle in slighly different context.
Posted by:mszlazak (view profile)
Posted on:Wednesday, January 28, 2009 at 11:27 PM
Message:I found this page after posting to the related article that proceeded it. I a novice to SQL but have adapted George Palacean's solution to an existing MS Access "Appointments" table that is part of software I use. This table has three fields that define appointment intervals: [Occur Date], [Time] and [Length]. So an appointments start time is [Occur Date]+[Time] and it's ending is [Occur Date]+[Time]+[Length]. Appointment overlap occurs and Georges solution seems to work in finding all openings except those not in the table. For instance, between the opening of business day and the first appointment, the last appointment and close of business, or some given time during business hours and the next appointment. Would adding these be best done by modifying George's approach or should script be used? The end result would be a solution that "finds the next available appointment." Here is how I did what George did:

SELECT
[End Date] AS [Opening Start],
[Next Start Date] AS [Opening End],
DATEDIFF('n', [End Date], [Next Start Date]) AS [Opening Length]
FROM (
SELECT DISTINCT
[Start Date],
([Occur Date]+[Time]+[Length]) AS [End Date],
(SELECT MIN([Occur Date]+[Time]) AS [Start Date]
FROM
Appointments
WHERE
([Occur Date]+[Time]+[Length]) > (A.[Occur Date]+A.[Time]+A.[Length])
AND
[Occur Date] = A.[Occur Date]
) AS [Next Start Date]
FROM Appointments AS A
)
WHERE
[End Date] < [Next Start Date]

No comments:

Post a Comment