Monday, December 22, 2014

Linked lists

http://www.sqlservercentral.com/articles/T-SQL/118863/

Introduction

Imagine you want to sort a dataset by something that isn’t stored in the database. Let me give you an example to better imagine the problem: I have a table with songs (i.e. song title, artist, etc.) that has a column called Song_ID as its primary key. I want to create playlists with some songs from that table.
To do that I create another table that contains the main data for these playlists including the names of the playlists. It also has a column as its primary key which is called Playlist_ID. I could easily create a table, say Playlist_Contents that consists of Playlist_ID and Song_ID to model the many-to-many-relationship between songs and playlists. But I like the songs within each playlist to be played in a specific order but there are no criteria I could use for sorting them. Instead the sequence of the songs will be determined by some criteria unknown to the database. My ER diagram would look like this:

Suppose the tables have the following contents:
Playlists
Playlist_IDPlaylist_Name
1 All time favourites
2 Christmas songs

Songs
Song_IDSongArtist
10 My girl Temptations
20 Good vibrations Beach Boys
30 Clocks Coldplay
40 Velouria Pixies
50 Day tripper Beatles
60 Wonderwall Oasis
70 Temptation Heaven 17
80 Mercy Duffy
Now I'd like to put 6 of the songs into the playlist "My favourite songs". I want to play the songs in this order:
  1. Good vibrations - Beach Boys
  2. Day tripper - Beatles
  3. Wonderwall - Oasis
  4. Clocks - Coldplay
  5. Velouria - Pixies
  6. My girl - Temptations
I can simply insert 6 rows into Playlist_Contents but I need to store the order as well. The table Playlist_Contents has to be extended to do this.

First attempt

A straightforward solution would be to add a column to Playlist_Contents that stores the position of the song in the playlist.
Playlist_Contents with Position column
Playlist_ID Song Position
1 20 1
1 50 2
1 60 3
1 30 4
1 40 5
1 10 6
But what if you want to edit the playlist, i.e. adding, deleting or moving songs? You can easily add a song at the end of a playlist or remove the last song. All other operations lead to multiple updates, esp. if you want the position to hold contiguous numbers starting with 1 onwards:
-- add "Mercy" right after "Good vibrations:
UPDATE Playlist_Contents SET Position = Position + 1 WHERE Position >= 2
INSERT Playlist_Contents VALUES (1, 80, 2)
 
-- remove "Clocks" from the playlist:
DELETE FROM Playlist_Contents WHERE Playlist_ID = 1 AND Song_ID = 30
UPDATE Playlist_Contents SET Position = Position - 1 WHERE Position > 4

-- move "Velouria" to the beginning of the list:
UPDATE Playlist_Contents SET Position = Position + 1 WHERE Position >= 1
UPDATE Playlist_Contents SET Position = 1 WHERE Song_ID = 40
UPDATE Playlist_Contents SET Position = Position - 1 WHERE Position > 5
Maybe some playlists get quite long, so performance could become an issue. So using a position column seems to be quite inefficient.

A better solution

What about alternatives? How can we reduce the necessary modifications to a minimum? My solution was to not have a position column in the Playlist_Contents table at all. Instead I tried to use a linked list.
CREATE TABLE Playlist_Contents(
 Playlist_ID int NOT NULL,
 Song_ID  int NOT NULL,
 Predecessor int NULL,
 CONSTRAINT PK_Playlist_Contents PRIMARY KEY CLUSTERED 
(
 Playlist_ID ASC,
 Song_ID ASC
))
GO

ALTER TABLE Playlist_Contents  WITH CHECK
 ADD CONSTRAINT FK_Playlist_Contents_Self FOREIGN KEY (Playlist_ID, Predecessor)
 REFERENCES Playlist_Contents (Playlist_ID, Song_ID)

ALTER TABLE Playlist_Contents CHECK CONSTRAINT FK_Playlist_Contents_Self

CREATE UNIQUE NONCLUSTERED INDEX AK_Playlist_Contents ON Playlist_Contents
(
 Playlist_ID ASC,
 Predecessor ASC
)

ALTER TABLE Playlist_Contents  WITH CHECK
 ADD CONSTRAINT CK_Playlist_Contents_Predecessor CHECK (Song_ID!=Predecessor)

ALTER TABLE Playlist_Contents CHECK CONSTRAINT CK_Playlist_Contents_Predecessor
GO
In order to do that I created a column “Predecessor” that is designed to point to the Song_ID of the preceding song (within the same playlist, of course). The first song does not have a predecessor, so I allowed NULL for this column. Since no two songs can have the same predecessor it must be unique within each playlist. Additionally a song’s predecessor must not point to itself. The playlist from above would now look like this:
Playlist_Contents with Predecessor column
Playlist_ID Song Predecessor
1 20 NULL
1 50 20
1 60 50
1 30 60
1 40 30
1 10 40
In order to have a position again I created a view which dynamically generates the position number. It is based on a recursive CTE:
CREATE VIEW [dbo].[vPlaylist_Contents] AS
WITH List AS (
 SELECT Playlist_ID, 1 AS Position, Song_ID, Predecessor
 FROM Playlist_Contents
 WHERE Predecessor IS NULL
 UNION ALL
 SELECT p.Playlist_ID, l.Position + 1, p.Song_ID, p.Predecessor
 FROM Playlist_Contents p INNER JOIN List l
   ON p.Playlist_ID = l.Playlist_ID AND
    p.Predecessor = l.Song_ID
 )
SELECT * FROM List

GO

Modifying data in a linked list

How can the sequence of songs in a playlist be changed? If we want to insert a new song or change a song’s position in the list, we just name the predecessor of the song or put NULL in it. Of course we cannot really execute the inserts, updates and deletes as they are. So we need a trigger to perform the necessary additional modifications on the table.
This trigger will be an instead-of-trigger for all three types of data modification. The trigger has to determine the additional modifications and execute those in a single statement in order to fulfill all constraints we have on the table, i.e. the primary key, the foreign key, the unique index, and the check constraint. Since we use the view to read the data we create the trigger on the view.
CREATE TRIGGER [dbo].[tr_vPlaylist_Contents]
   ON  [dbo].[vPlaylist_Contents] 
   INSTEAD OF INSERT,DELETE,UPDATE
AS 
BEGIN
 SET NOCOUNT ON;
 WITH Modifications AS (
  SELECT Playlist_ID, Song_ID, Predecessor
  FROM inserted
  UNION ALL
  SELECT d.Playlist_ID, d.Song_ID, d.Predecessor
  FROM inserted i FULL OUTER JOIN deleted d
    ON i.Playlist_ID = d.Playlist_ID AND
     i.Song_ID = d.Song_ID
  WHERE i.Playlist_ID IS NULL
  UNION ALL
  SELECT p.Playlist_ID,
    p.Song_ID,
    i.Song_ID
  FROM Playlist_Contents p
    INNER JOIN inserted i
    ON p.Playlist_ID = i.Playlist_ID AND
     (p.Predecessor = i.Predecessor OR 
      (p.Predecessor IS NULL AND i.Predecessor IS NULL))
  UNION ALL
  SELECT p.Playlist_ID,
    p.Song_ID,
    d.Predecessor
  FROM Playlist_Contents p
    INNER JOIN deleted d
    ON p.Playlist_ID = d.Playlist_ID AND
     p.Predecessor = d.Song_ID
 )
 MERGE INTO Playlist_Contents AS targettable
  USING Modifications AS sourcetable
  ON  targettable.Playlist_ID = sourcetable.Playlist_ID AND
    targettable.Song_ID = sourcetable.Song_ID
  WHEN MATCHED AND targettable.Predecessor = sourcetable.Predecessor OR
      (targettable.Predecessor IS NULL AND sourcetable.Predecessor IS NULL)
   THEN DELETE
  WHEN MATCHED AND (targettable.Predecessor != sourcetable.Predecessor) OR
      (targettable.Predecessor IS NULL AND sourcetable.Predecessor IS NOT NULL) OR
      (targettable.Predecessor IS NOT NULL AND sourcetable.Predecessor IS NULL)
   THEN UPDATE SET targettable.Predecessor = sourcetable.Predecessor
  WHEN NOT MATCHED BY TARGET
   THEN INSERT (Playlist_ID, Song_ID, Predecessor) 
   VALUES (sourcetable.Playlist_ID, sourcetable.Song_ID, sourcetable.Predecessor);
END
It uses a MERGE statement to perform all necessary modifications at once. The CTE produces all rows that have to be modified. The rows whose primary key values do not match a row in the table will be inserted. The rows which do match a row in the table but have a different predecessor value will be updated. Finally rows where all column values match a row in the table will be deleted.

Testing the solution

Finally let’s try to insert, update and delete some data in the table to test the trigger.
-- add "Mercy" right after "Good vibrations:
BEGIN TRAN
INSERT vPlaylist_Contents (Playlist_ID, Song_ID, Predecessor) VALUES (1, 80, 20)
SELECT * FROM vPlaylist_Contents
ROLLBACK
 
-- remove "Clocks" from the playlist:
BEGIN TRAN
DELETE FROM vPlaylist_Contents WHERE Playlist_ID = 1 AND Song_ID = 30
SELECT * FROM vPlaylist_Contents
ROLLBACK

-- move "Velouria" to the beginning of the list:
BEGIN TRAN
UPDATE vPlaylist_Contents SET Predecessor = NULL WHERE Song_ID = 40
SELECT * FROM vPlaylist_Contents
ROLLBACK
The difference between this solution and the first attempt is the number of rows that have to be modified. In this solution we have an upper bound of 2 for inserts, 2 for deletes and 3 for updates. These figures are independent of the number of rows in the linked list whereas the figures in our first attempt increase as the number of songs in the playlist increases.
Actually I use this construction in an application that modifies one row at a time. The trigger works fine for some statements that modify multiple rows but not for all. The solution can be extended to allow all possible multiple row modifications but that involves much more effort to make things work.

Restrictions

Apart from multiple row modifications this implementation does not allow to change the primary key values of a row, i.e. you cannot replace a song by another one or move a song from one playlist to another. If you try to do that the trigger will fail due to constraint violations or missing uniqueness within the MERGE statement. To avoid error messages like that you can simply test for changed primary key values at the beginning of the trigger. If moving songs between playlists or replacements of songs within one statement are required you could extend the table by a surrogate primary key and use this new PK in the statement inside the trigger instead.

Don't use hierarchyid for linked lists

A linked list like this can also be regarded as a degenerated tree structure that contains one branch only. I was thinking of using a hierarchyid column in the beginning but discarded this idea quickly. Using a hierarchyid column doesn't reduce the number of updates. It is about the same as with a position column. Also a hierarchy column would use much more space than a predecessor column of type int.

No comments:

Post a Comment