Yesterday, I posted a relatively well-known table-value function called fnSplit() for Microsoft SQL Server that splits delimited data from a single field into multiple data rows. Today, we will improve on that function by including an index number column and retaining null entries within the array.
Our Example
First, let’s take a look at the data that we’re going to split. Our CUSTOMERS table contains two arrays – FIRST_NAMES and LAST_NAMES – that are “position specific.” Basically, the first FIRST_NAME value needs to be matched to the first LAST_NAME value and so forth.
| ID | FIRST_NAMES | LAST_NAMES |
|---|---|---|
| 1 | Bill,David,James | Albertson,Jones,Smythe |
| 2 | Cyndi,Sarah,Vanessa,Jessica | Blanchard,Howards,,Williamson |
Looking at the table, we see that Vanessa’s last name is missing. We will need to be able to manage “missing” values as indexes. Neither index-positioning nor missing values can be appropriately handled with the function that we explored yesterday, so we’ll make some modifications.
The Original fnSplit() Function
Let’s look at the original fnSplit() function.
CREATE FUNCTION [dbo].[fnSplit](
@sInputList VARCHAR(MAX) -- List of delimited items
, @sDelimiter VARCHAR(MAX) = ',' -- delimiter that separates items
) RETURNS @List TABLE (item VARCHAR(MAX))
BEGIN
DECLARE @sItem VARCHAR(MAX)
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
BEGIN
SELECT
@sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,
CHARINDEX(@sDelimiter,@sInputList,0)-1))),
@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,
CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),
LEN(@sInputList))))
IF LEN(@sItem) > 0
INSERT INTO @List SELECT @sItem
END
IF LEN(@sInputList) > 0
INSERT INTO @List SELECT @sInputList -- Put the last item in
RETURN
ENDWe can then CROSS APPLY our fnSplit() function within a SELECT query of our CUSTOMERS table, but we can only reference one field at a time this way.
SELECT C.ID, F.ITEM FROM CUSTOMERS C CROSS APPLY dbo.fnSplit(FIRST_NAMES, ',') F
Our result looks like the following:
| ID | ITEM |
|---|---|
| 1 | Bill |
| 1 | David |
| 1 | James |
| 2 | Cyndi |
| 2 | Sarah |
| 2 | Vanessa |
| 2 | Jessica |
If we run the same query and substitute the LAST_NAMES fields, the problem becomes apparent.
| ID | ITEM |
|---|---|
| 1 | Albertson |
| 1 | Jones |
| 1 | Smythe |
| 2 | Blanchard |
| 2 | Howards |
| 2 | Williamson |
Whereas seven rows were returned in the FIRST_NAMES query, only six are returned in the LAST_NAMES query. We can surmise that the “missing” name is from the second row since only three rows were returned from that row. But which first name is missing a last name? With our existing function, there is no systematic way to tell, and there is no way to join the two queries reliably.
The Enhanced fnSplit() Function
To fix this problem, we need only make some revision to our fnSplit() function. We need to ensure that null entries are retained, and we need a position index number that we can use in joins.
ALTER FUNCTION [dbo].[fnSplit](
@sInputList NVARCHAR(MAX) -- List of delimited items
, @sDelimiter NVARCHAR(MAX) = ',' -- delimiter that separates items
) RETURNS @List TABLE (ARRAY_IDX INT, ITEM NVARCHAR(MAX))
BEGIN
DECLARE @sItem NVARCHAR(MAX)
DECLARE @sIndex INT
SET @sIndex = 1
IF CHARINDEX(@sDelimiter,@sInputList,0) > 0
BEGIN
WHILE CHARINDEX(@sDelimiter,@sInputList,0) > 0
BEGIN
SELECT
@sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,
CHARINDEX(@sDelimiter,@sInputList,0)-1))),
@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,
CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),
LEN(@sInputList))))
IF LEN(@sItem) > 0
INSERT INTO @List SELECT @sIndex, @sItem
ELSE
INSERT INTO @List SELECT @sIndex, NULL
SET @sIndex = @sIndex + 1
END
IF LEN(@sInputList) > 0
INSERT INTO @List SELECT @sIndex, @sInputList
END
ELSE
INSERT INTO @List SELECT 1, NULL
RETURN
ENDHere, we have made three changes. First, we modified the IF condition that checks for string length and added an ELSE clause to inject a NULL value if the string length is equal to zero. Second, we added an integer counter variable called @sIndex that increases with each iteration of the WHILE loop block. This index will be returned as a column called ARRAY_IDX. Finally, if there are no valid entries at all, we create a NULL table with a single row. This ensures that we will always get an index of at least 1, even if the value is NULL.
Now, we execute the following query:
SELECT C.ID, F.ARRAY_IDX, F.ITEM FROM CUSTOMERS C CROSS APPLY dbo.fnSplit(FIRST_NAMES, ',') F
And our result should look like this:
| ID | ARRAY_IDX | ITEM |
|---|---|---|
| 1 | 1 | Bill |
| 1 | 2 | David |
| 1 | 3 | James |
| 2 | 1 | Cyndi |
| 2 | 2 | Sarah |
| 2 | 3 | Vanessa |
| 2 | 4 | Jessica |
And, when we substitute the LAST_NAMES field, the result appears like this:
| ID | ARRAY_IDX | ITEM |
|---|---|---|
| 1 | 1 | Albertson |
| 1 | 2 | Jones |
| 1 | 3 | Smythe |
| 2 | 1 | Blanchard |
| 2 | 2 | Howards |
| 2 | 3 | NULL |
| 2 | 4 | Williamson |
We now have a retained NULL value in the third position of the second array. We also have an array index field (position identifier) that we can use to build a join.
SELECT C.ID, F.ITEM AS FIRST_NAME, L.ITEM AS LAST_NAME FROM CUSTOMERS C CROSS APPLY dbo.fnSplit(C.FIRST_NAMES, ',') F CROSS APPLY dbo.fnSplit(C.LAST_NAMES, ',') L WHERE F.ARRAY_IDX = L.ARRAY_IDX
And achieve the following result:
| ID | FIRST_NAME | LAST_NAME |
|---|---|---|
| 1 | Bill | Albertson |
| 1 | David | Jones |
| 1 | James | Smythe |
| 2 | Cyndi | Blanchard |
| 2 | Sarah | Howards |
| 2 | Vanessa | NULL |
| 2 | Jessica | Williamson |
It is now apparent that someone (or some process) neglected to record Vanessa’s last name. Additionally, we have joined two array fields based on their index positions.
Some Thoughts and Challenges
This technique works well if both arrays are ensured to contain the same number of elements. If you are faced with joining arrays that contain different numbers of elements but still wish to retain correlation between indexes, you could contain each CROSS APPLY clause into a sub-SELECT and use a LEFT OUTER JOIN to accomplish this. It may also be beneficial to place common CROSS APPLY queries into a VIEW so you aren’t burdened with remembering the syntax and wish to avoid a potential for multiple, nested sub-SELECT statements.
Conclusion
In adding position indexing to the fnSplit() function, you have a powerful function that can be used to manage all sorts of delimited data types. I have replaced fnSplit() on virtually every SQL Server database that I manage primarily for the added functionality. The additional field can always be left out of the SELECT statement when it’s not needed. I hope that you will experience similar benefits to this function as I have.
Related posts:
