Print Friendly

Yesterday, I posted a relatively well-known table-value function called fnSplit() for Microsoft SQL Server that splits delimited data from a single field into multiple data rows. Today, we will improve on that function by including an index number column and retaining null entries within the array.

Our Example

First, let’s take a look at the data that we’re going to split. Our CUSTOMERS table contains two arrays – FIRST_NAMES and LAST_NAMES – that are “position specific.” Basically, the first FIRST_NAME value needs to be matched to the first LAST_NAME value and so forth.

IDFIRST_NAMESLAST_NAMES
1Bill,David,JamesAlbertson,Jones,Smythe
2Cyndi,Sarah,Vanessa,JessicaBlanchard,Howards,,Williamson

Looking at the table, we see that Vanessa’s last name is missing. We will need to be able to manage “missing” values as indexes. Neither index-positioning nor missing values can be appropriately handled with the function that we explored yesterday, so we’ll make some modifications.

The Original fnSplit() Function

Let’s look at the original fnSplit() function.

CREATE FUNCTION [dbo].[fnSplit](
    @sInputList VARCHAR(MAX) -- List of delimited items
  , @sDelimiter VARCHAR(MAX) = ',' -- delimiter that separates items
) RETURNS @List TABLE (item VARCHAR(MAX))
 
BEGIN
	DECLARE @sItem VARCHAR(MAX)
	WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
	BEGIN
		SELECT
		@sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,
			CHARINDEX(@sDelimiter,@sInputList,0)-1))),
		@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,
			CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),
			LEN(@sInputList))))
 
		IF LEN(@sItem) > 0
		INSERT INTO @List SELECT @sItem
	END
 
	IF LEN(@sInputList) > 0
	INSERT INTO @List SELECT @sInputList -- Put the last item in
	RETURN
END

We can then CROSS APPLY our fnSplit() function within a SELECT query of our CUSTOMERS table, but we can only reference one field at a time this way.

SELECT C.ID, F.ITEM
FROM CUSTOMERS C
CROSS APPLY dbo.fnSplit(FIRST_NAMES, ',') F

Our result looks like the following:

IDITEM
1Bill
1David
1James
2Cyndi
2Sarah
2Vanessa
2Jessica

If we run the same query and substitute the LAST_NAMES fields, the problem becomes apparent.

IDITEM
1Albertson
1Jones
1Smythe
2Blanchard
2Howards
2Williamson

Whereas seven rows were returned in the FIRST_NAMES query, only six are returned in the LAST_NAMES query. We can surmise that the “missing” name is from the second row since only three rows were returned from that row. But which first name is missing a last name? With our existing function, there is no systematic way to tell, and there is no way to join the two queries reliably.

The Enhanced fnSplit() Function

To fix this problem, we need only make some revision to our fnSplit() function. We need to ensure that null entries are retained, and we need a position index number that we can use in joins.

ALTER FUNCTION [dbo].[fnSplit](
    @sInputList NVARCHAR(MAX) -- List of delimited items
  , @sDelimiter NVARCHAR(MAX) = ',' -- delimiter that separates items
) RETURNS @List TABLE (ARRAY_IDX INT, ITEM NVARCHAR(MAX))
 
BEGIN
	DECLARE @sItem NVARCHAR(MAX)
	DECLARE @sIndex INT
	SET @sIndex = 1
	IF CHARINDEX(@sDelimiter,@sInputList,0) > 0
	BEGIN
		WHILE CHARINDEX(@sDelimiter,@sInputList,0) > 0
		BEGIN
			SELECT
			@sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,
				CHARINDEX(@sDelimiter,@sInputList,0)-1))),
			@sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,
				CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),
				LEN(@sInputList))))
 
			IF LEN(@sItem) > 0
				INSERT INTO @List SELECT @sIndex, @sItem
			ELSE
				INSERT INTO @List SELECT @sIndex, NULL
 
			SET @sIndex = @sIndex + 1
		END
 
		IF LEN(@sInputList) > 0
			INSERT INTO @List SELECT @sIndex, @sInputList
	END
	ELSE
		INSERT INTO @List SELECT 1, NULL
 
	RETURN
END

Here, we have made three changes. First, we modified the IF condition that checks for string length and added an ELSE clause to inject a NULL value if the string length is equal to zero. Second, we added an integer counter variable called @sIndex that increases with each iteration of the WHILE loop block. This index will be returned as a column called ARRAY_IDX. Finally, if there are no valid entries at all, we create a NULL table with a single row. This ensures that we will always get an index of at least 1, even if the value is NULL.

Now, we execute the following query:

SELECT C.ID, F.ARRAY_IDX, F.ITEM
FROM CUSTOMERS C
CROSS APPLY dbo.fnSplit(FIRST_NAMES, ',') F

And our result should look like this:

IDARRAY_IDXITEM
11Bill
12David
13James
21Cyndi
22Sarah
23Vanessa
24Jessica

And, when we substitute the LAST_NAMES field, the result appears like this:

IDARRAY_IDXITEM
11Albertson
12Jones
13Smythe
21Blanchard
22Howards
23NULL
24Williamson

We now have a retained NULL value in the third position of the second array. We also have an array index field (position identifier) that we can use to build a join.

SELECT C.ID, F.ITEM AS FIRST_NAME, L.ITEM AS LAST_NAME
FROM CUSTOMERS C
CROSS APPLY dbo.fnSplit(C.FIRST_NAMES, ',') F
CROSS APPLY dbo.fnSplit(C.LAST_NAMES, ',') L
WHERE F.ARRAY_IDX = L.ARRAY_IDX

And achieve the following result:

IDFIRST_NAMELAST_NAME
1BillAlbertson
1DavidJones
1JamesSmythe
2CyndiBlanchard
2SarahHowards
2VanessaNULL
2JessicaWilliamson

It is now apparent that someone (or some process) neglected to record Vanessa’s last name. Additionally, we have joined two array fields based on their index positions.

Some Thoughts and Challenges

This technique works well if both arrays are ensured to contain the same number of elements. If you are faced with joining arrays that contain different numbers of elements but still wish to retain correlation between indexes, you could contain each CROSS APPLY clause into a sub-SELECT and use a LEFT OUTER JOIN to accomplish this. It may also be beneficial to place common CROSS APPLY queries into a VIEW so you aren’t burdened with remembering the syntax and wish to avoid a potential for multiple, nested sub-SELECT statements.

Conclusion

In adding position indexing to the fnSplit() function, you have a powerful function that can be used to manage all sorts of delimited data types. I have replaced fnSplit() on virtually every SQL Server database that I manage primarily for the added functionality. The additional field can always be left out of the SELECT statement when it’s not needed. I hope that you will experience similar benefits to this function as I have.

Related posts:

  1. Query HP Service Manager Arrays in SQL Server