There are many techniques for doing quick, full loads of data with Microsoft SQL Server Integration Services (SSIS) for a reporting environment. This one is the quick change artist of the bunch. Our goal is to create a mechanism for doing a full data table load with little to no interruption to production reports that may rely on that table’s data. We’re going to quick load 30,000 or more records of data in the blink of an eye using a very simple, if commonly overlooked, technique.
The Method at a Glance
The beauty of the technique is simplicity in itself. We will load the data via a quick, three-step process.
- Create a staging table with the same structure of the destination table.
- Load the data into the staging table.
- Drop the destination table and rename the staging table to the destination table name.
The Basic SSIS Project
Launch Microsoft SQL Server Business Intelligence Development Studio (BIDS) and create your SSIS project. Create your package, and configure your source and destination Connection Managers.
Our Control Flow will consist of three steps as outlined above. We’ll first need an Execute SQL Task followed by a Data Flow Task and finally another Execute SQL Task.
The sources for both Execute SQL Tasks will be your destination Connection Manager.
Create the Staging Table
You can create the required staging script relatively quickly using SQL Server Management Studio (SSMS) as follows:
1. Connect to the database where the destination table is stored.
2. Locate the destination table in the Object Explorer view.
3. Right-click the destination table and select Script table as: DROP and CREATE to: New Query Editor Window. Be careful not to execute your script at this point! It is best to be connected as a user with read-only permission to the database to ensure that this script is not accidently executed.
4. Select Edit: Find and Replace: Quick Replace.
5. In the Find what text box, type [dbo].[<table name>], and in the Replace with box, type [dbo].[STAGE_<tablename>]. For example, if the table is called History, enter [dbo].[History] and [dbo].[STAGE_History] respectively. This will become our load table.
6. Click the Replace All button.
7. If you have any indexes or primary key constraints, you will need to rename these as well in the creation script. A SQL Server database may not contain two indexes with the same logical name. In reality, you should keep only the primary key constraint. Other indexes can be removed from the script altogether to potentially improve load performance.
The script that now exists in your Query Editor Window is the T-SQL script for your first Execute SQL Task. It checks for the existence of the STAGE table, performs a DROP if it exists, and then CREATEs a new STAGE table.
Execute the task and ensure that the STAGE table is created in the destination database. It will need to exist for the next step in our process.
Populating the Staging Table
Next, open the Data Flow Task control. This is where you will load data from the source table to the destination STAGE table. Using the appropriate Data Flow Source, Transformations and Data Flow Destination, build the ETL that you would normally use to populate the destination table. For this example, a very direct load with no transformations is used. You can use any necessary transformation controls for your particular situation.
When the script is executed in sequence, your STAGE table is guaranteed to be empty due to the first Execute SQL Task. In the absence of indexing, this should be a relatively fast load if the data is reasonable. Note that this method may not be the best method for the full load of a table with very wide rows (lots of TEXT or IMAGE type fields).
Return to the Control Flow and execute this task to ensure that the data can load into the STAGE table as expected.
In order to prevent Validation errors prior to the creation of the STAGE table, set the DelayValidation property of the Data Flow Task to True.
The Quick Change
The last step in the process is the quick change technique. This can be scripted into a single T-SQL script and saved into the final Execute SQL Task.
First, script any indexes from the destination table (with the exception of the primary key destination which was created along with the STAGE table above) as CREATE statements via SQL Server Management Studio (SSMS). Script these to new Query Editor windows.
Second, script the destination table (not the STAGE table) as DROP to a new Query Editor window. The DROP statement includes an existing table check.
Now compose a single T-SQL script as follows:
1. In the DROP statement’s Query Editor window, after the GO following the IF…DROP… statement, enter EXEC sp_rename ‘STAGE_<tablename>’, ‘<tablename>’ (include the single quotes). Follow this by a GO statement. This calls the sp_rename stored procedure which allows you to rename a table or other database object. Example: EXEC sp_rename ‘STAGE_History’, ‘History’
2. Next, enter EXEC sp_rename ‘<name you gave the STAGE primary key>’, ‘<name of the real primary key>’ and a GO command. Example: EXEC sp_rename ‘PK_STAGE_History’, ‘PK_History’
3. After this, paste the CREATE INDEX scripts from above in sequence.
Basically, you are writing a script that recreates the entire destination table including all indexes. Below is an example script created from the ReportServer.History table from a default Microsoft SQL Server Reporting Services (SSRS) database. This example assumes that the STAGE table is named STAGE_History and that the primary key on the STAGE table is called PK_STAGE_History.
USE [ReportServer]
GO/****** Object: Table [dbo].[History] Script Date: 02/28/2011 23:34:29 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N’[dbo].[History]‘) AND type in (N’U'))
DROP TABLE [dbo].[History]
GOEXEC sp_rename ‘STAGE_History’, ‘History’
GOEXEC sp_rename ‘PK_STAGE_History’, ‘PK_History’
GOCREATE UNIQUE CLUSTERED INDEX [IX_History] ON [dbo].[History]
(
[ReportID] ASC,
[SnapshotDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GOCREATE NONCLUSTERED INDEX [IX_SnapshotDataID] ON [dbo].[History]
(
[SnapshotDataID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Since we have tested population of the STAGE table step, it should be relatively safe to execute this script for testing purposes. Please ensure that you have appropriate backups and are using a development environment until you are certain that the script will work properly!
If the test was successful, paste this script into the final Execute SQL Task.
Executing the Package
When you execute the package, you are building a STAGE table separate from the active reporting/production table and populating it. Only during the final Execute SQL Task – which should execute very quickly – is there any table locking or potential for a failed query. Since the scripts check the existence of tables prior to drops and the STAGE table validation in the Data Flow Task is delayed, the package execution should be relatively bullet-proof.
Conclusion
Provided you have the data storage to support it, this is a vastly superior alternative to a TRUNCATE and INSERT method. I have multiple tables that utilize this method in various production environments and have not yet experienced any significant issues from using it. If you decide to load multiple tables in the same script in this way, I encourage you to use Sequence Containers if for no other reason than to group each flow together and gain some control over memory utilization.
Before using any full load method, you need to decide how best to load data for your particular table. Do you require periodic refreshes of table data while applications and/or users are executing reports? This method may be a very reliable (and fast) option. This method is very good for master data-type tables but is not well-suited for large, transactional tables that may be very wide. In most production environments, a variety of load methods may be required.
No related posts.
