<img src="https://track.adform.net/Serving/TrackPoint/?pm=268420" width="1" height="1" alt="">

Change data capture in SSIS SQL Server 2012

6 min. læsetid
5. februar 2012 Skrevet af: inspari Del med en ven     

SQL Server 2012 gives us some new SSIS components that makes reading from the CDC tables created by SQL Server a lot easier.

Before this release we had to do all the house keeping our self but now almost everything is done out of the box.

There are three new components that you need to know. The first one is the CDC Source with reads

from the CDC tables and return data like any other data source component. The second one is the CDC Splitter it's basically a conditional split hard coded to split on the CDC type (__$operation). It will return three data flows containing rows for insert, delete or update. The last component is the CDC Control Task it’s the housekeeper. The CDC Control Task keep track on which rows to process.

Working with incremental loads always requires two SSIS packages. One handling the first initial load and one handling the future incremental loads.

Hands on:

First create a SSIS package for the initial load. Insert a CDC Control Task, a Data Flow Task and again a CDC Control Task.

ssis

Open the first CDC Control Task and set it up like this.

  1. Select or create a connection to the database containing the CDC tables.
  2. Select “Mark initial load start” in the CDC control operation.
  3. Create a new variable by pressing the “New”-button. It will be called CDC_State unless you change it.
  4. Select or create a connection to the database where you what the housekeeping table to be stored. This table is holding an string for tracking the rows to process.
  5. Press the “New”-button to create the cdc_state table.
  6. State name holds the key-value for this CDC flow. If have more than one CDC table, include ex. the name of the table.

The result should be something like this.

init-begin

Now open the last CDC Control Task and set it up like the first one except from the CDC control operation. Now select “Mark initial load end”.

The dataflow for testing purpose could be like this.

ssis-2

Read everything from the CDC source table and write it to a new table in the stage or data warehouse database.

The second SSIS package begins like the first one. Insert a CDC Control Task, a Data Flow Task and again a CDC Control Task.

Configure the first CDC Control Task like the other ones except for the CDC control operation. Now select “Get processing range”.

Configure the last CDC Control Task like the other ones except for the CDC control operation. Now select “Mark processing range”.

Insert a CDC Source, a CDC Splitter and three Row Counts in the Data Flow

Setup the CDC Source like this.

  1. The connection in the CDC Source is the connection to the database from where you want to read.
  2. CDC enabled table is the table containing the data you want to read.
  3. Select “Net” in the CDC processing mode. It will only give us the newest row if it has been updated more than one time.
  4. Select the same variable as in the CDC control task.

Connect the CDC Source and the CDC Splitter. One each of the three outputs from the CDC splitter connect one of the row counts.

The result should be like this:

DF

To see it working run the packages like this.

  1. Run the Init package. All the data from the source table should be copied.
  2. Run the Incr. package. No rows should be copied.
  3. Run an update, a delete and an insert on the source table.
  4. Run the Incr. package again. Now there should be one row in each row count.