In SSIS, how do I get the number of rows returned from the source that must be processed?

I am working on a project to add logging to our SSIS packages. I am building my own custom logging by implementing some event handlers. I have implemented an OnInformation event handler to write the time, source name, and message to a log file. When data moves from one table to another, the OnInformation event gives me a message like:

component "TABLENAME" (1) "wrote 87 lines.

If one of the rows fails and, let's say, only 85 of the expected 87 rows were processed, I would guess the above line would read "wrote 85 rows". How do I keep track of how many rows should have been processed in this case? I would like to see something like "wrote 85 of 87 rows".

Basically, I think I need to know the number of rows returned by the original query. Is there an easy way to do this?

Thanks.

+3




5 answers


You can use a Row Count transformation after the data source and store the count in a variable. This will be the number of rows to process. After the data is loaded into the destination, use an Execute SQL Task in the Control Flow with Select Count(*) from <<DestinationTable>> and store the count in another variable. [You should use a Where clause in your query to identify the current load.] This way you will have both row counts for logging.
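As a rough sketch (the table, column, and variable names here are placeholders, not from the question), the Execute SQL Task could run something like:

    -- Count the rows landed by the current load. Map the result to an
    -- SSIS variable (e.g. User::RowsWritten) on the Result Set page.
    -- dbo.DestinationTable and LoadId are hypothetical names.
    SELECT COUNT(*) AS RowsWritten
    FROM dbo.DestinationTable
    WHERE LoadId = ?;   -- parameter-mapped to the current load's identifier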



Hope this helps!

+6




Not enough space in comments to provide feedback. Posting an incomplete answer as I need to leave for the day.

You will have trouble doing what you ask for. Based on your comments on Gowdhaman008's answer, the value of a variable is not visible outside the data flow until after the finalizer event (OnPostExecute, I think). You can cheat and get at this data by using a Script Task to count rows passing through and firing events, custom or predefined, to report on progress. In fact, just capture the OnPipelineRowsSent event. It records how many rows pass through a particular juncture, and when. See the SSIS Performance Framework. On top of that, you don't have to do any custom work or maintenance on your packages: out-of-the-box functionality is a definite win.
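If you use the SQL Server log provider, those events land in the dbo.sysssislog table (dbo.sysdtslog90 on SQL Server 2005) and can be read back with a plain query. A minimal sketch, assuming that log provider:

    -- List the OnPipelineRowsSent entries for one package execution.
    -- The ? would be parameter-mapped to System::ExecutionInstanceGUID.
    SELECT starttime, source, message
    FROM dbo.sysssislog
    WHERE event = 'OnPipelineRowsSent'
      AND executionid = ?
    ORDER BY starttime;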

However, you don't actually know how many rows are coming out of the source until it has finished. It sounds silly, and I totally agree, but it's true. Imagine a simple case: an OLE DB Source sending 1,000,000 rows straight to an OLE DB Destination. Most likely, not all 1M rows start out in the pipeline; maybe only 10k are in the first buffer. Those buffers are pushed to the destination, and now you know 10k of 10k rows have been processed. Lather, rinse, repeat a few times, and then some buffer contains a NULL where it shouldn't. Boom goes the dynamite and the process fails. We had 60k rows enter the pipeline, and that's all we know, because of the crash.

The only way to ensure you have accounted for all the source rows is to put an asynchronous transformation into the pipeline to block all downstream components until all the data has arrived. This will destroy any chance of getting good performance out of your packages. You would still be subject to the aforementioned restrictions on updating variables, but your FireXEvent message would accurately describe how many rows could have been processed in the queue.

If you started an explicit transaction, you could do something ugly like an Execute SQL Task just to get the expected count, write that to a variable, and then log rows processed, but then you are querying your data twice and you increase the chance of blocking on the source system because of the double pump. And that will only work for something database-like. The same concept applies to a flat file, except there you would need a Script Task to read all the rows first.
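A minimal sketch of that ugly pre-count, assuming a table source (the table name and predicate are placeholders) and an Execute SQL Task whose single-row result is mapped to a variable such as User::ExpectedRows:

    -- Pre-count the rows the data flow is about to read. Running this
    -- inside the package's transaction keeps the count consistent with
    -- what the source query will return, but it is exactly the double
    -- pump described above. dbo.SourceTable and LoadDate are hypothetical.
    SELECT COUNT(*) AS ExpectedRows
    FROM dbo.SourceTable
    WHERE LoadDate = ?;   -- same predicate as the source query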



It gets even uglier for a slow source, like a web service. The default buffer size can make the entire package run far longer than it needs to, simply because we are waiting for the data to arrive (slow start).

What I would do:

I would record the starting and error counts (and more) using Row Count transformations. That helps you track all the data that came in and where it went. Then I would turn on the OnPipelineRowsSent event so that I can query the log and see how many rows are flowing through it RIGHT NOW.
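A hedged sketch of that query (the parsing assumes the row count is the final colon-delimited token of the message, which varies by SSIS version, so treat this as an approximation rather than something guaranteed to work as-is):

    -- Sum the rows reported so far, per component, for one execution.
    SELECT source,
           SUM(CAST(LTRIM(RTRIM(
               RIGHT(message, CHARINDEX(':', REVERSE(message)) - 1))) AS int)) AS rows_sent
    FROM dbo.sysssislog
    WHERE event = 'OnPipelineRowsSent'
      AND executionid = ?   -- current run's ExecutionInstanceGUID
    GROUP BY source;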


+4




What you want is the Row Count transformation. Just add it to the data flow after the source query and assign its output to a variable. Then you can write that variable to your log file.
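If the destination for that variable is a table rather than a file, one sketch (dbo.PackageRunLog and User::RowCount are made-up names, not anything built in) is an Execute SQL Task after the data flow:

    -- Persist the count captured by the Row Count transformation.
    -- The two ? placeholders would be parameter-mapped to
    -- System::PackageName and User::RowCount respectively.
    INSERT INTO dbo.PackageRunLog (PackageName, RowsRead, LoggedAt)
    VALUES (?, ?, GETDATE());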

+2




Here's what I'm doing now. It's super tedious, but it works.

1) [screenshot: SSIS method]

2) I add a constant value of "1" to every record. They are literally all the same.

3) Using a Multicast step, I send the data flow in two directions. Even though the values are all the same, we still have to sort on this constant value.

4) Use an Aggregate step to count on this constant, then re-sort the result so it can be merged with the bottom data flow (which contains all the actual data records, without aggregation).

Doing this lets me carry the initial row count along with the data.

5) Later, as shown in the screenshot below, a Conditional Split step is used, and the same multicast/aggregate trick is repeated after applying your condition. If the row counts are the same, everything is fine and there is no problem.

If the row counts do not match, then something is wrong.

[screenshot: Checking for Success]

This is the general idea of an approach to solving your problem without having to leave the data flow; a rough T-SQL analogue of the pattern follows the summary below.

TL;DR:

Get the row count on one branch of the flow using a Multicast, a Sort on some constant value, and an Aggregate step.

Sort and merge the aggregate back on to attach the row count to the data.

Use a Conditional Split and do the same thing again.

If the row counts before and after the split match, take the success path.

If the row counts before and after the split do not match, take the failure path.
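Conceptually (this is only an illustration of what the data flow computes, not something the package itself runs), the multicast/aggregate/merge pattern amounts to attaching the total count to every row, as this T-SQL does:

    -- Rough T-SQL analogue of the pattern above: every row carries the
    -- total source row count alongside it, so a later step can compare
    -- counts before and after a filter. dbo.SourceData is a placeholder.
    SELECT d.*, c.TotalRows
    FROM dbo.SourceData AS d
    CROSS JOIN (SELECT COUNT(*) AS TotalRows FROM dbo.SourceData) AS c;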

0




This MAY help if you have a column with no bad data. Add a second Flat File Source to the package, using the same connection as your existing file. Select only the first column and direct its output to a Row Count.

-2








