How to find out on which line SSIS failed to parse data from a flat file
I have a flat file from which I am importing data into a DB using SSIS. When SSIS fails, it simply reports that a specific column could not be imported. But I want to know more precisely which line in the flat file the error occurred on, so I know where to look in the file.
Example: Consider a flat file that has the columns Name, Age, Date. Say the file has 100 lines, and SSIS fails on a specific line, say line 80, while processing the Date column. I get the error:
"The component "Derived Column" (19) failed because error code 0xC0049063 occurred, and the error row disposition on "output column "DATE" (33)" specifies failure on error." From this I can tell that the Date column contains a non-numeric value somewhere. But how do I know which line SSIS failed on (in this case it's line 88)?
I need to know this because my files are large, so I cannot easily figure out by hand where the parsing error occurred.
Can anyone also tell me what the 19 in "Derived Column" (19) and the 33 in "DATE" (33) mean in the error I got?
Here is a short description of what we do. First, we have staging tables for all of our imports, which include a rowid. We have two of them: one with the raw data and one with the cleaned data. We then have an exception table that records the name of the file being processed, the date, the rowid of the row that was sent to the exception table, the reason for the exception, and the client ID generated for the record if there is one (we usually require one, but that may differ for you). What you put in this table will depend on your needs. The first data flow is the one that moves data to the staging table after all cleanup steps have been completed.
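(The shape of that exception table can be sketched as a record type; the field names below are illustrative, not the answerer's actual schema.)

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ExceptionRecord:
    file_name: str            # file being processed when the row failed
    process_date: date        # date of the run
    rowid: int                # rowid of the row sent to the exception table
    reason: str               # why the row was rejected
    client_id: Optional[int]  # generated client ID, if one exists
```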
Now, after each step that may produce exceptions, we attach to the failure path (the red arrow, as opposed to the green one, coming out of the task) a Derived Column task to capture the file name, the exception reason, and the date we need, and then a destination connected to the exception table to actually insert the rows.
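(The failure-path pattern above can be sketched outside SSIS as: try each cleanup step per row, and on failure capture the file name, rowid, and reason instead of aborting the whole load. The function and field names here are illustrative, not SSIS APIs.)

```python
from datetime import datetime

def clean_rows(rows, file_name, fmt="%Y-%m-%d"):
    """Split rows into (cleaned, exceptions).

    Each input row is (rowid, name, age, date_str); bad rows are redirected
    to an exceptions list rather than failing the whole batch.
    """
    cleaned, exceptions = [], []
    for rowid, name, age, date_str in rows:
        try:
            parsed = datetime.strptime(date_str, fmt).date()
        except ValueError:
            # the "red arrow": record why the row failed and keep going
            exceptions.append({"file": file_name, "rowid": rowid,
                               "reason": f"unparseable date: {date_str!r}"})
            continue
        cleaned.append((rowid, name, age, parsed))
    return cleaned, exceptions
```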
Finally, after the data flow we have an Execute SQL task to determine whether there were too many exceptions, or exceptions that should kill the whole process. Only after the load has passed this step do we run another data flow to insert into the prod tables.
Now what do you get for all this complexity? First, you can easily see the differences between the cleaned and the original data if there is a data problem after loading. That tells us whether the data we were sent was wrong (99% of the time it is, but we have to prove it to the client), or whether our cleanup broke it. We then know exactly what went wrong and can easily create a list for our data provider of the bad data they need to fix. Finally, we almost never have to roll back a load to prod, because we made all the fixes before going to prod. And the actual load to prod is faster (our prod is on a different server than our data server) because it doesn't need to do cleanup steps at that point. Yes, the overall import may take longer, but the part that can actually affect our customers takes less time.
We also have control over what happens when something fails. For most imports (with the exception of one specific type) we use a percentage of failed records (agreed with the client) to determine whether the process completes. Thus, 4 bad records in a million-record file will not stop the process, but 100,000 will. And we have a few showstoppers, where even one bad record is reason to terminate the process. This gives us the freedom to decide, case by case, what should stop the process.
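(The decision rule described above can be sketched as a small function; the threshold value and parameter names are placeholders for whatever is agreed with the client.)

```python
def should_abort(total_rows, exception_count, max_pct, showstoppers=0):
    """Return True when the load should be killed before touching prod."""
    if showstoppers > 0:          # even one showstopper ends the process
        return True
    if total_rows == 0:
        return False
    # abort when the failure rate exceeds the agreed percentage
    return (exception_count / total_rows) * 100 > max_pct
```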