Process each row and copy it to a new table using C#

I have an MSSQL 2008 table with several million records. I need to iterate over each row, change some data, and copy the updated record to a new table using a C# app that runs every day.

I tried to do it with the ADO.NET Entity Framework, but that approach has memory issues, not to mention it is very slow. I have read about bulk copy libraries and SQL-only techniques for copying one table to another, but none of them allow modifying the records before copying them. I need to find the best way to accomplish this operation.



4 answers


Since you mention memory issues, my guess is that you are trying to load the millions of rows into memory, process them, and then write everything back to the database. You can avoid this by "streaming" the data rather than loading it all at once. A SqlDataReader will handle the buffering for you, so on the read side you can write a simple WHILE loop that fetches rows one at a time. You apparently already have the actual transformation, so all that is left is getting the results back into the database. IMHO the fastest way to do that is to buffer a batch of results in a DataTable (start at 100 rows, experiment, and see where the sweet spot is), then insert that DataTable into the database with the SqlBulkCopy class. Rinse and repeat.
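
A minimal sketch of that loop, assuming hypothetical names (SourceTable, DestTable, the Id and Rtf columns) and a placeholder Transform method standing in for your real per-row logic:

    using System.Data;
    using System.Data.SqlClient;

    class BatchCopy
    {
        static void Run(string connectionString)
        {
            const int batchSize = 100; // start here, then tune for the sweet spot

            // Buffer schema must match the destination table.
            var buffer = new DataTable();
            buffer.Columns.Add("Id", typeof(int));
            buffer.Columns.Add("PlainText", typeof(string));

            // Two connections: the open data reader keeps the first one busy,
            // so the bulk copy writes over a second one.
            using (var source = new SqlConnection(connectionString))
            using (var target = new SqlConnection(connectionString))
            {
                source.Open();
                target.Open();

                var select = new SqlCommand("SELECT Id, Rtf FROM SourceTable", source);
                using (var reader = select.ExecuteReader())
                using (var bulk = new SqlBulkCopy(target) { DestinationTableName = "DestTable" })
                {
                    while (reader.Read()) // streams rows; nothing is loaded up front
                    {
                        buffer.Rows.Add(reader.GetInt32(0), Transform(reader.GetString(1)));

                        if (buffer.Rows.Count >= batchSize)
                        {
                            bulk.WriteToServer(buffer); // flush one batch
                            buffer.Clear();
                        }
                    }
                    if (buffer.Rows.Count > 0)
                        bulk.WriteToServer(buffer); // flush the remainder
                }
            }
        }

        // Placeholder for the per-row change described in the question.
        static string Transform(string rtf)
        {
            return rtf;
        }
    }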



PS: Sounds like a "fun" problem. Do you have any sample data sitting around somewhere to test this on? Five hours sounds like a LONG time for something that seems trivial at first, but then again, 20 million times almost-nothing adds up. Specifically, I'm wondering how "big" the data is on the RTF side: are the values around 2 KB on average, or more like 200 KB? And what hardware are you running this on?



The fastest option is to rewrite your C# application logic as a CLR stored procedure, so that all of the processing happens on the server.
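
For illustration, the shell of such a procedure might look like the sketch below. The table names and the dbo.Transform scalar function are hypothetical, and on SQL Server 2008 the assembly has to target .NET 3.5 and be registered with CREATE ASSEMBLY / CREATE PROCEDURE after enabling CLR integration:

    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public class Procedures
    {
        [SqlProcedure]
        public static void ProcessRows()
        {
            // The context connection runs inside the caller's session:
            // no network round-trip, and the data never leaves the server.
            using (var conn = new SqlConnection("context connection=true"))
            {
                conn.Open();
                // Set-based version, assuming the transformation has also been
                // moved into a (hypothetical) CLR scalar function dbo.Transform.
                // A row-by-row loop would need a second, regular connection for
                // the writes, because the context connection allows only one
                // active command at a time.
                var cmd = new SqlCommand(
                    "INSERT INTO DestTable (Id, PlainText) " +
                    "SELECT Id, dbo.Transform(Rtf) FROM SourceTable", conn);
                cmd.ExecuteNonQuery();
            }
        }
    }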





Checking around the internet, it looks like Microsoft's official answer for converting rich text to plain text is to load the data into a RichTextBox control and then pull it back out via the RichTextBox.Text property. There are a lot of reasons this is bad, but the big one is that it means you have to get your hands dirty. Your best bet is to write a small application that instantiates a RichTextBox control and pushes all of your data through it, to and from the database (using a SqlDataReader should ease the memory issues you mentioned).
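
The conversion itself is just a round-trip through the control. A minimal sketch (needs a reference to System.Windows.Forms, and WinForms controls want an STA thread, so mark your entry point [STAThread]):

    using System.Windows.Forms;

    static class RtfConverter
    {
        // Reuse one hidden control instead of allocating one per row.
        private static readonly RichTextBox Box = new RichTextBox();

        public static string ToPlainText(string rtf)
        {
            Box.Rtf = rtf;   // let the control parse the rich text
            return Box.Text; // read back the plain-text rendering
        }
    }

Plugged in as the transform step of a streaming read/write loop, this keeps memory usage flat no matter how many rows you process.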

While you are at it, I would suggest creating a staging table where your "dirty" rows can sit before they are cleaned up and added to your production table. Once you have a stored proc that does the cleanup, you can create a trigger that calls it automatically every time a record is added to the dirty table. That would ultimately save you the trouble of running your program every day to move records, since the trigger makes sure it happens on the fly.
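
A plain T-SQL trigger is the usual way to wire that up; if you would rather stay in C#, a CLR trigger can express the same idea. A sketch, with DirtyTable and dbo.ProcessRows as hypothetical names:

    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public class Triggers
    {
        [SqlTrigger(Name = "trg_CleanDirtyRows", Target = "DirtyTable", Event = "FOR INSERT")]
        public static void CleanDirtyRows()
        {
            using (var conn = new SqlConnection("context connection=true"))
            {
                conn.Open();
                // Hand the newly inserted rows off to the cleanup procedure.
                new SqlCommand("EXEC dbo.ProcessRows", conn).ExecuteNonQuery();
            }
        }
    }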

Edit - one last thought

It occurred to me that you might not like writing stored procedures and triggers, which is fine. A more "programmatic" solution would be to dump all of the rows in your dirty table out to a delimited text file, which is easy to load and parse. Once you have the text file, you can manipulate it with your application (read it in, clean it up, write out a cleaned file... whatever you like) and then upload it back to your database. Depending on your comfort level / background / skill level, this might actually be the best solution to get the job done.
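
A rough sketch of the dump step, again with hypothetical table and column names. Note that a real export would also need to escape any tabs and newlines embedded in the RTF values:

    using System.Data.SqlClient;
    using System.IO;

    class Exporter
    {
        static void Dump(string connectionString, string path)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var writer = new StreamWriter(path))
            {
                conn.Open();
                var select = new SqlCommand("SELECT Id, Rtf FROM DirtyTable", conn);
                using (var reader = select.ExecuteReader())
                {
                    // One row per line, tab-delimited; streamed, so memory stays flat.
                    while (reader.Read())
                        writer.WriteLine("{0}\t{1}", reader.GetInt32(0), reader.GetString(1));
                }
            }
        }
    }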

Hope this helps!

+1


source


Use SSIS. Schedule a daily job that runs an SSIS package performing your transformation. SSIS takes care of batching and memory consumption for you, and it offers a number of fast connectors for reading and writing the data. You can embed your own C# code (the RTF-to-plain-text conversion) as a custom SSIS component; see Developing Custom Objects for Integration Services.
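
For reference, the per-row hook of an SSIS Script Component looks roughly like this. The ScriptMain / UserComponent scaffolding and the Input0Buffer column properties (here Rtf and PlainText) are generated by the SSIS designer from your own column names, so this fragment only compiles inside a package:

    public class ScriptMain : UserComponent
    {
        // Called once per row flowing through the data path.
        public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
            // ConvertRtf stands in for your RTF-to-plain-text code.
            Row.PlainText = ConvertRtf(Row.Rtf);
        }

        private string ConvertRtf(string rtf)
        {
            return rtf; // placeholder
        }
    }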
