Updating rows in a table with many columns

I have a table (several, in fact) that contains many columns (100+ in some cases). Which approach is best for updating rows when only a few columns have changed?

  • Dynamically create an UPDATE statement that sets only the changed columns.
  • Create a parameterized UPDATE statement that sets all columns, including those that have not changed.
  • Create a stored procedure that takes ALL values as parameters and updates the row.

I am using SQL Server. There are no BLOBS in the table.

Thank you, m

+2



3 answers


Options 2 and 3 require more data to be sent to the server on every update, and therefore carry a larger communication overhead.

Does each row have a different set of updated columns, or is the same set of columns updated for every row in a given run (even if that set differs from run to run)?

In the latter case (the same set of columns updated throughout a run), option 1 is likely to perform best: the statement is prepared once and reused many times, with a minimal amount of data sent to the server for each update.



In the former case (a different set per row), I would check whether there is a relatively small subset of columns that ever change (say, 10 columns that change across different rows, even if any one row changes only 3 of those 10). In that case I would probably parameterize those 10 columns, accepting the relatively small overhead of passing the 7-9 unchanged values in exchange for the convenience of a single prepared statement. If the set of updated columns is all over the map (say, more than 50 of 100 columns are touched over the whole operation), then it is probably simpler to just parameterize the entire set.

To some extent it also depends on how easily your host language (client API) lets you handle the various ways of parameterizing updates.
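A minimal sketch of the fixed-subset approach described above (table and column names here are hypothetical; shown with Python's built-in sqlite3 for illustration, though the same pattern applies to SQL Server through a driver such as pyodbc):

```python
import sqlite3

# A fixed subset of columns (price, qty, note) is parameterized once; rows
# that change only some of them still pass the current value for the rest,
# so one statement text serves every row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, qty INTEGER, note TEXT)"
)
conn.execute("INSERT INTO items VALUES (1, 9.99, 5, 'old'), (2, 4.50, 2, 'old')")

# One statement, prepared once and reused for every row via executemany.
sql = "UPDATE items SET price = ?, qty = ?, note = ? WHERE id = ?"
conn.executemany(sql, [
    (12.99, 5, 'old', 1),   # only price changed; qty/note re-sent unchanged
    (4.50, 7, 'new', 2),    # qty and note changed; price re-sent unchanged
])
conn.commit()
print(conn.execute("SELECT price, qty, note FROM items ORDER BY id").fetchall())
# [(12.99, 5, 'old'), (4.5, 7, 'new')]
```

The trade-off is exactly the one described: a few unchanged values cross the wire, but the server sees a single statement text.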

+1



I'd say options 2 and 3 are equivalent in terms of performance. If you're using the primary key to find the row to update and it's the clustered key, I wouldn't worry about the cost of setting a column to its existing value. The problem with option 1 is "procedure cache bloat": you end up with many similar plans filling your plan cache because each statement differs slightly from one update to the next.

If you're planning on doing bulk updates, I might hesitate to recommend updating all columns, as it can trigger foreign-key checks and the like.
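To see why option 1 can bloat the plan cache: every distinct combination of updated columns produces a distinct statement text, and the server caches a separate plan per text. A small illustration (table and column names are made up):

```python
from itertools import combinations

# Count the distinct UPDATE statement texts that dynamic SQL can generate
# for a table with just 4 updatable columns.
cols = ["c1", "c2", "c3", "c4"]
texts = set()
for r in range(1, len(cols) + 1):
    for combo in combinations(cols, r):
        set_clause = ", ".join(f"{c} = ?" for c in combo)
        texts.add(f"UPDATE t SET {set_clause} WHERE id = ?")

print(len(texts))  # 15 distinct statements from just 4 columns
```

With 4 columns that is already 2^4 - 1 = 15 possible statements; with 100 columns the number of potential texts is astronomical, so each plan is rarely reused.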



Thanks Eric

+4



I'd vote for option 1 mixed with option 2, i.e. dynamically build a parameterized UPDATE statement that updates only the changed columns. This works well when the read/write ratio leans toward reads and you don't update too often, so you can safely trade plan-cache space for a cheaper (physical) update.
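A minimal sketch of this hybrid, assuming a helper (names are hypothetical) that builds the SET clause from only the changed columns while keeping every value parameterized; shown with sqlite3 for illustration, but the same string-building works against SQL Server:

```python
import sqlite3

def build_update(table, key_col, key_val, changed):
    """Build a parameterized UPDATE covering only the changed columns.

    `changed` maps column name -> new value. In real code the column names
    must come from a whitelist, since identifiers cannot be parameterized.
    """
    assert changed, "nothing to update"
    cols = sorted(changed)  # deterministic order improves plan reuse
    set_clause = ", ".join(f"{c} = ?" for c in cols)
    sql = f"UPDATE {table} SET {set_clause} WHERE {key_col} = ?"
    params = [changed[c] for c in cols] + [key_val]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL, qty INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 9.99, 5)")

sql, params = build_update("items", "id", 1, {"price": 12.99})
conn.execute(sql, params)
print(sql)  # UPDATE items SET price = ? WHERE id = ?
```

Because the values stay parameterized, rows that change the same set of columns share one statement text (and thus one cached plan), while rows that change nothing else cost nothing extra.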

0








