SQL query on a table with 30mm records

I am having trouble creating a table on my local SQL Server. This essentially caused the tempdb table to fill up and throw an exception. This has many joins and external uses, and so to find specifically where the problem is, I made a selection in the first table in the sql query to determine how long it took, it was fast, so I then added the following table which was first join in request and retry, I kept doing this until I found a table that was stuck.

I found a problem (or at least the first problem) with the shipper_container table. This table is huge and by itself it gets a System.OutOfMemoryException just showing a selection based on the results of this table only (it has only 5 columns). It strips out 16 million records but has 30 million lines. It is 1.2 GB in size. It doesn't seem so big to me that SQL Management studio couldn't handle it.

Using the WHERE clause to collect values ​​from Jan 1 to Jan 10, 2015 still resulted in a lookup that took more than 5 minutes and was still running when canceled. I also added indexes for each of the select options and this did not increase performance either.

Here is the SQL query. You can see that I have commented out other options that have not yet been added to other connections and external.

DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @Shipper_Key INT = NULL
DECLARE @Part_Key INT = NULL

SET @startDate = '2015-01-01'
SET @endDate = '2015-01-10'

SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

INSERT Shipped_Container
(
    Ship_Date,  
    Invoice_Quantity,
    Shipper_No,
    Serial_No,
    Truck_Key,
    Shipper_Key
)
SELECT
    S.Ship_Date,
    SC.Quantity,
    S.Shipper_No,
    SC.Serial_No, 
    S.Truck_Key,
    S.Shipper_Key
    FROM Shipper AS S
JOIN Shipper_Line AS SL
    --ON SL.PCN = S.PCN
    ON SL.Shipper_Key = S.Shipper_Key
JOIN Shipper_Container AS SC
    --ON SC.PCN = SL.PCN
    ON SC.Shipper_Line_Key = SL.Shipper_Line_Key
WHERE S.Ship_Date >= @startDate AND S.Ship_Date <= @endDate
    AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
    AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)

      

The server instance is running on the local network - could this be a problem? I do have minimal experience with this and would be very grateful for the help and as detailed and clear as possible. Often in the SQL forums people jump straight into the technical details, I don't follow that well.

+3


source to share


1 answer


Don't do a Select ... From yourtable

in SS Management Studio when it returns a hundrend of thousands or millions of lines. 1GB of data becomes a lot more when the system needs to draw and display it on the screen in the Management Studio data sheet.

Server instance runs on local network

When you execute Select ... From yourtable

in SSMS, the server should send all data to your laptop / desktop. This is quite a lot of raw network pressure. This shouldn't be a problem when pasting, because everything stays on the server. However, being on the server does not mean that it will be fast if your data model is not good enough.

SET INSULATION LEVEL SET TRANSACTION DO NOT MISS,

You might get dirty data, you use this ... It might be better to delete it if you don't know why it exists and why you need it.

I also added indexes for each of the select parameters and this did not increase performance either

If you mean indexes on:

  • S.Ship_Date,
  • SC.Quantity,
  • S.Shipper_No,
  • SC.Serial_No,
  • S.Truck_Key,
  • S.Shipper_Key

What are their definitions? If they are separate indexes on 1 column, you can drop indexes on SC.Quantity, S.Shipper_No, SC.Serial_No and S.Truck_Key. They are not used. Ship_Date and Shipper_key might be helpful. It all depends on your model and existing primary keys. (what you need to describe, see below)



This would help give a more accurate answer if you could tell us:

  • the relationship between your 3 tables (which field is the link A to B and in which direction)
  • primary key on three tables
  • a complete list of all your indexes (and columns) on three tables

If none of your indexes are in use, or if they are missing, it will most likely read all 3 tables and try to match them. Since it is quite large, it does not have enough memory to process it and it uses tempdb to store intermediate data.

Now I will assume that shipper_key + PCN is the primary key for each table. I think you can try this:

  • You can create an index on S.Ship_Date

    Create Index Shipper_Line_Ship_Date(Ship_Date) -- subject to updates according to your Primary Key
    
          

  • The query optimizer might not use indexes (if they exist) with a where clause like this:

    AND S.Shipper_Key = ISNULL(@Shipper_Key, S.Shipper_Key)
    AND SL.Part_Key = ISNULL(@Part_Key, SL.Part_Key)
    
          

    you can use:

    AND (S.Shipper_Key = @Shipper_Key or @Shipper_Key is null)
    AND (SL.Part_Key = @Part_Key or @Part_Keyis null)
    
          

  • It would help to have indexes on Shipper_Key and PCN

Finally

As I said above, we need to learn more about your data model (create table ...), primary keys and indexes (create indexes). You can create a mode here http://sqlfiddle.com/ with all 3 create tables and their indices. Then follow the link and add the link here.

In SSMS, you can right click on the table and go to Script Table as / Create To / New Query Window

and add it here or at http://sqlfiddle.com/ . Hold the piece CREATE TABLE ...

to the first GO

. Then you can do the same in all of your indices.

You must also add a copy of the query plan. In SSMS go to Query menu / Display Estimated Execution Plan

and right click to save it as xml (xml is better). This is only an estimate and will not fulfill the entire request. This should be pretty fast.

+2


source







All Articles