Recommended indexes for a query in a large table containing "date range" and "order id"

Question

I have a query (generated by LINQ to SQL) that gets a list of "site visits" made within a specific date range that resulted in an order (OrderId is not null).

There is nothing wrong with the query as such. I just need some advice on creating the right index for it. I was trying out various combinations on the production site and managed to break things badly enough that a foreign key got dropped. I fixed that after some panic, but thought I would ask for advice before recreating the index.

The table is approaching a million rows and I need indexes to help me here. This query is only used for reporting, so it doesn't have to be extremely fast; it just must not delay other users' queries (which it currently does).

SELECT TOP 1000
  t0.SiteVisitId, t0.OrderId, t0.Date, 
  t1.Domain, t0.Referer, t0.CampaignId
FROM 
  SiteVisit AS t0
  LEFT OUTER JOIN KnownReferer AS t1 ON t1.KnownRefererId = t0.KnownRefererId
WHERE
  t0.Date <= @p0 
  AND t0.Date >= @p1
  AND t0.OrderId IS NOT NULL
ORDER BY 
  t0.Date DESC

@p0='2008-11-1 23:59:59:000', @p1='2008-10-1 00:00:00:000'

I currently have a clustered index on SiteVisitId, which is my integer id column.

I don't know which of the following is likely to be the most efficient:

Create an index on Date
Create an index on Date AND a separate index on OrderId
Create a "multicolumn" index on Date AND OrderId
Some other combination?

I am also wondering whether I should create a separate bit column hasOrder instead of checking whether OrderId IS NOT NULL, in case that would be more efficient.

FYI: KnownReferer is just a table containing a list of 100 known HttpReferers, so I can easily see how many hits are from Google, Yahoo, etc.

sql sql-server indexing

Simon_Weaver 23 nov. '08 at 7:38

5 answers

Brannon · Answer 1 · 2008-11-23T08:34:48+0000

How many rows do you expect to fall within a typical date range? Do you usually look at a month at a time?

I would start with an index on the [Date] column. If a typical query touches only a few rows in total, then you do not need to add the [OrderId] column to your index.

On the other hand, if you have a large number of rows in a typical month, you could add the [OrderId] column to the index. Because it is effectively treated as a Boolean value here, it may not buy you much; it depends on how many rows are NULL versus NOT NULL. If you have many rows in a given month but only a few of them have a valid [OrderId], then the index is likely to improve performance.
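The options discussed above could be sketched in T-SQL roughly as follows. The index names are hypothetical, and the column types are not shown in the question, so treat this as a starting point rather than a finished design. The third variant is an addition that assumes SQL Server 2008 or later, where filtered indexes are available; it targets the case where only a small share of rows has an OrderId.

```sql
-- Option 1: index the range column only.
-- (Index names here are hypothetical; pick whatever fits your conventions.)
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date
    ON SiteVisit ([Date]);

-- Option 2: add OrderId as a second key column, so the
-- IS NOT NULL test can be evaluated from the index itself.
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date_OrderId
    ON SiteVisit ([Date], OrderId);

-- Option 3 (assumes SQL Server 2008+): a filtered index that stores
-- only the rows that resulted in an order.
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date_WithOrder
    ON SiteVisit ([Date])
    WHERE OrderId IS NOT NULL;
```

You would normally create only one of these, then compare plans before deciding whether it earns its maintenance cost.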

Read the accepted answer in this related question and determine if the extra column is worth indexing:

Should bit be indexed in SQL Server?

And of course, test your indexes and compare the query plans generated with and without the index.
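One way to make that comparison, as a rough sketch: run the report query in a test session with I/O and timing statistics switched on, once before and once after creating the candidate index, and compare the logical reads. The parameter values are the ones from the question; declaring them as local variables is a simplification, and the optimizer may estimate row counts differently than it does for the parameterized query that LINQ to SQL sends.

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-ins for the parameters LINQ to SQL would supply.
DECLARE @p0 datetime, @p1 datetime;
SET @p0 = '2008-11-01T23:59:59';
SET @p1 = '2008-10-01T00:00:00';

SELECT TOP 1000
  t0.SiteVisitId, t0.OrderId, t0.[Date],
  t1.Domain, t0.Referer, t0.CampaignId
FROM
  SiteVisit AS t0
  LEFT OUTER JOIN KnownReferer AS t1 ON t1.KnownRefererId = t0.KnownRefererId
WHERE
  t0.[Date] <= @p0
  AND t0.[Date] >= @p1
  AND t0.OrderId IS NOT NULL
ORDER BY
  t0.[Date] DESC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear in the Messages output; the actual execution plan can be viewed alongside them in Management Studio.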

Update: Some of the other answers suggest a more aggressive index, which should improve the performance of this query but may negatively affect other operations on the table. For example, the proposed covering index would let SQL Server answer this query while barely touching the actual table, but it may cause problems for other queries that write to the table (since SQL Server would then need to update both the table and the covering index).

Since this is a reporting query, I would optimize it only lightly. If it takes a long time to run and causes other, more critical queries to run slowly or time out, I would optimize it just enough to reduce its impact on those other queries.

However, if you expect this table to keep growing, I would consider a separate reporting schema that is populated periodically from this table.

Mitch Wheat · Answer 2 · 2008-11-23T09:47:06+0000

I would create an index on Date and OrderId columns and INCLUDE SiteVisitId, Referer, CampaignId (assuming you are using SQL Server 2005 onwards). Also create an index on the KnownRefererId foreign key column.
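A minimal sketch of those two indexes follows. The index names are hypothetical. KnownRefererId in the INCLUDE list is an addition not named in the answer; it is there on the assumption that the join to KnownReferer should also be satisfied from the index. The column types are not given in the question, and some legacy types (text, ntext) cannot be included columns, so check the Referer column before relying on this.

```sql
-- Key columns serve the range filter and the IS NOT NULL test;
-- INCLUDE columns are stored at the leaf level only, so the query
-- can be answered without lookups into the clustered index.
CREATE NONCLUSTERED INDEX IX_SiteVisit_Date_OrderId_Covering
    ON SiteVisit ([Date], OrderId)
    INCLUDE (SiteVisitId, Referer, CampaignId, KnownRefererId);

-- Index on the foreign key column, as suggested above.
CREATE NONCLUSTERED INDEX IX_SiteVisit_KnownRefererId
    ON SiteVisit (KnownRefererId);
```

Since SiteVisitId is the clustered index key, SQL Server already carries it in every nonclustered index; listing it explicitly is harmless and documents the intent.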

Given that this is a reporting query that can tolerate the odd inconsistent row, I would suggest using the NOLOCK (or READ UNCOMMITTED) hint:

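// Requires a reference to System.Transactions and: using System.Transactions;
using (var trans = new TransactionScope(TransactionScopeOption.Required,
                      new TransactionOptions
                      {
                          IsolationLevel = IsolationLevel.ReadUncommitted
                      }))
{
    // Put your LINQ to SQL query here; it runs at READ UNCOMMITTED.

    // Mark the scope as completed so the transaction is not rolled back.
    trans.Complete();
}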

A word of caution: only use NOLOCK hints where you have a very good reason. I've seen developers go overboard with blanket NOLOCK hints before!
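If the report is ever run as plain T-SQL rather than through LINQ to SQL, the same isolation level can be requested directly. This is a sketch under that assumption, using a shortened form of the report query and the parameter values from the question.

```sql
-- Session-level setting: every statement that follows reads uncommitted data.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

DECLARE @p0 datetime, @p1 datetime;
SET @p0 = '2008-11-01T23:59:59';
SET @p1 = '2008-10-01T00:00:00';

SELECT TOP 1000 t0.SiteVisitId, t0.OrderId, t0.[Date]
FROM SiteVisit AS t0
WHERE t0.[Date] <= @p0
  AND t0.[Date] >= @p1
  AND t0.OrderId IS NOT NULL
ORDER BY t0.[Date] DESC;

-- Restore the default level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Per-table alternative: leave the session level alone and write
--   FROM SiteVisit AS t0 WITH (NOLOCK)
-- in the FROM clause instead.
```

Either form means the report can see rows from transactions that are later rolled back, which is the trade-off the caution above refers to.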

Ben R · Answer 3 · 2008-11-23T09:17:27+0000

It's also worth considering whether you need to keep rows in SiteVisit that have no matching KnownRefererId in your KnownReferer table and that have a NULL OrderId. If you don't need them, arrange for them to be deleted from the table and change your clustered index to cover both SiteVisitId and Date, and the query should be pretty fast.

But I'm sure you're keeping those extra rows for a reason.

WW. · Answer 4 · 2008-11-23T09:38:07+0000

If you really want to optimize the bejesus out of this query, and you can accept slightly slower inserts into the table, you should create an index on:

(Date, OrderId, SiteVisitId, Domain, Referer, CampaignId)

This will allow the database to return the entire response from the index without sorting or accessing a separate table.
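A sketch of such an index, with two caveats that are assumptions on my part rather than part of the answer. First, in the query Domain comes from KnownReferer, so it cannot be a column of an index on SiteVisit; the sketch uses KnownRefererId in its place, which still leaves a (small) join to KnownReferer. Second, SQL Server limits the total size of index key columns, so a wide Referer column may not fit as a key and might have to move to an INCLUDE list. The index name is hypothetical.

```sql
-- All columns as key columns, Date descending to match ORDER BY t0.Date DESC.
CREATE NONCLUSTERED INDEX IX_SiteVisit_Report
    ON SiteVisit ([Date] DESC, OrderId, SiteVisitId,
                  KnownRefererId, Referer, CampaignId);
```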

Mark Brackett · Answer 5 · 2008-11-23T15:19:48+0000

SELECT TOP 1000
  t0.SiteVisitId, t0.OrderId, t0.Date, 
  t1.Domain, t0.Referer, t0.CampaignId
FROM 
  SiteVisit AS t0
LEFT OUTER JOIN KnownReferer AS t1 ON t1.KnownRefererId = t0.KnownRefererId
WHERE
  t0.Date <= @p0 
  AND t0.Date >= @p1
  AND t0.OrderId IS NOT NULL
ORDER BY 
  t0.Date DESC

@p0='2008-11-1 23:59:59:000', @p1='2008-10-1 00:00:00:000'

I'm going to make some guesses about the table statistics here, and the resulting design might slow down other queries, but that's generally the trade-off. I usually find that when moving a clustered index, it is best to create a replacement index on the old key so that not too many other queries are hurt.

Assuming there are many rows in the month's date range, and relatively few of them have OrderId IS NULL, your best bet would be to have a clustered index on Date. This should give you a clustered index scan, with the results nicely ordered for your TOP 1000.
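Moving the clustered index could look roughly like this. The sketch assumes the current clustered index is the primary key constraint on SiteVisitId and that it is named PK_SiteVisit; both are guesses, since the question does not show the table definition. Any foreign keys that reference the primary key have to be dropped first and recreated afterwards, and rebuilding a table of this size takes time, so this is something to rehearse on a copy rather than on production.

```sql
-- 1. Drop the existing clustered primary key (constraint name is hypothetical).
ALTER TABLE SiteVisit DROP CONSTRAINT PK_SiteVisit;

-- 2. Cluster the table on Date so date-range rows are stored together.
CREATE CLUSTERED INDEX CIX_SiteVisit_Date
    ON SiteVisit ([Date]);

-- 3. Re-create the primary key as a nonclustered index. This is the
--    replacement index that keeps lookups by SiteVisitId fast.
ALTER TABLE SiteVisit
    ADD CONSTRAINT PK_SiteVisit PRIMARY KEY NONCLUSTERED (SiteVisitId);
```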

You may also want KnownReferer.KnownRefererId to be either the clustered index or part of a composite index on KnownRefererId + Domain, to avoid a lookup into that table. I assume you have only a small number of KnownReferers, though, so I would not expect much benefit from this.
