Database and EF problems?

I have a SQL question that people have given me different answers to over the years. Say I have a couple of tables, each with more than 40 columns, that will potentially contain tens of thousands of rows. I am using SQL Server 2005.

When joining these tables in a WHERE clause, say I have something like:

select * from t1, t2
where t1.UserID = 5 
and t1.SomeID = t2.SomeOtherID


some people say that you should always put the constant condition "t1.UserID = 5" before "t1.SomeID = t2.SomeOtherID" and not after it, because this improves query performance. Others say it doesn't matter.

What's the correct answer?

Also, if I use the ADO.NET Entity Framework to implement my DAL, would modeling tables with more than 40 columns and doing CRUD operations against them be a performance issue?

Thanks,

Ray.

0




4 answers


In general, with database optimization, you should write SQL that is conceptually correct first, and then tune for performance if profiling shows it is necessary. When doing an inner join, it is better to use the explicit SQL-92 INNER JOIN syntax than the implicit Cartesian-product style. So I would start by writing your SQL like this:

SELECT * 
FROM t1
  INNER JOIN t2
    ON t1.SomeID = t2.SomeOtherID
WHERE
  t1.UserID = 5 


t1.SomeID = t2.SomeOtherID goes in the ON clause of the INNER JOIN because it expresses the relationship between the two tables. UserID goes in the WHERE clause because it is a filter that limits the result set. Writing your SQL this way gives the database optimizer more information, since it expresses your intent about joining versus filtering.



Now, if you are not getting acceptable performance with this syntax against a real database, then feel free to experiment with moving things around. But, as I said, start with something fundamentally correct.

As for the second part of your question, the most obvious implication is that when you select a collection of objects, the Entity Framework must return all mapped properties for the objects it materializes. So if you have 40 columns, you will be pulling all of that data back over the wire whenever you materialize entities. However, you can write LINQ queries that return anonymous types containing only the columns you need. To do full CRUD, though, you will need to work with the entities themselves.
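As a rough sketch of the difference (the Users table, its columns, and the queries shown are hypothetical, not from the question), the SQL generated by the two query shapes looks something like this:

```sql
-- Hypothetical 40-column Users table.

-- Materializing full entities: EF must select every mapped column
-- so it can build complete, updatable objects.
SELECT u.UserID, u.Name, u.Email -- ... and the other 37+ columns
FROM Users AS u
WHERE u.UserID = 5;

-- Projecting to an anonymous type
-- (e.g. select new { u.UserID, u.Name } in the LINQ query):
-- only the requested columns cross the wire, but the result is
-- read-only and cannot be used for updates.
SELECT u.UserID, u.Name
FROM Users AS u
WHERE u.UserID = 5;
```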

+4




The answer to this has changed over time, because RDBMS query optimizers have changed over time, and different RDBMSs take different approaches. I can't speak for every system, but it's very unlikely to matter in any engine current as of 2008. YMMV if you are only interested in one specific system.



I can tell you that in any recent version of Oracle it doesn't matter.

+2




I know this answer is pretty commonplace, but I would suggest writing tests. Spin up a console app and try it yourself. Run each version of the query a couple of hundred times and see how long each one takes.
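If you want to time it inside SQL Server itself rather than from a console app, a minimal sketch (assuming the t1/t2 tables from the question exist) is:

```sql
-- Ask SQL Server to report CPU and elapsed time for each statement.
SET STATISTICS TIME ON;

-- Version 1: constant filter listed first
SELECT * FROM t1, t2
WHERE t1.UserID = 5
  AND t1.SomeID = t2.SomeOtherID;

-- Version 2: join condition listed first
SELECT * FROM t1, t2
WHERE t1.SomeID = t2.SomeOtherID
  AND t1.UserID = 5;

SET STATISTICS TIME OFF;
```

Run each a few hundred times and compare the "CPU time" and "elapsed time" lines in the Messages tab; if the optimizer treats both orderings the same, the numbers (and the execution plans) should be indistinguishable.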

There is a lot of superstition around SQL query performance and optimization. Some people believe one form is faster but never actually check the facts. On top of that, the way EF or LINQ to SQL generate queries and interact with the database can produce performance differences that are not obvious from the SQL alone.

If you are optimizing your code, you can also use a profiler such as Red Gate ANTS. It's not free, but it can help a great deal in finding the bottlenecks in your code, which makes the places worth optimizing much easier to spot. It's not always the database that is slowing your application down. And sometimes a query is fast, but you run it a million times when you could have cached the result.

+1




First, build your query using the explicit JOIN syntax rather than a Cartesian product. It probably won't affect performance on any modern optimizer, but it does make the join logic more readily apparent to programmers.


SELECT Player.Name, Game.Date
 FROM Player
  INNER JOIN Game ON Game.WinnerPlayerID = Player.PlayerID
 WHERE Game.WinnerFrags > Game.TotalFrags/2
 ORDER BY Player.Name


This gives us all the players, sorted by name, who scored more frags in a game than all the other players in that game combined, together with the dates of those games. Putting both conditions in the JOIN probably wouldn't have a performance impact either, since the optimizer will most likely do the filtering as part of the JOIN anyway. But this starts to matter with a LEFT JOIN. Say we want to count how many such lopsided wins each of the week's top ten players has. Since some of them may never have won that impressively, we need a LEFT JOIN.


SELECT Player.WeekRank, Player.Name, COUNT(Game.WinnerPlayerID) AS WhitewashCount
 FROM Player
  LEFT JOIN Game ON Game.WinnerPlayerID = Player.PlayerID
 WHERE Player.WeekRank <= 10
  AND Game.WinnerFrags > Game.TotalFrags/2
 GROUP BY Player.WeekRank, Player.Name
 ORDER BY Player.WeekRank


Well, not quite. The LEFT JOIN returns a row for every game the player has won, or the player's data padded with NULLs if the player has no games. Those rows are then filtered against the frag criteria, during or after the JOIN depending on what the optimizer decides, which eliminates every row that does not meet them, including the NULL-padded rows. So there are no rows at all for players who have never had such an impressive win. We have effectively recreated an INNER JOIN.... FAIL.


SELECT Player.WeekRank, Player.Name, COUNT(Game.WinnerPlayerID) AS WhitewashCount
 FROM Player
  LEFT JOIN Game ON Game.WinnerPlayerID = Player.PlayerID
   AND Game.WinnerFrags > Game.TotalFrags/2
 WHERE Player.WeekRank <= 10
 GROUP BY Player.WeekRank, Player.Name
 ORDER BY Player.WeekRank


Once we move the frag criteria into the JOIN, the query behaves correctly, returning rows for all players in the week's top ten whether or not they have ever managed a whitewash.

After all this, the short answer is:

For INNER JOINs, where you put the conditions probably makes no difference to performance. Queries are more readable if you separate the join conditions from the filter conditions. And putting a condition in the wrong place can seriously mess up the results of a LEFT JOIN.

0








