DbQuery behaves differently in a foreach loop. What for?
If I use the following code, I get a list of students who are taking course 1 and course 2. (This is almost what I want.)
IQueryable<Student> filteredStudents = context.Students;
filteredStudents = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID).Contains(1)).Select(s => s);
filteredStudents = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID).Contains(2)).Select(s => s);
List<Student> studentList = filteredStudents.ToList<Student>();
However, if I try to do this in a foreach loop (as shown in the following code), then I get a list of all students who are registered for the last course in the loop.
IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
if (course != null) {
filteredStudents = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID))
.Select(s => s);
}
}
List<Student> studentList = filteredStudents.ToList<Student>();
This behavior confuses me. Can anyone explain why this is happening? And how to get around this? Thank.
source to share
The problem is that the foreach loop only creates one variable course
for all iterations of the loop, and that single variable is then captured in the closure. Also remember that filters are not actually executed until the end of the loop. Put them together, and by the time the filters run this single variable course
, it has gone to the last item in the Course window; you only check that last course.
I see four ways to fix the problem.
The first
Create a new variable for each loop iteration (this is probably the best quick fix)
IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
if (course != null) {
int CourseID = course.CourseID;
filteredStudents = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID).Contains(CourseID));
}
}
List<Student> studentList = filteredStudents.ToList<Student>();
Second
Allow IEnumerable expression inside a loop (possibly much less efficient):
IEnumerable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
if (course != null) {
filteredStudents = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID))
.ToList();
}
}
List<Student> studentList = filteredStudents.ToList<Student>();
Third
Use more appropriate linq statements / lambda expressions to exclude the foreach loop:
var studentList = context.Students.Where(s => s.Courses.Select(c => c.CourseID).Intersect(filter.Courses.Select(c => c.CourseID)).Any()).ToList();
Or in a more readable way:
IQueryable<Student> filteredStudents = context.Students;
var courses = filter.Courses.Select(c => c.CourseID);
var studentList = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID)
.Intersect(courses)
.Any()
).ToList();
If you're playing with this a bit, the performance should match or far exceed the foreach loop through clever internal use of HashSets or - if you're very lucky - by sending a JOIN to the DB). Just be careful, because it is easy to write something here that makes numerous "extra" calls to the DB inside this method Intersect()
or Any()
. However, this is the option I prefer, except that I probably wouldn't mention .ToList()
at the end.
It also illustrates why I have little use for ORMs like Entity Framework, linq-to-sql, or even NHibernate or ActiveRecord. If I am just writing SQL, I can know that I am getting the correct connection request. I could do it with the ORM too, but now I still need to know about the specific SQL I'm creating and I also need to know how to get the ORM to do what I want.
Fourth
Use C # 5.0. This has been fixed in the most recent version of C # so that each iteration of the for / foreach loop is its own variable.
source to share
If you are trying to get everyone Student
that enrolled in every course in filter.Courses
, you can try:
var courseIDs = filter.Courses.Select(c => c.CourseID);
var filteredStudents = context.Students
.Where(s => !courseIDs.Except(s.Courses.Select(c => c.CourseId)).Any())
which filters if the courseIDs
subset is course IDs Student
.
Edit
Joel Coehoorn and Mikael Eliasson provide a good explanation of why all final year students were reinstated.
source to share
I don't think it has anything to do with Entity Framework. It is a bug (not really, but silly design in C #) where the variable is declared outside the loop.
In this case, it means that since the IEnumerable is lazy evaluated, it will use the LAST value of the variable. Use a temporary variable in a loop to solve it.
foreach (Course course in filter.Courses) {
if (course != null) {
var cId = course.CourseID;
filteredStudents = filteredStudents
.Where(s => s.Courses.Select(c => c.CourseID).Contains(cId))
.Select(s => s);
}
}
Or even better if you have your navigation properties defined correctly. Just do:
var studentList = filter.Courses.SelectMany(c => c.Students).ToList()
See here for more details: Is there a reason for C # to reuse variable in foreach?
source to share