DbQuery behaves differently in a foreach loop. What for?

If I use the following code, I get a list of students who are taking course 1 and course 2. (This is almost what I want.)

IQueryable<Student> filteredStudents = context.Students;
filteredStudents = filteredStudents
    .Where(s => s.Courses.Select(c => c.CourseID).Contains(1)).Select(s => s);
filteredStudents = filteredStudents
    .Where(s => s.Courses.Select(c => c.CourseID).Contains(2)).Select(s => s);
List<Student> studentList = filteredStudents.ToList<Student>();  

      

However, if I try to do this in a foreach loop (as shown in the following code), then I get a list of all students who are registered for the last course in the loop.

IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {             
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID))
            .Select(s => s);
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

      

This behavior confuses me. Can anyone explain why this is happening? And how to get around this? Thank.

+3


source to share


4 answers


The problem is that the foreach loop only creates one variable course

for all iterations of the loop, and that single variable is then captured in the closure. Also remember that filters are not actually executed until the end of the loop. Put them together, and by the time the filters run this single variable course

, it has gone to the last item in the Course window; you only check that last course.

I see four ways to fix the problem.

The first

Create a new variable for each loop iteration (this is probably the best quick fix)

IQueryable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {  
        int CourseID = course.CourseID;            
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(CourseID));
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

      

Second

Allow IEnumerable expression inside a loop (possibly much less efficient):

IEnumerable<Student> filteredStudents = context.Students;
foreach (Course course in filter.Courses) {
    if (course != null) {             
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(course.CourseID))
            .ToList(); 
    }
}
List<Student> studentList = filteredStudents.ToList<Student>();

      

Third



Use more appropriate linq statements / lambda expressions to exclude the foreach loop:

var studentList = context.Students.Where(s => s.Courses.Select(c => c.CourseID).Intersect(filter.Courses.Select(c => c.CourseID)).Any()).ToList();

      

Or in a more readable way:

IQueryable<Student> filteredStudents = context.Students;
var courses = filter.Courses.Select(c => c.CourseID);
var studentList = filteredStudents
       .Where(s => s.Courses.Select(c => c.CourseID)
                       .Intersect(courses)
                       .Any()
       ).ToList();

      

If you're playing with this a bit, the performance should match or far exceed the foreach loop through clever internal use of HashSets or - if you're very lucky - by sending a JOIN to the DB). Just be careful, because it is easy to write something here that makes numerous "extra" calls to the DB inside this method Intersect()

or Any()

. However, this is the option I prefer, except that I probably wouldn't mention .ToList()

at the end.

It also illustrates why I have little use for ORMs like Entity Framework, linq-to-sql, or even NHibernate or ActiveRecord. If I am just writing SQL, I can know that I am getting the correct connection request. I could do it with the ORM too, but now I still need to know about the specific SQL I'm creating and I also need to know how to get the ORM to do what I want.

Fourth

Use C # 5.0. This has been fixed in the most recent version of C # so that each iteration of the for / foreach loop is its own variable.

+4


source


If you are trying to get everyone Student

that enrolled in every course in filter.Courses

, you can try:

var courseIDs = filter.Courses.Select(c => c.CourseID);
var filteredStudents = context.Students
    .Where(s => !courseIDs.Except(s.Courses.Select(c => c.CourseId)).Any())

      

which filters if the courseIDs

subset is course IDs Student

.



Edit

Joel Coehoorn and Mikael Eliasson provide a good explanation of why all final year students were reinstated.

+1


source


Since "filterStudents = filtStudents.Where ..." is a direct assignment to a variable, every time through the loop you completely replace what was in it before. you need to add it, not replace it. Try searching for "C # AddRange"

0


source


I don't think it has anything to do with Entity Framework. It is a bug (not really, but silly design in C #) where the variable is declared outside the loop.

In this case, it means that since the IEnumerable is lazy evaluated, it will use the LAST value of the variable. Use a temporary variable in a loop to solve it.

foreach (Course course in filter.Courses) {
    if (course != null) {
        var cId = course.CourseID;       
        filteredStudents = filteredStudents
            .Where(s => s.Courses.Select(c => c.CourseID).Contains(cId))
                .Select(s => s);
    }
}

      

Or even better if you have your navigation properties defined correctly. Just do:

var studentList = filter.Courses.SelectMany(c => c.Students).ToList()

      

See here for more details: Is there a reason for C # to reuse variable in foreach?

0


source







All Articles