One of the main problems people have when coming across LINQ-to-Objects is that they expect it to work in the same way as LINQ-to-SQL. The fabled Jon Skeet has written a very in-depth blog series on how LINQ-to-Objects is implemented, and this post is intended to give you a quick overview and some handy pointers.
With LINQ-to-SQL, and other well-implemented IQueryables, your LINQ statements are translated into a single query, which can often take advantage of SQL’s aggregation abilities. LINQ-to-Objects, however, has no query optimiser: each operator only works on the enumerable produced by the previous one. This means it can be easy to write statements that do far more iterating than necessary. Often this isn’t an issue as your collections might be small, but the efficiency gains that come from understanding LINQ-to-Objects can also lead to more readable code.
So without further ado, let’s look at some examples!
Checking that a collection contains at least one item
So maybe you have written something like the following to test that a collection contains items?
_collection.Count() > 0;
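But on a plain IEnumerable<T> this will iterate through the entire collection to get a count and then compare that to 0. (Count() can only skip the walk when the source implements ICollection<T>, as List<T> and arrays do, because it then just reads the Count property.) If you have 100,000 items in your collection (maybe you’re caching something?!) then you’re going to have a long-running foreach loop in there, eh?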
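To make that cost concrete, here is a minimal sketch rather than anything definitive. The Items() method and the pulled counter are made up for illustration: Items() is a lazy sequence of 100,000 numbers that counts every item it hands out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0; // how many items have been requested from the source

// Hypothetical lazy source: yields 100,000 numbers and counts each one handed out.
IEnumerable<int> Items()
{
    for (int i = 0; i < 100_000; i++)
    {
        pulled++;
        yield return i;
    }
}

// Count() has to reach the end of the sequence before it can answer.
bool hasItems = Items().Count() > 0;
int pulledByCount = pulled;

// prints: Count() > 0 returned True after pulling 100000 items
Console.WriteLine($"Count() > 0 returned {hasItems} after pulling {pulledByCount} items");
```

All 100,000 items get pulled just to answer a yes/no question that the very first item could have settled.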
Why not do the following instead?
_collection.Any();
Ok, so it seems simple when you know about it, but sometimes you start learning these things and it’s just not explained to you. This method simply tries to start an iteration: if an item is available it returns true, and if it can’t iterate it returns false. So you instantly get a huge performance saving if you have more than 1 item in your collection.
Checking that a collection contains at least one specific item (i.e. using a where clause)
With SQL you might write something like this:
SELECT TOP 1 ISNULL('1', '0') FROM Table WHERE SomeField = @SomeValue
This just tests whether a table contains at least one row with our value. I’ve often seen the LINQ equivalent implemented like this:
_collection.Select(x=> x.SomeField).Where(x=> x == SomeValue).Count() > 0;
I bet you’re thinking “well that should at least finish with .Any() eh?!”, and you’d be right of course. Select and Where are lazy, so the chain walks the collection in a single pass, but Count() forces that pass to run all the way to the end: every item is projected by the Select and tested by the Where, even after a match has already turned up. Finishing with Any() instead lets the iteration stop at the first match. However there is further scope for trimming this down …
Many of the LINQ-to-Objects methods also have an overload that takes a filter (a predicate). We can use this with Any() to iterate only part of the way through the collection when a match exists.
_collection.Any(x=> x.SomeField == SomeValue);
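If you want to see the difference for yourself, the following is a rough sketch. The Items() method, the pulled counter and the someValue variable are all hypothetical stand-ins. The counter only tracks how many items are pulled from the source, and the 10th item is the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0; // how many items have been requested from the source

// Hypothetical data: 1,000 items whose SomeField runs from 0 to 999.
IEnumerable<(int Id, int SomeField)> Items()
{
    for (int i = 0; i < 1000; i++)
    {
        pulled++;
        yield return (Id: i, SomeField: i);
    }
}

int someValue = 9; // matches the 10th item

// Count() must see every item before it can report a total.
bool viaCount = Items().Select(x => x.SomeField).Where(x => x == someValue).Count() > 0;
int pulledViaCount = pulled;

// Any(predicate) stops as soon as one item satisfies the predicate.
pulled = 0;
bool viaAny = Items().Any(x => x.SomeField == someValue);
int pulledViaAny = pulled;

Console.WriteLine($"Select/Where/Count: {viaCount}, pulled {pulledViaCount}"); // pulled 1000
Console.WriteLine($"Any(predicate):     {viaAny}, pulled {pulledViaAny}");     // pulled 10
```

Both give the same answer, but the second version stops after 10 items instead of 1,000. If nothing matches, Any() still has to look at everything, so the saving depends on where the first match sits.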
Getting the first item out of a list using a predicate
So, just like the previous example, you might want to get the first matching item out of the list. Maybe you copied your previous query and just put the following in:
_collection.Select(x=> x.SomeField).Where(x=> x == SomeValue).FirstOrDefault();
Thanks to lazy evaluation this does stop at the first match, but it still wraps the collection in two intermediate iterators, and because of the Select it hands back only SomeField rather than the item itself. To our rescue is the fact that we can pass a predicate to FirstOrDefault() just as we could with Any(). So if we re-write it like so:
_collection.FirstOrDefault(x=> x.SomeField == SomeValue);
Then we skip the intermediate iterators, get the whole matching item back, and reduce the amount of code we have to read and write.
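Here is a small sketch of the two versions side by side. The collection contents and the someValue variable are invented for the example, and I have used tuples as a stand-in for whatever your real item type is.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection: tuples stand in for your real item type.
var collection = new List<(int Id, string SomeField)>
{
    (1, "red"),
    (2, "green"),
    (3, "green"),
};
string someValue = "green";

// Projecting first means the result is only the field value.
var fieldOnly = collection.Select(x => x.SomeField).Where(x => x == someValue).FirstOrDefault();

// The predicate overload returns the whole matching item.
var item = collection.FirstOrDefault(x => x.SomeField == someValue);

// With no match you get the default value of the item type:
// for this tuple that is (0, null); for a class it would be null.
var missing = collection.FirstOrDefault(x => x.SomeField == "blue");

Console.WriteLine(fieldOnly);    // green
Console.WriteLine(item.Id);      // 2
Console.WriteLine(missing.Id);   // 0
```

Remember to check for that default value before using the result, which is the whole point of choosing FirstOrDefault() over First().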
I sincerely apologise if this post might be obvious to you, and if it wasn’t and it came across as patronising then I am sorry for that too. However hopefully it has highlighted that poking around a little with the method signatures for LINQ-to-Objects will give you some great performance gains as well as increase the readability of your code.
As a final note, you might want to stay away from calling ToList() or ToArray() unless you absolutely need your collection as one of those types. Almost all of the LINQ operators that return a sequence are lazy (think yield return), which delays execution of the query until you start iterating through the result. If you do call ToList() or ToArray(), it will iterate through the entire collection straight away and copy the results into a new in-memory collection, which throws away the benefit of that deferred execution.
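A quick sketch of that last point, again with a made-up Numbers() source and pulled counter so we can see when the iteration actually happens:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0; // how many items have been requested from the source

// Hypothetical lazy source: the numbers 1 to 1,000.
IEnumerable<int> Numbers()
{
    for (int i = 1; i <= 1000; i++)
    {
        pulled++;
        yield return i;
    }
}

// Building the query runs nothing yet.
var query = Numbers().Where(n => n % 2 == 0);
int afterBuildingQuery = pulled;   // 0

// First() pulls 1 (odd, skipped) and 2 (even, returned), then stops.
int firstEven = query.First();
int afterFirst = pulled;           // 2

// ToList() walks the whole source up front, even if we only want one item.
pulled = 0;
List<int> materialised = Numbers().Where(n => n % 2 == 0).ToList();
int afterToList = pulled;          // 1000

Console.WriteLine($"{afterBuildingQuery}, {afterFirst}, {afterToList}"); // 0, 2, 1000
```

ToList() still earns its keep when you need to iterate the same results several times, because a lazy query is re-run from scratch on every enumeration.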