Make SharePoint fly, Part 1

How to speed up Sharepoint 2007 and SharePoint 2010 list data access

 

In this article I want to share my experiences working with real-life lists in SharePoint with regards to data access performance. To access list data in SharePoint can be rather slow but it can actually be pretty fast. In order to code efficient SharePoint data access code you must bear a couple of rules in mind:

  1. Keep the list size below 10.000 items
  2. Use the fastest data access object model methods
    1. PortalSiteMapProvider.GetCachedListItemsByQuery(PortalWebSiteMapNode, string, SPQuery, SPWeb)
    2. SPList.GetItems(SPQuery)
  3. Always use CAML queries to get data. The only exceptions are:
    1. SPList.GetItemById can be used in SharePoint 2007, but never use SPListItemCollection.GetItemById
    2. SPList.GetItemByIdSelectedFields should be preferred over List.GetItemById in SharePoint 2010
    3. LinqToSharePoint with SharePoint 2010 can be used but you must always review the generated CAML
  4. Index the site columns you query on
  5. Verify in production

In the following I will discuss the above rules more detailed and give a couple of references. Everything written here is based on my personal experience and I cannot guarantee that the information provided is flawless. So please share your comments and experiences here and help me to improve the article. On contrary these rules of thumb have helped some seriously troubled SharePoint applications. In one case the use of SPList.GetItems(SPQuery), which is good, still lead to frequent server break-downs. The database server was getting to much load. Replacing it with PortalSiteMapProvider.GetCachedListItemsByQuery(PortalWebSiteMapNode, string, SPQuery, SPWeb) in the hot spots of the calling code took a lot of pressure from the database server. In another example we could improve the average page rendering from 8s to 2s by using the PortalSiteMapProvider.GetCachedListItemsByQuery(PortalWebSiteMapNode, string, SPQuery, SPWeb) and replacing SPListItemCollection.GetItemById with SPList.GetItemById.

But how can you determine the weak spots in your application? Like in the first example with the troubled database server you should take a look at a SQL Server trace. Calls to a stored procedure called spexecutequery (I am not 100% sure on the name) indicate the execution of CAML queries from the SharePoint application. CAML is translated into a so-called ad-hoc SQL query. To determine what lists and queries are generating the load on the database server you can only tell if you have the actual SQL included in the trace. Normally this is not an option in a production system. Get a copy of the production content database on your test environment, create a load test and enable the tracing on the database server in the test environment. Another option to find weak spots is to populate lists with production or production like data and run a ASP.NET profiler such as Red Gates ANTS Profiler on certain pages on your development machine.

A note on load testing: When testing on a contained environment you might notice that regular list data access via SPList.GetItems at some point becomes very fast. This will happen if you query the same data multiple time so that SQL Server will cache the query results. Use the following T-SQL statement to clean all SQL Server buffers:

DBCC DROPCLEANBUFFERS;

A simple and free tool for load testing is Load UI from eviware. It allows you to request pages at variable rates. It has a drag&drop user interface in the same way as good old Lab View. If you need more complex load patterns, do custom stuff, click on things in the browser a good starting point is Visual Studio for Testers or Ultimate edition with its web test/load test capabilities.

1. List Size

The list size is the key issue with list data performance. If you have a list with ten items you get great performance no matter how you access the data in the list. On the contrary a list with ten million items cannot be used at all in terms of normally expected application performance no matter how you access the data. What you really need to understand is how the performance scales in between these two extremes. That depends a lot on how you access the data. The best starting point is to ready the Microsoft white paper on big lists.

When designing a SharePoint application you should, as a rule of thumb, only create lists with maximum 10.000 items. When you need more items either create folders or multiple lists. Folders and lists perform in the same way. Using multiple folders within a list complicates querying a little bit, because you have to set the folder on the SPQuery object. You should avoid to use recursive CAML queries because it will be slower and you might match items from another folder that you are not interested in.

2. Data Access Methods

The lesson learned from the big list white paper is that you should either use PortalSiteMapProvider.GetCachedListItemsByQuery(PortalWebSiteMapNode, string, SPQuery, SPWeb) or SPList.GetItems(SPQuery). Nothing else. In some rare case it might be acceptable to use List.Items but only if you are 100 % sure that the list will have few items. Working with SharePoint 2010 you can also use Linq to SharePoint with absolute care (see the following paragraph for details).

The PortalSiteMapProvider is a great way to access list data via the SharePoint object cache. So when using this data access method make sure to configure the object cache to something useful. As with every cache this method incurs an overhead for adding items to the cache and checking the caches validity. The most technical information I could find on this method is from the SharePoint team blog. In a situation with many reads compared to writes you will get a great performance boost from this method. Otherwise it will slow down your application. One limitation with this data access method is that you cannot access list item attachments or document library documents through it. It provides only all the site columns of the list or document library. The PortalSiteMapProvider uses security trimming and updates changed items instantly in the cache.

In situation where you cannot use the PortalSiteMapProvider you should opt for SPList.GetItems(SPQuery) or the Linq to SharePoint solution.

3. CAML Only

CAML queries are the only efficient way to query list data in SharePoint. The alternative is to retrieve all list items from the content database and than filter these in code. It is obvious that the alternative is slow.

So, PortalSiteMapProvider.GetCachedListItemsByQuery, SPList.GetItems and Linq to SharePoint use CAML queries to query the content database. That is why you must be extremely careful when using Linq to SharePoint. It hides the CAML generation from you by abstraction. Underneath it creates a CAML query in the Linq to SharePoint provider which you must inspect because not all Linq capabilities are supported by CAML. For example when you use IEnumerable<T>.Contains in a Linq to SharePoint query it will not generate CAML or clauses but retrieve all list items and than filter these with Linq to Objects in memory afterwards (two-stage query). And that is exactly what you do not want to happen. Check-out this MSDN article for more details about two-stage queries. To see the auto-generated CAML attach a TextWriter (e.g. StringWriter) to DataContext.Log, see here.

An interesting issue exists with the GetItemById method of the SharePoint object model. There is a big performance difference between using SPList.GetItemById and SPListItemCollection.GetItemById. I have no idea why but the Microsoft best practices do not recommend to use SPListItemCollection.GetItemById and my own experience backs this up. See this article for a reference. So it is safe to use SPList.GetItemById when know the ID of the list item and want to retrieve it. In SharePoint 2010 you should consider and prefer to use the SPList.GetItemByIdSelectedFields method as it only loads data for the specified fields and hence performs better.

4. Indexing

SharePoint site columns can be indexed. When you have a list with a couple of thousands list items a you always query on a site column called Name you should index that site column. See this article how to create indexes in SharePoint. It is the same principle as creating indexes in databases. Some sort of hash map is maintained for your index in memory so that items can be matched much faster than by a retrieving and filtering approach.

5. Verify

One thing is to code for performance the other thing is verify that everything behaves as expected. I would suggest two things to make sure that your optimizations are effective:

  1. Check the object cache.
    Because the PortalSiteMapProvider.GetCachedListItemsByQuery uses the object cache you want to make sure that it works efficiently. This is not a trivial task. You can find more information here. The bottom line is that you should monitor the ‘SharePoint Publishing Cache’ performance counter on the production server. The number of ‘Total number of cache compactions’ should be 0 to 1 within an hour.
  2. Use SPMonitoredScope (SharePoint 2010 only) for data access methods.
    Wrapping the data access method with a using(new SPMonitoredScope(“Scope name”)) { … } adds an entry to the developer dashboard. This is rather targeted at your development and environment than production. But it gives you the chance to evaluate the data access performance all the time easily in the browser.

 

Well, I hope this article helps you to writer faster data access code and makes your SharePoint applications fly. Please drop a comment with your experience, advice or challenges.

 

The next part will be about caching in SharePoint applications. I always disliked the way I used to do control output caching so far. I am working on a more general approach that uses the ASP.NET (partial) page output cache and the ASP.NET cache with a CacheDependency or HttpCachePolicy.AddValidationCallback.

3 thoughts on “Make SharePoint fly, Part 1

  1. Thanks for this, very useful. I cannot seem to find part 2, was that published somewhere else or never made it to press?

    1. Thanks for your comment. I am sorry but I never came around to write it. It is still on the TODO list :-)…

Leave a Reply

Your email address will not be published. Required fields are marked *