This is an informal post about one of the less frequently visited parts of Sitecore ContentSearch - the context, and more specifically the context in relation to the underlying Lucene indexes that ship with Sitecore and the context that the different Sitecore client interfaces run in.
So why should you read this post? Well, if you do read it, you will realize that you should in fact be very aware of the context, and unfortunately, even when you manage to steer clear of the pitfalls, you will most likely still wind up in trouble.
It shouldn’t come as a surprise to anyone that the different client interfaces in Sitecore use different contexts, which are more or less related to the site definitions present in the solution. For instance, the content editor and the desktop are tied to the shell website definition and run in this context. However, when you run some of the other client interfaces, such as the preview, page editor or debug interfaces, the context is usually tied to the default website definition or to a solution-specific site definition.
The context that the client interface is running in is of course also coupled to an underlying Sitecore database which is referred to as the context database and this is where things get a little tricky.
It is common knowledge that when you run one of the Sitecore shell tools, such as the content editor or the desktop, the context database is in fact not the master database but the core database, which contains all the data configuration for the client interfaces. The master database is instead made available through the Sitecore.Context.ContentDatabase member in the Sitecore API. It is also well known that when you run one of the site-oriented client interfaces, such as the page editor, preview or the debug tool, the context database is the master database and the content database is null.
So it can be concluded that the context in Sitecore is fairly well managed and presented in a uniform manner through the API to make it easy to interact with the appropriate database in the given context.
This is unfortunately less true when it comes to the ContentSearch API, and you will often find yourself referencing specific Lucene indexes directly. That means you have to keep track of the context manually and decide for yourself which index is appropriate to access in each client interface.
This is of course all rather logical and obvious, but even so I regret to say that I still see an awful lot of code examples where the Lucene web index is hard-coded and no thought has been given to the fact that the context can change. Usually this leads to editors who are confused because they cannot see their work in preview: the content hasn’t been published yet, or isn’t scheduled to be published yet, so it doesn’t exist in the web database and is therefore also missing from the Lucene web index. In some cases I have even seen editors create content in the page editor, only for their data to “simply vanish” because the page retrieved its content through Lucene from the web index while the new content existed only in the master database.
Ok, so obviously we need to be a bit more conscious about our context. One way of doing this could be to create a simple context service that provides the appropriate Lucene index for the current context, as in the following example.
    using System;
    using Sitecore.ContentSearch;
    using Sitecore.Data;

    namespace TheGrumpyCoder
    {
        public static class SearchIndexService
        {
            // Resolves the index named after the current database, e.g. sitecore_master_index.
            public static ISearchIndex GetContextIndex()
            {
                String databaseName = ContentOrContextDatabase.Name.ToLowerInvariant();
                return ContentSearchManager.GetIndex(String.Format("sitecore_{0}_index", databaseName));
            }

            // Prefer the content database (set in the shell tools); fall back to the context database.
            private static Database ContentOrContextDatabase
            {
                get { return Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database; }
            }
        }
    }

Usage:

    using (IProviderSearchContext context = SearchIndexService.GetContextIndex().CreateSearchContext())
    {
        // ......
    }
So now everything is perfect. The editors are happy because you are now drawing content from the master database when appropriate, so they can see their unpublished data and run previews.
Well, almost perfect, because suddenly you are facing an even bigger problem. Now the editors are seeing too much content!
When they create new versions of existing items, they are suddenly presented with multiple versions of the same item in their Lucene-based data list renderings. Not only that, they are also presented with an almost sci-fi alternative reality when trying to do previews on future dates, because all of a sudden they have items from more or less every time span in existence - past, present and future!
Ok, so obviously some kind of version control is required. We need to be able to determine exactly which single version of an item is the correct one in the given context, which means taking both multiple versions and the different publishing restrictions, weighed against the chosen preview date, into account.
And this is where ContentSearch falls silent - dead silent. Well, at least I would have preferred it if it did.
However, it does present one final offer in the form of the _latestversion field that is stored in the Lucene index. When this field is set to 1, it indicates that this is the latest version of an item.
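As a minimal sketch, filtering on that field could look something like the following. It reuses the SearchIndexService from earlier in this post and assumes that the field is stored as the string "1" and that the plain SearchResultItem indexer is good enough for the lookup; the LatestVersionQuery class itself is only a hypothetical name for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

namespace TheGrumpyCoder
{
    // Hypothetical helper, only here to illustrate the _latestversion filter.
    public static class LatestVersionQuery
    {
        public static List<SearchResultItem> GetLatestVersions()
        {
            using (IProviderSearchContext context = SearchIndexService.GetContextIndex().CreateSearchContext())
            {
                // Assumption: the index stores "1" for the latest version of each item.
                return context.GetQueryable<SearchResultItem>()
                    .Where(hit => hit["_latestversion"] == "1")
                    .ToList();
            }
        }
    }
}
```

Note that the results are materialized with ToList() before the search context is disposed.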
Now, at first glance this is not half bad, but once you think about it for more than a second you realize that it is unfortunately no help at all. Well, it would be if you could teach the editors to always create consecutive, advancing item versions and never go back to a previous version. However, editors never do what you intend them to do, especially not if the solution is running some kind of publishing workflow, where it is very common to have newer versions that are not approved and therefore by all definitions are not the correct versions to present in a preview.
So unfortunately, but in all honesty, the _latestversion field quickly turns into a moot point - irrelevant and impractical.
I wish that I could offer some kind of golden solution to this puzzle, but I have yet to find the philosopher’s stone on this one. For the time being I have found myself resorting to post processing of the search results to root out all unwanted items: removing multiple item versions, checking publishing restrictions manually and so on.
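To give an idea of what such post processing might look like, here is a rough sketch under a few assumptions, not a finished implementation. SearchResultFilter and its displayDate parameter are hypothetical names, and the sketch assumes that GetItem() on a search hit resolves the exact item version that was indexed. It leans on the publishing API to decide which version is valid on the given date and only keeps a hit when it is that version.

```csharp
using System;
using System.Collections.Generic;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.Data.Items;

namespace TheGrumpyCoder
{
    // Hypothetical helper, not part of ContentSearch.
    public static class SearchResultFilter
    {
        public static IEnumerable<Item> ToValidVersions(IEnumerable<SearchResultItem> hits, DateTime displayDate)
        {
            foreach (SearchResultItem hit in hits)
            {
                // Assumption: this loads the indexed version of the item from the database.
                Item item = hit.GetItem();
                if (item == null)
                {
                    continue;
                }

                // Skip items that are not publishable on the given date (ancestors included).
                if (!item.Publishing.IsPublishable(displayDate, true))
                {
                    continue;
                }

                // The approved version that is valid on the given date, if there is one.
                Item valid = item.Publishing.GetValidVersion(displayDate, true);

                // Keep the hit only when it is that version, which also removes duplicate versions.
                if (valid != null && valid.Version.Number == item.Version.Number)
                {
                    yield return item;
                }
            }
        }
    }
}
```

The obvious price is that every hit is loaded from the database again, and any paging or result counts coming from the index no longer match what is finally displayed.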
I just hope that Sitecore realizes the impractical nature of the current implementation when applied to the real world and is able to present a more elegant solution in future releases.