Sunday, May 08, 2016

Effective use of Sitecore LinkDatabase

Keep track of your Reference Field references

I’ve often mentioned the Sitecore LinkDatabase, which I’ve found to be a hugely useful resource over the years but it’s getting very little exposure. And given I need to test out the Open Source version of my favourite blogging tool, I figured I might as well hit 2 birds with 1 stone and write a post about it.

What is Sitecore LinkDatabase?

Sitecore LinkDatabase (to save my sanity, hereafter referred to as SLDB) is a simple table that keeps track of Reference Field references. And what are Reference Fields?   Basically anything that stores one or more IDs as the raw field value. So Droplink, Treelist, Image, Internal Link and so on.

SNAGHTML13175c1f

And raw values looking like this:

image

And this is where the SLDB comes into play. What it does, is keep track of those references. It basically stores something to the effect of; “Item {id} on Field {id} keeps a reference to Target Items {id}{id}{id}”.  The reasons this functionality exists, is to support the Sitecore Content Editor so that when you are about to delete an item that might be used elsewhere, it pops a warning.

image

In this case I am not given other choice (as I’m trying to delete a Template in use), but in other cases Sitecore will offer you to remove the references to the Item you’re about to delete, or relink the references to someplace else. As a side note; this, too, is immensely useful functionality if you’re refactoring your content – but that’s a story for a different day.

The nuts and bolts

Currently, the SLDB looks like this:

image

It sits in your “core” database. Be mindful of this if you’re going to be putting it to use – “core” may or may not be available in your Content Delivery environment depending on your setup. Easily fixed, however. Configuration for it sits here:

  <!-- LINK DATABASE -->
  <LinkDatabase type="Sitecore.Data.$(database).$(database)LinkDatabase, Sitecore.Kernel">
    <param connectionStringName="core" />
  </LinkDatabase>

Migrate the “Links” table from “core” to “web” or wherever you please, and you’re good to go.

The day to day is handled by this event handler, you’ll find it on most of the item events.

<handler type="Sitecore.Links.ItemEventHandler, Sitecore.Kernel" method="OnItemCopied" />

And there’s also a setting. Leave it at it’s default if you’re going to be using the SLDB at runtime (if you toggle this “off”, SLDB won’t be kept up to date for “web”).

<setting name="LinkDatabase.UpdateDuringPublish" value="true" />

Caveat: If you’re using the excellent Unicorn; be sure to allow it to update SLDB during sync operations. It is currently off by default. You need version 3.1.5 or later.

The nitty-gritty

Right, so enough. WHY should you care?  Well let me show you.

So I’ve set up a vanilla solution, and defined me some templates. It’s not particularly advanced, but it represents a pretty common IA challenge in our solutions today.

SNAGHTML134587f2

So we have a “News” template. It can exist in several categories, and has 1 Author.

IA challenge #1: We can’t organise “News” in category “folders”, as it is not a 1:1 relationship
IA challenge #2: We need news organised in a deep “folder” structure for SEO and performance

Using a bit of c# code so hacky I would never share it, I end up with 1000 “News” articles, with a random number of categories and so on.

image

I’ve kept the organisation simple, as this will be enough for what I’m trying to demonstrate.

image

So basically a “month” folder. In a real scenario, likely you would go 1 level deeper and include “year”. It won’t matter for this example however.

Now; our common problems with this are:

Problem #1: How do I list all “News” in a specific Category?
Problem #2: How do I list all “News” by a specific Author?

I’m going to pretend we live in a world where indexing this data isn’t an option, and we’re left with just the basic tools available to us.

Solution #1: Use Sitecore Query
Solution #2: Use Sitecore Fast Query
Solution #2: Use SLDB

Let’s see what that looks like. I cook up some rudimentary (and ugly) code:

var db = Factory.GetDatabase("web");
List<Item> authors = db.GetItem("/sitecore/content/Solution Data/Authors").GetChildren().ToArray().ToList();
List<Item> categories = db.GetItem("/sitecore/content/Solution Data/Categories").GetChildren().ToArray().ToList();
ID newsTemplateId = new ID("{5CB0FC6D-DCAA-4A51-8745-D9933F77679A}");
var newsRoot = db.GetItem("/sitecore/content/Home/news");

// Make sure Sitecore is warmed up
newsRoot.Axes.GetDescendants();

Response.Write("<h2>Authors</h2>");
var sw = new Stopwatch();
foreach (var author in authors)
{
    Response.Write($"<strong>{author.Name}</strong><br />");
    sw.Start();
    var referrers = GetAuthorReferrers(newsRoot, author.ID);
    sw.Stop();
    Response.Write(referrers.Length + " articles; Time in Ms: " + sw.ElapsedMilliseconds + "<hr />");
    sw.Reset();
}

And the most important bit, how I query it.

Regular Sitecore Query

Item[] GetAuthorReferrers(Item root, ID authorId)
{
    string query = root.Paths.FullPath + "//*[contains(@Author, '" + authorId + "')]";
    return root.Database.SelectItems(query);
}

I run this a couple of times (I’m not exactly in a stable environment to conduct any real scientific test of this – keep this in mind).

image

I then rewrite it for Sitecore Fast Query.

Sitecore Fast Query

BE AWARE!   Sitecore Fast Query sacrifices functionality to gain performance. Be especially aware of its limitations if your solution is multilingual.

Code now looks like this:

Item[] GetAuthorReferrers(Item root, ID authorId)
{
    string query = "fast:" + root.Paths.FullPath + "//*[@Author = '%" + authorId + "%']";
    return root.Database.SelectItems(query);
}

And an output that comes out like this. (Believe me, I ran this dozens of times).

image

Are we having fun yet?  I can offer up a bit of speculation on what’s going on here, but I’m honestly not completely sure. Mostly because I would never use neither solution to query my data, so my experience is somewhat limited. My best guess would be; to benefit from fast: query, you need more data. I will try this out with a larger dataset a bit further down.

So anyway. Roll on SLDB.

Sitecore LinkDatabase

Drawback of this approach is, that it’s a bit more code heavy. Not by much though.

Item[] GetAuthorReferrers(Item root, Item author)
{
    ID authorFieldId = new ID("{438E45E5-9F85-4705-976E-FC76E563F5EF}");

    var items = new List<Item>();
    ItemLink[] itemLinks = Sitecore.Globals.LinkDatabase.GetItemReferrers(author, false);
    foreach (var il in itemLinks)
    {
        if (il.SourceDatabaseName.Equals(root.Database.Name) && il.SourceFieldID == authorFieldId)
        {
            items.Add(il.GetSourceItem());
        }
    }
    return items.ToArray();
}

And here’s how we end up.

image

Am I managing to convince you yet?   so far we’re doing ok. We’re outperforming Sitecore Query by about 10 to 1. And Sitecore Fast Query by 50 to 1. A pretty decent start I’d say.

The sleight of hand

So I’m left with two problems. I’ve not brought “Category” into play. And I’ve not got a simple output, that fully compares these 3 side by side. I’ve created a Gist with the full source code – it is to long to include here, even by my standards.

Enter, a code rewrite. This time I’m looping through all Categories, then all Authors – finding all News articles that match the Category AND the Author. It completes the setup, and it actually demonstrates one of the slightly more complex situations when using SLDB. Using querying this becomes just another “and” expression.

SLDB code looks like this:

private Item[] GetNewsArticlesSitecoreLinkDatabase(Item newsRoot, Item authorItem, Item categoryItem)
{
    // Note. SLDB version of this actually needs to make 2 lookups. First to find all news articles referencing our author, then all that reference the category.
    // Note #2: Still, my hand is not shaking ;-)

    var authorFieldId = new ID("{438E45E5-9F85-4705-976E-FC76E563F5EF}");
    var categoriesFieldId = new ID("{31D2A6CC-6984-4CA0-BB73-9D39C3B8D0AA}");

    var authorLinks = Globals.LinkDatabase.GetItemReferrers(authorItem, false).Where(al => al.SourceFieldID == authorFieldId).ToList();
    var categoryLinks = Globals.LinkDatabase.GetItemReferrers(categoryItem, false).Where(cl => cl.SourceFieldID == categoriesFieldId).ToList();

    var newsArticles = new List<Item>();
    foreach (var authorIl in authorLinks)
    {
        var categoryIl = categoryLinks.ToList().Find(cl => cl.SourceItemID == authorIl.SourceItemID);
        if (categoryIl != null && categoryIl.SourceDatabaseName == newsRoot.Database.Name)
        {
            newsArticles.Add(categoryIl.GetSourceItem());
        }
    }

    return newsArticles.ToArray();
}

And what do we get?   Well this.

image

It matches up pretty well the results already established. Adding the extra query seems to have added a bit to all of the query execution times – this is completely expected. In relative terms however, I think it is still very clear who the winner is.

2 comments:

Casper Simonsen said...

Hi Mark,

Great comparisson of queries vs SLDB, but I think you are missing an option of sitecore Linq as a basic tool? If we pretend we live in a world where indexing data is not an option, sitecore would still have its own indexes running (sitecore_master_index and sitecore_web_index)?


I haven't testet it, but then something like this would be possible:
private Item[] GetNewsArticlesSitecoreLinkDatabase(Item newsRoot, Item authorItem, Item categoryItem)
{
List newsArticles;
var authorFieldId = new ID("{438E45E5-9F85-4705-976E-FC76E563F5EF}");
var categoriesFieldId = new ID("{31D2A6CC-6984-4CA0-BB73-9D39C3B8D0AA}");

using (var context = ContentSearchManager.GetIndex((SitecoreIndexableItem)newsRoot).CreateSearchContext())
{
newsArticles = context.GetQueryable()
.Where(item => item.GetField(authorFieldId).Value.Contains(authorItem.ID.ToString()))
.Where(item => item.GetField(categoriesFieldId).Value.Contains(categoryItem.ID.ToString()))
.Select(p => p.GetItem())
.ToList();
}

return newsArticles.ToArray();
}

Im not aware where this fits in performance wise, but it should be quite fast and scale well?

Mark Cassidy said...

Hi Casper,

It will indeed scale quite well. It's not an oversight, "Index all things" is still pretty much my motto. I just wanted to show a better approach to getting to items that relate in some way, that did NOT involve any form of Sitecore Query.

The next Decennial post will take a deep dive into indexing :-)