One of the very neat things about Sitecore, is the way the architecture allows you to mould, shape, and work with the configuration files to come up with an implementation that suits your purpose.
As the title of this post will suggest, I will be taking a look at Sitecore databases in this post; and how you are free to work with as many of them as you see fit in your projects.
For sake of argument, let’s say that you were tasked with expanding an existing Sitecore website with a Products database. Potentially, this database would be holding tens-of-thousands of products – at least if you are to believe the PowerPoint slides of sales projections the CEO presented last week ;-)
Now I KNOW what the first argument would be; “Don’t store in Sitecore. Sitecore is meant to build and store websites, and something as “businessey” as a Products Database has no place there”. I beg to differ however – as long as we’re not assuming there are ERP systems involved; we’re starting entirely from scratch.
I find, that actually, Sitecore is perfect for the job. Just in short summary, by using Sitecore as our data platform, we get (at the very least) the following handed to us on a silver platter:
- Flexible hierarchical storage structure
- Multi-lingual meta data for product descriptions and so on
- Built-in advanced media library and media handling
- Easily modelled data templates
- Standard stuff, like workflows, security and so on
- Can be edited and maintained using familiar tools
- Don’t overlook this one. If you place the data in “traditional” SQL tables – YOU are going to need to write an interface that creates, edits and maintains your product data
- WHAT are you going to say, when the customer asks for “advanced” stuff such as Workflows, Automatic Image Scaling / Thumbnail creation, granular (field based) security, Publishing functionality, Spell checking… ? Just naming a few here, but let’s not be blind to what Sitecore is offering out of the box
- What will it cost?
So just bear with me here. Am not saying that every case is a case for data going into Sitecore and “living” there. But what I am saying is, it’s not something that should be discarded as an option without further investigation. Like everything software, there are tradeoffs involved. Make sure you make the right trade.
Setting it up
Right. So let’s get started.
In my example here, I downloaded a fresh copy of Sitecore 090416 (ZIP archive of the web root, we’re all developers here. The Installer is for marketers ;-))
I’m going to be using SQL Server Express, so I get rid of the Oracle and SQL 2000 files. For my Products Database, I will be using the Sitecore “Master” database as a foundation, so I take a copy of the files and rename them like this:
And then I proceed to attach them:
And eventually end up with 4 databases attached, like this:
So far so good. I continue to set up an IIS site for this, and a local host header of “sc090416”. All of this you hopefully know all about, so I won’t go into detail with it here. The point of this post is not basic Sitecore installation – we’re all here to look at databases ;-)
A few things that you need to do, which you wouldn’t normally, is to configure our new Products Database in Sitecore. First, open up /Website/App_Config/ConnectionStrings.config and configure the extra database. It could look like this:
<?xml version="1.0" encoding="utf-8"?>
Sitecore connection strings.
All database connections for Sitecore are configured here.
<add name="core" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Core" />
<add name="master" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Master" />
<add name="web" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Web" />
<add name="products" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Products" />
Very straight forward, so far. But we’re not done yet. Open up Web.Config, look for the <databases> element, and find <!—master —>. For now, just copy the entire section – like this:
! <!-- products –>
! <database id="products" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
<dataProvider ref="dataProviders/main" param1="$(id)">
<sc.include file="/App_Config/Prefetch/Common.config" />
<sc.include file="/App_Config/Prefetch/Master.config" />
<proxyDataProvider ref="proxyDataProviders/main" param1="$(id)" />
<workflowProvider hint="defer" type="Sitecore.Workflows.Simple.WorkflowProvider, Sitecore.Kernel">
<param desc="history store" ref="workflowHistoryStores/main" param1="$(id)" />
<index path="indexes/index[@id='system']" />
<archive name="archive" />
<archive name="recyclebin" />
<obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
<param connectionStringName="$(id)" />
Right. The only changes I made to this copy, are marked on the lines with !. Essentially the only thing changing are references to “master” which now become “products”.
With this change, I am now ready to log into Sitecore for the first time and check that everything is in order.
So far, everything is looking good. Sitecore has recognised my new database. I can switch to it – and you know… it looks just like the “master” database ;-) At this point, this should not really be a surprise.
To further test things, I create a couple of content items. In the “master” database, I delete the /Home node, and create:
I then switch to the “products” database, and create a similar (yet different) folder.
Time to stop for a minute. Why did I delete /Home?
Well here’s the thing. The Home node that “master” is "born” with, so to speak, is just a placeholder really. At least that’s how I see it. Right now, my concern is, that if we leave the /Home node in both databases – we will have two items in two different databases, but sharing the same ID. What happens if you edit it in one database – should it overwrite changes done in the other? While pursuing this question could be fun – I don’t really think this is a scenario Sitecore will support and I frankly don’t know what would happen. At this point I don’t much care to find out either :P
I have my two new folders, and I do a publish. Now at this point, there are a couple of things you would be expecting to see. Upon switching to the “web” database to have a look, I think I can pretty much guarantee that whatever you were expecting, it wasn’t this:
Well ok. To be fair, maybe it was. But of all the things I personally expected when I first tried this, this was not the result I was hoping for and certainly not expecting ;-)
So what is happening here?
I guess, the most accurate answer would be, Sitecore isn’t really designed to work like this. While the concept of multiple databases IS certainly supported – you are supposed to use Proxy items to “merge” all of the data from “extra” databases (like our Products) into the main “master” database and then publish from there.
This doesn’t answer the question however, what IS happening?
Well I started investigating, and the first thing I looked into was the publishItem pipeline. Out of the box, it looks like this:
<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true" />
<processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
And if going by names is enough (and it is), my suspicion instantly fell on RemoveUnknownChildren. A little work with Reflector quickly reveals what one of the main purposes of this item processor is.
It essentially gets a list of child item IDs in the “source” database and removes them if they are not present in the “destination” database.
This can be tested quickly enough. Switch to “master” – run a publish and check the result. Sure enough, our “Master Database” folder is now there, alone. Swithing to “products” and running a publish gives us a new result; now the “Master Database” folder is gone, but the “Products Database” folder is present.
Curious as I am, I proceeded to disable this processor, to see what happened.
<!--<processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel" />-->
Voila. It looks good. At least on paper ;-)
While I am not completely comfortable with an intrusion such as this, disabling a system processor in the publishing pipeline, it at least allows me to move a bit forward on what I was aiming to achieve. If I dare to, that is.
In “master” I mock up a new template, and an item named Home, based on it:
And in Products, something similar.
And after publishing the respective databases, I get the (now) expected end result.
Pretty neat :-)
Alas however, as I mentioned above, having to modify web.config to achieve this kind of behaviour worries me. I can certainly see some advantages to this model, and I hope that at some point in the future, this will be an officially supported way to work with multiple databases. For now, the route we have to go, is via Proxy Items. They are not entirely bad either – that’s not it at all – but they seem (to me) a little less intuitive to use. Worst of all, however, they don’t hide from view the potentially thousands (see CEO presentation above) of content items being proxied in from the “products” database – I would personally prefer to be able to work like I just described here.
(In reality, there are lots of potential issues involved in this approach, and I can sort of see why Sitecore wouldn’t immediately support it)
But let’s proceed.
Configuring multiple databases using Sitecore Proxy Items
First thing I do is enable the RemoveUnknownChildren processor again. Now I’m back to a normal (and therefore supported) configuration.
First thing that needs to be done, is enabling proxies on the “master” database. Find it in web.config, and toggle the setting.
Then, in the Content Editor (“master” database), navigate to /sitecore/system/proxies – and add a new Proxy.
Most of the settings on the Proxy Item are fairly straight forward.
The “Source Item” field is a little bit tricky however. If you click, you get a navigation tree from your… “master” database. Not products, as one would hope. This is not news, I blogged about this in January 2006 – and the workaround is fortunately even simpler today than it was back then. I open up the View tab, switch on “Raw Values” and quickly paste my ID of the “Products Database” root folder into the field. After saving my Products Proxy, I can safely disable “Raw Values” again, and now I have:
Because of what appears to be a slight quirk in the Sitecore Content Editor interface, I disable and then re-enable proxies using the new option that has now appeared in my database selector.
Once done, my content tree looks like I expect:
Notice how the items coming in from the “products” database are shown in grey. This is a visual cue to the editor, that these items are “different” – in effect in this case not coming from the same database at all.
Running a publish also yields the same results – we are now back to where we were using my first approach.
Setting up a shortcut to Products
One of the last things you would probably want to do, is set up an application shortcut to your “products” database.
Fortunately, this is very easily achieved. Switch to the “core” database, and find /sitecore/content/documents and settings/all users/start menu/left/content editor – make a duplicate of it, and name your new item Product Editor.
Configure parameters like this:
Especially make note of the “Parameters” field; where I am instructing the Content Editor application to use the “products” database instead of the default database.
Switch back to “master”, and you now have an extra option available on your Start Menu.
And clicking the new “Product Editor” now take you directly to the “products” database, ready to edit. Since this application shortcut can be configured with security just like you would expect, you can therefore configure users who can ONLY work with the “products” and not mess around with the rest of your site.
When I set out writing this article, I had a few ideas in mind. I thought I had a “new creative” approach to handling multiple databases in Sitecore – but it turned out to be perhaps a little TOO creative ;-) The recommended approach is going via Proxy Items, and it seems like the safer way to go.
Regardless of method used, I still feel that Sitecore offers plenty of options of partitioning your data if the need arises. Performance-wise… well sure – I have no doubt you could produce a QUICKER (as in; performing faster) Product Catalogue working directly with SQL Server and Products/Categories and whatnot tables.
Just like you could absolutely create a QUICKER website, using only flat .html files ;-)
But there is a LOT to be gained by utilising the tools Sitecore makes available to us. Many of them were mentioned in the beginning of this article.
I, for one, do NOT relish the idea of having to create a full blown web based product administrative interface. Especially not late Friday afternoon. Anyone else? ;-)