Friday, May 08, 2009

Working with multiple content databases in Sitecore 6

One of the very neat things about Sitecore, is the way the architecture allows you to mould, shape, and work with the configuration files to come up with an implementation that suits your purpose.


As the title of this post will suggest, I will be taking a look at Sitecore databases in this post; and how you are free to work with as many of them as you see fit in your projects.


For sake of argument, let’s say that you were tasked with expanding an existing Sitecore website with a Products database. Potentially, this database would be holding tens-of-thousands of products – at least if you are to believe the PowerPoint slides of sales projections the CEO presented last week ;-)


Now I KNOW what the first argument would be; “Don’t store in Sitecore. Sitecore is meant to build and store websites, and something as “businessey” as a Products Database has no place there”. I beg to differ however – as long as we’re not assuming there are ERP systems involved; we’re starting entirely from scratch.


I find, that actually, Sitecore is perfect for the job. Just in short summary, by using Sitecore as our data platform, we get (at the very least) the following handed to us on a silver platter:


  • Flexible hierarchical storage structure
  • Multi-lingual meta data for product descriptions and so on
  • Built-in advanced media library and media handling
  • Easily modelled data templates
  • Standard stuff, like workflows, security and so on
  • Can be edited and maintained using familiar tools
    • Don’t overlook this one. If you place the data in “traditional” SQL tables – YOU are going to need to write an interface that creates, edits and maintains your product data
    • WHAT are you going to say, when the customer asks for “advanced” stuff such as Workflows, Automatic Image Scaling / Thumbnail creation, granular (field based) security, Publishing functionality, Spell checking… ?   Just naming a few here, but let’s not be blind to what Sitecore is offering out of the box
    • What will it cost?


So just bear with me here. Am not saying that every case is a case for data going into Sitecore and “living” there. But what I am saying is, it’s not something that should be discarded as an option without further investigation. Like everything software, there are tradeoffs involved. Make sure you make the right trade.


Setting it up

Right. So let’s get started.


In my example here, I downloaded a fresh copy of Sitecore 090416 (ZIP archive of the web root, we’re all developers here. The Installer is for marketers ;-))


I’m going to be using SQL Server Express, so I get rid of the Oracle and SQL 2000 files. For my Products Database, I will be using the Sitecore “Master” database as a foundation, so I take a copy of the files and rename them like this:




And then I proceed to attach them:




And eventually end up with 4 databases attached, like this:




So far so good. I continue to set up an IIS site for this, and a local host header of “sc090416”. All of this you hopefully know all about, so I won’t go into detail with it here. The point of this post is not basic Sitecore installation – we’re all here to look at databases ;-)


A few things that you need to do, which you wouldn’t normally, is to configure our new Products Database in Sitecore. First, open up /Website/App_Config/ConnectionStrings.config and configure the extra database. It could look like this:

xml version="1.0" encoding="utf-8"?>
Sitecore connection strings.
    All database connections for Sitecore are configured here.
add name="core" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Core" />
add name="master" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Master" />
add name="web" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Web" />

add name="products" connectionString="user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=sc090416_Products" />

Very straight forward, so far. But we’re not done yet. Open up Web.Config, look for the <databases> element, and find <!—master —>. For now, just copy the entire section – like this:

!      <!-- products>
!      <database id="products" singleInstance="true" type="Sitecore.Data.Database, Sitecore.Kernel">
        <param desc="name">$(id)</param>
        <dataProviders hint="list:AddDataProvider">
          <dataProvider ref="dataProviders/main" param1="$(id)">
            <prefetch hint="raw:AddPrefetch">
              <sc.include file="/App_Config/Prefetch/Common.config" />
              <sc.include file="/App_Config/Prefetch/Master.config" />
        <proxyDataProvider ref="proxyDataProviders/main" param1="$(id)" />
        <workflowProvider hint="defer" type="Sitecore.Workflows.Simple.WorkflowProvider, Sitecore.Kernel">
          <param desc="database">$(id)</param>
          <param desc="history store" ref="workflowHistoryStores/main" param1="$(id)" />
        <indexes hint="list:AddIndex">
          <index path="indexes/index[@id='system']" />
        <archives hint="raw:AddArchive">
          <archive name="archive" />
          <archive name="recyclebin" />
          <obj type="Sitecore.Data.$(database).$(database)HistoryStorage, Sitecore.Kernel">
            <param connectionStringName="$(id)" />
        <cacheSizes hint="setting">


Right. The only changes I made to this copy, are marked on the lines with !. Essentially the only thing changing are references to “master” which now become “products”.


With this change, I am now ready to log into Sitecore for the first time and check that everything is in order.




So far, everything is looking good. Sitecore has recognised my new database. I can switch to it – and you know…  it looks just like the “master” database ;-)   At this point, this should not really be a surprise.


Testing it


To further test things, I create a couple of content items. In the “master” database, I delete the /Home node, and create:




I then switch to the “products” database, and create a similar (yet different) folder.




Time to stop for a minute. Why did I delete /Home?


Well here’s the thing. The Home node that “master” is "born” with, so to speak, is just a placeholder really. At least that’s how I see it. Right now, my concern is, that if we leave the /Home node in both databases – we will have two items in two different databases, but sharing the same ID. What happens if you edit it in one database – should it overwrite changes done in the other?  While pursuing this question could be fun – I don’t really think this is a scenario Sitecore will support and I frankly don’t know what would happen. At this point I don’t much care to find out either :P


So anyway.


I have my two new folders, and I do a publish. Now at this point, there are a couple of things you would be expecting to see. Upon switching to the “web” database to have a look, I think I can pretty much guarantee that whatever you were expecting, it wasn’t this:




Well ok. To be fair, maybe it was. But of all the things I personally expected when I first tried this, this was not the result I was hoping for and certainly not expecting ;-)


So what is happening here?


I guess, the most accurate answer would be, Sitecore isn’t really designed to work like this. While the concept of multiple databases IS certainly supported – you are supposed to use Proxy items to “merge” all of the data from “extra” databases (like our Products) into the main “master” database and then publish from there.


This doesn’t answer the question however, what IS happening?


Well I started investigating, and the first thing I looked into was the publishItem pipeline. Out of the box, it looks like this:

<publishItem help="Processors should derive from Sitecore.Publishing.Pipelines.PublishItem.PublishItemProcessor">
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessingEvent, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckVirtualItem, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.CheckSecurity, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.DetermineAction, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.MoveItems, Sitecore.Kernel" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.RaiseProcessedEvent, Sitecore.Kernel" runIfAborted="true" />
  <processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">


And if going by names is enough (and it is), my suspicion instantly fell on RemoveUnknownChildren. A little work with Reflector quickly reveals what one of the main purposes of this item processor is.


It essentially gets a list of child item IDs in the “source” database and removes them if they are not present in the “destination” database.


This can be tested quickly enough. Switch to “master” – run a publish and check the result. Sure enough, our “Master Database” folder is now there, alone. Swithing to “products” and running a publish gives us a new result; now the “Master Database” folder is gone, but the “Products Database” folder is present.


Curious as I am, I proceeded to disable this processor, to see what happened.

<!--<processor type="Sitecore.Publishing.Pipelines.PublishItem.RemoveUnknownChildren, Sitecore.Kernel" />-->





Voila. It looks good. At least on paper ;-)


While I am not completely comfortable with an intrusion such as this, disabling a system processor in the publishing pipeline, it at least allows me to move a bit forward on what I was aiming to achieve. If I dare to, that is.


In “master” I mock up a new template, and an item named Home, based on it:




And in Products, something similar.




And after publishing the respective databases, I get the (now) expected end result.




Pretty neat :-)


Alas however, as I mentioned above, having to modify web.config to achieve this kind of behaviour worries me. I can certainly see some advantages to this model, and I hope that at some point in the future, this will be an officially supported way to work with multiple databases. For now, the route we have to go, is via Proxy Items. They are not entirely bad either – that’s not it at all – but they seem (to me) a little less intuitive to use. Worst of all, however, they don’t hide from view the potentially thousands (see CEO presentation above) of content items being proxied in from the “products” database – I would personally prefer to be able to work like I just described here.


(In reality, there are lots of potential issues involved in this approach, and I can sort of see why Sitecore wouldn’t immediately support it)


But let’s proceed.


Configuring multiple databases using Sitecore Proxy Items

First thing I do is enable the RemoveUnknownChildren processor again. Now I’m back to a normal (and therefore supported) configuration.


First thing that needs to be done, is enabling proxies on the “master” database.  Find it in web.config, and toggle the setting.


Then, in the Content Editor (“master” database), navigate to /sitecore/system/proxies – and add a new Proxy.


Most of the settings on the Proxy Item are fairly straight forward.




The “Source Item” field is a little bit tricky however. If you click, you get a navigation tree from your… “master” database. Not products, as one would hope. This is not news, I blogged about this in January 2006 – and the workaround is fortunately even simpler today than it was back then. I open up the View tab, switch on “Raw Values” and quickly paste my ID of the “Products Database” root folder into the field. After saving my Products Proxy, I can safely disable “Raw Values” again, and now I have:




Because of what appears to be a slight quirk in the Sitecore Content Editor interface, I disable and then re-enable proxies using the new option that has now appeared in my database selector.




Once done, my content tree looks like I expect:




Notice how the items coming in from the “products” database are shown in grey. This is a visual cue to the editor, that these items are “different” – in effect in this case not coming from the same database at all.


Running a publish also yields the same results – we are now back to where we were using my first approach.


Setting up a shortcut to Products

One of the last things you would probably want to do, is set up an application shortcut to your “products” database.


Fortunately, this is very easily achieved. Switch to the “core” database, and find /sitecore/content/documents and settings/all users/start menu/left/content editor – make a duplicate of it, and name your new item Product Editor.


Configure parameters like this:




Especially make note of the “Parameters” field; where I am instructing the Content Editor application to use the “products” database instead of the default database.


Switch back to “master”, and you now have an extra option available on your Start Menu.




And clicking the new “Product Editor” now take you directly to the “products” database, ready to edit.  Since this application shortcut can be configured with security just like you would expect, you can therefore configure users who can ONLY work with the “products” and not mess around with the rest of your site.


In summary

When I set out writing this article, I had a few ideas in mind. I thought I had a “new creative” approach to handling multiple databases in Sitecore – but it turned out to be perhaps a little TOO creative ;-)   The recommended approach is going via Proxy Items, and it seems like the safer way to go.


Regardless of method used, I still feel that Sitecore offers plenty of options of partitioning your data if the need arises. Performance-wise… well sure – I have no doubt you could produce a QUICKER (as in; performing faster) Product Catalogue working directly with SQL Server and Products/Categories and whatnot tables.


Just like you could absolutely create a QUICKER website, using only flat .html files ;-)


But there is a LOT to be gained by utilising the tools Sitecore makes available to us. Many of them were mentioned in the beginning of this article.


I, for one, do NOT relish the idea of having to create a full blown web based product administrative interface. Especially not late Friday afternoon. Anyone else? ;-)


Jens Mikkelsen said...

Very interesting Mark. I certainly agree that you gain a lot by using Sitecore data, so you keep all the given functionality from Sitecore.

Normally I go for a custom non-sitecore database when:
* There is way to much data
* The data is of a relational structur. Sitecore just isn't geared for this.

We have considered using a custom Sitecore database several times, and we wondered why Sitecore didn't provide an "empty" sitecore database. But I guess it is because, as you said, that this isn't fully supported by Sitecore.

Imagine these two situations:
1. If you need a complete content separated multisite solution, where the different sites shouldn't share content or settings in any way. It would be very simpel to implement this way. It would be even more cool, if they could share templates from a "master" database, making it easy to maintain.
2. When you have "live" content from visitors. Instead of publishing usercreated content, you could have one custom database, which runs in live mode where no publish is neede.

Lars Nielsen said...

Great that somebody blogs about this topic!

I have recently been considering if we as Sitecore should change our best practices in regards to not storing content in the Sitecore databases.

Your blog post points out very obvious benefits of using Sitecore as data repository for custom data.

The reason that we didn't wanted site owners to store content was that most would do so by mixing master data and customer enriched data. I firmly believe you should separate user generated data and content, - and would at any time recommend exactly what you propose: Use a separate database.

Now, off course, there may be challenges to do so if you use Sitecore staging servers, unless the data islands have access to the master data storage (this can be solved with SQL replication), but as you also point out, you get a lot of benefits using Sitecore as the data storage.

I was thinking, - why not create an web config include with the "customdatabase" settings. All you would need to do then is to attach another Sitecore database, and use the web include...

commodore73 said...

Maybe I need to re-read a little closer, but I don't understand why you use another database instead of just storing the products in master. At least, I assume the same people maintain all of them, and those people aren't admins (can't see the database icon in the desktop, etc.).

Mark Cassidy said...

I think cloning databases in the manner I did here is in fact fully supported by Sitecore. The unsupported aspect would be my attempt to publish from multiple databases into a single joint "web" db - that was just a theory I had that I was pursuing.

And it is also very true, there are many scenarios that happen all the time in "real life" - live servers, user created content, multiple content editors, and so on - all of which will affect your setup. My main hope with posting about this was to kick up a bit of debate on the matter - since the predominant assumption was leaning towards "always go bespoke" for data and not putting it in Sitecore.

What really got me thinking a lot about these things was, a site I was involved in about a year and a half ago. They had some 36.000 news articles in their Sitecore solution. This raises some interesting questions - because I am sure noone would argue that Sitecore SHOULD absolutely cater for systems that have this amount of news articles online.

1) How much data is "way too much"?
2) Is relational data really so bad?

Sitecore is inherently a relational system, even if it IS primarily a parent/child hierarchy. With a bit of "clever" coding however, mapping out cross indexes and true N..M relationships is actually not that difficult - and the Link Database helps us retrieve these indexes almost (!) as effeciently as if we had gone completely bespoke.

Again, however, am not saying "do it always". But maybe "do it more" ;-)

I'd argue that giving up a little performance to gain a LOT of functionality is a good trade on most days.

John said...

Do you think SQL Express is adaquate for a production implamentation?

Mark Cassidy said...

That is a very broad question, and I'd say it depends largely on the load the production environment will be under.

There may also be licensing issues in regards to the SQL Express, I haven't actually looked much into this.

Anonymous said...

thanks for the article ,it made me happy :)

Dick Gennissen said...

Very interesting, how did you resolve the previewmode?

In general we have found that the preview-mode uses the Master database as the default preview context database. We would like to change this, but have 2 master databases.

Any ideas?