Friday, April 17, 2009

Sitecore Wildcard Nodes

So here I was, doing a bit of late Friday afternoon catching up on the Sitecore Forums. I come across this post, in which a fairly common question is being asked, for which there are quite a number of possible solutions.

 

What got my attention was the reply by Kern, who mentions something called a “wildcard item”.

 

This is one of those humbling experiences. Having worked with Sitecore for over 3 years now, been involved directly and indirectly with a very large number of Sitecore Solution Partners, being certified, for a while had an office space right next to where certification training is being conducted in the UK; heck even going out for Friday night beers with one of the most experienced Sitecore Trainers in the UK….

 

And I’ve never ever heard of “wildcard items” before.

 

I asked around between everyone I could find online on MSN (with Sitecore experience), and although they all did offer various guesses – none knew the answer.

 

Best… Kept… Secret… Ever… LOL ;-)     Or maybe it just never came up.

 

But I find it so very useful, that I’m now going to do my bit to spreading the word.   Once again, it is proof that the really true gems in the Sitecore blogosphere tend to come from Lars Nielsen’s blog.

 

Sitecore – Avoiding query string in dynamic URL

 

I’ll just bring a quick snippet from the full article here, to give you some sort of idea of what it’s all about. Lars talks at length about how you can implement and use wildcard items in XSLT, and I find it in a way “drowns” the underlying real message – how you can use wildcard items in your day-to-day solutions.

 

Anyway, give it a look. And have a good weekend :-)

 

Snippet:

wildcard

Monday, April 13, 2009

Migrating data into Sitecore

One question I’ve noticed gets posed ever so often on the Sitecore Forums is; “How do I migrate data from my clients previous CMS into Sitecore?”  Or similar type questions, to more or less the same effect.  Also commonly asked on the same occasion is; “Does Sitecore provide a tool for this?”

 

Just to make sure we’re all on the same page here; I’m referring to the act of importing structured data (i.e. the clients existing website, sitting in any structured repository – CMS or otherwise – and getting that migrated into Sitecore for use in the Next Generation CMS system being developed by you).

 

This process is also known as Data Migration, and WikiPedia defines it as follows:

 

Data migration is the process of transferring data between storage types, formats, or computer systems. Data migration is usually performed programmatically to achieve an automated migration, freeing up human resources from tedious tasks. It is required when organizations or individuals change computer systems or upgrade to new systems, or when systems merge (such as when the organizations that use them undergo a merger/takeover).”

 

It further states:

 

“To achieve an effective data migration procedure, data on the old system is mapped to the new system providing a design for data extraction and data loading. The design relates old data formats to the new system's formats and requirements. Programmatic data migration may involve many phases but it minimally includes data extraction where data is read from the old system and data loading where data is written to the new system.”

 

And lastly:

 

“Changing application vendor - for instance a new CRM or ERP platform will inevitably involve substantial transformation as almost every application or suite operates on its own specific data model.”

 

If we boil all of this information down, we get something like “Data Migration is usually performed programmatically. To do it, a design for extracting data and loading data needs to be developed.”

 

And that’s where you come in.

 

We already have at least a partial answer to one of the above questions. Sitecore couldn’t really provide a tool that could achieve this. Data extraction would always be bespoke to the specific system being migrated from – so while you might be lucky enough to find something that would extract data from say a specific version EpiServer to a specific version of Sitecore; chances are you would still end up having to do quite a bit of the data design yourself to satisfy your business requirements.

 

That being said, I do believe Sitecore offers a few features to help you out.

 

For instance, there’s the Xml Importer.

    • It essentially works as a Transformation Engine. Given an XML input file (presumably produced by the system you intend to migrate from), you then write an XSLT file to transform the XML input file to a pre-defined format, which the Xml Importer will then use to create Sitecore Content Items.
    • This is probably only fine for the simplest of cases. In most migrations I have done, there has almost always been business requirements to perform more “advanced” features such as UrlMapping (setting up Url Redirection so that the existing links will work on the target Sitecore installation), Media Importing, category mapping to name a few. Not that you couldn’t achieve most of this using XSLT. I just think it will be very hard and unworkable.

 

There may or may not be similar tools available. I’ve never used them myself simply because in the vast majority of cases, a pre-built tool would not be able to achieve the requirements I was tasked to fulfil. The process of actually creating Sitecore Content Items is the least of my worries when it comes to migration.

 

While I can’t provide a specific solution in this post to whatever migration problem you may be facing, I will take you through a fictional scenario to help explain some of the points that needs to be addressed, and provide a few techniques that you may find useful.

 

For demonstration purposes, I will be importing the good old trusted Northwind database. Now before you start, I’m not saying this is neither a very good idea or anything similar – Northwind doesn’t hold web content and is therefore a very unlikely candidate for this type of migration. I have chosen it simply because many of you are already familiar with the structure of it, and it is freely available for anyone to download and experiment with.

 

Designing the model

To begin the process of migration, you first need to analyse the data model, and come up with an implementation that will work in your Sitecore solution.

 

In the example I’m going to cover, I’ve been tasked with the following:

 

“Migrate the products into a relevant categorised Sitecore Content Structure, making sure you preserve information about each supplier for the migrated products”.

 

After looking a bit at the products table definition, I conclude the following:

 

products

  • Each product has only 1 supplier
  • Each product is only defined in 1 category

 

This is fortunate, as it will make the migration a fairly simple task. More often than not, things will not be this well defined and simple.

 

So in terms of a Sitecore Content Structure, this could look like this:

 

  • /sitecore/content/Home
    • /sitecore/content/Home/Products
      • /sitecore/content/Home/Products/CategoryA
      • /sitecore/content/Home/Products/CategoryB
      • … and so on
    • /sitecore/content/Home/Suppliers
      • /sitecore/content/Home/Suppliers/SupplierA
      • /sitecore/content/Home/Suppliers/SupplierB
      • … and so on

 

You could of course map this in many different ways. A few of the things you should be considering when designing your structure are:

 

  • How many Sitecore Items will be created on each branch? 
    • Sitecore recommends that no more than 100 Items are created on any one branch level.
  • How will the existing meta-structure be preserved?  
    • In this case, a products category is implicitly derived from the Category node it sits under. Alternatively, as I will need to do with suppliers, you could create references from your data to point to the relevant meta data.
  • Will any filtering or data transformation need to take place?
    • Often, there are old data in the source system that you might as well filter out when doing the migration. This could be data prior to a certain date (i.e. only migrate the last 3 years worth of data) or certain categories that will not be migrated to the new system.
    • Are there inline links that you will need to handle?   As Northwind doesn’t hold web content, this is not relevant to this example. But in most cases, you will need to RegEx through content fields and rewrite Internal Links to point to the new locations of your content within Sitecore. This topic is complex enough to deserve an entire post of it’s own, so I won’t be going into detail with this here.

 

Having defined this structure, I now explore the relevant Database Schemas, and come up with the following 3 Sitecore Templates that will eventually hold my migrated data. In this case I started on a blank Sitecore 6 installation (090212, but that isn’t really relevant – this migration process would not differ significantly between Sitecore 5 and Sitecore 6).

 

Product:

product

 

Category:

category

 

Supplier:

supplier

 

I also create the basic skeleton structure I defined above, and run a Smart Publish. I’ll explain why, in just a bit.

 

Skeleton structure:

structure

 

And I am now ready to proceed to the next stage.

Extracting the data

When it comes to extracting the Northwind Data, fortunately this is almost a textbook example. The data is nice and clean, referential integrity is in place, and .NET makes it very straightforward to connect to my underlying MS SQL Server to reach the relevant tables. I know there are many boilerplate examples out there of how to easily access Northwind using LINQ and similar technologies, but since the data extraction will be so simple in this case I don’t really see a need for invoking any of them.

 

One toolkit I will use however, is my CorePoint.DomainObjects. Don’t see this as a shameless plug; one of the main reasons I developed the toolkit was to help in my many migration projects and it just makes the task ahead so much easier.

 

I set up a standard Sitecore project, and quickly mock up a few entities:

 

project

 

Product.cs:

using System;
using System.Data;
using CorePoint.DomainObjects.SC;
using CorePoint.DomainObjects;

namespace Website.Migration
{
    [Template("user defined/product")]
    public class Product : StandardTemplate
    {
        // A few properties that will be needed to map out the Sitecore hierarchy
        public int ProductId { get; set; }
        public int SupplierId { get; set; }
        public int CategoryId { get; set; }

        [Field("product name")]         public string ProductName { get; set; }
        [Field("quantity per unit")]    public string QuantityPerUnit { get; set; }
        [Field("unit price")]           public double UnitPrice { get; set; }
        [Field("units in stock")]       public int UnitsInStock { get; set; }
        [Field("units on order")]       public int UnitsOnOrder { get; set; }
        [Field("reorder level")]        public int ReorderLevel { get; set; }
        [Field("discontinued")]         public bool Discontinued { get; set; }
        [Field("supplier")]             public Guid Supplier { get; set; }

        public Product()
        {
        }

        public Product( DataRow row )
        {
            // For brewity, null values are not checked for. There are none in the 
            // Northwind products table
            ProductName = Convert.ToString( row[ "ProductName" ] );
            SupplierId = Convert.ToInt32( row[ "SupplierID" ] );
            CategoryId = Convert.ToInt32( row[ "CategoryID" ] );
            QuantityPerUnit = Convert.ToString( row[ "QuantityPerUnit" ] );
            UnitPrice = Convert.ToDouble( row[ "UnitPrice" ] );
            UnitsInStock = Convert.ToInt32( row[ "UnitsInStock" ] );
            UnitsOnOrder = Convert.ToInt32( row[ "UnitsOnOrder" ] );
            ReorderLevel = Convert.ToInt32( row[ "ReorderLevel" ] );
            Discontinued = Convert.ToBoolean( row[ "Discontinued" ] );
        }
    }
}

Category.cs:

using System;
using System.Data;
using CorePoint.DomainObjects.SC;
using CorePoint.DomainObjects;

namespace Website.Migration
{
    [Template("user defined/category")]
    public class Category : StandardTemplate
    {
        public int CategoryId { get; set; }

        [Field("category name")]    public string CategoryName { get; set; }
        [Field("description")]      public string Description { get; set; }

        public Category()
        {
        }

        public Category( DataRow row )
        {
            CategoryId = Convert.ToInt32( row[ "CategoryID" ] );
            CategoryName = Convert.ToString( row[ "CategoryName" ] );
            Description = Convert.ToString( row[ "Description" ] );
        }
    }
}

Supplier.cs:

using System;
using System.Data;
using CorePoint.DomainObjects.SC;
using CorePoint.DomainObjects;

namespace Website.Migration
{
    [Template("user defined/supplier")]
    public class Supplier : StandardTemplate
    {
        public int SupplierId { get; set; }

        [Field("company name")]        public string CompanyName { get; set; }

        // For brewity, the rest of the fields are omitted, as
        // they're not really needed to demonstrate the migration

        public Supplier()
        {
        }

        public Supplier( DataRow row )
        {
            SupplierId = Convert.ToInt32( row[ "SupplierID" ] );
            CompanyName = Convert.ToString( row[ "CompanyName" ] );
        }
    }
}

 

And now we’re ready to get going. As you can see, I cheated a bit on Supplier.cs. This article is long enough, without me listing out 15 or so similar fields that don’t really add value to the demonstration / example itself.

 

I also created a completely standard .ASPX page under the root of the site, DoMigrate.aspx. It is in this page I will be executing the migration itself.

 

I start by adding some code that will load up all of the required source data into memory. When developing a migration, there are a few things to keep in mind:

 

  • Your code is more or less fire and forget. It will not run in a live environment, it will not need to be maintained by yourself or your peers for an extended period of time (normally) and therefore many of the rules and guidelines that apply to developing web applications don’t really apply here.
  • For performance reasons, load up as much as you possibly can into RAM. Run 64 bit if you can, and don’t worry if your migration script is eating up gigabytes of memory. If your migration mapping is complex (this example isn’t), you will appreciate the performance gain from having in-memory tables and lookups, and can in many cases mean a difference of 4 hours versus 4 minutes of execution time.
  • Don’t expect to get it spot-on the first time. You will be running the script many times over.

 

I add a bit of code to the Page_Load method:

protected void Page_Load( object sender, EventArgs e )
{
    // First we start by loading all source data into memory.

    string connectionString = 
        @"user id=sa;password=removed;Data Source=.\SQLEXPRESS;Database=Northwind";

    DataTable products = new DataTable();
    DataTable suppliers = new DataTable();
    DataTable categories = new DataTable();

    using ( var conn = new SqlConnection( connectionString ) )
    {
        using ( var da = new SqlDataAdapter( "SELECT * FROM Products", conn ) )
        {
            da.Fill( products );
        }

        using ( var da = new SqlDataAdapter( "SELECT * FROM Categories", conn ) )
        {
            da.Fill( categories );
        }

        using ( var da = new SqlDataAdapter( "SELECT * FROM Suppliers", conn ) )
        {
            da.Fill( suppliers );
        }
    }
}

 

As you can see, no surprises there. I am basically filling up 3 Datatables with information from the Northwind database.

 

I add a little code to load up all the data in the entities I created above, like this:

List<Product> ProductList = new List<Product>();
List<Category> CategoryList = new List<Category>();
List<Supplier> SupplierList = new List<Supplier>();

foreach( DataRow row in products.Rows )
    ProductList.Add( new Product( row ) );
foreach ( DataRow row in categories.Rows )
    CategoryList.Add( new Category( row ) );
foreach ( DataRow row in suppliers.Rows )
    SupplierList.Add( new Supplier( row ) );

Response.Write( string.Format( "{0} products, {1} categories, {2} suppliers<br />",
                ProductList.Count,
                CategoryList.Count,
                SupplierList.Count ) );

 

And for the first time, I run my migration script-to-be.

 

“77 products, 8 categories, 29 suppliers”

 

This is a good time to sanity check your code and make sure the numbers you are getting are more or less what you expect. Comparing with the rowcounts of the source database, I conclude that everything appears to be in order, and continue. It is now time to create some Sitecore Items based on this data.

 

Before I do that, there are a few things to point out:

 

  • The order which you create the Sitecore items in, is not irrelevant. In our case here, Products refer Suppliers and sit beneath Categories. Therefore the first Sitecore Items you create must be Suppliers (or Categories, in this case that won’t matter) followed by Categories and finally followed by Products.
  • I always run test migrations against the “web” database. This has a few advantages:
    • It runs faster (I think…  it should, since there is no publishing queue handling)
    • It allows you to easily undo a failed migration. Just publish your skeleton again, and the whole thing resets back to start

 

I add some code to create Sitecore Content. Don’t worry, we’re almost done now :-)

// Set up a director for "web", standard language and security disabled
SCDirector director = new SCDirector( "web", "en", true );
var suppliersRoot = director.GetObjectByIdentifier( "/sitecore/content/home/suppliers" );
var productsRoot = director.GetObjectByIdentifier( "/sitecore/content/home/products" );

// And with these references in place, it's time to create our Sitecore Content Items

SupplierList.ForEach( delegate( Supplier s )
{
    s.Name = s.CompanyName;        // A name for the Sitecore Item (duh.. :P)
    s.ParentId = suppliersRoot.Id;
    s.Director = director;
    s.Store();
} );

CategoryList.ForEach( delegate( Category c )
{
    // First create the category
    c.Name = c.CategoryName;
    c.ParentId = productsRoot.Id;
    c.Director = director;
    c.Store();

    // And now create all products under the category
    ProductList.FindAll( p => p.CategoryId == c.CategoryId ).ForEach( delegate( Product p )
    {
        p.Name = p.ProductName;
        p.ParentId = c.Id; // .Id was assigned when the category was stored
        p.Director = director;

        // Pay attention to this one. I now map the Supplier Id field with the
        // Sitecore ID of the related supplier (created above)
        p.Supplier = SupplierList.Find( s => s.SupplierId == p.SupplierId ).Id;
        p.Store();
    } );
} );

Response.Write( "All done!<br />" );

 

And here we go. The data is now migrated. On my overworked laptop, this code executes in just a couple of seconds. The results look like this:

 

migration

 

And you’re all done.

 

A few last things to consider are this:

 

  • If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
  • Use BulkUpdateContext()
  • Don’t forget your target language
  • If you can, make the fields shared and unversioned. This should help migration execution speed.
  • And probably most importantly: Most of your work will almost always be during analysis. Figuring out
    • How to get to the source data
    • Determining what data needs to be migrated
    • Designing a reasonable way to store the data in Sitecore

 

I don’t know of many shortcuts. This is very rarely as easy a task as this one was. I know there are standards in the making, for how CMS systems could store (or at least interchange data), but I don’t know if those are realistic and (more importantly) will be widely adopted by the various CMS vendors.

 

I hope you found this post useful. Probably my longest post ever. Comments and feedback welcomed :-)

Wednesday, April 08, 2009

DomainObjects Update 3 released

Yet another update was just released to the Sitecore Shared Source space.

 

Before I go into detail, let me just pre-warn that this is the first blog post on this blog where I’m using Windows Live Writer (thanks to John West for pointing my attention towards this tool) – and in case everything goes all wrong, this would (or could) be why ;-)

 

So what’s new?

 

  • A bug was fixed, where creating multiple language versions of the same DomainObject wouldn’t get correctly placed in Sitecore. Additional Language versions would get added to the current Context Language instead.
  • Support for Sitecore Query was added. I am a little bit torn on this functionality and whether it “fits” with the DomainObjects mindset or not. However – as there is little in way of alternatives right now – I decided to include it.
  • Support for Sitecore Media Items is now part of the package. As most of you will know, Media Items in Sitecore are (almost) like any other Items, yet there are subtle differences. Getting the Url of a Media Item for instance, takes you via the MediaManager instead of the LinkManager. With this implementation, users of the DomainObject are shielded from such implementation details.

 

How to use it

 

Probably best shown with a few example from the Unit Tests written for this release:

StandardTemplate item = director.QuerySingleObject( "/sitecore/content//*[@IntegerTestField='10']" );
Assert.IsNotNull( item );
Assert.IsTrue( item is TestTemplate );

item = director.QuerySingleObject<TestTemplate>( "/sitecore/content//*[@IntegerTestField='10']" );
Assert.IsNotNull( item );

item = director.QuerySingleObject<StandardTemplate>( "/sitecore/content//*[@IntegerTestField='10']" );
Assert.IsNotNull( item );

item = director.QuerySingleObject<StandardTemplate>( "/sitecore/content/home" );
Assert.IsNotNull( item );

item = director.QuerySingleObject<TestTemplate>( "/sitecore/content/home" );
Assert.IsNull( item );

 

There is, as one would probably expect, a similar method for querying multiple objects. Query gets passed straight through to the Sitecore API, so same rules apply as would any other Sitecore Query.

 

As for Media Items, the following classes have been added – all mapping to equivalent templates in Sitecore:

 

  • Audio, VersionedAudio
  • Doc, VersionedDoc
  • Document, VersionedDocument
  • Docx, VersionedDocx
  • File, VersionedFile
  • Flash, VersionedFlash
  • Image, VersionedImage
  • Jpeg, VersionedJpeg
  • MediaFolder
  • Movie, VersionedMovie
  • MP3, VersionedMP3
  • PDF, VersionedPDF
  • Zip, VersionedZip

 

They all inherit from StandardTemplate and mostly only differ from the fact that their .Url property is gained in a different manner.

 

Creating a new Media Item therefore is no different than any other DomainObject, except that the media file itself needs to be attached. Something like this:

SCDirector director = new SCDirector();
Jpeg logoFile = new Jpeg();
logoFile.CreateFromFilePath = @"C:\Logo.jpg";
logoFile.Title = "New Company Logo";
logoFile.Alt = "Company Logo";
logoFile.Description = "The new logo we received from Big Media Inc";
logoFile.ParentId = director.GetObjectByIdentifier( "/sitecore/media library" ).Id;
logoFile.Name = "Logo";
logoFile.Director = director;
logoFile.Store();

 

That’s it. Enjoy :-)

Wednesday, April 01, 2009

Old and trusted technology...

A friend of mine made me aware of an ongoing initiative today. It's one of those "of course. Why didn't I think of that?" things that are found in many corners around the Web, but of which only few gets truly wide acceptance.

I'll do my bit to promote this initiative however.

"Back in the day", when I was most heavily involved in what I term "Web Client Development" or "Front End Development", IE 4.0 was "the thing". Having blown the competition out of the water (by what means, is not a discussion I get much into), finally we were seeing the full potential of the web client get unlocked. Fancy javascript, alphaFilter direct X attributes, old fashioned TABLE layouts - and best of all, there simply WERE no other browsers on the market with any significant market share. Cross Browser testing was something you'd specify as an item on any proposal, and make sure the price tag was high enough that any client would opt out.

Then along came a few... mishaps. IE 5.0 and 5.5 - none of them ever really took off. Firefox was released, and ol' Microsoft HAD to come up with something to compete again. And not a moment too soon. The answer came, in form of Internet Explorer 6.0. Or IE6 as it is lovingly coined today. According to my statistics on this blog, it is still being used by more than 35% of all readers of this blog - making it one of the largest players in the Browser Marketplace still. Not bad, considering it has been around since 2001. 8 years. What's that... two Internet Lifetimes?

Of course, in THIS day and age, everybody wants a piece of IE6 sunshine. New versions of Firefox has been released, little brothers were spawned in form of IE7 and now IE8, and even Google Chrome has entered the arena. This is not even mentioning the preferred Safari browser on the Mac platform, or perhaps even Opera.

But I say, forget all that. IE6 is a proven trusty workhorse. It will render more or less ANYTHING, and not be concerned with making your life miserable with words such as "Compliance", "Standards", "Reliability". It won't make your users confused with such advanced concepts as "Tabs". It has more security related updates than perhaps any other product in the world, and therefore MUST be the most secure product in the world.

And best of all, everyone working in IT Operations don't have to figure out how to work the Internet Explorer Administration Kit to install a later version. IE6 is already there, it's free of charge, it's integrated in the platform. What more could anyone want?

I fully back this initiative. Save IE6. Everything else just adds to an already complex environment, and trying to conform to every Tom, Dick and Harry browser product around is just a waste of everyone's time.

So say we all!

(at least today, we do ;-))

 

EDIT: As some of you guessed (I would have thought all, but that was not the case), this post was inspired by all the April Fool pages being posted everywhere that day – where SaveIE6 is among them. Now back to more serious matters… ;-)