.NET and me Coding dreams since 1998!

14Apr/120

One BIG problem with Azure Tables

Migrating SQL Azure to Azure Tables - GUID gotcha

Windows Azure rocks! I am so impressed with the power it gives me with the price I can afford that I started porting the codebase I work on at home in my spare time.

So far, I’ve connected my clients to Azure Service Bus Topics (works great), created my own custom app SkyDrive using Azure Blob storage (works great) and today I tried to port my SQL Server databases to Azure Table Storage.

3 main reasons when to pick Azure Tables – fulfilled

The reason why I decided to port my data is that in a way I was using already my SQL Server DB as it is table:

  • My primary key of the table is sequential Guid (in order to remove performance issue caused by normal guid PKs)
    RowKey checked
  • In some of the tables I have a column OwnerID which I already use to horizontally partition my data –
    PartitionKey checked
  • I do all of the joins etc.  on client side and perform only two select queries: SelectByPK and SelectNewerPKs.This second select gets from client input parameter anchor identity value and returns all of the rows whose RowKey is greater then a given value. I use it to get the change set / data deltas which I need to sync from server to client DB in order to update client DB with the new data existing in the cloud.( get me all of the rows which PK is greater then the given value)

No need to bother you with more details regarding my database, II guess it is clear just based on these 3 things how good match my DB structure is for moving to Azure Tables.

Why Azure tables?

Quite simple: price and scalability.

SQL Azure is very affordable (especially after the last price cut) so in order to get 5 Gb DB you pay only 25 USD/month which is really nothing. Still, if your app architecture doesn’t use SQL server relational capabilities and relies primarily on clients and “PK selects”  (as mine does) then Azure tables can be used and the price for 5 Gb storage is 0.625 USD/month. Let me repeat that one more time: less then one USD per month would cost me 5 Gb Table storage space. Completely and utterly insanely awesome!

Scalability is the same story as with the price. While SQL Azure is making big steps to enable sharding scenarios with SQL Azure Federations, tables are having partitioning built in from “day 1” as a fundamental design principle and allows scaling datasets which size measures in TBs. Awesome!

So what is the problem then with Azure Tables?

Problem is that Azure Tables uses Comparer<Guid> (.NET approach) and not  the Comparer<SqlGuid> (SqlServer and SqlCompact approach).

In other words

"SQL Server and Azure Tables are sorting rows differentlly

when RowKey is unique idenitifier (GUID)

which leads to the row shuffling"

 

Let me explain it in one simple example….

Let say we have next dummy table in SQL Server database

image

And let say this table has two rows:

  1. ID: 7240963F-D384-4D78-BADF-A03300F678CC, Name:  First
  2. ID: 15DF6719-9671-43D3-BAFA-A03300F678CD, Name:  Second

If I would execute next query on Sql Server and Sql Compact (databases I support on local client boxes

SELECT * FROM GuidTest ORDER BY ID

I get (not surprising) next results

image

Now, I insert the same data to Azure Table (using Neudisk Azure Storage Explorer is one easy way to do that) and run the same query I get opposite results where row second in sql server is first in Azure table storage.

image

Why it happens?

.NET and Sql Server are having comparing guid values differently – here’s a simple illustration

using System;
using System.Data.SqlTypes;
namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            // .NET guid
            Guid first = Guid.Parse("7240963F-D384-4D78-BADF-A03300F678CC");
            Guid second = Guid.Parse("15DF6719-9671-43D3-BAFA-A03300F678CD");

            // .SQL server guid
            SqlGuid firstSql = new SqlGuid(first);
            SqlGuid secondSql = new SqlGuid(second);

            Console.WriteLine(".NET compare:" + first.CompareTo(second));
            Console.WriteLine("SQL compare:" + firstSql.CompareTo(secondSql));

            Console.ReadKey();
        }
    }
}

And here’s the result illustrating the difference between .NET and Sql Server.

image

What now?

In my case, I am probably going to drop Azure Tables and use SQL Azure because I have a single code shared between the cloud and clients where it does the SelectNewer on SQL clients so the different results would break the cloud data sync code I have.

The other option I consider is to have a custom code for Azure Tables maybe utilizing the mandatory timestamp values but it is probably going to end as too complicated.

Real bummer Azure team choose a different path to follow then the SQL Server one - It was too god to be true

Filed under: Azure Leave a comment
Comments (0) Trackbacks (0)

No comments yet.


Leave a comment

No trackbacks yet.