Last spring, I lead the Allen Park Neighborhood Center (APNC) team at the annual Lansing Michigan Give Camp during a very sleep-deprived weekend. At the event, more than 100 developers and designers donated their time to create a dozen Web sites and database applications for local charities. The APNC has a service program in which volunteers canvas the entire neighborhood every spring. They survey residents on numerous community health and well-being measures and then work to connect people with available services. The APNC needed a way to collect these surveys while out in the neighborhood and then synchronize the data back to the SQL Server database.
This type of scenario occurs at many businesses and non-profit organizations. A central SQL Server 2008 Standard (or Enterprise) instance will synchronize to multiple SQL Server 2008 Express instances at remote locations, perhaps connected over slower network lines, with the need to take the data offline on laptops or tablets at each remote location. Once the data is updated on the laptop, it needs to be synchronized with the SQL Server 2008 Express instance, and the Express instance is then synchronized with the central SQL Server 2008 Standard/Enterprise instance. The code in this article is a sample database synchronization library for this common scenario - the need to synchronize SQL Server between server editions and offline databases. The sample application is written as a hybrid C# and Visual Basic client.
Synchronization might appear relatively straightforward, but it can quickly become complicated and problematic, driving the need for a tool such as the Microsoft Sync Framework. The Microsoft Sync Framework 2.1 Software Development Kit (SDK) is available here.
Synchronization typically refers to file or database row-level synchronization. Most of us have used applications with offline synchronization such as Microsoft Outlook or Live Mesh, or third-party applications such as Dropbox or Evernote. These applications work well and can lull us into believing that synchronization is easy, when it's quite the opposite. Synchronization is hard for the developer, particularly when multiple nodes are independently updating a master database.
How can records be created at each node so that primary keys remain truly unique across all nodes? If multiple nodes attempt to update the same row, how are physical and logical conflicts resolved? The obvious conflict is if two nodes modify the same record. Often, both updates are permitted, if they impact different columns. But what if one node updates the first name and another node updates the last name in the same row? It's not a physical column conflict, but it's reasonable to believe that the nodes were not referring to the same person. Therefore, the updates present a logical conflict. These conflicts need to be detected and addressed.
Synchronization loops can occur in poorly designed architectures when one node synchronizes a change with one or more nodes. The same update is ultimately received back by that node as new data from another node, which is unaware that the receiving node already has the update. This problem is avoided by carefully defining the synchronization topology; a hub and spoke arrangement is often used to prevent loops.
Each table to be synchronized must have a unique primary key for each row. New records must generate primary key values that are unique across all nodes of the master database. If the schema for all nodes is exactly the same, the commonly used Identity Seed integer primary keys can...