Data Synchronization Patterns

Masri, David

doi:10.1007/978-1-4842-4209-4_9

Abstract

Data synchronization jobs are by far the most common type of integration job, during which we take data from one or more systems and move it into another (keeping the data in sync)—in our case, Salesforce. Data can flow from one system to another, and back. Sometimes this causes data conflicts that must be dealt with. Getting synchronizations working right can be tricky, particularly when there are heavy transformations and/or summarizations involved.

Notes

1.
In general, synchronization jobs are for integrations as opposed to migrations, because the need for data synchronization is usually ongoing. This is not to say we can’t use these patterns for migrations when we want to phase out a system gradually (as opposed to turning it off one day) and need to keep data in sync until the legacy system is fully phased out. In addition, if we are migrating large amounts of data and are concerned about the time needed for the migration, we can use one of the incremental patterns to perform the migration in parts. In this way, we only have to load a delta when we are ready to go live.
2.
We examine how to handle this use case in Chapter 13.
3.
In this case, “simultaneously” does not mean at the same moment; it means between job runs. Let’s say your sync job runs once every four hours. If both systems are updated in that same time window, the updates are considered to have been done simultaneously. Keep in mind that the user who made the second update never saw the data entered in the first update. That user was looking at “old” data, because the first update existed only in the first system at the time the second system was updated. This situation is called an uninformed update.
4.
I guess you could flag all records as deleted in lieu of the deletion step, and then unflag them as part of the upsert-all step, which will only unflag the records that exist in the transformed data source.
5.
For a good discussion on the various ways to get “records where not in,” see https://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join .
6.
It is, in fact, so difficult to solve that, during the early 1900s, dozens of patents were being filed to solve the issue. For example: How do you synchronize clocks between two train stations hundreds of miles apart? This is exactly the type of patent Albert Einstein was working on when he was employed as a junior patent officer in the Berne Patent Office. It was as a direct result of thinking about these issues that Einstein realized that simultaneity is not absolute but relative, and his theory of special relativity was born (not to be confused with ~~Universal~~ general relativity, which is way too specialized to be used universally). It’s also why so many of his fun thought experiments involve trains. For more information, see https://www.telegraph.co.uk/culture/books/3601647/Space-time-and-patents.html .
7.
And if they do, it will usually be in a data structure that’s not easy to use, such as an audit log.
8.
Coalesce takes in any number of parameters and returns the first one that is not null.
9.
For more information, see https://stackoverflow.com/questions/1843451/why-does-null-null-evaluate-to-false-in-sql-server .
10.
Just to clarify this again, I do think systems are very good at tracking which records have changed, but not great at reporting the relevant changes when transformations (with lots of joins) are involved. These are often very complex queries to write.
11.
Also, as a general rule, I tend not to trust systems I don’t control.
12.
See the discussion in Chapter 3 on the Salesforce recycle bin.
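
Notes 5, 8, and 9 touch on three small techniques that are easier to see in code than in prose: finding "records where not in," coalescing nulls, and the way a SQL-style comparison of two nulls fails to report them as equal. The following is a minimal Python sketch of those ideas, not the chapter's own implementation. All function names are hypothetical, and the empty-string sentinel is an assumption that only makes sense when a null and an empty string should be treated as the same value.

```python
from typing import Any, Hashable, Iterable, Optional, Set


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first argument that is not None (the behavior described in note 8)."""
    for value in values:
        if value is not None:
            return value
    return None


def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """Mimic SQL's three-valued '=': a comparison involving NULL is unknown.

    None stands in for both NULL and the 'unknown' result, which a WHERE
    clause treats as not true (the behavior note 9 points to).
    """
    if a is None or b is None:
        return None
    return a == b


def null_safe_changed(old: Any, new: Any, sentinel: Any = "") -> bool:
    """Report whether a field changed, coalescing nulls to a sentinel first.

    Assumption: the sentinel never occurs as a meaningful value, or it is
    acceptable to treat it as equivalent to null.
    """
    return coalesce(old, sentinel) != coalesce(new, sentinel)


def records_not_in(source_keys: Iterable[Hashable],
                   target_keys: Iterable[Hashable]) -> Set[Hashable]:
    """Anti-join: keys that exist in the target but not in the source.

    These are the candidates for deletion (or for staying flagged as
    deleted, in the variation described in note 4).
    """
    return set(target_keys) - set(source_keys)
```

For example, records_not_in(["a", "b"], ["a", "b", "c"]) yields the single key "c", and null_safe_changed(None, None) reports no change even though sql_equals(None, None) does not report the two values as equal.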

Author information

Authors and Affiliations

Brooklyn, NY, USA
David Masri

About this chapter

Cite this chapter

Masri, D. (2019). Data Synchronization Patterns. In: Developing Data Migrations and Integrations with Salesforce. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4209-4_9

DOI: https://doi.org/10.1007/978-1-4842-4209-4_9
Published: 19 December 2018
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4208-7
Online ISBN: 978-1-4842-4209-4
eBook Packages: Professional and Applied Computing; Professional and Applied Computing (R0); Apress Access Books
