This happens to be about Drupal sites, but isn’t really Drupal-specific. If you must know, the custom classes I refer to later are Migrate module migration classes, with the JSON source plugin.
I’m working on a project (let’s call it the Satellite site) that shares data with a Parent site. Content editors on the Parent site work on items, and those items get synced every few minutes to the Satellite site. How they get synced is an integral part of this story. As part of our Drupal 6 to 7 upgrade, we already have some URLs on Parent that let us list which items have changed since a given timestamp, and get the raw data dump for a given item. We have a nightly build that calls all those URLs and has some custom classes to process the data into the new Drupal 7 format; Satellite extends those classes to add some extra logic (a sketch of the two calls follows below). The problem my PM handed me was that new items would transfer over fine from Parent to Satellite, but updates on Parent wouldn’t come over. I found and fixed the following problems:
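To make the setup concrete, here’s a minimal sketch of the two calls as Satellite’s migration cron might make them. The endpoint paths, field names, and variable name are my own illustrations, not the actual project’s code:

```php
<?php
// Hypothetical endpoint paths and field names, for illustration only.
$parent = 'https://parent.example.com';
$since  = (int) variable_get('satellite_last_sync', 0);

// Step 1: ask Parent which item IDs have changed since the stored timestamp.
$response = drupal_http_request("$parent/export/changed?since=$since");
$changed  = drupal_json_decode($response->data);

// Step 2: pull the raw data dump for each changed item.
foreach ($changed['ids'] as $id) {
  $response = drupal_http_request("$parent/export/item/$id");
  $item = drupal_json_decode($response->data);
  // ... hand $item off to the Migrate JSON source plugin / custom classes ...
}
```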
On the Parent site, to save processing during our nightly upgrade test build, each item’s data is written out to a file. When an item’s URL is called, it first looks for a file named with the item’s ID and, if it’s there, returns the JSON data within. If it doesn’t exist, it writes the file and returns the JSON content. Since the upgrade build is under major construction, we don’t rebuild each file when an item changes; instead, a batch script regenerates all the files, because what we mostly care about for now is getting the structure migrated correctly. When we go live, we’ll lock down content edits and re-run the batch script to get the latest data. But for Satellite’s purposes, it needs the freshest data, so I added code on Parent to accept a query string parameter that forces generation of a new file when passed in (sketched below).
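Here’s a rough sketch of what that file-backed endpoint looks like with the new parameter. The parameter name, file location, and builder function are assumptions on my part, not the real code:

```php
<?php
// Hypothetical page callback for the item data URL on Parent (Drupal 7).
function parent_export_item($id) {
  $file = "private://export/item-$id.json";

  // ?refresh=1 forces a rebuild of the cached file (the fix described above).
  if (!empty($_GET['refresh']) || !file_exists($file)) {
    // parent_export_build_item() is a hypothetical stand-in for the real
    // code that assembles the item's data in the new Drupal 7 format.
    $json = drupal_json_encode(parent_export_build_item($id));
    file_put_contents($file, $json);
  }
  else {
    $json = file_get_contents($file);
  }

  drupal_add_http_header('Content-Type', 'application/json');
  print $json;
  drupal_exit();
}
```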
Ok, but the updates were still not coming over. The next problem I found was a truly nasty logic bug, one that didn’t occur to me when I architected this setup, nor to the engineer who followed my instructions and actually did the work. Data migrations occur in two steps: first, we call a URL saying ‘get me a list of the item IDs of everything that’s changed since X’, where X is a unix timestamp of the last time an update was run; then, we loop through each of those items and call their individual item URLs. After successful completion of a migration cron cycle, we store the current timestamp for the next round. While debugging why the updates weren’t happening, I looked at the item list URL and compared the timestamp to the last changed date for a test item. And there was the problem… Parent and Satellite are on two completely different servers (and hosts, for that matter). Satellite’s clock happens to be ahead of Parent’s, so the ‘last migrated’ timestamp we were storing on Satellite for the next round was actually ahead of Parent’s current time. That meant no item on Parent could have an updated time more recent than the stored timestamp until Parent’s clock caught up to it. So I rewrote the code on Satellite to request the next round of items as everything newer than the most recent updated time among the items in the previous batch.
For example, let’s say a regular sync iteration runs at 1:50, and the last batch contained 3 items with updated times of 1:23, 1:25, and 1:22. Under the original code, I’d store 1:50 as the ‘since’ timestamp (i.e. the next round would request ‘get me all items that have been updated since 1:50’), so anything updated on Parent between 1:25 and 1:49 would miss out. Under the corrected code, I’d store 1:25 (see the sketch below).
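In code, the fix amounts to deriving the next ‘since’ value from the batch itself instead of from Satellite’s clock. A minimal sketch, with hypothetical variable and field names:

```php
<?php
// Buggy version: trusts Satellite's clock, which ran ahead of Parent's.
// variable_set('satellite_last_sync', REQUEST_TIME);

// Fixed version: store the newest 'changed' time Parent actually reported,
// so clock skew between the two servers can't open a gap.
$max_changed = 0;
foreach ($items as $item) {  // $items = the batch that was just migrated
  $max_changed = max($max_changed, (int) $item['changed']);
}
if ($max_changed) {
  variable_set('satellite_last_sync', $max_changed);
}
```

The trade-off is that anything updated at exactly 1:25 gets requested again on the next round, but in our setup re-importing an item just updates it rather than creating a duplicate.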
A couple of people who know this story have since asked why we don’t just keep the clocks synced through the standard tools (NTP, for example). We don’t have control over systems-level stuff on Parent, for one. Also, I needed a solution now, before the content editors staged a witch hunt. And hell, I can do basic sysadmin stuff, but y’know, I’m not a sysadmin; I didn’t even know there were options for syncing server clocks until said people mentioned it.
And yet that still didn’t solve the lack of updates! WTF! The last problem was relatively straightforward. I first tried pulling the item data URL from Parent in the browser, and the latest data was there. Then I remembered that I had a logged-in session on Parent, and logged-in users bypass Parent’s Varnish cache. So I pulled up an Incognito tab in Chrome, tried the same URL, and sure enough, the old data came back instead. A coworker reminded me that you can bypass Varnish by putting a unique query string parameter in the URL. I added the current timestamp to the item data URL and was able to get the latest changes (sketched below).
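The cache-buster itself is a one-liner; Varnish’s default hashing includes the query string, so a unique parameter guarantees a cache miss. A sketch combining it with the refresh parameter from earlier (URL and parameter names hypothetical):

```php
<?php
// Append the current timestamp as a throwaway parameter so Varnish treats
// every request as a unique URL and never serves a stale cached copy.
$url = "https://parent.example.com/export/item/$id?refresh=1&_=" . time();
$response = drupal_http_request($url);
$item = drupal_json_decode($response->data);
```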
Long story short, what a pain.