When I did my first Azure BE project from Access, I quickly noticed that if I had more than one or two comboboxes on a form, it took forever just to change records, because the row sources of those combos were requeried for each new record along with the usual form data. Ditching the comboboxes wasn’t really an option, so I decided that if I was going to be working with a remote BE, I’d better implement some sort of data caching, at least for the stuff that stays mostly static.

It’s not an elaborate setup by any means: for each server table whose data I want to cache, I keep a snapshot table in a sideload database (e.g., a temp Access database in the FE folder), named “Snapshot_[ServerTableName]”. There are a couple of rules I set out for myself to make things easier across the board:

There will be one module providing a single entry point for managing the data caching.
It will only support one-way data flow: updates are made to the server, and the cache is refreshed from it. I will not make updates to the snapshot data – this is not a replication feature.
The snapshot tables will have all of the same fields as the server tables, with the lone exception of the server’s ROWVERSION field, and all of those fields will have identical names.
Getting the data into the local snapshots is done through a DataCache module, with one public method and an enum of the tables that can be updated.
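A minimal sketch of what that module might look like is below; the enum members and table names are illustrative placeholders rather than a real schema, and CopySnapshot is the copy routine sketched a bit further down:

```vba
'--- Module: DataCache (sketch; names are placeholders) ---
Option Compare Database
Option Explicit

Public Enum CacheTable
    ct_All = 0
    ct_Customers
    ct_Products
    ct_OrderStatuses
End Enum

'The single public entry point: reload every cached table, or just one of them.
Public Sub ReloadCache(Optional ByVal Which As CacheTable = ct_All)
    If Which = ct_All Or Which = ct_Customers Then CopySnapshot "Customers"
    If Which = ct_All Or Which = ct_Products Then CopySnapshot "Products"
    If Which = ct_All Or Which = ct_OrderStatuses Then CopySnapshot "OrderStatuses"
End Sub
```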

One of the nice things about breaking things down this way is that we can, at any point within the application, request a reload of the cache for a single table when we’ve changed some data that we know might affect it.

Now we have a single place to call to manage the cached data, and it can be expanded as we add more functionality. Since the request is broken down into private functions/subs based on the input parameters, let’s look at how to actually move the data. The best performance I’ve found for copying a table’s contents from an Azure BE to a local Access table comes from a pair of DAO recordsets: a source opened as a snapshot and a target opened as a dynaset. Because the snapshot tables have identical field names and correctly mapped types, a single procedure covers all of them.
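Here’s a sketch of that procedure, assuming the server tables are linked in the FE under their own names and the local cache tables are reachable as Snapshot_[ServerTableName] (error handling omitted):

```vba
'Copy the full contents of a linked server table into its local cache table.
'Sketch only: assumes matching field names and no ROWVERSION field on the local side.
Private Sub CopySnapshot(ByVal ServerTable As String)
    Dim db As DAO.Database
    Dim rsSource As DAO.Recordset
    Dim rsTarget As DAO.Recordset
    Dim fld As DAO.Field

    Set db = CurrentDb

    'Clear the existing cache for this table, then reload everything
    db.Execute "DELETE FROM [Snapshot_" & ServerTable & "]", dbFailOnError

    Set rsSource = db.OpenRecordset(ServerTable, dbOpenSnapshot)              'read-only source
    Set rsTarget = db.OpenRecordset("Snapshot_" & ServerTable, dbOpenDynaset) 'updatable target

    Do Until rsSource.EOF
        rsTarget.AddNew
        'Field names match by convention, so copy by name; because the cache table
        'has no ROWVERSION field, looping the target's fields skips it automatically.
        For Each fld In rsTarget.Fields
            fld.Value = rsSource.Fields(fld.Name).Value
        Next fld
        rsTarget.Update
        rsSource.MoveNext
    Loop

    rsTarget.Close
    rsSource.Close
End Sub
```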

This tends to work pretty well and is reasonable as part of an application load time: maybe 3 to 5 seconds for 20,000-30,000 rows of data. Put a progress meter on a splash form and the user wait time is usually quite forgivable while the application loads.

While this tends to be “good enough”, why pull 20k+ records down the wire if there’s no reason to? We ought to have some way of telling whether the server data has actually changed before we pull it all in.

One means of doing so would be to use each source record’s DateCreated and DateModified fields (assuming you track these, of course). The idea is to store, in a local table, the date the cached data was last refreshed, then check the server’s DateCreated field: if it’s later than the local last-cached date, INSERT those records. Next comes the DateModified field, this time using an UPDATE (only needed for records that weren’t just INSERTed locally).

The problem with this approach is twofold, and one of the two is a killer. The non-killer: instead of the straightforward “delete all and reload all” approach described above, we’d have to get into comparing IDs to know what to update. This isn’t a huge deal, but it does add a significant layer of logical complexity to what was otherwise an easy task. Furthermore, the terrible performance of heterogeneous queries means we don’t want to update on a join between the server and local tables; instead we’d have to load the changed rows into an entirely separate local staging table, then join the staging table and the cache table to update the records in a set-based manner for decent performance. All feasible for sure, but the second problem – the killer – is that this whole setup provides no means of handling records that have been deleted from the server.
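To make that extra plumbing concrete, here is a rough sketch of the row-level variant with entirely made-up names (a Customers table linked from the server plus local Staging_Customers and Snapshot_Customers tables); it’s shown to illustrate the moving parts, not as a recommendation:

```vba
'Row-level refresh sketch (names are placeholders). LastCacheDate would come from a
'local tracking table; error handling and transaction wrapping are omitted.
Private Sub RowLevelRefresh(ByVal LastCacheDate As Date)
    Dim db As DAO.Database
    Set db = CurrentDb

    '1. Pull only new/changed rows from the linked server table into a local staging table
    db.Execute "DELETE FROM Staging_Customers", dbFailOnError
    db.Execute "INSERT INTO Staging_Customers (CustomerID, CustomerName, DateCreated, DateModified) " & _
               "SELECT CustomerID, CustomerName, DateCreated, DateModified FROM Customers " & _
               "WHERE DateCreated >= #" & Format(LastCacheDate, "yyyy-mm-dd hh:nn:ss") & "# " & _
               "OR DateModified >= #" & Format(LastCacheDate, "yyyy-mm-dd hh:nn:ss") & "#", dbFailOnError

    '2. Set-based UPDATE of the cache from the staging table (local-to-local join, so it's fast)
    db.Execute "UPDATE Snapshot_Customers AS c INNER JOIN Staging_Customers AS s " & _
               "ON c.CustomerID = s.CustomerID " & _
               "SET c.CustomerName = s.CustomerName, c.DateModified = s.DateModified", dbFailOnError

    '3. INSERT any staged rows the cache doesn't have yet
    db.Execute "INSERT INTO Snapshot_Customers (CustomerID, CustomerName, DateCreated, DateModified) " & _
               "SELECT s.CustomerID, s.CustomerName, s.DateCreated, s.DateModified " & _
               "FROM Staging_Customers AS s LEFT JOIN Snapshot_Customers AS c " & _
               "ON s.CustomerID = c.CustomerID WHERE c.CustomerID IS NULL", dbFailOnError

    '...and there is still nothing here that notices rows deleted on the server.
End Sub
```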

Now, we could get into some routines to verify the local cache against the server records and determine which might have been deleted, but I really can’t imagine that this would be worth the trouble, mainly because of the performance implications of such a comparison.

A different approach, and the one I tend to prefer, hits a middle ground between complexity and extra data across the wire while keeping performance high. It’s a relatively simple matter of logging the last modified time at the table level, then checking those values before reloading the cache.

First we’ll identify which server tables will be locally cached, though this is for documentation purposes only and not part of any logic. Then we’ll create three triggers on each of those tables: one each for DELETE, INSERT, and UPDATE. These triggers simply update a log table on the server with the table name and a datestamp.
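As a rough illustration, here’s one way the INSERT trigger could be created from Access through a temporary pass-through QueryDef; the log table (dbo.TableChangeLog with TableName and LastModified columns), the trigger naming, and the connection string are all assumptions, and the UPDATE and DELETE triggers would differ only in name and event:

```vba
'Sketch: create the AFTER INSERT trigger for one server table (placeholder names).
Public Sub CreateChangeTrigger(ByVal ServerTable As String, ByVal ConnectString As String)
    Dim qdf As DAO.QueryDef
    Dim sql As String

    'Upsert a single row per table in the server-side log
    sql = "CREATE TRIGGER trg_" & ServerTable & "_Insert ON dbo." & ServerTable & " AFTER INSERT AS " & _
          "BEGIN " & _
          "  SET NOCOUNT ON; " & _
          "  UPDATE dbo.TableChangeLog SET LastModified = SYSUTCDATETIME() " & _
          "    WHERE TableName = '" & ServerTable & "'; " & _
          "  IF @@ROWCOUNT = 0 " & _
          "    INSERT INTO dbo.TableChangeLog (TableName, LastModified) " & _
          "    VALUES ('" & ServerTable & "', SYSUTCDATETIME()); " & _
          "END"

    'A temporary pass-through query sends the DDL straight to the server
    Set qdf = CurrentDb.CreateQueryDef("")
    qdf.Connect = ConnectString      'e.g. "ODBC;DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;"
    qdf.ReturnsRecords = False
    qdf.SQL = sql
    qdf.Execute dbFailOnError
End Sub
```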

Then, on the local application side, we’ll grab the list of tables whose last-modified stamp is later than our last local cache refresh, and run the original sync code for only those tables. Aside from a bit of server work to handle the triggers (a stored procedure that creates the triggers for you is a definite plus), everything remains very easy to handle, performance stays at 3-5 seconds at worst, and any tables that haven’t been modified since our last cache load are skipped entirely.
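Here’s a sketch of that client-side check, again with placeholder names: assume the server log is linked in the FE as TableChangeLog and a local tblCacheLog table records when each cache table was last loaded.

```vba
'Reload only the cache tables whose server-side change stamp is newer than our local one.
'Sketch only; table and field names are placeholders.
Public Sub ReloadStaleCaches()
    Dim db As DAO.Database
    Dim rsChanged As DAO.Recordset
    Dim rsLog As DAO.Recordset

    Set db = CurrentDb

    'The server log holds one row per cached table, so this mixed local/linked join stays cheap.
    Set rsChanged = db.OpenRecordset( _
        "SELECT l.TableName, s.LastModified FROM tblCacheLog AS l " & _
        "INNER JOIN TableChangeLog AS s ON l.TableName = s.TableName " & _
        "WHERE s.LastModified > l.LastCached OR l.LastCached IS NULL", dbOpenSnapshot)

    Do Until rsChanged.EOF
        CopySnapshot rsChanged!TableName        'same dual-recordset copy routine as before

        'Record the server's own stamp so client/server clock differences never matter
        Set rsLog = db.OpenRecordset( _
            "SELECT * FROM tblCacheLog WHERE TableName = '" & rsChanged!TableName & "'", dbOpenDynaset)
        rsLog.Edit
        rsLog!LastCached = rsChanged!LastModified
        rsLog.Update
        rsLog.Close

        rsChanged.MoveNext
    Loop

    rsChanged.Close
End Sub
```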

We’re still bringing more data over the wire than is strictly necessary, but the payoff in simpler logic (compared with bolting delete tracking onto the first approach described) is generally sufficient for most applications. On the other hand, if you’ve got 100 users all loading 50 or so cache tables at roughly the same time of morning against the same database, you might expect a bit of throttling from the remote DB server. Cases like that would call for a more granular approach to what data gets cached and how it’s synced to the local store.

Happy data management!
-jack