Synchronization has been on my mind recently. I have to build a sync engine, primarily on Apple platforms. While the ultimate goal is to make sure that a user can access and modify data on any device, there are many implementation details which have different tradeoffs.

For instance, how do different devices communicate changes to each other? The general assumption is that a network connection will transmit some form of state. Is the state transmitted via HTTP or some other protocol? Is the complete state of the device transmitted or only the changes made on the device? Should data be synchronized in real time, or can it be synchronized with a delay?

Beyond how the state is transmitted, is each device a complete view of the entire data set? Does each device even have the capacity for all of the data? If each device is only a partial view of the data, is there a single source of truth? How does a device determine what is relevant data if it only has a partial view, and how does it obtain the relevant data?

Do devices have constant and reliable network connectivity? If they do not, how does data get merged together when a device cannot communicate? While the device is unreachable, it may make changes and other devices could make changes as well. If there are conflicting changes made, how do the conflicts get resolved?

These questions and decisions are only a few of the considerations which will have major effects on how synchronization is implemented.

To clarify which choices to make, there are other desired goals when synchronizing data:

  • The data transmitted to and from a device should only be what is absolutely necessary. Network resources can be fairly scarce, especially on mobile devices. Whether on Wi-Fi, on cellular networks, or on wired Ethernet, using the network should be considered relatively expensive. Furthermore, any data sent or received requires power to process, so it is better to only process what is necessary.

  • Synchronization data should be batchable. With network resources being considered scarce, the number of network requests should be limited. Therefore, any data changes should be able to be batched together as one network request (whether data is transmitted to or from a device).

  • Devices should be able to receive a partial set of the synchronization data and make progress towards the latest state. If a device received only half of the data updates required to transition the device to the latest state, it should still be able to process the updates received and transition the local device state. The synchronization updates could be large enough either in quantity or size that a device may need to process multiple batches.

  • The processing of synchronization data should be streamable. Whether there is one other device or hundreds of other devices, a device could need to synchronize a massive number of updates made since the last time the device synced. In a mobile device world, the devices have varying degrees of processing power with limited amounts of memory, so the synchronization data needs to be able to be processed as a stream of data versus processing all of the data at once.

With the above desired goals, a few other properties become desirable:

  • The processing of the synchronization data should be idempotent. A device could receive the same synchronization data multiple times. Either the device needs to be able to identify that the updates have already been processed or processing the data should be idempotent. Using specialized data structures such as conflict-free replicated data types may help simplify processing.

  • Any device should be able to track what changes have been made since a local point in relative time. By keeping track of changes, a device can identify what changes were made locally which need to be transmitted to another device. Furthermore, a device could synchronize with multiple devices (or services) which requires multiple points in relative time to be able to be tracked. Note that relative time may not be a traditional clock time but could be as simple as a counter or a change token.

  • As a corollary, a local device should be able to inform another service or device what data has already been synchronized. Whether it is using (change) tokens or other means, a local device can inform another device what it has already processed to reduce the number of requests and data required to get the local device up to date.

  • Any device (including services) which has the data should have ACID transactions. ACID compliance makes synchronization easier by ensuring updates are actually processed when an operation says they are processed.
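As a rough illustration of how these properties can fit together, the sketch below uses Python's built-in sqlite3 module. The table layout, the `(token, key, value)` update format, and the function names are all assumptions of mine, and it further assumes that each peer hands out change tokens in increasing order. It is one possible shape, not a definitive design.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical wire format: (change token, key, value).
Update = Tuple[int, str, Optional[str]]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the local store: the data itself plus one change token per peer."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS peers (peer TEXT PRIMARY KEY, token INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def last_token(conn: sqlite3.Connection, peer: str) -> int:
    """Return the last change token processed from `peer` (0 if none yet)."""
    row = conn.execute("SELECT token FROM peers WHERE peer = ?", (peer,)).fetchone()
    return row[0] if row else 0


def apply_updates(conn: sqlite3.Connection, peer: str, updates: Iterable[Update]) -> int:
    """Apply a batch from `peer` atomically; return how many updates were new."""
    applied = 0
    with conn:  # commits on success, rolls back if an exception escapes
        token = last_token(conn, peer)
        for update_token, key, value in updates:
            if update_token <= token:
                continue  # already processed, so a replayed update is a no-op
            conn.execute(
                "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)",
                (key, value),
            )
            token = update_token
            applied += 1
        # The token is saved in the same transaction as the data it describes.
        conn.execute(
            "INSERT OR REPLACE INTO peers (peer, token) VALUES (?, ?)",
            (peer, token),
        )
    return applied


# Usage: receiving the same batch twice changes nothing the second time.
conn = open_store()
batch = [(1, "title", "Draft"), (2, "title", "Final")]
first = apply_updates(conn, "server", batch)
second = apply_updates(conn, "server", batch)
```

The value returned by `last_token` is also what the device would send to the other side to say "I have everything up to here", and keeping one row per peer allows several peers to be tracked independently.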

There are many other considerations for synchronization, but hopefully these thoughts give an idea of some of the possible complexities and desired properties when synchronizing data.

Most languages have access control keywords such as public, protected, internal, fileprivate, and private. The intention is to restrict the usage of methods and access to data. In a way, it is the most basic form of encapsulation. After some recent work on a few apps, I’ve come to the conclusion that there are only two forms of access control that should be used in the world: public and internal.

Most of the effort in maintaining these different forms of access control is wasted. If you are in the position to change the code, you can modify the access control from private to public with just a few keystrokes. You may have initially wanted something to be private because you don’t want to accidentally leak the implementation details (even to yourself). Or you don’t trust other developers working on your project to use the data or methods correctly. However, anyone who has write access to the code can change the access control or make other code changes which break implementation preconditions and invariants. Using private is only a small speed bump against bad code.

Instead of maintaining such detailed access control, for libraries, you should use only public and the equivalent of internal if available. internal means any other code in the same module (e.g. package/library) has access to the internal data/function. In the end, libraries have only two forms of access control that anyone cares about. public is for all the consumers of the library. internal is for implementation details that only the library authors should have access to.

Users of your app do not care what the access control is, so let everything be internal. The idea is that while it does not matter today, you may extract code into a re-usable module later. So make things internal in the app, and then go back and expose the required types/methods as public if the code is extracted to a module.

protected, fileprivate, private, and other access control should hardly be used. There have been too many times where I’ve seen people expose the implementation details through leaky abstractions already. Or someone either changes the access control protection to be less restrictive or copies the private code for their uses (which can be even worse). Instead of relying on private or similar access control levels, it seems better to rely on code reviews and discipline. For larger code bases with more than a handful of developers, break the code base into separate modules. Encapsulation should be done at the module/library level rather than in every code file across a monolithic application.

Most of the time, I do believe that actually codifying the intent behind data/methods into the codebase is a best practice, but restrictive access control is not one of those cases. The next time that someone (maybe even yourself) changes the access control level of code you work on, try to imagine a world with only two levels of access control.

Knowing how to program involves learning a programming language. Like languages in general, there are many different kinds of programming languages which offer different (and maybe subtle) structures. Languages can share similar roots (such as Latin-based languages), and therefore some attributes such as grammar. Languages can be a reflection of a culture and also lead to different ways of thinking, so knowing multiple languages is a great benefit to advancing your skills (in programming or beyond).

I saw a recent post asking why someone should learn Swift, so I started thinking about some of the things I’ve learned from programming languages.

Basic and then Visual Basic taught me some of the magic of creating from virtually nothing to a working useful program. In retrospect, it also showed me how easy UI programming could be with a UI builder; programs that can help with programming are very useful tools.

Turbo Pascal taught me general imperative style programming with records. It also made me aware of operators and all these years later, I still wonder about having an explicit := operator versus overloading =.

C taught me about pointers, memory management, heap allocation versus stack allocation, and type casting. It drove home the pains of manual memory management and the random segfault.

C++ taught me about classes and inheritance, polymorphism and virtual methods (e.g. dynamic dispatch versus static), templates, and namespaces. It was the first programming language where I felt that the syntax with all the templates and operator overloading was a bit much, especially when reading the source code for things like Boost.

PHP taught me about the ease with which you can put a website on the web. In retrospect, it also showed the great power of having a single request/response context lifecycle. Not having to worry about multithreaded access to a variable, concurrency deadlocks, etc. makes for a much simpler programming model.

Java taught me about interfaces/protocols, bytecode and virtual machines, garbage collection, package management and distribution (e.g. Maven), schemas and the advantages/disadvantages of XML, multithreading in a server-based environment, exception handling, and enterprise design patterns (for better or worse). JavaDoc was also very influential, as was automated unit testing.

Scheme taught me about functional programming and side-effects. It also showed me how beautiful and ugly parentheses are.

JavaScript taught me the power of what a runtime environment can do and the power of a platform. The prototype-based inheritance is interesting, but JavaScript’s ability to continue to evolve and power so many critical functions today (within just the browser) is amazing. Of course, the actual JavaScript language in the early days of the web was very different from what it is today. The DOM, event handling, closures and capturing of contexts, and callbacks are other things that I learned with JavaScript. Promises and the async/await coroutines are also some recent things I’ve picked up. While all of these are great leaps and strides over the original language, it is really the power of the web platform that has made the continued investment in this language worthwhile.

Objective-C taught me about reference counting and dynamic message passing. From the Cocoa framework, it taught me about the power of immutability versus mutability, practical design patterns, and framework and API design. Grand Central Dispatch (a.k.a. Dispatch) taught me about different ways to think about concurrency.

Ruby taught me about the power of duck typing. While not particular to Ruby, the community around Ruby on Rails taught me about the power of community-derived conventions and common abstractions. It also taught me about monkey patching, and how programming languages/libraries/frameworks should really be meant to be joyful. Ruby is also the language where I learned how TDD is almost a must-have skill for some languages.

Node.js taught me about event loops and how one can really re-purpose a language from one environment to another. It also showed me how micro-libraries and the framework of the hour can both help and harm the community.

Swift is teaching me about protocol oriented programming.

Go is teaching me more about coroutines and different ways of looking at interface design.

Rust has taught me about memory ownership in ways that I like and dislike.

If I were to name the most influential languages, I would have to say C, Scheme, and JavaScript. Each one of them has a fundamental trait (memory management, functional programming, and closures, respectively) that practically defines the usage of the language. Rust may also be along the same lines eventually as I use it more.

From an API/framework point of view, Foundation/Grand Central Dispatch/Cocoa (Objective-C/Swift), and Ruby’s libraries are perhaps the most fun to work with. Swift’s Standard Library and Rust’s have perhaps the most modernized abstractions (which can be great but also frustrating like working with individual characters in Strings even though it is correct and safe).

If I were to suggest a programming language today, JavaScript is something everyone will probably eventually work with. However, for fundamentally changing how you program, Scheme (for functional programming) or Rust (for a modernized take on all the previous systems programming languages) would be my choices, with a nod towards Rust for practicality.