At &yet, we've always specialized in realtime web apps. We've implemented them in a wide variety of ways and we've consulted with numerous customers to help them understand the challenges of building such apps. The key difference from a conventional web app is that a realtime app needs a way of updating itself without direct intervention from the user.
Growing Pains
What data you send, and how much you send, depends entirely on the application itself. Your choice of transport (polling, long-polling, WebSockets, server-sent events, etc.) is inconsequential as far as updating the page is concerned. App experience and performance are all about the data.
In our earliest experiments, we tightly coupled client logic with the updates, allowing the server side to orchestrate the application entirely. This seems rather "cool," but it ends up being a pain due to lack of separation of concerns. Having a tightly-coupled relationship between client and server means a lot of back and forth, nearly infinite amounts of pain (especially with flaky connections), and too much application orchestration logic.
We moved on to simply giving the new data to the client when things changed, which removed all of the orchestration pain and gave more control to the application. Even this had some interesting pain points when dealing with shifts from offline to online, cache control, and lack of control over memory. Here are some of the problems we ran into:
- When replaying events after a reconnect, the service had to remember what should have been sent, in what order, for that user specifically.
- Similar issues arose when switching subscription states on data.
- Applications that simply couldn't handle all of the data being loaded into the app at once had a lot of timing issues with updates and API calls.
- If your permissions were tricky, pushes became one more place where they had to be checked.
Getting the Hint
Eventually we realized all of these problems could be solved with hinting. Rather than sending the data, we send the data type, the ID, and whether it was an update, a delete, or new data. The application then requests the data it cares about - and only the data it cares about - from the HTTP API.
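To make the shape of a hint concrete, here is a minimal sketch in TypeScript. The field names, the `wantedTypes` list, and the `/api/<type>/<id>` path layout are all assumptions made for illustration; the only thing taken from the description above is that a hint carries a data type, an ID, and the kind of change.

```typescript
// Hypothetical hint shape: the three pieces of information described above.
type HintAction = "create" | "update" | "delete";

interface Hint {
  type: string; // data type, e.g. "task"
  id: string;   // ID of the record that changed
  action: HintAction;
}

// Returns the API path the client would request for a hint, or null when
// there is nothing to fetch: the app does not care about this type, or the
// record was deleted and only needs to be dropped locally.
// The "/api/<type>/<id>" layout is an assumed convention, not a real API.
function pathForHint(hint: Hint, wantedTypes: ReadonlyArray<string>): string | null {
  if (wantedTypes.indexOf(hint.type) === -1) return null;
  if (hint.action === "delete") return null;
  return "/api/" + encodeURIComponent(hint.type) + "/" + encodeURIComponent(hint.id);
}
```

The point of the sketch is that the hint itself never contains the record. The client decides whether to ask, and the HTTP API (with its usual permission checks) decides what to return.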
Now, after a reconnect, we can query for the set of IDs that have changed since we were last connected, plus some extra for good measure, without caring about order at all. Since all of the data comes from the API, we don't have timing issues with API data versus update data, and we have a single source of truth and a true separation of concerns. The application can now keep caches, and mark the cached data as dirty or delete it accurately over time, whether the data is being displayed or just kept handy.
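The cache handling and reconnect catch-up could look something like the sketch below. Everything here is hypothetical: the `HintedCache` class, the `ChangeHint` shape, and `catchUpSince` are illustrative names, and the sketch reads "some extra for good measure" as a time overlap subtracted from the last-seen timestamp, which is one possible interpretation.

```typescript
type ChangeKind = "create" | "update" | "delete";

// Hypothetical hint shape: data type, ID, and kind of change.
interface ChangeHint {
  type: string;
  id: string;
  kind: ChangeKind;
}

interface CacheEntry<T> {
  value: T;
  dirty: boolean; // true once a hint says the cached copy is stale
}

class HintedCache<T> {
  private entries: Record<string, CacheEntry<T>> = {};

  private key(type: string, id: string): string {
    return type + ":" + id;
  }

  put(type: string, id: string, value: T): void {
    this.entries[this.key(type, id)] = { value: value, dirty: false };
  }

  has(type: string, id: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.entries, this.key(type, id));
  }

  // Applies hints in whatever order they arrive and returns the keys that
  // should be refetched from the API. Deletes drop the entry; other hints
  // mark an existing entry dirty; hints for uncached records are ignored.
  apply(hints: ChangeHint[]): string[] {
    for (const hint of hints) {
      const k = this.key(hint.type, hint.id);
      if (hint.kind === "delete") {
        delete this.entries[k];
      } else if (Object.prototype.hasOwnProperty.call(this.entries, k)) {
        this.entries[k].dirty = true;
      }
    }
    return Object.keys(this.entries).filter((k) => this.entries[k].dirty);
  }
}

// Timestamp to ask the API about after a reconnect: the last moment we know
// we were up to date, minus an overlap window (assumed default of 30 seconds).
function catchUpSince(lastHintAtMs: number, overlapMs: number = 30000): number {
  return Math.max(0, lastHintAtMs - overlapMs);
}
```

Because a hint only marks or removes a cached entry, receiving the same hint twice is harmless, and an update and a delete for the same ID give the same result in either order. That is what makes a generous overlap window cheap.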
All in all, simply hinting data changes removes a lot of tricky edge cases and empowers the application developer to control their data effectively. Hinting FTW!
We can help you leverage our expertise on architecture.
Feel free to comment directly to Nathan Fritz @fritzy.