An Introduction to Thoonk!
A persistent (and fast!) system for push feeds, queues, and jobs, leveraging Redis.
As application developers, we persist data in tables which are constantly updated, leaving most of the application’s components and user-interface in the dark until it asks for the data.
[Movie trailer voice] Imagine a world where these tables push change-events to any piece of your application stack, in diverse languages and on multiple servers.[/Movie trailer voice]
Clustering Node.js instances, communicating between service components in different languages and on different machines, forking off asynchronous jobs for reliability and queuing of work, communicating between APIs and views, and sending events to real-time webapps are all problems that can be solved with messaging.
Thoonk solves these problems more gracefully than simple messaging because the messages are change-events on persisted data.
Thoonk is a Redis schema for manipulating advanced, live objects (feeds, sorted-feeds, queues, and job-queues, etc). Thoonk is also a couple of implementations of this schema (currently thoonk.js for Node.js and thook.py for Python).
Thoonk is a lot of things, which I will describe, but really what I would like you to get out of this is what the concept is useful for.
A feed is a list of data entries that have publish, edit, retract, and other events associated with those entries. A feed brings to mind ATOM or RSS to most people, but I think feeds are more useful when the associated events are broadcast on publish-subscribe channels so that data can be synchronized. Redis contains both of the necessary components (object storage and publish-subscribe channels).
Thoonk feeds enable our “live tables” fantasy.
Let’s get specific about Thoonk feed-types.
The basic feed is a list of items sorted by publish time. Verbs on these objects include publish, edit, and retract. Feeds may be configured to have a max-number of items, which when exceeded, drops the oldest items. Every item may have a unique assigned id, or Thoonk will generate one for you.
Sorted-Feeds are similar to feeds, but they have no item limit (beyond practical memory limitations) and are sorted by publishing items relative to existing item ids. Verbs for sorted-feeds include append, prepend, publishBefore, publishAfter, move, edit, and retract. Sorted-feeds emit position updates when an item is published or moved in addition to publish, edit, and retract events.
Queues contain items that can be placed at the beginning or end, producing FIFO and LIFO queues. A queue get is a blocking operation with an optional timeout that pops an item off of the end. Queues can be used for simple messaging and task distribution.
Job channels distribute items in a guaranteed completion manner. Jobs consist of three queues: available jobs, in-flight jobs, and stalled job. Like queues, jobs can be pushed to the beginning or end of available jobs and getting a job is a blocking operation with a timeout. Job verbs include: publish, retract, get, cancel (place an in-flight job back into available-jobs), stall (place a job out of the way that has been a problem), retry (place a stalled job as available).
Sets will be added in the near future as a means for maintaining live filters/queuries for feeds and other data.
An example Thoonk ecosystem:
Thoonk is a tool which allows you create an Internet service as a wide ecosystem rather than a deep application. Say we provide a series of 8 node.js processes to take advantage of the number of CPU threads available. This node.js application provides a websocket interface to a browser-js application with live events coming from Thoonk feeds on Redis, organized by individual users and teams. In another process, we might run a Ruby service that provides a REST interface for manipulating and querying objects within users and groups. Say also that we want to peer certain data with other services -- we can run a Python process which provides XMPP Publish-Subscribe (XEP-0060) and a Java interface which provides a PubsubHubbub interface. In addition to that, background jobs that absolutely have to be done can be pushed through a job system with workers running in C.
All of these separate components subscribe to the feeds pertinent to their function as well as provide relevant ACL and interface to the end-points. You are now free to use the most appropriate tools for the job, distribute load, organize application data, and selectively synchronize state easily. Of course, if you don’t have to have a lot of processes on a lot of servers in a lot of languages, you can still take advantage of compartmentalizing and duplicating your componets.
I find Messaging to be an interesting problem, particularly when machines communicate to share state, make requests, etc. However, messaging has limited use without persistent data, which is why I like XMPP Publish-Subscribe (XEP-0060) so much. Feeds of data -- combining data-persistence with publish-subscribe events about changes to the data, is incredibly valuable in machine-to-machine communication.
This is something that I’ve been applying to clustering, configuration distribution, job distribution and management, and real-time webapps, and other problems for years now in my consulting work.
Then, I discovered Redis, which is a very fast key-store-with-containers database that also includes publish-subscribe, and I immediately knew what I had to build.
I’m publishing this as MIT because I not only want to share it, but I want your feedback, harsh criticism, and contributions. We need more implementations in other languages, and I’d love to see people publish tools that contribute to Thoonk interfaces. In addition, please point out flaws in the contract.txt (schema) document, show us your extensions and own object types, etc.
-Nathan Fritz, &yet Chief Architect