When we launched our Talky videochat service almost two years ago, there was one thing missing: a TURN server.

WebRTC creates peer-to-peer connections between browsers to transfer audio and video data, but sometimes it needs a little help to establish a connection. That is the TURN server’s job: deployed on the public Internet, it relays data between two clients that fail to establish a direct P2P connection (thus avoiding the dreaded “black video” problem). TURN is generally considered one of the hard topics people hit when they start doing WebRTC, and it is crucial to running a successful service.
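
To make that concrete, here is a minimal sketch of standing up a TURN server, assuming the open-source coturn package; the port, user, password, and realm below are placeholders, not Talky’s actual configuration:

```bash
# Install coturn (assumes a Debian/Ubuntu host where the package exists).
sudo apt-get install coturn

# Run a TURN server with long-term credentials. Every value below is a
# placeholder - substitute your own user, password, and realm.
turnserver --listening-port=3478 \
           --lt-cred-mech \
           --user=demo:supersecret \
           --realm=turn.example.com \
           --fingerprint \
           --verbose
```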

Continue reading »

At &yet we have become big fans of CoreOS and Docker, and we’ve recently moved a bunch of our Node applications there.

We wanted to share a little tidbit about something that seems like it should be easy, but ended up being a minor stumbling block: making CoreOS’s Etcd cluster available in Docker.

Here’s a little background for those not up to speed on CoreOS or Etcd. CoreOS bills itself as “Linux for Massive Server Deployments.” CoreOS is usually deployed with Etcd, “a highly-available key value store for shared configuration and service discovery,” giving servers in a cluster the ability to share configuration data. We have been using Etcd and Confd for some time, so we naturally thought: “wouldn’t it be cool to configure applications and services within Docker from Etcd?”
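
To make the question concrete, the usual shape of an answer looks something like this sketch: have etcd listen beyond localhost, and let containers reach it over the docker0 bridge address (the flag shown is from the etcd 0.4 era, the image name is a placeholder, and your bridge IP may differ):

```bash
# Find the docker0 bridge IP on the host; containers can always route to it.
BRIDGE_IP=$(ip -4 addr show docker0 | awk '/inet /{sub(/\/.*$/, "", $2); print $2}')

# Etcd only listens on localhost by default; bind it to all interfaces so
# containers can reach it. (etcd 0.4-style flag; newer etcd releases use
# --listen-client-urls instead.)
etcd -bind-addr="0.0.0.0:4001" &

# Inside a container, point clients at etcd via the bridge address.
# "mycompany/myapp" is a placeholder image name.
docker run -e ETCD_ENDPOINT="http://${BRIDGE_IP}:4001" mycompany/myapp
```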

Continue reading »

You’re looking at your todo list and pondering what code to write during one of the brief moments of free time that appear on your daily schedule, when all of a sudden you get a message in team chat: “Is the site down for anyone else?”

It can be a frustrating experience, but never fear; you’re not alone. We here at &yet experienced this type of outage once before, and then again this week. In fact, nearly every operations team has experienced at least a variation on the above nightmare. It is just a matter of time before you have to deal with people thinking your site or service is down when the problem is really with the Domain Name System (DNS). Even shops that spend a lot of money on DNS vendors who themselves have serious redundancy and scale will eventually fall prey to an orchestrated Distributed Denial of Service (DDoS) attack.
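
When that chat message arrives, a quick triage like this sketch can tell “the site is down” apart from “DNS is down” (example.com, the nameserver, and the IP are placeholders):

```bash
# Step 1: does the name resolve at all? (dig prints nothing on failure)
dig +short example.com

# Step 2: ask an authoritative nameserver directly, bypassing caches.
# ns1.example.com stands in for your real nameserver.
dig +short example.com @ns1.example.com

# Step 3: test the service with DNS taken out of the picture by pinning
# the name to a known-good address (203.0.113.10 is a placeholder).
curl -s -o /dev/null -w "%{http_code}\n" \
     --resolve example.com:443:203.0.113.10 https://example.com/
```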

Continue reading »

A core tenet of any Operations Team is that you must enable developers to change their code with confidence. For the developer this means the flexibility to try new things or to change old, broken code. However, with every code change comes the risk of breaking production systems, which is something Operations has to manage. A great way to balance these needs is to continuously test new code as close to the point of change as possible by reacting to code commits as they happen.

At &yet the majority of the code deployed to production servers is written in NodeJS, so that’s the example I’ll use. NodeJS uses npm as its package manager, and one feature of npm is the ability to define scripts to be run at various stages of the package’s lifecycle. To make full use of this feature we need a way to run those scripts at the point a developer is committing code, as that is the best time to validate and test the newly changed code.
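
One minimal way to wire this up - a sketch, not the full approach from this post - is a git pre-commit hook that runs the package’s test script:

```bash
#!/bin/sh
# .git/hooks/pre-commit (must be executable)
# Runs the test script defined in package.json before every commit;
# a non-zero exit from npm aborts the commit.

npm test || {
    echo "Tests failed - commit aborted." >&2
    exit 1
}
```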

Continue reading »

“If you don’t monitor it, you can’t manage it.”

In the last installment of our Tao of Ops series I pointed out that the above maxim is a variation on the business management saying, “You can’t manage what you can’t measure” (often attributed to Peter Drucker). It has become one of the core principles I try to keep in mind while walking the Operations path.

With that in mind, today I want to tackle testing the TLS certificates that can be found everywhere in any shop doing web-related work - something that needs to be done, and that can be rather involved to do properly.
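
To give a taste of what “involved” means, here is a minimal sketch that checks the certificate a single host is actually serving and warns when expiry is near (the hostname and the 30-day threshold are placeholders):

```bash
#!/bin/sh
# Inspect the certificate a host is serving and warn if it expires soon.
HOST=example.com

# -servername ensures we get the right certificate on SNI hosts.
cert=$(echo | openssl s_client -servername "$HOST" -connect "$HOST:443" 2>/dev/null)

# Report who the certificate is for and when it expires.
printf '%s\n' "$cert" | openssl x509 -noout -subject -enddate

# -checkend exits non-zero if the cert expires within the given seconds.
printf '%s\n' "$cert" | openssl x509 -noout -checkend $((30 * 24 * 3600)) \
    || echo "WARNING: certificate for $HOST expires within 30 days" >&2
```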

Continue reading »

“If you don’t monitor it, you can’t manage it.”

That’s a variation on the business management saying “you can’t manage what you can’t measure” (often attributed to Peter Drucker). The saying might not always apply to business, but it definitely applies to Operations.

There are a lot of tools you can bring into your organization to help with monitoring your infrastructure, but they usually look at things only from the “inside perspective” of your own systems. To truly know if the path your Operations team is walking is sane, you need to also check on things from the user’s point of view. Otherwise you are missing the best chance to fix something before it becomes a problem that leads your customers to take their business elsewhere.
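
Even a check as small as this sketch, run from a host outside your own network, starts to cover that gap (the URL is a placeholder):

```bash
#!/bin/sh
# Outside-in check: fetch the page the way a user would, and alert
# on anything other than a 200 response.
URL="https://example.com/"

result=$(curl -s -o /dev/null -w "%{http_code} %{time_total}" "$URL")
status=${result% *}
elapsed=${result#* }

if [ "$status" != "200" ]; then
    echo "ALERT: $URL returned HTTP $status" >&2
    exit 1
fi
echo "$URL OK in ${elapsed}s"
```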

Continue reading »

Every Operations Team needs to maintain the system packages installed on their servers. There are various paths toward that goal, with one extreme being to track the packages manually - a soul-crushing endeavor that stays tedious even if you drive it with Puppet, Fabric, Chef, or (our favorite at &yet) Ansible.

Why? Because even when you automate, you have to be aware of what packages need to be updated. Automating “apt-get upgrade” will work, yes - but you won’t discover any regression issues (and related surprises) until the next time you cycle an app or service.

A more balanced approach is to automate the tedious aspects and let the Operations Team handle the parts that require a purposeful decision. How the upgrade step is performed, via automation or manually, is beyond the scope of this brief post. Instead, I’ll focus on the first step: how to gather data that can be used to make the required decisions.
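
On Debian or Ubuntu hosts, that data gathering can start as simply as a dry run - a sketch you might wire into cron or an Ansible play:

```bash
# Refresh the package index, then simulate an upgrade without changing anything.
apt-get update -qq

# -s only prints what would happen; the "Inst" lines are pending upgrades.
apt-get -s upgrade | awk '/^Inst/ { print $2 }' | sort > pending-upgrades.txt

# A non-empty list means there are decisions for the Ops team to make.
wc -l < pending-upgrades.txt
```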

Continue reading »

On the &yet Ops Team, we use Docker for various purposes on some of the servers we run. We also make extensive use of iptables so that we have consistent firewall rules to protect those servers against attacks.

Unfortunately, we recently ran into an issue that prevented us from building Dockerfiles from behind an iptables firewall.

Here’s a bit more information about the problem, and how we solved it.

The Problem

When we tried to run docker build on a host using our default DROP-policy iptables rule set, apt-get was unable to resolve repository hosts in Dockerfiles that were FROM ubuntu or debian.
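
A fix generally takes this shape (a sketch, not our exact rule set; eth0 is an assumption - substitute your outbound interface):

```bash
# Let container traffic be forwarded out through the host...
iptables -A FORWARD -i docker0 -o eth0 -j ACCEPT

# ...and let the replies (including DNS answers) come back in.
iptables -A FORWARD -i eth0 -o docker0 -m state --state ESTABLISHED,RELATED -j ACCEPT

# Traffic from containers addressed to the host itself (e.g. a local
# resolver) also needs an INPUT rule under a default DROP policy.
iptables -A INPUT -i docker0 -j ACCEPT
```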

Continue reading »

One of the best tools to use every day for locking down your servers is iptables. (You do lock down your servers, right? ;-)

Not using iptables is akin to putting fancy locks on a plywood door - sure, the lock is secure, but that won’t stop someone from simply breaking through the door.

To this end I use a small set of bash scripts that ensure I always have a baseline iptables configuration, one where items can be added or removed quickly.

Let me outline what they are before we get to the fiddly bits…
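
To set the stage, here is a stripped-down sketch of what the baseline amounts to - default deny, with SSH as the only hole (not the actual scripts, which the full post walks through):

```bash
#!/bin/bash
# baseline.sh - reset to a known-good, default-deny iptables baseline.

# Flush existing rules and chains so we always start from the same state.
iptables -F
iptables -X

# Default policy: drop anything not explicitly allowed in.
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Loopback and established sessions are always fine.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# The one hole in the baseline: SSH.
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
```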

Continue reading »

So Heartbleed happened. If you’re a company or individual with public-facing assets behind anything that uses OpenSSL, you need to respond to this now.

The first thing we had to do at &yet was determine what was actually impacted by this disclosure. We had to make a list of which services are public facing, which services use OpenSSL directly or indirectly, and which services use keys or tokens that are cryptographically generated. It’s easy to update only your web servers, but really that is just one of many steps.
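
The very first check is quick, though not definitive on its own - distros backport fixes, so a version string alone can mislead. A sketch:

```bash
# Which OpenSSL is this host actually running?
openssl version -a

# 1.0.1 through 1.0.1f shipped the vulnerable heartbeat code;
# 1.0.1g (or a distro package rebuilt on/after 2014-04-07) is fixed.
if openssl version | grep -qE '1\.0\.1[a-f]?($| )'; then
    echo "WARNING: possibly vulnerable OpenSSL - verify your distro's patched package." >&2
fi
```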

Here is a list of what you can do to respond to this event.

Continue reading »