&yet Blog

● posted by Bear

Every Operations Team needs to maintain the system packages installed on their servers. There are various paths toward that goal, with one extreme being to track the packages manually - a tedious, soul-crushing endeavor even if you automate it using Puppet, Fabric, Chef, or (our favorite at &yet) Ansible.

Why? Because even when you automate, you have to be aware of what packages need to be updated. Automating “apt-get upgrade” will work, yes - but you won’t discover any regression issues (and related surprises) until the next time you cycle an app or service.

A more balanced approach is to automate the tedious aspects and let the Operations Team handle the parts that require a purposeful decision. How the upgrade step is performed, via automation or manually, is beyond the scope of this brief post. Instead, I’ll focus on the first step: how to gather data that can be used to make the required decisions.

Gathering Data

The first step is to find out what packages need to be updated. To do that we will use the operating system’s package manager. For the purposes of this post I’ll use the apt utility for Debian/Ubuntu and yum for Red Hat/CentOS.

apt-get -s dist-upgrade
yum list updates

Apt will return output that looks like this:

Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
  libxfixes-dev
The following packages will be upgraded:
  base-files openssl tzdata
3 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Inst base-files [6.5ubuntu6.7] (6.5ubuntu6.8 Ubuntu:12.04/precise-updates [amd64])
Conf base-files (6.5ubuntu6.8 Ubuntu:12.04/precise-updates [amd64])
Inst tzdata [2014c-0ubuntu0.12.04] (2014e-0ubuntu0.12.04 Ubunt
Inst openssl [1.0.1-4ubuntu5.14] (1.0.1-4ubuntu5.17 Ubuntu:12.04/precise-security [amd64])

Yum will return output that contains:

Updated Packages
audit.x86_64       2.2-4.el6_5       rhel-x86_64-server-6
audit-libs.x86_64  2.2-4.el6_5       rhel-x86_64-server-6
avahi-libs.x86_64  0.6.25-12.el6_5.1 rhel-x86_64-server-6

Both of these tools provide the core data we need: package name and version. Apt even gives us a clue that it’s a security update - the presence of “-security” in the repo name. I imagine that yum can also provide that; I just haven’t found the proper command line argument to use.
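That said, yum’s security plugin looks like the likely candidate - this is a hedged pointer rather than something the rest of this post relies on:

# requires the security plugin (yum-plugin-security on RedHat/CentOS 6)
yum --security check-update

# newer yum releases expose the same data via updateinfo
yum updateinfo list security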

The Next Step

Having this data is still not enough - we need to gather, store, and then process it. To that end I’ll share a small Python program to parse the output from apt so the data can be stored. At &yet we use etcd for storage, but any backend data store will suffice. Processing the data for each server reflects the second step of our path - reducing the firehose of data into actionable parts that can then be carried along the path to the next step.

#!/usr/bin/env python
import json
import datetime
import subprocess
import etcd

hostname = subprocess.check_output(['uname', '-n']).strip()  # uname -n prints the hostname; strip the newline
ec       = etcd.Client(host='127.0.0.1', port=4001)
normal   = {}
security = {}
output   = subprocess.check_output(['apt-get', '-s', 'dist-upgrade'])
for line in output.split('\n'):
    if line.startswith('Inst'):
        # "Inst" lines describe a pending upgrade, e.g.
        # Inst openssl [1.0.1-4ubuntu5.14] (1.0.1-4ubuntu5.17 Ubuntu:12.04/precise-security [amd64])
        items      = line.split()
        pkgName    = items[1]
        oldVersion = items[2][1:-1]   # strip the surrounding [ ]
        newVersion = items[3][1:]     # strip the leading (
        if '-security' in line:
            security[pkgName] = { 'old': oldVersion, 'new': newVersion }
        else:
            normal[pkgName] = { 'old': oldVersion, 'new': newVersion }
data = { 'timestamp': datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
         'normal': normal,
         'security': security,
       }
key = '/packages/%s' % hostname
ec.write(key, json.dumps(data))

When you run this, you will get an entry in etcd for each server, with a list of packages that need updating.

The remaining steps along the path are now attainable because the groundwork is done - for example, you can write other cron jobs to scan that list, check the timestamp, and produce a report for all servers that need updates. Heck, you can even use your trusty Ops Bot to generate an alert in your team chat channel if a server has gone more than a day without being checked or having a security update applied.
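As a rough illustration of that reporting step, here is a minimal sketch. It assumes the same python-etcd client and /packages/<hostname> layout used above; the one-day staleness threshold is just a placeholder.

#!/usr/bin/env python
import json
import datetime
import etcd

STALE = datetime.timedelta(days=1)   # placeholder threshold

ec  = etcd.Client(host='127.0.0.1', port=4001)
now = datetime.datetime.now()

for node in ec.read('/packages', recursive=True).children:
    host = node.key.split('/')[-1]
    data = json.loads(node.value)
    seen = datetime.datetime.strptime(data['timestamp'], "%Y-%m-%d %H:%M:%S")
    if now - seen > STALE:
        print('%s: not checked since %s' % (host, data['timestamp']))
    if data['security']:
        print('%s: %d security update(s) pending: %s'
              % (host, len(data['security']), ', '.join(data['security'])))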

The point is this - if you’re not monitoring, you are guessing. The tool above enables you to monitor your installed package environment and that’s the first step along the many varied paths toward mastering your server environments.

● posted by Marcus Stong

On the &yet Ops Team, we use Docker for various purposes on some of the servers we run. We also make extensive use of iptables so that we have consistent firewall rules to protect those servers against attacks.

Unfortunately, we recently ran into an issue that prevented us from building Dockerfiles from behind an iptables firewall.

Here’s a bit more information about the problem, and how we solved it.

The Problem

When trying to run docker build on a host that uses our default DROP policy iptables set, apt-get was unable to resolve repository hosts in Dockerfiles that were FROM ubuntu or debian.

Any apt-get command would result in something like this:

Step 1 : RUN apt-get update
 ---> Running in 64a37c06d1f4
Err http://http.debian.net wheezy Release.gpg
  Could not resolve 'http.debian.net'
Err http://http.debian.net wheezy-updates Release.gpg
  Could not resolve 'http.debian.net'
Err http://security.debian.org wheezy/updates Release.gpg
  Could not resolve 'security.debian.org'

To figure out what was going wrong, we logged all dropped packets in iptables to syslog like this:

# Log dropped outbound packets
iptables -N LOGGING
iptables -A OUTPUT -j LOGGING
iptables -A INPUT -j LOGGING
iptables -A FORWARD -j LOGGING
iptables -A LOGGING -m limit --limit 2/min -j LOG --log-prefix "IPTables-Dropped: " --log-level 4
iptables -A LOGGING -j DROP

The logs quickly showed that the docker0 interface was trying to FORWARD port 53 to the eth0 interface. In our case, the default FORWARD policy is DROP, so essentially iptables was dropping Docker’s requests to forward the DNS port to the public interface and Internet at large.

Since Docker couldn’t resolve the domain names where the Dockerfiles were located, it couldn’t retrieve the data it needed.

A Solution

Hmm, so we needed to allow forwarding between docker0 and eth0, eh? That’s easy! We just added the following rules to our iptables set:

# Forward chain between docker0 and eth0
iptables -A FORWARD -i docker0 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o docker0 -j ACCEPT

# IPv6 chain if needed
ip6tables -A FORWARD -i docker0 -o eth0 -j ACCEPT
ip6tables -A FORWARD -i eth0 -o docker0 -j ACCEPT

Add or alter these rules as needed, and you too will be able to build Dockerfiles properly behind an iptables firewall.

● posted by Bear

One of the best tools to use every day for locking down your servers is iptables. (You do lock down your servers, right? ;-)

Not using iptables is akin to having fancy locks on a plywood door - it may look secure, but you just can’t be sure that someone won’t break through.

To this end I use a small set of bash scripts that ensure I always have a baseline iptables configuration and items can be added or removed quickly.

Let me outline what they are before we get to the fiddly bits…

  • checkiptables.sh — A script to compare your saved iptables config in /etc/iptables.rules to what is currently being used. Very handy to see if you have any local changes before modifying the global config.
  • iptables-pre-up — A Debian/Ubuntu-centric script that runs when your network interface comes online to ensure that your rules are active on restart (a minimal example is sketched after this list). RedHat/CentOS folks don’t need this.
  • iptables.sh — The master script that sets certain defaults and then loads any inbound/outbound scripts.
  • iptables_*.sh — Bash scripts, one per rule needed to allow inbound/outbound traffic, that are very easy to generate using templates. I use a naming pattern to make them unique within the directory.
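Since iptables-pre-up isn’t shown below, here is a minimal sketch of what such a script typically looks like on Debian/Ubuntu. It assumes you save your rules to /etc/iptables.rules and install the script (executable) as /etc/network/if-pre-up.d/iptables-pre-up:

#!/bin/sh
# restore the saved rules before the interface comes up
/sbin/iptables-restore < /etc/iptables.rules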

The remaining scripts should be placed into your favourite local binary directory, for example:

/opt/sbin
  /checkiptables.sh
  /iptables.sh
  /iptables_conf.d/
    iptables_*.sh

checkiptables.sh

#!/bin/bash
# generate a list of active rules and remove all the cruft
iptables-save | sed -e '/^[#:]/d' > /tmp/iptables.check
if [ -e /etc/iptables.rules ]; then
  cat /etc/iptables.rules | sed -e '/^[#:]/d' > /tmp/iptables.rules
  diff -q /tmp/iptables.rules /tmp/iptables.check
else
  echo "unable to check, /etc/iptables.rules does not exist"
fi

That is really it - the magic is in the sed portion, which removes all of the stuff that iptables-save outputs that isn’t related to rules and often changes between runs. The rest of the script diffs the saved state against the current state. If the current state has been modified you will see this output:

Files /tmp/iptables.rules and /tmp/iptables.check differ

iptables.sh

#!/bin/bash
PUBLICNET=eth2

iptables -F
ip6tables -F
ip6tables -X
ip6tables -t mangle -F
ip6tables -t mangle -X

# Default policy is drop
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP

ip6tables -P INPUT DROP
ip6tables -P OUTPUT DROP
ip6tables -P FORWARD DROP

# Route new SSH connections through fail2ban's chain (fail2ban creates fail2ban-ssh)
iptables -A INPUT -p tcp -m multiport --dports 22 -j fail2ban-ssh
iptables -A fail2ban-ssh -j RETURN
# Allow localhost
iptables -A INPUT  -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
ip6tables -A INPUT  -i lo -j ACCEPT
ip6tables -A OUTPUT -o lo -j ACCEPT
# Allow inbound ipv6 ICMP so we can be seen by neighbors
ip6tables -A INPUT  -i ${PUBLICNET} -p ipv6-icmp -j ACCEPT
ip6tables -A OUTPUT -o ${PUBLICNET} -p ipv6-icmp -j ACCEPT
# Allow incoming SSH
iptables -A INPUT  -p tcp --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT
# Allow outbound DNS
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
iptables -A INPUT  -p udp --sport 53 -j ACCEPT
# Only allow NTP if it’s our request
iptables -A INPUT -s 0/0 -d 0/0 -p udp --source-port 123:123 -m state --state ESTABLISHED -j ACCEPT
iptables -A OUTPUT -s 0/0 -d 0/0 -p udp --destination-port 123:123 -m state --state NEW,ESTABLISHED -j ACCEPT

for s in /opt/sbin/iptables_conf.d/iptables_*.sh ; do
  if [ -e "${s}" ]; then
    source ${s}
  fi
done

There is a lot going on here - flushing all current rules, setting the default policy to DROP so nothing gets through until you explicitly allow it and then allowing all localhost traffic.

After the boilerplate code, the remainder is setting up rules for SSH, DNS and other ports that are common to all server deploys. It’s the last five lines where the fun is - they loop through the files found in the iptables_conf.d directory and load any iptables_*.sh script they find. Here’s an example rule that would be in iptables_conf.d/ - this one allows outbound Etcd:

# Allow outgoing etcd
iptables -A OUTPUT -o eth2 -p tcp --dport 4001 -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT  -i eth2 -p tcp --sport 4001 -m state --state ESTABLISHED -j ACCEPT

Having each rule defined by a script allows you to create the scripts using templates from your configuration system.
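For illustration only - this isn’t pulled from our configuration system - a tiny generator along these lines can stamp out those per-rule scripts. The script name, interface, and port below are hypothetical:

#!/bin/bash
# Hypothetical generator: emit an outbound-rule script into iptables_conf.d/
# Usage: ./mkrule.sh <name> <interface> <port>   e.g. ./mkrule.sh etcd eth2 4001
NAME=$1
IFACE=$2
PORT=$3

cat > /opt/sbin/iptables_conf.d/iptables_${NAME}.sh <<EOF
# Allow outgoing ${NAME}
iptables -A OUTPUT -o ${IFACE} -p tcp --dport ${PORT} -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT  -i ${IFACE} -p tcp --sport ${PORT} -m state --state ESTABLISHED -j ACCEPT
EOF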

With the above you now have a very flexible way to manage iptables that is also self-documenting - how cool is that!

● posted by Bear

So Heartbleed happened, and if you’re a company or individual who has public facing assets that are behind anything using OpenSSL, you need to respond to this now.

The first thing we had to do at &yet was determine what was actually impacted by this disclosure. We had to make a list of which services are public facing, which use OpenSSL directly or indirectly, and which use keys/tokens that are cryptographically generated. It’s easy to only update your web servers, but really that is just one of many steps.

Here is a list of what you can do to respond to this event.

  1. Upgrade your servers to have the most recent version of OpenSSL, specifically 1.0.1g, installed. The OS packages carrying the fix are too numerous to list, so double check with your package manager for what version is appropriate (a quick check is sketched after this list).

  2. Restart any instance of Nginx, Apache, HAProxy, Varnish, your XMPP server, or any other tool that dynamically links to OpenSSL.

  3. For any public facing service you are running that statically links to OpenSSL (or one of its libraries) you will need to rebuild the code and deploy. For us that was “restund” which we use for STUN/TURN.

  4. For any authentication database you have that uses OAuth tokens or cryptographically generated keys, you should mark all of those tokens and keys as invalid and force them to be regenerated.

  5. Any public facing SSL Certificate you have should be revoked and new certificates with new keys generated as well.
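As a quick sanity check before and after upgrading, something like the following helps. Note that distributions often backport the fix without bumping the upstream version number, so compare the output against your vendor’s security advisory rather than looking only for “1.0.1g”:

# what is actually installed and when it was built
openssl version -a

# Debian/Ubuntu
dpkg -l openssl libssl1.0.0

# RedHat/CentOS
rpm -q openssl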

The last two items can be somewhat daunting, since due to the nature of this exploit we don’t know if our certs or keys (or really anything in memory) were compromised. The responsible thing to do is to assume that they were compromised, and replace them.


● posted by Bear

As more and more people are enjoying the Internet as part of their everyday lives, so too are they experiencing its negative aspects. One such aspect is that sometimes the web site you are trying to reach is not accessible. While sites can be out of reach for many reasons, recently one of the more obscure causes has moved out of the shadows: the Denial of Service attack, also known as a DoS attack. It also has a bigger sibling, the Distributed Denial of Service attack.

Why these attacks are able to take web sites offline is right there in their name, since they deny you access to a web site. But how they cause web sites to become unavailable varies and quickly gets into more technical aspects of how the Internet works. My goal is to help describe what happens during these attacks and to identify and clarify key aspects of the problem.

First we need to define some terms:

A Web Site
When you open your browser and type in (or click on) a link, that link tells the browser how to locate and interact with a web site. A link is made up of a number of pieces along with the site address. Other parts include how to talk to the computers that provide that service and also what type of interaction you want with the web site.

Web Address, aka the Uniform Resource Locator
A link, to be geeky for a moment, is what is known as a Uniform Resource Locator (URL). Although most people think of a URL as “how web sites are addressed,” it is actually part of a much wider method of access to any service on the Internet. That said, the vast majority of URLs provide a way to navigate web sites.

This link, https://en.wikipedia.org/wiki/Url, contains the following:

  • https:// The browser needs to talk to the web site using the https scheme - a secure web browsing web request
  • en.wikipedia.org The address of the computer (or computers) that will provide the information and content of the web site
  • /wiki/Url The resource you want to get from the web site

The address portion of a link, the “en.wikipedia.org” part above, is itself made up of various parts known as the Top Level Domain (TLD) and the hostname. For our example the TLD is “.org” and the hostname is “en.wikipedia” - the two pieces are then used by the browser to make a query to the Domain Name System (DNS). This request takes the name, determines which Name Server is the authority for that name, and then returns an IP address for the name.

IP Address
Each computer that is connected to the Internet is given a unique address so it can be identified and contacted. This unique Internet Protocol (IP) address allows clients (such as web browsers) and other computers to find and access it. Once your browser retrieves the IP address for a web site it can then begin to contact a computer using the appropriate protocol style to get the contents of the web site and display it for you.

Now that’s a lot of little things all happening behind the scenes when you go visit a web site :) - but now that we know what we’re working with, it will make describing what a DoS is easier.

When someone launches a Denial of Service attack they are trying to make the computers providing a service unable to perform their duties. The difference between a DoS and a DDoS is in how many outside computers are helping perform the attack. A Distributed Denial of Service attack is, as the name implies, distributed across many many computers, all of which are making requests to the target over and over again.

That is the crux of a DoS - one group of computers overload another group of computers by making the target have to process so many requests it cannot keep up. Let’s work through two examples of what a DoS attack would look like in the real world:

Example 1
Think of the Internet as a highway and your browser is trying to access it via the on-ramp. While a small stream of cars is trying to get onto the highway things go well, but when the flow of cars gets to be too many all hell breaks loose and everyone comes to a halt and sits in traffic.

Example 2
During home games, Denver Broncos quarterback Peyton Manning is able to call plays out to his team. The players can hear him just fine, since hometown crowds are quiet during the plays. However, once the Broncos traveled to the Super Bowl to take on the Seattle Seahawks, things changed. The Seattle fans, known for being very loud, were able to act as a “12th man” of the Seattle defense. So much so that Manning’s teammates could not hear him above the noise of the Seattle fans! They were suffering from a Distributed Denial of Service attack from more than one fan at a time.

As with any attack, you immediately begin to wonder how they can be prevented from happening and how to deal with them while they are active. This answer varies, not only because of the nature of each attack, but also because there are quite a few different kinds of DoS attacks. This little essay is getting rather long already so we might discuss counter-measures in a future blog post.

However, now when your browser is giving you an error message or the “spinner” is doing its best to annoy you, you will at least have the information to understand what is happening when the IT or Support people say “Yes, we are being DDoS’d.”

● posted by Nathan LaFreniere

Deploying a production application can be quite the chore. On the road to &! 2.0, our processes have changed significantly. In the beginning stages of &! 1.0, I hate to say it, but deploys were a completely manual process. We logged in to the server over SSH, pulled from Git, and restarted processes all by hand. Less than ideal, to say the least.

Managing those processes was just as bad; we were using forever and a simple SysVInit script (those things in /etc/init.d for you non-ops types) to run it. When the process would crash, forever would restart it and we’d be happy. Everything seemed great, but then one day we accidentally pushed broken code live. What did forever do? Kept trying to help us, by restarting the process. The process that crashes instantly. Several CPU usage warning emails from our hosting provider later, we realized what had happened and fixed the broken code. That’s when we realized that blindly restarting the app when it crashes wasn’t a great idea.

Since our servers all run Ubuntu, we already had Upstart in place so swapping out the old not-so-great init.d scripts for the new, much nicer, Upstart scripts was pretty simple and life was good again. With these we had a simple way to run the app under a different user (running as root is bad, please don’t do it), load environment variables, and even respawn crashed processes (with limits! no more CPU usage warnings!).
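For readers who haven’t used Upstart, a job file along these lines covers those three points; the names and paths here are hypothetical, not our actual config:

# /etc/init/myapp.conf - hypothetical Upstart job
description "myapp"

start on runlevel [2345]
stop on runlevel [016]

# don't run as root
setuid myapp

# load environment variables
env NODE_ENV=production

# respawn crashed processes, but give up after 5 crashes in 60 seconds
respawn
respawn limit 5 60

exec /usr/bin/node /srv/myapp/server.js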

But alas, manually deploying code was still a problem. In came Fabric. For &! 1.0 we used a very simple Fabric script that essentially did our manual deploy process for us. It performed all the same steps, in the same way, but the person deploying only had to run one command instead of several. That was good enough for quite some time. Until, one day, we needed to roll back to an old version of the app. But how?

That instance required us to dig through commit logs to find the rollback point, manually check out the old version, and restart processes. This, as you can guess, took some time. Time that the app was down. We knew that this was bad, but how could we solve it? Inspiration struck, and we modified the Fabric script: when it deployed a new version of code, it first made a copy of the existing code and archived it with a timestamp. Then it would update the current code and restart the process. This meant that in order to roll back, all we would have to do is move the archive in place of the current code and restart the process. We patted ourselves on the back and merrily went back to work.

Until, one day, we realized the app had once again stopped working. The cause? We overlooked how fast the drive on our server could fill up with us storing a full copy of the code every single time we deploy. A quick little modification to the script so that it kept only the last 10 archives and some manual file deletion, and we were back on track.

Time went on, the deploy process continued to work, but much like every developer out there, we had dreams of making the process even simpler. Why did I have to push code, and then deploy it? Why couldn’t a push to a specific branch deploy the code for me? Thus was born our next deploy process: a small server listening for GitHub webhooks. Someone pushes code, the server would see what branch it was pushed to, and if it was the special branch “deploy” the server would run our Fabric scripts for us. Success! Developers could now deploy to production without asking, just by pushing to the right branch! What could go wrong?

As I’m sure you guessed, the answer is a lot. People can accidentally push to the wrong branch, deploying their code unintentionally. Dependencies can change and fail to install in the Fabric script. The Fabric script could crash, and we would have no idea why. We had logs, of course, but the developers didn’t have access to them. All they knew was they pushed code and it wasn’t live. So we’d poke around in the logs, find the problem, fix it, and go about our business grumbling to ourselves. This was also not going to work.

After much deliberation, we went back to running a separate command to deploy to the live server. That way the git branches could be horribly broken, people could make mistakes, and we wouldn’t end up bringing down the whole app.

To help prevent broken code, we also changed our process for contribution. Instead of pushing code to master, developers are now asked to work in their own branch. When their code is complete, and tests pass, they then submit a pull request to have their code merged with master. This means that a second pair of eyes is on everything that goes in to master, and feedback can be given and heard before deploying code.

To help enforce peer review, I wrote a very simple bot that monitors our pull requests and their comments in GitHub. Pull requests now require two votes (a +1 in the comments) before the “merge” button in the pull request will turn green. Until that happens, the button is gray. Although easy to override, a warning is displayed if the button is pressed while it is gray. This was a nice, neat, unobtrusive way of encouraging everyone to wait until their code has been reviewed before merging it into master and deploying it.

While still not perfect, our methods have definitely matured. Every day we learn something new, and we strive to keep our methods working as cleanly and smoothly as possible. Regular discussions take place, and new ideas are always entertained. Some day, maybe we’ll find the perfect way to deploy to production, but until we do we’re having a lot of fun learning.


● posted by Nathan Fritz

The Problem

When I was at FOSDEM last weekend, I talked to several people who couldn’t believe that I would use Redis as a primary database in single page webapps. When mentioning that on Twitter, someone said, “Redis really only works if it’s acceptable to lose data after a crash.”

For starters, read http://redis.io/topics/persistence. What makes Redis different from other databases in terms of reliability is that a command can return “OK” before the data is written to disk (I’ll get to this). Beyond that, it is easy to take snapshots, compact append-only log files, and configure fsync behavior in Redis. There are tests for dealing with disk access being suddenly cut off while writing, and steps are taken to prevent this from causing corruption. In addition, you have redis-check-aof for dealing with log file corruption.

Note that because you have fine tuned control over how fsync works, you don’t have to rely on the operating system to make sure that operations are written to disk.

No Really, What Was the Problem Again?

Commands can fail in any database, so client libraries wait for OKs, errors, and timeouts to deal with data reliability. Every database-backed application has to deal with the potential for error. The difference is that we expect the pattern to be command-result based, when in fact we can take a more asynchronous approach with Redis.

Asynchronous reliability

The real difference is that Redis will return an OK as long as the data was written to RAM (see Antirez’s clarification in the comments), while other databases tend to send OK only after the data is written to disk. We can still get on par with (and beyond) other databases’ reliability easily enough with a very simple check that you may be doing anyway without realizing it. When sending any command or atomic group of commands to Redis in the context of a single page app, I always send some sort of PUBLISH at the end. This publish bubbles back up to update the user clients as well as inform any other interested party (separate cluster processes, for example) about what is going on in the database application. If the client application doesn’t see an update corresponding to a user action within a certain amount of time, it can let the user know, and we know the command didn’t complete. Beyond this, we can write to a Redis master and SUBSCRIBE to publishes on a Redis slave! Now the client application can know that the data has been saved on more than one server; that sounds pretty reliable to me.

Using this information, the client application can intelligently deal with user action reliability all the way to the slave, and inform users with a simple error, resubmit their action without prompting, or request that the server do some sort of reliability check (in or out of context of the user action), etc.

tl;dr

  1. Single page app sends a command
  2. Application server runs an atomic action on Redis master.
  3. Redis master syncs to Redis slave
  4. PUBLISH at the end of said atomic action routes to application server from Redis slave.
  5. PUBLISH routes to single page app that sent the command, and thus the client application knows that said atomic action succeeded on two servers.
  6. If the client application hasn’t heard a published confirmation, the client can deal with this as an error however it deems appropriate.
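Here is a rough Python sketch of steps 2 through 6 on the server side, using the redis-py client (2.x era). The hosts, key names, and channel are made up for illustration, and a real application would fold this into its existing connection handling:

import redis

# hypothetical hosts, key, and channel - purely for illustration
master = redis.StrictRedis(host='redis-master', port=6379)
slave  = redis.StrictRedis(host='redis-slave', port=6379)

def save_task(task_id, payload):
    # step 2: the write and its PUBLISH run as one atomic group on the master
    pipe = master.pipeline(transaction=True)
    pipe.hmset('task:%s' % task_id, payload)
    pipe.publish('task-events', task_id)
    pipe.execute()

def wait_for_confirmation(pubsub, task_id, timeout=5.0):
    # steps 4-6: watch the *slave* for the publish; seeing it there means the
    # atomic action reached a second server. Silence within the timeout is an
    # error the caller can handle however it deems appropriate.
    message = pubsub.get_message(timeout=timeout)
    while message is not None:
        if message['type'] == 'message' and message['data'] == str(task_id):
            return True
        message = pubsub.get_message(timeout=timeout)
    return False

# subscribe on the slave *before* issuing the command so the publish can't be missed
pubsub = slave.pubsub()
pubsub.subscribe('task-events')
save_task('42', {'title': 'write blog post'})
confirmed = wait_for_confirmation(pubsub, '42')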

Further Thoughts

Data retention, reliability, scaling, and high availability are all related concepts, but not the same thing. This post specifically deals with data retention. There are existing strategies and efforts for the other related problems that aren’t covered in this post.

If data retention is your primary need from a database, I recommend giving Riak a look. I believe in picking your database based on your primary needs. With Riak, commands can wait for X number of servers in the cluster to agree on a result, and while we can do something similar on the application level with Redis, Riak comes with this baked in.

David Search commented while reviewing this post, “Most people don’t realize that a fsync doesn’t actually guarantee data is written these days either (depending on the disk type/hardware raid setup/etc).” This further strengthens the concept of confirming that data exists on multiple servers, either asynchronously as this blog post outlines, or synchronously like with Riak.

About Nathan Fritz

Nathan Fritz aka @fritzy works at &yet as the Chief Architect. He is currently working on a book called “Redis Theory and Patterns.”

If you’re building a single page app, keep in mind that &yet offers consulting, training and development services. Send Fritzy an email (nathan@andyet.net) and tell us what we can do to help.

Update: Comment From Antirez

Antirez chimed in via the comments to correct this post.

“actually, it is much better than that ;)

Redis with AOF enabled returns OK only after the data was written on disk. Specifically (sometimes just transmitted to the OS via write() syscall, sometimes after also fsync() was called, depending on the configuration).

1) It returns OK when aof fsync mode is set to ‘no’, after the write(2) syscall is performed. But in this mode no fsync() is called.

2) It returns OK when aof fsync mode is set to ‘everysec’ (the default) after write(2) syscall is performed. With the exception of a really busy disk that has still a fsync operation pending after one seconds. In that case, it logs the incident on disk and forces the buffer to be flushed on disk blocking if at least another second passes and still the fsync is pending.

3) It returns OK both after write(2) and fsync(2) if the fsync mode is ‘always’, but in that setup it is extremely slow: only worth it for really special applications.

Redis persistence is not less reliable compared to other databases, it is actually more reliable in most of the cases because Redis writes in an append-only mode, so there are no crashed tables, no strange corruptions possible.”

● posted by Adam Brault

Because we are huge fans of human namespace collisions and amazing people, we’re adding two new members to our team: Adam Baldwin and Nathan LaFreniere, both in transition from nGenuity, the security company Adam Baldwin co-founded and built into a well-respected consultancy that has advised the likes of GitHub, Airbnb, and LastPass on security.

We have relied on Adam and Nathan’s services through nGenuity to inform, improve, and check our development process, validating and invalidating our team’s work and process, providing education and correction along the way. We are thrilled to be able to bring these resources to bear with greater influence, while providing Adam Baldwin with the authority to improve areas in need of such.

Adam Baldwin

Adam Baldwin has served as &yet’s most essential advisor since our first year, providing me with confidence in venturing more into development as an addition to my initial web design freelance business, playing “panoptic debugger” when I struggled with it, helping us establish good policy and process as we built our team, improving our system operations, and always, always, bludgeoning us about the head regarding security.

It really can’t be expressed how much respect I and our team at &yet have for Adam and his work.

He’s uncovered Basecamp vulnerabilities that encouraged 37Signals to change their policies for handling reported vulnerabilities, found huge holes in Sprint/Verizon MiFi (that made for one of the most hilarious stories I’ve been a part of), published vulnerabilities twice to root Rackspace, shared research to uberhackers at DEFCON, and has provided security advice for a number of first-class web apps, including ones you’re using today and conceivably right now.

Adam Baldwin will be joining our team at &yet as CSO—it’s a double title: Chief of Software Operations and Chief Security Officer.

Adam will be adding his security consultancy, alongside &yet’s other consulting services, but will also be overseeing our team’s software processes, something he has informed, shaped, and helped externally verify since, I think, before most of our team was born.

On a personal note (a longer version of which is here), I must say it’s a real joy to be able to welcome one of my best friends into helping lead a business he helped build as much as anyone on our team.

Nathan LaFreniere

As excited as I am personally to add Adam Baldwin, our dev team is even more thrilled about adding Nathan, whose services we have become well accustomed to relying on in our contract with nGenuity and in a large project where we’ve served a mutual customer.

Nathan is a multitalented dev/ops badass well-versed in automated deployment tools.

He solves operations problems with a combination of experience, innovation, and willingness to learn new tools and approaches.

He’s already gained a significant depth of experience building custom production systems for Node.js, including some tools we’ve come to rely on heavily for &bang.

Nathan’s passion for well-architected, smoothly running, and meticulously monitored servers has helped our developers sleep at night, very literally.

I know getting the luxury of having a huge amount of Nathan’s time at our developers’ disposal sounds to them like diving into a pool of soft kittens who don’t mind you diving on them and aren’t hurt at all by it either oh and they’re declawed and maybe wear dentures but took them out.

So that’s what we have for you today.

We think you’re gonna love it.