Modularizing Underscore.js

January 07, 2015, by Henrik Joreteg

Underscore.js is the single most depended on module on npm.

It's impressive work from Jeremy Ashkenas, who also created Backbone.js and CoffeeScript, all hugely popular projects.

But how many of the modules that depend on underscore only use it for one or two methods? I'd venture to guess quite a few.

Conceptually, it makes sense. Why would you go implement your own debounce method when there's one in Underscore that works great and has clearly been put through its paces? Just install Underscore, be happy, and go fry some bigger fish!

Also, if you end up with 3 different versions of Underscore installed in an app that's just going to run using Node.js on a server somewhere, it's unlikely to cause any problems. Node's require mechanism and npm handle that quite nicely for us.

But on the clientside, it's a bit different. We've been happily using node and npm to manage code for all our clientside work at &yet for several years now. Like hipsters, ya know? Before it was cool. But of course for front-end code we have to send all the code we want to use to the browser. In this case, sending 2 or 3 different versions of Underscore plus a few versions of Lo-Dash might not be what you want.

Huge deal? Probably not. Annoying? A bit.

But even if you're building a whole clientside app where you're likely to use a good amount of code from it anyway, Underscore is only 5.2kb after all (min+gzip). It's tolerable, you can probably just include it, try to avoid excess duplication, be happy, and move on.

But it all starts to feel less awesome when you start writing little clientside libraries.

If you're writing a small re-usable module where you want a performant, cross-browser each implementation that works with both objects and arrays, then what? When your whole module is less than 1kb, it may feel a bit odd to tack on a 5kb lib as a dependency.

SemVer to the rescue, perhaps? (By the way, if you're not familiar with semver, the first three points on the SemVer site will get you a basic understanding.) Couldn't we just set a flexible version range in our modules so when they're installed and used they can be de-duped at the project level?

Well, as it turns out, Jeremy doesn't like and intentionally doesn't follow SemVer. So there can be (and have been) breaking changes between minor or even patch versions of Underscore. Which means that even when I'm creating small modules that are meant to be used as building blocks in other projects, I can't give them a flexible version range. Instead, I have to hard-code a specific version of Underscore into my dependencies if I want to be sure it'll continue to work when future releases come out. This isn't just hypothetical; we were saved in a few instances by having hardcoded to 1.6.0 when 1.7.0 came out.
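To make that trade-off concrete, here's a rough sketch of how a flexible caret-style range differs from a pinned version. The satisfiesCaret and satisfiesExact helpers are hypothetical and only handle plain major.minor.patch strings with a major version of 1 or higher; in a real project npm resolves ranges for you.

```javascript
// Hypothetical helpers for illustration only; npm does this for real.
// Simplified: assumes plain "major.minor.patch" strings and major >= 1.
function parse(version) {
  return version.split('.').map(Number);
}

// Does `version` fall inside the caret range "^base"?
function satisfiesCaret(version, base) {
  var v = parse(version);
  var b = parse(base);
  if (v[0] !== b[0]) return false;       // a different major is never accepted
  if (v[1] !== b[1]) return v[1] > b[1]; // any later minor is accepted
  return v[2] >= b[2];                   // same minor: patch must not go backwards
}

// A pinned dependency accepts exactly one version.
function satisfiesExact(version, base) {
  return version === base;
}

satisfiesCaret('1.7.0', '1.6.0'); // true: a flexible range would pull in 1.7.0
satisfiesExact('1.7.0', '1.6.0'); // false: a pinned dependency stays on 1.6.0
```

If a library follows SemVer, the first behavior is what lets npm de-dupe safely. When it doesn't, only the second guarantees nothing changes underneath you.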

What else could we do? Well, some people try to circumvent the problem altogether by fishing out and including various helper functions directly in their modules. Or even port a whole little mini-Underscore into their project.

But that also seems less than ideal. Is that partial implementation well-tested and cross-browser ready? Likely not.

What about jQuery?

Same thing, right? Except even the new, slimmer, sleeker 2.x version is five times bigger than underscore at 28kb (min+gzip).

Again, by itself, maybe not a big deal. But what if all you need is a single addClass function for your small module? It feels silly to depend on the existence of a global $ that can select elements and has an addClass method.

You may have seen sites like You Might Not Need jQuery that show you how to do what jQuery does under the hood, to encourage people not to depend on it. But that can have some problems too. As it turns out, jQuery's implementation of addClass is a bit more complex than the alternative that is suggested, and arguably for good reason.

The tiny modules philosophy

The whole approach of depending on a larger library or existence of a global goes a bit against the grain of the Node.js community at large. As the instigator behind Ampersand.js, a clientside framework that aims to be the most flexible, composable option out there, it never felt right to me that we have Underscore 1.6.0 as a hard dependency of several of our modules, especially if we're only using a couple of methods in a given module.

In the same way that we don't want to force you to use jQuery, a certain template language, or view layer in Ampersand, we don't want to pick your utility library, either! But, what choice do we have?

What about Lo-Dash?

Well, conceptually it's not too far off. There's a CLI tool that will generate each of its methods as an individual module, and all of those have been published to npm individually.

But... it's not perfect either. For example, below is the dependency tree for lodash.bind:

.
└── node_modules
    ├── lodash._createwrapper
    │   └── node_modules
    │       ├── lodash._basebind
    │       │   └── node_modules
    │       │       ├── lodash._basecreate
    │       │       │   └── node_modules
    │       │       │       ├── lodash._isnative
    │       │       │       └── lodash.noop
    │       │       ├── lodash._setbinddata
    │       │       │   └── node_modules
    │       │       │       ├── lodash._isnative
    │       │       │       └── lodash.noop
    │       │       └── lodash.isobject
    │       │           └── node_modules
    │       │               └── lodash._objecttypes
    │       ├── lodash._basecreatewrapper
    │       │   └── node_modules
    │       │       ├── lodash._basecreate
    │       │       │   └── node_modules
    │       │       │       ├── lodash._isnative
    │       │       │       └── lodash.noop
    │       │       ├── lodash._setbinddata
    │       │       │   └── node_modules
    │       │       │       ├── lodash._isnative
    │       │       │       └── lodash.noop
    │       │       └── lodash.isobject
    │       │           └── node_modules
    │       │               └── lodash._objecttypes
    │       └── lodash.isfunction
    └── lodash._slice

It just seems a bit excessive for something simple. There's also lodash-node, where you require something from a nested path like: require('lodash/underscore/bind'). That's pretty close, because now when we browserify it all, we'll end up only including the code it uses. But those paths are a bit unsightly and hard to remember, in my opinion.

Optimizing for "done"

But there's still some more subtlety there that's a bit annoying. Specifically, the Lo-Dash and Underscore codebases will march on. Which means we might be at 2.4.1 if we used Lo-Dash right now, for example. So we would have to pick a version range to march along with. But when 3.x.x comes out, we'll have to update dependencies in order to get proper de-duping. I don't want to track Lo-Dash either if all I want is a utility method. In most cases those seem like they should very rarely need any updating at all, right?

Take for example the lodash.noop method. First, I'm not convinced an empty function deserves to be its own module, but that aside, to me it feels odd that a module like this should ever have reached a SemVer version of 2.4.1. Of course, it happened this way because it was versioned with the rest of the project, but hopefully you see my point: there's no way that a noop function's API has changed. To be perfectly clear, I'm not saying any of this as criticism of the Lo-Dash authors. They've created a hugely successful project and are clearly a very brilliant bunch. In fact, JDD has even given us some helpful feedback on some of our implementations. This is just an example of one tradeoff of thinking about a suite as a singular component rather than individual utilities.

So what about this concept of "done" code? Is there such a thing? Well, APIs can certainly be done.

Let's say I have an addClass function that supports the following three call signatures:

addClass(el, 'class1');
addClass(el, 'class1', 'class2');
addClass(el, ['class1', 'class2']);

I see no reason why that API contract would ever have to change! That could stay at version 1.x.x forever, right?

The underlying implementation could change if a better/faster implementation was discovered. But that API contract should be able to be done. Same is true of most of these types of utility methods, especially if we make that an explicit goal.
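As a minimal sketch of what honoring that contract could look like, here's one possible implementation. It isn't the amp or jQuery code, and it isn't hardened for cross-browser edge cases; it only assumes the element has a string className property, which is why a plain object can stand in for a DOM node below.

```javascript
// A sketch only: works on anything with a string `className` property.
function addClass(el, classes) {
  // Accept addClass(el, 'a'), addClass(el, 'a', 'b') and addClass(el, ['a', 'b']).
  var toAdd = Array.isArray(classes)
    ? classes
    : Array.prototype.slice.call(arguments, 1);

  // Split the existing class string into individual names.
  var existing = (el.className || '').trim();
  var current = existing ? existing.split(/\s+/) : [];

  toAdd.forEach(function (name) {
    // Skip empty names and names that are already present.
    if (name && current.indexOf(name) === -1) {
      current.push(name);
    }
  });

  el.className = current.join(' ');
  return el;
}

var el = { className: 'existing' }; // stand-in for a DOM element
addClass(el, 'class1');
addClass(el, 'class1', 'class2');
addClass(el, ['class2', 'class3']);
// el.className is now 'existing class1 class2 class3'
```

The body could later be swapped for something classList-based without touching any of the three call signatures, which is the sense in which the API can be "done" while the implementation keeps improving.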

Imagine if we had a solid base-layer of well-tested, low-level, individually installable modules with very stable APIs. Now that's a shoulder you can stand on. It's easy to get caught up in pushing a bunch of new and updated code and constantly sitting and tweaking little things. Browsers are changing weekly, ES6 is coming down the pipe, change, change, change, change! It's awesome, but also a bit overwhelming to many.

Don't get me wrong, I love an awesome bleeding-edge API as much as the next dev. I wrote the first version of SimpleWebRTC several years ago, which was one of the first and most popular WebRTC libraries that I'm aware of. Thanks to the tireless stewardship of Fippo, Lance, and many others, it has been happily powering Talky and many other WebRTC projects for some time.

I've also spoken and written numerous times about building apps that use the web to its fullest. Which is all to say, please don't mistake me for a Luddite.

But I think we, the JavaScript community as a whole, have grown to undervalue stability. It's a boring concept, really. There's nothing exciting about it. But having a solid base platform on which to build means you're free to focus on the more interesting higher levels.

As you've probably guessed by now, we've tried to tackle this problem. Not because we really wanted to, but because we wanted it to go away. I'm not the only one, it seems. My friend Feross apparently reached the same conclusion and split out the most useful methods from async.

I'm in no way claiming this approach is my idea. TJ Holowaychuk and a whole slew of other people have been doing this type of thing with component. There are also other great examples like this one by Blake Embrey that share the same philosophy.

The challenges of tiny modules

Tiny module all the things! Independent modules, FTW! Right?!

It works great for a handful of modules, but this approach is hard to scale. Turns out it's kind of a pain to manage and maintain 100+ modules.

This is especially true in the happy-fun-land that is the clientside. Because for browser code, you really want some sort of automated cross-browser testing. So let's say, for instance, that you find a faster, better service for doing cross-browser testing. If you have to go update 80 different GitHub repos with your new test setup and config in order to do it, realistically, you're just not going to bother.

What if you want to add performance benchmarking, or otherwise change the structural elements, or update licenses of all your tiny modules? Fact is, the modularization is fighting against you at that point.

So that's the maintenance side, but also it's a bit problematic as a user, because names. You have to remember that you created a module called extend-object or was it object-extend?! And of course you have to remember it exists to begin with, go find it, then remember how to use it. Hopefully you wrote some good docs.

Less than ideal.

When I used to do a bunch of jQuery it was pretty simple: go to jQuery.com and look up what the closest() method does. All in one nice, cohesive site of related stuff.

That's also what's so nice about Underscore. You just go read the simple concise docs.

We wanted something that dealt with all this. We wanted a way to handle small modules that was easier to manage, test, and document.

So we made amp and it works like this:

Independent Modules in a single GitHub repo

The GitHub repo contains a folder for each function, and each folder is its own module.

All the modules have the same basic structure:

the implementation
the tests
the package.json file
the doc file
the generated README

The build system

This is the brains of the operation. The shared build system that manages practically everything:

Lets us have local modules require each other without messing with require paths before publishing
Automatically re-writes package.json files for each module to list dependencies and devDependencies based on what's used in the code. It even alphabetizes keys in the JSON using fixpack.
Gives us a central place to update licenses, READMEs, etc.
Installs hooks to make sure code is linted, and all modules have tests, docs, etc.
Stubs out new files needed when adding a module with a simple command.
Lets us run DOM-dependent modules from the CLI using tape and PhantomJS.
Generates new versions of the docs site.
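As a rough illustration of the package.json rewriting idea, the sketch below scans a module's source for require() calls and produces an alphabetized dependencies object. This is not amp's actual build code: the function names, the regular expression, the module names in the sample source, and the '*' version placeholder are all assumptions made for the example.

```javascript
// Hypothetical sketch: find package names passed to require('...').
function findRequires(source) {
  var pattern = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  var names = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    var name = match[1];
    // Ignore relative paths such as './local'; keep each package once.
    if (name.charAt(0) !== '.' && names.indexOf(name) === -1) {
      names.push(name);
    }
  }
  return names;
}

// Return a copy of `obj` with its keys in alphabetical order.
function sortKeys(obj) {
  var sorted = {};
  Object.keys(obj).sort().forEach(function (key) {
    sorted[key] = obj[key];
  });
  return sorted;
}

// Build a dependencies object from a module's source text.
function buildDependencies(source) {
  var deps = {};
  findRequires(source).forEach(function (name) {
    deps[name] = '*'; // placeholder; a real tool would look up the version
  });
  return sortKeys(deps);
}

// Illustrative module source; the package names are just examples.
var source = [
  "var isObject = require('amp-is-object');",
  "var each = require('amp-each');",
  "var local = require('./local');"
].join('\n');

var dependencies = buildDependencies(source);
// dependencies lists 'amp-each' and 'amp-is-object', in that order
```

Deriving the list from the code itself means a module's package.json can't quietly drift out of sync with what it actually requires.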

I feel that this is one of the more interesting aspects of all of this and may well end up with a blog post of its own very soon.

Namespaced package names

They all start with amp-, so we can keep the module names as descriptive, and therefore as memorable as possible (hopefully) without dealing with too many name conflicts.

Testing

Everything is cross-browser tested using a continuous integration system composed of:

Travis CI
Sauce Labs - using their generous Open Sauce tools
zuul - to wire the two together
tape - Substack's minimalistic test harness that produces TAP output that we pipe to tap-spec for purdy colors.

In addition, each module can be tested independently or as part of the whole from the command line with a simple npm test. It does this using PhantomJS.

The documentation

We generate a clean, easily searchable doc site: http://amp.ampersandjs.com/. We did it as a single page to be as Cmd + f friendly as possible.

It includes:

Function signature
Docs
Example usage
Expandable sections where you can see the entire implementation and relevant test file right inline

Other details

All the modules are published individually
Docs are intentionally not in the READMEs so that we can update docs structure, etc. without having to publish patch versions just to get the npm version of the README updated.
Strict SemVer with the goal of publishing a bunch of 1.0.0 versioned modules that are optimized for being "complete" and requiring absolutely minimal updates.
Since there's no bundling and they're all individual modules, the collection can keep growing over time and there's very little "inclusion cost." So things that are currently relegated to underscore.contrib or used from jQuery can be included too.
We've got a pretty good start on it, and have ported many of the underscore methods (and related tests) we needed for Ampersand, but plan to continue to expand it.

But why?!

This might all be a little ridiculous, but maybe it's also a little bit awesome? I know I've wanted something like this to exist many times, and once devs get over their initial "why another thing?!" reaction and understand the stability goals, I've found many people who have been wanting something like this, too.

And hey, the good news is, if this isn't something you think is a good idea or want to use, it's a big web out there with lots of options. Use what works for you. Tools are just tools; it's what you build with them that matters.

I'm @HenrikJoreteg on Twitter. Also, I wrote a book about building maintainable JavaScript apps called Human JavaScript, I teach JS, and in addition, if you liked this post odds are you'll want to check out Ampersand.js as well. If you want to hang out with a bunch of people working with these tools, come join the project chat on Gitter or connect via IRC using the gitter bridge.

See you on the Interwebz!
