Building scalable data systems

In my tenure at HubSpot I’ve been on teams that have built and rebuilt various data systems, and every time we’ve tried to construct “scalable” solutions. We’ve hit the mark on some projects and been wildly off on others. I’m on another such project now, and a few things I’ve learned seemed important enough to share, or at least to write down as a reminder to myself later.

Make an API. Whether you’re creating a new RESTful HTTP service or using ProtoBuf/Thrift, it’s worth putting a layer between your raw data source and your consumers. That layer insulates client code from all sorts of issues related to data management. You can add cache layers, shard databases, even switch the entire data storage system behind the API while the client stays blissfully unmodified. All of this puts you on the hook for making a fast and reliable API, but the ability to swap moving parts behind the scenes is invaluable. Furthermore, an API makes it easier for other developers (internal and external) to use your data. Internal consumers can build tighter integration between systems within your product(s). External developers can use your system’s data to build plugins and add-ons that add value for your customers in ways you hadn’t thought of yet. Of course, with all these other developers and teams accessing your data now…
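To make the "swap moving parts behind the scenes" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `LeadStore`, `InMemoryLeadStore`, `CachedLeadStore` and `lead_email` are names I made up, not our actual API. The point is just that client code depends on a small interface, so a cache (or a whole new storage backend) can be slipped in without touching the client.

```python
from typing import Dict, Optional, Protocol


class LeadStore(Protocol):
    """Hypothetical interface: the only thing consumers may depend on."""

    def get(self, lead_id: str) -> Optional[dict]: ...

    def put(self, lead_id: str, lead: dict) -> None: ...


class InMemoryLeadStore:
    """Stand-in for the real storage system (a database, a sharded cluster, ...)."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}
        self.reads = 0  # counts reads that actually reach the backing store

    def get(self, lead_id: str) -> Optional[dict]:
        self.reads += 1
        row = self._rows.get(lead_id)
        return dict(row) if row is not None else None

    def put(self, lead_id: str, lead: dict) -> None:
        self._rows[lead_id] = dict(lead)


class CachedLeadStore:
    """Read-through cache in front of any LeadStore, exposing the same interface."""

    def __init__(self, backend: LeadStore) -> None:
        self._backend = backend
        self._cache: Dict[str, dict] = {}

    def get(self, lead_id: str) -> Optional[dict]:
        if lead_id in self._cache:
            return dict(self._cache[lead_id])
        lead = self._backend.get(lead_id)
        if lead is not None:
            self._cache[lead_id] = dict(lead)
        return lead

    def put(self, lead_id: str, lead: dict) -> None:
        self._backend.put(lead_id, lead)
        self._cache.pop(lead_id, None)  # invalidate so the next read is fresh


def lead_email(store: LeadStore, lead_id: str) -> Optional[str]:
    """Client code: written once against the interface, never against storage."""
    lead = store.get(lead_id)
    return lead["email"] if lead else None
```

`lead_email` works the same whether it is handed the raw store or the cached one, which is exactly the insulation an API layer buys you. A real service would put this behind HTTP or Thrift and would need a smarter invalidation strategy than the one shown here.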

What you expect to happen is not what will happen. While rebuilding some other parts of the Lead storage system, my team also built a RESTful API to modify and retrieve lead data. We thought it would receive relatively moderate usage internally and very light usage externally. A pair of load-balanced Tomcat servers could easily handle the 10,000 requests per day we were expecting. Instead we saw internal usage spike well over 100,000 requests in the first few weeks, forcing us to modify and add capacity to the “scalable” system in a variety of ways. Now, over a year later, we have over 200 HubSpot customers and dozens of internal customers tallying over 1 million requests and serving up a couple of gigabytes of data per day. This isn’t what we had expected for this API, and as such it’s got a lot of rough edges, some of which have been sanded down, others of which still pose hazards to internal and external users.

Over-engineering vs under-delivering. If your system takes a year to design and another to build a first version, and it’s “infinitely” scalable, your business had better be capable of making that investment and continuing to run in the meantime. If you’re like just about every other company smaller than Google or Microsoft, your development team can’t wait that long. Google’s BigTable is an amazing work of computer science and software engineering, but I doubt many established companies, and even fewer startups, can sustain the rest of their business while waiting on a team to deliver such a thing. The other side of the coin is under-delivering. Designing to your current volume of traffic and requests is a sure way to waste your time. At the very least plan for double, if not an order of magnitude more, requests/traffic/data than your current system sustains. If you’re building something entirely greenfield, consider what would happen if you opened up access to this data through an API layer to external developers. Would you see 10 requests per day? 100? 100,000? Think big enough to know that you won’t be up until 3am every night patching your system in a desperate attempt to keep it efficient and high-performing as usage grows.
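A quick back-of-the-envelope calculation helps when deciding how much headroom to plan for. The sketch below turns a daily request count into an average requests-per-second figure and then applies a growth multiplier (the "double, if not an order of magnitude" above). The function names and the `peak_to_average` factor of 5 are my own assumptions for illustration; measure your real peak ratio before trusting any number that comes out of this.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(requests_per_day: float) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def design_target_rps(
    requests_per_day: float,
    growth_factor: float = 10.0,    # plan for an order of magnitude more traffic
    peak_to_average: float = 5.0,   # ASSUMPTION: peaks run 5x the daily average
) -> float:
    """Requests per second to design for, with growth and peak headroom applied."""
    return average_rps(requests_per_day) * growth_factor * peak_to_average


if __name__ == "__main__":
    # Daily volumes of the sizes discussed in this post.
    for per_day in (10_000, 100_000, 1_000_000):
        print(
            f"{per_day:>9,} req/day: "
            f"avg {average_rps(per_day):.2f} req/s, "
            f"design target {design_target_rps(per_day):.1f} req/s"
        )
```

Even a million requests a day is a modest average rate; it is the peaks and the growth you didn't plan for that hurt.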

Experiment to solve problems, not for the sake of experimenting. There are so many cool new NoSQL or key-value data storage projects out now that want to replace your relational database. In the world of data-driven web applications, there are a lot of uses for this new technology. However, that doesn’t mean every project is well suited to this model. Building an in-browser IM client? Key-value stores sound like a great way to go for storing the messages for quick retrieval. Building a white-label inventory management console? It could still work OK, but you’re in the gray area. That new CRM for non-profit dog walking businesses? You’re sounding more like you’re in the relational model’s sweet spot. Shiny new things are shiny, and new, but many successful technologies are successful because they’ve been proven to work very well for a wide range of applications.
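One rough way to tell which camp you're in is to look at the shape of your questions. The sketch below is purely illustrative: a plain Python dict stands in for a key-value store, SQLite (from the standard library) stands in for a relational database, and the table and column names are made up. Chat messages are fetched by a single key, while the dog-walking CRM wants an answer that spans two kinds of records.

```python
import sqlite3
from collections import defaultdict

# Key-value shape: one key, one lookup. A dict stands in for a real key-value store.
messages_by_conversation = defaultdict(list)
messages_by_conversation["alice:bob"].append("hi")
messages_by_conversation["alice:bob"].append("lunch?")
recent = messages_by_conversation["alice:bob"][-50:]  # last 50 messages at most

# Relational shape: the answer comes from joining and aggregating across entities.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE walks (
        id INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES clients(id),
        minutes INTEGER NOT NULL
    );
    INSERT INTO clients (id, name) VALUES (1, 'Rex'), (2, 'Fido');
    INSERT INTO walks (client_id, minutes) VALUES (1, 30), (1, 45), (2, 20);
    """
)
minutes_per_client = db.execute(
    """
    SELECT c.name, SUM(w.minutes)
    FROM clients AS c
    JOIN walks AS w ON w.client_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name
    """
).fetchall()
```

If most of your reads look like the first half, a key-value store is worth an experiment. If they look like the second half, you would end up rebuilding joins by hand.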

This is by no means the blueprint for successful data system design; it’s just the start of some guidelines I intend to use as I build new, bigger systems to replace the ones I’ve already built. We’re working on some awesome new stuff at HubSpot and I can’t wait to post more about what we build. I’m sure we’ll try some new stuff that won’t fit, either because we don’t sufficiently understand it or because it’s ill-suited to our data model. I’ll try to keep this blog updated as we come across cool new findings (of which I expect there will be many).

Posted in Technology

On Traveling And Being A Nerd

I love traveling; the thrill of seeing new and unfamiliar places and people is really unbeatable. The downside is that I’ve spent a lot of time cultivating an environment in which I thrive as an engineer (for more info on “the cave” and related issues see: http://randsinrepose.com/archives/2007/11/11/the_nerd_handbook.html) and I can’t bring that environment with me when globe-trotting. I’ll often find myself hating being away from the things I find familiar, annoyed because I can’t control all the variables. When I’m home I’m constantly checking my email, writing code and basically living life through a keyboard and internet connection. When traveling, I often find myself withdrawing and getting snippy because I’m basically going through withdrawal from my daily routines. Time changes, different cuisine, tea instead of coffee, coffee that’s too weak, coffee that’s too strong, temperature too hot, too cold, etc. I’m definitely getting better, and I’ve come up with a few methods that seem to have helped on recent trips.

* Bring technology of some sort with you that serves a useful purpose – On our recent trip to Iceland, I brought my new Nook Color. I loaded it with an offline maps program and a pretty good travel book I bought through Amazon Kindle. It was very useful to have interactive maps we could use in conjunction with our print and digital guidebooks. It also helped my inner nerd feel in touch with a familiar piece of technology and, in turn, a little more in control of the world around me.

* Recognize your frustration and squash it – I get snippy when I get hungry. My partner knows this, so she reminds me to eat a snack, and I get more annoyed because someone is telling me what to do. I’ve learned to just let it go, eat a Clif Bar and enjoy the scenery. It’s not easy, and I don’t profess to have stopped entirely, or to expect that I won’t ever get annoyed, but at least I’m making progress and am able to get out of my own head long enough to stop acting like a jerk.

* Slow down and relax (at least for a little while) – My partner and I do whirlwind tours when we travel. This usually means short stays in a ton of different places on a 10-day vacation. When we went to Costa Rica last year, we never stayed in any place more than 2 nights and we were there for 11 days. But even at that pace, we managed to take naps and enjoy the rain forest, the cloud forests, black sand beaches, and quiet nights in a little beach town. The only time I ever felt we were rushing was when we were trying to get to something we were required to schedule. Taking time to walk around and enjoy a city or town in the evening or early morning really takes the anxiety out of being away from my normal environment.

I really enjoy traveling, especially with my girlfriend, and can’t wait for our next trip. Hopefully I’ll keep myself from succumbing to low blood sugar and enjoy the good decompression time that comes with being away from work and the internet in general.

Posted in Travel

Nook Color Review

I recently upgraded from a no-name Chinese Android 2.2 tablet to a NOOK Color. The difference was astounding. I really like the skin Barnes & Noble has applied to Android. It looks slick and operates really smoothly. The device itself is a bit heavy, and after 35 minutes of holding it up while reading, your arms will be a bit tired. The heft does convey that feeling of being well built and sturdy, an attribute possessed by many quality Apple products. My first task after playing with it for a day was to root it and install a third-party Android ROM. This really changed the device from a cool color e-reader to a tablet.

I opted to install CyanogenMod 7, an Android 2.3 ROM (Gingerbread, yum). Installing was a breeze for anyone who’s done any tinkering with Android, and compared to my previous forays into this realm, it was much simpler. Simply format your SD card with a bootable image provided on a few sites, then install the ROM, reboot, and voilà! You’re running the latest stable release of Android on your $250 tablet. The UI in CyanogenMod lacks some polish compared to the heavy customization in the standard NOOK UI, but it’s quite pleasant and intuitive, especially for a seasoned Android user.

The battery life is more than acceptable, especially for the price. With moderate usage, I can usually read and respond to emails, read books and browse the web (including Flash) for about 6 hours before the battery gets tired. For me, that means half an hour in the morning of reading the news and checking my email, then another two hours of the same when I get home and maybe an hour of reading before bed. So I usually go two days between full charges, and I haven’t actually let the battery fully discharge yet. I suspect that with some tweaks, such as lower screen brightness and quicker screen timeouts, the battery could easily last you a week.

For my personal use, the most important thing I wanted to try was using the tablet as an offline travel guide. Equipped with the NOOK app and Amazon Kindle, I can keep travel books available for immediate, searchable, and bookmarkable reference without taking up any extra space in a daypack or luggage. I recently got back from a trip to Iceland (more on that soon) with my girlfriend where we took this plan for a test run. We both agreed the addition of the NOOK to our usual equipment when traveling abroad was a big win. With plentiful wifi and preloaded travel books, the NOOK performed fantastically as a travel guide. I found a fantastic offline maps application called MapDroyd, which can download vector maps of regions and entire countries. It wasn’t perfect, and when I had preloaded directions with Google Maps, the map was cached well enough that we didn’t need it. However, for the few times we found ourselves at a random intersection in Reykjavik with no idea which way to head next, MapDroyd really came through.

My essential Android apps on my NOOK so far:
* Firefox
* Google Maps
* MapDroyd
* Amazon Kindle

The biggest thing I noticed about how I was treating it was that I wasn’t babying it the way I do my digital SLR camera or my cell phone. I was throwing it into our day pack. When you add it all up, the NOOK delivers so much for the price. At $250 the hardware itself has a nice form factor and is very portable, but you don’t feel like you need to protect it or shield it from the horrors of real-world use. It’s durable, yes, but more than that, it’s the first step towards commoditization of tablets. The Samsung Galaxy, Motorola Xoom and even the iPad v1 and v2 all put the price of a tablet device well above $400. The Kindle comes close, but with the current generation of E-Ink, it’s not going to compete well with a tablet that has full color, Adobe Flash and a rich UI for only $100 more.

Wrapping up, the NOOK delivers so much for $250, but only after you’ve replaced the stock software with something else. This barrier might hold it back, but it will also continue to do exactly what Android does best: attract the enthusiast crowd. People who already flock to Android-based devices for their customizability and configurability will love this tablet. That said, Barnes & Noble has managed to put a slick enough UI on top of Android, and seems to be releasing new features regularly enough with updates, that even your parents could enjoy this device.

Posted in Technology

Starting writing again

This is just a post to start me in the habit of writing at least once a week on personal, tech or otherwise interesting topics. I intend to write meaningful short essays, but can’t guarantee that my writing will be spectacular. I’m a software engineer with a passion for gadgets, especially Android. I enjoy taking photographs, but don’t profess any strong commitment to it as an artist. I’m hoping I’ll be able to keep up with this and give something useful back to all the people whose great ideas I’ve stolen, er, been inspired by.

Posted in General