How Stripe Builds Software, with Greg Brockman

Mar 7, 2013 1:17:18 PM | DevOps

Airbrake's interview with Greg Brockman, one of Stripe's founding engineers, about how they build software, team management, and development team scaling.

I talked with Greg Brockman, one of Stripe's founding engineers, about how they build software, team management, how to scale a development team and lots more. If you haven't seen it, check out Greg's presentation about how Stripe built one of Silicon Valley's best engineering teams.

Justin:  How many developers does Stripe have?

Greg:  We have about 20 developers.

Justin:  When did you join the team?

Greg:  I’ve been at Stripe for about two and a half years. I joined when we were about four people.

Justin:  How has the way that Stripe has developed software changed between the early days and now?

Greg:  Many things have actually remained similar. We try to hire people who are able to figure out what they should be working on, who can evaluate things from their own perspective. We make sure that people have enough of a global view to be able to make informed decisions about that. Some things have changed, though. Now that we have more people, we have to make sure that everyone is moving in the right direction, and communication becomes a lot harder. Having internal debates becomes significantly more difficult, but it’s something that’s good to maintain.

When Stripe only had four team members, each person would work on different pieces. There weren’t formal assignments. Each person would end up working on the piece he or she was most comfortable with. I ended up working mostly on infrastructure and Darragh worked mostly on financial operations, which involves actually moving money. John and Patrick worked on more of the product.

As we hired our first few people, we started to realize that there were some scaling issues with everyone sitting down and having a technical debate at the same time. There’s an answer on Quora about how the Stripe API was designed. We started off with a JSON-RPC protocol. We didn’t use any of the features of REST: you’d just curl to one endpoint. We didn’t use status codes, we didn’t use headers for anything. It was really just JSON.

After Ross and Saikat joined, they looked at this, and they argued that we should have a RESTful API. We spent three weeks going back and forth on this every day. There were two big camps, and a few additional opinions within those larger camps, so we would debate back and forth. Needless to say, we weren’t making any progress.
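As an illustrative aside (not part of Greg's answer), the contrast between the two styles in that debate can be sketched roughly as follows. This is a minimal sketch, not Stripe's actual code: the `/rpc` and `/charges` paths, the `charge.create` method name, and both helper functions are hypothetical, and the verb mapping is only one common REST convention.

```python
import json


def rpc_style_request(method, params):
    """JSON-RPC style: one endpoint; the operation is named inside the JSON body."""
    return {
        "http_method": "POST",
        "path": "/rpc",  # hypothetical single endpoint
        "body": json.dumps({"method": method, "params": params}),
    }


def rest_style_request(action, resource, resource_id=None, params=None):
    """REST style: the URL names the resource and the HTTP verb names the action."""
    # Assumed verb mapping; real APIs differ (for example, PATCH for updates).
    verbs = {"create": "POST", "retrieve": "GET", "update": "PUT", "delete": "DELETE"}
    path = f"/{resource}" + (f"/{resource_id}" if resource_id else "")
    return {
        "http_method": verbs[action],
        "path": path,
        "body": json.dumps(params) if params else None,
    }


if __name__ == "__main__":
    print(rpc_style_request("charge.create", {"amount": 400}))
    print(rest_style_request("retrieve", "charges", "ch_123"))
```

In the first style every call looks the same at the HTTP level, so errors have to be reported inside the JSON payload. In the second, status codes and headers can carry part of the meaning.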

In the end, we came up with an API that was better designed, so from that perspective it was a success, but it was also hugely painful to get there. Unfortunately, it was really inefficient because we spent a lot of time debating tiny details, rather than thinking about the overarching design. After that, we decided that we would never have technical debates with more than three people. Even four is really pushing it.

The second important result was coming up with the idea that, for any particular piece of Stripe, there should be one person who is clearly responsible for it. However, it’s really easy to end up in a world where one person believes they own the system, which can block others who want to work with it. That’s not what we aim for. We want someone who is thinking through all of the issues, has a say on why those decisions are made, and is thinking about how the pieces fit together.

For our API, we determined that Ross would be the point person. If you want to make a change to it, he’s the person to talk to and determine what changes make the most sense. It’s also Ross’s responsibility to take out the garbage: if there are bugs, he makes sure those are taken care of.

As we started to grow, people began to come up with ideas for projects. For example, the management interface that we have right now was written as a result of this. Someone determined we weren’t executing it well and recruited a few others to work on the project. With the management interface, about four people took a month to rewrite the entire web interface from scratch and rearchitect it to be on top of the API instead of just talking to the database. As we grew, we found that it was better to have smaller teams form and work on self-contained projects.
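As an illustrative aside (not part of Greg's answer), the idea of a web interface that sits on top of the API rather than talking to the database might look something like this. It is a minimal sketch under stated assumptions: `ChargesApi`, `InMemoryChargesApi`, `Dashboard`, and `list_charges` are hypothetical names, and the in-memory class stands in for a real HTTP client so the example runs without a network.

```python
from typing import List, Protocol


class ChargesApi(Protocol):
    """The only surface the interface layer is allowed to depend on (hypothetical)."""

    def list_charges(self, limit: int) -> List[dict]: ...


class InMemoryChargesApi:
    """Stand-in for an HTTP API client; a real one would make network calls."""

    def __init__(self, charges):
        self._charges = list(charges)

    def list_charges(self, limit):
        return self._charges[:limit]


class Dashboard:
    """Management interface: reads data through the API, never from the database."""

    def __init__(self, api: ChargesApi):
        self._api = api

    def recent_charges_summary(self, limit=10):
        charges = self._api.list_charges(limit)
        total = sum(charge["amount"] for charge in charges)
        return f"{len(charges)} charges totaling {total}"


if __name__ == "__main__":
    api = InMemoryChargesApi([{"amount": 400}, {"amount": 250}, {"amount": 100}])
    print(Dashboard(api).recent_charges_summary(limit=2))
```

One consequence of this layering is that the internal interface exercises the same API that outside developers use, so gaps in the API tend to surface internally first.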

There are many other evolutions similar to that, where we needed to change our processes as we grew. This often occurred at distinct tiers, such as from 4 to 8 people, to 20 people and up to 40 people.

Our launch in Canada is another example. It involved software engineering that touched a lot of different pieces of the organization. When working with US dollars and Canadian dollars, you need to build a lot of code to support that across different tools and projects. We needed to work with different back-ends, ensure that all the business relationships were in place, and that everything was integrated properly. This is another arena where having a point person is particularly important. The point person ensures that all of the pieces of the puzzle fit together for a very large project.

Sheena, an engineer on our team, was the point person for Canada, so she was actually building a lot of the pieces and also keeping track of the work that others were contributing to the launch. This worked out really well for us because she wasn’t waiting for others to fix parts of her own project—she would simply take care of it herself.

We found that combining project management and engineering into a role held by one person cuts down on a lot of communication overhead. It’s always the case that the person who understands exactly what we’re trying to build is someone who also understands all of the technical details of how exactly it’s built.

Justin:  Does Sheena just say, “I want to own this project,” and she gets to choose it?  How does that whole assignment process work?

Greg:  Much of our project management is organic, in that we know that we want to build something, and usually, someone steps up to the plate and wants to run it. We’ve never really had an issue where there were multiple people who really wanted to manage a project or, conversely, that there was no one who wanted to do it. Usually, it just works out.

Justin:  How do you coordinate that at the level of 40 people?

Greg:  It’s a tough problem. I don’t think we’ve completely solved it yet. I think that a lot of it tends to come from the fact that we all come up with the direction we want to go in. Every two weeks, we sit down and have an all-hands meeting. The most useful piece of all-hands is not the actual meeting, but more that we have to think about what we have been doing and how it fits into the bigger picture of where Stripe is going.

There is usually some time for reflection there just to discern what our goals are and where we’re going. That brings us onto the same page. The details of what we do and how we do it generally involve some discussion. After the discussion, a point person typically emerges.

Justin:  What if two people want to own the same project? Does that ever occur?

Greg:  That hasn’t happened yet. I can’t think of a single incident where we’ve had that kind of conflict. I think our team tends to be pretty good about this. There’s also an infinite list of things that we want to do – there isn’t one glamorous project that everyone is vying for.

A lot of that actually comes down to making sure people have the right working relationships. We spend a lot of time thinking about building the right culture and making sure that people know each other really well, even at the level of just having good social events. I don’t think that we have as many of those as we’d like: people really like each other and would like to spend even more time together than they do.

I think that it’s amazing how much the actual work that you produce changes when you really know the people that you’re working with, and you really like interacting with them. We made a bet that hiring people you want to be around would pay off. It is something that has really contributed to being able to scale the workflow we have.

Justin:  Can you talk a little bit about some of the tools that you use to manage everything going on in the organization?

Greg: Different teams at Stripe tend to track their work differently. We use a couple of SaaS services. We use GitHub and Asana pretty heavily. Asana is really useful for things that are not code, and some code tasks even end up there. For example, infrastructure stuff tends to end up in Asana partially because it’s larger than any one repo. Email is very heavily used (see my blog post from a few weeks ago), and we’ve built some tools on top of the Google APIs in order to help manage a heavy flow of email.

We use UserVoice for ticket tracking, and we build a lot of tools for grabbing data from our internal management interface and correlating it to tickets in UserVoice. This enables us to correlate a lot of different tickets and issues and track data across various internal and external systems.

Justin:  So that leads me to another point. Right now, how do you work with errors or problems that you see from UserVoice? How do those get assigned and fixed?

Greg:  It depends on the issue. Most of the things that come out of that are product issues. We’re still actively experimenting and trying different things. For example, each week, someone from the support team will collect the most important issues from the week, summarize those and forward them on to the rest of the team. That’s useful for bigger systemic issues that aren’t really bugs, per se, but where a change can cut down the number of inquiries about a particular feature or about how we do our messaging.

For more immediate issues, like bugs, we’ll file them on GitHub, and each week the people working on product go through those, triage them and respond. For potential security issues, we specifically put a hard guarantee of a response within 24 hours. For system infrastructure, those are things that you can’t just put on the backburner, so we respond to them in real time.
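As an illustrative aside (not part of Greg's answer), the response expectations described above could be written down as a small lookup table. This is a sketch, not a tool Stripe is known to use: the category names and the `respond_by` helper are hypothetical, and the seven-day window for bugs is an assumption standing in for "by the next weekly triage". Only the 24-hour security guarantee comes directly from the interview.

```python
from datetime import datetime, timedelta

# Maximum time to a first response, by issue category (names are hypothetical).
RESPONSE_WINDOWS = {
    "security": timedelta(hours=24),  # hard guarantee stated in the interview
    "bug": timedelta(days=7),         # assumption: picked up at the next weekly triage
    "infrastructure": timedelta(0),   # handled in real time
}


def respond_by(category, reported_at):
    """Return the latest acceptable first-response time for an issue."""
    return reported_at + RESPONSE_WINDOWS[category]


if __name__ == "__main__":
    reported = datetime(2013, 3, 7, 9, 0)
    for category in RESPONSE_WINDOWS:
        print(category, respond_by(category, reported))
```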

Justin:  You brought up the concept of teams. Do you mind talking a little bit more about that?

Greg:  Our internal team structure evolved organically, like many things at Stripe, where we started out with people who worked on different projects. For example, I would be working on infrastructure, someone else would be working on product, and an issue that was in the middle of the two would often fall through the cracks. Often, I ended up working on many of these but they piled up and became too much for any one person to deal with.

At that point, we thought about how we were going to evolve because the status quo wasn’t working any longer. I think that that was actually a success of the system. We often do unscalable things for as long as possible and then figure out how to fix them later.

The abstraction we came up with is to have a concept of teams. We now have a systems team, a product team and a financial operations team. Each of these serves as a bucket for work. So when an incoming task enters the system, it’s picked up by one of these teams and we don’t really end up with things that fall through the cracks anymore.

It’s important to make sure that these teams are very dynamic and that people are able to move around. Preserving that ability is tough and, in practice, people end up moving but not nearly as frequently as we’d like. We’ve been thinking about playing around with different models in order to make sure that people do end up moving.

Justin:  So how often do you guys deploy, and do you have any code review type practices?

Greg:  We have continuous integration. We don’t have continuous deployment, though we’d like to, and our test suite is in a pretty good state these days. I think we’d be ready for that; it’s just a tool issue at this point. We deploy a number of times a day. I don’t have the exact figure, but usually people just commit code and end up deploying it. The workflow for code review depends on the patch. For small issues, you usually just write your code, you deploy it, and then someone will typically look at it post-deploy.

We found that’s a useful mechanism for making sure people are familiar with all the code going in without slowing down development. For bigger issues, people will usually open a pull request on GitHub and pull in reviews explicitly. We leave that to the discretion of the person writing the code. The person can determine how comfortable they are with their code versus getting opinions and feedback from others. There is a body of code, which is just very small stuff that doesn’t get reviewed, and again, that’s just left up to the discretion of the person who’s committing.

Justin:  Do you guys do anything with Scrum or Kanban or anything like that?

Greg: We don’t do any formal development methodologies. I think methodologies are useful as tools rather than ends in themselves. Usually, when people are working on smaller projects, they come up with whatever workflow works best for them. As an example, I’m currently working on a project with one or two other people. We physically moved our desks closer together and we passively communicate with each other throughout the day.

Justin: Since you work in these teams and have all these different things going on, how do you guys prioritize different features and product feedback from customers?

Greg:  That’s usually handled on a per-team basis. For larger items, like what project we should be working on, we handle that at the highest levels. But for things that can be bucketed into a single team, that’s just handled by the people within that team. Typically, there’s one person who is keeping track of all tasks within the team, and that person usually helps with the prioritization. For example, Systems tends to meet once per month, and Evan keeps track of a lot of the bigger issues that are going on. We sit down and think about what we want to get done this month and each person works on the piece they decided to take on.

For Product, again, there’s that weekly meeting where bugs are triaged and decisions are made on what our priorities are. Again, it’s decisions by people who are directly building.

Justin:  Is most of your time spent managing now or do you go back and forth?

Greg: No, I think what I do really depends on the day. Over the past 24 hours, I spent probably 80% of my time coding, and I end up getting involved in a lot of different things. One aspect that I really like about what we do here is that I also work on things that aren’t just code. There are a lot of different pieces of the business that people end up working on. I do a lot of recruiting, watch for how things are done on various teams, and try and figure out how we can do better.

Thanks Greg for a great interview! If you enjoyed it, send Greg a tweet!

Written By: Frances Banks