Always Write Something

Yann Tambouret
Klaviyo Engineering
10 min readMay 30, 2023

--

John Meichle circulated the first Klaviyo RFC days after Black Friday, Cyber Monday (BFCM) 2017. He presented RFC: Jenkins + AMI build pipeline, which discussed automating away a manually triggered (though scripted) task. John co-opted the RFC pattern from a previous organization. Our people and systems were scaling quickly and John saw the need to be more formal about our technical planning and execution.

Since that first RFC, or Request For Comments, the process has become the standard Klaviyo engineers follow before developing projects of all sizes. It’s not at all a new concept, and the first Internet Standards RFC dates back to 1969. Many other contemporary engineering teams author RFC documents, for example, see Squarespace’s The Power of “Yes, If” and Juan Pablo Buriticá’s A thorough team guide to RFCs.

Early on, we evangelized the process with the motto Always Write Something. We’ve since authored and reviewed over 600 RFCs, covering everything from small data migrations to entirely new revenue generating systems.

In my 5+ years at Klaviyo, I’ve reviewed hundreds of RFCs and written over a dozen. I’m also a member of the Architecture Review Board (ARB), a group that stewards the RFC reviews. RFCs have become the foundation for our technical decision making. In this post I’ll share how RFCs work at Klaviyo with real examples.

What is an RFC and how does it work?

An RFC involves both writing your ideas down in a document and presenting them to your stakeholders for discussion. Communication is essential for engineers to work effectively with one another, especially when the organization grows beyond a few teammates. The RFC process encourages healthy communication. We want engineers with all levels of experience to have a positive and useful experience sharing ideas, and we want these ideas to be described and discussed in a consistent and high-quality way.

RFCs start as team discussions, which are then written up as technical specifications. When writing an RFC, authors use a standard template. The ARB maintains and iteratively improves the RFC templates (there are subtle derivatives of the one linked above, each focusing on domain specific details). The authoring teams then share these RFC documents with stakeholders, who are teams that will be impacted one way or another by the proposed changes. Stakeholders can be one of our SRE teams, e.g. Security Engineering, or they can be teams whose owned systems will interact with the proposed changes. For every RFC, the ARB is a required stakeholder. This helps ensure consistency and reliability across all teams.

An example approvals section of the RFC, listing out stakeholders and their review status.

Discussion starts asynchronously in the form of comments on the RFC document and over Slack. Meanwhile, the author schedules a meeting to finalize the discussion. We reserve four one hour blocks per week to hold these discussions and authors can reserve 30 or 60 minutes — it’s usually clear from the async comments how long it will take to resolve open points. Holding these blocks is a small but important process detail. Otherwise, if authors were left on their own to find blocks of time, weeks could be lost to scheduling.

These discussion sessions are open to everyone at Klaviyo. For that reason, RFCs are an excellent learning opportunity for new team members. Liam Kelly, a colleague and ARB member, put it this way:

One thing that impressed me about Klaviyo’s RFC process is the openness of it — i.e. everyone can view and comment on every RFC if they want, and anyone can drop in and observe any RFC review. Those things were a big help when I was onboarding as far as understanding the work of other teams.

An ARB member moderates the RFC discussion. Our goal is not just moderation, but also to prompt exploration of any under-addressed aspects of the proposal.

Page from our internal wiki describing the RFC moderation process.

Here’s the slide we show before each RFC discussion to remind everyone the rules of play.

The “Welcome to RFC Review” slide presented before each meeting.

(The slide is less fun than our initial effort: playing a Gilbert Godfreed impersonator reading of the guidelines out loud!)

Many RFC discussions resolve with all teams agreeing on a path forward. If not, there’s a process for iteration, and the outcome of the discussion might simply be a list of action items to address before re-review of the plan. The eventual goal is that stakeholders give approval and the RFC is ready for development. An alternative, and also positive outcome, is one where the process of writing and discussion makes us realize the plan is missing something that deserves more attention. This sometimes results in scrapping the approach or even the entire project.

How long does this all take? Writing the initial draft varies widely. It could be that the author clearly understands the idea and writing it down takes half a day. Or it could be that in writing the RFC, the author is also writing prototype code, performance testing packages, etc., in which case it will take much longer. At any rate, once drafted, based on reviewer responsiveness, schedule congestion and the number of follow up action items, it’s usually one to three weeks to approval.

The focus of the RFC document and discussion is purely technical. It’s not a contract among teams regarding timelines, commitments or resourcing.

A personal story

I believe most of my colleagues value the RFC process including the role of the ARB, but not everyone values it the first day they join. When I joined, Klaviyo neither used RFCs nor had an ARB to review technical plans. There’s nothing like screwing up to develop conviction that formal socialization of ideas makes sense. That happened to me on October 30, 2018.

On that day, my team and I launched a massive rewrite to our event ingestion system. On that very same day, another team launched a whole new architecture for product catalogs.

My schedule on the day of the big cutover

Both releases went badly, and what made things even worse was that for ten hours we tried to debug issues in production. (We were a much younger company back then!) We had retros (oh there were retros for sure) and I’ve internally retroed about the actions leading up to and following that day ever since.

And after all those retros, it was clear to me and everyone else that more time on design, more time spent making sure we were solving the right technical problems, and more time making sure people on other teams knew what we were up to could have prevented months and months of pain and wasted effort.

In large part these events motivated the creation of the Architecture Review Board.

A request for permission?

Everyone on the ARB is an experienced engineer, and they bring that expertise to bear on RFC reviews. The ARB also works to indoctrinate our rapidly growing engineering team with the communication etiquette needed to support the RFC process at scale.

In my capacity as an ARB member, I go into each review, each interaction, serving as a supportive colleague. As a default reviewer on every RFC, the ARB has the authority to block a project. That might seem weird since we have a request for comments process, not a request for permission. Even though we’ve rarely had an RFC blocked by the ARB and simultaneously supported by all other reviewers, we’ve found that the authority plays a key role in validating the significance of the RFC process. Without that authority and ARB’s continuous engagement, the quality of the discussions and the RFC process would suffer.

It’s not just the ARB that can block an RFC from proceeding. All of the stakeholders have the responsibility to prevent disastrous choices from impacting future systems. Their approval signifies that their systems, either as consumers of or sources to the RFC’d system, will work well with the proposed changes. Authors must address questions of increased load, reliability, security and scaling (especially during peak holiday season) in order to gain approval.

What makes an RFC effective?

The RFC process formalizes high quality communication. RFC etiquette includes:

Know your responsibilities and own them.

Authors are responsible for sharing their RFC early enough for teams to review the proposal. They should also respond to all comments within a reasonable time — if you’re going to ask someone to read your admittedly often dry document, they deserve the same respect and responsiveness in return.

Communicate early and often.

Always Write Something goes to the heart of this idea. Get in front of roadblocks by sharing ideas as early as possible. We encourage teams to RFC ideas before they are fully baked — the feedback they’ll receive could let them explore a direction they wouldn’t otherwise consider and would be reluctant to later.

Assume everyone is operating with everyone’s best interest in mind.

Disagreements are common with RFCs, but they are never about personal issues, slights, offenses or politics.

Klassic Klaviyo Meme

If you have an opinion, make sure you explain it clearly and completely.

As a corollary, if you receive a comment, please return the courtesy with a reply.

We’re all working towards a better Klaviyo application.

Any one engineer’s “ownership” of code or service is really a temporary stewardship. The corporation owns the code and it’s our job to make sure it’s the highest quality possible.

Stick to a consistent routine.

The ARB developed and supported several strategies to foster this healthy etiquette. For example, I developed a lightweight Django app to manage scheduling and stakeholder alerting (on Slack). While it’s not necessary that an ARB exists to manage the RFC process, it is essential that some individual or group own the process and make sure everyone remains committed. The additional overhead to create this app was well worth the efficiency gains we’ve seen from establishing a routine. Though adhering to the process may feel like an administrative chore, doing so consistently makes managing the process easier in the long run.

Screenshot from our RFC Bot in Slack

A real example

But of course none of that process will be of any help unless the underlying content is good. Here are a few examples from our 2021 Deployment Trains RFC authored by our Velocity SRE team. I’ve chosen that RFC because you can read about the system in our Thomas the Deployer blog post.

Here was the problem statement. Short, sweet, and clear:

As the engineering organization grows, we continue to have more and more people who wish to deploy every day. We need a better way to deploy changes that doesn’t add a significant amount of time for every ticket.

Here was the background. It’s clear and starts to draw lines around the scope of this RFC.

The following snippet, from a few pages into the document, shows the author using a scenario so that reviewers can easily grasp what’s being proposed.

There are lots of ways to do deployment and anyone reading the RFC should be thinking not just about the micro details but if the overall approach makes sense. That’s where it’s useful to read about the alternatives considered by the authoring team. In this RFC there were six top-level bullets in this section, ranging from minor alternatives such as the one shown in the screenshot below to using commercial and open source tools that the team investigated

Finally, the comments and discussions are where the best content lives. They’re invaluable in heading off confusion before any in-person discussions take place. Example:

In the comments, experienced engineers drop gems and words of wisdom such as “don’t test in production.” Comments also drive exploration into important philosophical questions. Here’s an example of the RFC motivating a discussion of the build vs. buy question:

This Deployment Train RFC had 24 reviewers. Keeping as much of the conversation asynchronous as possible helped the reviewers have a productive discussion in person. To organize in person discussions, we have a template section for Open Questions. This keeps everyone on task while holding a productive conversation, and it helps everyone involved achieve the goal of developing the strongest plan possible.

Conclusion

I always write something, because with each RFC, I gain more than I put into the process. Writing thoughts down makes those thoughts crisper and helps me plan. Plus, once written, I can solicit feedback at scale. And, most importantly, feedback sometimes reveals pitfalls that I would have otherwise learned about the hard way months later.

At the same time, socializing RFCs unlocks value for Klaviyo as a whole. My colleagues gain knowledge and insight, and since RFCs are open to everyone, people can learn and contribute even if they’re not involved in a particular system. RFCs are also a type of audit trail of decisions made. (How many times have you come across legacy code and asked “but why?”)

A template and a written process aren’t good enough. An individual or group must maintain quality, nurture the RFC culture, and own updating the process as the organization grows. At Klaviyo, that’s our ARB.

When I joined Klaviyo, I was surprised — especially compared to academia — at how open we were about information. We were encouraged to work transparently, there were dashboards with metrics all over the office, our calendars were all open, and lots of fascinating business information was shared at our weekly company-wide By The Numbers meetings. A few years ago, those ideas were formalized in our official company values, among them collaborate radically, always learning, remarkable work. Our RFC process reflects and reinforces these values. Over 600 RFCs after John’s initial one, I’m proud that writing RFCs has become such an important part of our engineering culture.

--

--