- What tasks take longer than they should?
- What is the most embarrassing part of our code to explain to new hires?
- What key pieces of our code have we under-invested in?
Not guaranteed. ↩
- discuss potential improvements to the interviewing process
- review borderline interviews together as a group, if your interviewing tool supports recording and playback
- have interviewers read through the feedback their candidates got from onsite interviewers and look for consistent patterns
- Was this statistically significant, or are you conducting pseudoscience? Definitely pseudoscience. Folks in the sample were conducting about 10 interviews a month, ~25 per quarter. Perhaps not yet ready to publish in Nature but meaningful enough to infer from, especially considering the relatively low cost of being wrong.
- Why didn’t the on-site pass rate double, as predicted? First, not all of the top folks ended up joining the team. Second, the best performers did well because of a combination of skill (great interviewers, friendly, high signal) and luck (got better candidates). Luck is fleeting, resulting in a regression to the mean.
- What size does this start to make sense at? Early on, you should just identify who you believe your best interviewers are and have them (or yourself) do all the phone screens. Then, once you start hiring rapidly enough that you are doing about 5-10 phone screens a week, run the numbers and invite your best 2-3 onsite interviewers to join and create the team.
- What did you do for specialized engineering roles? They had their own dedicated processes. Data Science ran a take home, Front-End engineers had their own Phone Screen sub-team, and Data and ML Engineers went through the general full-stack engineer phone screen.
- Didn’t shrinking your Phone Screener pool hurt your diversity? In fact, the opposite happened. First, the phone screener pool had a higher percentage of women than the engineering organization at the time; second, a common interviewing anti-pattern is “hazing” - asking difficult questions and then rejecting somebody for “not even remembering about Kahn’s algorithm, lolz.” The best phone screeners don’t haze, bringing a more diverse group onsite.
- to the edge for things like page views (though the server is fine here too, if you KISS)
- to the server for events that have consequences, like button presses
- to publishers for paid traffic conversions: inform Google/Facebook via their server-side APIs when feasible, instead of trying to load a pixel
We explicitly only changed the infra which served our landing pages, and kept the content - the HTML/CSS/JS - identical. Once the new infra was shown to work, we would begin to experiment with the website itself. ↩
- Why didn’t the issue get caught by unit tests?
- Why didn’t the issue get caught by integration/smoke tests?
- Why didn’t the issue get flagged in Code Review?
- Why didn’t the issue get caught during manual QA?
- If the outage took over an hour to get discovered, why didn’t the monitoring page our on-call?
- trade-off: we were aware of this concern but explicitly made the speed-vs-quality trade-off (e.g., not adding tests for an experiment). This is tech debt coming back to bite us.
- knowledge gap: the person doing the work was not aware that this kind of error was even possible (e.g., tricky race conditions, worker starvation).
- brain fart: now that we look at it, we should have caught this earlier. A “just didn’t get enough sleep that night” kind of thing.
If you keep asking “why” but haven’t gotten to an answer that boils down to one of these, keep going deeper (or get a second opinion).
- Inertia. I have a job and a system that “works” for me. Don’t change what isn’t broken.
- Status. I have a “real job” at a “real company.” Working remote is for weirdos and loners.
- Productivity. Real decisions happen in elevators on the way to lunch. We know how to be efficient and innovative in-person. How the hell do you build a team remotely?
Let us prepare to grapple with the ineffable itself, and see if we may not eff it after all.
– Douglas Adams, Dirk Gently’s Holistic Detective Agency
“Ugh, the codebase is just such a mess,” my new Tech Lead said. “It’s just cruft on top of cruft, never cleaned up, always ‘after the next release’. No wonder we keep getting bug reports faster than we can fix them.”
Not what you want to hear as the freshly-appointed Engineering Manager on a critical team. Leadership expects the team to deliver on key new features, but also, there better not be any voluntary churn.
I went to talk to the Product Manager. “Tech Debt?” he said, “sure, we can tackle some tech debt - but let’s make sure to get some credibility first by hitting our OKRs. It won’t be easy.”
How did it get this bad?
Cut to three years earlier. I was a new hire on that very team. My onboarding buddy - let’s call him Buddy - and I bumped into a strange corner of the codebase.
“Oh weird,” I said. “Should we fix that?”
“I have a strategy for this that you can use,” Buddy said. “When you run into code that seems off, that feels worth fixing, you write the issue down in a separate text file. Then you go do useful work.”
“Oh, I see. And eventually, you get back to the text file and fix the issues?”
“Nope. But at least you’ve written it down.”
Wikipedia describes learned helplessness as “behavior exhibited by a subject after enduring repeated aversive stimuli beyond their control.” Without support, this is how engineers come to feel about tech debt.
When I came back to this team as a manager, I reached out to Buddy, who had left years ago. “The code is crap at Airbnb too,” he told me when we caught up, “but at least they pay well and I don’t have to work very hard.”
So what did you do?
I joined Airbnb.
That’s not true. We tackled the tech debt. We shipped leadership’s key features, hit our OKRs, and cleaned up some terrible, long-overdue-for-deletion no-good code. Within 3 months, the team’s attitude about technical debt had begun to turn around.
Tackling Technical Debt In Three Easy Steps
Step 1. Empower
The biggest reason technical debt exists is that engineers have internalized that it’s not their job to fix it. Start-up mantras like “focus” and “let small fires burn” have led to just that - small fires everywhere.
“Get shit done” is a great mantra, but you still have to clean up after yourself.
The fix here is cultural. Make it clear that engineers who identify debt and take time to tackle it are appreciated. Celebrate their work in front of peers. A friend once created a Slack bot that called out any PR that deleted a significant amount of code. Engineers all across the company began striving to get featured.
Now of course, the team does have actual work that needs doing. Empower doesn’t mean “ignore our actual work” - it means, “if you take a Friday to fix something that’s bothering you, I have your back.”
Step 2. Identify
If you’re on a team that hasn’t been rigorous about tackling tech debt, there’s probably lots of it and it’s unclear what could even be done. This is fixable.
Organize a brainstorm with prompts like:
This’ll set you up with a solid initial list for your Tech Debt backlog. For more ideas, run your codebase through a tool like CodeClimate to algorithmically point out the rough spots.
The first time we ran a brainstorm like this, everybody agreed that a handful of the ideas were so easy and so valuable that we should do them right away. Like, that day. It felt like a breath of fresh air. Things are fixable.
Encourage folks to add to the backlog anytime they run into annoyances and don’t have time to fix them right then and there. In future team retros or brainstorms, identify any tech debt that comes up and add it to the backlog.
Step 3. Prioritize
Having a tech debt backlog and ignoring it is worse than not having one at all.
Time to play Product Manager and use ICE to prioritize your tech debt: the impact a fix would have on velocity, your confidence that the fix will actually work, and the effort required to fix it.
This gives you a list of potential projects. Some will take months; others, hours.
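For the prioritization itself, here’s a minimal sketch of one common ICE formulation. The project names and scores are made up for illustration, not taken from any real backlog:

```python
# One common ICE formulation: impact times confidence, divided by effort.
# Higher scores bubble to the top of the backlog.
def ice_score(impact: float, confidence: float, effort: float) -> float:
    return impact * confidence / effort

# Illustrative backlog entries (hypothetical projects).
backlog = [
    ("Delete the dead billing code", ice_score(impact=8, confidence=0.9, effort=2)),
    ("Rewrite the front-end", ice_score(impact=9, confidence=0.4, effort=9)),
    ("Add tests to the cron jobs", ice_score(impact=6, confidence=0.8, effort=3)),
]
backlog.sort(key=lambda item: item[1], reverse=True)
```

Quick wins float to the top; the months-long rewrites sink until you’re ready for them.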
Now they just need to get done. That’ll require buy-in from your Product Manager.
When it’s time to have “the talk” with your PM, I’ve found “how often should you clean your room” to be a useful analogy.
Never cleaning your room is a bad idea and obviously so. Over time it becomes unlivable. This is how our engineers feel. At the same time, if you’re cleaning your room all day every day, that’s not a clean room, that’s excessive and no longer helpful. In moderation, messiness is healthy - it means you’re prioritizing. We don’t need a glistening-clean room, but we do need to do more than nothing. At the end of the day, a clean room is a productive room.
Tackling Small Debt
Come together with your Product Manager and agree on a rate at which small debt projects get added to the team’s ticket queue. With a spiel like the above, you can hopefully carve out ~10% of all work for debt, depending on the maturity of the team and the company.
For ~week-long projects, try to leverage particular times of year like Hack Weeks and pitch high-value projects to engineers looking for a fun project.
Tackling Heavy Debt
This is where good leadership helps. At this particular company, Engineering leadership had rolled out “Quality OKRs”. Every quarter, each team had to sign up for a meaningful “quality” OKR goal.
What is a “quality” goal? This was left up to teams, but the gist of it was: go fix the most painful thing that isn’t already reflected in your business metrics.
During quarterly planning, we whittled the top three “heavy tech debt” projects into proposals, got buy-in from leadership, then brought the ideas back to the group.
Since quality projects had been blessed from the top down, the PM had air cover to support the work without pushback.
So what happened?
Was there still tech debt? Yes. Did it continue to accumulate? Of course. But did it feel inexorable? Not anymore.
“The new VP wants us to double engineering’s headcount in the next six months. If we have a chance in hell to hit the hiring target, you seriously need to reconsider how fussy you’ve become.”
It’s never good to have a recruiter ask engineers to lower their hiring bar, but he had a point. It can take upwards of 100 engineering hours to hire a single candidate, and we had over 50 engineers to hire. Even with the majority of the team chipping in, engineers would often spend multiple hours a week in interviews. Folks began to complain about interview burnout.
Also, fewer people were actually getting offers; the onsite pass rate had fallen by almost a third, from ~40% to under 30%. This meant we needed even more interviews for every hire.
Visnu and I were the early engineers most bothered by the state of our hiring process. We dug in. Within a few months, the onsite pass rate went back up, and interviewing burnout receded.
We didn’t lower the hiring bar, though. There was a better way.
Introducing: the Phone Screen Team
We took the company’s best technical interviewers and organized them into a dedicated Phone Screen Team. No longer would engineers be shuffled between onsite interviews and preliminary phone screens at recruiting coordinators’ whims. The Phone Screen Team specialized in phone screens; everybody else did onsites.
Why did you think this would be a good idea?
Honestly, all I wanted at the start was to see whether I was a higher-signal interviewer than my buddy Joe. So I graphed each interviewer’s phone screen pass rate against how their candidates went on to perform onsite.
Joe turned out to be the better interviewer. More importantly, I stumbled into the fact that a number of engineers doing phone screens performed consistently better across the board: more of their candidates passed the phone screen, and those candidates then got offers at a higher rate.
These numbers were consistent, quarter over quarter. As we compared the top quartile of phone screeners to everybody else, the difference was stark. Each group included a mix of strict and lenient phone screeners; on average, both groups had a phone screen pass rate of 40%.
The similarities ended there: the top quartile’s invitees were twice as likely to get an offer after the onsite (50% vs 25%). These results also were consistent across quarters.
Armed with newfound knowledge of phone screen superforecasters, the obvious move was to have them do all the interviews. In retrospect, it made a ton of sense that some interviewers were “just better” than others.
A quarter after implementing the new process, the “phone screen to onsite” rate stayed constant, but the “onsite pass rate” climbed from ~30% to ~40%, shaving more than 10 hours-per-hire (footnote 2). Opendoor was still running this process when I left several years later.
You should too (footnote 3, footnote 4).
Starting your own Phone Screen Team
1. Identifying Interviewers (footnote 5)
Get your Lever or Greenhouse (or ATS of choice) data somewhere analyzable, and then quantify how well interviewers perform. There are lots of ways to analyze performance; here’s a simple approach that favors folks who generated lots of offers from as few onsites and phone screens as possible.
You can adjust the constants to where zero would match a median interviewer. A score of zero, then, is good.
Your query’s output will look something like this:
| Interviewer | Phone Screens | Onsites | Offers | Score |
| --- | --- | --- | --- | --- |
| Accurate Alice | 20 | 5 | 3 | (45 - 20 - 20) / 20 = 0.25 |
| Friendly Fred | 20 | 9 | 4 | (60 - 36 - 20) / 20 = 0.2 |
| Strict Sally | 20 | 4 | 2 | (30 - 16 - 20) / 20 = -0.3 |
| Chaotic Chris | 20 | 10 | 3 | (45 - 40 - 20) / 20 = -0.75 |
| No Good Nick | 20 | 12 | 2 | (30 - 48 - 20) / 20 = -1.9 |
Ideally, hires would also be included in the funnel, since a great phone screen experience makes a candidate more likely to join. I tried including them; unfortunately, the numbers got too small and we started running out of statistical power.
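The score in the table above boils down to a one-liner. A sketch, where the constants (15 for an offer, 4 per onsite, 1 per phone screen) are the adjustable knobs mentioned earlier for centering a median interviewer near zero:

```python
# Reward offers; penalize the onsites and phone screens spent getting them.
# Tune the constants (here 15, 4, 1) so a median interviewer lands near zero.
def interviewer_score(phone_screens: int, onsites: int, offers: int) -> float:
    return (offers * 15 - onsites * 4 - phone_screens * 1) / phone_screens
```

Accurate Alice’s row, for example, is `interviewer_score(20, 5, 3)`, giving 0.25.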
2. Logistics & Scheduling
Phone Screen interviewers no longer do onsite interviews (except as emergency backfills). The questions they ask are now retired from the onsite interview pool to avoid collisions.
Ask the engineers to identify and block off four hour-long slots each week to make available to recruiting (recruiting coordinators will love you). Use a tool like youcanbook.me or calendly to create a unified availability calendar. Aim for no more than ~2.5 interviews per interviewer per week. To minimize burnout, one thing we tried was taking 2 weeks off interviewing every 6 weeks.
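Sizing the team from that cap is back-of-the-envelope math. A sketch (the 2.5-interviews-per-week cap is the one above; the function name is mine):

```python
import math

# How many phone screeners a given interview load requires, under a
# per-interviewer weekly cap (~2.5 interviews/week, per the text above).
def phone_screeners_needed(screens_per_week: float, cap: float = 2.5) -> int:
    return math.ceil(screens_per_week / cap)

# e.g. ~10 phone screens a week calls for a 4-person team
```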
To avoid conflict, ensure that interviewers’ managers are bought in to the time commitment and incorporate their participation during performance reviews.
3. Onboarding Interviewers
When new engineers join the company and start interviewing, they will initially conduct on-site interviews only. If they perform well, consider inviting them into the phone screen team as slots open up. Encourage new members to keep the same question they were already calibrated on, but adapt it to the phone format as needed. In general, it helps to make the question easier and shorter than if you were conducting the interview in person.
When onboarding a new engineer onto the team, have them shadow a current member twice, then be reverse-shadowed by that member twice. Discuss and offer feedback after each shadowing.
4. Continuous Improvement
Interviewing can get repetitive and lonely. Fight this head-on by having recruiting coordinators add a second interviewer (not necessarily from the team) to join 10% or so of interviews and discuss afterwards.
Hold a monthly retrospective with the team and recruiting, with three items on the agenda:
Eventually, interviewers may get burnt out and say things like “I’m interviewing way more people than others on my actual team - why? I could just go do onsite interviews.” This probably means it’s time to rotate them out. Six months feels about right for a typical “phone screen team” tour of duty, to give people a rest. Some folks may not mind and stay on the team for longer.
Buy exclusive swag for team members. Swag is cheap, and these people are doing incredibly valuable work. Leaderboards (“Sarah interviewed 10 of the new hires this year”) help raise awareness. Appreciation goes a long way.
Also, people want to be on teams with cool names. Come up with a cooler name than “Phone Screen Team.” My best idea so far is “Ambassadors.”
There’s something very Dunder Mifflin about companies that create Growth Engineering organizations to micro-optimize conversion, only to have those very growth engineers struggle to focus due to interview thrash from an inefficient hiring process. These companies invest millions into hiring, coaching, and retaining the very best salespeople. Then they leave recruiting - selling the idea of working at the company - in the hands of an engineer who hasn’t gotten a lick of feedback on her interviewing since joining two years ago, with a tight project deadline at the back of her mind.
If you accept the simple truth that not all interviewers are created equal, and that the same rigorous quantitative process with which you improve the business should also be used to improve your internal operations - and if you’re trying to hire quickly - you should consider creating a Technical Phone Screen Team.
FAQs, Caveats, and Pre-emptive Defensiveness
I’m here to warn you about the dangers of front-end user tracking. Not because Google is tracking you, but because it doesn’t track you quite well enough.
What follows is a story in three parts: the front-end tracking trap I fell into, how we dug ourselves out, and how you can go around the trap altogether.
Part 1: A Cautionary Tale
The year was 2019. Opendoor was signing my paychecks.
We were launching our shiny new homepage.
We had spent a month migrating our landing pages from the Rails monolith to a shiny new Next.js app. The new site was way faster and would therefore convert better, saving us millions of dollars annually in Facebook and Google ad costs.
Being responsible, we ran the roll-out as an A/B test, sending half of the traffic to the old site so we could quantify our impact (footnote 1).
The impact we’d made was making things worse. Way worse. The new site got crushed.
WTF. Google had told us our new page was way better. The new site even felt snappier.
“Figure it out.” The engineers on the revamp paired up with a Data Scientist and went to figure out what the hell was going on. They started digging into every nook and cranny of the relaunch.
A week went by. Our director peeked in curiously. Murmurs about postponing the big launch started to circle. Weight was gained; hair was lost.
Ultimately, the clue that cracked the case was bounces. Bounces (i.e., people leaving right away) were way up on the new site. But it was clear the new site loaded much faster. Bounce rates should have gone down, not up.
How did we measure bounce rates? We dug in.
How bounces work
When the homepage loads, the front-end tracking code records a ‘page view’ event. If the ‘page view’ event was recorded, but then nothing else happens, analytics will consider that user to have “bounced”.
It turned out that the old site was so slow that many folks left before their ‘page view’ ever got recorded. In other words, the old site was dramatically under-reporting bounces.
It was like comparing two diet plans and saying the one where half the subjects quit was better because the survivors tended to lose weight.
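The bias is easy to reproduce with made-up numbers. In this sketch, both sites have the same true bounce rate, but the slow site loses many of its bouncers before the tracking script ever fires:

```python
# Survivorship bias in bounce tracking (all numbers are illustrative).
def measured_bounce_rate(visitors: int, true_bounces: int, untracked_bouncers: int) -> float:
    # Visitors who left before the 'page view' event fired never enter
    # the denominator -- or the numerator.
    tracked_visitors = visitors - untracked_bouncers
    tracked_bounces = true_bounces - untracked_bouncers
    return tracked_bounces / tracked_visitors

slow_site = measured_bounce_rate(1000, 400, 200)  # looks like 25%
fast_site = measured_bounce_rate(1000, 400, 0)    # looks like 40%
```

Same true 40% bounce rate on both sites; the slow one just looks better on the dashboard.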
Part 2: How we fixed bounces
If the front-end was under-reporting bounces, could we find a way to track a ‘page view’ without relying on the client?
There was a way: track it on the server. In our case, we tracked the event in Cloudflare, which we were already using for our A/B test setup.
We started logging a ‘page-about-to-be-viewed’ event instead of the ‘page view’ event, which was really a ‘page viewed, and the tracking code managed to load and run’ event.
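The shape of the fix, sketched as a hypothetical Python handler (we actually did this in a Cloudflare Worker; the function and log names here are made up):

```python
# Log the view on the server, before the HTML is even sent, so visitors
# who bounce before any client-side JS runs still get counted.
def serve_landing_page(path: str, analytics_log: list) -> str:
    analytics_log.append({"event": "page-about-to-be-viewed", "path": path})
    return "<html><!-- landing page goes here --></html>"
```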
Lo and behold, the new infra was better after all! We had been giving our old page too much credit this entire time, but nobody was incentivized to cry wolf.
Part 3: Front-end tracking done right
Forsake the front-end. ’Tis a terrible place to track things, for at least three reasons.
We calculated that getting rid of Segment and Google Tag Manager on our landing pages would yield about 10-15 points of Google PageSpeed. Google takes PageSpeed into account for Quality Score, which in turn makes your CPMs/CPC cheaper.
Somewhere between a quarter and a half of all users have ad-blockers set up. If you’re relying on a pixel event to inform Google / Facebook of conversions, you’re not telling them about everybody. This makes it harder for their machine learning to optimize which customers to send your way. Which means you’re paying more for the same traffic.
What should I do instead?
Take all your client-side tracking, and move it to the edge or the server.
FAQs & Caveats
Does your approach break identifying & cookie-ing users, so you can retarget effectively?
Shouldn’t. We used Segment to identify anonymous users; the change was just calling .identify() in Cloudflare (and handling the user cookie there).
I heard server-side conversion tracking for Google and Facebook doesn’t perform as well.
I’ve heard (and experienced) this as well. We’re entering black magic territory here… try it.
Want to tell me I’m misinformed / on-point / needed? Hit me up.
“Alexey, do you feel the points you bring up during our post-mortems are productive?” my tech lead asked at our 1:1.
Well, shit. I had thought so, but apparently not.
Earlier in the year, I became the Engineering Manager on a team responsible for half of the outages at our 2,000-person company. After each incident, the on-call engineer would write up a doc and schedule a meeting.
“How come this wasn’t caught in unit tests?” I found myself asking, in front of the assembled team. Next post-mortem, same thing. “I get that we didn’t have monitoring for this particular metric, but why not?” Week after week.
The tech lead had asked a great question. Was my approach working?
“I want to set high expectations,” I told him. “It’s not pleasant being critiqued in a group setting, but my hope is that the team internalizes my ‘good post-mortem’ bar.”
The words sounded wrong even as I said them.
“Thanks for the feedback,” I said. “Let me think on it.”
I thought about it.
There’s a limited budget for criticism one can ingest productively in a single sitting. Managers will try to extend this budget through famed best-practices like the shit-sandwich and the not-really-a-question question. Employees learn these approaches over time and develop an immunity.
This happened here. Once my questioning reached the criticism threshold, I was no longer “improving the post-mortem culture.” I was “building resentment and defensiveness”.
I had run over budget. And yet, there was important feedback to give!
Change the template, change the world
Upon reflection, I ended up updating our post-mortem template. My questions became part of the template that got filled in before the meeting.
This way, it was the template pestering the post-mortem author. My role was simply to insist that the template be filled out; an entirely reasonable ask.
Surprisingly enough, this worked; post-mortems became more substantive. The team pared down outage frequency and met OKR goals.
The One Simple Trick I had stumbled into was a way to get around feedback budgets. Turns out there’s this other, vaster budget to tap into: the budget of process automation. When feedback is automated, it arrives sooner, feels confidential, and lacks judgement. This makes it palatable; this is why the budget is vaster.
The technical analogy here is how we use linters. “Nit: don’t forget to explicitly handle the return value” during code review feels mildly frustrating. Ugh. It’s “just a style thing” and “the code works”. I’ll make the change, but with slight resentment.
Yet, if that same “unhandled return value” nudge arrives in the form of a linter, it’s a different story. I got the feedback before submitting the code for review; no human had to see my minor incompetence.
As a software engineer, Have Good Linters is an obvious, uncontroversial best practice. The revelatory moment for me was that templates for documents were just another kind of linter.
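To make the analogy concrete, here’s a toy sketch of a “document linter.” The prompt strings are illustrative, loosely based on the template questions discussed elsewhere in this post:

```python
# A post-mortem "linter": flag any required prompt the write-up skipped.
REQUIRED_PROMPTS = [
    "Why didn't the issue get caught by unit tests?",
    "Why didn't the issue get caught during manual QA?",
    "Why didn't the monitoring page our on-call?",
]

def lint_postmortem(doc: str) -> list:
    return [prompt for prompt in REQUIRED_PROMPTS if prompt not in doc]
```

The template pesters the author; you just insist that the linter passes.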
My insight completely transformed the way Opendoor Engineering thinks about feedback; I crowd-surfed, held aloft by the team’s grateful arms, to receive my due praise as the master of all process improvement.
Just kidding; COVID-19 happened and I switched jobs.
The Appendices Three
I: Process linters seen in the wild
| Feedback | Linter |
| --- | --- |
| “we have too many meetings”; “what’s the point of this meeting”; “do I need to be here” | mandate no-meetings days; mandate agendas; mandate a hard max on attendee count |
| “Hey, how’s that project going? Haven’t heard from you in a bit” | daily stand-ups (synchronous or in Slack/an app); issue trackers (Linear, Asana, Jira, Trello) |
| “Hey, a friend who uses the app said that our unsubscribe page is broken?” | pre-deploy test coverage; automated error reporting (Sentry); alerting on pages or business metrics with anomalous activity patterns (Datadog) |
II: You’ve gone too far with this process crap
The process budget is vaster than the feedback budget, but it isn’t unlimited. A mature company is going to have lots of legacy process - process debt, if you will.
Process requires maintenance and pruning, to avoid “we do this because we’ve always done this” type problems. High-process managers are just as likely to generate unhappy employees as high-feedback managers.
III: The post-mortem template changes, if that’s what you’re here for
A. “5 Whys” Prompts
Our original 5 Whys prompt was “Why did this outage occur.” During the post-mortem review, I kept asking questions like “but why didn’t this get caught in regression testing?”
So, after discussion, I added my evergreen questions to the post-mortem template. They are:
B. Defining “Root Cause”
“5 Whys” recommends continuing to ask why until you’re about five levels deep. We were often stopping at one or two.
To make stopping less ambiguous, here is a set of “root causes” that I think is close to exhaustive:
Working remotely: not just for COVID-19 anymore.
From academia to the Open Source movement, remote collaboration is not exactly novel. From Github to DuckDuckGo, remote-first successful businesses are no longer rare.
Remote work occupied the cultural niche of something those whiz kids do, working on a beach in Thailand. Until today.
What has prevented remote work adoption?
COVID-19 addressed inertia. Twitter, a Legitimate Tech Company™️, addressed status. Productivity is still pretty hit-or-miss, but it’s early days yet.
Trust me, I’m an ~~expert~~ guinea pig
After college, I came to the Bay Area like a moth to the flame of start-ups.
Lacking an H-1B, I left in 2014 and started Hacker Paradise, a boutique travel business catering to remote workers. During that time, I worked remotely as a software engineer for an SF client for over a year.
In 2016, I came back to San Francisco. I did it for the same reason I came in the first place - the quality of the jobs and the community.
I don’t love working remotely. I miss the energy I get from a well-run office environment. COVID-19 has taught me that I’m an extrovert, and shelter-in-place has sucked.
Reality is that which, when you stop believing in it, doesn’t go away
- Philip K. Dick.
1. Remote work is about to do to San Francisco what San Francisco did to the South Bay.
Techies have been leaving San Francisco for cities like Portland, Austin, Denver, and Seattle (PADS) for over a decade. We seek a place to settle down and a better cost-to-quality-of-life ratio.
As it becomes easier than ever to keep your job while moving, this trend will pick up. The microkitchens at Twitter are nice, but not “an extra $2k/mo in rent” nice.
As ambitious millennials realize they can optimize for quality of life, expect businesses like Culdesac and Hacker Paradise (see what I did there) to blossom.
San Francisco will neither be abandoned quickly nor completely. It’ll retain a “city emeritus” status, like London, Philadelphia or Palo Alto.
2. We’re going to learn to run remote organizations.
We are, as an industry, pretty clueless at remote company building.
In fairness, we’re not even that good at regular company building. We just figured out 1:1s were a good idea and are still deciding who dotted-lines to whom in the matrix org structure.
You can’t just slap “remote friendly” on your jobs page and decide you’ve done a “heck of a job.”
Should team sizes change? How do you measure and manage morale? What about onboarding and knowledge sharing?
There’s some knowledge to be gleaned from the Basecamp folks; they wrote a book about remote work. That book is the COBOL of remote work; it works, but we can do better.
3. We’re about to enter a renaissance in remote collaboration tooling.
Slack is doing phenomenally well during COVID-19, but it still kind of sucks, right? At least when people emailed you, they didn’t expect a response right away. Etiquette around Slack usage is still pretty immature, am I right @channel?
Also: software engineers use GitHub, a pretty mature way to collaborate on shared work. Every other industry is still sending around Presentation Final Final.pdf. Some of my favorite former co-workers are building companies in this space. Expect more businesses like Figma and Loom to blossom.
PS. You’re wrong, Alexey. Offices are here to stay.
That’s true. They will.
We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run
- Amara’s Law
The move to remote-as-mainstream-option will take a good decade or two. The inflection point, however, was today, on May 12, 2020.
I for one look forward to kids asking what it was like when I had to leave for work every day.