Many startups experience a chicken-and-egg problem with growth: they want to run experiments to gain more volume, but lack the volume for experiments to be practical.

    This guide helps companies determine if (and how) they could staff a Growth team to start running experiments. First, we’ll diagnose what is achievable with the amount of traffic currently available. Then, we’ll dig into what techniques might be available to increase experimental power.

    As any proper guide ought to, this one includes a handy flowchart.

    Part 1: Do we have the numbers?

    Consider an imaginary startup in turn considering funding a checkout optimization team.

    Being well acquainted with power analysis, this startup calculates its Minimal Detectable Effect (MDE) - IE, how big of a swing an experiment would need to produce for it to be detected.

    Plugging their numbers into a swing size calculator, they learn that nothing less than a 20% win would be detectable today.

    Even 10% is a high bar to cross for winning experiments.
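For a rough feel of what a swing size calculator does under the hood, here is a minimal sketch using the standard normal approximation for a two-arm conversion test. The function name `relative_mde`, the 5% baseline conversion rate, and the 7,500 visitors per arm are all hypothetical numbers chosen for illustration, not figures from a real company.

```python
from statistics import NormalDist


def relative_mde(baseline_rate, visitors_per_arm, alpha=0.05, power=0.8):
    """Approximate relative minimal detectable effect for a two-arm test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    # Standard error of the difference between two equal-sized arms,
    # using the baseline rate for both arms (a common simplification).
    std_err = (2 * baseline_rate * (1 - baseline_rate) / visitors_per_arm) ** 0.5
    absolute_mde = (z_alpha + z_power) * std_err
    return absolute_mde / baseline_rate


# Hypothetical startup: 5% checkout conversion, 7,500 visitors per arm.
mde_today = relative_mde(baseline_rate=0.05, visitors_per_arm=7_500)
print(f"Smallest detectable relative lift: {mde_today:.0%}")
```

With these made-up inputs the detectable lift lands at roughly 20%. Since the MDE shrinks with the square root of the sample size, halving it takes about four times the traffic.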

    Gut Check: Especially if you’re off by quite a bit, this is a chance to take a step back and ask whether the company has reached growth scale or not. It could be that there are plenty of obvious 0-1 tactics left. Not everything has to be an experiment.

    In cases where experimentation remains the right choice, let’s find ways to adjust your experiment designs to be able to detect a 10% winner within a reasonable timeframe.

    Part 2: Boosters

    Things to try if you need to get just a little bit closer to get to stat-sig:

    Bigger Bets: combine ideas together, merging by theme

    The first thing to try is combining ideas that share a common theme. For example, say you believe emphasizing your transparent pricing will help conversion, and you are considering experiments to display it prominently on your homepage, during checkout, and in your cart abandonment emails.

    Instead of running 3 experiments which might (for example) each move conversion by a bit, run them as a single “emphasize transparent pricing everywhere” mega-experiment so it has a better shot to hit the 10% detectable threshold you need.

    The trade-off is reduced learning quality. When productionizing a batched win, it’s harder to tell which part of the change was actually effective. In the near term, this is tolerable - when the company gets to a larger scale and you revisit this theme, you can always tease out the underlying causes through follow-up experiments.

    Best Foot Forward: Use fewer variants, A/B/C/D become A/B

    You need a certain number of visitors to both your “winning variant” and “control” to determine the result. Every variant you add reduces the oxygen flow to your winning variant.

    If you’re tight on traffic, stick to two variants.

    The trade-off is that this reduces your win rate, since you’re not getting to try as many options.

    Run in Parallel: A/B/C/D becomes A/B, A/C, and A/D

    We know by now that combining ideas together gives them the best shot at being sufficiently big winners. Some ideas, however, are dangerous enough to merit greater care.

    Consider bodybuilding. You are probably fine combining “drink milk,” “eat raw eggs,” and “get good sleep” all at once, since these are relatively common approaches. However, if you decide to start injecting experimental supplements, you probably want to go one supplement at a time. If the side effects end up harmful, you want to be crystal clear on what is causing them.

    The same is true for particularly sensitive experiments. Consider a test around “combatting high price perception.” Various alternatives include a value-focused redesign of the pricing page, a free trial, and a price drop. Since the latter two may be net-harmful to revenue, we should test them carefully in isolation.

    Unlike with bodybuilding, testing in isolation does not require slowing down. Using modern experiment frameworks, all 3 ideas can be safely tested at once, using parallel A/B tests (see chart).

    Even better, as a side effect, you also gain directional evidence about how the ideas interact when they work together (IE, the free trial seems to work much better with the new redesign).

    The trade-off is that running multiple experiments on the same surface area increases the burden on both engineering and analytics. For engineering, more complexity means more tests to reduce the potential for edge cases. For analytics, simultaneous experiments add an extra requirement to inspect experiment interactions to avoid drawing false conclusions. When bottlenecked on experiment volume (versus eng or analytics throughput) this can be a worthwhile trade-off.

    Share Components on High-Throughput Surface Areas

    [hat-tip to Taylor Adams]

    The place that has the most experimental power will generally be at the start of your checkout flow: it is seen by 100% of eventual purchasers, and converts relatively well. Your homepage is probably also a high-throughput surface area, but some customers may skip it and convert straight from other landing pages.

    To get more power for top of funnel experiments, you can either (a) avoid custom landing pages and send almost all your traffic to the homepage or (b) share reusable components between your homepage and landing pages, and run experiments within those shareable components.

    The trade-off is that this approach constrains the amount of flexibility you can have on your other landing pages. Early on, it can be quite valuable to avoid constraining landing page formats, since that team needs to be taking its own big swings.

    Run tests for longer: 7 days becomes 2-3 weeks

    Swing Size Calculator

    The longer a test runs, the more traffic it sees. More traffic means a larger sample size, which means you can detect smaller wins more easily.
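To see how test duration trades off against the size of win you can detect, here is a rough sketch that inverts the usual two-proportion sample size approximation. The function name `days_to_detect` and the inputs (5% baseline conversion, 3,000 eligible visitors per day) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def days_to_detect(baseline_rate, relative_lift, daily_visitors,
                   variants=2, alpha=0.05, power=0.8):
    """Approximate days needed to detect a given relative lift."""
    normal = NormalDist()
    z_total = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    absolute_lift = baseline_rate * relative_lift
    # Visitors needed in EACH arm (normal approximation, equal split).
    visitors_per_arm = (2 * z_total ** 2 * baseline_rate * (1 - baseline_rate)
                        / absolute_lift ** 2)
    # Daily traffic is split across all variants, control included.
    return ceil(visitors_per_arm * variants / daily_visitors)


# Hypothetical site: 5% conversion, 3,000 eligible visitors per day.
for lift in (0.20, 0.10):
    days = days_to_detect(0.05, lift, daily_visitors=3_000)
    print(f"{lift:.0%} lift: about {days} days")
```

Halving the lift you want to detect roughly quadruples the required run time, which is why small wins get expensive so quickly.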

    The trade-off is the likelihood you’ll get to see the experiment through. Especially at earlier stage companies, I rarely see the discipline to actually leave an experiment running longer than a couple of weeks. Inevitably, an executive will pop in and demand an update. “Trending positive?” they’ll say, “Great, let’s just ship it, it’s fine.” And it is fine, except now you’ve committed the cardinal sin of peeking, your likelihood of a false positive has gone up, and you have not truly “learned” anything that will help your experiment roadmap.

    If you often find yourself shipping tests earlier than planned, consider switching away from a fixed-sample frequentist methodology to one that tolerates early looks, such as Bayesian or sequential statistics.
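If you need to convince that executive why peeking is a problem, a small simulation can help. This is a simplified sketch, not a model of any particular testing tool: it simulates A/A tests (no true effect) where each day contributes one unit of normally distributed noise, then compares "ship the first time it looks significant" against "only look on the final day." The function name and the 14-day horizon are assumptions for illustration.

```python
import random
from statistics import NormalDist


def false_positive_rates(days=14, simulations=5_000, alpha=0.05, seed=0):
    """Simulate A/A tests: peek every day vs. look only at the end."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(simulations):
        running_sum = 0.0
        ever_significant = False
        z_score = 0.0
        for day in range(1, days + 1):
            # Each day adds pure noise: there is no true effect to find.
            running_sum += rng.gauss(0, 1)
            z_score = running_sum / day ** 0.5
            if abs(z_score) > threshold:
                ever_significant = True
        peeking_hits += ever_significant            # shipped at any peek
        fixed_hits += abs(z_score) > threshold      # judged on the last day only
    return peeking_hits / simulations, fixed_hits / simulations


peeking_rate, fixed_rate = false_positive_rates()
print(f"False positives when peeking daily: {peeking_rate:.0%}")
print(f"False positives when waiting:       {fixed_rate:.0%}")
```

The fixed-horizon rate should hover near the nominal 5%, while the peeking rate comes out several times higher, even though nothing real changed between the variants.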

    Get Comfortable with False Positives: p<0.05 becomes p<0.2

    Another tolerable trade-off: in a world where a false positive is harmless, get comfortable raising your Type 1 Error tolerance (IE, how often can I live with a false positive) from its traditional α=0.05 (a 5% false positive rate) to as high as α=0.2 (a 20% false positive rate).

    The trade-off is, you shouldn’t do this for risky or significant changes, such as a new pricing strategy. Also, realize that your ability to trust your learnings from experiments is decreased, since there’s a greater chance your “insights” are now coming from noise and not reality.
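To put a number on what relaxing the threshold buys you, here is a small sketch using the same normal approximation that most sample size calculators rely on. It assumes 80% power and a two-sided test; the function name `mde_multiplier` is made up for this example.

```python
from statistics import NormalDist


def mde_multiplier(alpha, baseline_alpha=0.05, power=0.8):
    """Ratio of the new MDE to the old one when alpha changes, all else equal."""
    normal = NormalDist()
    z_power = normal.inv_cdf(power)
    relaxed = normal.inv_cdf(1 - alpha / 2) + z_power
    strict = normal.inv_cdf(1 - baseline_alpha / 2) + z_power
    return relaxed / strict


ratio = mde_multiplier(alpha=0.2)
print(f"MDE at alpha=0.2 is {ratio:.0%} of the MDE at alpha=0.05")
```

Under these assumptions the detectable effect shrinks by roughly a quarter, so a test that could only detect a 20% win can now detect one around 15%. That helps, but it will not close an order-of-magnitude gap on its own.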

    Buy Traffic

    [hat-tip to Taylor Adams]

    If the budget is available, you can always ask the paid marketing team to increase their spend during your experiment. A quick traffic boost can get you the numbers you need.

    The trade-off is, newly-acquired prospecting paid traffic tends to be lower intent than your average visitor, so expect some conversion degradation (which may, in turn, make getting to stat-sig harder). Also, there’s probably a reason the paid marketing hasn’t increased their budget yet: the newly-acquired traffic is unlikely to be cost-efficient, and will likely burn money, so try not to pull this card too often.

    Part 3: Desperate Times

    When you’re far (>2x) away from stat-sig, there are still a few (more aggressive) things to try:

    Target a Proxy Metric

    At Opendoor, our team’s job was to optimize conversion for people selling us their houses. However, a house sale actually closing took months and was not a common event, especially compared to a SaaS or eCommerce setup. Instead, we found a number of proxy metrics (offer viewed, etc) that we could target.

    The trade-off is the new layer of indirection. Without being careful, it’s easy to get caught up in moving the proxy in a way that doesn’t help with your actual goal. For example, at Opendoor, shortening the purchase flow would result in more offers viewed, but fewer purchases at the end (since the offers would be less accurate).

    To ensure the proxy metric doesn’t steer you astray, you can:

    1. Find or create higher-signal proxy metrics, such as a lead score model that can more accurately predict eventual conversion than merely the event itself taking place.
    2. Keep an eye on the impact your experiments are having on the true metric at least quarterly.

    Run your Tests on the Paid Layer

    [Hat-tip to Julian - I can’t for the life of me find the original quote, but here’s a good guide]

    If you want to experiment with several headlines or images on your landing pages but don’t have enough traffic, you can test those very same headlines or images on paid marketing ads. The ones that perform best at the paid traffic level are reasonable candidates to do well on your marketing page.

    The trade-off is that this approach is limited in which parts of your product journey you can test.

    Go Qualitative with Surveys, Paid Feedback and Session Recording

    If you want to gain intuition for what users want - just ask them, or look for yourself!

    There are several ways to get qualitative feedback about variants in your conversion journey:

    1. Use a survey tool to ask a fraction of customers on the website to take a quick survey about their experience. However, customers willing to do surveys are not always representative of your potential users.

    2. Pay testers on sites like UserTesting to go through your site (or even just design iterations) and offer their feedback/see where they get stuck. However, if you’re targeting wealthier customers or business purchasers, you may not get representative feedback.

    3. Use screen recording tools like Sprig or FullStory and personally watch the experience of tens of customers in your experimental variants. However, it’s easy to get fixated on a specific issue that may not be representative or affect many customers.

    Ideally, these approaches are used during design, as a precursor to running an A/B test, but in a pinch, they can be used in lieu of the quantitative tests.

    YOLO - Just Make the Change

    Ultimately, the simplest thing you can always do is just go ahead and make whatever change it is you believe in. So long as you are right often enough, you should see conversion start to tick up over time.

    This approach also saves a decent amount of engineering effort, since you no longer need to spend eng effort on supporting more than one version of your site or product.

    The trade-off is that without the precision of A/B testing, it’s that much harder to know which of your efforts are actually working, and which are neutral-to-harmful. Working pre-post puts you back in the old days of “half of my marketing efforts are wasted, but I don’t know which half.” At the end of the day, favoring conviction and best practices over experimentation is a reasonable approach, especially early on in a company’s lifetime.

    Published: April 04 2023

    From a recent investor update:

    …and 40% of our purchases in Q1 came from organic traffic

    I, uhh, don’t believe you.

    A purchase from organic traffic is:

    1. the customer had the problem your product solves
    2. they googled
    3. they clicked on your website from the organic search results
    4. and then bought it.

    I too would rather live in a world where this was still a thing. But this was probably not what happened.

    What’s the big deal, you pedant?

    One-half of the money I spend for advertising is wasted, but I have never been able to decide which half

    - John Wanamaker, Department Store Magnate, 1919ish

    Wanamaker said this in 1919 or so, which makes it literally a century-old excuse. You can do better.

    The customer acquisition team’s job is to double down on what works. Somebody on your team deserves a bigger budget - knowing who is part of the job.

    Sure, but that is only a superficial explanation of their “source.” What motivated them to search? Here’s a (non-exhaustive) decision tree of possibilities and initiatives deserving credit.

    • At the point they googled for it, had they heard of your product earlier?
      • Yes - give credit to how they originally heard about it
        • Did they hear/see an ad? ➡️ Paid ad channel
        • Was it from a friend/colleague? ➡️ Word of mouth
        • From an existing user? ➡️ Referral
      • No - credit should go to your SEO & brand efforts. Were they searching for…
        • Your company name explicitly ➡️ brand effort
        • The problem you solve (keyword) ➡️ SEO / content team.

    First, unless you actually rank for your category’s generic domain name, re-label the “organic” traffic as “unattributed.”

    At that point, there are a few ways to allocate the unattributed conversions:

    Simplest: spread evenly across known channels

    Let’s say your purchases come from 40% paid social, 10% referrals, and 20% SEM. You have 30% left unattributed, so allocate these proportionally.

    distributing organic traffic evenly

    Note: I am terrified of this approach because obviously different channels can have different amounts of “search later” halo effects, but at least it is less wrong than “Organic” and it can be done entirely post-facto.
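The proportional allocation above is simple enough to do in a spreadsheet, but here is a minimal sketch in code using the example shares from this section. The function name `allocate_unattributed` is made up for illustration.

```python
def allocate_unattributed(known_shares):
    """Scale known channel shares up so they absorb the unattributed remainder."""
    known_total = sum(known_shares.values())
    return {channel: share / known_total
            for channel, share in known_shares.items()}


# Shares of purchases by known channel; the remaining 30% is unattributed.
shares = {"paid social": 0.40, "referrals": 0.10, "SEM": 0.20}
adjusted = allocate_unattributed(shares)
for channel, share in adjusted.items():
    print(f"{channel}: {share:.0%}")
# paid social is about 57%, referrals about 14%, SEM about 29%
```

Each channel keeps its relative weight (paid social stays 4x referrals), which is exactly the assumption to be suspicious of if one channel produces more "search later" behavior than the others.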

    Slightly better: Incorporate How did you hear about us? (HDYHAU)

    Once customers have purchased, ask them how they first heard about you, via a HDYHAU survey.

    Now, look, many customers will skip the survey and others just won’t remember, but the answers you do get from the unattributed crowd are still worth applying to your estimates. It’ll help realize things like “oh, most people who are unattributed are actually coming from Instagram - as are a bunch of people we thought were referrals” and adjust your focus accordingly.

    Fancy: Marketing Mix Modeling

    At some point you get big and successful enough that you hire a fancy full-time marketing analytics leader, who will understand this subject way better than either you or me.

    They’ll bring in tactics like incrementality testing, where Google or Meta will start running internal tests for you where they exclude a handful of their userbase from your ads and show you the effect it has. Or you’ll do something simpler like turn off specific channels in specific countries or US states and see how much their purchases fall. They’ll talk you through whether “online advertising is a lie” and help figure out how to proportionally distribute credit for any sale across multiple channels, first touch vs last touch, and other fun things of that nature.

    You will eventually reach a scale where this investment will be justified. Until then, just realize there’s (probably) no such thing as organic traffic.

    PS. Only getting harder over time

    Between the ever-increasing popularity of ad-blockers, the common pattern of “see it on mobile, buy it on desktop” cross-device purchase behavior, iOS 14 and government-driven privacy regulations, the % of your purchases that are unattributed will only go up over time. If scaling up customer acquisition is important to you, get it right.

    Caveat: Yes, yes, almost everything about the interviewing / recruiting process is broken. Sometimes though, you just have to play the hand you’re dealt and settle for minor improvements.

    The 75-minute HMTPS is my proposed minor improvement.

    Hat tip to The Oatmeal

    What is the HMTPS

    It stands for “Hiring Manager Technical Phone Screen.” Since you asked, I’ve been pronouncing it “ham-tips.” It’s the call a candidate will have after their RPS (Recruiter Phone Screen) but before their onsite.

    This combines two calls - the Technical Phone Screen (TPS), which is a coding exercise, and usually happens before the onsite, and the HMS call, which is a call with the Hiring Manager (your would-be manager), which I’ve seen done before an onsite, or after, or not at all.

    So I combine these into one. It takes 75 minutes.

    Why combine the two interviews?

    An ideal interview loop has as few steps as necessary and gets to a decision ASAP. Combine these two steps to shorten intro-to-offer by ~1 week and reduce candidate drop-off by 5-10%.

    It’s also a lot less work for recruiters playing scheduling battleship (footnote 1).

    Finally, Hiring Managers will, on average, be better at selling working at the company - it’s kind of their job.

    Why 75 minutes?

    We’re combining a 30-minute call and a 60-minute call, and combining the 15-minute Q&A at the end of each into one.

    TPS (60m)

    • 5m Intros
    • 45m We write some code in Coderpad together
    • 10m Ask me Anything

    HM call (30m)

    • 5m Intros
    • 10m Dig into relevant experience & what candidate wants from next job
    • 15m AMA time.

    HMTPS (75m)

    • 5m Intros
    • 15m Dig into relevant experience & what candidate wants from next job
    • 30m Coderpad
    • 15m AMA time.
    • 10m buffer time (inevitably one of these will go long in an interesting way)

    I’m also more comfortable shortening the ~50 minute technical question into 30 minutes because I’m pretty calibrated on my question, having run it 200+ times at this point, and so can get most of the signal I’m looking for within the first 30 minutes.

    I’ve tried doing this call in 60 minutes and it ends up feeling pretty rushed; not to say somebody else couldn’t pull that off, but I’ve appreciated the bit of space. Also, since most candidates don’t schedule in 15-minute increments, we can always go a little long (up to the 90 minute mark) if we need to.

    Why is this good for the Hiring Manager?

    First, it’s easier to schedule (usually towards the end of the day). Second, it usually gives me enough time with the candidate so that I end up being pretty confident about how they’ll do both at the job and on the onsite. I haven’t quantified this yet, but anecdotally I have been surprised by onsite interviewer feedback much more rarely when I do this.

    Why is this good for the candidate?

    It’s one fewer hoop to jump through. Also, whether or not they get along with me as their future manager - both technically and interpersonally - can and should be a pretty strong determinant as to whether they should continue with the process. This gives stronger signal since we are both coding together and talking about work.

    When is this a bad idea?

    This makes the Hiring Manager a bit of a bottleneck in interviewing; once a company gets to the point where you are interviewing for titles like “Senior Software Engineer, Team TBD” you have to round-robin TPS-es to the rest of your Phone Screen Team.

    Also, as the HM I likely have some unreasonable biases (Golang engineers, I’m looking at you), and making me the bottleneck in interviewing exacerbates those. That said, the HM’s bias is going to be applied sooner or later in the interview process, and my take is that the benefits outlined are worth it.

    1. Tuesday at 4? You sunk my Grooming Session! 

    Published: April 01 2021

    Let us prepare to grapple with the ineffable itself, and see if we may not eff it after all.

    – Douglas Adams, Dirk Gently’s Holistic Detective Agency

    The Situation

    “Ugh, the codebase is just such a mess,” my new Tech Lead said. “It’s just cruft on top of cruft, never cleaned up, always ‘after the next release’. No wonder we keep getting bug reports faster than we can fix them.”

    Not what you want to hear as the freshly-appointed Engineering Manager on a critical team. Leadership expects the team to deliver on key new features, but also, there better not be any voluntary churn.

    I went to talk to the Product Manager. “Tech Debt?” he said, “sure, we can tackle some tech debt - but let’s make sure to get some credibility first by hitting our OKRs. It won’t be easy.”

    How did it get this bad?

    Cut to three years earlier. I was a new hire on that very team. My onboarding buddy - let’s call him Buddy - and I bumped into a strange corner of the codebase.

    “Oh weird,” I said. “Should we fix that?”

    “I have a strategy for this that you can use,” Buddy said. “When you run into code that seems off, that feels worth fixing, you write the issue down in a separate text file. Then you go do useful work.”

    “Oh, I see. And eventually, you get back to the text file and fix the issues?”

    “Nope. But at least you’ve written it down.”

    Wikipedia describes learned helplessness as “behavior exhibited by a subject after enduring repeated aversive stimuli beyond their control.” Without support, this is how engineers come to feel about tech debt.

    When I came back to this team as a manager, I reached out to Buddy, who had left years ago. “The code is crap at Airbnb too,” he told me when we caught up, “but at least they pay well and I don’t have to work very hard.”

    So what did you do?

    I joined Airbnb.

    That’s not true. We tackled the tech debt. We shipped leadership’s key features, hit our OKRs, and cleaned up some terrible, long-overdue-for-deletion no-good code. Within 3 months, the team’s attitude about technical debt had begun to turn around.

    Here’s how.

    Tackling Technical Debt In Three Easy Steps


    Step 1. Empower

    The biggest reason technical debt exists is that Engineers have internalized that it’s not their job to fix it. Start-up mantras like “focus” and “let small fires burn” have led to just that - small fires everywhere.

    “Get shit done” is a great mantra, but you still have to clean up after yourself.

    The fix here is cultural. Make it clear that engineers who identify debt and take time to tackle it are appreciated. Celebrate their work to peers. A friend once created a slack bot that called out any PR that deleted a significant amount of code. Engineers all across the company began striving to get featured.

    Now of course, the team does have actual work that needs doing. Empower doesn’t mean “ignore our actual work” - it means, “if you take a Friday to fix something that’s bothering you, I have your back.”

    Step 2. Identify

    If you’re on a team that hasn’t been rigorous about tackling tech debt, there’s probably lots of it and it’s unclear what could even be done. This is fixable.

    Organize a brainstorm with prompts like

    • What tasks take longer than they should?
    • What is the most embarrassing part of our code to explain to new hires?
    • What key pieces of our code have we under-invested in?

    This’ll set you up with a solid initial list for your Tech Debt backlog. For more ideas, run your codebase through a tool like CodeClimate to algorithmically point out the rough spots.

    The first time we ran a brainstorm like this, everybody agreed that we had a handful of ideas that were so easy and valuable that we should do them right away. Like, that day. It felt like a breath of fresh air. Things are fixable.

    Encourage folks to add to the Backlog any time they run into annoyances and don’t have time to fix them right there and then. In future team retros or brainstorms, identify any tech debt that comes up and add it onto the backlog.

    Step 3. Prioritize

    Having a tech debt backlog and ignoring it is worse than having none at all.

    Time to play Product Manager and use ICE (impact, confidence, ease) to prioritize your tech debt: the impact that a fix would have on velocity, your confidence that the fix will actually work, and the ease of the fix (how little effort it requires).

    This gives you a list of potential projects. Some will take months; others, hours.
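As a rough illustration of ICE scoring, here is a sketch that ranks a backlog by multiplying the three ratings together. The 1-10 scales and the multiplication are a common convention rather than anything prescribed here, and the backlog items and their ratings are entirely made up.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    impact: int      # 1-10: how much a fix would help velocity
    confidence: int  # 1-10: how sure we are the fix will actually work
    ease: int        # 1-10: higher means less effort required

    @property
    def ice_score(self):
        return self.impact * self.confidence * self.ease


# Hypothetical backlog entries, for illustration only.
backlog = [
    DebtItem("Delete dead feature flags", impact=4, confidence=9, ease=9),
    DebtItem("Rewrite billing module", impact=9, confidence=5, ease=2),
    DebtItem("Speed up flaky test suite", impact=7, confidence=7, ease=5),
]

ranked = sorted(backlog, key=lambda item: item.ice_score, reverse=True)
for item in ranked:
    print(f"{item.ice_score:>4}  {item.name}")
```

Note how the big rewrite sinks to the bottom despite its high impact: low ease and middling confidence drag it down, which is usually the right instinct for a first pass at the backlog.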

    Now they just need to get done. That’ll require buy-in from your Product Manager.

    Getting Buy-in

    When it’s time to have “the talk” with your PM, I’ve found “how often should you clean your room” to be a useful analogy.

    Never cleaning your room is a bad idea and obviously so. Over time it becomes unlivable. This is how our engineers feel. At the same time, if you’re cleaning your room all day every day, that’s not a clean room, that’s excessive and no longer helpful. In moderation, messiness is healthy - it means you’re prioritizing. We don’t need a glistening-clean room, but we do need to do more than nothing. At the end of the day, a clean room is a productive room.

    Tackling Small Debt

    Come together with your Product Manager and agree to a rate at which small debt projects can get added to the team’s ticket queue. With a spiel like the above, you can hopefully get ~10% of all work done to focus on debt, depending on the maturity of the team and the company.

    For ~week-long projects, try to leverage particular times of year like Hack Weeks and pitch high-value projects to engineers looking for a fun project.

    Tackling Heavy Debt

    This is where good leadership helps. At this particular company, Engineering leadership had rolled out “Quality OKRs”. Every quarter, each team had to sign up for a meaningful “quality” OKR goal.

    What is a “quality” goal? This was left up to teams, but the gist of it was, just go fix the most painful thing that isn’t already reflected in your business metrics.

    During quarterly planning, we whittled the top three “heavy tech debt” projects into proposals, got buy-in from leadership, then brought the ideas back to the group.

    Since quality projects had been blessed from the top down and were indisputable, the PM had air cover to support the work without pushback.

    So what happened?

    Was there still tech debt? Yes. Did it continue to accumulate? Of course. But did it feel inexorable? Not anymore.

    1. Not guaranteed. 

    Published: December 05 2020

    Originally published as a guest blog post on Thanks Aline!

    “The new VP wants us to double engineering’s headcount in the next six months. If we have a chance in hell to hit the hiring target, you seriously need to reconsider how fussy you’ve become.”

    It’s never good to have a recruiter ask engineers to lower their hiring bar, but he had a point. It can take upwards of 100 engineering hours to hire a single candidate, and we had over 50 engineers to hire. Even with the majority of the team chipping in, engineers would often spend multiple hours a week in interviews. Folks began to complain about interview burnout.

    Also, fewer people were actually getting offers; the onsite pass rate had fallen by almost a third, from ~40% to under 30%. This meant we needed even more interviews for every hire.

    Visnu and I were early engineers bothered most by the state of our hiring process. We dug in. Within a few months, the onsite pass rate went back up, and interviewing burnout receded.

    We didn’t lower the hiring bar, though. There was a better way.

    Introducing: the Phone Screen Team

    We took the company’s best technical interviewers and organized them into a dedicated Phone Screen Team. No longer would engineers be assigned between onsite interviews and preliminary phone screens at recruiting coordinators’ whims. The Phone Screen Team specialized in phone screens; everybody else did onsites.

    Why did you think this would be a good idea?

    Honestly, all I wanted at the start was to see if I was a higher-signal interviewer than my buddy Joe. So I graphed each interviewer’s phone screen pass rate against the onsite pass rate of the candidates they had passed.

    Joe turned out to be the better interviewer. More importantly, I stumbled into the fact that a number of engineers doing phone screens performed consistently better across the board. They both had more candidates pass their phone screens and then those candidates would get offers at a higher rate.

    Sample Data, recreated for Illustrative Purposes.

    These numbers were consistent, quarter over quarter. As we compared the top quartile of phone screeners to everybody else, the difference was stark. Each group included a mix of strict and lenient phone screeners; on average, both groups had a phone screen pass rate of 40%.

    The similarities ended there: the top quartile’s invitees were twice as likely to get an offer after the onsite (50% vs 25%). These results also were consistent across quarters.

    Armed with newfound knowledge of phone screen superforecasters, the obvious move was to have them do all the interviews. In retrospect, it made a ton of sense that some interviewers were “just better” than others.

    A quarter after implementing the new process, the “phone screen to onsite” rate stayed constant, but the “onsite pass rate” climbed from ~30% to ~40%, shaving more than 10 hours-per-hire (footnote 2). Opendoor was still running this process when I left several years later.

    You should too (footnote 3, footnote 4).

    Starting your own Phone Screen Team

    1. Identifying Interviewers (footnote 5)

    Get your Lever or Greenhouse (or ATS of choice) data into an analyzable place somewhere, and then quantify how well interviewers perform. There are lots of ways to analyze performance; here’s a simple approach which favors folks who generated lots of offers from as few onsites and phone screens as possible.


    You can adjust the constants to where zero would match a median interviewer. A score of zero, then, is good.

    Your query will look something like this:

    Interviewer | Phone Screens | Onsites | Offers | Score
    Accurate Alice | 20 | 5 | 3 | (45 - 20 - 20) / 20 = 0.25
    Friendly Fred | 20 | 9 | 4 | (60 - 36 - 20) / 20 = 0.2
    Strict Sally | 20 | 4 | 2 | (30 - 16 - 20) / 20 = -0.3
    Chaotic Chris | 20 | 10 | 3 | (45 - 40 - 20) / 20 = -0.75
    No Good Nick | 20 | 12 | 2 | (30 - 48 - 20) / 20 = -1.9
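The arithmetic in the Score column can be sketched in code. The weights below (15 per offer, minus 4 per onsite, minus 1 per phone screen, divided by phone screens conducted) are inferred from the sample rows rather than taken from a stated formula, so treat them as an assumption and tune them to your own funnel. The function name `interviewer_score` is made up.

```python
def interviewer_score(phone_screens, onsites, offers,
                      offer_weight=15, onsite_cost=4, phone_screen_cost=1):
    """Net value generated per phone screen conducted.

    The default weights are inferred from the sample table, not prescribed.
    """
    net_value = (offer_weight * offers
                 - onsite_cost * onsites
                 - phone_screen_cost * phone_screens)
    return net_value / phone_screens


# (phone screens, onsites, offers) per interviewer, from the sample table.
funnel = {
    "Accurate Alice": (20, 5, 3),
    "Friendly Fred": (20, 9, 4),
    "Strict Sally": (20, 4, 2),
    "Chaotic Chris": (20, 10, 3),
    "No Good Nick": (20, 12, 2),
}

scores = {name: interviewer_score(*counts) for name, counts in funnel.items()}
for name, score in sorted(scores.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

Dividing by phone screens keeps the score comparable between interviewers with different volumes, though with small counts a single offer can swing it a lot.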

    Ideally, hires would also be included in the funnel, since a great phone screen experience would make a candidate more likely to join. I tried including them; unfortunately, the numbers get too small and we start running out of statistical predictive power.

    2. Logistics & Scheduling

    Phone Screen interviewers no longer do onsite interviews (except as emergency backfills). The questions they ask are now retired from the onsite interview pool to avoid collisions.

    Ask the engineers to identify and block off 4 hour-long weekly slots to make available to recruiting (recruiting coordinators will love you). Use a scheduling tool like Calendly to create a unified availability calendar. Aim to have no more than ~2.5 interviews per interviewer per week. To minimize burnout, one thing we tried was to take 2 weeks off interviewing every 6 weeks.

    To avoid conflict, ensure that interviewers’ managers are bought in to the time commitment and incorporate their participation during performance reviews.

    3. Onboarding Interviewers

    When new engineers join the company and start interviewing, they will initially conduct on-site interviews only. If they perform well, consider inviting them into the phone screen team as slots open up. Encourage new members to keep the same question they were already calibrated on, but adapt it to the phone format as needed. In general, it helps to make the question easier and shorter than if you were conducting the interview in person.

    When onboarding a new engineer onto the team, have them shadow a current member twice, then be reverse-shadowed by that member twice. Discuss and offer feedback after each shadowing.

    4. Continuous Improvement

    Interviewing can get repetitive and lonely. Fight this head-on by having recruiting coordinators add a second interviewer (not necessarily from the team) to 10% or so of interviews, so the pair can discuss afterward.

    Hold a monthly retrospective with the team and recruiting, with three items on the agenda:

    • discuss potential improvements to the interviewing process
    • review borderline interviews together as a group, if your interviewing tool supports recording and playback
    • have interviewers read through the feedback their candidates got from onsite interviewers and look for consistent patterns.

    5. Retention

    Eventually, interviewers may get burnt out and say things like “I’m interviewing way more people than others on my actual team - why? I could just go do onsite interviews.” This probably means it’s time to rotate them out. Six months feels about right for a typical “phone screen team” tour of duty, to give people a rest. Some folks may not mind and stay on the team for longer.

    Buy exclusive swag for team members. Swag is cheap and these people are doing incredibly valuable work. Leaderboards (“Sarah interviewed 10 of the new hires this year”) help raise awareness. Appreciation goes a long way.

    Also, people want to be on teams with cool names. Come up with a cooler name than “Phone Screen Team.” My best idea so far is “Ambassadors.”


    There’s something very Dunder Mifflin about companies that create Growth Engineering organizations to micro-optimize conversion, only to have those very growth engineers struggle to focus due to interview thrash from an inefficient hiring process. These companies invest millions into hiring, coaching and retaining the very best salespeople. Then they leave recruiting - selling the idea of working at the company - in the hands of an engineer who hasn’t gotten a lick of feedback on her interviewing since joining two years ago, with a tight project deadline in the back of her mind.

    If you accept the simple truth that not all interviewers are created equal, that the same rigorous quantitative process with which you improve the business should also be used to improve your internal operations, and if you’re trying to hire quickly, you should consider creating a Technical Phone Screen Team.

    FAQs, Caveats, and Pre-emptive Defensiveness

    1. Was this statistically significant, or are you conducting pseudoscience? Definitely pseudoscience. Folks in the sample were conducting about 10 interviews a month, ~25 per quarter. Perhaps not yet ready to publish in Nature but meaningful enough to infer from, especially considering the relatively low cost of being wrong.
    2. Why didn’t the on-site pass rate double, as predicted? First, not all of the top folks ended up joining the team. Second, the best performers did well because of a combination of skill (great interviewers, friendly, high signal) and luck (got better candidates). Luck is fleeting, resulting in a regression to the mean.
    3. What size does this start to make sense at? Early on, you should just identify who you believe your best interviewers are and have them (or yourself) do all the phone screens. Then, once you start hiring rapidly enough that you are doing about 5-10 phone screens a week, run the numbers and invite your best 2-3 onsite interviewers to join and create the team.
    4. What did you do for specialized engineering roles? They had their own dedicated processes. Data Science ran a take-home, Front-End engineers had their own Phone Screen sub-team, and Data and ML Engineers went through the general full-stack engineer phone screen.
    5. Didn’t shrinking your Phone Screener pool hurt your diversity? In fact, the opposite happened. First, the phone screener pool had a higher percentage of women than the engineering organization at the time; second, a common interviewing anti-pattern is “hazing” - asking difficult questions and then rejecting somebody for “not even remembering about Kahn’s algorithm, lolz.” The best phone screeners don’t haze, bringing a more diverse group onsite.