Three hundred programming interviews in thirty days

We launched Triplebyte one month ago, with the goal of improving the way programmers are hired. Too many companies run interviews the way they always have, with resumes, whiteboards, and gut calls. We described our initial ideas about how to do better than this in our manifesto. A little over a month has now passed, and in the last 30 days we've done 300 interviews. We've started to put our ideas into practice, to see what works and what doesn't, and to iterate on our process. In this post, I'm going to talk about what we've learned from those first 300 interviews.

I go into a lot of detail in this post. The key findings are:
  1. Performance on our online programming quiz is a strong predictor of programming interview success
  2. FizzBuzz-style coding problems are less predictive of interview success
  3. Interviews where candidates talk about a past programming project are also not very predictive

Process

Our process has four steps:
  1. Online technical screen.
  2. 15-minute phone call discussing a technical project.
  3. 45-minute screen share interview where the candidate writes code.
  4. 2-hour screen share where they do a larger coding project.
Candidates work on their own computers, using their own dev environments and strongest languages. In both of the longer interviews, they pick the problem or project to work on from a short list. We're looking to find strengths, so the idea is that most candidates should be able to pick something they're comfortable with. We keep the list of options short, however, to help standardize evaluation. We want to have a lot of data on each problem.

We're looking for programming process and understanding, not leaps of insight. To this end, we offer help with the design and algorithm for each problem (and don't penalize candidates for accepting it). We evaluate interviews with a scorecard. For now we go a little overboard, tracking the time to reach a number of milestones in each problem. We also score understanding, whether candidates speak specifically or generally, whether they seem nervous, and a bunch of other things (basically everything we can think of). Most of these, no doubt, are horrible measures of performance. We record them now so that we can figure out which are good measures later.

Screening

The first experiment we ran was screening people without looking at resumes. Most job applicants are rejected at the screening stage. The sad truth is that a high percentage of the people applying for any job post on the Internet are bad. To protect the time of their interviewers, companies need a way to filter people early, at the mouth of the hiring funnel. Resumes are the traditional way to do this. However, as Aline Lerner has shown, resumes don't work. Good programmers can't be reliably distinguished from bad ones by looking at their resumes. This is a problem. What the industry needs is a way to screen candidates by looking at their actual ability, not where they went to school or worked in the past[1]. To this end, we tested two screening steps:
  1. A FizzBuzz-style programming assignment. Applicants completed two simple problems. We tracked the time to complete each, and manually graded each on correctness and code quality.
  2. An automated quiz. The questions on the quiz were multiple choice, but involved understanding actual code (e.g., look at a function, and select which of several bugs is present).
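The post doesn't publish actual quiz questions, so here is a purely hypothetical illustration of the format described in step 2 (a multiple-choice question built around real code): the candidate reads a short function and selects which of several bugs is present.

```python
# Hypothetical quiz question in the style described above (NOT an actual
# Triplebyte question). Which bug is present in this binary search?
#
#   (a) mid is computed incorrectly
#   (b) the loop condition never examines a one-element range
#   (c) the bounds update skips elements
#   (d) there is no bug
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo < hi:  # the bug: should be `lo <= hi`
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# The answer is (b): when lo == hi, the remaining element is never checked.
print(binary_search([5], 5))  # → -1, even though 5 is present
```

The appeal of this format is that it tests code reading (a real skill) while still being automatically gradable, which is what makes an unattended screening quiz practical.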
We then correlated the results of these two steps with success in our subsequent 45-minute technical interview. The following graph shows the correlations after 300 interviews.

Correlation between screening steps and interview decisions


We can see that the quiz is a strong predictor of success in our interviews! Almost a quarter of interview performance (23%) can be explained by the score on the quiz. Another 15% can be explained by quiz completion time (faster is better). Speed and score are themselves only loosely correlated (being accurate makes you only slightly more likely to be fast). This means they can be combined into what we're calling the composite score, which has the strongest correlation of all and explains 29% of interview performance[2]!
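As a sketch of the kind of analysis involved, using entirely made-up numbers (the post doesn't publish its raw data), each screening measure can be correlated with interview outcomes, and score and speed can be combined into a composite by standardizing each measure and rewarding faster times:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def zscore(xs):
    """Standardize a sequence to mean 0, (population) std 1."""
    m = sum(xs) / len(xs)
    s = sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / s for x in xs]

# Toy data, purely illustrative: quiz scores, completion times in
# minutes, and pass/fail outcomes of the later technical interview.
quiz_score = [9, 7, 8, 4, 6, 3, 8, 5]
quiz_time  = [12, 20, 15, 30, 22, 35, 14, 28]
passed     = [1, 1, 1, 0, 1, 0, 1, 0]

# Composite: standardized score minus standardized time (faster is better).
composite = [a - b for a, b in zip(zscore(quiz_score), zscore(quiz_time))]

r = pearson_r(composite, passed)
print(f"composite r = {r:.2f}, R^2 = {r * r:.2f}")
```

Note that "explains X% of performance" refers to R^2, the square of the correlation coefficient, which is exactly the R-vs-R^2 distinction the correction at the end of the post addresses.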

The FizzBuzz-style coding problems, however, did not perform as well. While the confidence intervals are large, the current data shows less correlation with interview results. I was surprised by this. Intuitively, asking people to actually program feels like the better test of ability, especially because our interviews (the measures we're using to evaluate screening effectiveness) are heavily focused on coding. However, the data shows otherwise. The coding problems were also harder for people to finish: we saw twice the drop-off rate on the coding problems as on the quiz.

Talking versus coding

Before launching, we spoke to a number of smart people with experience in technical hiring to collect ideas for our interviews. The one I liked the most was having candidates talk us through a past technical project, including looking at the source code. This seemed like it'd be the least adversarial, most candidate-friendly approach.

As soon as we started doing them, however, I saw a problem. Almost everyone was passing. Our filter was not filtering. We tried extending the duration of the interviews to probe deeper, and looking at code over Google Hangouts. Still, the pass rate remained too high.

The problem was that we weren't getting enough signal from talking about projects to confidently fail people. So we started following up with interviews where we asked people to write code. Suddenly, a significant percentage of the people who had spoken well about impressive-sounding projects failed, in some cases spectacularly, when given relatively simple programming tasks. Conversely, people who spoke about very trivial-sounding projects (or communicated so poorly we had little idea what they had worked on) were among the best at actual programming.

In total we did 90 experience interviews, scoring each across several factors (did the person seem smart, did they understand their project well, were they confident, and was the project impressive). We then correlated these factors with performance in the 45-minute programming interview. Confidence had essentially zero correlation. Impressiveness, smartness, and understanding each had about a 20% correlation. In other words, experience interviews underperformed our automated quiz in predicting success at coding.

Now, talking about past experience in more depth may be meaningful. This is how (I think) I know which of my friends are great programmers. But, we found, 45 minutes is not enough time to make talking about coding a reasonable analog for actually coding.

Interview duration, and interviewer sentiment

A final test we ran was to look at when during the interview we make our decisions. Laszlo Bock, VP of People at Google, has written much about how interviewers often make decisions in the first few minutes of an interview, and spend the rest of the time backing up that decision. I wanted to make sure this was not true for us. To test this, we added a pop-up to our interviewing software, asking us every five minutes during each interview whether the candidate was performing well or poorly. Looking at these sentiments in aggregate, we can tell exactly when during each interview we made the decision.

We found that in 50% of our 45-minute interviews, we "decide" (become positive about someone who ends up passing, or negative about someone who does not pass) in the first 20 minutes. In 20%, however, we do not settle on our final sentiment until the last 5 minutes. In the 2-hour interview, the results are similar. We decide 60% of the time in the first 20 minutes (both positively and negatively), but 10% make it almost to the 2-hour mark. (In that case, unfortunately, it's positives turning into negatives, because we can't afford to send people we're unsure about to companies)[3].
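The five-minute sentiment pings make the decision point straightforward to compute: it's the earliest ping after which the interviewer's sentiment never changes again. A minimal sketch, with an illustrative function name and data format (not Triplebyte's actual software):

```python
def decision_minute(sentiments, interval=5):
    """Given per-interval sentiments (+1 = performing well, -1 = poorly),
    sampled every `interval` minutes, return the minute of the earliest
    ping from which the sentiment matched the final one for the rest of
    the interview, i.e. when the interviewer effectively 'decided'."""
    final = sentiments[-1]
    # Walk backwards to the first index of the final run of agreement.
    idx = len(sentiments) - 1
    while idx > 0 and sentiments[idx - 1] == final:
        idx -= 1
    return (idx + 1) * interval

# An interviewer who flipped from negative to positive at the 15-minute
# ping, and stayed positive, decided at minute 15.
print(decision_minute([-1, -1, +1, +1, +1]))  # → 15
```

Aggregating this value over many interviews yields exactly the distribution the paragraph above describes (what fraction of decisions settle in the first 20 minutes, in the last 5, and so on).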

Conclusion

It's been a crazy month. Guillaume, Harj and I have spent nearly all our time in interviews. Sometimes, at 10 PM on a Saturday, after a day of interviewing, I wonder why we started this company. But as I write this blog post, I remember. Hiring decisions are important, and too many companies are content to do what they've always done. In our first 30 days, we've come up with a replacement for resume screens, and shown that it works well. We've found that programming experience interviews (used at a bunch of companies) don't work particularly well. And we've written software to help us measure when and why we make decisions.

For now, we're evaluating all of our experiments against our final round interview decisions. This does create some danger of circular reasoning (perhaps we're just carefully describing our own biases). But we have to start somewhere, and basing our evaluations on how people write actual code seems like a good place. The really exciting point comes when we can re-run all this analysis, basing it on actual job performance, rather than interview results. Doing that is why we started this company.

Next, we want to experiment with giving candidates projects to do on their own time (I'm particularly interested in making this an option, to help with interview anxiety), and interviews where candidates are asked to work with an existing codebase. We're also adding harder questions to the quiz, to see if we can improve its effectiveness. We'd love to hear what you think about these ideas. Email us at founders@triplebyte.com.

Thanks to Emmett Shear, Greg Brockman and Robby Walker for reading drafts of this.

An earlier version of this post confused the correlation coefficient R with R^2, and overstated the correlations. Since this post was published, a new version of the quiz has increased the correlation of the composite score to 0.69 (0.47 R^2).

1. This is a complex issue. There are good arguments for allowing experienced programmers to skip screening steps, and not have to continually re-prove themselves. At some point, track record should be enough. However, this type of screening can also be done in very bad ways (e.g., only interviewing people who have worked at top companies or come from a few schools). Evaluating experience is something we plan to experiment with, but for now we're focusing on how to directly identify programming ability.

2. It's worth noting the error bars (showing 95% confidence intervals). The true value of each correlation in the graph falls within the range shown, with 95% confidence. The error bars are large because our sample is small. However, even comparing the bottom of our confidence interval to Aline Lerner's results on resume screening (she found a correlation close to 0) shows that our quiz is a far better first step in a hiring funnel than resumes are.

3. We're not perfect, and we certainly reject great people. I always like to mention this when talking about rejections. We know it (and think it's true of all interview processes). We're trying to get better.

52 responses
Very interesting. I've thought from time to time about changing track into technical recruiting for many of the reasons you describe. Cool to see a startup tackling the problem with an approach like yours. G'luck!
I really think technical hiring, or any other kind of hiring, is broken. As you showed, the best indicators are questions (like your quizzes) or showing "live" that you can do the work, rather than a pretty CV, certificate, or a fancy university name. I have been rejected many, many, many times at the first screening (a CV check by a non-technical recruiter). My last example was at Airbnb, where I had to hack my way in to get noticed in order to get the first interview. The funny thing is that I was the fastest candidate to get hired, and I won a company-wide award for my work just 4 months after joining. I haven't finished a degree because I thought it was boring and I was learning things I had already taught myself, but this fact makes my resume go down the list very fast. Because interviewers don't have time to lose and have thousands of candidates to check, I'm sure they will find technology very useful for getting the good prospects in front of everyone else. Something I've seen many times at my past jobs is good technical applicants, some of them even referred by a team member, being turned down later because of culture fit. I don't know why, but engineers and technical people are more likely to fail at those interviews than others. Maybe we are too nerdy. The surprising thing is that companies check culture as the last step, because only a few people can run that type of interview, and they can't become full-time culture keepers. This is an enormous waste of time and resources for the applicant, the interviewers, and the company itself.
Hi, I read the conclusions you made, and I felt as if you defined a process for hiring machines to code rather than humans. So I took a few moments to read your manifesto (the premise on which your entire conclusion is based), and here is my take on it. 1. "Whiteboard coding and algorithm questions aren't good predictors of how effective someone will be at writing real code." Whiteboard coding shows how someone really thinks. It illustrates the person's thought process, and that helps the interviewer judge their rational thinking and logical approach. Algorithms add to this by illustrating problem-solving ability. A person may not be able to fully solve an algorithm, but the attempt on a whiteboard says more than an implementation on an online platform. 2. "Candidates deserve a consistent experience and consistent evaluation." The entire USP of an interview is its diversity, which allows the interviewer to judge whether someone can adapt to new situations and come out of their comfort zone. What you are suggesting is to turn the interview process into a GRE exam, which will only develop a culture among developers of preparing for that exam for 2 years. 3. "Hiring decisions should be made using a clear scoring system, not gut feelings." Most companies have a 3- or 4-round interview process. That is enough to remove the gut-feeling factor. If you want to argue that a candidate might be selected based on the gut feeling of all 4 interviewers, then my counter-argument is that he is worth selecting if he could generate that gut feeling in so many people. 4. "The hiring process should be focused on discovering strengths, not uncovering weaknesses." I agree with this point. However, the irony is that you are trying to define one particular process for hiring. I wonder if it could actually perform the "discovery" part.
5. "Candidates should be told exactly what to expect in an interview, and be allowed to prepare in advance." So basically you want to hire the person who has studied the most over the smartest person in the room. From my experience, I can surely say that if companies like Google and Facebook followed that practice, I wouldn't even be writing their names here. 6. "truthful feedback on how they did so they know how they can improve" Agreed. Something that should be adopted by all companies in their recruiting process. 7. "Good programmers come from all types of background." You reinforce my point with this statement. Good programmers need not just be people who can quickly write a search over a large dataset using hash maps; they can also be people with brilliant problem-solving ability who are slow to transform it into code, or people who are amazing at thinking about software design and scalability but cannot remember code syntax so well. A company needs a good blend of all these people. Only then is a good ecosystem for growth created, rather than a team of 10 machines who can transform pseudocode into Java in 10 minutes. 8. "The software industry needs to experiment more with hiring processes and figure out what really works." I think many are already doing that, through tech hackathons, online challenges, weekend projects, open-source contribution references, etc. So, not something new that you figured out. 9. "Candidates are at a fundamental disadvantage in salary and equity negotiations" Not sure what kind of companies you have surveyed. I think most well-known companies maintain clear standards for salary and compensation. Though people will surely be flattered reading this. :) 10. "Companies should not have to make recruiting a core competency" Now you are just trying to open up the market for yourself. No comments. :P Would love to hear your counter-arguments. Mail me. :-)
Many fields require serious professional certification. You can't become a doctor unless you go through board certification that includes simulated patient interaction. Likewise, you can't build a bridge or a dam without engineering certification. IMO, this article demonstrates the need to certify software engineers, using a process similar to the interviewing process described. Then, when hiring, we can skip most of the "do you know how to code" and get down to cultural fit and mutual interest.
A comment on "giving candidates projects to do on their own time": this can bias the results toward candidates who have significant free time. Candidates with a current job, family, or other obligations can be at a disadvantage. Even without an explicit time limit, candidates are likely to feel they are failing if they cannot complete something quickly enough.
@Andrew Rondeau Oh god, you're not talking about certificate mills again, are you? That's the whole problem.
I was recently given a take-home problem as part of a job interview. It sounded like a piece of cake, since I had written more advanced similar systems in the past. However, what I got to work with was a nightmare. After 6 hours spent getting the project, with its complex dependencies, to build, I found it would not run on my computer. There was an uncaught exception in the OpenGL code which caused it to crash even before the log was initialized. It took me a few more hours to find the bug (which occurred because my OpenGL driver was 6 months old). I reported the fatal bug, which was then dismissed as a "driver issue," not a real problem. So on to the real problem. Then I found another error which would not let me save changes in this virtual-world app they were working on. I spent another 8 hours making sure it wasn't something I was doing wrong, and reported the errors I found. I received an email later that day telling me I was not a "good fit." So next time I am asked to do a take-home problem, I will tell them my hourly contracting rate first.
@Andrew Rondeau - Like, University?
Good attempt to solve a very difficult problem. I wonder, though, whether your testing may only identify the skilled and miss the stellar creative people (who are quite rare). I have hired people who took a while to "spark," so I wonder if your analysis could be a bit premature; i.e., see what you think in 1 year. Sometimes it's the ideas that take a while to appear, and those people may not do so well in your testing. I can think of some amazing people who have had incredible breakthroughs, but these disruptive innovations came after months. If you assume that a short time is a good test period, you would have missed these stellar people and ideas. They implemented their ideas well too, so that the customer was blown away with the solution, and it made a lot of money for the firm.
Maybe I read through this a bit too quickly, but I'm not sure what your measure of "success" is? You say "interviewing success" - does that mean the person being interviewed was hired? Does it mean the person was hired and has been deemed successful in their first 30 days on the job? (no, apparently it doesn't based on what you say in the conclusion: " The really exciting point comes when we can re-run all this analysis, basing it on actual job performance, rather than interview results") It's really not clear what's being measured here.
I personally hate the "take-home test" approach to interviewing. I've had multiple such tests that take anywhere from 10-25 hours to complete, because simply answering the question isn't enough; you need to give textbook-correct answers, and your code must be formatted perfectly with the requisite comments and documentation. In short, it's pretty similar to an upper-level college course's final exam; however, in college, you can get a good grade with a few mistakes, while in interviewing, you get rejected for a few mistakes. I'm done giving a company 15 hours of my time just to get to a first interview; this is arrogant, condescending, and completely devalues my time. The reality of hiring is that you're going to make mistakes, like in every other part of running a business. Even in an extended "interview" such as dating a potential life partner, people make mistakes, so I'm not sure how the hiring process can be quantified to remove said error. The interview process is so excruciating these days that I often hate the companies I'm talking with. While we're at it, the skills requirements listed with jobs today are astounding. My experience is that a company wants to hire a programmer with at least a journeyman's level of expertise in 6-8 skills. If you have 5 and are confident you can learn the others, you're dead in the water. Let's be honest, the latest JavaScript framework isn't that complicated. The latest NoSQL database isn't that hard to learn. The truly hard parts of joining a new company are learning how projects are managed, getting the political lay of the land, finding a sherpa to answer your questions in the first couple of weeks, and learning where you fit within the organization.
Really good analysis; just going to drop my 2 cents. Ammon, have you asked the programmers if they liked the way you did the interviews? How did they feel? What would they change, eliminate, or add? Did they feel any particular way during the interview? Were they comfortable? Would they prefer to give a resume first and then code? Even if you are measuring code, don't forget that we are all humans, not machines. Also, I used to think that time per interview was important because "people can't lie for a long time," but actually I think that more interviews of shorter duration could be more effective for understanding someone's skills and personality.
I took the quiz -- it didn't tell me how well I did.
If you did interviews correctly you'd only need 30, not 300. Your premise is as false and frankly dangerously inept as your conclusions.
The US notion that resumes should be only 2 or 4 pages, I suspect, works against you here. As someone with 30+ years of experience, if I compress my work history down to this length it becomes little more than a shout-out to places I've worked. For the recruiters prepared to invest the time, my "full" resume comes out to nearly 20 pages with lots of detail, and often serves to highlight similar projects, similar workplaces, and the like. The downside, of course, is that the length is daunting: none of the automated resume-scanning tools beloved of the industry cope with it, and many recruiters (to my mind, the poorer ones) don't even bother reading it, preferring to wing it in an interview. But it's interesting to have some insight from the other side of the fence.
I definitely agree that actually coding is a much better indicator than talking about coding; that doesn't surprise me. What does surprise me is that the simple FizzBuzz problems didn't do as well as the quiz. Perhaps the quiz required deeper knowledge. Very interesting.
@Enno, the information included on a resume should be relevant, and may change for every job you apply to. There's no way someone will go through more than 2-3 pages the first time they check it. You can have lots of experience, but if you don't know how to show that you are the right one in 2 pages, you may be in trouble, and it may be a bad sign for the recruiter. Why not include the latest or most relevant projects, explain that you have 30+ years of experience, and offer a much more detailed resume, with the ins and outs of similar projects, in case they are really interested? If they want to know everything I've done, they can check my personal website, blog, or LinkedIn profile (even though I don't include everything I've done there, like being a postman or working summer camps). Keep it short and concise, and offer the detailed one if they are really interested.
Very cool experimentation, and I like your manifesto too. Just a (big) statistical note: correlation (r) and variance explained (r^2) are different things: http://www.psychstat.missouristate.edu/introboo...
>Intuitively, asking people to actually program feels like the better test of ability, especially because our interviews (the measures we're using to evaluate screening effectiveness) are heavily focused on coding. However, the data shows otherwise.
Hey, there you go proving we should be using data instead of gut feel/our instincts!
>experience interviews underperformed our automated quiz in predicting success at coding.
And congrats on replicating the Industrial-Organizational Psychology research on hiring methods! http://mavweb.mnsu.edu/howard/Schmidt%20and%20H...
>Confidence had essentially zero correlation.
This is an important nuance. There's a lot of research showing that how confident you seem during an interview is a big predictor of whether you'll get hired. This is why interviews tend to be so unreliable. It's another example of why we can't trust our gut feel about a candidate.
>The really exciting point comes when we can re-run all this analysis, basing it on actual job performance, rather than interview results.
What job performance metrics are you planning on collecting? Overall, thumbs up!
I feel that many companies today only care about bettering themselves and their projects. What happened to building strong relationships with talented people, and hiring good people to train to replace yourself? I have been working in the Bay Area software world since 1998, and am disgusted at how most larger companies truly regard employees, treating them as robots.
On the topic of how long it takes to make a decision: there have been multiple times as an interviewer that I made one decision about a candidate during the interview and then changed my mind afterwards. This is because my process is to document as much as I can remember as soon as possible after the interview. Doing this means I review the objective things that happened during the interview, instead of relying on my feeling of how things were going during the interview (when I'm also distracted with other tasks: presenting the problem, keeping track of time, etc.). Making a decision about a candidate during the interview can be a mistake.
To state the obvious, the presumption is that this interviewing approach somehow models reality and adequately captures the candidate's ability to perform in the "game" as part of a team. As to notions of where the candidate will fit within the organization: if the management layer hasn't figured this out by the time the new employee walks in the door on the first day, then management is failing to do its job, and that's a setup for failure.
This is a nice and valuable small piece of research, but nowhere in the vicinity of being usable as a recipe, or even a hint. The focus is on the candidate's success in the interview, which does not guarantee success as an employee, much less predict actual performance after adjusting to the company. The synthetic-challenges approach makes the wrong assumption from the start, and can only be expected to be misleading. The only way to estimate how a candidate will perform in a particular position long-term is to assign a task within the position's responsibilities and environment, on actual code. Otherwise the approach stays (obviously) probabilistic. One major issue is that candidate success is still proportional to conformance to whatever hiring practice is in place, and the best professionals are rarely conformists. This makes the whole affair a filter-optimization task (in the end reducing HR expense), which has very little to do with actually hiring the best people available.
This is the most ridiculous hiring process.
Very interesting stuff. I agree that most coding tests based on algorithms and academic knowledge are useless. The best way to test a candidate is to check how he or she deals with real code. That's why we started DevSKiller (http://devskiller.com). We use programming tasks based on real applications, and code-review tasks, to test candidates' skills. Most of our customers are very happy with the results of code-review testing. With a code review of a code fragment, you can check a candidate's familiarity with design patterns and coding practices. It is very important to test real skills, like knowledge of specific libraries and frameworks. We also have some very positive feedback from candidates, who are surprised that the exam tests the actual skills required in the job description.
The major flaw in your interview process is that if a candidate is more qualified than your interviewer (singular), you have no way of accommodating it. That basically means you can only hire people who are at best as good as you are. And that's not good enough for plenty of startups.