The Effective Engineer

books software-engineering

Introduction

  • In the ensuing years, I’ve come to learn that working more hours isn’t the most effective way to increase output. In fact, working too many hours leads to decreased productivity and burnout.
  • So what makes an effective engineer? Intuitively, we have some notion of which engineers we consider to be effective. They’re the people who get things done. They’re the ones who ship products that users love, launch features that customers pay for, build tools that boost team productivity, and deploy systems that help companies scale. Effective engineers produce results.
  • Effective engineers focus on value and impact—they know how to choose which results to deliver.
  • An effective engineer, therefore, is defined by the rate at which he or she produces value per unit of time worked.

Part 1: Adopt the Right Mindsets

1. Focus on High-Leverage Activities

  • Figuring out these answers would be a marked departure from my regular duties of writing software, but I was excited because I knew that onboarding the new hires effectively would have a larger impact than any code I could produce.
  • Leverage is defined by a simple equation. It’s the value, or impact, produced per time invested: Leverage = Impact Produced / Time Invested.
  • Leverage, therefore, is the yardstick for measuring how effective your activities are. Leverage is critical because time is your most limited resource. Unlike other resources, time cannot be stored, extended, or replaced. 2 The limitations of time are inescapable, regardless of your goals
  • Former Intel CEO Andrew Grove explains that by definition, your overall leverage—the amount of value that you produce per unit time—can only be increased in three ways:
    • By reducing the time it takes to complete a certain activity.
    • By increasing the output of a particular activity.
    • By shifting to higher-leverage activities.
  • These three ways naturally translate into three questions we can ask ourselves about any activity we’re working on:
    • How can I complete this activity in a shorter amount of time?
    • How can I increase the value produced by this activity?
    • Is there something else that I could spend my time on that would produce more value?
  • As these examples illustrate, for any given activity, there are three approaches you can take to increase the leverage of your time spent. When you successfully shorten the time required for an activity, increase its impact, or shift to a higher-leverage activity, you become a more effective engineer.
  • Direct Energy Toward Leverage Points, Not Just Easy Wins
  • Wong [an engineering lead at Facebook] had to gradually apply pressure to change the prevailing mindset to one where people considered the hiring process an art form to be mastered.
  • And over his four years at the company, an obsession with speed and quality in hiring became one of Facebook’s competitive advantages. While slower-moving companies dilly-dallied, Facebook closed candidates.

2. Optimize for Learning

  • Optimizing for learning is a high-leverage activity for the effective engineer, and in this chapter, we’ll examine why.
  • So how do we avoid complacency and instead shift ourselves toward a growth mindset? LinkedIn co-founder Reid Hoffman suggests treating yourself like a startup. In his book The Startup of You, he explains that startups initially prioritize learning over profitability to increase their chances of success. They launch beta versions of their product and then iterate and adapt as they learn what customers actually want. Similarly, setting yourself up for long-term success requires thinking of yourself as a startup or product in beta, a work-in-progress that needs to be invested in and iterated on every single day. 12
  • Seek Work Environments Conducive to Learning
    1. Fast growth. Questions to consider
      • What are the weekly or monthly growth rates of core business metrics (e.g., active users, annual recurring revenue, products sold)?
      • Are the particular initiatives that you’d be working on high priorities, with sufficient support and resources from the company to grow?
      • How aggressively has the company or team been hiring in the past year?
      • How quickly have the strongest team members grown into positions of leadership?
    2. Training. Questions to consider
      • Is each new person expected to figure things out on his or her own, or is there a more formalized way of onboarding new engineers?
      • Is there formal or informal mentorship?
      • What steps has the company taken to ensure that team members continue to learn and grow?
      • What new things have team members learned recently?
    3. Openness. Questions to consider
      • Do employees know what priorities different teams are working on?
      • Do teams meet to reflect on whether product changes and feature launches were worth the effort? Do they conduct post-mortems after outages?
      • How is knowledge documented and shared across the company?
      • What are examples of lessons that the team has learned?
    4. Pace
    5. People
      • Do the people who interviewed you seem smarter than you?
      • Are there skills they can teach you?
      • Were your interviews rigorous and comprehensive? Would you want to work with the types of people who would do well on them?
      • Do people tend to work on one-person projects, or are teamwork and cooperation common themes?
    6. Autonomy
      • Do people have the autonomy to choose what projects they work on and how they do them?
      • How often do individuals switch teams or projects?
      • What breadth of the codebase can an individual expect to work on over the course of a year?
      • Do engineers participate in discussions on product design and influence product direction?
  • Dedicate Time on the Job to Develop New Skills. The solution is to borrow a lesson from Google. Google pioneered an idea called “20% time.”
  • Whichever route you decide, here are ten suggestions to take advantage of the resources available to you at work:
    • Study code for core abstractions written by the best engineers at your company
    • Write more code.
    • Go through any technical, educational material available internally
    • Master the programming languages that you use.
    • Send your code reviews to the harshest critics.
    • Enroll in classes on areas where you want to improve.
    • Participate in design discussions of projects you’re interested in.
    • Work on a diversity of projects.
    • Make sure you’re on a team with at least a few senior engineers whom you can learn from.
    • Jump fearlessly into code you don’t know
    • Engineering success was highly correlated with “having no fear in jumping into code they didn’t know.” Fear of failure often holds us back, causing us to give up before we even try. But as Johnson explains, “in the practice of digging into things you don’t know, you get better at coding.” 24
  • There are many ways to learn and grow in whatever you love to do. Here are ten starting points to help inspire a habit of learning outside of the workplace:
    • Learn new programming languages and frameworks.
    • Invest in skills that are in high demand.
    • Read books.
    • Join a discussion group.
    • Attend talks, conferences, and meetups
    • Build and maintain a strong network of relationships.
    • “Lucky people dramatically increase the possibility of a lucky chance encounter by meeting a large number of people in their daily lives. The more people they meet, the greater opportunity they have of running into someone who could have a positive effect on their lives.” 33
    • Follow bloggers who teach.
    • Write to teach.
    • Tinker on side projects.
    • Pursue what you love.

3. Prioritize Regularly

  • [the author] was the first engineer on Quora’s user growth team and eventually became the engineering lead for the group. Our team developed a healthy cadence where we would prioritize ideas based on their estimated returns on investment, run a batch of experiments, learn from the data what worked and what didn’t, and rinse and repeat. In a single year, our team grew Quora’s monthly and daily active user base by over 3x. 2 And I learned through that experience that a successful user growth team rigorously and regularly prioritizes its work.

  • Track To-Dos in a Single, Easily Accessible List

  • Write down and review to-dos. Spend your mental energy on prioritizing and processing your tasks rather than on trying to remember them. Treat your brain as a processor, not as a memory bank.

  • The first step in effective prioritization is listing every task that you might need to do.

  • To-do lists should have two major properties: they should be a canonical representation of our work, and they should be easily accessible. A single master list is better than an assortment of sticky notes, sheets of paper, and emails

  • Ask yourself on a recurring basis: Is there something else I could be doing that’s higher-leverage?

  • Focus on What Directly Produces Value

  • When you get the important things right, the small things often don’t matter. That’s true in life as well.

  • Focus on the Important and Non-Urgent

  • One piece of advice that I consistently give my mentees is to carve out time to invest in skills development. Their productivity might slow down at first, but with time, the new tools and workflows that they learn will increase their effectiveness and easily compensate for the initial loss.

  • Investing in Quadrant 2 solutions can reduce urgent tasks and their associated stress.

    • Note: Quadrant 2 here is one of the four quadrants in the prioritization system (important but non-urgent).
  • Protect Your Maker’s Schedule

  • Limit the Amount of Work in Progress

  • After prioritizing our tasks and blocking off contiguous chunks of time, it can be tempting to try to tackle many things at once.

  • The number of projects that you can work on simultaneously varies from person to person. Use trial and error to figure out how many projects you can work on before quality and momentum drop, and resist the urge to work on too many projects at once.

  • Fight Procrastination with If-Then Plans

  • Of the students who articulated these “implementation intentions,” 71% of them mailed in their essays. Only 32% of the other students did. A minor tweak in behavior resulted in over twice the completion rate. 19 20

  • Halvorson describes the if-then plan, in which we identify ahead of time a situation where we plan to do a certain task. Possible scenarios could be “if it’s after my 3pm meeting, then I’ll investigate this long-standing bug,” or “if it’s right after dinner, then I’ll watch a lecture on Android development.”

  • The concept of if-then planning also can help fill in the small gaps in our maker’s schedule. How many times have you had 20 minutes free before your next meeting, spent 10 of those minutes mulling over whether there’s enough time to do anything, finally picked a short task, and then realized you don’t have enough time left after all?

  • Make a Routine of Prioritization

  • Once we’re knee-deep working on those tasks, a common pitfall for many engineers is neglecting to revisit those priorities.

  • Because optimizing for learning is important and non-urgent, I generally include some tasks related to learning something new.

  • What’s important isn’t to follow my mechanics, but to find some system that helps you support a habit of prioritizing regularly.

  • Work on what directly leads to value. Don’t try to do everything. Regularly ask yourself if there’s something higher-leverage that you could be doing.
  • Work on the important and non-urgent. Prioritize long-term investments that increase your effectiveness, even if they don’t have a deadline.
  • Reduce context switches. Protect large blocks of time for your creative output, and limit the number of ongoing projects so that you don’t spend your cognitive energy actively juggling tasks.
  • Make if-then plans to combat procrastination. Binding an intention to do something to a trigger significantly increases the likelihood that you’ll get it done.
  • Make prioritization a habit. Experiment to figure out a good workflow. Prioritize regularly, and it’ll get easier to focus on and complete your highest-leverage activities.

  • Successful technology companies build the equivalent of a pilot’s flight instruments, making it easy for engineers to measure, monitor, and visualize system behavior.

  • Treat data measurement and analysis as parts of the product development workflow rather than as activities to be bolted on afterwards.
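
  • A minimal, hypothetical sketch of the “flight instruments” idea: a timing decorator that records latency samples for a code path. The metric name, the in-memory store, and the record_metric helper are invented for illustration; a real system would ship these samples to a metrics backend rather than keep them in a dict.

```python
import time
import functools
from collections import defaultdict

# Hypothetical in-memory metrics store; a real system would push these
# samples to a metrics service instead of holding them in a dict.
_latency_samples = defaultdict(list)

def record_metric(name, value):
    """Record one latency sample (in milliseconds) under a metric name."""
    _latency_samples[name].append(value)

def instrumented(metric_name):
    """Decorator that measures how long the wrapped function takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                record_metric(metric_name, elapsed_ms)
        return wrapper
    return decorator

@instrumented("search.render_results_ms")
def render_results(query):
    # Stand-in for real rendering work.
    return f"results for {query}"

if __name__ == "__main__":
    for q in ["effective engineer", "leverage"]:
        render_results(q)
    samples = _latency_samples["search.render_results_ms"]
    print(f"{len(samples)} samples, avg {sum(samples) / len(samples):.3f} ms")
```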

  • Key Takeaways:
    • Measure your progress. It’s hard to improve what you don’t measure. How would you know what types of effort are well spent?
    • Carefully choose your top-level metric. Different metrics incentivize different behaviors. Figure out which behaviors you want.
    • Instrument your system. The higher your system’s complexity, the more you need instrumentation to ensure that you’re not flying blind. The easier it is to instrument more metrics, the more often you’ll do it.
    • Know your numbers. Memorize or have easy access to numbers that can benchmark your progress or help with back-of-the-envelope calculations.
    • Prioritize data integrity. Having bad data is worse than having no data, because you’ll make the wrong decisions thinking that you’re right.

Part 2: Execute, Execute, Execute

4. Invest in Iteration Speed

  • Effective engineers invest heavily in iteration speed.
  • Move Fast to Learn Fast
  • Continuous deployment is but one of many powerful tools at your disposal for increasing iteration speed. Other options include investing in time-saving tools, improving your debugging loops, mastering your programming workflows, and, more generally, removing any bottlenecks that you identify.
  • All of these investments accomplish the same goal as continuous deployment: they help you move fast and learn quickly about what works and what doesn’t
  • Invest in Time-Saving Tools
  • In reality, Sarah wouldn’t actually front-load all that time into creating tools. Instead, she would iteratively identify her biggest bottlenecks and figure out what types of tools would let her iterate faster. But the principle still holds: time-saving tools pay off large dividends.
  • Sometimes, the time-saving tool that you built might be objectively superior to the existing one, but the switching costs discourage other engineers from actually changing their workflow and learning your tool. In these situations, it’s worth investing the additional effort to lower the switching cost and to find a smoother way to integrate the tool into existing workflows. Perhaps you can enable other engineers to switch to the new behavior with only a small configuration change.
  • Shorten Your Debugging and Validation Loops
  • In actuality, much of our engineering time is spent either debugging issues or validating that what we’re building behaves as expected.
  • Don’t fall into this trap! The extra investment in setting up a minimal debugging workflow can help you fix an annoying bug sooner and with less headache.
  • “Effective engineers have an obsessive ability to create tight feedback loops for what they’re testing,” Mike Krieger, co-founder and CTO of the popular photo-sharing application Instagram, told me during an interview. “They’re the people who, if they’re dealing with a bug in the photo posting flow on an iOS app … have the instinct to spend the 20 minutes to wire things up so that they can press a button and get to the exact state they want in the flow every time.”
  • Master Your Programming Environment
  • Moreover, certain basic skills are required for the craft of programming, including code navigation, code search, documentation lookup, code formatting, and many others.
  • Here are some ways you can get started on mastering your programming fundamentals:
    • Get proficient with your favorite text editor or IDE.
    • Learn at least one productive, high-level programming language.
    • Get familiar with UNIX (or Windows) shell commands.
    • Prefer the keyboard over the mouse.
    • Automate your manual workflows
    • Test out ideas on an interactive interpreter.
    • Make it fast and easy to run just the unit tests associated with your current changes.
    • In general, the faster that you can run your tests, both in terms of how long it takes to invoke the tests and how long they take to run, the more you’ll use tests as a normal part of your development—and the more time you’ll save.
  • Don’t Ignore Your Non-Engineering Bottlenecks
  • Effective engineers identify and tackle the biggest bottlenecks, even if those bottlenecks don’t involve writing code or fall within their comfort zone. They proactively try to fix processes inside their sphere of influence, and they do their best to work around areas outside of their control.
  • Find out where the biggest bottlenecks in your iteration cycle are, whether they’re in the engineering tools, cross-team dependencies, approvals from decision-makers, or organizational processes. Then, work to optimize them.
  • Learn basic commands like grep, sort, uniq, wc, awk, sed, xargs, and find, all of which can be piped together to execute arbitrarily powerful transformations.
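
  • A small sketch of automating one of those mechanical workflows in Python: count the most common error messages in a log file, roughly what an ad-hoc grep | sort | uniq -c pipeline does by hand. The log path and the “ERROR” line format are assumptions for the example.

```python
import sys
from collections import Counter

def top_errors(log_path, limit=10):
    """Summarize the most frequent error messages in a (hypothetical) log file."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if "ERROR" in line:
                # Use everything after the ERROR marker as the message key.
                message = line.split("ERROR", 1)[1].strip()
                counts[message] += 1
    return counts.most_common(limit)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "app.log"
    for message, count in top_errors(path):
        print(f"{count:6d}  {message}")
```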

5. Measure What You Want to Improve

  • My experience at Google demonstrated the power of a well-chosen metric and its ability to tackle a wide range of problems. Google runs thousands of live traffic search experiments per year, 11 and their reliance on metrics played a key role in ensuring search quality and building their market share.
  • Use Metrics to Drive Progress
  • As Peter Drucker points out in The Effective Executive, “If you can’t measure it, you can’t improve it.”
  • Good metrics accomplish a number of goals.
    • First, they help you focus on the right things.
    • Second, when visualized over time, good metrics help guard against future regressions.
    • Third, good metrics can drive forward progress.
  • Sam Schillace, Box’s VP of Engineering, explained a technique called performance ratcheting that they now use to address this problem and apply downward pressure on performance metrics (a rough sketch appears at the end of this chapter’s notes).
  • At Box, they use metrics to set a threshold that they call a performance ratchet. Any new change that would push latency or other key indicators past the ratchet can’t get deployed until it’s optimized, or until some other feature is improved by a counterbalancing amount. Moreover, every time the performance team makes a system-level improvement, they lower the ratchet further. The practice ensures that performance trends in the right direction.
  • Even still, given the benefits of good metrics, it’s worth asking yourself:
    • Is there some way to measure the progress of what I’m doing?
    • If a task I’m working on doesn’t move a core metric, is it worth doing? Or is there a missing key metric?
  • Pick the Right Metric to Incentivize the Behavior You Want
  • The right metric functions as a North Star, aligning team efforts toward a common goal; the wrong metric leads to efforts that might be ineffective or even counterproductive.
    • Note: similar to boundary conditions; working on the wrong metric becomes counterproductive very fast.
  • Picking the right metric applies to your personal goals as well as your professional ones. I knew writing this book would be a long and challenging project, so I established the habit of writing every day. Early on, I set a goal of writing for at least three hours per day, and I kept track of my progress. What I noticed after a few weeks, however, was that I would spend much of those three hours re-reading and re-writing to perfect my sentences. In fact, some days after editing, I actually would end up with fewer words than I had started out with initially. Great writers like Stephen King and Mark Twain underscore the importance of revision, but I knew that I was rewriting too much too early, and that I would be better off drafting more chapters. And so, I changed my metric. Rather than focusing on writing three hours per day, I focused on writing 1,000 words per day. Some days, that took me two hours; other days, it took four or five. The new metric incentivized me to focus on drafting new content rather than focusing on sentence quality—something I could revisit at a later time. That simple change was all I needed to significantly increase my writing pace.
  • When deciding which metrics to use, choose ones that
    1. maximize impact
    2. are actionable
    3. are responsive yet robust.
  • Jim Collins, the author of Good to Great, argues that what differentiates great companies from good companies is that they align all employees along a single, core metric that he calls the economic denominator. The economic denominator answers the question: “If you could pick one and only one ratio—profit per x …—to systematically increase over time, what x would have the greatest and most sustainable impact on your economic engine?”
  • An actionable metric is one whose movements can be causally explained by the team’s efforts.
  • vanity metrics, as Eric Ries explains in The Lean Startup, track gross numbers like page views per month, total registered users, or total paying customers.
  • Increases in vanity metrics may imply forward product progress, but they don’t necessarily reflect the actual quality of the team’s work.
  • Actionable metrics, on the other hand, include things like signup conversion rate, or the percentage of registered users that are active weekly over time. Through A/B testing (a topic we’ll discuss in Chapter 6), we can trace the movement of actionable metrics directly back to product changes on the signup page or to feature launches.
  • A responsive metric updates quickly enough to give feedback about whether a given change was positive or negative, so that your team can learn where to apply future efforts.
  • However, a metric also needs to be robust enough that external factors outside of the team’s control don’t lead to significant noise. Trying to track performance improvements with per-minute response time metrics would be difficult because of their high variance. However, tracking the response times averaged over an hour or a day would make the metric more robust to noise and allow trends to be detected more easily.
  • Instrument Everything to Understand What’s Going On
  • A team of Silicon Valley veterans finally flew into Washington to help fix the site. The first thing they did was to instrument key parts of the system and build a dashboard, one that would surface how many people were using the site, the response times, and where traffic was going. Once they had some visibility into what was happening, they were able to add caching to bring load times down from 8 seconds to 2, fix bugs to reduce error rates from an egregious 6% to 0.5%, and scale the site up so that it could support over 83k simultaneous users. 27 Six weeks after the trauma team arrived and added monitoring, the site was finally in a reasonable working condition. Because of their efforts, over 8 million Americans were able to sign up for private health insurance. 28
  • The stories from Twitter and Obamacare illustrate that when it comes to diagnosing problems, instrumentation is critical.
    • Note: especially for things running all the time, i.e. yatai.
  • However, many of the questions we want to answer tend to be exploratory, since we often don’t know everything that we want to measure ahead of time. Therefore, we need to build flexible tools and abstractions that make it easy to track additional metrics.
  • Internalize Useful Numbers
  • When you find yourself wondering which of several designs might be more performant, whether a number is in the right ballpark, how much better a feature could be doing, or whether a metric is behaving normally, pause for a moment. Think about whether these are recurring questions and whether some useful numbers or benchmarks might be helpful for answering them. If so, spend some time gathering and internalizing that data.
  • Be Skeptical about Data Integrity
  • The right metric can slice through office politics, philosophical biases, and product arguments, quickly resolving discussions. Unfortunately, the wrong metric can do the same thing—with disastrous results.
  • When I asked Schillace how to protect ourselves against data abuse, he argued that our best defense is skepticism.
  • Given the importance of metrics, investing the effort to ensure that your data is accurate is high-leverage. Here are some strategies that you can use to increase confidence in your data integrity:
    • Log data liberally, in case it turns out to be useful later on
    • Build tools to iterate on data accuracy sooner.
    • Write end-to-end integration tests to validate your entire analytics pipeline.
    • Examine collected data sooner.
    • Cross-validate data accuracy by computing the same metric in multiple ways.
    • When a number does look off, dig in to it early
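
  • A rough sketch of the performance-ratcheting idea described above, with invented file names and thresholds: a deploy check fails when the measured latency exceeds the stored ratchet, and the ratchet is tightened whenever a clear improvement lands so regressions can’t creep back in.

```python
import json
from pathlib import Path
from statistics import mean

# Hypothetical location for the current ratchet value.
RATCHET_FILE = Path("latency_ratchet.json")

def load_ratchet(default_ms=250.0):
    if RATCHET_FILE.exists():
        return json.loads(RATCHET_FILE.read_text())["p50_ms"]
    return default_ms

def save_ratchet(p50_ms):
    RATCHET_FILE.write_text(json.dumps({"p50_ms": p50_ms}))

def check_build(latency_samples_ms, tighten_factor=0.95):
    """Return True if the build may ship under the current ratchet."""
    ratchet = load_ratchet()
    # Average over a whole run to keep the metric robust to noisy samples.
    observed = mean(latency_samples_ms)
    if observed > ratchet:
        print(f"FAIL: {observed:.1f} ms exceeds ratchet of {ratchet:.1f} ms")
        return False
    if observed < ratchet * tighten_factor:
        # Clear improvement: lower the ratchet so the gain is locked in.
        save_ratchet(observed)
        print(f"OK: ratchet tightened to {observed:.1f} ms")
    else:
        print(f"OK: {observed:.1f} ms within ratchet of {ratchet:.1f} ms")
    return True

if __name__ == "__main__":
    check_build([180.0, 210.0, 195.0, 205.0])
```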

6. Validate Your Ideas Early and Often

  • When I asked Levy what key lessons he learned from this experience, the one that stood out was the importance of validating the product sooner.
  • “That’s by far better than trying to … build something and then trust that you got everything right—because you can’t get everything right.”
  • Find Low-Effort Ways to Validate Your Work
  • “What’s the scariest part of this project? That’s the part with the most unknowns and the most risk. Do that part first.”
  • Eric Ries, author of The Lean Startup, defines the MVP as “that version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort.”
  • When building an MVP, you want to focus on high-leverage activities that can validate hypotheses about your users as much as possible, using as little effort as possible, to maximize the chances that the product will succeed.
  • Houston[dropbox founder] demoed a limited version of his product, showing files synchronizing seamlessly across a Mac, a Windows PC, and the web
    • Note: we can build a video for flow.sh and bentoctl
  • The strategy of faking the full implementation of an idea to validate whether it will work is extremely powerful.
  • We might not all be working on startup products, but the principle of validating our work with small efforts holds true for many engineering projects.
    • Note: planning a proper MVP and constructing the test to verify our hypothesis is absolutely critical. The pace at which you learn is what will separate you from others.
  • Or maybe you’re contemplating tackling a gnarly bug. Before investing time into fixing it, you can use data from logs to validate whether the bug actually is affecting a sufficient number of users to justify spending your resources.
  • Continuously Validate Product Changes with A/B Testing
  • Obama’s campaign email test is a prime example of how using data to validate your ideas, even ones that seem perfectly reasonable, can be extremely high-leverage.
  • Even a well-tested, cleanly designed, and scalable software product doesn’t deliver much value if users don’t engage with it or customers don’t buy it.
  • In an A/B test, a random subset of users sees a change or a new feature; everyone else in the control group doesn’t.
  • A/B test tells you how much better that variation actually is. Quantifying that improvement informs whether it makes sense to keep investing in the same area
  • You won’t know which situation you’re in unless you measure your impact.
  • They would articulate a hypothesis, construct an A/B test to validate the hypothesis, and then iterate based on what they learned.
  • For example, they hypothesized that “showing a visitor more marketplace items would decrease bounce rate,” ran an experiment to show images of similar products at the top of the listing page, and analyzed whether the metrics supported or rejected the hypothesis (in fact, it reduced bounce rate by nearly 10%). Based on that experiment, the team learned that they should incorporate images of more marketplace products into their final design.
  • Initially, it’s tricky to determine what’s practically significant, but as you run more experiments, you’ll be able to prioritize better and determine which tests might give large payoffs.
  • Beware the One-Person Team
  • While there isn’t anything inherently wrong with working on a one-person project, it does introduce additional risks that, if not addressed, can reduce your chance of success
  • First and foremost, it adds friction to the process of getting feedback—and you need feedback to help validate that what you’re doing will work.
  • In addition, knowing that your teammates are depending on you increases your sense of accountability. The desire to help your team succeed can override the dips in motivation that everyone occasionally feels.
  • Be open and receptive to feedback.
  • Commit code early and often.
  • Request code reviews from thorough critics.
  • Ask to bounce ideas off your teammates.
  • Design the interface or API of a new system first
  • Send out a design document before devoting your energy to your code.
  • If possible, structure ongoing projects so that there is some shared context with your teammates.
  • Solicit buy-in for controversial features before investing too much time.
  • Build Feedback Loops for Your Decisions
  • “Any decision you make … should have a feedback loop for it. Otherwise, you’re just … guessing.”
    • Note: this is true for all aspects of the job like hiring, team structure etc
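
  • A minimal sketch of reading out an A/B test like the bounce-rate experiment above: compare the control and variant rates with a two-proportion z-test. The visitor counts are made up, and a real readout would also weigh practical significance, not just the p-value.

```python
from math import erf, sqrt

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Compare two conversion rates; returns rates, z-score, and two-sided p-value."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

if __name__ == "__main__":
    # Control: 4,800 of 10,000 visitors converted; variant: 5,150 of 10,000.
    p_a, p_b, z, p = two_proportion_z_test(4800, 10000, 5150, 10000)
    print(f"control={p_a:.1%} variant={p_b:.1%} z={z:.2f} p={p:.4f}")
```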

7. Improve Your Project Estimation Skills

  • Project estimation is one of the hardest skills that an effective engineer needs to learn. But it’s crucial to master: businesses need accurate estimates to make long-term plans for their products
    • Note: this is the hardest because this is where we fuck up the most.
  • Use Accurate Estimates to Drive Project Planning
  • “A good estimate,” he writes, “is an estimate that provides a clear enough view of the project reality to allow the project leadership to make good decisions about how to control the project to hit its targets.”
  • His definition distinguishes the notion of an estimate, which reflects our best guess about how long or how much work a project will take, from a target, which captures a desired business goal.
  • Given that it’s not possible to deliver all features by the target date, is it more important to hold the date constant and deliver what is possible, or to hold the feature set constant and push back the date until all the features can be delivered? Understanding your business priorities fosters more productive conversations, letting you devise a better project plan. Doing so requires accurate estimates.
    • Note: this becomes the new question
  • So how do we produce accurate estimates that provide us the flexibility we need? Here are some concrete strategies:
  • Decompose the project into granular tasks.
  • Estimate based on how long tasks will take, not on how long you or someone else wants them to take.
  • Think of estimates as probability distributions, not best-case scenarios
  • Because we operate with imperfect information, we instead should consider our estimates as probability distributions over a range of outcomes, spanning the best-case and worst-case scenarios. Instead of telling a product manager or other stakeholder that we’ll finish a feature in 6 weeks, we might instead tell them, “There’s a 50% likelihood that we can deliver the feature 4 weeks from now, and a 90% chance that we can deliver it within 8 weeks.”
  • Let the person doing the actual task make the estimate.
  • Beware of anchoring bias.
  • Avoid committing to an initial number before actually outlining the tasks involved, as a low estimate can set an initial anchor that makes it hard to establish a more accurate estimate later on.
  • Use multiple approaches to estimate the same task.
  • Beware the mythical man-month
  • Validate estimates against historical data
  • Use timeboxing to constrain tasks that can grow in scope.
  • Allow others to challenge estimates
  • In Chapter 6, we learned that iteratively validating our ideas can lead us to better engineering outcomes. In the same way, iteratively revising our estimates can lead us to better project outcomes. Estimates contain more uncertainty at the beginning of a project, but the variance decreases as we flesh out the details. Use incoming data to revise existing estimates and, in turn, the project plan; otherwise, it will remain based on stale information.
  • Measuring the actual time it takes to perform a task and comparing it against the estimated time helps reduce the error bounds both when we’re revising past estimates or making future ones.
  • Discovering that certain tasks take much longer than expected lets us know sooner if we’re falling behind.
  • Budget for the Unknown
  • Usually, however, the disaster is due to termites, not tornadoes.” 15 Little decisions and unknowns caused the Ooyala schedule to slip slowly, one day at a time.
    • Note: small slips and surprises slowly chip away at the schedule and make the project harder to work on.
  • When setting schedules, build in buffer time for the unexpected interruptions. Some combination of them will occur with reasonable probability during long projects. Be explicit about how much time per day each member of the team will realistically spend on a given project.
  • Define Specific Project Goals and Measurable Milestones
  • What frequently causes a project to slip is a fuzzy understanding of what constitutes success—in Ooyala’s case, reducing total work vs. shipping a working product earlier—which, in turn, makes it difficult to make effective tradeoffs and assess whether a project is actually on track.
  • Define specific goals for a project based on the problem you’re working to solve, and then use milestones to measure progress on those goals.
  • The simple exercise of setting a project goal produces two concrete benefits. First, a well-defined goal provides an important filter for separating the must-haves from the nice-to-haves in the task list.
  • The second benefit of defining specific project goals is that it builds clarity and alignment across key stakeholders.
  • Building alignment also helps team members be more accountable for local tradeoffs that might hurt global goals. In the middle of a long project, it’s easy for someone to disappear down a rabbit hole for a week, rewriting some code library or building a partially-related feature
  • Even more effective than defining specific goals is outlining measurable milestones to achieve them.
  • Milestones act as checkpoints for evaluating the progress of a project and as channels for communicating the team’s progress to the rest of the organization.
  • Reduce Risk Early
  • The goal from the beginning should be to maximize learning and minimize risk, so that we can adjust our project plan if necessary.
  • a risk common to all large projects comes during system integration, which almost always takes longer than planned.
  • Code complexity grows as a function of the number of interactions between lines of code more than the actual number of lines, so we get surprised when subsystems interact in complex ways.
  • Front-loading the integration work provides a number of benefits. First, it forces you to think more about the necessary glue between different pieces and how they interact, which can help refine the integration estimates and reduce project risk. Second, if something breaks the end-to-end system during development, you can identify and fix it along the way, while dealing with much less code complexity, rather than scrambling to tackle it at the end. Third, it amortizes the cost of integration throughout the development process, which helps build a stronger awareness of how much integration work is actually left.
    • Note: how front-loading the integration work between two systems helps us.
  • Approach Rewrite Projects with Extreme Caution
  • Frederick Brooks coined the term “second-system effect” to describe the difficulties involved in rewrites
  • The general tendency is to overdesign the second system, using all the ideas and frills that were cautiously sidetracked on the first one.
  • Engineers who successfully rewrite systems tend to do so by converting a large rewrite project into a series of smaller projects.
  • Rewriting a system incrementally is a high-leverage activity. It provides additional flexibility at each step to shift to other work that might be higher-leverage,
  • Sometimes an incremental rewrite isn’t possible—perhaps there’s no way to simultaneously deploy the old and new versions to different slices of traffic. The next best approach is to break the rewrite down into separate, targeted phases.
  • Don’t Sprint in the Middle of a Marathon
  • Despite our best efforts, we’ll still sometimes find ourselves in projects with slipping deadlines. How we deal with these situations is as important as making accurate estimates in the first place.
  • There are a number of reasons why working more hours doesn’t necessarily mean hitting the launch date:
  • Project estimation and project planning are extremely difficult to get right, and many engineers (myself included) have learned this the hard way. The only way to get better is by practicing these concepts, especially on smaller projects where the cost of poor estimations is lower. The larger the project, the higher the risks, and the more leverage that good project planning and estimation skills will have on your success.
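
  • A rough sketch of treating estimates as probability distributions rather than single numbers: give each task optimistic/likely/pessimistic guesses, simulate the project many times, and report the 50th and 90th percentile durations. The task list, the numbers, and the use of a triangular distribution are all assumptions for illustration.

```python
import random

# (task, optimistic days, most likely days, pessimistic days) -- invented values.
TASKS = [
    ("design review",      1, 2, 4),
    ("schema migration",   2, 4, 10),
    ("API implementation", 3, 5, 12),
    ("integration + QA",   2, 4, 9),
]

def simulate_project(tasks, trials=10_000):
    """Sample total project duration by summing per-task triangular draws."""
    totals = []
    for _ in range(trials):
        totals.append(sum(random.triangular(low, high, mode)
                          for _, low, mode, high in tasks))
    totals.sort()
    return totals

if __name__ == "__main__":
    totals = simulate_project(TASKS)
    p50 = totals[len(totals) // 2]
    p90 = totals[int(len(totals) * 0.9)]
    print(f"~50% chance of finishing within {p50:.1f} days, "
          f"~90% within {p90:.1f} days")
```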

Part 3: Build Long-Term Value

8. Balance Quality with Pragmatism

  • When I joined Google’s Search Quality Team right out of college, I picked up best practices for programming and software engineering much faster than I could have at many other places.
  • But this upside comes with a cost. Since every code change, regardless of whether it’s designed for 100 users or 10 million, is held to the same standard, the overhead associated with experimental code is extremely high
  • Writing tests and thoroughly reviewing prototype code might make sense, but blanket requirements don’t
  • Establish a Sustainable Code Review Process
  • Fundamentally, there’s a tradeoff between the additional quality that code reviews can provide and the short-term productivity win from spending that time to add value in other ways. Teams that don’t do code reviews may experience increasing pressure to do them as they grow. Newly hired engineers may reason incorrectly about code, pattern-match from bad code, or start re-solving similar problems in different ways, all because they don’t have access to the senior engineers’ institutionalized knowledge.
  • code review processes can be tuned to reduce friction while still retaining the benefits.
    • Note: it doesn’t have to be all or nothing. We can require code reviews only for important parts of the codebase or for new hires, or use lighter-weight over-the-shoulder reviews.
  • Manage Complexity through Abstraction
  • MIT Professor Daniel Jackson explains just how important it is to choose the right abstractions. “Pick the right ones, and programming will flow naturally from design; modules will have small and simple interfaces; and new functionality will more likely fit in without extensive reorganization,” Jackson writes. “Pick the wrong ones, and programming will be a series of nasty surprises: interfaces will become baroque and clumsy as they are forced to accommodate unanticipated interactions, and even the simplest of changes will be hard to make.” 23
  • But like many other aspects of code quality, building an abstraction for a problem comes with tradeoffs. Building a generalized solution takes more time than building one specific to a given problem.
  • When we’re looking for the right tool for the job and we find it easier to build something from scratch rather than incorporate an existing abstraction intended for our use case, that’s a signal that the abstraction might be ill-designed. Create an abstraction too early, before you have a firm handle on the general problem you’re solving, and the resulting design can be overfitted to the available use cases.
  • Good abstractions should be: 34
    • easy to learn
    • easy to use even without documentation
    • hard to misuse
    • sufficiently powerful to satisfy requirements
    • easy to extend
    • appropriate to the audience
  • Designing good abstractions takes work. Study other people’s abstractions to learn how to build good ones yourself. Because the adoption of an abstraction scales with its ease of use and its payoffs, an abstraction’s usage and popularity provide a reasonable proxy for its quality.
  • Automate Testing
  • In the absence of rigorous automated testing, the time required to thoroughly do manual testing can become prohibitive. Many bugs get detected through production usage and external bug reports. Each major feature release and each refactor of existing code becomes a risk, resulting in a spike to the error rate that gradually recovers as bugs get reported and fixed. This leads to software error rates like the solid line in the graph shown in Figure 1. 36
  • Automated testing doesn’t just reduce bugs; it provides other benefits as well. The most immediate payoff comes from decreasing repetitive work that we’d otherwise need to do by hand
  • Tests also allow engineers to make changes, especially large refactorings, with significantly higher confidence.
  • When code does break, automated tests help to efficiently identify who’s accountable.
  • Small unit tests tend to be easy to write, and while each one might only provide a small benefit, a large library of them quickly builds confidence in code correctness. Integration tests are harder to write and maintain, but creating just a few is a high-leverage investment.
  • Writing the first test is often the hardest. An effective way to initiate the habit of testing, particularly when working with a large codebase with few automated tests, is to focus on high-leverage tests—ones that can save you a disproportionate amount of time relative to how long they take to write. Once you have a few good tests, testing patterns, and libraries in place, the effort required to write future tests drops. That tips the balance in favor of writing more tests, creating a virtuous feedback cycle and saving more development time. Start with the most valuable tests, and go from there.
  • Repay Technical Debt
  • The key to being a more effective engineer is to incur technical debt when it’s necessary to get things done for a deadline, but to pay off that debt periodically
  • Unfortunately, technical debt often is hard to quantify. The less confident you are about how long a rewrite will take or how much time it will save, the better off you are starting small and approaching the problem incrementally. This reduces the risk of your fix becoming too complex, and it gives you opportunities to prove to yourself and others that the technical debt is worth repaying.
  • Like the other tradeoffs we’ve talked about, not all technical debt is worth repaying. You only have a finite amount of time, and time spent paying off technical debt is time not spent building other sources of value. Moreover, interest payments on some technical debt are higher than on others. The more frequently a part of the codebase is read, invoked, and modified, the higher the interest payments for any technical debt in that code. Code that’s peripheral to a product or that rarely gets read and modified doesn’t affect overall development speed as much, even if it’s laden with technical debt.
  • Key Takeaways:
    • Establish a culture of reviewing code. Code reviews facilitate positive modeling of good coding practices. Find the right balance between code reviews and tooling to trade off code quality and development speed.
    • Invest in good software abstractions to simplify difficult problems. Good abstractions solve a hard problem once and for all, and significantly increase the productivity of those who use them. But if you try to build abstractions when you have incomplete information about use cases, you’ll end up with something clunky and unusable.
    • Scale code quality with automated testing. A suite of unit and integration tests can help alleviate the fear of modifying what might otherwise be brittle code. Focus on the ones that save the most time first.
    • Manage your technical debt. If you spend all your resources paying off interest on your debt, you won’t have enough time left to work on new things. Focus on the debt that incurs the most interest.
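
  • A small example of what a first, high-leverage automated test might look like: a few cheap, parameterized unit tests around a helper that is easy to break. The parse_duration function is invented for the example; run it with pytest.

```python
import pytest

def parse_duration(text):
    """Parse strings like '90s', '5m', or '2h' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    text = text.strip().lower()
    if not text or text[-1] not in units:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(text[:-1]) * units[text[-1]]

@pytest.mark.parametrize("raw, expected", [
    ("90s", 90),
    ("5m", 300),
    ("2h", 7200),
    (" 10M ", 600),  # whitespace and case shouldn't matter
])
def test_parse_duration_happy_path(raw, expected):
    assert parse_duration(raw) == expected

@pytest.mark.parametrize("raw", ["", "10", "3d", "h"])
def test_parse_duration_rejects_bad_input(raw):
    with pytest.raises(ValueError):
        parse_duration(raw)
```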

9. Minimize Operational Burden

  • During Instagram’s early years, Krieger explained, its team consisted of no more than five engineers. That scarcity led to focus.
  • Far and away, the most valuable lesson they learned was to minimize operational burden.
  • This is why minimizing operational burden is so critical. The recurring costs of operating a system or product require time and energy that could be spent on higher-leverage activities.
  • “Every single, additional [technology] you add,” Krieger cautions, “is guaranteed mathematically over time to go wrong, and at some point, you’ll have consumed your entire team with operations.” And so, whereas many other startup teams adopted trendy NoSQL data stores and then struggled to manage and operate them, the Instagram team stuck with tried and true options like PostgreSQL, Memcache, and Redis that were stable, easy to manage, and simple to understand. 6 7 They avoided re-inventing the wheel and writing unnecessary custom software that they would have to maintain. These decisions made it significantly easier for the small team to operate and scale their popular app.
  • Embrace Operational Simplicity
  • Effective engineers focus on simplicity. Simple solutions impose a lower operational burden because they’re easier to understand, maintain, and modify
  • When asked what he’d learned from designing the iPod, Steve Jobs responded, “When you first start off trying to solve a problem, the first solutions you come up with are very complex, and most people stop there. But if you keep going, and live with the problem and peel more layers of the onion off, you can oftentimes arrive at some very elegant and simple solutions. Most people just don’t put in the time or energy to get there.” 8
  • Having too complex of an architecture imposes a maintenance cost in a few ways:
    • Engineering expertise gets splintered across multiple systems.
    • Increased complexity introduces more potential single points of failure.
    • New engineers face a steeper learning curve when learning and understanding the new systems.
    • Effort towards improving abstractions, libraries, and tools gets diluted across the different systems.
  • Build Systems to Fail Fast
  • These techniques cause software to fail slowly. The software may continue to run after an error, but this is often in exchange for less decipherable bugs further down the road. Suppose we introduce logic into a web server so that if it reads in a misspelled configuration parameter for max_database_connections, it defaults the parameter to 5. The program might start and run as usual, but once deployed to production, we’ll be searching everywhere trying to understand why database queries are slower than usual. Or suppose our application silently fails to save a user’s state to a data structure or database, so that it can keep running for longer. Later on, when it doesn’t read back the expected data, the program might be so far removed from the failure that it’s difficult to pinpoint the root cause. Or suppose an analytics program that processes log files simply skips over all corrupted data that it encounters. It’ll be able to continue generating reports, but days later, when customers complain that their numbers are inconsistent, we’ll be scratching our heads and struggling to find the cause.
    • Note: we might try to fix things but it still ends up worse than before
  • Examples of failing fast include:
    • Crashing at startup time when encountering configuration errors
    • Validating software inputs, particularly if they won’t be consumed until much later
    • Bubbling up an error from an external service that you don’t know how to handle, rather than swallowing it
    • Throwing an exception as soon as possible when certain modifications to a data structure, like a collection, would render dependent data structures, like an iterator, unusable
    • Throwing an exception if key data structures have been corrupted rather than propagating that corruption further within the system
    • Asserting that key invariants hold before or after complex logic flows and attaching sufficiently descriptive failure messages
    • Alerting engineers about any invalid or inconsistent program state as early as possible
  • Failing fast doesn’t necessarily mean crashing your programs for users. You can take a hybrid approach: use fail-fast techniques to surface issues immediately and as close to the actual source of error as possible; and complement them with a global exception handler that reports the error to engineers while failing gracefully to the end user.
  • In both of these cases, failing fast would have made errors more easily detectable, helping to reduce the frequency and duration of production issues.
    • Note: failing slow can cause hard to debug issues wasting even more of valuable dev time
  • Relentlessly Automate Mechanical Tasks
  • Johnson, however, distinguished between two types of automation: automating mechanics and automating decision-making.
  • Automation can produce diminishing returns as you move from automating mechanics to automating decision-making. Given your finite time, focus first on automating mechanics. Simplify a complicated chain of 12 commands into a single script that unambiguously does what you want. Only after you’ve picked all the low-hanging fruit should you try to address the much harder problem of automating smart decisions.
  • Make Batch Processes Idempotent
  • One technique to make batch processes easier to maintain and more resilient to failure is to make them idempotent. An idempotent process produces the same results regardless of whether it’s run once or multiple times. It therefore can be retried as often as necessary without unintended side effects.
  • Hone Your Ability to Respond and Recover Quickly
  • At Netflix, engineers did something counterintuitive: they built a system called Chaos Monkey that randomly kills services in its own infrastructure. 18 Rather than spending energy keeping services alive, they actively wreak havoc on their own system.
  • Netflix’s approach illustrates a powerful strategy for reducing operational burden: developing the ability to recover quickly
  • No matter how careful we are, unexpected failures will always occur. Therefore, how we handle failures plays a large role in our effectiveness
  • Scripting moved the decision-making process away from the distracting and intense emotions of the game.
    • Note: this is especially useful for high-pressure situations like football matches or, in our case, outages.
  • Like Netflix, other companies have also adopted strategies for simulating failures and disasters, preparing themselves for the unexpected
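
  • A sketch that combines two ideas from this chapter, failing fast on bad configuration and making a batch process idempotent: the daily rollup recomputes and overwrites the whole day’s row instead of incrementing it, so re-running the job is safe. The config keys, table layout, and use of sqlite are assumptions for illustration.

```python
import sqlite3
from datetime import date

def load_config(raw):
    # Fail fast: a missing or misspelled key should stop the job at startup,
    # not fall back silently to a default that causes confusion later.
    if "events_db" not in raw:
        raise ValueError("config missing required key 'events_db'")
    return raw

def run_daily_rollup(config, day):
    conn = sqlite3.connect(config["events_db"])
    conn.execute("""CREATE TABLE IF NOT EXISTS events
                    (day TEXT, kind TEXT)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS daily_signups
                    (day TEXT PRIMARY KEY, signups INTEGER)""")
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM events WHERE day = ? AND kind = 'signup'",
        (day.isoformat(),)).fetchone()
    # Idempotent: re-running the rollup for the same day overwrites the row
    # with the same recomputed value rather than adding to it.
    conn.execute(
        "INSERT OR REPLACE INTO daily_signups (day, signups) VALUES (?, ?)",
        (day.isoformat(), count))
    conn.commit()
    conn.close()
    return count

if __name__ == "__main__":
    cfg = load_config({"events_db": "metrics.db"})
    # Running the rollup twice leaves the same result as running it once.
    run_daily_rollup(cfg, date.today())
    print(run_daily_rollup(cfg, date.today()))
```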

10. Invest in Your Team’s Growth

  • One of the biggest lessons I learned from Ooyala is that investing in a positive, smooth onboarding experience is extremely valuable
  • “You’re a staff engineer if you’re making a whole team better than it would be otherwise. You’re a principal engineer if you’re making the whole company better than it would be otherwise. And you’re distinguished if you’re improving the industry.”
  • Wong firmly believes the secret to your own career success is to “focus primarily on making everyone around you succeed.”
  • Make Hiring Everyone’s Responsibility
  • building a great team can be higher-leverage than working on “traditional” software engineering
  • So how do we design an effective interview process? A good interview process achieves two goals. First, it screens for the type of people likely to do well on the team. And second, it gets candidates excited about the team, the mission, and the culture.
  • Ideally, even if a candidate goes home without an offer, they still leave with a good impression of the team and refer their friends to interview with the company
  • Design a Good Onboarding Process
  • Share Ownership of Code
    • Avoid one-person teams.
    • Review each other’s code and software designs.
    • Rotate different types of tasks and responsibilities across the team.
    • Keep code readable and code quality high.
    • Present tech talks on software decisions and architecture.
    • Document your software, either through high-level design documents or in code-level comments.
    • Document the complex workflows or non-obvious workarounds necessary for you to get things done.
    • Invest time in teaching and mentoring other team members.
  • Build Collective Wisdom through Post-Mortems
  • After a site outage, a high-priority bug, or some other infrastructure issue, effective teams meet and conduct a detailed post-mortem. They discuss and analyze the event, and they write up what happened, how and why it happened, and what they can do to prevent it from happening in the future.
  • It’s less common to dedicate the same healthy retrospection to projects and launches.
  • Build a Great Engineering Culture
  • And because the best engineers look for a strong engineering culture, it becomes a useful tool for recruiting talent. Hiring those engineers further strengthens the culture and creates a positive feedback loop.
  • Based on my hundreds of interviews and conversations, I’ve found that great engineering cultures:
    • Optimize for iteration speed.
    • Push relentlessly towards automation.
    • Build the right software abstractions.
    • Focus on high code quality by using code reviews.
    • Maintain a respectful work environment.
    • Build shared ownership of code.
    • Invest in automated testing.
    • Allot experimentation time, either through 20% time or hackathons.
    • Foster a culture of learning and continuous improvement.
    • Hire the best.
  • If there’s one idea that I want you to take away from this book, it’s this: Time is our most finite asset, and leverage—the value we produce per unit time—allows us to direct our time toward what matters most.
  • Leverage is the lens through which effective engineers view their activities.