OODA vs technical debt

This post is a response to Ovid's series about agility without testing.

I started to respond to the last one and realized that my comment was long enough to be a blog post of its own.

First, let me say that I'm enjoying this series. Ovid and Abigail are both challenging conventional wisdom around technical debt and I think that's really healthy.

However, I note that Ovid's evidence in favor of emergent behavior is anecdotal, which is probably inevitable for this sort of thing, but dangerous. "It worked these handful of times I remember it" has confirmation bias and no statistical significance.

We can't run a real experiment, but we can run a thought experiment: 100 teams of strict TDD vs 100 teams of the Ovid approach [which he really ought to brand somehow] from the same starting point (perhaps in parallel universes) for a few months of development.

What could we expect? Certainly, the TDD teams will spend more of their time on testing than the Ovid teams. So the Ovid teams will deliver more features and fix more bugs in the same period of time.

If one believes even a little of the Lean Startup hype, the Ovid teams will have more opportunities to see customer reactions — they will have a shorter OODA loop.

On the flip side, the TDD teams have less technical debt and a lower risk profile. I disagree with the idea that technical debt is an option. I believe it does have an ongoing cost — that future development is less efficient and more time-consuming to at least some degree.

I call this "servicing" technical debt, which is just like paying only the interest on your credit card. You might never pay down any of the technical debt, but as you accumulate more, you'll pay more to service it.
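
To make that concrete, here's a toy model with invented numbers (the capacity, savings, and drag figures are purely illustrative, not measurements): each shortcut saves some hours in the sprint where it's taken, but adds a small recurring drag to every sprint after it.

    # Toy model of servicing technical debt: interest-only payments on shortcuts.
    # All figures are invented for illustration.
    SPRINT_HOURS = 200   # team capacity per sprint
    SAVED_NOW = 10       # hours a shortcut saves in the sprint it's taken
    DRAG_PER_DEBT = 1    # hours of drag per accumulated shortcut, every later sprint

    debt = 0             # shortcuts accumulated so far
    for sprint in range(1, 21):
        service_cost = debt * DRAG_PER_DEBT            # the "interest-only" payment
        feature_hours = SPRINT_HOURS + SAVED_NOW - service_cost
        debt += 1                                      # take one more shortcut this sprint
        print(f"sprint {sprint:2d}: servicing {service_cost:2d}h, feature work {feature_hours}h")

Nothing in this sketch ever pays down principal, yet by around sprint eleven the recurring drag has caught up with the one-time saving, and from then on each sprint hands over a little more capacity than the last.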

It seems clear to me that which result you prefer depends quite a lot on the maturity of the product (possibly expressed in terms of expected growth rate) and the overall risk level.

For a brand-new startup, the risk of failure is already pretty high regardless of coding style. A faster OODA loop probably reduces risk more than improved tests do, because the bigger risk is building something customers don't want. And with such a high risk of failure, there's a chance that you'll simply be able to default on technical debt.

If I can riff on the financial crisis, a startup has subprime technical debt. It's either successful — in which case there will be growth sufficient to pay off technical debt (if the risk/reward tradeoff justifies it) — or it fails, in which case the debt is irrelevant. Rapid growth deflates technical debt.

For a mature business, however, it might well go the other way. Risk to an existing profit stream is more meaningful, and technical debt has to be paid off or serviced (rather than defaulted on), which reduces future profitability in a way that might not be sufficiently offset by growth.

The quandary will be businesses — or products (if part of an established business) — that are in between infancy and maturity. There the "right" approach will depend heavily on the risk tolerance and growth prospects.

Regardless, I tend to strongly favor TDD at the unit-test level, where I find it helps me better define what I want out of a particular piece of code and then be sure that subsequent changes don't break that. At that level, the effort put into testing can be pretty low and the benefit to my future coding productivity fairly high.

But as the effort of testing some piece of behavior increases — due to external dependencies or other interaction effects — it's more tempting to me to let go of TDD and rely on human observation of the emergent behaviors because I'd rather spend my time coding features than tests.

I think that puts me a little closer to the Ovid camp than the strict TDD camp, but not all the way.

12 Comments

  1. Abigail
    Posted April 24, 2013 at 3:27 pm | Permalink

    Not only will the teams following Ovid's strategy have a shorter OODA loop, they're also more likely to earn money faster. Many startups play a race against the clock: have something producing money before the investment capital runs out. Using TDD so that development in the future goes faster doesn't help you if you run out of money before then (and most startups fail; that is, they run out of money before earning more than their costs). In my talks about this subject I call this "faster development in the future is worthless if the future never happens". Or, as some economists say, "100 USD now is worth more than 100 USD in the future" (and that's true even if there's no inflation).

    Now, I'm not arguing that startups (or anyone else) should not use TDD, or that collecting technical debt is always OK. But people shouldn't be too focussed on the future; it's the now that matters. A long time ago, when I was still too much in the camp of "never collect technical debt", I had a discussion with someone who is a lot smarter than I am. Condensed down, it went like: "So, you want to spend X hours/week to avoid collecting technical debt. If we all do this, we lose Y hours/week of dev time. That's Z features we cannot implement. On average, a feature brings in W euros/day. That's your investment; when do you earn it back?". That investment is what's known as "lost opportunity cost". (And it really adds up: the features I cannot build in the first week don't make money in the second week either, in which I have another set of features I cannot build; both sets of missing features make no money in the third week, in which the set of missing features grows longer again; and so on.)
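
    In made-up numbers, just to show the shape of that compounding (say Z = 1 feature foregone per week, each worth W = 50 euros/day once shipped):

        # Illustrative only: how foregone features compound into lost revenue.
        Z = 1      # features per week we choose not to build
        W = 50     # euros/day each of those features would have earned

        missing_features = 0
        lost_revenue = 0
        for week in range(1, 13):
            lost_revenue += missing_features * W * 7   # money the unbuilt backlog didn't earn this week
            missing_features += Z                      # another week's worth of features goes unbuilt
            print(f"week {week:2d}: {missing_features:2d} features missing, {lost_revenue:6d} euros foregone so far")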

    It's not foolish to spend time reducing technical debt. But it is foolish to do so without realizing you're actually making an investment now for a possible gain in the future. And investments carry risks. Many people and businesses went bankrupt because they made an investment that seemed like a good idea at the time.

  2. Posted April 24, 2013 at 5:20 pm | Permalink

    David, thanks for the write-up. I'm happy to see others proposing well-thought-out counterpoints. I see that you wrote this:

    However, I note that Ovid's evidence in favor of emergent behavior is anecdotal, which is probably inevitable for this sort of thing, but dangerous. "It worked these handful of times I remember it" has confirmation bias and no statistical significance.

    You are absolutely correct that the way I describe emergent and unexpected behavior (both are important!) is anecdotal, but this is largely due to an NDA. Given the background of both Abigail and myself, you might guess that we have first-hand experience with this strategy being incredibly successful, thus our advocacy of looking at things from a different viewpoint. In fact, if you dig into modern Agile recommendations, most of what I'm suggesting is actually best practice!

    Where I differ is focusing on the area of customer-facing code that deals with subjective opinion (for example, "is a short synopsis going to cause worse conversion than a long synopsis?") rather than objective fact (for example, customers can't buy your product because clicking "submit" crashes the app). It's the "subjective opinion" area where we can test customer behavior rather than code behavior and this is fascinating as hell.

    The main area where I diverge in approach is a heavy, heavy focus on (... drum roll please ...) testing. I'm a huge proponent of it, but the testing is focused on the high-risk areas that immediately and unquestionably destroy value. For the grey areas, it's OK to try different strategies.

    In fact, I suspect that a number of companies are using similar approaches, though I don't think they're doing a great job of describing them precisely or sharing their experience. Thus, I can't describe this system in more than an "anecdotal" way, due to a combination of NDAs and the newness of this customer-centric approach to delivering what customers want.

    • Posted April 24, 2013 at 5:45 pm | Permalink

      What I mean by "anecdotal" is that you're describing cases that you remember being successful. Confirmation bias means that you're less likely to notice cases when it didn't work.

      You are not an objective observer when you are a participant in the experiment.

      You and Abigail worked at the same company, so you don't provide very independent observations, either. What worked for you and Abigail at one particular company might be a horrible idea elsewhere.

      My approach to these things is to try to ignore the anecdotes, assume that any recommended approach might be good in certain circumstances and bad in others, and try to reason through how to figure out which is which.

      That means instead of saying "here's an example of how things worked out when my code didn't show results to the customer", I ran the thought experiment I described.

      In the end, I arrived at the same place, but in a way that I think is more defensible than anecdotes.

      • Posted April 24, 2013 at 7:04 pm | Permalink

        Just to be clear, you realize that a thought experiment doesn't provide more validity than what you're describing as anecdotal, right? At the very least, Abigail and I can say "we have first-hand experience", something a thought experiment cannot offer. That's not to say that Abigail and I are right (it's hard when you're pushing an NDA-laden idea), but arguing from experience rather than conjecture is going to be more compelling to some people.

        This is especially true when you look at the history of A/B testing and realize that opinion often fails in the face of customer behavior.

        • Posted April 24, 2013 at 8:25 pm | Permalink

          I think you're making too much of the NDA thing.

          I think it's sufficient to start off by saying (as Abigail has) that testing is an investment. Then your first posts are largely about explaining where you think that testing has the greatest value and where to draw the line.

          Your last post is trickier to categorize, but the premise seems to be that unintended behavior changes (with an appropriate safety net) can be insightful. But you never explain why a higher mutation rate is good.

          My response was intended to clarify what I think is the mechanism: the monitoring prerequisites you assume mean that faster mutation gives you a faster OODA loop. It's not mutation for mutation's sake; it's mutation for the sake of accelerated learning.

          If I can sum up your argument another way, you're saying "test for the result you want" rather than "test for the result you expect".

          The subtle distinction is that what you "want" is for business metrics to move in a certain direction. That can't be determined via traditional testing; it can only be tested by exposure to customers.

          All that you can safely "expect" is that an application behaves in a certain way, which is what traditional testing tries to ensure.

          If you think of the application as the input to a system of customer interactions that produces business metrics as outputs, you're arguing that people should be more open to greater volatility in the inputs in order to observe effects on the outputs.

          That is the bit that seems sensible to me, but easily missed by someone tied to traditional testing as dogma.

      • Posted April 24, 2013 at 7:11 pm | Permalink

        In addition to my previous comment, I can't dispute your comments about confirmation bias because that's absolutely true. Under no circumstances would I argue that my experiences are universally true, but I would strongly urge people to look at publicly available information about companies who use a similar strategy and at least ask if this strategy is hurting them. (There's more I could say here that would actually alter my case, but again, an NDA prohibits me from being completely forthright).

  3. Posted April 29, 2013 at 2:24 am | Permalink

    What I think is missing here is an open source framework that can be easily (?) used to set up monitoring of customer behavior and to provide that feedback.
    We all have a lot of experience writing tests (OK, I know this is wishful thinking :), but most of us have never seen a working monitoring system.

    In our open source work we have not even seen large enough interaction with customers to do meaningful A/B testing.

  4. Posted April 29, 2013 at 4:45 am | Permalink

    Just a few words to balance the NDA-laden discussion. We have used Ovid's approach since we started (unfunded) ~4.5 years ago. We love this approach and have managed to establish a working company in a market that is more than crowded. We use A/B testing heavily, try to be as customer-focussed as we can, and use a lot of monitoring to make sure our customers can do what they want.
    Strict TDD is not an option for us because by the time a feature settles in a way that customers love it we have rewritten it so often that we would have thrown away a lot of testing code. We push features very raw, see how our customers interact with them, and morph them into something they like to use. This is often far from what we planned to release in the first place.

    The whole Lean Startup idea is that we don't know where we will end up, but we know how to measure whether we are getting closer. Then it is just an exercise in iterating closer and closer to what your customers really want (as opposed to what you planned and wrote down in your shiny business plan :-)

    • Posted May 1, 2013 at 6:13 pm | Permalink

      Strict TDD is not an option for us because by the time a feature settles in a way that customers love it we have rewritten it so often that we would have thrown away a lot of testing code.

      Exactly! Well said.

  5. cfedde
    Posted May 1, 2013 at 11:21 am | Permalink

    No one has proposed an environment that is completely free of all automated testing.

    Any reasonable application monitoring system is going to be built out of "checks" which in the end are analogous to continuous integration and regression testing. Such systems provide the foundation on which we build dependable platforms. So in the end it seems that the main argument is about timing, rigor and volume of testing rather than any kind of "to test or not to test" choice.

    We are all natural scientists. We all make streams of hypotheses about the world we live and work in and then execute experiments to prove or disprove these hypotheses. The main difference in this narrow system development world seems to be what tools we choose up front and what kind of artifacts we leave behind.

    • Posted May 1, 2013 at 6:18 pm | Permalink

      the main argument is about timing, rigor and volume of testing rather than any kind of "to test or not to test" choice

      I agree. Further, I used TDD as one example of a practice of technical debt avoidance. Refactoring and code cleanup are similar technical debt management strategies that divert resources from feature deployment.

      I think the debate is really about when to optimize for feature deployment versus cruft and risk avoidance.

  6. Posted May 7, 2013 at 5:02 pm | Permalink

    Ovid might "brand" the 100 teams using his approach as "pragmatic testers", because from all that I read here and on his blog, this is more of a pragmatic approach.
