What is a NOTAM, and why should you care?
Last week, the Federal Aviation Administration suffered an outage to a system called NOTAM. It's not the only critical infrastructure system that's in danger of failing.
Last week, the Federal Aviation Administration suffered an outage to a system called NOTAM. It’s a messaging system that notifies airports and pilots of safety-critical conditions, like that there’s a crane blocking visibility of a certain runway at a certain airport, or that a runway has ice on it. The outage brought American airspace to a standstill, and it’s all anyone was talking about for a day.
The next day, the problem was over, and people weren’t talking about it anymore. Federal systems suffer publicly visible outages like this every once in a while, and each one is big news for a day or so. Then the problem is fixed, and we forget about it. But we shouldn’t. I don’t know what caused the FAA’s outage; there is a report that the issue was caused when “personnel corrupted a data file,” but this tells us nothing about why it happened. I strongly suspect the outage was not due to an acute event (“Someone pressed the wrong button. OOPS! All better now”), but rather to decades of underinvestment that steadily increased the risk of accidents. Just about every Federal IT system I’ve come across in my ~15 years is suffering from this slow bleed, and as a result, they are barely held together, always on the verge of collapse, with “personnel” fighting like hell 24/7 to keep them up. I bet the NOTAM outage has been a long time coming. So while this event quickly fades from the news, the system is likely just as vulnerable to another outage today as it was a few days ago, as are countless other systems that are critical to our daily lives.
Most of these critical systems are invisible to us. Had you ever heard of NOTAM before last week? I hadn’t. And I’m supposedly a Federal IT expert! There are thousands of such systems that the vast majority of the American public are completely unaware of. They all do something that American life depends on. And they’re all hanging on by a thread.
IT systems are delicate. Running one is like raising a child: it requires constant care and investment just to keep it running. If you want to do something new and fancy that it wasn’t originally built to do, that’s another major investment that will take years of research, planning, and implementation, followed by higher ongoing maintenance costs to account for the added complexity. My point is, to outside observers, computer systems might appear to “just work,” until they don’t, and then they do again. But that is not at all correct. Just look at Twitter. Feature-wise, it is an extremely simple service: let a user post a short message, and let that user’s followers see the message. It does this on a massive scale, but the number of things it does is comparatively small. But following the mass exodus of employees a few months ago, the cracks quickly began to show: spambots are having a much easier time getting onto Twitter and staying on. There have been days when the service has been unavailable for short periods. And the login system was broken for millions of users for a few days. This is how well the service is working, despite still being supported by around 5,000 staff.
Now compare the functionality of Twitter to a system that has to send messages to airports and pilots around the world, many of whom are airborne, and make those messages accessible from, I imagine, a wide range of devices, including many that are several decades old. Are these devices even on the Internet? At one point, they certainly were not—NOTAMs have been around since the late 1940s. (If you want to dig into NOTAM and see some examples, here is the FAA’s guide for pilots, published as—god help us all—a PowerPoint deck.) My point is, Federal systems are complicated, old, slow, expensive, crusty—and incredibly important.
How do we let this happen to systems that are so important? Many reasons. Agency budgets are highly variable from year to year, so an in-progress plan to modernize systems can suddenly be derailed for reasons that have nothing to do with the mission and more to do with Congressional horse trading. Political leadership changes every few years, which causes entire organizations (IT systems and everything else) to shift their priorities. Removing staff whose skills are outdated is practically unheard of; recruiting replacements who know how to build modern IT systems, and convincing them to work for a government salary, is almost as rare.
But the main thing holding back Federal systems is that the government has handed off its responsibility for systems to private firms. Most Federal IT shops do not do much engineering of their own. Instead, they hire companies to do it, and these companies build and maintain key mission systems at great physical and mental distance from their government customer. The government is often poorly positioned to monitor the contractor’s work quality, not to mention that government staff don’t even have the expertise to evaluate the work. We end up with a situation that resembles the old quip about Soviet industry: contractors pretend to work, and the government pretends to monitor the work.
There will never be a day when Federal systems are built and run solely by government employees. That would not be good; the private sector has too much talent and domain knowledge to ignore. But the work must be led by government employees. The work of designing our public digital infrastructure, of defining its success, of testing it on users, and of re-envisioning it as the world changes, needs to be performed by people who raised their hand to support a civic mission, and took an oath to defend the Constitution. It should not be led by people whose primary allegiance is elsewhere, and who have signed up to work any old software project, so long as it offers a paycheck.
I’ve written about this problem here and there over the years, but always in a less accessible way targeted at others in the industry. But this problem affects everyday people, and I want them to understand how. This stuff is arcane and boring to most, but in our system of government, the only real solution to this problem is for those same everyday people to start caring about it. That challenge—to make everyday people passionate about a raft of indecipherable government acronyms and how they are impacting their lives—will be the main focus of these letters going forward. This is a steep challenge. It's probably impossible! But I know what the future holds if no one tries.