This is the episode where I have to explain a chart to you. Not a simple chart. A chart with a log scale on both axes, vulnerability density on the X axis, total critical CVE count on the Y axis, and bubble sizes representing codebase scale. A chart where up and to the right means “things that are destroying the internet right now,” and down and to the left means “things that are doing their best.”
Almost nothing critical lived in the safe quadrant.
That was 2014. The projects have changed. The quadrant has not.
Setting the stage
Las Vegas, August 2014: a room full of hackers and a chart nobody wanted to see
DEF CON 22 was held at the Rio Hotel and Casino. The security research community was still processing Heartbleed, which had been disclosed four months earlier in April. The talk was called “Open Source Fairy Dust,” and the core argument was simple enough to summarize on a t-shirt: the internet runs on software maintained by people who are never paid to care about security, and nobody has actually looked at most of it.
The data behind that argument took roughly six months to compile, running automated static and dynamic analysis pipelines across 2,000+ open source projects using a fleet of AWS spot instances, Jenkins for orchestration, Clang, Brakeman, and FindBugs for static analysis, DTrace and GDB for dynamic analysis, and the National Vulnerability Database as a cross-reference to validate findings and calibrate the tooling. Crucially, to handle the sheer volume of data, I applied machine learning and clustering techniques to separate signal from noise in the mountain of raw findings, filtering out false positives to surface the true ones. The methodology had flaws, as any large-scale automated analysis does. But the signal was strong enough that those flaws didn’t change the conclusion by an order of magnitude. The projects that were dangerous were very dangerous.
A note on the dataset: the 2,000+ projects analyzed were not a random sample. They were weighted toward projects that actually run the internet — web servers, DNS resolvers, mail servers, hypervisors, crypto libraries, security tools, and the most commonly downloaded packages from early npm and PyPI. The dataset was biased toward “things that matter if they’re broken.” That bias means the results are worse than they would have been for a truly random sample. It does not mean I overstated the risk for the infrastructure that was actually being analyzed.
The methodology
How do you measure “how broken is the internet” with a Jenkins pipeline?
The core metric was vulnerability density: for every thousand lines of code, how many critical vulnerabilities were present? This was plotted on a log scale because the distribution was not normal — it was power-law shaped, with a small number of projects accounting for a disproportionate share of total critical CVE count. Exim was not a mild outlier. It was a category unto itself.
The density metric was calculated two ways: from the automated static/dynamic analysis findings, and from NVD historical data. Both were plotted. Where they diverged significantly, that divergence was itself informative — it usually meant either that the automated tooling was finding things NVD hadn’t catalogued yet (newly discovered), or that NVD had catalogued things the automated tooling missed (known but hard to find programmatically). The cross-reference gave confidence intervals.
Static analysis
Clang static analyzer, Coverity, Fortify, Brakeman (Ruby), FindBugs (Java). Identified potential memory safety issues, injection vectors, and logic errors without executing code. High false-positive rate, but useful for bounding the search space before dynamic analysis.
Dynamic analysis
DTrace, GDB, afl-fuzz, custom fuzzing harnesses. Executed the software under instrumentation, feeding it inputs malformed by genetic and evolutionary mutation strategies. Found runtime behaviors that static analysis missed. Provided code coverage metrics to bound confidence in the findings.
NVD cross-reference
National Vulnerability Database historical data used to calibrate findings. Projects with high CVE density historically were expected to have high density in the automated analysis. Divergences were investigated manually.
Manual review
Selected findings from automated tools were manually verified. This was the rate-limiting step — there are only so many hours in a day. The most interesting findings (high-confidence criticals in widely-deployed projects) were prioritized.
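The "malformed inputs" in the dynamic-analysis step reduce, at their core, to mutation: take a valid input, perturb it, run the target, watch for crashes. A deliberately minimal sketch of one mutation step (real harnesses like afl-fuzz add coverage feedback and corpus management; `mutate` here is a hypothetical simplification):

```c
#include <stdlib.h>
#include <stddef.h>

/* Flip one random bit in the buffer -- the smallest useful mutation
 * operator in a fuzzing loop. Seed rand() for reproducible runs. */
void mutate(unsigned char *buf, size_t len) {
    if (len == 0)
        return;
    size_t pos = (size_t)rand() % len;            /* which byte */
    buf[pos] ^= (unsigned char)(1u << (rand() % 8)); /* which bit */
}
```

In a real harness this sits inside a loop: mutate a corpus entry, deliver it to the instrumented target, and keep any input that reaches new code paths.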
The infrastructure behind this analysis was not sophisticated by 2026 standards. A few hundred AWS spot instances, a Jenkins pipeline, some Python glue code to aggregate findings, and a lot of manual review. What made it novel in 2014 was not the technology — it was doing it systematically across a large number of projects that nobody had bothered to look at, and being willing to publish what was found even when the findings were uncomfortable.
The uncomfortable part: nobody was looking at the code. The “many eyes make all bugs shallow” hypothesis — Linus’s Law — was and remains empirically unsupported for most open source projects. The projects that received serious security scrutiny were a small fraction of the projects being deployed everywhere. The rest operated on collective assumption.
The chart
Up and to the right is terrifying. Almost everything critical lives there.
Before diving into the individual projects, it helps to understand what the chart is actually showing and why the log scale matters. Vulnerability density is not uniformly distributed across a codebase — it tends to cluster in specific subsystems (memory management, input parsing, cryptographic implementation). A project with 100,000 lines of code and 500 critical vulnerabilities is not the same as a project with 1,000 lines and 5 critical vulnerabilities, even though they have the same density. Scale matters. The bubble size in the chart below represents codebase scale.
[Scatter chart. Bubble labels, by total critical CVE count: 13K, 6K, 4.5K, 3.2K, 2K, two marked moderate, and 290. The 13K (Exim), 6K (BIND 8), 4.5K (OpenSSL), and 2K (BIND 9) figures are discussed in the deep dives below.]
Project deep dives
The outliers, one by one
The chart tells the aggregate story. The individual projects tell the causal story. Each one is a different flavor of the same underlying failure: the people writing the code did not have the time, resources, or incentive to make it secure, and the people depending on the code assumed someone else had checked.
Exim is the mail transfer agent that quietly handles the most email in the world. Not Gmail. Not Exchange. Exim. Installed by default on more Linux distributions than any other MTA, powering universities, ISPs, corporate mail infrastructure, and government systems across the globe. And in 2014, it had roughly one critical vulnerability for every 2,700 lines of code.
The reason Exim sits so dramatically in the upper-right danger zone is not malice or incompetence. It is the intersection of three structural facts: Exim is written in C, which means every memory management decision is a potential attack surface. Exim processes untrusted input from the open internet by design — that is its entire purpose. And Exim was, for most of its history, maintained by a very small team of volunteers with no dedicated security engineering support.
The canonical Exim vulnerability class is not subtle. It is buffer overflows and integer overflows in the SMTP input processing path. An attacker sends a carefully crafted envelope to an Exim server; Exim trusts the length field; Exim writes past the end of a buffer; code execution. The specific mechanisms vary across the 13,000 criticals, but the root cause is consistent: C code that handles external input without sufficient bounds checking, written by people whose mental model was “legitimate SMTP clients send well-formed envelopes.”
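The vulnerability class fits in a few lines. This is a hypothetical miniature, not Exim's actual code: a parser copying attacker-supplied data into a fixed buffer, with the bounds checks the vulnerable versions lacked. The unchecked version is just the `memcpy` alone, and it compiles without a word of complaint.

```c
#include <string.h>
#include <stddef.h>

#define FIELD_MAX 256

/* Copy a length-prefixed field from untrusted input into a fixed
 * buffer. claimed_len is what the sender's length field says;
 * received_len is how many bytes actually arrived on the wire. */
int copy_field(char dst[FIELD_MAX], const char *src,
               size_t claimed_len, size_t received_len) {
    if (claimed_len > received_len)  /* sender lied about the length */
        return -1;
    if (claimed_len >= FIELD_MAX)    /* would overflow dst           */
        return -1;
    memcpy(dst, src, claimed_len);
    dst[claimed_len] = '\0';
    return 0;
}
```

The whole class of bugs is the gap between this function and the one-line version that trusts `claimed_len`.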
The DEF CON 22 talk asked the audience to guess the most dangerous project before revealing the chart. The correct answer — Exim — was guessed by exactly one person. Most guesses were Sendmail (reasonable but outdated), Exchange (not open source), or a DNS server. The fact that the internet’s most dangerous mail server was not the internet’s most famous mail server was itself part of the point: the projects that get scrutiny are not always the projects that need it.
BIND is the Berkeley Internet Name Domain server — the software that resolves domain names to IP addresses for a substantial fraction of the internet. If BIND has a vulnerability, the internet’s directory service becomes an attack surface. DNS cache poisoning, remote code execution, amplification attacks for DDoS — the attack classes enabled by DNS vulnerabilities are among the most consequential in network security.
BIND 8’s position in the danger zone was not a surprise to anyone who had been paying attention. The BIND team had been producing security advisories at a steady cadence for years. What the DEF CON 22 analysis added was scale and context: BIND 8 had approximately 6,000 critical vulnerabilities across its history, the vast majority of which were memory management issues in C code handling DNS protocol parsing. The pattern was identical to Exim: external untrusted input, C, insufficient bounds checking, volunteers.
The more interesting story was BIND 9. The BIND team had recognized by the early 2000s that BIND 8’s architecture was fundamentally broken from a security perspective, and undertook a complete rewrite. BIND 9 was positioned as the security-conscious successor. The DEF CON 22 data showed BIND 9 at approximately 2,000 critical CVEs — meaningfully better than BIND 8, but still firmly in the high-risk zone. The rewrite improved the security posture without solving the underlying structural problem: a codebase written in C, processing untrusted DNS packets from the open internet, maintained by a resource-constrained team.
The BIND 9 story foreshadows a pattern that repeats throughout the dataset and through history: security rewrites improve the situation without resolving it. You can fix the specific vulnerabilities that motivated the rewrite. You cannot simultaneously fix the language, the deployment environment, the maintainer resource constraints, and the incentive structure. BIND 9 was better than BIND 8. It was not safe. It was a better approximation of safety produced by people who genuinely cared, working within structural constraints that made “actually safe” unreachable.
In 2008, security researcher Dan Kaminsky discovered a fundamental flaw in the DNS protocol itself that allowed cache poisoning attacks against virtually all DNS implementations, including BIND. The disclosure was handled through an unprecedented coordinated effort — Kaminsky worked with DNS vendors for months before public disclosure to ensure patches were ready simultaneously. Despite the coordination, patch deployment remained incomplete for years. The DEF CON 22 talk referenced this directly: the DNS administrators who delayed patching did so because “change is hard” and they didn’t personally suffer the consequences of running unpatched DNS. This is the incentive structure problem in its purest form.
The OpenSSL story is the one that broke through to the mainstream press in 2014, so it requires less historical reconstruction than Exim or BIND. But the DEF CON 22 framing adds context that the mainstream coverage missed.
Heartbleed (CVE-2014-0160) was disclosed on April 7, 2014, four months before DEF CON 22. It was a buffer over-read in OpenSSL’s implementation of the TLS heartbeat extension — a feature that allows a TLS client to keep a connection alive by sending a small payload and receiving it back. The bug: the server trusted the length field in the heartbeat request without validating it against the actual payload length. A malicious client could specify a length of 65,535 bytes while sending a 1-byte payload, and the server would respond by sending 65,535 bytes of process memory — potentially including private keys, passwords, session tokens, and other sensitive data that had been anywhere near that memory region.
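The fix reduces to a single comparison. A simplified sketch, with illustrative names rather than OpenSSL's actual identifiers (the real patch performs this check inside the heartbeat handler before constructing the response):

```c
#include <stdint.h>
#include <stddef.h>

/* Decide how many payload bytes to echo back. payload_len is the
 * 16-bit length field the client claimed; record_len is how many
 * payload bytes the record actually carried. Pre-fix code used
 * payload_len unconditionally, leaking up to ~64 KB of heap. */
int heartbeat_echo_len(uint16_t payload_len, size_t record_len) {
    if ((size_t)payload_len > record_len)
        return -1;               /* RFC 6520: silently discard */
    return (int)payload_len;     /* echo only what was received */
}
```

A claim of 65,535 bytes against a 1-byte record is rejected; an honest request passes through unchanged.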
The code that contained Heartbleed was contributed by a developer named Robin Seggelmann on New Year’s Eve 2011. It was reviewed and committed by another OpenSSL developer. Both were volunteers. The OpenSSL project at the time of the disclosure had one full-time developer and a handful of part-time volunteers, collectively receiving less than $2,000/year in donations for a library that secured the majority of encrypted internet traffic on the planet. When security researcher Ben Laurie estimated the economic value of the software these volunteers were maintaining for free, the number was somewhere in the hundreds of billions of dollars.
The DEF CON 22 framing of Heartbleed was not about the technical details — those had been extensively covered. It was about the governance failure. OpenSSL at 4,500 criticals was not an outlier in the dataset; it was emblematic of the pattern. A critical piece of infrastructure, maintained by volunteers with insufficient resources, processing untrusted input, written in a memory-unsafe language. The specific bug was Heartbleed. The category was “inevitable.”
Heartbleed triggered the creation of the Core Infrastructure Initiative — a Linux Foundation project in which major tech companies pooled resources to fund security audits and maintenance of critical open source infrastructure. OpenSSL received meaningful funding. The post-Heartbleed OpenSSL and the forked LibreSSL and BoringSSL projects all improved substantially. The 2014 DEF CON 22 data points to the pre-Heartbleed state. But the pattern of “critical infrastructure, underfunded, discovered after the fact” did not end with OpenSSL. It continued through XZ Utils in 2024 and into the current moment.
Not every project in the dataset lived in the danger zone, and it’s important to understand why the exceptions exist. Apache and nginx were both in the relatively safer quadrant in 2014, which deserves explanation because they handle the same category of untrusted internet input as Exim and BIND — and yet their CVE density was substantially lower.
Apache’s relative safety had structural causes. The Apache Software Foundation had, by 2014, developed a mature security response process, a dedicated security team, and a culture of security review that was unusual in the open source landscape. The ASF took security seriously not because its volunteers were inherently more conscientious than Exim’s, but because it had the institutional infrastructure to support that conscientiousness: defined disclosure procedures, a security team with decision authority, and enough organizational scale to attract contributors who specialized in security.
Nginx’s story was different. nginx was newer (2004 vs. 1995), and its author, Igor Sysoev, had written it with performance and correctness as primary design goals. The architecture was fundamentally different from Apache — event-driven, single-threaded, with a smaller attack surface almost by design. Fewer features meant fewer code paths; fewer code paths meant fewer places for vulnerabilities to hide.
Neither Apache nor nginx was “safe” in any absolute sense. Both had CVEs. Both had critical vulnerabilities over their lifetimes. The scatter chart is a relative comparison within the dataset, not an absolute certification of security. A project being in the safe quadrant meant: compared to Exim and BIND, this project was doing better. It did not mean: this project had no vulnerabilities worth finding. The DEF CON 22 framing was always comparative, never absolute.
The root cause
Why the dangerous projects were dangerous: the incentive structure nobody wanted to discuss
The most important slide in the DEF CON 22 talk was not the scatter chart. It was a priority ranking of what open source maintainers actually optimize for, derived from analyzing project history, issue tracker patterns, and contributor behavior across the dataset. The ranking, from most to least prioritized: functionality, stability, performance, and, dead last, security.
This priority order is not a character flaw in the people who build open source software. It is a rational response to the incentive environment they operate in. If your project’s users reward functionality and punish instability, and if security failures are invisible until someone demonstrates an exploit, the rational allocation of limited volunteer hours places security near the bottom of the stack. This is not a failure of individual virtue. It is a structural problem that individual virtue cannot solve.
“Everybody was sure somebody would do it. Anybody could have done it. But nobody did it — and everybody blamed somebody, when nobody did what anybody could have done.”
— DEF CON 22, 2014. The Everybody/Somebody/Nobody/Anybody parable applied to Heartbleed, and by extension to every other critical vulnerability in the dataset. The blame directed at the OpenSSL maintainers in 2014 was particularly acute. One to three developers maintaining 500,000 lines of C that secures the majority of internet traffic. The blame should have been directed at the system that made that possible, not the humans who worked in it.
The contributor ecosystem
Who actually builds the software the internet runs on, and why that matters for security
The DEF CON 22 talk included an analysis of contributor archetypes across the dataset. Understanding who contributes to OSS projects — and what motivates them — is essential to understanding why the incentive structure produces the results it does.
The activist
Contributes to solve a specific ethical, political, or lifestyle problem. Deep motivation, often technically capable, but narrowly focused. A privacy-focused activist contributing to a VPN client is unlikely to spend cycles on input validation in the SMTP parsing path of a different project.
The hobbyist
Scratches an itch. Contributes a patch or two, maybe maintains a small project for a year or two, then moves on. The long tail of OSS contribution. For widely-deployed infrastructure projects, hobbyist contributions in C/C++ are among the highest-risk inputs: enthusiastic, well-intentioned, and statistically unlikely to have deep security engineering expertise.
The artist
Contributes for the craft of it. Creates things like new programming languages, esoteric algorithms, visually interesting code. Deeply passionate, often highly skilled, not primarily security-focused. For infrastructure projects, artist contributions tend to be in the core logic rather than the attack surface, which reduces risk somewhat.
The professionally motivated
Works for a company that depends on the project, or is paid to contribute. This is the archetype that produces the most security-conscious contributions — but only when the company cares about security. A company contributing to a mail server for reliability reasons will contribute reliability-focused code. A company contributing for security reasons (rare) will contribute security-focused code.
The implication for the vulnerability data is direct. The projects with the worst security posture were predominantly maintained by hobbyists and small teams of volunteers with deep domain expertise (DNS, SMTP, cryptography) but limited security engineering background. Writing a correct SMTP parser in C is a different skill set from writing a safe SMTP parser in C. Most of the people who built these projects had the former and not the latter, and the projects reflected that asymmetry.
The language problem
C, C++, and the memory safety debt that is still being paid in 2026
The correlation between C/C++ codebases and high vulnerability density in the dataset was not perfect — there were C projects with relatively low density — but it was strong enough to identify as a primary risk factor. Understanding why requires a brief detour into what C actually is as a programming model.
What C gives you
- Direct memory management — you decide when to allocate and free
- No bounds checking by default — you can read and write past the end of arrays
- No type safety at runtime — you can cast anything to anything
- No null pointer protection — dereferencing null is undefined behavior
- Integer arithmetic that wraps silently — overflow is your problem
- The full performance of the hardware with no safety abstractions
What C costs you
- Every memory allocation and deallocation must be manually correct across all code paths
- Every array access must be manually bounds-checked or the consequences are undefined
- Every integer arithmetic operation that could overflow must be explicitly guarded
- The cognitive overhead scales with codebase size: 500K lines of C requires tracking hundreds of thousands of invariants
- Security bugs in C are often valid C — the compiler won’t catch them
The practical implication: in a large C codebase maintained by a small team under resource pressure, the probability of memory safety bugs approaches certainty. Not because the developers are careless, but because the language requires a level of sustained attention to safety invariants that humans reliably cannot maintain across 500,000 lines of code over decades of development by rotating contributors. The bugs in Exim, BIND, and OpenSSL were not programmer mistakes in the conventional sense. They were the predictable output of a system that required programmers to be perfect and then expressed surprise when they weren’t.
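"Integer arithmetic that wraps silently" deserves one concrete instance, because it composes with allocation into the classic undersized-buffer bug. A sketch of the standard guard, assuming nothing beyond the C standard library (`calloc` performs the equivalent check internally):

```c
#include <stdint.h>
#include <stdlib.h>

/* count * size can wrap past SIZE_MAX and come out small, so malloc
 * happily returns an undersized buffer and every subsequent element
 * write is a heap overflow. Check before multiplying. */
void *alloc_array(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;             /* multiplication would wrap */
    return malloc(count * size);
}
```

The unguarded version is, again, valid C: the compiler has no objection, and the bug only surfaces when an attacker picks `count` and `size` to wrap.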
The memory safety problem in systems software is not a secret, and it predates DEF CON 22. The Mozilla Foundation began developing Rust in 2010 specifically to address it. Google’s internal data showed that approximately 70% of security vulnerabilities in Chrome and Android were memory safety issues. The NSA issued guidance recommending memory-safe languages in 2022. The White House Office of the National Cyber Director issued similar guidance in 2024. In 2026, Exim is still written in C. BIND is still written in C. The OpenSSL codebase still has C at its core. The language problem is known, documented, and the transition is happening at the pace of organizational inertia rather than the pace of the threat.
The disclosure problem
What happens when you find something and nobody has a process for receiving it
One of the more embarrassing moments in the DEF CON 22 talk was a personal one. During the analysis, a significant vulnerability was found in a widely-deployed DNS-adjacent project. The standard disclosure process was attempted: emailed the security@ address that RFC 2142 says every domain should maintain, sent a message to the project’s general mailing list, waited for a response. Silence. After two weeks, the vulnerability was filed in the project’s public issue tracker, because that was the only mechanism that seemed to result in any response.
A core developer responded within hours — not to acknowledge the vulnerability, but to ask why it had been filed publicly where everyone could see it. The answer: because there was no other visible mechanism. The project had no documented security disclosure process. The security@ address bounced. The general mailing list was unanswered. The issue tracker was the only thing that worked.
Of the 2,000+ projects analyzed in the DEF CON 22 dataset, fewer than 15% had a documented security disclosure process — a dedicated security contact, a defined timeline for acknowledgment and remediation, or any published policy for how to report vulnerabilities. For projects in the danger zone (high density, high count), the number was even lower. The projects most in need of coordinated vulnerability disclosure were the ones least prepared to receive it. This is not coincidental. The same resource constraints that produce security vulnerabilities also produce the absence of security processes.
The industry has improved since 2014. GitHub’s private security advisory feature, the widespread adoption of SECURITY.md files, HackerOne and Bugcrowd for bug bounties, CVE CNA programs for open source projects — all of these represent real progress. But the progress is uneven. The projects that receive security investment are disproportionately the high-profile ones. The long tail of critical infrastructure — the Exims and the FreeRADIUSes and the OpenPGP.jses — still largely operates without coordinated disclosure processes, still relies on volunteers to respond to security reports in their spare time, still lacks the institutional infrastructure to handle a flood of simultaneous vulnerability reports. Which is exactly what Project Glasswing is about to produce.
What changed (and what didn’t)
From 2014 to 2026: a twelve-year audit
The DEF CON 22 data was collected in 2014. Twelve years is a long time in technology. It is worth being explicit about what has improved, what has remained static, and what has gotten worse.
What improved
- OpenSSL post-Heartbleed: CII funding, dedicated team, LibreSSL and BoringSSL forks with improved security posture
- Linux kernel: memory safety investments, growing adoption of Rust for new kernel modules, active security team
- Disclosure processes: SECURITY.md widespread, GitHub private advisories, CVE automation improving
- Tooling: OSS-Fuzz running continuously against hundreds of critical projects, catching vulnerabilities before deployment
- Language alternatives: Rust adoption growing in systems software; memory safety in new code is increasingly achievable
- SBOM awareness: software bill of materials requirements emerging in government procurement
What stayed the same
- The maintainer resource constraint: critical infrastructure still predominantly maintained by volunteers or underfunded small teams
- The incentive structure: stability and performance still dominate; security still last
- The language debt: most critical infrastructure is still C/C++; rewrites are slow
- The deployment lag: organizations still running old versions because “change is hard”
- The Everybody/Somebody/Nobody dynamic: diffusion of responsibility still produces audit gaps
- The disclosure gap: projects with the worst security posture still have the worst disclosure processes
What got worse
- Attack surface expansion: the transitive dependency graph has grown by orders of magnitude; the 847-package login form didn’t exist in 2014
- Threat actor sophistication: nation-state supply chain attacks (XZ, Trivy, Axios) represent a qualitative escalation from 2014
- ML stack: an entirely new category of critical infrastructure (TensorFlow, PyTorch, LiteLLM) with the same structural problems and a security posture reflecting its research origins
- Discovery capability: Glasswing-class AI can find bugs faster than the ecosystem can patch them — a capability inversion that didn’t exist in 2014
- Targeted maintainer attacks: the XZ social engineering playbook has been replicated; OSS maintainers are now high-value human targets
The bridge
From the 2014 scatter chart to the 2026 cascade: the same structural failure, twelve years deeper
The DEF CON 22 analysis was not a prediction of Trivy, LiteLLM, or Axios. It was a diagnosis of the structural conditions that made those incidents inevitable once the threat model evolved to include nation-state actors with the patience and resources to exploit them systematically.
The specific vulnerabilities change. The attack vectors evolve. The infrastructure layers shift from DNS and SMTP to CI/CD pipelines and AI gateways. But the underlying cause — critical infrastructure maintained by under-resourced humans operating in an incentive environment that doesn’t reward security — has not been resolved. It has been compounded.
Project Glasswing finds bugs at machine speed because the bugs are still there. The 27-year-old OpenBSD bug that Mythos found existed because OpenBSD, for all its security focus, still runs on code written by humans in a memory-unsafe language over decades, with finite review capacity and infinite attack surface. The FFmpeg bug was 16 years old. These are not anomalies. They are what the 2014 data predicted: the safe quadrant was never as safe as it looked, and the dangerous quadrant was always more dangerous than anyone was willing to publicly admit.
The 2014 DEF CON 22 framing: “nobody is looking at the code, the projects that run the internet are riddled with critical vulnerabilities, and the incentive structure guarantees this will continue.”
The 2026 update: we were right about all of it. The specific projects have changed. The structural diagnosis has not. And now we have a model that finds 27-year-old bugs in OpenBSD and 16-year-old bugs in FFmpeg in a matter of weeks — which means either we fix the underlying structure, or we spend the next decade watching an AI scanner surface the backlog of everything nobody looked at, faster than the humans responsible for patching it can respond.
