Old code is an antique. Tangled history finally gets interesting.
Software Antiques Collection — Where tangled history finally gets interesting.
Source code from the 1980s through the 2000s was written long before the age of AI. It is a software antique.
Hand-typed, made to work, left behind, and forgotten. We collect it from dead platforms before it disappears, and explore creative ways to use it today.
“Clean OSS is edited literature. Legacy code is field recordings.”
Collect first
Quietly, before it disappears
Find interesting uses
Creatively, without constraints
Answers come later
The time spent reading is part of the value
27B: unique source files (Software Heritage)
220B: lines of COBOL still in production
30-50%: industrial dead code nobody understands
50%+: OSS projects die within first 4 years
6.6yr: code half-life, Linux Kernel
4mo: code half-life, Angular (20x shorter)
3x: more predictable, code vs English
You don't have to decide yet. Some things only become visible after you collect enough.
TODO Archaeology: A fossil record of developer struggles etched in comments. Surprisingly untouched territory.
Code Archaeology AI: An antique appraiser that estimates the age, origin, and context of unknown fragments.
Software Natural History: Track the rise and fall of paradigms across 20 years of code. A natural history of software.
Before/After pair datasets: A parallel corpus of spaghetti-to-clean refactors. Fun even to just browse.
Code Sonification: Turn the rhythm and structure of old code into sound. Listen to the legacy.
Legacy Whisperer: What happens if we train AI that only knows clean OSS on messy real-world code?
☙ FOUND IN: payroll_calc.c — last modified 2003-04-12
// ============================================
// FIXME: this workaround has been here since 1998
// Original author: unknown (left company in 2001)
// Last modified: 2003-04-12
// Nobody knows why removing this breaks payroll
// ============================================
if (month == 2 && day > 28) {
    day = 28; // TODO: handle leap years properly
    // HACK: just... don't deploy in February
}
// See you space cowboy...
Do you have code like this sleeping somewhere? We are pretty good at reading it.
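For the curious, here is a minimal sketch of what the TODO in the exhibit seems to be asking for. It assumes a `year` value is available next to `month` and `day`, which the original fragment never shows, and the function names are hypothetical. It makes no promise about why removing the workaround breaks payroll.

```c
#include <assert.h>
#include <stdbool.h>

/* Gregorian leap-year rule: divisible by 4, except century years
   that are not divisible by 400 (2000 is a leap year, 1900 is not). */
bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Number of days in the given month (1..12) of the given year. */
int days_in_month(int year, int month) {
    static const int days[12] = {31, 28, 31, 30, 31, 30,
                                 31, 31, 30, 31, 30, 31};
    assert(month >= 1 && month <= 12);
    if (month == 2 && is_leap_year(year)) {
        return 29;
    }
    return days[month - 1];
}

/* Hypothetical replacement for the workaround: clamp `day` to the
   real length of the month instead of hard-coding 28 for February.
   The `year` parameter is an assumption; the exhibit has none. */
int clamp_day(int year, int month, int day) {
    int last = days_in_month(year, month);
    return day > last ? last : day;
}
```

With this in place, `clamp_day(2004, 2, 30)` gives 29 and `clamp_day(2003, 2, 29)` gives 28, so February no longer needs a deployment freeze.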
DISCOVER: API / Archive index
CLONE: --depth 1
EXTRACT: Metadata + cloc
SCAN: Smells + Secrets
STORE: Parquet + Raw
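As one small illustration of what a SCAN stage might do, the sketch below tallies comment markers in a source file. It assumes that TODO, FIXME, and HACK markers count among the "smells"; the page does not define them, and the struct and function names are hypothetical. It is a plain case-sensitive substring match and does not check that a hit actually sits inside a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tally of marker comments found in one source text. */
struct marker_counts {
    size_t todo;
    size_t fixme;
    size_t hack;
};

/* Count non-overlapping, case-sensitive occurrences of `marker`
   in `source`. An empty marker counts as zero hits. */
size_t count_marker(const char *source, const char *marker) {
    assert(source != NULL && marker != NULL);
    size_t len = strlen(marker);
    size_t count = 0;
    if (len == 0) {
        return 0;
    }
    for (const char *p = strstr(source, marker); p != NULL;
         p = strstr(p + len, marker)) {
        count++;
    }
    return count;
}

/* Scan one source text for the three markers assumed above. */
struct marker_counts scan_markers(const char *source) {
    struct marker_counts counts;
    counts.todo = count_marker(source, "TODO");
    counts.fixme = count_marker(source, "FIXME");
    counts.hack = count_marker(source, "HACK");
    return counts;
}
```

Run over the payroll_calc.c exhibit, this would report one TODO, one FIXME, and one HACK. A real scanner would more likely tokenize comments per language, but the counts alone already make a file's history visible.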
Frequently Asked Questions
What is Legacy Code Archive?
Legacy Code Archive is a software archaeology project that systematically collects and preserves legacy code from the pre-AI era (roughly the 1980s to the 2000s). We rescue code sleeping on long-forgotten platforms before it disappears, and explore creative ways to use it.
What kind of code do you collect?
We focus on pre-AI era code. Sources include Google Code Archive (1.4M projects), CodePlex Archive (108K repositories), SourceForge (~500K projects), abandoned GitHub repositories, academic code, government OSS, and retro/demoscene works (1980s–2000s).
Why collect old code?
Old code can be a “software antique” that is often missing from modern AI training data. TODO comments preserve real developer struggles like fossils, and recurring code-smell patterns can inform today’s software quality work. Tracing paradigm shifts across decades of code also has academic value as a natural history of software.
What do you do with the collected code?
We explore both research and creative directions: TODO Archaeology (comment fossil records), Code Archaeology AI (estimating age/origin/context of fragments), Software Natural History (paradigm shift studies), Before/After pair datasets (refactoring comparisons), Code Sonification (turning structure into sound), and “Legacy Whisperer” (training AI to understand messy code).
Can I contribute or collaborate?
Yes. We welcome information about old codebases, research collaboration, and introductions to sources. Please contact us and select “Legacy Code Archive” in the contact form. We focus on code with explicit licenses in public repositories.
How is this different from Software Heritage?
Software Heritage is a large-scale archive preserving tens of billions of source files. Legacy Code Archive also focuses on creative reuse: we interpret and exhibit old code through methods like TODO archaeology and code sonification, looking for new value in these “antiques” rather than only storing them.
GET IN TOUCH
Whether it’s about collection/research, or “please do something about this code,” we’d love to hear from you.
CONTACT US →
LEGACY CODE ARCHIVE
The antiques dealer doesn't explain.
They only say, “This is good, isn't it?”