Question 1

What is Legacy Code Archive?

Accepted Answer

Legacy Code Archive is a software archaeology project that systematically collects and preserves legacy code from the pre-AI era (roughly the 1980s to the 2000s). We rescue code lying dormant on long-forgotten platforms before it disappears, and explore creative ways to reuse it.

Question 2

What kind of code do you collect?

Accepted Answer

We focus on pre-AI era code. Sources include Google Code Archive (1.4M projects), CodePlex Archive (108K repositories), SourceForge (~500K projects), abandoned GitHub repositories, academic code, government OSS, and retro/demoscene works (1980s–2000s).

Question 3

Why does old code matter?

Accepted Answer

Old code can be a “software antique” that is often missing from modern AI training data. TODO comments preserve real developer struggles the way fossils preserve ancient life, and recurring code-smell patterns can inform today’s software quality work. Tracing paradigm shifts across decades of code also has academic value as a natural history of software.

Question 4

How will the collected code be used?

Accepted Answer

We explore both research and creative directions: TODO Archaeology (comment fossil records), Code Archaeology AI (estimating age/origin/context of fragments), Software Natural History (paradigm shift studies), Before/After pair datasets (refactoring comparisons), Code Sonification (turning structure into sound), and “Legacy Whisperer” (training AI to understand messy code).
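TODO Archaeology starts from a mechanical step: pulling TODO-style comments out of source files. The following is a minimal Python sketch that assumes a simple regular expression over line and block comments. The marker set (TODO, FIXME, XXX, HACK), the comment prefixes, and the function name `extract_todos` are illustrative choices, not the project's actual extraction pipeline.

```python
import re

# Illustrative assumption: comment prefixes and markers chosen for this sketch only.
TODO_PATTERN = re.compile(
    r"(?://|#|/\*|--)\s*(TODO|FIXME|XXX|HACK)\b[:\s]*(.*)",
    re.IGNORECASE,
)


def extract_todos(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, marker, text) for each TODO-style comment found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            text = match.group(2).strip()
            # Drop a trailing block-comment terminator such as "*/".
            if text.endswith("*/"):
                text = text[:-2].rstrip()
            hits.append((lineno, match.group(1).upper(), text))
    return hits


sample = """int main() {
    // TODO: handle errors properly
    return 0; /* FIXME this leaks */
}
# XXX: temporary hack"""

for hit in extract_todos(sample):
    print(hit)
```

A real excavation would also need language-aware comment parsing (string literals, multi-line block comments), which a single regex cannot handle reliably.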

Question 5

Can I contribute code or information?

Accepted Answer

Yes. We welcome information about old codebases, research collaboration, and introductions to sources. Please contact us and select “Legacy Code Archive” in the contact form. We focus on code with explicit licenses in public repositories.

Question 6

How is it different from Software Heritage?

Accepted Answer

Software Heritage is a large-scale archive preserving tens of billions of source files. Legacy Code Archive goes beyond storage to creative reuse: interpreting and exhibiting old code through methods such as TODO archaeology and code sonification, and discovering new value in these “antiques” rather than only storing them.


PUBLISHED WORKS

We Excavated 74,433 TODO Comments and Heard Developers Scream