About The Archive
36 years of CIA World Factbook data, preserved and searchable
The CIA published only the current edition of the World Factbook online. When a new edition replaced the old one, the earlier data disappeared from public access, and over the years decades of geopolitical data were lost this way. Historical editions survive only as scattered archives (plain text on Project Gutenberg, cached JSON on GitHub, zip files on the Wayback Machine), each in a different format and with no way to search or compare across years.
This archive brings every edition from 1990 to 2025 into a single, normalized, searchable database. Every country, every field, every edition—parsed from original CIA publications, standardized across 36 years of format changes, and made available for research, analysis, and public use.
The CIA discontinued the online World Factbook, making this archive one of the last comprehensive preserved copies of the full publication history.
Comprehensive validation identified and repaired three data quality issues. (1) In the Project Gutenberg source, the 1996 entries for 7 countries (Venezuela, Armenia, Greece, Luxembourg, Malta, Monaco, Tuvalu) were truncated; they were replaced with complete data from the CIA's own original text file, recovered from the Wayback Machine. (2) The Zimbabwe 1998 entry contained 176 duplicate or noise fields produced by HTML parser artifacts; these were removed. (3) Germany's GDP for 1994–1996 was stored under a self-named field instead of "National product"; it was moved to the correct field. See Sources & Methodology for full details.
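
The duplicate-field cleanup described for Zimbabwe 1998 can be illustrated with a small check that flags and collapses repeated (entity, year, field) records. This is a minimal sketch, assuming each parsed field is held as a simple record; the `FieldRecord` layout and the `find_duplicates` / `dedupe_fields` helpers are hypothetical, not the archive's actual code, and the sketch handles only exact key duplicates, not other parser noise.

```python
from collections import Counter
from typing import NamedTuple


class FieldRecord(NamedTuple):
    """One parsed field (hypothetical layout, not the archive's schema)."""
    entity: str
    year: int
    field: str
    value: str


def find_duplicates(records):
    """Map each (entity, year, field) key seen more than once to its count."""
    counts = Counter((r.entity, r.year, r.field) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


def dedupe_fields(records):
    """Keep only the first record for each (entity, year, field) key.

    Assumption: the first occurrence is the one to keep. A real repair
    might compare values or go back to the source text instead.
    """
    seen = set()
    kept = []
    for r in records:
        key = (r.entity, r.year, r.field)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```

Reporting duplicates before removing them keeps the cleanup auditable: the counts from `find_duplicates` can be logged and reviewed before `dedupe_fields` rewrites anything.
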
Full-text search across 1,061,522 fields, with AND/OR/NOT operators, phrase matching, and field and year filtering.
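
Because the deployed database is SQLite, one plausible way to provide boolean and phrase search is SQLite's FTS5 extension, whose query syntax supports `AND`, `OR`, `NOT` and "quoted phrases". This is a sketch under that assumption only: the `fields_fts` table, its columns, and the sample rows are hypothetical (the rows are made-up strings, not Factbook data), and the page does not say which search engine the archive actually uses. It also assumes a Python build whose SQLite includes FTS5, which most current builds do.

```python
import sqlite3

# Hypothetical schema: one row per (country, year, field); only `content` is
# full-text indexed. The archive's real table layout is not documented here.
SCHEMA = (
    "CREATE VIRTUAL TABLE fields_fts USING fts5("
    "content, country UNINDEXED, year UNINDEXED, field UNINDEXED)"
)

# Made-up sample rows for illustration only; not Factbook data.
SAMPLE_ROWS = [
    ("natural gas, petroleum", "Country A", 1995, "Exports"),
    ("coal, natural gas", "Country B", 2005, "Exports"),
    ("natural gas, timber", "Country C", 2015, "Natural resources"),
]


def build_demo_index():
    """Create an in-memory FTS5 index loaded with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO fields_fts (content, country, year, field) VALUES (?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def search(conn, query, year_from=None, year_to=None, field=None):
    """Run an FTS5 MATCH query, optionally filtered by year range and field.

    `query` uses FTS5 syntax: AND / OR / NOT (uppercase) and "quoted phrases".
    """
    sql = "SELECT country, year, field FROM fields_fts WHERE fields_fts MATCH ?"
    params = [query]
    if year_from is not None:
        sql += " AND year >= ?"
        params.append(year_from)
    if year_to is not None:
        sql += " AND year <= ?"
        params.append(year_to)
    if field is not None:
        sql += " AND field = ?"
        params.append(field)
    sql += " ORDER BY rank"  # FTS5's built-in relevance ranking (bm25 by default)
    return conn.execute(sql, params).fetchall()
```

For example, `search(build_demo_index(), '"natural gas" NOT coal', year_from=2000)` combines a phrase, a negation, and a year filter, and returns only the Country C row. Keeping the year and field columns `UNINDEXED` means they are stored for filtering but never matched as search text.
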
Navigate any country in any year. View side-by-side comparisons. Track individual fields across the full 36-year span.
Choropleth maps, regional dashboards, timeline animations, country dossiers formatted per ICD 203 analytic standards.
Download as CSV, Excel, or formatted PDF. Bulk export all 36 years for any country. Print-ready paginated reports.
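
Bulk export of one country across all editions, and tracking a single field over time, both reduce to one ordered query. The sketch below covers only the CSV case and assumes a hypothetical flat `fields(country, year, field, value)` table; the table, its rows, and the `export_country_csv` helper are illustrative, not the archive's actual schema or export code.

```python
import csv
import io
import sqlite3


def export_country_csv(conn, country, field=None):
    """Return CSV text with every (year, field, value) row for one country.

    Passing `field` limits the output to a single field, i.e. that
    field's history across all editions.
    """
    sql = "SELECT year, field, value FROM fields WHERE country = ?"
    params = [country]
    if field is not None:
        sql += " AND field = ?"
        params.append(field)
    sql += " ORDER BY year, field"
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["year", "field", "value"])
    writer.writerows(conn.execute(sql, params))
    return buf.getvalue()


def build_demo_db():
    """In-memory table with made-up rows; not Factbook data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fields (country TEXT, year INTEGER, field TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO fields VALUES (?, ?, ?, ?)",
        [
            ("Country A", 2001, "Capital", "City X"),
            ("Country A", 1990, "Capital", "City X"),
            ("Country A", 1990, "Area", "100 sq km"),
            ("Country B", 1990, "Capital", "City Y"),
        ],
    )
    return conn
```

Using the `csv` module instead of joining strings by hand means values containing commas or quotes (common in Factbook text) are quoted correctly.
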
| Audience | Use Case |
|---|---|
| Researchers | Track how countries evolved over decades—GDP trajectories, population shifts, governance changes |
| Analysts | Compare economic, military, or demographic indicators across time and regions |
| Journalists | Verify historical claims against primary source data from U.S. Government publications |
| Archivists | Preserve access to public-domain government publications before they disappear |
| Students | Study international relations, political science, or intelligence analysis with real-world data |
| Data Scientists | Build structured, longitudinal country datasets from standardized, canonical field names |

| Component | Technology |
|---|---|
| Backend | Python / FastAPI |
| Templates | Jinja2 |
| Database | SQLite (deployed) / SQL Server (source) |
| Charts | Apache ECharts 5 |
| Hosting | Fly.io (Docker) |
| Source Control | GitHub |

| ETL Stage | Coverage / Result |
|---|---|
| Plain-text parser | 1990–2001 (Gutenberg + CIA original) |
| HTML parser (5 variants) | 2000–2020 |
| JSON parser | 2021–2025 |
| Field canonicalization | 1,090 raw names → 414 canonical names |
| Entity standardization | 281 entities, 9 types |
| Regional classification | 6 geographic combatant commands (COCOMs) |
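
Field canonicalization can be sketched as two steps: normalize each raw field name (whitespace, trailing colon, letter case), then look the result up in an alias table that maps many variants to one canonical name. The `ALIASES` entries and the `normalize` / `canonicalize` / `unmapped` helpers are hypothetical; the pairing of "National product" with a GDP name only echoes the Germany fix described above, and the archive's real 1,090 → 414 mapping is not reproduced.

```python
import re

# Hypothetical alias table: normalized raw field names mapped to one
# canonical name. Entries are illustrative; the archive's actual
# 1,090 -> 414 mapping is not reproduced here.
ALIASES = {
    "national product": "GDP",
    "gdp": "GDP",
    "gdp (purchasing power parity)": "GDP",
    "exports": "Exports",
}


def normalize(raw: str) -> str:
    """Collapse whitespace, drop a trailing colon, and case-fold."""
    s = re.sub(r"\s+", " ", raw).strip()
    s = s.rstrip(":").strip()
    return s.casefold()


def canonicalize(raw: str, aliases: dict = ALIASES):
    """Return the canonical field name, or None if the name is unmapped."""
    return aliases.get(normalize(raw))


def unmapped(raw_names, aliases: dict = ALIASES):
    """List distinct raw names that still need a mapping decision."""
    return sorted({n for n in raw_names if canonicalize(n, aliases) is None})
```

Returning `None` for unknown names, rather than guessing, keeps every new variant visible in the `unmapped` report until someone decides where it belongs.
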

Research analyst with training in information organization, archival methodology, intelligence analysis standards, and historical research methods. This project draws on all of them: library science to structure a government publication, historical methods to work through 36 years of format changes, ICD 203/208 standards for data presentation, and crime and intelligence analysis for the analytic tools.
| Type | Credential |
|---|---|
| Education | MLIS (Master of Library and Information Science) |
| Education | B.A. in History |
| Certificate | Crime & Intelligence Analysis |

This project is not affiliated with the Central Intelligence Agency, the Office of the Director of National Intelligence, or the U.S. Government. All data originates from the CIA World Factbook, a public-domain publication. The intelligence community formatting (ICD 203/208 structure, COCOM regional organization, confidence badges) is used for presentation purposes only and does not imply access to classified sources or methods.
Full Sources & Methodology →