Sourcing Methodology
How we source, verify, classify, and protect the information in the Epstein Exposed database. Every fact is traceable to a source document. Every connection is algorithmically derived. Every victim is protected.
Last updated: March 1, 2026
Source Archives
All data originates from publicly available government records, court filings, and FOIA responses. We do not use leaked, stolen, or illegally obtained materials. Every document is traceable to its releasing authority.
DOJ EFTA Releases (DS1–DS12)
28,942 PDFs
Datasets released by the U.S. Department of Justice under the Epstein Files Transparency Act (H.R.4405). Twelve sequential releases from federal archives including prosecution files, grand jury exhibits, and investigative materials.
Court Filings (SDNY & SDFL)
2,130+ filings
Unsealed court documents from Giuffre v. Maxwell (15-cv-7433), USA v. Maxwell (20-cr-330), USA v. Epstein (19-cr-490), and related civil litigation in the Southern Districts of New York and Florida.
FBI Investigative Reports
100+ FD-302s
FBI Form FD-302 witness interview memoranda and investigative reports obtained through FOIA requests or released as part of court proceedings. Includes Palm Beach and SDNY investigation files.
Flight Manifests
1,708 flights
Aircraft manifests from Epstein’s fleet: Boeing 727-31 (N908JE “Lolita Express”), Gulfstream II (N171JE), Cessna 421, and helicopter. Entered as court exhibits with FAA tail number verification.
Black Book
1,571 entries
Epstein’s personal contact directory obtained by law enforcement during the 2005 Palm Beach investigation. Entered into court records as Exhibit 80. Presence indicates recorded contact information, not wrongdoing.
Financial Records
446 flows traced
Wire transfers, bank statements, shell company filings, and financial flow analyses. Includes Deutsche Bank and JPMorgan compliance records, USVI estate proceedings, and forensic accounting of $5.3B+ in traced flows.
Epstein-Docs Archive
8,186 documents
Community-maintained GitHub archive with structured document summaries and metadata. Cross-referenced against official releases for accuracy. Supplements DOJ datasets with additional context and indexing support.
Email Correspondence
9,900+ emails
Internal communications from the Epstein operation including scheduling emails, financial instructions, travel arrangements, and correspondence between associates. Person-linked via name matching.
Department of Justice
Official DOJ releases, indictments, plea agreements, non-prosecution agreements, and sentencing documents. Includes the controversial 2007 NPA and 2019 SDNY indictment.
House Oversight Committee
Documents released by the House Oversight Committee investigations, including internal correspondence, agency communications, and testimony transcripts related to prosecution oversight.
State Court Records
Proceedings from USVI v. Estate of Jeffrey Epstein, Florida state investigations, and New Mexico regulatory actions. Includes the $105M USVI government settlement.
FOIA Responses
Freedom of Information Act responses from the FBI, DOJ, Secret Service, FAA, and other federal agencies. Requests ongoing with periodic new releases.
Verification Framework
Every piece of information is classified into one of five verification tiers based on its source reliability. Lower-numbered tiers carry greater evidentiary weight: Tier 1 is the most reliable and Tier 5 the least. Trust levels are displayed throughout the interface so users can assess information quality at a glance.
Court Document
Tier 1
Primary sources filed with federal and state courts. Highest reliability. Includes unsealed filings, judicial rulings, exhibits admitted into evidence, and court-ordered releases.
Official Government Record
Tier 2
Records issued by government agencies in their official capacity. Authenticated by the issuing body. Includes DOJ releases, FBI reports, FAA records, and legislative documents.
Sworn Testimony
Tier 3
Depositions and trial testimony given under oath, subject to perjury penalties. Represents one party’s account of events. Cross-referenced against other testimony and documentary evidence where possible.
Investigative Source
Tier 4
Findings from credentialed journalists and established research institutions. Cross-referenced against primary sources where possible but not independently verified by this project.
Unverified
Tier 5
Uncorroborated claims, anonymous tips, or single-source information. Included for completeness with clear labeling. Users should treat this information with appropriate skepticism.
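One way to represent the five tiers in code is an ordered enumeration, sketched below. The tier names and numbers come from the list above; the `strongest_tier` helper and the idea of reducing several sources to their best tier are assumptions made for illustration, not a description of how the project combines sources.

```python
from enum import IntEnum


class VerificationTier(IntEnum):
    """The five tiers listed above; a lower number means a more reliable source."""
    COURT_DOCUMENT = 1
    OFFICIAL_GOVERNMENT_RECORD = 2
    SWORN_TESTIMONY = 3
    INVESTIGATIVE_SOURCE = 4
    UNVERIFIED = 5


def strongest_tier(tiers: list[VerificationTier]) -> VerificationTier:
    """Hypothetical helper: pick the most reliable tier among a claim's sources."""
    if not tiers:
        return VerificationTier.UNVERIFIED
    return min(tiers)
```

Because `IntEnum` members compare as integers, `min` returns the most reliable tier directly.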
Data Processing Pipeline
Raw documents undergo a nine-stage processing pipeline before entering the searchable database. Each stage is auditable, and processing metadata is retained for every document.
Document Ingestion
PDFs downloaded from DOJ EFTA release servers and verified against published dataset manifests. File integrity confirmed via size and count matching against official DOJ dataset descriptions.
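A minimal sketch of the size-and-count check described above, assuming the published manifest can be reduced to a mapping of file name to expected size in bytes. That manifest format, the function name, and the flat directory layout are hypothetical.

```python
from pathlib import Path


def check_against_manifest(download_dir: Path, manifest: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means count and sizes all match.

    `manifest` maps file name -> expected size in bytes (assumed format).
    """
    problems = []
    found = {p.name: p.stat().st_size for p in download_dir.glob("*.pdf")}

    # Count check: the number of PDFs on disk should equal the manifest length.
    if len(found) != len(manifest):
        problems.append(f"count mismatch: expected {len(manifest)}, found {len(found)}")

    # Size check: every manifest entry must exist with the expected byte size.
    for name, expected_size in manifest.items():
        if name not in found:
            problems.append(f"missing: {name}")
        elif found[name] != expected_size:
            problems.append(f"size mismatch: {name} ({found[name]} != {expected_size})")
    return problems
```

Size and count matching only catches missing or truncated files; content-level changes are what the hash stage is for.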
SHA-256 Hash Verification
Cryptographic hash computed for every ingested document and stored in the document_hashes table. Enables detection of any post-publication modification or deletion by the releasing agency.
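The hashing step can be illustrated with Python's standard `hashlib`. This is a sketch, not the project's code: the function name and chunk size are assumptions, and writing the result to the `document_hashes` table is left out.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks
    so that large PDFs are never loaded into memory in one piece."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Any change to a file's bytes, however small, produces a different digest, which is what makes later modification detectable.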
OCR Text Extraction
Optical character recognition applied to scanned documents using Tesseract with automated cleanup and validation passes. Extracted text is stored in full-text search indexes for instant retrieval.
Structured Summarization
Structured summaries are generated during ingestion with document type classification, key person identification, and source citations. Derived summaries are treated as research aids and should be checked against the underlying source files.
Named Entity Recognition
Multi-word person name matching with word boundary enforcement. Single-word names are never auto-linked to prevent false positives. Minimum 5-character threshold for entity candidates.
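The matching rules above can be sketched with a regular expression. The sketch assumes the 5-character threshold applies to the full candidate name and that matching is case-insensitive; neither detail is stated in the text, and the function names are hypothetical.

```python
import re
from typing import Optional

MIN_CHARS = 5  # threshold from the text; applying it to the full name is an assumption


def build_name_pattern(full_name: str) -> Optional[re.Pattern]:
    """Return a word-bounded pattern for a multi-word name, or None if the
    name is a single word or shorter than the minimum length."""
    parts = full_name.split()
    if len(parts) < 2 or len(full_name.strip()) < MIN_CHARS:
        return None  # single-word and very short names are never auto-linked
    body = r"\s+".join(re.escape(part) for part in parts)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def find_mentions(text: str, names: list[str]) -> list[str]:
    """Return the names from `names` that appear in `text` as whole words."""
    hits = []
    for name in names:
        pattern = build_name_pattern(name)
        if pattern is not None and pattern.search(text):
            hits.append(name)
    return hits
```

The `\b` anchors are what keep a name such as "Mary Smith" from matching inside "Mary Smithson".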
Cross-Reference Linking
Person-to-document relationships established via the document_persons table. Each link records the matching method and confidence level for full auditability.
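A possible shape for such a link table, shown with Python's built-in `sqlite3`. Only the table name `document_persons` comes from the text; the column names, the three confidence labels, and the helper functions are assumptions for illustration, and the project's actual database engine is not stated.

```python
import sqlite3

# Hypothetical schema: one row per (document, person, matching method).
SCHEMA = """
CREATE TABLE IF NOT EXISTS document_persons (
    document_id  TEXT    NOT NULL,
    person_id    INTEGER NOT NULL,
    match_method TEXT    NOT NULL,
    confidence   TEXT    NOT NULL CHECK (confidence IN ('high', 'medium', 'low')),
    PRIMARY KEY (document_id, person_id, match_method)
);
"""


def link_person(conn: sqlite3.Connection, document_id: str, person_id: int,
                match_method: str, confidence: str) -> None:
    """Record that a person was matched in a document, and how."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO document_persons VALUES (?, ?, ?, ?)",
            (document_id, person_id, match_method, confidence),
        )


def links_for_person(conn: sqlite3.Connection, person_id: int) -> list[tuple]:
    """Return (document_id, match_method, confidence) rows for one person."""
    rows = conn.execute(
        "SELECT document_id, match_method, confidence FROM document_persons "
        "WHERE person_id = ? ORDER BY document_id",
        (person_id,),
    )
    return rows.fetchall()
```

Storing the method and confidence on each row lets a reviewer audit why a given link exists without rerunning the matcher.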
Semantic Embedding
Document text chunked and embedded into high-dimensional vectors for semantic search. HNSW indexing enables sub-second similarity queries across the full corpus.
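The chunking half of this stage might look like the sketch below. The text does not state the chunk size, the overlap, or the embedding model, so the word-based windows and default values here are assumptions; the embedding call and the HNSW index are omitted because they depend on libraries the text does not name.

```python
def chunk_text(text: str, chunk_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of words.

    Overlap keeps a sentence that straddles a boundary visible in both
    neighbouring chunks. Sizes are illustrative, not the project's values.
    """
    if not 0 <= overlap < chunk_words:
        raise ValueError("overlap must be non-negative and smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be passed to an embedding model and the resulting vectors inserted into the index.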
Integrity Monitoring
Continuous re-verification of document hashes against stored values. Automated detection of DOJ deletions and modifications with public reporting of anomalies.
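Re-verification amounts to comparing two snapshots of hashes. The sketch below assumes both snapshots are available as mappings of document ID to hex digest; the function name and the returned dictionary layout are hypothetical.

```python
def diff_hashes(stored: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored hashes against freshly computed ones.

    Both arguments map document ID -> SHA-256 hex digest (assumed format).
    """
    deleted = sorted(doc for doc in stored if doc not in current)
    modified = sorted(
        doc for doc in stored if doc in current and stored[doc] != current[doc]
    )
    added = sorted(doc for doc in current if doc not in stored)
    return {"deleted": deleted, "modified": modified, "added": added}
```

A document counts as deleted when it is absent from the current snapshot and as modified when it is present with a different digest.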
PII Redaction
Automated detection and redaction of victim personally identifiable information: Social Security numbers, dates of birth, phone numbers, and addresses in victim context. 80+ public figures excluded from name redaction.
Person Classification
Every person in the database is assigned a category based on their primary public role and a trust level based on their legal status in proceedings related to the Epstein case.
Categories
Elected officials, government appointees, diplomats, political operatives
Executives, entrepreneurs, financial professionals, hedge fund managers
Members of royal families and aristocratic lineages
Entertainers, actors, musicians, media personalities
Known associates, staff members, personal contacts, recruiters
Attorneys, judges, prosecutors, law enforcement
Professors, researchers, university administrators, scientists
High-society figures, philanthropists
Military officers, intelligence operatives, national security officials
Persons not fitting standard categories
Trust Levels
Each person is assigned a trust level based on their status in legal proceedings. These levels help users understand the context of each individual's inclusion in the database.
Found guilty in a court of law for offenses related to the Epstein case. Conviction is a matter of public record.
Formally charged or indicted by a grand jury. Case may be pending, resolved through plea, or dismissed. Charging documents are public record.
Named in civil lawsuits, victim allegations, or sworn testimony but not criminally charged. Allegations are claims, not findings of fact.
Appears in flight logs, contact books, financial records, or other documentary evidence. This does not imply wrongdoing or knowledge of criminal activity.
Referenced in documents, depositions, media reporting, or secondhand accounts. Included for research completeness. Mention alone carries no implication.
Evidence Attribution
Source badges appear on person profiles and search results to indicate what types of primary evidence document their connection to the case. Badges are assigned algorithmically based on verified document linkages.
Person appears on one or more aircraft manifests as a passenger, pilot, or crew member.
Person’s contact information is recorded in Epstein’s personal directory (Court Exhibit 80).
Person is named in one or more court documents, including filings, exhibits, or transcripts.
Person is referenced in Department of Justice materials, including EFTA releases.
Person or associated entity appears in financial flow records, wire transfers, or bank compliance documents.
Person is referenced in reporting, secondhand accounts, or unsworn statements only. No primary-source documentary evidence.
Connection Methodology
Connections between persons are derived algorithmically from co-appearances in documentary evidence. The system counts shared flights, co-mentions in documents, and direct references in depositions to establish relationship strength.
Direct documentary evidence of a substantive relationship. Multiple independent sources confirm the connection.
Circumstantial evidence of contact or proximity. Connection documented but nature of relationship may be unclear.
Tangential or single-source connections. Included for completeness but should not be relied upon for conclusions.
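Counting co-appearances can be sketched as follows, treating each flight manifest or document as a set of person identifiers. How the project weights flights against document co-mentions, and where it draws the line between strength levels, is not stated, so this sketch produces raw pair counts only; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_appearance_counts(records: list[set[str]]) -> Counter:
    """Count how often each pair of people appears in the same record.

    Each record is the set of person IDs found on one flight or in one
    document. Pairs are stored in sorted order so (A, B) and (B, A) merge.
    """
    counts: Counter = Counter()
    for people in records:
        for a, b in combinations(sorted(people), 2):
            counts[(a, b)] += 1
    return counts
```

A pair's count rises by one for every record that contains both people, regardless of how many others appear alongside them.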
Victim Privacy & PII Redaction
Protecting victim and survivor privacy is a core obligation of this project. We proactively redact personally identifiable information and maintain a confidential process for additional removal requests.
Proactive Redaction Program
Phase 1: 100 FBI FD-302 interview memos — 624 redactions of victim SSNs, dates of birth, and contact information.
Phase 2: 927 critical documents — 13,495 redactions including victim names in ALL CAPS context, addresses, and phone numbers.
Phase 3: 539 SSN-classified documents — 2,726 redactions with 7,765 SSN-like patterns analyzed and 6,968 false positives correctly excluded.
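The text does not say how false positives were separated from real SSNs. One plausible filter, sketched below, rejects nine-digit patterns that can never be valid under the Social Security Administration's numbering rules (area 000, 666, or 900 to 999; group 00; serial 0000). The dashed-format pattern, the placeholder string, and the function names are assumptions.

```python
import re

# Matches only the dashed NNN-NN-NNNN form; undashed numbers are out of scope here.
SSN_LIKE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Apply the SSA structural rules for numbers that are never issued."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    if group == "00" or serial == "0000":
        return False
    return True


def redact_ssns(text: str) -> tuple[str, int]:
    """Replace plausible SSNs with a placeholder; return the text and redaction count."""
    count = 0

    def replace(match: re.Match) -> str:
        nonlocal count
        if is_plausible_ssn(*match.groups()):
            count += 1
            return "[REDACTED-SSN]"
        return match.group(0)  # structurally impossible: leave untouched

    return SSN_LIKE.sub(replace, text), count
```

A structural filter like this only removes impossible numbers; case numbers or phone fragments that happen to look like valid SSNs would still need context checks or manual review.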
What We Redact
- Social Security numbers (full and partial)
- Dates of birth for victims and minors
- Phone numbers in victim-related context
- Home addresses in victim-related context
- Victim names (excluding 80+ public figures)
Request Removal
Victims and survivors can request additional information removal through our confidential portal. Attorney-coordinated requests accepted with a fillable PDF form.
Document Integrity
Every document in the EFTA releases is cryptographically verified to ensure it has not been modified after publication. We maintain an independent record of the original releases.
Our integrity monitoring system has detected anomalies in DOJ Dataset 9 (DS9): 866 document deletions and 401 modifications post-publication. These anomalies are publicly documented on our integrity dashboard.
Editorial Standards
No editorializing. Epstein Exposed is a research and transparency tool. We do not draw conclusions about any individual's guilt or innocence. We present documentary evidence and allow users to evaluate it independently.
Source attribution. Every fact is attributed to its source document wherever possible. Derived summaries are treated as research aids, linked to the source files from which they derive, and should be verified against the underlying record.
Algorithmic connections. Connections between individuals are derived from documentary co-appearances, not editorial judgment. The system uses multi-word name matching with word boundary enforcement. Single-word names are never auto-linked.
Community review. Crowdsourced verification through our community forum. Members can flag errors, suggest corrections, and contribute research under moderated conditions.
Error correction. If you believe any information is inaccurate, please contact us at contact@epsteinexposed.com so we can review and correct the record. Corrections are logged in our public changelog.
Legal Framework
This project operates within established legal frameworks for public interest research and journalism.
First Amendment
Publication of truthful information from public records is protected speech. This project reports documented facts from official sources without editorial embellishment.
EFTA (H.R.4405)
The Epstein Files Transparency Act mandates DOJ release of case-related documents. Our database indexes these congressionally authorized releases.
Public Records
All documents are U.S. government public records, court filings, or materials released through official channels. Under 17 U.S.C. § 105, works of the U.S. federal government are not eligible for copyright protection.
Victim Rights (CVRA)
We honor the Crime Victims’ Rights Act by proactively redacting victim PII and maintaining a confidential removal request process for survivors and their legal representatives.
Related Resources
Complete field definitions for every data type in the database
SHA-256 hash verification dashboard and deletion tracking
Content removal policy, victim redaction requests, DMCA process
Live counts, AI agent metrics, embedding coverage
27-endpoint REST API with OpenAPI specification
12 criminal, civil, and regulatory proceedings ($853M+)