
Sourcing Methodology

How we source, verify, classify, and protect the information in the Epstein Exposed database. Every fact is traceable to a source document. Every connection is algorithmically derived. Every victim is protected.

Last updated: March 1, 2026

Documents: 2,129,486
Persons: 1,480
Flights: 3,615
Emails: 9,961
PII redactions: 16,924
Hashes verified: 1.38M

Source Archives

All data originates from publicly available government records, court filings, and FOIA responses. We do not use leaked, stolen, or illegally obtained materials. Every document is traceable to its releasing authority.

DOJ EFTA Releases (DS1–DS12)

28,942 PDFs

Datasets released by the U.S. Department of Justice under the Epstein Files Transparency Act (H.R.4405). Twelve sequential releases from federal archives including prosecution files, grand jury exhibits, and investigative materials.

Source: DOJ.gov EFTA release portal

Court Filings (SDNY & SDFL)

2,130+ filings

Unsealed court documents from Giuffre v. Maxwell (15-cv-7433), USA v. Maxwell (20-cr-330), USA v. Epstein (19-cr-490), and related civil litigation in the Southern Districts of New York and Florida.

Source: PACER / CourtListener

FBI Investigative Reports

100+ FD-302s

FBI Form FD-302 witness interview memoranda and investigative reports obtained through FOIA requests or released as part of court proceedings. Includes Palm Beach and SDNY investigation files.

Source: FBI FOIA Vault / Court exhibits

Flight Manifests

1,708 flights

Aircraft manifests from Epstein’s fleet: a Boeing 727-31 (N908JE, “Lolita Express”), a Gulfstream II (N171JE), a Cessna 421, and a helicopter. The manifests were entered as court exhibits, and tail numbers were verified against the FAA registry.

Source: Court exhibits / FAA registry

Black Book

1,571 entries

Epstein’s personal contact directory obtained by law enforcement during the 2005 Palm Beach investigation. Entered into court records as Exhibit 80. Presence indicates recorded contact information, not wrongdoing.

Source: Palm Beach PD / Court Exhibit 80

Financial Records

446 flows traced

Wire transfers, bank statements, shell company filings, and financial flow analyses. Includes Deutsche Bank and JPMorgan compliance records, USVI estate proceedings, and forensic accounting of $5.3B+ in traced flows.

Source: Court filings / SEC / FinCEN

Epstein-Docs Archive

8,186 documents

Community-maintained GitHub archive with structured document summaries and metadata. Cross-referenced against official releases for accuracy. Supplements DOJ datasets with additional context and indexing support.

Source: epstein-docs.github.io (CC BY 4.0)

Email Correspondence

9,900+ emails

Internal communications from the Epstein operation, including scheduling emails, financial instructions, travel arrangements, and correspondence between associates. Emails are linked to persons via name matching.

Source: Court exhibits / FOIA releases

Department of Justice

Official DOJ releases, indictments, plea agreements, non-prosecution agreements, and sentencing documents. Includes the controversial 2007 non-prosecution agreement (NPA) and the 2019 SDNY indictment.

Source: DOJ.gov / PACER

House Oversight Committee

Documents released through House Oversight Committee investigations, including internal correspondence, agency communications, and testimony transcripts related to prosecution oversight.

Source: oversight.house.gov

State Court Records

Proceedings from USVI v. Estate of Jeffrey Epstein, Florida state investigations, and New Mexico regulatory actions. Includes the $105M USVI government settlement.

Source: State court systems

FOIA Responses

Freedom of Information Act responses from the FBI, DOJ, Secret Service, FAA, and other federal agencies. Requests are ongoing, and new material is released periodically.

Source: Federal agency FOIA offices

Verification Framework

Every piece of information is classified into one of five verification tiers based on its source reliability. Higher tiers carry greater evidentiary weight. Trust levels are displayed throughout the interface so users can assess information quality at a glance.

Tier 1: Court Document

Primary sources filed with federal and state courts. Highest reliability. Includes unsealed filings, judicial rulings, exhibits admitted into evidence, and court-ordered releases.

Examples: Indictments, depositions, court orders, admitted exhibits, sealed-then-unsealed filings
Verification: Authenticated by court filing systems (PACER, ECF). Document numbers and docket entries verified.

Tier 2: Official Government Record

Records issued by government agencies in their official capacity. Authenticated by the issuing body. Includes DOJ releases, FBI reports, FAA records, and legislative documents.

Examples: FBI FD-302 reports, DOJ press releases, FAA flight records, Congressional testimony
Verification: Verified against agency publication records. Hash-verified where digital originals are available.

Tier 3: Sworn Testimony

Depositions and trial testimony given under oath, subject to perjury penalties. Represents one party’s account of events. Cross-referenced against other testimony and documentary evidence where possible.

Examples: Deposition transcripts, trial testimony, sworn declarations, affidavits
Verification: Court reporter certification verified. Transcript accuracy confirmed against audio where available.

Tier 4: Investigative Source

Findings from credentialed journalists and established research institutions. Cross-referenced against primary sources where possible but not independently verified by this project.

Examples: Miami Herald investigation, New York Times reporting, academic research papers
Verification: Source publication and author credentials verified. Claims flagged when not corroborated by primary documents.

Tier 5: Unverified

Uncorroborated claims, anonymous tips, or single-source information. Included for completeness with clear labeling. Users should treat this information with appropriate skepticism.

Examples: Anonymous submissions, social media claims, unattributed allegations
Verification: Clearly labeled as unverified. Not used to establish connections or trust levels without corroboration.
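
The tier rules above can be modeled as an ordered enumeration. This is a minimal illustration, not the project's code: the names VerificationTier, can_support_connection, and strongest_tier are hypothetical, and the sketch assumes that a lower tier number means higher reliability and that Tier 5 material counts toward a connection only once corroborated.

```python
from enum import IntEnum


class VerificationTier(IntEnum):
    """Five verification tiers; a lower number means a more reliable source."""
    COURT_DOCUMENT = 1
    OFFICIAL_GOVERNMENT_RECORD = 2
    SWORN_TESTIMONY = 3
    INVESTIGATIVE_SOURCE = 4
    UNVERIFIED = 5


def can_support_connection(tier: VerificationTier, corroborated: bool) -> bool:
    """Tier 5 material may support a connection only when corroborated."""
    if tier == VerificationTier.UNVERIFIED:
        return corroborated
    return True


def strongest_tier(tiers):
    """Return the most reliable tier among a fact's sources (the minimum value)."""
    return min(tiers)
```

Because IntEnum members compare as integers, a fact backed by both an anonymous tip and a deposition resolves to SWORN_TESTIMONY under this sketch.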

Data Processing Pipeline

Raw documents undergo a nine-stage processing pipeline before entering the searchable database. Each stage is auditable, and processing metadata is retained for every document.

Stage 1: Document Ingestion

PDFs downloaded from DOJ EFTA release servers and verified against published dataset manifests. File integrity confirmed via size and count matching against official DOJ dataset descriptions.

28,942 PDFs
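
A minimal sketch of the size-and-count check, assuming the published manifest can be reduced to a mapping of file name to expected size in bytes. The manifest format and the helper names sizes_on_disk and compare_to_manifest are hypothetical.

```python
from pathlib import Path


def sizes_on_disk(directory):
    """Map each downloaded PDF's file name to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(directory).glob("*.pdf")}


def compare_to_manifest(present, manifest):
    """Compare {file name: size} for downloaded files against a manifest of the same shape."""
    return {
        "count_matches": len(present) == len(manifest),
        "missing": sorted(manifest.keys() - present.keys()),
        "unexpected": sorted(present.keys() - manifest.keys()),
        "size_mismatch": sorted(
            name for name in present.keys() & manifest.keys()
            if present[name] != manifest[name]
        ),
    }
```

Checking individual files as well as totals matters: compare_to_manifest({"a.pdf": 5, "b.pdf": 2, "c.pdf": 1}, {"a.pdf": 5, "b.pdf": 3, "d.pdf": 10}) reports count_matches as True even though one file is missing, one is unexpected, and one has the wrong size.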

Stage 2: SHA-256 Hash Verification

Cryptographic hash computed for every ingested document and stored in the document_hashes table. Enables detection of any post-publication modification or deletion by the releasing agency.

1.38M hashes
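
Streaming each file through Python's hashlib keeps memory use flat even for large scans. This is a sketch only: the document_hashes column names are assumptions, and INSERT OR IGNORE is used so that the first recorded digest stays as the baseline instead of being overwritten by a later download.

```python
import hashlib
import sqlite3


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large PDFs never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_hash(conn, document_id, hex_digest):
    """Record a digest; an existing row for the document is left untouched."""
    # Hypothetical schema for illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document_hashes "
        "(document_id TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR IGNORE INTO document_hashes (document_id, sha256) VALUES (?, ?)",
        (document_id, hex_digest),
    )
    conn.commit()
```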

Stage 3: OCR Text Extraction

Optical character recognition is applied to scanned documents using Tesseract, followed by automated cleanup and validation passes. Extracted text is stored in full-text search indexes for fast retrieval.

2.01M records

Stage 4: Structured Summarization

Structured summaries are generated during ingestion with document type classification, key person identification, and source citations. Derived summaries are treated as research aids and should be checked against the underlying source files.

8,186 summaries

Stage 5: Named Entity Recognition

Person names are matched only as multi-word names, with word-boundary enforcement. Single-word names are never auto-linked, which prevents false positives, and entity candidates must be at least 5 characters long.

Safety-first NER
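
The matching rules above can be sketched with Python's re module. This is an illustration under stated assumptions, not the production matcher: case-insensitive matching, flexible whitespace between name parts, and counting the 5-character minimum over the full name string are all assumptions, and build_name_pattern and find_persons are hypothetical names.

```python
import re

MIN_CANDIDATE_LENGTH = 5  # assumed to apply to the whole name string


def build_name_pattern(full_name):
    """Compile a word-boundary pattern for a multi-word name, or None if ineligible."""
    words = full_name.split()
    if len(words) < 2 or len(full_name) < MIN_CANDIDATE_LENGTH:
        return None  # single-word and very short names are never auto-linked
    # Allow any whitespace run (e.g. an OCR line break) between name parts.
    body = r"\s+".join(re.escape(w) for w in words)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def find_persons(text, names):
    """Return the eligible names that occur in text as whole words."""
    found = []
    for name in names:
        pattern = build_name_pattern(name)
        if pattern and pattern.search(text):
            found.append(name)
    return found
```

The word boundaries are what keep "John Smith" from matching inside "Johnny Smithers".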

Stage 6: Cross-Reference Linking

Person-to-document relationships established via the document_persons table. Each link records the matching method and confidence level for full auditability.

2.44M+ links
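
One way to store links so that each records its matching method and confidence is sketched below with SQLite. The column names, the method label, and the 0 to 1 confidence scale are assumptions made for illustration; only the table name document_persons comes from the description above.

```python
import sqlite3

# Hypothetical schema: one row per (document, person, matching method).
SCHEMA = """
CREATE TABLE IF NOT EXISTS document_persons (
    document_id  TEXT NOT NULL,
    person_id    TEXT NOT NULL,
    match_method TEXT NOT NULL,
    confidence   REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
    PRIMARY KEY (document_id, person_id, match_method)
)
"""


def link(conn, document_id, person_id, method, confidence):
    """Insert a link; a duplicate (document, person, method) row is ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO document_persons VALUES (?, ?, ?, ?)",
        (document_id, person_id, method, confidence),
    )


def links_for_person(conn, person_id, min_confidence=0.0):
    """List a person's links at or above a confidence floor, for auditing."""
    rows = conn.execute(
        "SELECT document_id, match_method, confidence FROM document_persons "
        "WHERE person_id = ? AND confidence >= ? ORDER BY document_id",
        (person_id, min_confidence),
    )
    return rows.fetchall()
```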

Stage 7: Semantic Embedding

Document text is chunked and embedded into high-dimensional vectors for semantic search. An HNSW (Hierarchical Navigable Small World) index enables sub-second approximate nearest-neighbor queries across the full corpus.

2.67M vectors
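
The chunking step and the similarity query it feeds can be sketched as follows. The window size and overlap are illustrative assumptions, the vectors would come from an embedding model that is not shown, and the brute-force search below is not HNSW. It is the exact cosine-similarity baseline that an HNSW index approximates at scale.

```python
import math


def chunk_words(text, size=200, overlap=40):
    """Split text into overlapping windows of `size` words (values are assumptions)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=3):
    """Exact nearest neighbours by cosine similarity; returns vector indices."""
    scored = sorted(((cosine(query, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in scored[:k]]
```

The overlap means a sentence cut at one window boundary still appears intact in the neighbouring chunk.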

Stage 8: Integrity Monitoring

Scheduled re-verification of document hashes against stored values, run daily. Deletions and modifications by the DOJ are detected automatically, and anomalies are reported publicly.

Daily checks
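
A minimal sketch of the comparison a daily check might run, assuming both the stored baseline and a fresh crawl of a release are available as mappings of document ID to SHA-256 digest. The function name diff_hashes is hypothetical.

```python
def diff_hashes(stored, current):
    """Compare baseline {doc_id: sha256} with a fresh crawl of the same release."""
    deleted = sorted(stored.keys() - current.keys())   # in baseline, gone now
    added = sorted(current.keys() - stored.keys())     # new since ingestion
    modified = sorted(
        doc for doc in stored.keys() & current.keys()
        if stored[doc] != current[doc]                 # same ID, different bytes
    )
    return {"deleted": deleted, "modified": modified, "added": added}
```

Under these assumptions, per-dataset anomaly counts would simply be the lengths of the deleted and modified lists.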

Stage 9: PII Redaction

Automated detection and redaction of victim personally identifiable information: Social Security numbers, dates of birth, and phone numbers and addresses that appear in victim context. Victim names are also redacted, with 80+ public figures excluded from name redaction.

16,924 redactions
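
A simplified sketch of pattern-based redaction with a public-figure exclusion list. The regular expressions and placeholder labels are assumptions, only US-style phone numbers and labeled dates of birth are covered, and deciding whether a passage is in victim context is not shown: the function redacts every match in the text it receives.

```python
import re

# Illustrative patterns only; real documents need many more variants.
PHONE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b")
DOB = re.compile(r"\b(?:DOB|Date of Birth)\s*:?\s*\d{1,2}/\d{1,2}/\d{2,4}", re.IGNORECASE)


def redact(text, victim_names, public_figures):
    """Replace PII with labeled placeholders; excluded public figures keep their names."""
    text = DOB.sub("[REDACTED DOB]", text)
    text = PHONE.sub("[REDACTED PHONE]", text)
    for name in victim_names:
        if name in public_figures:
            continue  # names on the exclusion list are never redacted
        text = re.sub(rf"\b{re.escape(name)}\b", "[REDACTED NAME]", text, flags=re.IGNORECASE)
    return text
```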

Person Classification

Every person in the database is assigned a category based on their primary public role and a trust level based on their legal status in proceedings related to the Epstein case.

Categories

Politician

Elected officials, government appointees, diplomats, political operatives

Business

Executives, entrepreneurs, financial professionals, hedge fund managers

Royalty

Members of royal families and aristocratic lineages

Celebrity

Entertainers, actors, musicians, media personalities

Associate

Known associates, staff members, personal contacts, recruiters

Legal

Attorneys, judges, prosecutors, law enforcement

Academic

Professors, researchers, university administrators, scientists

Socialite

High-society figures, philanthropists

Military / Intelligence

Military officers, intelligence operatives, national security officials

Other

Persons not fitting standard categories

Trust Levels

Each person is assigned a trust level based on their status in legal proceedings. These levels help users understand the context of each individual's inclusion in the database.

Convicted

Found guilty in a court of law for offenses related to the Epstein case. Conviction is a matter of public record.

Charged

Formally charged or indicted by a grand jury. Case may be pending, resolved through plea, or dismissed. Charging documents are public record.

Alleged

Named in civil lawsuits, victim allegations, or sworn testimony but not criminally charged. Allegations are claims, not findings of fact.

Appearance in Records

Appears in flight logs, contact books, financial records, or other documentary evidence. This does not imply wrongdoing or knowledge of criminal activity.

Mentioned

Referenced in documents, depositions, media reporting, or secondhand accounts. Included for research completeness. Mention alone carries no implication.

Evidence Attribution

Source badges appear on person profiles and search results to indicate what types of primary evidence document their connection to the case. Badges are assigned algorithmically based on verified document linkages.

Flight Logs

Person appears on one or more aircraft manifests as a passenger, pilot, or crew member.

Black Book

Person’s contact information is recorded in Epstein’s personal directory (Court Exhibit 80).

Court Filing

Person is named in one or more court documents, including filings, exhibits, or transcripts.

DOJ Docs

Person is referenced in Department of Justice materials, including EFTA releases.

Financial

Person or associated entity appears in financial flow records, wire transfers, or bank compliance documents.

Mention Only

Person is referenced in reporting, secondhand accounts, or unsworn statements only. No primary-source documentary evidence.
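
Badge assignment from verified document links can be sketched as a lookup. The source-type labels on the left of BADGE_BY_SOURCE and the function name assign_badges are hypothetical; the badge names are the ones listed above, with Mention Only as the fallback when no primary-source link exists.

```python
# Hypothetical source-type labels for linked documents, mapped to badge text.
BADGE_BY_SOURCE = {
    "flight_manifest": "Flight Logs",
    "black_book": "Black Book",
    "court_filing": "Court Filing",
    "doj_efta": "DOJ Docs",
    "financial_record": "Financial",
}


def assign_badges(linked_source_types):
    """Derive a person's badges from the source types of their document links."""
    badges = sorted({BADGE_BY_SOURCE[s] for s in linked_source_types if s in BADGE_BY_SOURCE})
    return badges or ["Mention Only"]  # no primary-source evidence found
```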

Connection Methodology

Connections between persons are derived algorithmically from co-appearances in documentary evidence. The system counts shared flights, co-mentions in documents, and direct references in depositions to establish relationship strength.

Strong

Direct documentary evidence of a substantive relationship. Multiple independent sources confirm the connection.

Threshold: 3+ shared documents, co-defendant status, family, employer/employee, or direct victim/perpetrator relationship

Moderate

Circumstantial evidence of contact or proximity. Connection documented but nature of relationship may be unclear.

Threshold: Shared flights, phone contacts in black book, 1–2 shared documents, or mentioned together in depositions

Weak

Tangential or single-source connections. Included for completeness but should not be relied upon for conclusions.

Threshold: Same social circles, one-time mentions, single-source claims, or disputed/unconfirmed links

Important: A connection between two persons does not imply a personal relationship, shared culpability, or knowledge of criminal activity. Connections are documentary co-appearances only.
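
The published thresholds can be read as a small decision rule. This sketch assumes per-pair evidence has already been counted, and the PairEvidence fields and connection_strength are hypothetical names. Where the thresholds overlap (a single shared document could be read as Moderate or as a one-time mention), the sketch follows the Moderate rule for 1 to 2 shared documents.

```python
from dataclasses import dataclass


@dataclass
class PairEvidence:
    """Counted co-appearances for one pair of persons (illustrative fields)."""
    shared_documents: int = 0
    shared_flights: int = 0
    black_book_contact: bool = False
    comentioned_in_deposition: bool = False
    # Co-defendant, family, employer/employee, or victim/perpetrator.
    direct_relationship: bool = False


def connection_strength(ev: PairEvidence) -> str:
    """Map counted evidence to Strong / Moderate / Weak per the thresholds above."""
    if ev.direct_relationship or ev.shared_documents >= 3:
        return "Strong"
    if (ev.shared_flights > 0 or ev.black_book_contact
            or 1 <= ev.shared_documents <= 2 or ev.comentioned_in_deposition):
        return "Moderate"
    return "Weak"
```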

Victim Privacy & PII Redaction

Protecting victim and survivor privacy is a core obligation of this project. We proactively redact personally identifiable information and maintain a confidential process for additional removal requests.

Proactive Redaction Program

PII redactions: 16,924
Documents: 1,567
Phases: 3

Phase 1: 100 FBI FD-302 interview memos — 624 redactions of victim SSNs, dates of birth, and contact information.

Phase 2: 927 critical documents — 13,495 redactions including victim names in ALL CAPS context, addresses, and phone numbers.

Phase 3: 539 SSN-classified documents — 2,726 redactions with 7,765 SSN-like patterns analyzed and 6,968 false positives correctly excluded.
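
One structural filter that can separate real SSNs from look-alike numbers uses the Social Security Administration's issuance rules: area numbers 000, 666, and 900 to 999, group 00, and serial 0000 are never assigned. This is a sketch under that assumption only; it is not presented as the method behind the Phase 3 figures, it covers only the full NNN-NN-NNNN form, and real false positives (case or Bates numbers, for instance) would also need contextual checks.

```python
import re

SSN_LIKE = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def is_plausible_ssn(area, group, serial):
    """Reject numbers SSA never issues: area 000, 666, 900-999; group 00; serial 0000."""
    a = int(area)
    return a != 0 and a != 666 and a < 900 and group != "00" and serial != "0000"


def find_ssns(text):
    """Return SSN-like matches that pass the structural validity check."""
    return [m.group(0) for m in SSN_LIKE.finditer(text) if is_plausible_ssn(*m.groups())]
```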

What We Redact

  • Social Security numbers (full and partial)
  • Dates of birth for victims and minors
  • Phone numbers in victim-related context
  • Home addresses in victim-related context
  • Victim names (excluding 80+ public figures)

Request Removal

Victims and survivors can request additional information removal through our confidential portal. Attorney-coordinated requests are accepted via a fillable PDF form.

Document Integrity

Every document in the EFTA releases is cryptographically verified to ensure it has not been modified after publication. We maintain an independent record of the original releases.

Algorithm: SHA-256
Hashes stored: 1.38M
Monitoring: Continuous

Our integrity monitoring system has detected anomalies in DOJ Dataset 9 (DS9): 866 document deletions and 401 modifications post-publication. These anomalies are publicly documented on our integrity dashboard.


Editorial Standards

No editorializing. Epstein Exposed is a research and transparency tool. We do not draw conclusions about any individual's guilt or innocence. We present documentary evidence and allow users to evaluate it independently.

Source attribution. Every fact is attributed to its source document wherever possible. Derived summaries are treated as research aids, linked to the source files from which they derive, and should be verified against the underlying record.

Algorithmic connections. Connections between individuals are derived from documentary co-appearances, not editorial judgment. The system uses multi-word name matching with word boundary enforcement. Single-word names are never auto-linked.

Community review. Crowdsourced verification through our community forum. Members can flag errors, suggest corrections, and contribute research under moderated conditions.

Error correction. If you believe any information is inaccurate, please contact us at contact@epsteinexposed.com so we can review and correct the record. Corrections are logged in our public changelog.
