Case File
kaggle-ho-017014 | House Oversight

Technical assessment of metadata and OCR quality in Google Books corpus

The document details internal quality metrics and filtering thresholds for Google Books metadata and OCR. It contains no references to influential actors, financial flows, or misconduct, and offers no actionable investigative leads. Key insights: metadata date errors fell from 27% to 6.2% after filtering; OCR quality scores (0-100) are assigned per volume using a PPM-based model; OCR quality thresholds differ by language (e.g., 80% for Latin alphabets).
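As a rough illustration of the per-language thresholding mentioned in the summary, the sketch below filters volumes by their 0-100 OCR quality score. Only the figure of 80 for Latin-alphabet languages comes from the summary; the `Volume` record, the language codes, the fallback threshold, and the choice of an inclusive comparison are all assumptions made for this example, not details of the underlying document.

```python
from dataclasses import dataclass

# The 80 cutoff for Latin-alphabet languages is the only value taken from the
# summary. The language codes listed here are illustrative assumptions.
LATIN_THRESHOLD = 80
OCR_THRESHOLDS = {"eng": LATIN_THRESHOLD, "fre": LATIN_THRESHOLD, "spa": LATIN_THRESHOLD}

# Hypothetical fallback for languages not listed above (not from the source).
HYPOTHETICAL_DEFAULT_THRESHOLD = 80


@dataclass
class Volume:
    """Hypothetical record for one scanned volume."""
    volume_id: str
    language: str
    ocr_quality: int  # per-volume score on the 0-100 scale


def passes_ocr_filter(volume, thresholds=OCR_THRESHOLDS,
                      default=HYPOTHETICAL_DEFAULT_THRESHOLD):
    """Return True if the volume's score meets its language's threshold.

    The inclusive comparison (>=) is an assumption.
    """
    if not 0 <= volume.ocr_quality <= 100:
        raise ValueError(f"score out of range for {volume.volume_id}")
    return volume.ocr_quality >= thresholds.get(volume.language, default)


def filter_volumes(volumes):
    """Keep only the volumes that pass the OCR quality filter."""
    return [v for v in volumes if passes_ocr_filter(v)]


if __name__ == "__main__":
    sample = [
        Volume("a", "eng", 92),
        Volume("b", "eng", 61),
        Volume("c", "fre", 80),
    ]
    for v in filter_volumes(sample):
        print(v.volume_id, v.ocr_quality)
```

Keeping the thresholds in a plain dictionary makes it easy to substitute the actual per-language values if they are known.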

Date
Unknown
Source
House Oversight
Reference
kaggle-ho-017014
Pages
1
Persons
5
Integrity
No Hash Available