PDF Text Extractor
Extract text from scanned PDFs using on-device OCR. Supports multiple languages. Your files never leave your device.
Languages
Language data is downloaded on first use and cached by your browser.
PDF files
The downloadable text PDF uses a basic Latin font, so non-Latin characters will appear as ?. Use Copy all or Download .md to preserve the full text.
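The substitution described above can be illustrated with a minimal TypeScript sketch. The helper name `toBasicLatin` is hypothetical, and the character range it keeps (printable ASCII plus Latin-1) is an assumption about what a "basic Latin font" covers, not a description of the tool's actual code.

```typescript
// Hypothetical helper: keep printable ASCII, Latin-1 characters, and common
// whitespace; replace every other code point with "?".
// The exact set of characters the PDF font supports is an assumption.
function toBasicLatin(text: string): string {
  let out = "";
  // Array.from splits the string by code point, so a character outside the
  // Basic Multilingual Plane (such as an emoji) becomes a single "?".
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    const isWhitespace = ch === "\n" || ch === "\r" || ch === "\t";
    const isPrintableAscii = cp >= 0x20 && cp <= 0x7e;
    const isLatin1 = cp >= 0xa0 && cp <= 0xff;
    out += isWhitespace || isPrintableAscii || isLatin1 ? ch : "?";
  }
  return out;
}
```

Under these assumptions, accented Western European text survives unchanged, while Russian, Chinese, or Japanese text is replaced character by character, which is why the copy and Markdown outputs are the safer choice for those languages.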
Powered by Tesseract OCR via Tesseract.js (Apache 2.0) · PDF.js (Apache 2.0) · pdf-lib (MIT).
How it works
1. Select the language(s) of text in your PDF, then upload the file.
2. OCR runs locally in your browser using Tesseract.js and WebAssembly; no data leaves your device.
3. Copy the extracted text, or download it as a clean PDF or Markdown file.
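As an illustration of the Markdown download in the last step, here is a minimal TypeScript sketch that assembles per-page text into one Markdown string. The function name `toMarkdown` and the layout (a title heading followed by one `## Page N` section per page) are assumptions made for this example; the tool's real output format may differ.

```typescript
// Hypothetical helper: build a Markdown document from per-page text.
// The heading layout is an assumption, not the tool's documented format.
function toMarkdown(title: string, pages: string[]): string {
  // One second-level heading per page, numbered from 1.
  const sections = pages.map(
    (text, i) => `## Page ${i + 1}\n\n${text.trim()}`
  );
  // Blank lines between blocks, and a single trailing newline.
  return [`# ${title}`, ...sections].join("\n\n") + "\n";
}
```

In a browser, the resulting string could be wrapped in a `Blob` with type `text/markdown` and offered as a download, which keeps the whole export on the device.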
Frequently asked questions
Which languages are supported?
English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese (Simplified), and Japanese.
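For readers scripting Tesseract themselves, the sketch below maps the language names above to the standard Tesseract traineddata codes and joins a selection into the `+`-separated form that Tesseract accepts (for example `eng+fra`). The names `LANGUAGE_CODES` and `toTesseractLangs` are hypothetical, and whether this tool builds its language string the same way is an assumption.

```typescript
// Standard Tesseract traineddata codes for the languages listed above.
const LANGUAGE_CODES: Record<string, string> = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Italian: "ita",
  Portuguese: "por",
  Dutch: "nld",
  Russian: "rus",
  "Chinese (Simplified)": "chi_sim",
  Japanese: "jpn",
};

// Hypothetical helper: convert selected language names into a
// "+"-joined code string such as "eng+fra".
function toTesseractLangs(selected: string[]): string {
  const codes = selected.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (code === undefined) {
      throw new Error(`Unsupported language: ${name}`);
    }
    return code;
  });
  // Drop duplicates while keeping the order the user chose.
  return Array.from(new Set(codes)).join("+");
}
```

Selecting only the languages that actually appear in the document keeps the first-use download small, since each language has its own data file.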
Does OCR work on non-scanned PDFs?
The tool first attempts to extract embedded text directly. OCR is only applied to pages that contain scanned or image-based content without selectable text.
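The per-page decision described above can be sketched in TypeScript as follows. The `PageText` shape, the function name `pagesNeedingOcr`, and the `minChars` threshold are all hypothetical; the tool's real criterion for "no selectable text" is not documented here.

```typescript
// Hypothetical shape: the text layer already extracted from one PDF page.
interface PageText {
  pageNumber: number;
  embeddedText: string;
}

// Return the numbers of pages whose embedded text is too short to be useful,
// meaning they should be rendered to an image and passed to OCR instead.
// minChars is an assumed threshold counted in non-whitespace characters.
function pagesNeedingOcr(pages: PageText[], minChars = 1): number[] {
  return pages
    .filter((p) => p.embeddedText.replace(/\s+/g, "").length < minChars)
    .map((p) => p.pageNumber);
}
```

Pages that already carry a text layer skip OCR entirely, which is both faster and more accurate than recognising rendered glyphs. Raising `minChars` would also send pages with only a stray page number or watermark to OCR.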
Is my data private?
Yes. OCR runs entirely in your browser using WebAssembly. No files or extracted text are sent to any server at any point.
How accurate is the OCR?
Accuracy depends on scan quality. Clean, high-resolution scans typically yield very good results. Handwriting, unusual fonts, or low-quality scans may reduce accuracy.