PDF Text Extractor

Extract text from scanned PDFs using on-device OCR. Supports multiple languages. Your files never leave your device.

Languages

Language data is downloaded on first use and cached by your browser.

Powered by Tesseract OCR via Tesseract.js (Apache 2.0) · PDF.js (Apache 2.0) · pdf-lib (MIT).

How it works

  1. Select the language(s) of text in your PDF, then upload the file.

  2. OCR runs locally in your browser using Tesseract.js and WebAssembly; no data leaves your device.

  3. Copy the extracted text, or download it as a clean PDF or Markdown file.
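As a rough illustration of the last step, the sketch below joins per-page text into a single Markdown string. The function name `pagesToMarkdown` and the "## Page N" heading layout are assumptions made for this example; the page does not describe the tool's actual Markdown output.

```typescript
// Hypothetical helper: combine the text recognized on each page into Markdown.
// The "## Page N" heading layout is an assumption, not the tool's documented format.
function pagesToMarkdown(pageTexts: string[]): string {
  return pageTexts
    // Trim stray whitespace around each page and give it a numbered heading.
    .map((text, index) => `## Page ${index + 1}\n\n${text.trim()}`)
    // Separate pages with a blank line.
    .join("\n\n");
}
```

The resulting string could then be copied to the clipboard or offered as a file download.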

Frequently asked questions

Which languages are supported?

English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese (Simplified), and Japanese.
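Tesseract identifies each language by a short traineddata code and accepts several languages joined with "+". The sketch below maps the languages listed above to their standard Tesseract codes; the names `LANGUAGE_CODES` and `toTesseractLangs` are hypothetical, and how the tool itself passes languages to Tesseract.js is not stated on this page.

```typescript
// Standard Tesseract traineddata codes for the languages listed above.
// The constant and function names are hypothetical, chosen for this example.
const LANGUAGE_CODES: Record<string, string> = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Italian: "ita",
  Portuguese: "por",
  Dutch: "nld",
  Russian: "rus",
  "Chinese (Simplified)": "chi_sim",
  Japanese: "jpn",
};

// Convert selected language names into a Tesseract language string, e.g. "eng+deu".
function toTesseractLangs(selected: string[]): string {
  const codes = selected.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (code === undefined) {
      throw new Error(`Unsupported language: ${name}`);
    }
    return code;
  });
  // Drop duplicates while preserving the order of selection.
  return codes.filter((code, i) => codes.indexOf(code) === i).join("+");
}
```

For example, selecting English and German would produce the string "eng+deu".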

Does OCR work on non-scanned PDFs?

Non-scanned PDFs are handled as well. The tool first attempts to extract embedded text directly, and OCR is applied only to pages that contain scanned or image-based content without selectable text.
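The per-page decision described here can be sketched as a small filter. This is a minimal sketch under assumptions: the `PageText` shape, the `pagesNeedingOcr` name, and the `minChars` threshold are all hypothetical, and the tool's real rule for detecting "no selectable text" is not given on this page.

```typescript
// Hypothetical record of the embedded text found on one page
// (for instance, text items from a PDF parser joined into one string).
interface PageText {
  pageNumber: number;
  embeddedText: string;
}

// Return the page numbers that should fall back to OCR.
// A page qualifies when its embedded text, ignoring whitespace,
// has fewer than `minChars` characters. The threshold is an assumption.
function pagesNeedingOcr(pages: PageText[], minChars: number = 1): number[] {
  return pages
    .filter((page) => page.embeddedText.replace(/\s+/g, "").length < minChars)
    .map((page) => page.pageNumber);
}
```

Pages not returned by this function would keep their embedded text as is, so OCR time is spent only where it is needed.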

Is my data private?

Yes. OCR runs entirely in your browser using WebAssembly. No files or extracted text are sent to any server at any point.

How accurate is the OCR?

Accuracy depends on scan quality. Clean, high-resolution scans typically yield very good results. Handwriting, unusual fonts, or low-quality scans may reduce accuracy.