How to extract data from a scanned PDF (even unreadable ones)
A scanned PDF is just an image — you can't copy-paste it. Here's how to extract data using OCR and AI, even from old documents.
You try to copy text from a PDF and... nothing. The cursor slides over the document without selecting anything. It's a scanned PDF: not text, just an image. Result: impossible to search, copy, or analyze. Here's how to get around that.
Why some PDFs are "locked"
There are two types of PDFs:
Native PDF: created directly from Word, Excel, or other software. The text is encoded, searchable, and copyable.
Scanned PDF: a photo of a paper document. There's no encoded text, just pixels. You can't select anything.
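If you have many files to sort, the same "can I select text?" test can be approximated in code: try to extract text from each page and see whether anything comes back. The sketch below is illustrative only. It assumes the third-party pypdf library for reading the file, and the helper names and the 20-character threshold are arbitrary choices, not a standard.

```python
def looks_scanned(page_texts, min_chars=20):
    """Return True if most pages carry almost no extractable text.

    page_texts: one string per page, as returned by a PDF text extractor.
    min_chars: illustrative threshold; tune it for your own documents.
    """
    if not page_texts:
        return True  # nothing extractable at all
    nearly_empty = sum(
        1 for text in page_texts if len((text or "").strip()) < min_chars
    )
    return nearly_empty / len(page_texts) > 0.5


def page_texts_from_pdf(path):
    """Read per-page text with pypdf (assumed installed: pip install pypdf)."""
    from pypdf import PdfReader  # lazy import: looks_scanned itself has no dependency

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Usage (hypothetical file name):
# print(looks_scanned(page_texts_from_pdf("contract.pdf")))
```

Note that a scan which has already been through OCR carries a hidden text layer, so this check would report it as a native PDF.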
Scanned PDFs are common for:
- Old digitized contracts
- Notarial deeds
- Scanned supplier invoices
- Old administrative files
- Documents sent by mail and then digitized
What OCR is and how it works
OCR (Optical Character Recognition) is the technology that converts an image of text into real text. It "reads" the pixels of the scan like a human would read a page, then generates plain text from what it sees.
Modern OCR engines achieve 95-99% character accuracy on well-scanned documents. On old, damaged, or handwritten documents, accuracy can drop considerably.
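To make an accuracy percentage concrete, here is one common way to measure it at the character level: count the edits needed to turn the OCR output into the correct text, then divide by the length of the correct text. This is a minimal sketch. The function names are illustrative, and OCR vendors may compute their published figures differently.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute (or keep) a character
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Share of reference characters the OCR got right, between 0.0 and 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1 - errors / len(reference))


reference = "Total: 1,280.00 EUR"
ocr_output = "Total: 1,200.00 EUR"  # a single "8" misread as "0"
print(f"{char_accuracy(reference, ocr_output):.1%}")
```

A single misread digit in this 19-character line still scores about 95%, which is why a high accuracy figure does not guarantee that the numbers are right.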
Options for extracting data from a scanned PDF
Option 1: Adobe Acrobat (paid)
Adobe offers a "Recognize Text" function in Acrobat Pro. It's effective, but costs ~$25/month and requires installing heavy software.
Option 2: Google Drive (free, limited)
Upload the PDF to Google Drive → right-click it → "Open with" → "Google Docs". Google applies OCR and generates an editable text document. This works on simple documents, but less well on tables or complex layouts.
Option 3: Specialized AI tools
Tools like PDFFocus combine OCR + AI: they recognize text from the scan AND let you ask questions about the document directly. This is the most powerful method because:
- OCR is applied automatically without any action from you
- You don't need to copy-paste recognized text — you can query directly
- Tables and structures are better preserved
- Multi-page PDFs are handled
How to get the best OCR results
Result quality depends heavily on scan quality. Key factors:
Resolution: minimum 300 DPI for decent OCR. Below that, characters are too blurry.
Orientation: a skewed document gives degraded results. Most tools automatically correct tilt.
Contrast: a scan that is too faint or too dark reduces accuracy. Adjust the brightness and contrast settings before scanning if possible.
Font: standard fonts (Times, Arial, Helvetica) are well recognized. Handwritten or decorative fonts much less so.
Language: specify the document language if the tool allows it — this improves accuracy.
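If you only have the image and don't know the scanner settings, the resolution can be estimated from the pixel dimensions and the paper size. The sketch below shows the arithmetic, assuming the scan covers the full page. The function names and the page-size constants are illustrative.

```python
A4_INCHES = (8.27, 11.69)     # 210 x 297 mm
LETTER_INCHES = (8.5, 11.0)


def effective_dpi(width_px, height_px, page_width_in, page_height_in):
    """Dots per inch of a full-page scan, taking the weaker of the two axes."""
    return min(width_px / page_width_in, height_px / page_height_in)


def ok_for_ocr(width_px, height_px, page_size_in=A4_INCHES, min_dpi=300):
    """True if the scan reaches the minimum resolution (rounded to whole DPI)."""
    dpi = effective_dpi(width_px, height_px, *page_size_in)
    return round(dpi) >= min_dpi


print(ok_for_ocr(2480, 3508))  # A4 page scanned at about 300 DPI
print(ok_for_ocr(1240, 1754))  # same page at about 150 DPI
```

The result is rounded because paper sizes in inches are themselves rounded values.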
What you can do once the text is extracted
Once OCR is applied, you can:
- Search the document (Ctrl+F)
- Copy excerpts
- Ask questions via AI about the content
- Extract specific data (amounts, names, dates)
- Compare multiple documents
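For the "extract specific data" case, even a few lines of pattern matching can pull fields out of OCR text. The sketch below is a simplified illustration: it assumes dates written as YYYY-MM-DD and amounts written in the 1,234.56 style, which will not hold for every document. The sample text and the function name are made up.

```python
import re

# Assumed formats: ISO dates and amounts with "," thousands and "." decimals.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT_RE = re.compile(r"\b\d{1,3}(?:,\d{3})*\.\d{2}\b")


def extract_fields(text):
    """Return every date and amount found in a block of OCR text."""
    return {
        "dates": DATE_RE.findall(text),
        "amounts": AMOUNT_RE.findall(text),
    }


sample = """Invoice date: 2024-03-14
Subtotal 1,200.00
Tax 240.00
Total 1,440.00"""

print(extract_fields(sample))
```

Patterns like these break easily when OCR misreads a character, so treat the output as a draft to be checked against the document.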
Practical example: you have 30 scanned supplier invoices and want the total for all March invoices. Upload them, ask the AI to calculate — that takes 2 minutes instead of 1 hour.
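Whatever tool produces the answer, the calculation itself is simple once each invoice has been reduced to a date and a total, which makes it easy to double-check. The sketch below is not how any particular tool works internally. The records are invented sample data and the function name is illustrative.

```python
from datetime import date
from decimal import Decimal


def monthly_total(invoices, year, month):
    """Sum the totals of all invoices dated in the given month."""
    return sum(
        (
            inv["total"]
            for inv in invoices
            if inv["date"].year == year and inv["date"].month == month
        ),
        Decimal("0"),
    )


# Invented sample records, as they might look after extraction.
invoices = [
    {"supplier": "Supplier A", "date": date(2024, 2, 27), "total": Decimal("310.00")},
    {"supplier": "Supplier B", "date": date(2024, 3, 4), "total": Decimal("1440.00")},
    {"supplier": "Supplier A", "date": date(2024, 3, 19), "total": Decimal("250.50")},
]

print(monthly_total(invoices, 2024, 3))
```

Decimal is used instead of float so that currency amounts add up exactly.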
OCR limitations to know
- Handwritten documents: standard OCR often fails. Specialized solutions exist but are less accessible.
- Complex tables: columns can get mixed up. Always verify extracted data.
- Protected PDFs: password-protected or encrypted PDFs may have to be unlocked before a tool can open them and run OCR.
- Very old documents: faded ink, yellowed paper = less reliable OCR.
Always verify extracted numbers on financial documents. OCR can confuse "8" and "0", and the meaning of commas and periods depends on the country: 1,234.56 in the US is written 1.234,56 in much of Europe.
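Two cheap checks catch many of these errors: parse amounts with the separator convention stated explicitly, and confirm that the line items add up to the stated total. The sketch below is a minimal illustration. The function names are made up, and it assumes you know which convention the document uses.

```python
from decimal import Decimal


def parse_amount(text, decimal_sep="."):
    """Parse '1,234.56' (decimal_sep='.') or '1.234,56' (decimal_sep=',')."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = (
        text.strip()
        .replace(" ", "")
        .replace(thousands_sep, "")
        .replace(decimal_sep, ".")
    )
    return Decimal(cleaned)


def totals_match(line_items, stated_total, decimal_sep="."):
    """True if the line items sum exactly to the total printed on the document."""
    computed = sum(
        (parse_amount(item, decimal_sep) for item in line_items),
        Decimal("0"),
    )
    return computed == parse_amount(stated_total, decimal_sep)


# The same characters mean different amounts under different conventions.
print(parse_amount("1.234", decimal_sep="."))  # one point two three four
print(parse_amount("1.234", decimal_sep=","))  # one thousand two hundred thirty-four

# A "0" misread as "8" in the total is caught by the cross-check.
print(totals_match(["1,200.00", "240.00"], "1,440.00"))
print(totals_match(["1,200.00", "240.00"], "1,448.00"))
```

A failed cross-check does not say which figure is wrong, only that the page needs a manual look.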
Key takeaways
- If you can't select text = scanned PDF = OCR needed
- Google Drive works for simple, short documents
- For complex or professional documents, a dedicated OCR + AI tool gives much better results
- Always verify extracted data, especially numbers
If you have scanned PDFs piling up in your folders and regularly need to extract information from them, that's exactly the use case PDFFocus was designed for.