Two Types of PDFs — Very Different Solutions
Before trying to extract text, identify which type of PDF you have:
Type 1: Searchable PDF (has a text layer)
Created from Word, Excel, InDesign, or any software that exports to PDF directly. You can select and copy text, and Ctrl+F finds words. To extract all text from this type:
- Go to rifixpdf.xyz/pdf2txt
- Open your PDF
- Click Extract Text
- All text from all pages appears in the panel
- Copy to clipboard or download as a .txt file
Type 2: Scanned PDF (image only)
Created by scanning paper, photographing a document, or saving screenshots as PDF. Text is not selectable because the whole page is an image. To extract text from this type, you need OCR (optical character recognition):
- Go to rifixpdf.xyz/ocr
- Open your scanned PDF
- Select the document language
- Click Run OCR
- Wait for Tesseract.js to process each page
- Copy or download the extracted text
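If you would rather script the OCR step yourself, the flow above can be approximated in Python. This is only a sketch under stated assumptions, not how Rifix works internally: it assumes the third-party `pdf2image` and `pytesseract` packages (plus the Tesseract and Poppler binaries) are installed, and the helper names `tesseract_lang` and `ocr_pdf` are made up for illustration. The language codes are the standard Tesseract ones for the languages Rifix OCR supports; `chi_sim` (Simplified Chinese) is an assumption, since the article does not say which Chinese script is meant.

```python
# Tesseract language codes for the languages Rifix OCR is described as supporting.
# "chi_sim" is an assumption; use "chi_tra" for Traditional Chinese.
TESSERACT_CODES = {
    "english": "eng",
    "malay": "msa",
    "tamil": "tam",
    "chinese": "chi_sim",
    "arabic": "ara",
    "hindi": "hin",
    "japanese": "jpn",
    "korean": "kor",
    "french": "fra",
    "german": "deu",
    "spanish": "spa",
}


def tesseract_lang(*names: str) -> str:
    """Build a Tesseract `lang` string such as 'eng+msa' from language names."""
    if not names:
        raise ValueError("at least one language is required")
    codes = []
    for name in names:
        key = name.strip().lower()
        if key not in TESSERACT_CODES:
            raise ValueError(f"unsupported language: {name!r}")
        codes.append(TESSERACT_CODES[key])
    return "+".join(codes)


def ocr_pdf(pdf_path: str, *languages: str, dpi: int = 300) -> list[str]:
    """Return one OCR'd text string per page (hypothetical helper)."""
    # Imported lazily so tesseract_lang() works without these packages installed.
    from pdf2image import convert_from_path
    import pytesseract

    lang = tesseract_lang(*languages)
    pages = convert_from_path(pdf_path, dpi=dpi)  # render each page to an image
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]
```

For a bilingual document you could call `ocr_pdf("scan.pdf", "English", "Malay")`, which passes `eng+msa` to Tesseract so both language models are used.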
To tell the two types apart, open your PDF and try to select a word by clicking and dragging. If the text highlights, it's Type 1 (searchable). If nothing highlights, or the whole page selects as a single image, it's Type 2 (scanned). Use PDF to Text (pdf2txt) for Type 1 and OCR for Type 2.
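The same check can be automated when you have many files to sort. The sketch below is one possible approach, assuming the third-party `pypdf` package; the function names and the `min_chars` threshold are illustrative choices, not part of any Rifix tool. A page counts as having a text layer if a reasonable amount of text can be extracted from it.

```python
def classify_pdf(page_texts: list[str], min_chars: int = 20) -> str:
    """Classify a PDF from the text extracted from each of its pages.

    Returns "searchable" if every page has a text layer, "scanned" if none
    does, and "mixed" otherwise. `min_chars` is an assumed threshold that
    ignores stray characters such as a lone page number.
    """
    if not page_texts:
        raise ValueError("the document has no pages")
    with_text = sum(1 for text in page_texts if len(text.strip()) >= min_chars)
    if with_text == len(page_texts):
        return "searchable"
    if with_text == 0:
        return "scanned"
    return "mixed"


def extract_page_texts(pdf_path: str) -> list[str]:
    """Read the text layer of each page with pypdf (empty string if none)."""
    from pypdf import PdfReader  # imported lazily; classify_pdf() needs no packages

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

A "mixed" result usually means some pages were scanned and inserted into an otherwise digital document, so those pages still need OCR.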
What Affects OCR Accuracy?
OCR accuracy on scanned documents depends on several factors:
- Scan quality — 300 DPI produces reliable results; 150 DPI is marginal; below 100 DPI often fails. Phone camera photos of documents work reasonably well in good light.
- Page straightness — skewed or warped pages reduce accuracy significantly. Use Scan Clean to straighten pages before OCR.
- Contrast — faded ink, pencil writing, or yellowed paper reduces accuracy. High contrast black text on white is ideal.
- Language selection — Rifix OCR supports English, Malay, Tamil, Chinese, Arabic, Hindi, Japanese, Korean, French, German, and Spanish. Always select the correct language.
- Font type — standard printed fonts achieve 95%+ accuracy. Decorative or handwritten text is less reliable.
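To check scan quality before running OCR, you can estimate the resolution from the image's pixel size and the physical page size. The sketch below assumes an A4 page by default and turns the DPI guidance above into three bands. Treating everything from 100 up to 300 DPI as "marginal" is an assumption, since the guidance only names 300, 150, and 100 DPI as reference points.

```python
def effective_dpi(width_px: int, height_px: int,
                  page_width_in: float = 8.27,
                  page_height_in: float = 11.69) -> float:
    """Estimate scan resolution from pixel size and page size (A4 by default)."""
    if min(width_px, height_px, page_width_in, page_height_in) <= 0:
        raise ValueError("dimensions must be positive")
    # Take the lower of the two axes so a cropped or stretched scan is not
    # overrated, and round to one decimal to absorb floating-point noise.
    return round(min(width_px / page_width_in, height_px / page_height_in), 1)


def ocr_readiness(dpi: float) -> str:
    """Map a resolution to a rough expectation of OCR quality."""
    if dpi >= 300:
        return "reliable"
    if dpi >= 100:  # assumed band; 150 DPI is described as marginal
        return "marginal"
    return "likely to fail"
```

For example, an A4 page scanned to 2481 × 3507 pixels works out to 300 DPI and lands in the "reliable" band, while the same page at 1240 × 1754 pixels is just under 150 DPI and is "marginal".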
Extracting Text from Specific Pages
For a long document where you only need text from certain pages:
- Use Split PDF to extract the pages you need into a separate file
- Then run that smaller file through PDF to Text or OCR
This is also faster — OCR on a 3-page extract is much quicker than processing a 100-page document.
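If you script this split-then-extract workflow, the fiddly part is usually turning a page selection such as "2-4,9" into actual pages. Here is a minimal sketch, assuming the third-party `pypdf` package for the split itself; `parse_page_ranges` and `extract_pages` are hypothetical helper names, and this is not how Split PDF is implemented.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Turn a spec like "2-4,9" into zero-based page indices, without duplicates."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        end = int(last) if last else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"invalid range {part!r} for a {page_count}-page document")
        for page in range(start, end + 1):
            if page - 1 not in indices:
                indices.append(page - 1)  # pages are 1-based, indices 0-based
    return indices


def extract_pages(src_path: str, dest_path: str, spec: str) -> None:
    """Copy the selected pages into a new, smaller PDF."""
    from pypdf import PdfReader, PdfWriter  # imported lazily

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dest_path, "wb") as handle:
        writer.write(handle)
```

Validating the range against the page count up front means a typo fails immediately instead of partway through a long document.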
What You Can Do with Extracted Text
Once you have the raw text, common uses include:
- Searching — paste into a text editor and use Ctrl+F to find specific information
- Editing — clean up the text and paste back into a Word document or new PDF
- Data extraction — pull out names, dates, invoice numbers, or other structured data
- Translation — paste into Google Translate or DeepL for a quick translation
- Accessibility — convert to plain text for screen readers or text-to-speech software
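As an illustration of the data extraction use, here is a small Python sketch that pulls dates and invoice numbers out of extracted text with regular expressions. The patterns are assumptions about what your documents look like (ISO or day/month/year dates, invoice numbers starting with "INV"), so expect to adjust them for real files.

```python
import re

# Assumed formats: 2024-03-15 or 15/04/2024.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b|\b\d{1,2}/\d{1,2}/\d{4}\b")
# Assumed format: "INV", an optional "-" or "#", then at least three digits.
INVOICE_RE = re.compile(r"\bINV[-#]?\d{3,}\b", re.IGNORECASE)


def extract_fields(text: str) -> dict[str, list[str]]:
    """Collect every date and invoice number found in the text, in order."""
    return {
        "dates": DATE_RE.findall(text),
        "invoice_numbers": INVOICE_RE.findall(text),
    }


sample = "Invoice INV-00123 issued 2024-03-15, due 15/04/2024. Ref inv#77812."
fields = extract_fields(sample)
print(fields["dates"])            # ['2024-03-15', '15/04/2024']
print(fields["invoice_numbers"])  # ['INV-00123', 'inv#77812']
```

OCR output can contain misread characters (for example "O" for "0"), so results from scanned documents are worth a manual check.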
Extracting Text for CSV/Excel
If your PDF contains tables (financial statements, reports, data exports), use PDF to CSV instead of plain text extraction. It attempts to preserve the column and row structure, giving you a spreadsheet-ready output rather than a continuous block of text.
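To see why plain text extraction loses table structure, and what a converter has to reconstruct, consider this Python sketch. It uses a simple heuristic, splitting each line wherever two or more spaces appear, and is not how PDF to CSV works internally. It only helps when the extracted text keeps visible gaps between columns.

```python
import csv
import io
import re


def lines_to_rows(text: str) -> list[list[str]]:
    """Split each non-empty line into cells at runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV text; cells containing commas are quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


table_text = """Item        Qty   Price
Blue widget   2    9.50

Shipping, express   1   4.00
"""
print(rows_to_csv(lines_to_rows(table_text)))
```

Single spaces inside a cell ("Blue widget") are preserved, but a table whose columns are separated by only one space would be split incorrectly, which is where a dedicated table tool earns its keep.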
| PDF Type | Correct Tool | Output |
|---|---|---|
| Searchable PDF (text layer) | PDF to Text | .txt file, instant |
| Scanned PDF (image only) | OCR Scan | .txt file, takes 1–2 min |
| PDF with tables/data | PDF to CSV | .csv file for Excel |
Extract text from your PDF now
Free, private, browser-based — your file never leaves your device.