Two Types of PDFs — Very Different Solutions

Before trying to extract text, identify which type of PDF you have:

Type 1: Searchable PDF (has a text layer)

Created from Word, Excel, InDesign, or any software that exports to PDF directly. You can select and copy text, and Ctrl+F finds words. To extract all text from this type:

  1. Go to rifixpdf.xyz/pdf2txt
  2. Open your PDF
  3. Click Extract Text
  4. All text from all pages appears in the panel
  5. Copy to clipboard or download as a .txt file

Type 2: Scanned PDF (image only)

Created by scanning paper, photographing a document, or saving screenshots as PDF. Text is not selectable — the whole page is an image. To extract text from this type, you need OCR:

  1. Go to rifixpdf.xyz/ocr
  2. Open your scanned PDF
  3. Select the document language
  4. Click Run OCR
  5. Wait for Tesseract.js to process each page
  6. Copy or download the extracted text

💡 How to Tell Which Type You Have

Open your PDF and try to select a word by clicking and dragging. If the text highlights in blue, it's Type 1 (searchable). If no text highlights, or the whole page selects as a single image, it's Type 2 (scanned). Use pdf2txt for Type 1 and OCR for Type 2.
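
The same check can be roughed out in code. The sketch below is only a heuristic and not how Rifix itself decides: it assumes you have already pulled whatever text each page yields (with any PDF library) into a list of strings. The names guess_pdf_type and pick_tool and the 20-character threshold are illustrative assumptions.

```python
def guess_pdf_type(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is searchable (Type 1) or scanned (Type 2).

    page_texts: one string per page, as returned by whatever text
    extractor you use. Scanned pages usually yield empty strings.
    min_chars_per_page: hypothetical threshold; tune it for your documents.
    """
    if not page_texts:
        return "scanned"
    # Count only non-whitespace characters so blank lines do not count as text.
    counts = [len("".join(text.split())) for text in page_texts]
    pages_with_text = sum(1 for count in counts if count >= min_chars_per_page)
    # Call it searchable when at least half of the pages carry real text.
    return "searchable" if pages_with_text * 2 >= len(counts) else "scanned"


def pick_tool(pdf_type):
    """Map the guessed type to the tool named in this article."""
    return {"searchable": "pdf2txt", "scanned": "ocr"}[pdf_type]
```

A scanned PDF that has already been through OCR carries a text layer, so this heuristic would report it as searchable, just as the click-and-drag test would.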

What Affects OCR Accuracy?

OCR accuracy on scanned documents depends on several factors:

Extracting Text from Specific Pages

For a long document where you only need text from certain pages:

  1. Use Split PDF to extract the pages you need into a separate file
  2. Then run that smaller file through PDF to Text or OCR

This is also faster — OCR on a 3-page extract is much quicker than processing a 100-page document.
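
If you script this step yourself, the first thing you usually need is a way to turn a page selection such as "2-4, 9" into concrete page numbers. The helper below is a hypothetical sketch, not part of Split PDF: the name parse_page_selection and the input format are assumptions chosen for illustration.

```python
def parse_page_selection(spec, total_pages):
    """Turn a selection like "2-4, 9" into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)
```

For example, parse_page_selection("2-4, 9", 100) yields pages 2, 3, 4 and 9, so only four pages would go through OCR instead of one hundred.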

What You Can Do with Extracted Text

Once you have the raw text, common uses include:

Extracting Text for CSV/Excel

If your PDF contains tables (financial statements, reports, data exports), use PDF to CSV instead of plain text extraction. It attempts to preserve the column and row structure, giving you a spreadsheet-ready output rather than a continuous block of text.
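
To see why plain text extraction falls short for tables, consider what it takes to rebuild columns afterwards. The sketch below is a naive illustration, not how PDF to CSV works: it assumes columns in the extracted text are separated by two or more spaces (or a tab), which real PDFs often violate. The function name text_table_to_csv is hypothetical.

```python
import csv
import io
import re


def text_table_to_csv(text):
    """Convert whitespace-aligned text lines into CSV.

    Assumes columns are separated by two or more whitespace characters
    or a tab; single spaces are treated as part of a cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        writer.writerow(re.split(r"\s{2,}|\t", line))
    return buffer.getvalue()
```

An empty cell leaves no trace in whitespace-aligned text, so this approach silently shifts later columns to the left. That is the kind of problem a dedicated table extractor tries to avoid.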

PDF Type                      Correct Tool   Output
Searchable PDF (text layer)   PDF to Text    .txt file, instant
Scanned PDF (image only)      OCR Scan       .txt file, takes 1–2 min
PDF with tables/data          PDF to CSV     .csv file for Excel

Nowsath Rifaya · Founder, Rifix PDF Editor
Operations professional based in Singapore. Built Rifix to solve a real work problem — handling confidential PDF documents without uploading them to unknown servers. Writes from direct experience using these tools daily.
