How to Extract Text from a PDF

Extracting text from a PDF sounds simple, but there are two very different situations: PDFs that already have a text layer (where Ctrl+F works) and scanned PDFs that are purely images. Both can be handled for free in your browser, each with a different approach.

Two Types of PDFs — Very Different Solutions

Before trying to extract text, identify which type of PDF you have:

Type 1: Searchable PDF (has a text layer)

Created from Word, Excel, InDesign, or any software that exports to PDF directly. You can select and copy text, and Ctrl+F finds words. To extract all text from this type:

Go to rifixpdf.xyz/pdf2txt
Open your PDF
Click Extract Text
All text from all pages appears in the panel
Copy to clipboard or download as a .txt file

Type 2: Scanned PDF (image only)

Created by scanning paper, photographing a document, or saving screenshots as PDF. Text is not selectable — the whole page is an image. To extract text from this type, you need OCR:

Go to rifixpdf.xyz/ocr
Open your scanned PDF
Select the document language
Click Run OCR
Wait for Tesseract.js to process each page
Copy or download the extracted text

💡 How to Tell Which Type You Have

Open your PDF and try to select a word by clicking and dragging. If the text highlights in blue, it's Type 1 (searchable). If nothing highlights, or the whole page selects as a single image, it's Type 2 (scanned). Use pdf2txt for Type 1 and OCR for Type 2.
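
If you already have the text that a PDF library returns for each page, the same check can be roughly automated. The sketch below is only a heuristic and is not how Rifix decides: the function name classify_pdf, the 20-character threshold, and the per-page list of strings are all assumptions made for illustration.

```python
def classify_pdf(page_texts, min_chars=20):
    """Guess the PDF type from the text extracted from each page.

    page_texts: list of strings, one per page, from any text-layer
    extractor (a hypothetical input for this sketch).
    min_chars: assumed threshold below which a page counts as image-only.
    """
    if not page_texts:
        return "empty"

    # Count the pages that carry a meaningful amount of selectable text.
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars
    )

    if pages_with_text == len(page_texts):
        return "searchable"  # Type 1: every page has a text layer
    if pages_with_text == 0:
        return "scanned"     # Type 2: no page has a usable text layer
    return "mixed"           # some pages are scans, some are not
```

A "mixed" result suggests running the plain text extraction first and OCR only on the pages that came back empty.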

What Affects OCR Accuracy?

OCR accuracy on scanned documents depends on several factors:

Scan quality — 300 DPI produces reliable results; 150 DPI is marginal; below 100 DPI often fails. Phone camera photos of documents work reasonably well in good light.
Page straightness — skewed or warped pages reduce accuracy significantly. Use Scan Clean to straighten pages before OCR.
Contrast — faded ink, pencil writing, or yellowed paper reduces accuracy. High contrast black text on white is ideal.
Language selection — Rifix OCR supports English, Malay, Tamil, Chinese, Arabic, Hindi, Japanese, Korean, French, German, and Spanish. Always select the correct language.
Font type — standard printed fonts achieve 95%+ accuracy. Decorative or handwritten text is less reliable.
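
To check a scan against the DPI figures above, divide the image width in pixels by the physical page width in inches. The sketch below applies the thresholds from the list. The function names and the labels for the in-between ranges (150 to 299 as "marginal", 100 to 149 as "poor") are assumptions, since the article only names 300, 150, and below 100.

```python
def effective_dpi(pixel_width, page_width_inches):
    """Dots per inch of a scan: pixels across divided by inches across."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def scan_quality(dpi):
    """Map a DPI value to a rough OCR expectation (assumed bands)."""
    if dpi >= 300:
        return "reliable"
    if dpi >= 150:
        return "marginal"
    if dpi >= 100:
        return "poor"  # assumed label; the article does not name this range
    return "likely to fail"


# A US Letter page is 8.5 inches wide.
letter_scan = effective_dpi(2550, 8.5)   # 300.0
phone_photo = effective_dpi(1275, 8.5)   # 150.0
```

Here scan_quality(letter_scan) gives "reliable" and scan_quality(phone_photo) gives "marginal".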

Extracting Text from Specific Pages

For a long document where you only need text from certain pages:

Use Split PDF to extract the pages you need into a separate file
Then run that smaller file through PDF to Text or OCR

This is also faster — OCR on a 3-page extract is much quicker than processing a 100-page document.
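
If you script this step yourself, the first task is usually turning a page selection such as "3-5, 8" into a list of page numbers. The sketch below shows one way to do that. The function name parse_page_ranges and the input format are assumptions for illustration, not the format Split PDF accepts.

```python
def parse_page_ranges(spec, total_pages):
    """Turn a selection like '3-5, 8' into sorted 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Reject ranges that fall outside the document or run backwards.
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


selected = parse_page_ranges("3-5, 8", total_pages=100)
```

For a 100-page document, selected holds pages 3, 4, 5, and 8, so only four pages go through OCR instead of all 100.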

What You Can Do with Extracted Text

Once you have the raw text, common uses include:

Searching — paste into a text editor and use Ctrl+F to find specific information
Editing — clean up the text and paste back into a Word document or new PDF
Data extraction — pull out names, dates, invoice numbers, or other structured data
Translation — paste into Google Translate or DeepL for a quick translation
Accessibility — convert to plain text for screen readers or text-to-speech software
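
As an illustration of the data extraction use, the sketch below pulls dates and invoice numbers out of extracted text with regular expressions. The formats are assumptions: dates written as YYYY-MM-DD and invoice numbers written as INV- followed by digits. Real documents vary, so the patterns would need adjusting to match yours.

```python
import re

# Assumed formats for this sketch; adjust to match your documents.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
INVOICE_PATTERN = re.compile(r"\bINV-\d+\b")


def extract_fields(text):
    """Return all dates and invoice numbers found in the text."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "invoice_numbers": INVOICE_PATTERN.findall(text),
    }


sample = "Invoice INV-10423 issued 2024-03-15, due 2024-04-14."
fields = extract_fields(sample)
```

With the sample line above, fields contains the two dates in order and the single invoice number INV-10423. OCR output often contains misread characters (for example the letter O in place of a zero), so check the matches against the original page.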

Extracting Text for CSV/Excel

If your PDF contains tables (financial statements, reports, data exports), use PDF to CSV instead of plain text extraction. It attempts to preserve the column and row structure, giving you a spreadsheet-ready output rather than a continuous block of text.
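
If you only have plain extracted text and the columns are still visually separated, a rough conversion is possible by hand. The sketch below is not how PDF to CSV works; it assumes columns are separated by a tab or by two or more spaces, and the function name text_table_to_csv is made up for illustration.

```python
import csv
import io
import re


def text_table_to_csv(text):
    """Convert whitespace-aligned text rows into CSV.

    Assumes cells are separated by a tab or by two or more spaces,
    so single spaces inside a cell are preserved.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        cells = re.split(r"\s{2,}|\t", line.strip())
        writer.writerow(cells)
    return buffer.getvalue()


table_text = "Item   Qty   Price\nPaper, A4   2   9.50\n"
csv_text = text_table_to_csv(table_text)
```

The csv module quotes any cell that contains a comma, so "Paper, A4" stays in one column when the result is opened in Excel. This approach breaks down when cells are empty or columns are separated by a single space, which is where a dedicated table tool does better.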

PDF Type	Correct Tool	Output
Searchable PDF (text layer)	PDF to Text	.txt file, instant
Scanned PDF (image only)	OCR Scan	.txt file, takes 1–2 min
PDF with tables/data	PDF to CSV	.csv file for Excel

Nowsath Rifaya · Founder, Rifix PDF Editor

Operations professional based in Singapore. Built Rifix to solve a real work problem — handling confidential PDF documents without uploading them to unknown servers. Writes from direct experience using these tools daily.
