Rifix OCR uses Tesseract.js to recognise text in scanned documents and images. It supports 12 languages, and all processing happens locally, so your file never leaves your device.
OCR (Optical Character Recognition) converts scanned PDFs and images into searchable, selectable text. A scanned PDF is essentially a photograph of paper, so you cannot search, copy, or edit its text. After OCR, the document has a real text layer: you can search it with Ctrl+F, copy passages, and have its content indexed by document management systems.
Test if you need OCR: try selecting text on a page. If you cannot select it, OCR is required. If text highlights normally, the document already has a text layer.
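The same check can be automated if you process many files. The sketch below assumes you already have the text strings a PDF library extracted from one page (for example, the `str` fields of the items returned by pdf.js `page.getTextContent()`). The `needsOcr` helper and its 20-character threshold are hypothetical, not part of Rifix OCR.

```typescript
// Hypothetical threshold: pages with fewer visible characters are treated as
// image-only. A page whose only text is a page number would still count as scanned.
const MIN_TEXT_CHARS = 20;

// Decide whether a page needs OCR from the strings extracted for that page.
function needsOcr(pageStrings: string[], minChars: number = MIN_TEXT_CHARS): boolean {
  // Ignore whitespace so blank text items do not count as a real text layer.
  const visible = pageStrings.join("").replace(/\s+/g, "").length;
  return visible < minChars;
}

// A scanned page usually yields no text items at all.
console.log(needsOcr([]));                                       // true
console.log(needsOcr(["Invoice #1042", " Total due: $310.00"])); // false
```

A character threshold rather than "any text at all" avoids treating a scan with a stray watermark or page number as already searchable.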
No. The original scan image is preserved exactly. OCR adds an invisible text layer — the document looks identical but text becomes selectable and searchable.
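One common way to build an invisible text layer in a PDF is text rendering mode 3 (the `3 Tr` operator), which the PDF specification defines as neither filling nor stroking glyphs. The sketch below shows the content-stream operators for one page: it paints the scan unchanged, then places each recognised word as invisible text at its position. It is an illustration of the technique, not Rifix's implementation; the resource names `/Im0` and `/F1`, the `OcrWord` shape, and the helper names are assumptions, and the sketch only handles ASCII text (scripts such as Tamil or Chinese need a proper font encoding).

```typescript
// Assumed input: one recognised word with its position in PDF points,
// measured from the page's bottom-left corner.
interface OcrWord {
  text: string;
  x: number;    // left edge of the word
  y: number;    // baseline of the word
  size: number; // font size in points
}

// Escape the characters that are special inside a PDF literal string.
function escapePdfString(s: string): string {
  return s.replace(/\\/g, "\\\\").replace(/\(/g, "\\(").replace(/\)/g, "\\)");
}

// Build a page content stream: original image first, invisible text second.
// Assumes the scan is registered as XObject /Im0 and a font as /F1.
function buildPageContent(pageW: number, pageH: number, words: OcrWord[]): string {
  const ops: string[] = [
    "q",
    `${pageW} 0 0 ${pageH} 0 0 cm`, // image XObjects fill a unit square; scale to the page
    "/Im0 Do",                       // paint the original scan unchanged
    "Q",
    "BT",
    "3 Tr",                          // rendering mode 3: glyphs are not drawn
  ];
  for (const w of words) {
    ops.push(`/F1 ${w.size} Tf`);
    ops.push(`1 0 0 1 ${w.x} ${w.y} Tm`); // absolute position for this word
    ops.push(`(${escapePdfString(w.text)}) Tj`);
  }
  ops.push("ET");
  return ops.join("\n");
}

console.log(buildPageContent(612, 792, [{ text: "Total (USD)", x: 72, y: 700, size: 12 }]));
```

Because the text is never drawn, the page renders exactly as the scan, yet viewers still use the positioned glyphs for selection and Ctrl+F search.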
Printed text typically achieves 95%+ accuracy. Handwriting is more challenging: neat block capitals may work acceptably, but cursive handwriting usually needs manual correction after OCR.
Yes. All processing happens locally in your browser using Tesseract.js. Your file is never uploaded to any server.
Rifix OCR supports English, Malay, Tamil, Chinese (Simplified and Traditional), Arabic, Hindi, Japanese, Korean, French, German, and Spanish.
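Tesseract identifies each language by the name of its trained data file, and both the Tesseract CLI and Tesseract.js accept several languages at once as a `+`-joined string such as `eng+chi_sim`. The sketch below maps the 12 languages listed above to those standard codes; how Rifix OCR stores its language choices internally is not documented here, and `languageSpec` is a hypothetical helper.

```typescript
// Standard Tesseract traineddata codes for the languages listed above.
const TESSERACT_CODES: Record<string, string> = {
  "English": "eng",
  "Malay": "msa",
  "Tamil": "tam",
  "Chinese (Simplified)": "chi_sim",
  "Chinese (Traditional)": "chi_tra",
  "Arabic": "ara",
  "Hindi": "hin",
  "Japanese": "jpn",
  "Korean": "kor",
  "French": "fra",
  "German": "deu",
  "Spanish": "spa",
};

// Build a "+"-joined language string, rejecting names outside the list.
function languageSpec(names: string[]): string {
  return names
    .map((name) => {
      const code = TESSERACT_CODES[name];
      if (code === undefined) throw new Error(`Unsupported language: ${name}`);
      return code;
    })
    .join("+");
}

console.log(Object.keys(TESSERACT_CODES).length); // 12
console.log(languageSpec(["English", "Malay"]));  // eng+msa
```

Chinese counts as two entries because Simplified and Traditional use separate trained data, which is how the list reaches 12 languages.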
Printed text from a good-quality scan achieves 95%+ accuracy. Handwritten text is less reliable. For best results, scan at 300 DPI or higher with black text on a white background.
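If you are unsure what resolution an existing image was scanned at, divide its pixel dimensions by the page's physical size in inches. The sketch below does this for A4 (210 × 297 mm) and US Letter (8.5 × 11 in) pages; it assumes you know the paper size, and the helper names are hypothetical.

```typescript
const MM_PER_INCH = 25.4;

interface PageSize { widthMm: number; heightMm: number; }

const A4: PageSize = { widthMm: 210, heightMm: 297 };
const LETTER: PageSize = { widthMm: 215.9, heightMm: 279.4 }; // 8.5 x 11 in

// Effective resolution of a scan for a known page size.
function effectiveDpi(widthPx: number, heightPx: number, page: PageSize): number {
  const dpiX = widthPx / (page.widthMm / MM_PER_INCH);
  const dpiY = heightPx / (page.heightMm / MM_PER_INCH);
  // The lower axis limits OCR quality. Round because scanners round pixel
  // counts: a 300 DPI A4 scan is 2480 px wide, which is 299.96 DPI exactly.
  return Math.round(Math.min(dpiX, dpiY));
}

// Compare against the 300 DPI recommendation.
function meetsOcrResolution(widthPx: number, heightPx: number, page: PageSize, minDpi = 300): boolean {
  return effectiveDpi(widthPx, heightPx, page) >= minDpi;
}

console.log(effectiveDpi(2480, 3508, A4));          // 300
console.log(meetsOcrResolution(1240, 1754, A4));    // false (a 150 DPI scan)
console.log(effectiveDpi(2550, 3300, LETTER));      // 300
```

Taking the smaller of the two axes catches images that were stretched or cropped unevenly, where one direction has noticeably less detail than the other.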
A single page typically takes 10–30 seconds, depending on your device's speed and the selected language. The Tesseract OCR engine runs in a background Web Worker, so your browser stays responsive while it works.