Source: ocrmypdf

Package: ocrmypdf (9.6.0+dfsg-1)

add an OCR text layer to PDF files

OCRmyPDF adds an OCR text layer to a regular PDF that contains only images, producing a searchable PDF/A file.

It uses the Tesseract OCR engine and so supports all the languages that Tesseract does.

Some other main features:

  * Places OCR text accurately below the image to ease copy / paste
  * Keeps the exact resolution of the original embedded images
  * When possible, inserts OCR information as a lossless operation
    without rendering vector information
  * Keeps file size about the same
  * If requested, deskews and/or cleans the image before performing OCR
  * Validates input and output files
  * Provides debug mode to enable easy verification of the OCR results
  * Processes pages in parallel when more than one CPU core is
    available
  * Battle-tested on thousands of PDFs, with a test suite and continuous
    integration
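
As a rough illustration of how the optional features above map onto a command line, the following Python sketch assembles an argument list for the ocrmypdf command without running it. The helper name build_ocrmypdf_command and the file names are hypothetical; the options used (--language, --deskew, --clean, --jobs) are taken from the upstream OCRmyPDF command-line documentation, and --clean is assumed to need the recommended unpaper package.

```python
import shlex
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list for the ocrmypdf CLI (hypothetical helper).

    Nothing is executed here; the caller decides whether to run the command.
    """
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract language codes are joined with '+', e.g. "eng+deu".
        cmd += ["--language", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten skewed pages before OCR
    if clean:
        cmd.append("--clean")  # clean pages before OCR (assumed to use unpaper)
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # number of pages processed in parallel
    cmd += [input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    # Example file names are placeholders.
    argv = build_ocrmypdf_command(
        "scan.pdf", "scan_ocr.pdf", languages=("eng", "deu"), deskew=True, jobs=4
    )
    print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv, check=True) on a system where the package is installed. Building a list rather than a single string avoids shell quoting problems with file names that contain spaces.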

Other Packages Related to ocrmypdf

  Legend: dep = depends, rec = recommends, sug = suggests
  • dep: ghostscript (>= 9.18~dfsg~)
    interpreter for the PostScript language and for PDF
  • dep: icc-profiles-free
    ICC color profiles for use with color profile aware software
  • dep: liblept5
    image processing library
  • dep: python3
    interactive high-level object-oriented language (default python3 version)
  • dep: python3-cffi-backend-api-max (>= 9729)
    Package not available
  • dep: python3-cffi-backend-api-min (<= 9729)
    Package not available
  • dep: python3-chardet
    universal character encoding detector for Python3
  • dep: python3-img2pdf (>= 0.3.0)
    Lossless conversion of raster images to PDF (library)
  • dep: python3-pdfminer (>= 20181108+dfsg-3)
    PDF parser and analyser (Python3)
  • dep: python3-pikepdf (>= 1.7.0)
    Python library to read and write PDFs with QPDF
  • dep: python3-pil
    Python Imaging Library (Python3)
  • dep: python3-pkg-resources
    Package Discovery and Resource Access using pkg_resources
  • dep: python3-reportlab
    ReportLab library to create PDF documents using Python3
  • dep: python3-tqdm
    fast, extensible progress bar for Python 3 and CLI tool
  • dep: tesseract-ocr (>= 4.0.0)
    Tesseract command line OCR tool
  • dep: zlib1g
    compression library - runtime
  • rec: pngquant
    PNG (Portable Network Graphics) image optimising utility
  • rec: unpaper
    post-processing tool for scanned pages
  • sug: img2pdf
    Lossless conversion of raster images to PDF
  • sug: ocrmypdf-doc
    add an OCR text layer to PDF files - documentation
  • sug: python-watchdog
    Package not available
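
The relation entries above follow a regular shape: a kind prefix (dep, rec or sug), a package name, and an optional version constraint in parentheses. A minimal Python sketch for splitting one such entry into its parts is shown below. The names Relation and parse_relation are hypothetical, and the sketch only extracts the fields; it does not attempt to compare Debian version strings, which follow their own ordering rules.

```python
import re
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """One parsed relation entry (hypothetical structure for illustration)."""
    kind: str                 # "dep", "rec" or "sug"
    package: str              # package name, e.g. "ghostscript"
    operator: Optional[str]   # e.g. ">=", or None if unversioned
    version: Optional[str]    # e.g. "9.18~dfsg~", or None if unversioned


# Optional bullet, kind prefix, package name, optional "(op version)" part.
_RELATION = re.compile(
    r"^\s*(?:[•*]\s*)?(?P<kind>dep|rec|sug):\s+"
    r"(?P<package>[a-z0-9][a-z0-9+.\-]*)"
    r"(?:\s+\((?P<operator><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\))?\s*$"
)


def parse_relation(line: str) -> Relation:
    """Split a line such as 'dep: ghostscript (>= 9.18~dfsg~)' into fields."""
    match = _RELATION.match(line)
    if match is None:
        raise ValueError(f"not a relation entry: {line!r}")
    return Relation(
        match["kind"], match["package"], match["operator"], match["version"]
    )


if __name__ == "__main__":
    for entry in (
        "• dep: ghostscript (>= 9.18~dfsg~)",
        "• dep: tesseract-ocr (>= 4.0.0)",
        "• rec: unpaper",
    ):
        print(parse_relation(entry))
```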

Download ocrmypdf

Download for all available architectures
Architecture  Package Size  Installed Size  Files
all           104.7 kB      518 kB          [list of files]