[ Source: ocrmypdf  ]

Package: ocrmypdf (13.4.0+dfsg-1)

add an OCR text layer to PDF files

OCRmyPDF generates a searchable PDF/A file from a regular PDF that contains only images, so that its text can be searched.

It uses the Tesseract OCR engine and so supports all the languages that Tesseract does.

Some other main features:

  * Places OCR text accurately below the image to ease copy / paste
  * Keeps the exact resolution of the original embedded images
  * When possible, inserts OCR information as a lossless operation
    without rendering vector information
  * Keeps file size about the same
  * If requested, deskews and/or cleans the image before performing OCR
  * Validates input and output files
  * Provides debug mode to enable easy verification of the OCR results
  * Processes pages in parallel when more than one CPU core is
    available
  * Battle-tested on thousands of PDFs, with a test suite and
    continuous integration
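As an illustration of how the optional features above map onto a command line, here is a minimal sketch in Python. The helper `build_ocrmypdf_command` and the file names are hypothetical and not part of the package. The sketch only assembles an argument list from the `ocrmypdf` options `-l`, `--deskew`, `--clean` and `--jobs`, and assumes `--clean` is backed by the recommended `unpaper` package.

```python
import shlex
from typing import List, Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    deskew: bool = False,
    clean: bool = False,
    jobs: Optional[int] = None,
) -> List[str]:
    """Assemble an ocrmypdf argument list (hypothetical helper, not part of the package)."""
    # Tesseract language codes are joined with "+", e.g. "eng+deu".
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")  # straighten skewed pages before OCR
    if clean:
        cmd.append("--clean")  # clean pages before OCR (assumes unpaper is installed)
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # number of pages processed in parallel
    cmd += [input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without ocrmypdf installed.
    print(shlex.join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf", deskew=True)))
```

To actually run the tool, the resulting list could be passed to `subprocess.run(cmd, check=True)`.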

Other Packages Related to ocrmypdf

  Legend: dep = depends, rec = recommends, sug = suggests
  • dep: ghostscript (>= 9.18~dfsg~)
    interpreter for the PostScript language and for PDF
  • dep: icc-profiles-free
    ICC color profiles for use with color profile aware software
  • dep: python3
    interactive high-level object-oriented language (default python3 version)
  • dep: python3-coloredlogs
    colored terminal output for Python 3's logging module
  • dep: python3-img2pdf (>= 0.3.0)
    Lossless conversion of raster images to PDF (library)
  • dep: python3-importlib-metadata
    library to access the metadata for a Python package - Python 3.x
    or python3 (>> 3.8)
    interactive high-level object-oriented language (default python3 version)
  • dep: python3-importlib-resources
    Read resources from Python packages
    or python3 (>> 3.9)
    interactive high-level object-oriented language (default python3 version)
  • dep: python3-packaging
    core utilities for python3 packages
  • dep: python3-pdfminer (>= 20181108+dfsg-3)
    PDF parser and analyser (Python3)
  • dep: python3-pikepdf (>= 5.0.1)
    Python library to read and write PDFs with QPDF
  • dep: python3-pil
    Python Imaging Library (Python3)
  • dep: python3-pkg-resources
    Package Discovery and Resource Access using pkg_resources
  • dep: python3-pluggy
    plugin and hook calling mechanisms for Python - 3.x
  • dep: python3-reportlab
    ReportLab library to create PDF documents using Python3
  • dep: python3-tqdm
    fast, extensible progress bar for Python 3 and CLI tool
  • dep: tesseract-ocr (>= 4.0.0)
    Tesseract command line OCR tool
  • dep: zlib1g
    compression library - runtime
  • rec: pngquant
    PNG (Portable Network Graphics) image optimising utility
  • rec: unpaper
    post-processing tool for scanned pages
  • sug: img2pdf
    Lossless conversion of raster images to PDF
  • sug: ocrmypdf-doc
    add an OCR text layer to PDF files - documentation
  • sug: python-watchdog
    Package not available
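
The entries above follow a regular shape: a relationship kind (`dep`, `rec` or `sug`), a package name, and an optional version constraint in parentheses. As a rough sketch, assuming only that shape, such a line could be parsed as follows. The `Dependency` type and `parse_dependency` function are hypothetical names introduced for illustration, and the accepted operators are the Debian relations `<<`, `<=`, `=`, `>=` and `>>` (where `>>` means strictly later).

```python
import re
from typing import NamedTuple, Optional


class Dependency(NamedTuple):
    kind: str                 # "dep", "rec" or "sug"
    name: str                 # package name
    relation: Optional[str]   # version relation, or None if unversioned
    version: Optional[str]    # version string, or None if unversioned


# Optional bullet, kind, package name, then an optional "(relation version)" part.
_DEP_RE = re.compile(
    r"^\s*(?:•\s*)?(dep|rec|sug):\s+(\S+)"
    r"(?:\s+\((<<|<=|=|>=|>>)\s*([^)]+)\))?\s*$"
)


def parse_dependency(line: str) -> Dependency:
    """Parse one listing line such as '• dep: ghostscript (>= 9.18~dfsg~)'."""
    match = _DEP_RE.match(line)
    if match is None:
        raise ValueError(f"not a dependency line: {line!r}")
    kind, name, relation, version = match.groups()
    return Dependency(kind, name, relation, version.strip() if version else None)


if __name__ == "__main__":
    print(parse_dependency("  • dep: tesseract-ocr (>= 4.0.0)"))
    print(parse_dependency("  • rec: unpaper"))
```

This handles only the first line of each entry. Description lines and the `or python3 (>> 3.8)` alternatives would need separate handling.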

Download ocrmypdf

Download for all available architectures
Architecture  Package Size  Installed Size  Files
all           112.1 kB      548 kB          [list of files]