Source: ocrmypdf

Package: ocrmypdf (6.1.2-1ubuntu1)

add an OCR text layer to PDF files

OCRmyPDF generates a searchable PDF/A file from a regular PDF that contains only images.

It uses the Tesseract OCR engine and so supports all the languages that Tesseract does.

Some other main features:

  * Places OCR text accurately below the image to ease copy / paste
  * Keeps the exact resolution of the original embedded images
  * When possible, inserts OCR information as a lossless operation
    without rendering vector information
  * Keeps file size about the same
  * If requested, deskews and/or cleans the image before performing OCR
  * Validates input and output files
  * Provides debug mode to enable easy verification of the OCR results
  * Processes pages in parallel when more than one CPU core is
    available
  * Battle-tested on thousands of PDFs, with a test suite and continuous
    integration
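
The optional features listed above map onto command-line options. The following is a minimal sketch, not part of the package, of how a script might assemble an `ocrmypdf` invocation. It assumes the upstream option names `-l` (language), `--deskew`, `--clean` and `--jobs`; check `ocrmypdf --help` for the packaged version. The helper names `build_ocrmypdf_command` and `run_ocrmypdf` and the file names are hypothetical.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, language=None,
                           deskew=False, clean=False, jobs=None):
    """Return the argument list for an ocrmypdf run without executing it."""
    cmd = ["ocrmypdf"]
    if language:
        cmd += ["-l", language]    # Tesseract language code, e.g. "eng"
    if deskew:
        cmd.append("--deskew")     # straighten pages before OCR
    if clean:
        cmd.append("--clean")      # assumed to need the recommended unpaper package
    if jobs:
        cmd += ["--jobs", str(jobs)]   # number of pages processed in parallel
    cmd += [input_pdf, output_pdf]
    return cmd


def run_ocrmypdf(cmd):
    """Run the command, failing early if ocrmypdf is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ocrmypdf was not found on PATH")
    subprocess.run(cmd, check=True)


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf",
                                      language="eng", deskew=True, jobs=2)))
```

Building the argument list separately from running it keeps the sketch testable on a machine where ocrmypdf is not installed.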

Other Packages Related to ocrmypdf

  Legend: dep = depends, rec = recommends, sug = suggests
  • dep: ghostscript (>= 9.18~dfsg~)
    interpreter for the PostScript language and for PDF
  • dep: icc-profiles-free
    ICC color profiles for use with color profile aware software
  • dep: liblept5
    image processing library
  • dep: python3
    interactive high-level object-oriented language (default python3 version)
  • dep: python3-cffi-backend-api-max (>= 9729)
    Package not available
  • dep: python3-cffi-backend-api-min (<= 9729)
    Package not available
  • dep: python3-defusedxml
    XML bomb protection for Python stdlib modules (for Python 3)
  • dep: python3-img2pdf (>= 0.2.1)
    Lossless conversion of raster images to PDF (library)
  • dep: python3-pil
    Python Imaging Library (Python3)
  • dep: python3-pkg-resources
    Package Discovery and Resource Access using pkg_resources
  • dep: python3-pypdf2 (>= 1.26)
    Pure-Python library built as a PDF toolkit (Python 3)
  • dep: python3-reportlab
    ReportLab library to create PDF documents using Python3
  • dep: python3-ruffus (<< 2.6.3+dfsh)
    Python3 computation pipeline library widely used in bioinformatics
    dep: python3-ruffus (>= 2.6.3+dfsg)
  • dep: qpdf
    tools for transforming and inspecting PDF files
  • dep: tesseract-ocr
    Tesseract command line OCR tool
  • dep: zlib1g
    compression library - runtime
  • rec: unpaper
    post-processing tool for scanned pages
  • sug: img2pdf
    Lossless conversion of raster images to PDF
  • sug: ocrmypdf-doc
    add an OCR text layer to PDF files - documentation
  • sug: python-watchdog
    Python API and shell utilities to monitor file system events - Python 2.x
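
Each entry above is a package name, optionally followed by a version relation in parentheses using one of the Debian operators `<<`, `<=`, `=`, `>=` or `>>`. As a rough illustration, the sketch below splits such an entry into its parts. The function name `parse_relation` is hypothetical, and the pattern covers only the simple forms that appear in this listing (no alternatives with `|` and no architecture qualifiers).

```python
import re

# Package name, optionally followed by "(operator version)".
_RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)"
    r"\s*(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_relation(text):
    """Split 'name (op version)' into (name, op, version).

    op and version are None when the entry carries no version constraint.
    """
    match = _RELATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised relation: {text!r}")
    return match.group("name"), match.group("op"), match.group("version")


# Entries copied from the listing above.
for entry in ("ghostscript (>= 9.18~dfsg~)", "python3-img2pdf (>= 0.2.1)", "qpdf"):
    print(parse_relation(entry))
```

The sketch only separates the fields. Comparing the version strings correctly requires Debian's own ordering rules, which it does not attempt.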

Download ocrmypdf

Download for all available architectures
Architecture   Package Size   Installed Size   Files
all            72.0 kB        348 kB           [list of files]