Paperless OCR language

My YunoHost server

Hardware: Old laptop or computer
YunoHost version: 11.2.14.1
I have access to my server: through SSH | through the webadmin
Are you in a special context or did you perform some particular tweaking on your YunoHost instance?: no
If your request is related to an app, specify its name and version: paperless-ngx v2.8.6~ynh1

Description of my issue

Hi! I have an issue with Paperless-ngx.

I am trying to set the OCR language to French and English, or at least to French alone. Following the Paperless documentation, I entered “fra+eng” in the language field of the OCR settings; when that failed, I tried the simpler “fra”. Both lead to the same error message, shown below.

The Paperless documentation seems to indicate that both French and English are installed by default. Is that different for the YunoHost version? If so, how can I add French support?

[2024-06-06 20:12:05,805] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: example.pdf: Error occurred while consuming document example.pdf:
MissingDependencyError: OCR engine does not have language data for the following requested languages:
fra
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
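Before changing the setting again, it can help to check which of the requested languages Tesseract actually has data for. The following is a minimal sketch in POSIX shell, assuming that tesseract --list-langs prints a header line starting with "List of available languages" followed by one language code per line (as Tesseract 4 and 5 do). The function name missing_ocr_langs is hypothetical and not part of Paperless, YunoHost or OCRmyPDF; it prints every code from a setting such as “fra+eng” that has no installed language data.

```shell
#!/bin/sh
# Hypothetical helper (not part of Paperless/YunoHost/OCRmyPDF):
# print each language code from an OCR setting such as "fra+eng"
# that is absent from a newline-separated list of installed languages.
missing_ocr_langs() {
  requested=$1   # value of the OCR language field, e.g. "fra+eng"
  installed=$2   # one installed language code per line
  # Turn "fra+eng" into "fra eng" so the for loop splits on spaces.
  for lang in $(printf '%s' "$requested" | tr '+' ' '); do
    # -x: whole-line match, -q: only the exit status matters
    if ! printf '%s\n' "$installed" | grep -qx -- "$lang"; then
      printf '%s\n' "$lang"
    fi
  done
}

# Check the real Tesseract installation, if there is one on this machine.
if command -v tesseract >/dev/null 2>&1; then
  # Drop the "List of available languages ..." header line.
  langs=$(tesseract --list-langs 2>/dev/null | grep -v '^List of')
  missing_ocr_langs "fra+eng" "$langs"
fi
```

If it prints fra, the French data is missing, which matches the error above.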

Thank you for your time, attention, and potential help :heart:!

P.S.: I could not find a tag for Paperless, so I used the “others” tag :sweat_smile:

Hi, I hope I remember this correctly.
To get OCR running in Paperless, you have to install the Tesseract English language data:

sudo apt update
sudo apt install tesseract-ocr-eng tesseract-oct-fra

Then activate OCR in Paperless with the OCR language setting “fra+eng”.

Thanks!

I tried that and got:

sudo apt install tesseract-ocr-eng tesseract-oct-fra
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package tesseract-oct-fra

Uh, sorry, that was a typo on my part:

sudo apt install tesseract-ocr-fra

That should work :slight_smile:
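The same pattern works for other languages. A minimal sketch, assuming Debian's naming scheme for Tesseract language packages (tesseract-ocr-<code>, with an underscore in the code turned into a hyphen, e.g. chi_sim becomes tesseract-ocr-chi-sim); the helper name ocr_lang_packages is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: turn an OCR language setting such as "fra+eng"
# into the Debian package names that ship the matching Tesseract data.
# Assumes the scheme tesseract-ocr-<code>, "_" -> "-" (chi_sim -> chi-sim).
ocr_lang_packages() {
  pkgs=""
  for lang in $(printf '%s' "$1" | tr '+' ' '); do
    pkgs="$pkgs tesseract-ocr-$(printf '%s' "$lang" | tr '_' '-')"
  done
  # Strip the leading space left by the first loop iteration.
  printf '%s\n' "${pkgs# }"
}

# Usage (the substitution is left unquoted on purpose so it splits):
#   sudo apt install $(ocr_lang_packages "fra+eng")
# which runs: sudo apt install tesseract-ocr-fra tesseract-ocr-eng
```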

It seems to do the job! Thank you so much for this simple solution. I guess I'll have to update this package manually and regularly with apt update and apt upgrade, since it is not included in the Paperless install script?

No, no, you don't have to upgrade it manually via apt; just do your updates through YunoHost.

You mean that the Paperless updates will update the “tesseract-ocr-fra” package?

In the YunoHost docs you can find:

Upgrades

From the webadmin

On the administration panel, click on Upgrade the system. YunoHost will refresh the system package catalog as well as the application catalog, and display available upgrades.

Click on green upgrade buttons to upgrade the system and applications.
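The same upgrade can also be run from the command line over SSH. A sketch, assuming the yunohost tools update and yunohost tools upgrade subcommands of YunoHost 11; the wrapper name upgrade_yunohost is hypothetical. Since tesseract-ocr-fra is an ordinary Debian package, it should be picked up by the system upgrade step rather than by the Paperless app upgrade.

```shell
#!/bin/sh
# Hypothetical wrapper around the YunoHost CLI upgrade steps.
# Run as root on the YunoHost server.
upgrade_yunohost() {
  # Refresh the system package catalog and the application catalog
  yunohost tools update || return 1
  # Upgrade Debian packages, which includes tesseract-ocr-fra/-eng
  yunohost tools upgrade system || return 1
  # Upgrade installed apps such as paperless-ngx
  yunohost tools upgrade apps
}

# Usage (as root): upgrade_yunohost
```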