Paperless-ngx doesn't detect Tesseract languages

What type of hardware are you using: Virtual machine
What YunoHost version are you running: 12.0.17
What app is this about: paperless-ngx v2.17.1

Describe your issue

I’ve installed Paperless-ngx and the tesseract-ocr-* packages, yet the app keeps reporting that it cannot find the language data for the configured languages. I’ve tried setting the OCR languages in the configuration both as pol+eng+fra+deu, as instructed, and as pol,eng,fra,deu, which is how Tesseract seems to interpret the value.

Also, there doesn’t seem to be a paperless-ngx forum tag, and I could not post without adding some other tag.

Share relevant logs or error messages

root@yunohost:~# apt install tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-pol tesseract-ocr-eng
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr-fra is already the newest version (1:4.1.0-2).
tesseract-ocr-deu is already the newest version (1:4.1.0-2).
tesseract-ocr-pol is already the newest version (1:4.1.0-2).
tesseract-ocr-eng is already the newest version (1:4.1.0-2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
[2025-07-14 20:08:04,796] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: ${dodument}.pdf: Error occurred while consuming document ${dodument}.pdf: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
Traceback (most recent call last):
  File "/var/www/paperless-ngx/src/paperless_tesseract/parsers.py", line 384, in parse
    ocrmypdf.ocr(**args)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/api.py", line 379, in ocr
    check_options(options, plugin_manager)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 243, in check_options
    _check_plugin_options(options, plugin_manager)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 238, in _check_plugin_options
    check_options_languages(options, ocr_engine_languages)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 81, in check_options_languages
    raise MissingDependencyError(msg)
ocrmypdf.exceptions.MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/asgiref/sync.py", line 327, in main_wrap
    raise exc_info[1]
  File "/var/www/paperless-ngx/src/documents/consumer.py", line 405, in run
    document_parser.parse(self.working_copy, mime_type, self.filename)
  File "/var/www/paperless-ngx/src/paperless_tesseract/parsers.py", line 447, in parse
    raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
documents.parsers.ParseError: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/var/www/paperless-ngx/src/documents/tasks.py", line 183, in consume_file
    msg = plugin.run()
          ^^^^^^^^^^^^
  File "/var/www/paperless-ngx/src/documents/consumer.py", line 437, in run
    self._fail(
  File "/var/www/paperless-ngx/src/documents/consumer.py", line 148, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: ${dodument}.pdf: Error occurred while consuming document ${dodument}.pdf: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
[2025-07-14 20:16:52,516] [INFO] [_granian.asgi.serve] Stopping worker-1 runtime-1
[2025-07-14 20:16:52,584] [INFO] [_granian.asgi.serve] Stopping worker-1
[2025-07-14 20:17:09,236] [INFO] [paperless.asgi] [init] Paperless-ngx version: v2.17.1
[2025-07-14 20:17:09,238] [INFO] [_granian.asgi.serve] Started worker-1
[2025-07-14 20:17:09,238] [INFO] [_granian.asgi.serve] Started worker-1 runtime-1

It seems to have started working after I changed the languages to “fra” and then back to the full set.
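For anyone hitting the same error, here is a rough sketch of how to compare a configured language value against what Tesseract actually has installed. It is not Paperless-ngx's or OCRmyPDF's own code. The function names are made up for illustration, and it assumes the value is split on "+" only. Under that assumption a comma-separated value stays a single unknown language name, which would fit the log printing pol,eng,fra,deu on one line, though I have not confirmed that this is what happened here.

```python
import subprocess


def parse_language_spec(spec: str) -> list[str]:
    """Split a '+'-separated value such as 'pol+eng+fra+deu' into language codes.

    Assumption: only '+' is a separator, so commas are left untouched.
    """
    return [part.strip() for part in spec.split("+") if part.strip()]


def missing_languages(spec: str, installed: set[str]) -> list[str]:
    """Return the requested languages that are not in the installed set."""
    return [lang for lang in parse_language_spec(spec) if lang not in installed]


def installed_tesseract_languages() -> set[str]:
    """Ask the tesseract binary which language data it can see.

    Runs `tesseract --list-langs`; the header line is skipped. Some versions
    write the list to stderr, so fall back to that when stdout is empty.
    """
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    output = result.stdout or result.stderr
    return {
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.startswith("List of available languages")
    }


# Hard-coded example set, standing in for the real tesseract output.
example_installed = {"pol", "eng", "fra", "deu", "osd"}

print(missing_languages("pol+eng+fra+deu", example_installed))  # []
print(missing_languages("pol,eng,fra,deu", example_installed))  # ['pol,eng,fra,deu']
```

On the server itself, calling missing_languages(value, installed_tesseract_languages()) with the configured value should show whether the language data or the format of the value is the problem.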
