Paperless-ngx doesn't detect Tesseract languages

Dzwiedziu · July 14, 2025, 7:31pm

What type of hardware are you using: Virtual machine
What YunoHost version are you running: 12.0.17
What app is this about: paperless-ngx v2.17.1

Describe your issue

I’ve installed paperless-ngx and the tesseract-ocr-* language packages, yet the app keeps logging the same error about missing OCR language data. I’ve tried setting the languages in the configuration both as pol+eng+fra+deu, as instructed, and as pol,eng,fra,deu, since that is how Tesseract seems to interpret the value.
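For anyone comparing the two spellings: below is a minimal Python sketch that normalizes either form into the `+`-separated one. The helper names `split_languages` and `to_tesseract_form` are made up for illustration and are not part of paperless-ngx or OCRmyPDF; the sketch only assumes that the `+` form is the one the configuration expects, as the instructions mentioned above suggest.

```python
import re


def split_languages(value: str) -> list[str]:
    """Split a language setting on '+', ',' or whitespace.

    Blank entries and duplicates are dropped; the original order is kept.
    (Hypothetical helper, not an API of paperless-ngx or OCRmyPDF.)
    """
    codes: list[str] = []
    for code in re.split(r"[+,\s]+", value.strip()):
        if code and code not in codes:
            codes.append(code)
    return codes


def to_tesseract_form(value: str) -> str:
    """Return the languages joined with '+', e.g. 'pol+eng+fra+deu'."""
    return "+".join(split_languages(value))


# Both spellings from the post end up in the same '+'-separated form.
print(to_tesseract_form("pol,eng,fra,deu"))
print(to_tesseract_form("pol+eng+fra+deu"))
```

Normalizing the value this way makes it easier to see whether the setting itself or the installed language data is the problem.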

Also, there doesn’t seem to be a paperless-ngx forum tag, and I couldn’t post without adding some other tag.

Share relevant logs or error messages

root@yunohost:~# apt install tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-pol tesseract-ocr-eng
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr-fra is already the newest version (1:4.1.0-2).
tesseract-ocr-deu is already the newest version (1:4.1.0-2).
tesseract-ocr-pol is already the newest version (1:4.1.0-2).
tesseract-ocr-eng is already the newest version (1:4.1.0-2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

[2025-07-14 20:08:04,796] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: ${dodument}.pdf: Error occurred while consuming document ${dodument}.pdf: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
Traceback (most recent call last):
  File "/var/www/paperless-ngx/src/paperless_tesseract/parsers.py", line 384, in parse
    ocrmypdf.ocr(**args)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/api.py", line 379, in ocr
    check_options(options, plugin_manager)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 243, in check_options
    _check_plugin_options(options, plugin_manager)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 238, in _check_plugin_options
    check_options_languages(options, ocr_engine_languages)
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 81, in check_options_languages
    raise MissingDependencyError(msg)
ocrmypdf.exceptions.MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/asgiref/sync.py", line 327, in main_wrap
    raise exc_info[1]
  File "/var/www/paperless-ngx/src/documents/consumer.py", line 405, in run
    document_parser.parse(self.working_copy, mime_type, self.filename)
  File "/var/www/paperless-ngx/src/paperless_tesseract/parsers.py", line 447, in parse
    raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
documents.parsers.ParseError: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/var/www/paperless-ngx/src/documents/tasks.py", line 183, in consume_file
    msg = plugin.run()
          ^^^^^^^^^^^^
  File "/var/www/paperless-ngx/src/documents/consumer.py", line 437, in run
    self._fail(
  File "/var/www/paperless-ngx/src/documents/consumer.py", line 148, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: ${dodument}.pdf: Error occurred while consuming document ${dodument}.pdf: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
    https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
[2025-07-14 20:16:52,516] [INFO] [_granian.asgi.serve] Stopping worker-1 runtime-1
[2025-07-14 20:16:52,584] [INFO] [_granian.asgi.serve] Stopping worker-1
[2025-07-14 20:17:09,236] [INFO] [paperless.asgi] [init] Paperless-ngx version: v2.17.1
[2025-07-14 20:17:09,238] [INFO] [_granian.asgi.serve] Started worker-1
[2025-07-14 20:17:09,238] [INFO] [_granian.asgi.serve] Started worker-1 runtime-1
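
The error above prints `pol,eng,fra,deu` on a single line, which may mean the whole comma-separated string was treated as one language name rather than four. A rough way to check is to compare the requested codes against what `tesseract --list-langs` reports. The sketch below is only an illustration: the function names are hypothetical, the sample output is invented rather than taken from this server, and it assumes the `tesseract` binary prints one language code per line after a "List of available languages" header.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs: set[str] = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.add(line)
    return langs


def installed_languages(binary: str = "tesseract") -> set[str]:
    """Ask the tesseract binary which languages it has data for."""
    try:
        result = subprocess.run(
            [binary, "--list-langs"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return set()  # binary not on PATH
    # Some versions print the list to stderr instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)


def missing_languages(requested: list[str], installed: set[str]) -> list[str]:
    """Return the requested codes that have no installed language data."""
    return sorted(set(requested) - installed)


# Invented sample output, used so the example runs without tesseract installed.
sample = "List of available languages (5):\ndeu\neng\nfra\nosd\npol\n"
available = parse_list_langs(sample)

# Split on '+': every code is found.
print(missing_languages("pol+eng+fra+deu".split("+"), available))
# Comma form passed as one token: the whole string is reported as missing.
print(missing_languages(["pol,eng,fra,deu"], available))
```

On a real system, replacing `available` with `installed_languages()` would show whether the language packs are visible to the same `tesseract` binary the app runs.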

Dzwiedziu · July 16, 2025, 8:55pm

It seems to have started working after I changed the languages to “fra” and then back to the full set.

system · July 31, 2025, 8:56pm

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.