What type of hardware are you using: Virtual machine
What YunoHost version are you running: 12.0.17
What app is this about: paperless-ngx v2.17.1
Describe your issue
I’ve installed paperless-ngx and the tesseract-ocr-* packages, yet the app keeps repeating the same error about not finding the requested languages. I’ve tried setting them in the configuration as pol+eng+fra+deu, as instructed, and as pol,eng,fra,deu, which is how Tesseract seems to interpret the value.
Also, there does not seem to be a paperless-ngx forum tag, so I could not post without adding another tag.
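One thing I noticed: the error in the logs lists pol,eng,fra,deu as a single entry, which might mean the comma-separated value is being treated as one language name. Here is a minimal Python sketch of that guess, assuming the value is split on `+` only (the way Tesseract's `-l` option combines languages). The function names are hypothetical and not taken from paperless-ngx or OCRmyPDF, and the set of installed languages in the demo is illustrative rather than read from my server.

```python
import subprocess


def split_languages(value: str) -> list[str]:
    """Split a Tesseract-style language string such as 'pol+eng' on '+'."""
    return [part for part in value.split("+") if part]


def missing_languages(value: str, installed: set[str]) -> list[str]:
    """Return the requested language names that are not in `installed`."""
    return [lang for lang in split_languages(value) if lang not in installed]


def installed_languages() -> set[str]:
    """Ask the local tesseract binary which language packs it can see."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some versions print the list to stderr
        text=True,
        check=True,
    )
    # The first line is a header ("List of available languages ..."); skip it.
    lines = result.stdout.splitlines()[1:]
    return {line.strip() for line in lines if line.strip()}


if __name__ == "__main__":
    # Illustrative set; on the server, call installed_languages() instead.
    installed = {"pol", "eng", "fra", "deu", "osd"}
    print(missing_languages("pol+eng+fra+deu", installed))  # []
    print(missing_languages("pol,eng,fra,deu", installed))  # ['pol,eng,fra,deu']
```

If this guess is right, the comma form can never match an installed pack, and the open question is why the `+` form also failed for me. Running `installed_languages()` as the user the app runs under would show whether that user's Tesseract actually sees the packs.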
Share relevant logs or error messages
root@yunohost:~# apt install tesseract-ocr-fra tesseract-ocr-deu tesseract-ocr-pol tesseract-ocr-eng
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
tesseract-ocr-fra is already the newest version (1:4.1.0-2).
tesseract-ocr-deu is already the newest version (1:4.1.0-2).
tesseract-ocr-pol is already the newest version (1:4.1.0-2).
tesseract-ocr-eng is already the newest version (1:4.1.0-2).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
[2025-07-14 20:08:04,796] [ERROR] [paperless.tasks] ConsumeTaskPlugin failed: ${dodument}.pdf: Error occurred while consuming document ${dodument}.pdf: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
Traceback (most recent call last):
File "/var/www/paperless-ngx/src/paperless_tesseract/parsers.py", line 384, in parse
ocrmypdf.ocr(**args)
File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/api.py", line 379, in ocr
check_options(options, plugin_manager)
File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 243, in check_options
_check_plugin_options(options, plugin_manager)
File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 238, in _check_plugin_options
check_options_languages(options, ocr_engine_languages)
File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/ocrmypdf/_validation.py", line 81, in check_options_languages
raise MissingDependencyError(msg)
ocrmypdf.exceptions.MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/www/paperless-ngx/venv/lib/python3.11/site-packages/asgiref/sync.py", line 327, in main_wrap
raise exc_info[1]
File "/var/www/paperless-ngx/src/documents/consumer.py", line 405, in run
document_parser.parse(self.working_copy, mime_type, self.filename)
File "/var/www/paperless-ngx/src/paperless_tesseract/parsers.py", line 447, in parse
raise ParseError(f"{e.__class__.__name__}: {e!s}") from e
documents.parsers.ParseError: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/var/www/paperless-ngx/src/documents/tasks.py", line 183, in consume_file
msg = plugin.run()
^^^^^^^^^^^^
File "/var/www/paperless-ngx/src/documents/consumer.py", line 437, in run
self._fail(
File "/var/www/paperless-ngx/src/documents/consumer.py", line 148, in _fail
raise ConsumerError(f"{self.filename}: {log_message or message}") from exception
documents.consumer.ConsumerError: ${dodument}.pdf: Error occurred while consuming document ${dodument}.pdf: MissingDependencyError: OCR engine does not have language data for the following requested languages:
pol,eng,fra,deu
Please install the appropriate language data for your OCR engine.
See the online documentation for instructions:
https://ocrmypdf.readthedocs.io/en/latest/languages.html
Note: most languages are identified by a 3-letter ISO 639-2 Code.
For example, English is 'eng', German is 'deu', and Spanish is 'spa'.
Simplified Chinese is 'chi_sim' and Traditional Chinese is 'chi_tra'.
[2025-07-14 20:16:52,516] [INFO] [_granian.asgi.serve] Stopping worker-1 runtime-1
[2025-07-14 20:16:52,584] [INFO] [_granian.asgi.serve] Stopping worker-1
[2025-07-14 20:17:09,236] [INFO] [paperless.asgi] [init] Paperless-ngx version: v2.17.1
[2025-07-14 20:17:09,238] [INFO] [_granian.asgi.serve] Started worker-1
[2025-07-14 20:17:09,238] [INFO] [_granian.asgi.serve] Started worker-1 runtime-1