v1

2026-04-18 02:53:36 +02:00 · 2025-12-05 10:55:37 +01:00
commit 396e290394
1423 changed files with 825479 additions and 0 deletions
--- a/docs/advanced_usage.md
+++ b/docs/advanced_usage.md
@@ -0,0 +1,982 @@
+# Advanced Topics
+
+Paperless offers a couple of features that automate certain tasks and make
+your life easier.
+
+## Matching tags, correspondents, document types, and storage paths {#matching}
+
+Paperless will compare the matching algorithms defined by every tag,
+correspondent, document type, and storage path in your database to see
+if they apply to the text in a document. In other words, if you define a
+tag called `Home Utility` that had a `match` property of `bc hydro` and
+a `matching_algorithm` of `Exact`, Paperless will automatically tag
+your newly-consumed document with your `Home Utility` tag so long as the
+text `bc hydro` appears in the body of the document somewhere.
+
+The matching logic is quite powerful. It supports searching the text of
+your document with different algorithms, and as such, some
+experimentation may be necessary to get things right.
+
+In order to have a tag, correspondent, document type, or storage path
+assigned automatically to newly consumed documents, assign a match and
+matching algorithm using the web interface. These settings define when
+to assign tags, correspondents, document types, and storage paths to
+documents.
+
+The following algorithms are available:
+
+-   **None:** No matching will be performed.
+-   **Any:** Looks for any occurrence of any word provided in match in
+    the PDF. If you define the match as `Bank1 Bank2`, it will match
+    documents containing either of these terms.
+-   **All:** Requires that every word provided appears in the PDF,
+    albeit not in the order provided.
+-   **Exact:** Matches only if the match appears exactly as provided
+    (i.e. preserve ordering) in the PDF.
+-   **Regular expression:** Parses the match as a regular expression and
+    tries to find a match within the document.
+-   **Fuzzy match:** Uses a partial matching based on locating the tag text
+    inside the document, using a [partial ratio](https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html#partial-ratio)
+-   **Auto:** Tries to automatically match new documents. This does not
+    require you to set a match. See the [notes below](#automatic-matching).
+
+When using the _any_ or _all_ matching algorithms, you can search for
+terms that consist of multiple words by enclosing them in double quotes.
+For example, defining a match text of `"Bank of America" BofA` using the
+_any_ algorithm, will match documents that contain either "Bank of
+America" or "BofA", but will not match documents containing "Bank of
+South America".
+
+Then just save your tag, correspondent, document type, or storage path
+and run another document through the consumer. Once complete, you should
+see the newly-created document, automatically tagged with the
+appropriate data.
+
+### Automatic matching {#automatic-matching}
+
+Paperless-ngx comes with a new matching algorithm called _Auto_. This
+matching algorithm tries to assign tags, correspondents, document types,
+and storage paths to your documents based on how you have already
+assigned these on existing documents. It uses a neural network under the
+hood.
+
+If, for example, all your bank statements of your account 123 at the
+Bank of America are tagged with the tag "bofa123" and the matching
+algorithm of this tag is set to _Auto_, this neural network will examine
+your documents and automatically learn when to assign this tag.
+
+Paperless tries to hide much of the involved complexity with this
+approach. However, there are a couple caveats you need to keep in mind
+when using this feature:
+
+-   Changes to your documents are not immediately reflected by the
+    matching algorithm. The neural network needs to be _trained_ on your
+    documents after changes. Paperless periodically (default: once each
+    hour) checks for changes and does this automatically for you.
+-   The Auto matching algorithm only takes documents into account which
+    are NOT placed in your inbox (i.e. have any inbox tags assigned to
+    them). This ensures that the neural network only learns from
+    documents which you have correctly tagged before.
+-   The matching algorithm can only work if there is a correlation
+    between the tag, correspondent, document type, or storage path and
+    the document itself. Your bank statements usually contain your bank
+    account number and the name of the bank, so this works reasonably
+    well, However, tags such as "TODO" cannot be automatically
+    assigned.
+-   The matching algorithm needs a reasonable number of documents to
+    identify when to assign tags, correspondents, storage paths, and
+    types. If one out of a thousand documents has the correspondent
+    "Very obscure web shop I bought something five years ago", it will
+    probably not assign this correspondent automatically if you buy
+    something from them again. The more documents, the better.
+-   Paperless also needs a reasonable amount of negative examples to
+    decide when not to assign a certain tag, correspondent, document
+    type, or storage path. This will usually be the case as you start
+    filling up paperless with documents. Example: If all your documents
+    are either from "Webshop" or "Bank", paperless will assign one
+    of these correspondents to ANY new document, if both are set to
+    automatic matching.
+
+## Hooking into the consumption process {#consume-hooks}
+
+Sometimes you may want to do something arbitrary whenever a document is
+consumed. Rather than try to predict what you may want to do, Paperless
+lets you execute scripts of your own choosing just before or after a
+document is consumed using a couple of simple hooks.
+
+Just write a script, put it somewhere that Paperless can read & execute,
+and then put the path to that script in `paperless.conf` or
+`docker-compose.env` with the variable name of either
+[`PAPERLESS_PRE_CONSUME_SCRIPT`](configuration.md#PAPERLESS_PRE_CONSUME_SCRIPT) or [`PAPERLESS_POST_CONSUME_SCRIPT`](configuration.md#PAPERLESS_POST_CONSUME_SCRIPT).
+
+!!! info
+
+    These scripts are executed in a **blocking** process, which means that
+    if a script takes a long time to run, it can significantly slow down
+    your document consumption flow. If you want things to run
+    asynchronously, you'll have to fork the process in your script and
+    exit.
+
+### Pre-consumption script {#pre-consume-script}
+
+Executed after the consumer sees a new document in the consumption
+folder, but before any processing of the document is performed. This
+script can access the following relevant environment variables set:
+
+| Environment Variable    | Description                                                  |
+| ----------------------- | ------------------------------------------------------------ |
+| `DOCUMENT_SOURCE_PATH`  | Original path of the consumed document                       |
+| `DOCUMENT_WORKING_PATH` | Path to a copy of the original that consumption will work on |
+| `TASK_ID`               | UUID of the task used to process the new document (if any)   |
+
+!!! note
+
+    Pre-consume scripts which modify the document should only change
+    the `DOCUMENT_WORKING_PATH` file or a second consume task may
+    be triggered, leading to failures as two tasks work on the
+    same document path
+
+!!! warning
+
+    If your script modifies `DOCUMENT_WORKING_PATH` in a non-deterministic
+    way, this may allow duplicate documents to be stored
+
+A simple but common example for this would be creating a simple script
+like this:
+
+`/usr/local/bin/ocr-pdf`
+
+```bash
+#!/usr/bin/env bash
+pdf2pdfocr.py -i ${DOCUMENT_WORKING_PATH}
+```
+
+`/etc/paperless.conf`
+
+```bash
+...
+PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
+...
+```
+
+This will pass the path to the document about to be consumed to
+`/usr/local/bin/ocr-pdf`, which will in turn call
+[pdf2pdfocr.py](https://github.com/LeoFCardoso/pdf2pdfocr) on your
+document, which will then overwrite the file with an OCR'd version of
+the file and exit. At which point, the consumption process will begin
+with the newly modified file.
+
+The script's stdout and stderr will be logged line by line to the
+webserver log, along with the exit code of the script.
+
+### Post-consumption script {#post-consume-script}
+
+Executed after the consumer has successfully processed a document and
+has moved it into paperless. It receives the following environment
+variables:
+
+| Environment Variable         | Description                                    |
+| ---------------------------- | ---------------------------------------------- |
+| `DOCUMENT_ID`                | Database primary key of the document           |
+| `DOCUMENT_FILE_NAME`         | Formatted filename, not including paths        |
+| `DOCUMENT_TYPE`              | The document type (if any)                     |
+| `DOCUMENT_CREATED`           | Date & time when document created              |
+| `DOCUMENT_MODIFIED`          | Date & time when document was last modified    |
+| `DOCUMENT_ADDED`             | Date & time when document was added            |
+| `DOCUMENT_SOURCE_PATH`       | Path to the original document file             |
+| `DOCUMENT_ARCHIVE_PATH`      | Path to the generate archive file (if any)     |
+| `DOCUMENT_THUMBNAIL_PATH`    | Path to the generated thumbnail                |
+| `DOCUMENT_DOWNLOAD_URL`      | URL for document download                      |
+| `DOCUMENT_THUMBNAIL_URL`     | URL for the document thumbnail                 |
+| `DOCUMENT_OWNER`             | Username of the document owner (if any)        |
+| `DOCUMENT_CORRESPONDENT`     | Assigned correspondent (if any)                |
+| `DOCUMENT_TAGS`              | Comma separated list of tags applied (if any)  |
+| `DOCUMENT_ORIGINAL_FILENAME` | Filename of original document                  |
+| `TASK_ID`                    | Task UUID used to import the document (if any) |
+
+The script can be in any language, A simple shell script example:
+
+```bash title="post-consumption-example"
+--8<-- "./scripts/post-consumption-example.sh"
+```
+
+!!! note
+
+    The post consumption script cannot cancel the consumption process.
+
+!!! warning
+
+    The post consumption script should not modify the document files
+    directly.
+
+The script's stdout and stderr will be logged line by line to the
+webserver log, along with the exit code of the script.
+
+### Docker {#docker-consume-hooks}
+
+To hook into the consumption process when using Docker, you
+will need to pass the scripts into the container via a host mount
+in your `docker-compose.yml`.
+
+Assuming you have
+`/home/paperless-ngx/scripts/post-consumption-example.sh` as a
+script which you'd like to run.
+
+You can pass that script into the consumer container via a host mount:
+
+```yaml
+...
+webserver:
+  ...
+  volumes:
+    ...
+    - /home/paperless-ngx/scripts:/path/in/container/scripts/ # (1)!
+  environment: # (3)!
+    ...
+    PAPERLESS_POST_CONSUME_SCRIPT: /path/in/container/scripts/post-consumption-example.sh # (2)!
+...
+```
+
+1. The external scripts directory is mounted to a location inside the container.
+2. The internal location of the script is used to set the script to run
+3. This can also be set in `docker-compose.env`
+
+Troubleshooting:
+
+-   Monitor the Docker Compose log
+    `cd ~/paperless-ngx; docker compose logs -f`
+-   Check your script's permission e.g. in case of permission error
+    `sudo chmod 755 post-consumption-example.sh`
+-   Pipe your scripts's output to a log file e.g.
+    `echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log`
+
+## File name handling {#file-name-handling}
+
+By default, paperless stores your documents in the media directory and
+renames them using the identifier which it has assigned to each
+document. You will end up getting files like `0000123.pdf` in your media
+directory. This isn't necessarily a bad thing, because you normally
+don't have to access these files manually. However, if you wish to name
+your files differently, you can do that by adjusting the
+[`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) configuration option
+or using [storage paths (see below)](#storage-paths). Paperless adds the
+correct file extension e.g. `.pdf`, `.jpg` automatically.
+
+This variable allows you to configure the filename (folders are allowed)
+using placeholders. For example, configuring this to
+
+```bash
+PAPERLESS_FILENAME_FORMAT={{ created_year }}/{{ correspondent }}/{{ title }}
+```
+
+will create a directory structure as follows:
+
+```
+2019/
+  My bank/
+    Statement January.pdf
+    Statement February.pdf
+2020/
+  My bank/
+    Statement January.pdf
+    Letter.pdf
+    Letter_01.pdf
+  Shoe store/
+    My new shoes.pdf
+```
+
+!!! warning
+
+    Do not manually move your files in the media folder. Paperless remembers
+    the last filename a document was stored as. If you do rename a file,
+    paperless will report your files as missing and won't be able to find
+    them.
+
+!!! tip
+
+    Paperless checks the filename of a document whenever it is saved. Changing (or deleting)
+    a [storage path](#storage-paths) will automatically be reflected in the file system. However,
+    when changing `PAPERLESS_FILENAME_FORMAT` you will need to manually run the
+    [`document renamer`](administration.md#renamer) to move any existing documents.
+
+### Placeholders {#filename-format-variables}
+
+Paperless provides the following variables for use within filenames:
+
+-   `{{ asn }}`: The archive serial number of the document, or "none".
+-   `{{ correspondent }}`: The name of the correspondent, or "none".
+-   `{{ document_type }}`: The name of the document type, or "none".
+-   `{{ tag_list }}`: A comma separated list of all tags assigned to the
+    document.
+-   `{{ title }}`: The title of the document.
+-   `{{ created }}`: The full date (ISO 8601 format, e.g. `2024-03-14`) the document was created.
+-   `{{ created_year }}`: Year created only, formatted as the year with
+    century.
+-   `{{ created_year_short }}`: Year created only, formatted as the year
+    without century, zero padded.
+-   `{{ created_month }}`: Month created only (number 01-12).
+-   `{{ created_month_name }}`: Month created name, as per locale
+-   `{{ created_month_name_short }}`: Month created abbreviated name, as per
+    locale
+-   `{{ created_day }}`: Day created only (number 01-31).
+-   `{{ added }}`: The full date (ISO format) the document was added to
+    paperless.
+-   `{{ added_year }}`: Year added only.
+-   `{{ added_year_short }}`: Year added only, formatted as the year without
+    century, zero padded.
+-   `{{ added_month }}`: Month added only (number 01-12).
+-   `{{ added_month_name }}`: Month added name, as per locale
+-   `{{ added_month_name_short }}`: Month added abbreviated name, as per
+    locale
+-   `{{ added_day }}`: Day added only (number 01-31).
+-   `{{ owner_username }}`: Username of document owner, if any, or "none"
+-   `{{ original_name }}`: Document original filename, minus the extension, if any, or "none"
+-   `{{ doc_pk }}`: The paperless identifier (primary key) for the document.
+
+!!! warning
+
+    When using file name placeholders, in particular when using `{tag_list}`,
+    you may run into the limits of your operating system's maximum path lengths.
+    In that case, files will retain the previous path instead and the issue logged.
+
+!!! tip
+
+    These variables are all simple strings, but the format can be a full template.
+    See [Filename Templates](#filename-templates) for even more advanced formatting.
+
+Paperless will try to conserve the information from your database as
+much as possible. However, some characters that you can use in document
+titles and correspondent names (such as `: \ /` and a couple more) are
+not allowed in filenames and will be replaced with dashes.
+
+If paperless detects that two documents share the same filename,
+paperless will automatically append `_01`, `_02`, etc to the filename.
+This happens if all the placeholders in a filename evaluate to the same
+value.
+
+If there are any errors in the placeholders included in `PAPERLESS_FILENAME_FORMAT`,
+paperless will fall back to using the default naming scheme instead.
+
+!!! caution
+
+    As of now, you could potentially tell paperless to store your files anywhere
+    outside the media directory by setting
+
+    ```
+    PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
+    ```
+
+    However, keep in mind that inside docker, if files get stored outside of
+    the predefined volumes, they will be lost after a restart.
+
+#### Empty placeholders
+
+You can affect how empty placeholders are treated by changing the
+[`PAPERLESS_FILENAME_FORMAT_REMOVE_NONE`](configuration.md#PAPERLESS_FILENAME_FORMAT_REMOVE_NONE) setting.
+
+Enabling this results in all empty placeholders resolving to "" instead of "none" as stated above. Spaces
+before empty placeholders are removed as well, empty directories are omitted.
+
+### Storage paths
+
+When a single storage layout is not sufficient for your use case, storage paths allow for more complex
+structure to set precisely where each document is stored in the file system.
+
+-   Each storage path is a [`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) and
+    follows the rules described above
+-   Each document is assigned a storage path using the matching algorithms described above, but can be
+    overwritten at any time
+
+For example, you could define the following two storage paths:
+
+1.  Normal communications are put into a folder structure sorted by
+    `year/correspondent`
+2.  Communications with insurance companies are stored in a flat
+    structure with longer file names, but containing the full date of
+    the correspondence.
+
+```
+By Year = {{ created_year }}/{{ correspondent }}/{{ title }}
+Insurances = Insurances/{{ correspondent }}/{{ created_year }}-{{ created_month }}-{{ created_day }} {{ title }}
+```
+
+If you then map these storage paths to the documents, you might get the
+following result. For simplicity, `By Year` defines the same
+structure as in the previous example above.
+
+```text
+2019/                                   # By Year
+   My bank/
+     Statement January.pdf
+     Statement February.pdf
+
+Insurances/                             # Insurances
+   Healthcare 123/
+     2022-01-01 Statement January.pdf
+     2022-02-02 Letter.pdf
+     2022-02-03 Letter.pdf
+   Dental 456/
+     2021-12-01 New Conditions.pdf
+```
+
+!!! tip
+
+    Defining a storage path is optional. If no storage path is defined for a
+    document, the global [`PAPERLESS_FILENAME_FORMAT`](configuration.md#PAPERLESS_FILENAME_FORMAT) is applied.
+
+### Filename Templates {#filename-templates}
+
+The filename formatting uses [Jinja templates](https://jinja.palletsprojects.com/en/3.1.x/templates/) to build the filename.
+This allows for complex logic to be included in the format, including [logical structures](https://jinja.palletsprojects.com/en/3.1.x/templates/#list-of-control-structures)
+and [filters](https://jinja.palletsprojects.com/en/3.1.x/templates/#id11) to manipulate the [variables](#filename-format-variables)
+provided. The template is provided as a string, potentially multiline, and rendered into a single line.
+
+In addition, the entire Document instance is available to be utilized in a more advanced way, as well as some variables which only make sense to be accessed
+with more complex logic.
+
+#### Custom Jinja2 Filters
+
+##### Custom Field Access
+
+The `get_cf_value` filter retrieves a value from custom field data with optional default fallback.
+
+###### Syntax
+
+```jinja2
+{{ custom_fields | get_cf_value('field_name') }}
+{{ custom_fields | get_cf_value('field_name', 'default_value') }}
+```
+
+###### Parameters
+
+-   `custom_fields`: This _must_ be the provided custom field data
+-   `name` (str): Name of the custom field to retrieve
+-   `default` (str, optional): Default value to return if field is not found or has no value
+
+###### Returns
+
+-   `str | None`: The field value, default value, or `None` if neither exists
+
+###### Examples
+
+```jinja2
+<!-- Basic usage -->
+{{ custom_fields | get_cf_value('department') }}
+
+<!-- With default value -->
+{{ custom_fields | get_cf_value('phone', 'Not provided') }}
+```
+
+##### Datetime Formatting
+
+The `datetime` filter formats a datetime string or datetime object using Python's strftime formatting.
+
+###### Syntax
+
+```jinja2
+{{ datetime_value | datetime('%Y-%m-%d %H:%M:%S') }}
+```
+
+###### Parameters
+
+-   `value` (str | datetime): Date/time value to format (strings will be parsed automatically)
+-   `format` (str): Python strftime format string
+
+###### Returns
+
+-   `str`: Formatted datetime string
+
+###### Examples
+
+```jinja2
+<!-- Format datetime object -->
+{{ created | datetime('%B %d, %Y at %I:%M %p') }}
+<!-- Output: "January 15, 2024 at 02:30 PM" -->
+
+<!-- Custom formatting -->
+{{ custom_fields | get_cf_value('Date Field') | datetime('%A, %B %d, %Y') }}
+<!-- Output: "Monday, January 15, 2024" -->
+```
+
+See the [strftime format code documentation](https://docs.python.org/3.13/library/datetime.html#strftime-and-strptime-format-codes)
+for the possible codes and their meanings.
+
+##### Date Localization
+
+The `localize_date` filter formats a date or datetime object into a localized string using Babel internationalization.
+This takes into account the provided locale for translation. Since this must be used on a date or datetime object,
+you must access the field directly, i.e. `document.created`.
+An ISO string can also be provided to control the output format.
+
+###### Syntax
+
+```jinja2
+{{ date_value | localize_date('medium', 'en_US') }}
+{{ datetime_value | localize_date('short', 'fr_FR') }}
+```
+
+###### Parameters
+
+-   `value` (date | datetime | str): Date, datetime object or ISO string to format (datetime should be timezone-aware)
+-   `format` (str): Format type - either a Babel preset ('short', 'medium', 'long', 'full') or custom pattern
+-   `locale` (str): Locale code for localization (e.g., 'en_US', 'fr_FR', 'de_DE')
+
+###### Returns
+
+-   `str`: Localized, formatted date string
+
+###### Examples
+
+```jinja2
+<!-- Preset formats -->
+{{ document.created | localize_date('short', 'en_US') }}
+<!-- Output: "1/15/24" -->
+
+{{ document.created | localize_date('medium', 'en_US') }}
+<!-- Output: "Jan 15, 2024" -->
+
+{{ document.created | localize_date('long', 'en_US') }}
+<!-- Output: "January 15, 2024" -->
+
+{{ document.created | localize_date('full', 'en_US') }}
+<!-- Output: "Monday, January 15, 2024" -->
+
+<!-- Different locales -->
+{{ document.created | localize_date('medium', 'fr_FR') }}
+<!-- Output: "15 janv. 2024" -->
+
+{{ document.created | localize_date('medium', 'de_DE') }}
+<!-- Output: "15.01.2024" -->
+
+<!-- Custom patterns -->
+{{ document.created | localize_date('dd/MM/yyyy', 'en_GB') }}
+<!-- Output: "15/01/2024" -->
+```
+
+See the [supported format codes](https://unicode.org/reports/tr35/tr35-dates.html#Date_Format_Patterns) for more options.
+
+### Format Presets
+
+-   **short**: Abbreviated format (e.g., "1/15/24")
+-   **medium**: Medium-length format (e.g., "Jan 15, 2024")
+-   **long**: Long format with full month name (e.g., "January 15, 2024")
+-   **full**: Full format including day of week (e.g., "Monday, January 15, 2024")
+
+#### Additional Variables
+
+-   `{{ tag_name_list }}`: A list of tag names applied to the document, ordered by the tag name. Note this is a list, not a single string
+-   `{{ custom_fields }}`: A mapping of custom field names to their type and value. A user can access the mapping by field name or check if a field is applied by checking its existence in the variable.
+
+!!! tip
+
+    To access a custom field which has a space in the name, use the `get_cf_value` filter.  See the examples below.
+    This helps get fields by name and handle a default value if the named field is not attached to a Document.
+
+#### Examples
+
+This example will construct a path based on the archive serial number range:
+
+```jinja
+somepath/
+{% if document.archive_serial_number >= 0 and document.archive_serial_number <= 200 %}
+  asn-000-200/{{title}}
+{% elif document.archive_serial_number >= 201 and document.archive_serial_number <= 400 %}
+  asn-201-400
+  {% if document.archive_serial_number >= 201 and document.archive_serial_number < 300 %}
+    /asn-2xx
+  {% elif document.archive_serial_number >= 300 and document.archive_serial_number < 400 %}
+    /asn-3xx
+  {% endif %}
+{% endif %}
+/{{ title }}
+```
+
+For a document with an ASN of 205, it would result in `somepath/asn-201-400/asn-2xx/Title.pdf`, but
+a document with an ASN of 355 would be placed in `somepath/asn-201-400/asn-3xx/Title.pdf`.
+
+```jinja
+{% if document.mime_type == "application/pdf" %}
+  pdfs
+{% elif document.mime_type == "image/png" %}
+  pngs
+{% else %}
+  others
+{% endif %}
+/{{ title }}
+```
+
+For a PDF document, it would result in `pdfs/Title.pdf`, but for a PNG document, the path would be `pngs/Title.png`.
+
+To use custom fields:
+
+```jinja
+{% if "Invoice" in custom_fields %}
+  invoices/{{ custom_fields.Invoice.value }}
+{% else %}
+  not-invoices/{{ title }}
+{% endif %}
+```
+
+If the document has a custom field named "Invoice" with a value of 123, it would be filed into the `invoices/123.pdf`, but a document without the custom field
+would be filed to `not-invoices/Title.pdf`
+
+If the custom field is named "Invoice Number", you would access the value of it via the `get_cf_value` filter due to quirks of the Django Template Language:
+
+```jinja
+"invoices/{{ custom_fields|get_cf_value('Invoice Number') }}"
+```
+
+You can also use a custom `datetime` filter to format dates:
+
+```jinja
+invoices/
+{{ custom_fields|get_cf_value("Date Field","2024-01-01")|datetime('%Y') }}/
+{{ custom_fields|get_cf_value("Date Field","2024-01-01")|datetime('%m') }}/
+{{ custom_fields|get_cf_value("Date Field","2024-01-01")|datetime('%d') }}/
+Invoice_{{ custom_fields|get_cf_value("Select Field") }}_{{ custom_fields|get_cf_value("Date Field","2024-01-01")|replace("-", "") }}.pdf
+```
+
+This will create a path like `invoices/2022/01/01/Invoice_OptionTwo_20220101.pdf` if the custom field "Date Field" is set to January 1, 2022 and "Select Field" is set to `OptionTwo`.
+
+You can also use a custom `slugify` filter to slufigy text:
+
+```jinja
+{{ title | slugify }}
+```
+
+## Automatic recovery of invalid PDFs {#pdf-recovery}
+
+Paperless will attempt to "clean" certain invalid PDFs with `qpdf` before processing if, for example, the mime_type
+detection is incorrect. This can happen if the PDF is not properly formatted or contains errors.
+
+## Celery Monitoring {#celery-monitoring}
+
+The monitoring tool
+[Flower](https://flower.readthedocs.io/en/latest/index.html) can be used
+to view more detailed information about the health of the celery workers
+used for asynchronous tasks. This includes details on currently running,
+queued and completed tasks, timing and more. Flower can also be used
+with Prometheus, as it exports metrics. For details on its capabilities,
+refer to the [Flower](https://flower.readthedocs.io/en/latest/index.html)
+documentation.
+
+Flower can be enabled with the setting [PAPERLESS_ENABLE_FLOWER](configuration.md#PAPERLESS_ENABLE_FLOWER).
+To configure Flower further, create a `flowerconfig.py` and
+place it into the `src/paperless` directory. For a Docker
+installation, you can use volumes to accomplish this:
+
+```yaml
+services:
+    # ...
+    webserver:
+        environment:
+            - PAPERLESS_ENABLE_FLOWER
+        ports:
+            - 5555:5555 # (2)!
+        # ...
+        volumes:
+            - /path/to/my/flowerconfig.py:/usr/src/paperless/src/paperless/flowerconfig.py:ro # (1)!
+```
+
+1. Note the `:ro` tag means the file will be mounted as read only.
+2. By default, Flower runs on port 5555, but this can be configured.
+
+## Custom Container Initialization
+
+The Docker image includes the ability to run custom user scripts during
+startup. This could be utilized for installing additional tools or
+Python packages, for example. Scripts are expected to be shell scripts.
+
+To utilize this, mount a folder containing your scripts to the custom
+initialization directory, `/custom-cont-init.d` and place
+scripts you wish to run inside. For security, the folder must be owned
+by `root` and should have permissions of `a=rx`. Additionally, scripts
+must only be writable by `root`.
+
+Your scripts will be run directly before the webserver completes
+startup. Scripts will be run by the `root` user.
+If you would like to switch users, the utility `gosu` is available and
+preferred over `sudo`.
+
+This is an advanced functionality with which you could break functionality
+or lose data. If you experience issues, please disable any custom scripts
+and try again before reporting an issue.
+
+For example, using Docker Compose:
+
+```yaml
+services:
+    # ...
+    webserver:
+        # ...
+        volumes:
+            - /path/to/my/scripts:/custom-cont-init.d:ro # (1)!
+```
+
+1. Note the `:ro` tag means the folder will be mounted as read only. This is for extra security against changes
+
+## MySQL Caveats {#mysql-caveats}
+
+### Case Sensitivity
+
+The database interface does not provide a method to configure a MySQL
+database to be case-sensitive. A case-**in**sensitive database prevents a user from creating a
+tag `Name` and `NAME` as they are considered the same.
+
+However, there is a downside to turning on case sensitivity, as it also makes searches case-sensitive,
+so for example a document with the title `Invoice` won't be found when searching for `invoice`.
+
+Per Django documentation, making a database case-sensitive requires manual intervention.
+To enable case sensitive tables, you can execute the following command
+against each table:
+
+`ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
+
+You can also set the default for new tables (this does NOT affect
+existing tables) with:
+
+`ALTER DATABASE <db_name> CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;`
+
+!!! warning
+
+    Using mariadb version 10.4+ is recommended. Using the `utf8mb3` character set on
+    an older system may fix issues that can arise while setting up Paperless-ngx but
+    `utf8mb3` can cause issues with consumption (where `utf8mb4` does not).
+
+For more information on this topic, you can refer to [this](https://code.djangoproject.com/ticket/9682) Django issue.
+
+### Missing timezones
+
+MySQL as well as MariaDB do not have any timezone information by default (though some
+docker images such as the official MariaDB image take care of this for you) which will
+cause unexpected behavior with date-based queries.
+
+To fix this, execute one of the following commands:
+
+MySQL: `mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p`
+
+MariaDB: `mariadb-tzinfo-to-sql /usr/share/zoneinfo | mariadb -u root mysql -p`
+
+## Barcodes {#barcodes}
+
+Paperless is able to utilize barcodes for automatically performing some tasks.
+
+At this time, the library utilized for detection of barcodes supports the following types:
+
+-   AN-13/UPC-A
+-   UPC-E
+-   EAN-8
+-   Code 128
+-   Code 93
+-   Code 39
+-   Codabar
+-   Interleaved 2 of 5
+-   QR Code
+-   SQ Code
+
+You may check for updates on the [zbar library homepage](https://github.com/mchehab/zbar).
+For usage in Paperless, the type of barcode does not matter, only the contents of it.
+
+For how to enable barcode usage, see [the configuration](configuration.md#barcodes).
+The two settings may be enabled independently, but do have interactions as explained
+below.
+
+### Document Splitting {#document-splitting}
+
+When enabled, Paperless will look for a barcode with the configured value and create a new document
+starting from the next page. The page with the barcode on it will _not_ be retained. It
+is expected to be a page existing only for triggering the split.
+
+### Archive Serial Number Assignment
+
+When enabled, the value of the barcode (as an integer) will be used to set the document's
+archive serial number, allowing quick reference back to the original, paper document.
+
+If document splitting via barcode is also enabled, documents will be split when an ASN
+barcode is located. However, differing from the splitting, the page with the
+barcode _will_ be retained. This allows application of a barcode to any page, including
+one which holds data to keep in the document.
+
+### Tag Assignment
+
+When enabled, Paperless will parse barcodes and attempt to interpret and assign tags.
+
+See the relevant settings [`PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE`](configuration.md#PAPERLESS_CONSUMER_ENABLE_TAG_BARCODE)
+and [`PAPERLESS_CONSUMER_TAG_BARCODE_MAPPING`](configuration.md#PAPERLESS_CONSUMER_TAG_BARCODE_MAPPING)
+for more information.
+
+## Automatic collation of double-sided documents {#collate}
+
+!!! note
+
+    If your scanner supports double-sided scanning natively, you do not need this feature.
+
+This feature is turned off by default, see [configuration](configuration.md#collate) on how to turn it on.
+
+### Summary
+
+If you have a scanner with an automatic document feeder (ADF) that only scans a single side,
+this feature makes scanning double-sided documents much more convenient by automatically
+collating two separate scans into one document, reordering the pages as necessary.
+
+### Usage example
+
+Suppose you have a double-sided document with 6 pages (3 sheets of paper). First,
+put the stack into your ADF as normal, ensuring that page 1 is scanned first. Your ADF
+will now scan pages 1, 3, and 5. Then you (or your scanner, if it supports it) upload
+the scan into the correct sub-directory of the consume folder (`double-sided` by default;
+keep in mind that Paperless will _not_ automatically create the directory for you.)
+Paperless will then process the scan and move it into an internal staging area.
+
+The next step is to turn your stack upside down (without reordering the sheets of paper),
+and scan it once again, your ADF will now scan pages 6, 4, and 2, in that order. Once this
+scan is copied into the sub-directory, Paperless will collate the previous scan with the
+new one, reversing the order of the pages on the second, "even numbered" scan. The
+resulting document will have the pages 1-6 in the correct order, and this new file will
+then be processed as normal.
+
+!!! tip
+
+    When scanning the even numbered pages, you can omit the last empty pages, if there are
+    any. For example, if page 6 is empty, you only need to scan pages 2 and 4. _Do not_ omit
+    empty pages in the middle of the document.
+
+### Things that could go wrong
+
+Paperless will notice when the first, "odd numbered" scan has less pages than the second
+scan (this can happen when e.g. the ADF skipped a few pages in the first pass). In that
+case, Paperless will remove the staging copy as well as the scan, and give you an error
+message asking you to restart the process from scratch, by scanning the odd pages again,
+followed by the even pages.
+
+It's important that the scan files get consumed in the correct order, and one at a time.
+You therefore need to make sure that Paperless is running while you upload the files into
+the directory; and if you're using [polling](configuration.md#polling), make sure that
+`CONSUMER_POLLING` is set to a value lower than it takes for the second scan to appear,
+like 5-10 or even lower.
+
+Another thing that might happen is that you start a double sided scan, but then forget
+to upload the second file. To avoid collating the wrong documents if you then come back
+a day later to scan a new double-sided document, Paperless will only keep an "odd numbered
+pages" file for up to 30 minutes. If more time passes, it will consider the next incoming
+scan a completely new "odd numbered pages" one. The old staging file will get discarded.
+
+### Interaction with "subdirs as tags"
+
+The collation feature can be used together with the [subdirs as tags](configuration.md#consume_config)
+feature (but this is not a requirement). Just create a correctly named double-sided subdir
+in the hierarchy and upload your scans there. For example, both `double-sided/foo/bar` as
+well as `foo/bar/double-sided` will cause the collated document to be treated as if it
+were uploaded into `foo/bar` and receive both `foo` and `bar` tags, but not `double-sided`.
+
+### Interaction with document splitting
+
+You can use the [document splitting](#document-splitting) feature, but if you use a normal
+single-sided split marker page, the split document(s) will have an empty page at the front (or
+whatever else was on the backside of the split marker page.) You can work around that by having
+a split marker page that has the split barcode on _both_ sides. This way, the extra page will
+get automatically removed.
+
+## SSO and third party authentication with Paperless-ngx
+
+Paperless-ngx has a built-in authentication system from Django but you can easily integrate an
+external authentication solution using one of the following methods:
+
+### Remote User authentication
+
+This is a simple option that uses remote user authentication made available by certain SSO
+applications. See the relevant configuration options for more information:
+[PAPERLESS_ENABLE_HTTP_REMOTE_USER](configuration.md#PAPERLESS_ENABLE_HTTP_REMOTE_USER),
+[PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME](configuration.md#PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME)
+and [PAPERLESS_LOGOUT_REDIRECT_URL](configuration.md#PAPERLESS_LOGOUT_REDIRECT_URL)
+
+### OpenID Connect and social authentication
+
+Version 2.5.0 of Paperless-ngx added support for integrating other authentication systems via
+the [django-allauth](https://github.com/pennersr/django-allauth) package. Once set up, users
+can either log in or (optionally) sign up using any third party systems you integrate. See the
+relevant [configuration settings](configuration.md#PAPERLESS_SOCIALACCOUNT_PROVIDERS) and
+[django-allauth docs](https://docs.allauth.org/en/latest/socialaccount/configuration.html)
+for more information.
+
+To associate an existing Paperless-ngx account with a social account, first login with your
+regular credentials and then choose "My Profile" from the user dropdown in the app and you
+will see options to connect social account(s). If enabled, signup options will be available
+on the login page.
+
+As an example, to set up login via Github, the following environment variables would need to be
+set:
+
+```conf
+PAPERLESS_APPS="allauth.socialaccount.providers.github"
+PAPERLESS_SOCIALACCOUNT_PROVIDERS='{"github": {"APPS": [{"provider_id": "github","name": "Github","client_id": "<CLIENT_ID>","secret": "<CLIENT_SECRET>"}]}}'
+```
+
+Or, to use OpenID Connect ("OIDC"), via Keycloak in this example:
+
+```conf
+PAPERLESS_APPS="allauth.socialaccount.providers.openid_connect"
+PAPERLESS_SOCIALACCOUNT_PROVIDERS='
+{"openid_connect": {"APPS": [{"provider_id": "keycloak","name": "Keycloak","client_id": "paperless","secret": "<CLIENT_SECRET>","settings": { "server_url": "https://<KEYCLOAK_SERVER>/realms/<REALM>/.well-known/openid-configuration"}}]}}'
+```
+
+More details about configuration option for various providers can be found in the [allauth documentation](https://docs.allauth.org/en/latest/socialaccount/providers/index.html#provider-specifics).
+
+### Disabling Regular Login
+
+Once external auth is set up, 'regular' login can be disabled with the [PAPERLESS_DISABLE_REGULAR_LOGIN](configuration.md#PAPERLESS_DISABLE_REGULAR_LOGIN) setting and / or users can be automatically
+redirected with the [PAPERLESS_REDIRECT_LOGIN_TO_SSO](configuration.md#PAPERLESS_REDIRECT_LOGIN_TO_SSO) setting.
+
+## Decryption of encrypted emails before consumption {#gpg-decryptor}
+
+Paperless-ngx can be configured to decrypt gpg encrypted emails before consumption.
+
+### Requirements
+
+You need a recent version of `gpg-agent >= 2.1.1` installed on your host.
+Your host needs to be setup for decrypting your emails via `gpg-agent`, see this [tutorial](https://www.digitalocean.com/community/tutorials/how-to-use-gpg-to-encrypt-and-sign-messages#encrypt-and-decrypt-messages-with-gpg) for instance.
+Test your setup and make sure that you can encrypt and decrypt files using your key
+
+```
+gpg --encrypt --armor -r person@email.com name_of_file
+gpg --decrypt name_of_file.asc
+```
+
+### Setup
+
+First, enable the [PAPERLESS_ENABLE_GPG_DECRYPTOR environment variable](configuration.md#PAPERLESS_ENABLE_GPG_DECRYPTOR).
+
+Then determine your local `gpg-agent` socket by invoking
+
+```
+gpgconf --list-dir agent-socket
+```
+
+on your host. A possible output is `~/.gnupg/S.gpg-agent`.
+Also find the location of your public keyring.
+
+If using docker, you'll need to add the following volume mounts to your `docker-compose.yml` file:
+
+```yaml
+webserver:
+    volumes:
+        - /home/user/.gnupg/pubring.gpg:/usr/src/paperless/.gnupg/pubring.gpg
+        - <path to gpg-agent socket>:/usr/src/paperless/.gnupg/S.gpg-agent
+```
+
+For a 'bare-metal' installation no further configuration is necessary. If you
+want to use a separate `GNUPG_HOME`, you can do so by configuring the [PAPERLESS_EMAIL_GNUPG_HOME environment variable](configuration.md#PAPERLESS_EMAIL_GNUPG_HOME).
+
+### Troubleshooting
+
+-   Make sure, that `gpg-agent` is running on your host machine
+-   Make sure, that encryption and decryption works from inside the container using the `gpg` commands from above.
+-   Check that all files in `/usr/src/paperless/.gnupg` have correct permissions
+
+```shell
+paperless@9da1865df327:~/.gnupg$ ls -al
+drwx------ 1 paperless paperless   4096 Aug 18 17:52 .
+drwxr-xr-x 1 paperless paperless   4096 Aug 18 17:52 ..
+srw------- 1 paperless paperless      0 Aug 18 17:22 S.gpg-agent
+-rw------- 1 paperless paperless 147940 Jul 24 10:23 pubring.gpg
+```