
Duplicate detection

file duplicates, data duplicates, safeguard against invoice duplicates

Written by Jan Korecky
Updated over 2 weeks ago

What this article covers:

  • Duplicate detection overview

  • File duplicates

  • Data duplicates

  • How data duplicate matching works


Duplicate detection overview

Datamolino automatically checks for duplicate documents at two stages: immediately on upload, and after extraction. Understanding how each type works helps you decide when action is needed and whether you'll be charged.

👉 What types of duplicates are there?

  • File duplicates - detected immediately on upload, before any extraction occurs. These are byte-for-byte identical files, such as the same email forwarded twice.

  • Data duplicates - detected after extraction is complete. These are documents with the same invoice data but uploaded as different files.


File duplicates

File duplicates are detected immediately - before any processing begins. Every time a file is uploaded, Datamolino checks whether an identical file already exists in the same folder.

👉 Where do I find them?

File duplicates appear in Import History. From there, you can choose to delete the file or process it.

👉 Will I be charged for file duplicates?

No - not unless you choose to process the file. File duplicates are held in Import History without charge.
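
One way to picture the file-level check is as a comparison of content fingerprints. The sketch below is illustrative only: it assumes a SHA-256 hash is used to recognise identical bytes, and the `FolderIndex` class and its method names are hypothetical. It does not describe how Datamolino implements the check internally.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest; identical bytes give an identical digest."""
    return hashlib.sha256(data).hexdigest()


class FolderIndex:
    """Hypothetical per-folder record of fingerprints of uploaded files."""

    def __init__(self) -> None:
        self._seen = set()

    def is_file_duplicate(self, data: bytes) -> bool:
        """Check an upload against the folder and remember it if it is new."""
        digest = file_fingerprint(data)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


folder = FolderIndex()
# The same file uploaded twice: the second upload is a file duplicate.
first = folder.is_file_duplicate(b"%PDF-1.7 invoice 2024-001")
second = folder.is_file_duplicate(b"%PDF-1.7 invoice 2024-001")
# A scan of the same invoice has different bytes, so it is not a file duplicate.
rescanned = folder.is_file_duplicate(b"%PDF-1.4 scanned copy of invoice 2024-001")
```

Because a check like this only looks at the bytes of the file, a scanned copy of an invoice that was already uploaded as a digital PDF passes through it. That case is caught later as a data duplicate.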


Data duplicates

Data duplicates are detected after processing is complete. Unlike file duplicates, these are documents that share the same invoice data but were uploaded as different files - for example, a digital PDF from a supplier and a scanned paper copy of the same invoice.

👉 Where do I find data duplicates?

The documents are labelled Data Duplicate and moved to the Needs Review tab. Open the document and you'll see a "Data duplicate detected" warning with a link to the original document.

From here, you can:

  • Dismiss the warning and export the document anyway

  • Delete the duplicate if it is not needed (via the three-line menu in the corner)

👉 Will I be charged for data duplicates?

Yes. Datamolino must process the document before it can compare the data, so the charge is incurred before the duplicate can be detected.

👉 What about Auto-Export?

Data duplicates are not automatically exported. If you use Auto-Export, you must resolve duplicates manually before they can be exported.


How data duplicate matching works

By default, Datamolino compares two fields: supplier name and invoice number. If both match an existing document in the folder, the document is flagged as a data duplicate.

👉 Can I make it more specific?

Yes, you can add up to two further conditions on top of these. The extra conditions are:

  • Total - the invoice amount must also match

  • Issue date - the invoice date must also match

You can enable one or both. With one extra condition enabled, all three conditions must match for a document to be flagged as a duplicate. With both enabled, all four must match.

To access this setting, open the folder menu and go to Accounting and Automation → Workflow → Duplicate detection.

Note: This setting is only available to folder administrators.

👉 When would I set additional detection criteria?

Two common scenarios:

  • A supplier re-issues an invoice with a corrected amount. With only the default criteria, the corrected invoice would be flagged as a duplicate because the supplier name and invoice number are the same. Adding Total means it is only flagged if the amounts also match.

  • You process receipts where Datamolino captures generic text in the invoice number field. Adding Issue date helps distinguish between separate transactions from the same supplier.
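
The matching rule and both scenarios above can be sketched in a few lines of Python. This is a simplified illustration, not Datamolino's implementation: the `Invoice` class, the `is_data_duplicate` function and its `match_total` / `match_issue_date` flags are hypothetical names, and exact equality is assumed when comparing fields.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Hypothetical container for the extracted fields used in matching."""
    supplier_name: str
    invoice_number: str
    total: Decimal
    issue_date: date


def is_data_duplicate(new: Invoice, existing: Invoice,
                      match_total: bool = False,
                      match_issue_date: bool = False) -> bool:
    """Apply the two default conditions plus any enabled extra conditions."""
    # Default conditions: supplier name and invoice number must both match.
    if new.supplier_name != existing.supplier_name:
        return False
    if new.invoice_number != existing.invoice_number:
        return False
    # Optional conditions: only checked when enabled (assumed exact equality).
    if match_total and new.total != existing.total:
        return False
    if match_issue_date and new.issue_date != existing.issue_date:
        return False
    return True


# Scenario 1: a re-issued invoice with a corrected amount.
original = Invoice("Acme Ltd", "INV-1001", Decimal("120.00"), date(2024, 3, 1))
corrected = Invoice("Acme Ltd", "INV-1001", Decimal("125.00"), date(2024, 3, 1))
flagged_by_default = is_data_duplicate(corrected, original)
flagged_with_total = is_data_duplicate(corrected, original, match_total=True)

# Scenario 2: receipts where generic text lands in the invoice number field.
monday = Invoice("Corner Cafe", "RECEIPT", Decimal("4.50"), date(2024, 3, 4))
tuesday = Invoice("Corner Cafe", "RECEIPT", Decimal("4.50"), date(2024, 3, 5))
receipts_by_default = is_data_duplicate(tuesday, monday)
receipts_with_date = is_data_duplicate(tuesday, monday, match_issue_date=True)
```

In this sketch, the corrected invoice and the second receipt are flagged under the default criteria, and stop being flagged once the relevant extra condition (Total or Issue date) is enabled.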
