Duplicate detection
Written by Jan Korecky
Updated over a week ago

Datamolino automatically detects duplicates when the same document is uploaded more than once into the same folder, so you can export your documents to your accounting software with peace of mind. This makes working with Datamolino even more efficient.

How does it work?

We recognise duplicates on two levels:

  • File duplicates - an identical file uploaded twice, e.g. the same email forwarded twice. These files are not yet extracted and are listed in the Import history.

  • Data duplicates - the same content but NOT an identical file, e.g. an invoice first uploaded as a digital file emailed by the supplier and later uploaded as a scan of the paper copy. These are detected after extraction and marked with a yellow icon inside your folder.

File duplicates

File duplicates are recognised immediately. With every upload, Datamolino checks whether an identical file has been uploaded before. You can find file duplicates in the Import history and choose to either delete or process each one. This helps if you accidentally upload an invoice to Datamolino twice. You only pay for a duplicate invoice if you choose to process the 'File Duplicate'.
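The article does not say how Datamolino decides that two uploads are identical. One common way to recognise byte-for-byte identical files is to compare a cryptographic hash of each file's content within a folder. The sketch below assumes that approach (SHA-256 via Python's `hashlib`); the class `FolderImportHistory` and its method `register_upload` are hypothetical names for illustration, not part of any Datamolino API.

```python
import hashlib
from typing import Dict, Optional


class FolderImportHistory:
    """Hypothetical per-folder record of uploads, keyed by content hash.

    Illustrative sketch only; Datamolino's actual implementation is not documented.
    """

    def __init__(self) -> None:
        # Maps the SHA-256 hex digest of a file's bytes -> name of the first upload.
        self._seen: Dict[str, str] = {}

    def register_upload(self, filename: str, content: bytes) -> Optional[str]:
        """Record an upload; return the original filename if it is a file duplicate."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            # Identical bytes were already uploaded to this folder.
            return self._seen[digest]
        self._seen[digest] = filename
        return None


# Usage: the same email attachment forwarded twice.
history = FolderImportHistory()
first = history.register_upload("invoice.pdf", b"%PDF-1.4 example")  # None
second = history.register_upload("invoice (1).pdf", b"%PDF-1.4 example")  # "invoice.pdf"
```

Hashing only catches exact copies: a scan of the same invoice produces different bytes, which is why the data-level check described next is needed.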

This is how file duplicates appear in the Import history:


Data duplicates

Data duplicates are recognised after processing is finished. Although the files look the same, they might have been created at a different time and in a different format. For example, you upload a digital file that your supplier emailed to you and later upload a scan of the same invoice that you received on paper by post. In such a case, we first need to process the documents and recognise their content. By comparing the invoice number and supplier name, we can tell you whether the invoice is a duplicate. However, because the file has to be processed, you will be charged for this invoice.
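To illustrate the default rule (supplier name plus invoice number), here is a minimal sketch that keeps an index of already-processed documents per folder and looks up each newly extracted one. The `ExtractedInvoice` fields, the `find_data_duplicate` helper and the normalisation step (ignoring case and extra whitespace) are hypothetical assumptions for illustration; the article does not describe how Datamolino compares the values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ExtractedInvoice:
    """Hypothetical extracted fields; names are illustrative assumptions."""
    document_id: str
    supplier_name: str
    invoice_number: str


def _normalise(value: str) -> str:
    # Assumed normalisation: collapse whitespace and ignore letter case.
    return " ".join(value.split()).casefold()


def find_data_duplicate(
    invoice: ExtractedInvoice,
    index: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Return the original document id if supplier name and invoice number match."""
    key = (_normalise(invoice.supplier_name), _normalise(invoice.invoice_number))
    if key in index:
        return index[key]
    index[key] = invoice.document_id
    return None


# Usage: an emailed PDF followed by a scan of the paper copy.
index: Dict[Tuple[str, str], str] = {}
emailed = ExtractedInvoice("doc-1", "ACME s.r.o.", "INV-2016-042")
scanned = ExtractedInvoice("doc-2", "Acme  S.R.O.", "INV-2016-042")
first_match = find_data_duplicate(emailed, index)   # None
second_match = find_data_duplicate(scanned, index)  # "doc-1"
```

Because the comparison runs on extracted values, both documents must be processed first, which matches the article's note that data duplicates are charged.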

If you are using our 'Auto-Export' feature, please note that duplicates are not exported automatically; you need to resolve them manually.


You will also get a link to the original invoice so you can review it. You can choose to dismiss the warning and export the file anyway.

Released on February 19, 2016.

Enable additional parameters

By default, Datamolino detects duplicates by matching the supplier name and invoice number. If needed, you can enable two additional parameters to narrow down the duplicate detection results:

  • Total

  • Issue date

You can add either one or both, depending on your use case. In practice, this means that all three (or four) conditions must match for a document to be flagged as a duplicate.

For example, if a supplier re-issues an invoice with a different amount and you do not want such documents flagged as duplicates, you can add 'Total'. As long as the totals differ, the re-issued invoice won't be flagged as a duplicate. Similarly, if you upload receipts, which often lack a specific number so that Datamolino captures generic text instead, you can add 'Total' or 'Issue date' so that different receipts from the same supplier are not flagged as duplicates of each other.
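The effect of the optional parameters can be pictured as extending the comparison key: supplier name and invoice number are always compared, and 'Total' and 'Issue date' are appended only when enabled, so every enabled field must match. This is a minimal sketch of that logic under assumed names (`ExtractedDocument`, `DuplicateSettings`, `duplicate_key`, `is_duplicate` are all hypothetical), not Datamolino's actual implementation or settings API.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Tuple


@dataclass(frozen=True)
class ExtractedDocument:
    """Hypothetical extracted fields; names are illustrative assumptions."""
    document_id: str
    supplier_name: str
    invoice_number: str
    total: Decimal
    issue_date: date


@dataclass(frozen=True)
class DuplicateSettings:
    # Supplier name and invoice number are always compared; these are optional.
    match_total: bool = False
    match_issue_date: bool = False


def duplicate_key(doc: ExtractedDocument, settings: DuplicateSettings) -> Tuple:
    """Build the comparison key from the default plus any enabled parameters."""
    key: Tuple = (doc.supplier_name.casefold(), doc.invoice_number.casefold())
    if settings.match_total:
        key += (doc.total,)
    if settings.match_issue_date:
        key += (doc.issue_date,)
    return key


def is_duplicate(a: ExtractedDocument, b: ExtractedDocument,
                 settings: DuplicateSettings) -> bool:
    # All enabled conditions must match for the document to be flagged.
    return duplicate_key(a, settings) == duplicate_key(b, settings)


# Usage: a supplier re-issues invoice INV-7 with a different amount.
original = ExtractedDocument("doc-1", "ACME", "INV-7", Decimal("100.00"), date(2022, 4, 1))
reissued = ExtractedDocument("doc-2", "ACME", "INV-7", Decimal("90.00"), date(2022, 4, 15))
default_flag = is_duplicate(original, reissued, DuplicateSettings())                   # True
total_flag = is_duplicate(original, reissued, DuplicateSettings(match_total=True))     # False
```

Adding a parameter can only reduce the number of flagged documents, which is why the article describes it as narrowing down the results.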

Note: Released in April 2022.
