
What data Datamolino extracts from a document

What fields Datamolino reads from invoices, credit notes, and receipts, and what affects which fields appear.

Written by Lubica Jakubac

How extraction works

When a document arrives in Datamolino, it's processed automatically and data is captured. You can see the extracted values in the document view next to the original file.

The document moves through a short series of processing states before its fields become available. Once it reaches the Ready state, the captured data is shown and you can review, edit, or export it.

👉 Which document types does Datamolino extract data from?

Datamolino extracts data from purchase invoices, sales invoices, credit notes, receipts, purchase orders, and invoice statements. The set of fields read from each document depends on the document type - for example, amounts on credit notes are treated as negative, and receipts typically have fewer header fields than invoices.

👉 What about bank statements?

Bank statements use a separate processing pipeline that reads transactions rather than invoice header fields. They are not covered in this article.


Header fields Datamolino reads

Header fields identify who issued the document and when. Datamolino captures these for every supported document type, although a few only apply to invoices and credit notes.

  • Supplier name - the business issuing the document.

  • Supplier VAT number.

  • Invoice number or reference number.

  • Issue date - the date printed on the document.

  • Due date - when payment is due (invoices and credit notes).

  • Currency - the currency the document is issued in.

  • Description - taken from the first line on the document.

If a field is not printed on the document, Datamolino leaves it blank rather than guessing. You can fill it in manually - learn more about how to edit document fields.


Totals and tax fields

Datamolino reads the sums on the document so they can be reconciled and exported to your accounting software.

  • Sub-total (net) - the amount before tax.

  • Tax amount - the VAT or sales tax printed on the document.

  • Total - the gross amount payable.

When the document carries multiple tax rates, the breakdown is captured per rate so it can be exported as a tax summary. Credit notes are flagged separately so the amounts are treated as negative on export.
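To illustrate what a per-rate breakdown means, here is a minimal Python sketch. It is not Datamolino's implementation or export format: the function name `tax_summary`, the tuple layout, and the sample figures are all assumptions made for this example.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical helper: groups amounts by tax rate, as a tax summary might.
def tax_summary(lines, is_credit_note=False):
    """Sum net and tax per rate; credit note amounts are made negative."""
    sign = Decimal(-1) if is_credit_note else Decimal(1)
    summary = defaultdict(lambda: {"net": Decimal("0"), "tax": Decimal("0")})
    for rate, net, tax in lines:
        summary[rate]["net"] += sign * Decimal(net)
        summary[rate]["tax"] += sign * Decimal(tax)
    return dict(summary)

# Invented sample data: (tax rate in %, net amount, tax amount).
sample = [("20", "100.00", "20.00"), ("5", "40.00", "2.00"), ("20", "50.00", "10.00")]
invoice_summary = tax_summary(sample)
credit_note_summary = tax_summary(sample, is_credit_note=True)
```

In this sketch, the two 20% entries are combined into one row, and the same data flagged as a credit note produces the same rows with negative amounts.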

If the captured totals do not add up - for example, when the captured sub-total plus tax does not match the total calculated from the coding - the document is marked with a checksum error and cannot be exported until the figures are corrected. Learn more about checksum errors and how to fix them.
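The arithmetic behind this kind of check can be sketched in a few lines of Python. This is an illustration only, not Datamolino's actual logic: the function name `totals_add_up`, the rounding tolerance, and the sample figures are assumptions.

```python
from decimal import Decimal

# Hypothetical helper: shows the idea of a totals check, nothing more.
def totals_add_up(sub_total, tax, total, tolerance="0.01"):
    """Return True when sub-total plus tax equals the total within a tolerance."""
    difference = Decimal(sub_total) + Decimal(tax) - Decimal(total)
    return abs(difference) <= Decimal(tolerance)

# Invented sample figures.
consistent = totals_add_up("100.00", "20.00", "120.00")
inconsistent = totals_add_up("100.00", "20.00", "125.00")
```

Here the first set of figures adds up, while the second would need one of the three amounts corrected before the check passes.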


Line items

In addition to the document totals, Datamolino can extract the line items printed on the document - one row per item with its own description, quantity, unit price, tax, and amount.

Line item extraction is a separate capability and is only available if your plan includes it. If your plan does not include line items, the section is hidden on the document and only document-level totals are captured.

👉 Are line items extracted from every document?

Line items are captured from all document types - including receipts, scans, and PDF invoices - but quality affects results. Blurry scans, low-quality images, and handwritten documents are less likely to yield complete or accurate line item data.

Where line items are captured, you can review and adjust them in the line items table on the document. Learn more about how to enter, edit, and work with line items.

👉 What is captured per line item?

For each row, Datamolino reads the line item description and the amounts. Where the document prints them, it also reads the quantity, unit price, tax rate, and net or gross totals per row.
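One way to picture a captured row is as a record in which the description and amount are always present and the other values are filled in only when the document prints them. The Python sketch below is an assumption made for illustration: the class `LineItem`, its field names, and the helper `missing_fields` are hypothetical and do not reflect Datamolino's data model.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical record for one line item; field names are illustrative only.
@dataclass
class LineItem:
    description: str
    amount: Decimal
    quantity: Optional[Decimal] = None      # only if printed on the document
    unit_price: Optional[Decimal] = None    # only if printed on the document
    tax_rate: Optional[Decimal] = None      # only if printed on the document
    net_total: Optional[Decimal] = None     # only if printed on the document
    gross_total: Optional[Decimal] = None   # only if printed on the document

def missing_fields(item):
    """List the optional fields that were not captured for this row."""
    return [name for name, value in vars(item).items() if value is None]

# Invented sample row: the document printed a quantity but no unit price.
row = LineItem("Printer paper", Decimal("12.50"), quantity=Decimal("5"))
not_captured = missing_fields(row)
```

A row like this can still be reviewed and completed manually in the line items table.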


What can affect extraction

The same field is not always extracted the same way for every document. A few things influence what you see on screen.

👉 Does my plan affect what is extracted?

Yes. The biggest difference is line items - they are only displayed if your plan includes line item extraction. Header fields and document totals are displayed on all plans.

👉 Does my accounting software affect what is extracted?

The fields Datamolino reads off the document are the same regardless of which accounting software your folder is connected to. What changes is how those fields are presented and which extra coding fields appear next to them - for example, Xero shows account codes and tracking categories, QuickBooks shows class and customer, and FreeAgent shows different fields again. The export format you choose (Totals, Line Items, or Tax Summary) also controls which amount fields are visible at the document level versus the item level.

👉 What if a field is missing or wrong?

Any extracted value can be corrected directly on the document. If a field is consistently missing or wrong for a particular supplier, contact support@datamolino.com - the team can look into it and improve extraction for that supplier. Learn more about how to edit document fields.

👉 When can I see the extracted data?

Extracted data is available once the document reaches the Ready state. Before that, the document is still being processed and fields are not yet visible or editable. Learn more about document statuses.
