File types that can be processed
Datamolino accepts the file types most often used for invoices, receipts, and statements.
PDF - the most common format for invoices. Multi-page PDFs can be split into individual documents.
Images - JPEG, PNG, HEIC, HEIF, TIFF, BMP, and GIF. Photos taken on a phone are accepted.
Word and Excel - doc, docx, xls, xlsx.
HTML and plain text - rendered into a document for processing.
Email messages - .eml and TNEF (Outlook winmail) attachments.
Archives - ZIP, RAR, and 7Z. The contents are extracted and processed individually. Office files inside archives are recognised.
For best results, upload the original PDF with embedded text where possible. Datamolino can read scanned images too, but a well-aligned, high-contrast page extracts more reliably.
Image quality and size requirements
For image files, the picture has to be readable. Datamolino enforces a minimum and maximum size to avoid processing images that are too small to read or too large to handle.
Minimum dimensions: at least 160 pixels on both sides. Images smaller than this are skipped as "invalid file size".
Maximum dimensions: no more than 10,000 pixels on either side. Larger images are skipped as "invalid file size".
Within those limits, sharper images extract better. Avoid blurry photos, glare on the receipt, and pictures taken at sharp angles.
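If you want to check images before uploading, the two limits above can be sketched as a small local check. This is only an illustration of the stated rule, not Datamolino's own code: the function name `check_image_size` is hypothetical, it assumes you already know the image's pixel width and height, and it treats exactly 160 and exactly 10,000 pixels as allowed, based on the wording "at least" and "no more than".

```python
from typing import Optional

# Limits as stated in this article.
MIN_SIDE = 160      # both sides must be at least this many pixels
MAX_SIDE = 10_000   # neither side may exceed this many pixels


def check_image_size(width: int, height: int) -> Optional[str]:
    """Return the expected skip reason, or None if the size is within limits.

    Hypothetical helper: it mirrors the documented rule only and does not
    judge sharpness, glare, or angle.
    """
    if width < MIN_SIDE or height < MIN_SIDE:
        return "invalid file size"  # too small on at least one side
    if width > MAX_SIDE or height > MAX_SIDE:
        return "invalid file size"  # too large on at least one side
    return None


if __name__ == "__main__":
    for size in [(120, 800), (3024, 4032), (12000, 900)]:
        print(size, check_image_size(*size) or "within limits")
```

A typical phone photo (for example 3024 by 4032 pixels) falls comfortably inside both limits; a thumbnail that is 120 pixels wide does not.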
My photo was skipped as too small. What should I do?
Retake the photo closer to the document, or scan it at a higher resolution. If you photographed the document on your phone, the default camera setting usually produces an image well above 160 pixels - check that the file was not compressed or reduced to a thumbnail before upload.
File types that are skipped
Some file types are recognised but not processed. They are skipped at upload with a clear reason in Import history.
Data, spreadsheet, and calendar files: CSV, XML, ODS, ICS - these are skipped as "ignored file" because they are not invoice formats.
Code and config files: CSS, JS, BIN, EXE, SH, BAT, DMG - skipped as "ignored file".
RTF files, vCards, drawings, and archives in unusual formats are also skipped.
Unknown file types: if Datamolino cannot identify the file (MIME type and extension both unrecognised), it is skipped as "unknown file".
Virus-infected files are accepted into Datamolino but never processed - they are marked "infected" and the download is blocked.
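To predict what will happen to a batch of files before you upload it, the rules above can be approximated with a lookup by file extension. This is a rough sketch under stated assumptions: the function name `predicted_skip_reason` and the extension sets are illustrative, the check looks at the extension only (Datamolino also considers the MIME type, so files such as TNEF attachments are not covered here), and the skip reason for RTF and vCard files is not named in this article, so the sketch returns a generic label for them.

```python
from pathlib import PurePath
from typing import Optional

# Assumed extension lists, derived from the file types named in this article.
ACCEPTED = {
    "pdf", "jpeg", "jpg", "png", "heic", "heif", "tiff", "tif", "bmp", "gif",
    "doc", "docx", "xls", "xlsx", "html", "txt", "eml", "zip", "rar", "7z",
}
IGNORED = {"csv", "xml", "ods", "ics", "css", "js", "bin", "exe", "sh", "bat", "dmg"}
# Skipped according to the article, but the exact reason is not stated there.
OTHER_SKIPPED = {"rtf", "vcf"}


def predicted_skip_reason(filename: str) -> Optional[str]:
    """Return the likely skip reason for a file name, or None if it should be processed."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in ACCEPTED:
        return None
    if ext in IGNORED:
        return "ignored file"
    if ext in OTHER_SKIPPED:
        return "skipped (reason not stated)"  # placeholder label, not a Datamolino reason
    return "unknown file"  # extension not recognised by this sketch


if __name__ == "__main__":
    for name in ["invoice.PDF", "export.csv", "contact.vcf", "notes.xyz"]:
        print(name, "->", predicted_skip_reason(name) or "processed")
```

Import history remains the authoritative record of which files were skipped and why; a local check like this only helps you tidy a folder before upload.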
Learn more about finding skipped files in Import history.
How archives are handled
ZIP, RAR, and 7Z archives are extracted and the files inside are processed individually. Each extracted file inherits the document type and SPLIT setting from the parent upload.
Archives nested inside archives are not extracted - the inner archive is skipped with reason "embedded archive".
If extraction fails, the whole archive is skipped with reason "archive extraction failed". Individual files in a failed archive cannot be recovered.
An empty archive is skipped with reason "empty".
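The archive rules above can be illustrated with a short pre-upload check for ZIP files. This is a minimal sketch, not Datamolino's implementation: the function name `plan_zip_upload` and the shape of the returned dictionary are hypothetical, it covers ZIP only (Python's standard library does not read RAR or 7Z), and it identifies nested archives by file extension, which is an assumption.

```python
import io
import zipfile

# Assumed way to spot a nested archive: by file extension.
ARCHIVE_EXTENSIONS = (".zip", ".rar", ".7z")


def plan_zip_upload(data: bytes) -> dict:
    """Predict how a ZIP upload would be handled under the rules in this article."""
    result = {"archive_skipped": None, "process": [], "skipped_members": {}}
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            # Folder entries are not files, so leave them out.
            names = [info.filename for info in zf.infolist() if not info.is_dir()]
    except zipfile.BadZipFile:
        # The whole archive is skipped; nothing inside can be recovered.
        result["archive_skipped"] = "archive extraction failed"
        return result

    if not names:
        result["archive_skipped"] = "empty"
        return result

    for name in names:
        if name.lower().endswith(ARCHIVE_EXTENSIONS):
            # Archives inside archives are not extracted.
            result["skipped_members"][name] = "embedded archive"
        else:
            # Processed individually, inheriting the parent upload's settings.
            result["process"].append(name)
    return result


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("invoice.pdf", b"placeholder")
        zf.writestr("older-invoices.zip", b"placeholder")
    print(plan_zip_upload(buffer.getvalue()))
```

If the check reports an embedded archive, unpack the inner archive yourself and upload its contents directly so that those files are processed.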
Learn more about batch upload and SPLIT.
Why scan quality matters
Datamolino reads invoices automatically and pulls out fields like supplier, date, totals, and VAT. The cleaner the source document, the more fields are extracted correctly the first time.
Best practices for scanned and photographed documents:
Scan straight on - skewed pages are harder to read.
Use good lighting. Avoid shadows across the document.
Capture the entire document, including the edges. Cropped images may miss totals or headers.
Use the original PDF if you have it. Re-scanning a printed invoice almost always loses quality.
Avoid handwriting on the receipt itself - notes on top of the printed text can confuse extraction.
If a document is processed but several fields are blank or wrong, replacing the scan with a clearer copy and re-uploading usually gives better results.
