In our last newsletter, we discussed the importance of “knowing your numbers”, the foundation of effective cost management. But for many organizations, turning that idea into action is a real challenge. One of the biggest obstacles? Simply getting your data into a usable format: something you can analyze and turn into real, useful insights.
In my experience, the most valuable data often comes from invoices. They contain detailed descriptions of purchases, item numbers, and transaction-level insights that are gold for cost analysis. The catch? Most of this data is locked away in PDF files.
For many of my clients, this means I’m regularly collecting and extracting data from stacks of PDF invoices, mostly on a quarterly basis.
For a new project, where we are trying to get an initial view of the spend, this often means converting thousands of PDF invoices into a consolidated digital format.
It’s not ideal, but it’s where the most actionable insights tend to live. Purchase Order (PO) data can help too, but it’s often patchy or incomplete, and usually requires a fair bit of cleanup before it’s useful.
Where to Begin? Start with Your Suppliers
If you’re not sure where to start, begin with your largest suppliers. Many can provide a consolidated download of your transactions in Excel or CSV format, and that’s a real win. With smaller suppliers you may not have that option, but it’s always worth asking. You’d be surprised what they can offer when prompted.
That said, in about 90% of the projects I work on, we’ve still had to go back to the PDF invoice data in the end. It’s more work upfront, but well worth the effort in the long run.
A Game-Changing Tool: Docparser
A few years ago, I faced the same challenge: piles of PDF invoices and no easy way to extract the data. I tried outsourcing the task offshore, but I tended to end up spending more time fixing the results than it saved me.
That’s when a colleague suggested I give Docparser a go. It’s a tool that extracts structured data from PDFs (and other formats). It’s not perfect by any means, but it’s powerful, especially with electronic PDFs and consistent invoice layouts. Best of all, their support team is really responsive and helpful; in my case I usually get a response overnight.
Once the rules have been set up for a particular client, it’s just a case of uploading the documents, and the extraction is complete in about 5 minutes, even for a batch of 300–400 invoices. For me it’s a massive boost in productivity, and a real game changer for anyone who needs to pull specific data out of large volumes of documents.
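For readers who like to see the next step, here is a minimal sketch of what consolidation can look like once the extracted invoice lines are sitting in CSV files. It is an illustration only, not how Docparser itself works: the column names (supplier, invoice_number, amount) and the function name total_spend_by_supplier are assumptions made up for this example, and real exports will need their own column mapping and cleanup.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def total_spend_by_supplier(csv_files):
    """Sum the hypothetical 'amount' column per supplier across several CSV exports."""
    totals = defaultdict(Decimal)  # Decimal() starts each supplier at 0
    for handle in csv_files:
        for row in csv.DictReader(handle):
            # Strip currency symbols and thousands separators, e.g. "$1,250.00".
            amount = row["amount"].replace("$", "").replace(",", "").strip()
            totals[row["supplier"].strip()] += Decimal(amount)
    return dict(totals)


# Two small in-memory files stand in for real quarterly exports (made-up data).
q1 = io.StringIO(
    'supplier,invoice_number,amount\n'
    'Acme,INV-001,"$1,250.00"\n'
    'Bolt,INV-002,300.50\n'
)
q2 = io.StringIO(
    'supplier,invoice_number,amount\n'
    'Acme,INV-003,749.50\n'
)

spend = total_spend_by_supplier([q1, q2])
for supplier, total in sorted(spend.items()):
    print(f"{supplier}: {total}")
```

Amounts are handled as Decimal rather than float so that cents add up exactly, which matters once thousands of invoice lines are being totalled.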
Need a Hand? Let’s Talk
If your data is stuck in PDFs or you’re not sure where to begin, you’re not alone, and it’s not the end of the road. There are tools and strategies that can help. Whether you want to tackle it yourself or get some support, I’m always happy to have a chat or roll up my sleeves and help you get started.

Grant Morrow
Principal Consultant
+61 415 203 575
gmorrow@eragroup.com