The Humble PDF and the Birth of Manual Data Entry
For much of the past thirty years, the PDF has been doing a job it was never designed to do.
When Adobe introduced the Portable Document Format in the early 1990s, its purpose was narrow and practical. A document should look the same wherever it is opened. Fonts, spacing, pagination, and layout should remain intact, regardless of system or software. What you see is what you get.
At the time, this solved a real problem.
As work became more digital, documents needed to move between different computers, organisations, and jurisdictions without losing meaning. The PDF achieved that by freezing the page. It prioritised visual fidelity over structure, appearance over logic.
What it did not attempt to do was make information easy for machines to work with.
That distinction seemed minor at first. It wouldn’t stay that way.
A format built for certainty, not computation
The PDF was designed as a final form. It assumed the document had already been created, reviewed, and agreed. Its role was to preserve, not to participate.
From a technical perspective, this made sense. Early PDFs encoded text and layout in ways optimised for rendering rather than reuse. Even now, two PDFs that appear identical on screen can be encoded very differently beneath the surface. Some contain selectable text; others rely on embedded images. Tables may look aligned while offering no underlying structure at all.
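To make that concrete, here is a minimal sketch of how the same visible text can be stored in two different ways. The two fragments are hypothetical, heavily simplified page content streams written for illustration, not taken from any real document. Tj and TJ are the PDF operators for showing a string and for showing an array of strings with individual spacing adjustments. The helper text_fragments is an assumed name, and it ignores escaped or nested parentheses for brevity.

```python
import re

# Two hypothetical, simplified content-stream fragments. Both would draw the
# same words on the page, but the second positions runs of glyphs individually.
STREAM_A = b"BT /F1 12 Tf 72 720 Td (Invoice No 4471) Tj ET"
STREAM_B = b"BT /F1 12 Tf 72 720 Td [(Inv) -15 (oice No) 30 ( 44) -10 (71)] TJ ET"

# Matches a literal string in parentheses. Escaped or nested parentheses are
# deliberately not handled in this sketch.
LITERAL = re.compile(rb"\(([^()\\]*)\)")


def text_fragments(stream: bytes) -> list[str]:
    """Return the literal strings in the order they appear in the stream."""
    return [m.group(1).decode("latin-1") for m in LITERAL.finditer(stream)]


fragments_a = text_fragments(STREAM_A)  # one intact string
fragments_b = text_fragments(STREAM_B)  # several pieces that split the words

print(fragments_a)
print(fragments_b)
print("".join(fragments_b) == fragments_a[0])
```

A person sees one invoice number in both cases. A naive extractor sees one string in the first file and four pieces in the second, and has to decide for itself where the words begin and end.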
The format did exactly what it set out to do. The problem is not that the PDF failed. It’s that the world around it changed.
How PDFs became freight infrastructure
In logistics and global trade, the conditions were right for PDFs to spread quietly.
Freight involves many independent actors operating across borders. Carriers, freight forwarders, shippers, ports, customs authorities, banks, and insurers all tend to rely on different systems, regulatory frameworks, and technical standards. Historically, there has been no single platform or shared data model connecting them.
The PDF required none of that alignment.
Invoices, bills of lading, packing lists, certificates, and customs documents could be created locally, sent by email, and opened by anyone. They looked official, were easy to archive, and were widely accepted by regulators and counterparties.
Over time, PDFs became neutral ground. They allowed trade to function without forcing systems to integrate or standards to converge.
That convenience carried consequences that were easy to overlook.
When digital workflows remained manual
As freight operations digitised, documents did not disappear. They multiplied.
Systems began recording bookings, movements, and statuses, but documents remained the practical source of truth. Data was copied from PDFs into transport management systems, customs portals, accounting platforms, and carrier interfaces. Values were retyped, checked, and reconciled by hand.
Trade facilitation research published by organisations such as the World Bank and the United Nations Economic Commission for Europe has repeatedly shown that administrative processing, rather than physical transport, is a major source of friction in global trade.
The manual work itself is careful and skilled. Errors carry financial, legal, and operational risk. Yet much of it remains invisible, absorbed into daily operations and rarely measured directly.
The result is a quiet contradiction. Workflows are described as digital, but depend heavily on human interpretation and repetition. Information moves quickly. Understanding often does not.
In practice, many organisations still rely on people to translate between documents and systems that cannot reliably share context.
Why automation struggles with PDFs
For years, PDFs have been the awkward middle layer in freight operations. Everyone uses them. No one really wants to.
Automation around PDFs is not new. Optical character recognition (OCR), which converts text in scanned images into machine-readable characters, has been trying to make sense of them for decades. Document parsers followed. The problem is not that extraction is impossible. It is that real-world documents are messy.
One carrier sends a clean digital export. Another sends a scan of a scan. Fonts shift. Tables break across pages. There is formatting buried in the file that you cannot see, but that quietly interferes with structured extraction. What looks obvious to a person is often inconsistent to a system.
Historically, that meant supervision: someone checking the output, someone handling the exceptions. The work did not disappear. It just moved.
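The supervision step can be sketched roughly as follows. This is an illustration under stated assumptions, not a description of any particular product: the field names (invoice_number, currency, total), the review function, and the rule that commas are thousands separators are all hypothetical. The point is only that anything the checks cannot confirm is routed to a person instead of being silently accepted.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation

# Hypothetical fields a freight invoice extractor might be expected to return.
REQUIRED = ("invoice_number", "currency", "total")


@dataclass
class ReviewResult:
    accepted: dict = field(default_factory=dict)    # values that passed checks
    exceptions: list = field(default_factory=list)  # items for a human to review


def review(extracted: dict) -> ReviewResult:
    """Accept fields that pass basic checks and queue the rest as exceptions."""
    result = ReviewResult()
    for name in REQUIRED:
        value = (extracted.get(name) or "").strip()
        if not value:
            result.exceptions.append(f"{name}: missing")
        elif name == "total":
            try:
                # Assumption: commas are thousands separators, not decimals.
                result.accepted[name] = Decimal(value.replace(",", ""))
            except InvalidOperation:
                result.exceptions.append(f"{name}: not a number ({value!r})")
        elif name == "currency" and not (len(value) == 3 and value.isalpha()):
            result.exceptions.append(f"{name}: unexpected code ({value!r})")
        else:
            result.accepted[name] = value
    return result


# A scan where OCR has read the digit 0 as the letter O in the total.
scanned = review({"invoice_number": "INV-4471", "currency": "EUR", "total": "1,2O4.50"})
print(scanned.accepted)
print(scanned.exceptions)
```

In this example the invoice number and currency pass, while the misread total lands in the exceptions list. That list is the work that moved: smaller than retyping everything, but still in need of a person.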
What has changed over the past year is the capability of large language models. The improvements have been noticeable, not theoretical. Recent updates have materially increased accuracy when pulling structured data from inconsistent PDFs. It is still not simple. But it is far better than it was.
PDFs are not going anywhere. They remain the practical standard for organisations that do not share systems. The difference now is that we can work with them more effectively than before.
Good-enough tools and unintended permanence
The persistence of the PDF is not a failure of innovation. It reflects how infrastructure tends to evolve in the real world.
Tools that work well enough tend to last. Over time, expectations change faster than the tools themselves. What began as a document format gradually became a transport layer for information it was never designed to carry.
Once embedded in regulation, contracts, and operational routines, replacement becomes difficult. Removing the PDF would require more than new technology. It would require new agreements, shared standards, and trust between parties.
Seen in that light, its endurance is unsurprising.
Understanding the work beneath the surface
Manual data entry is often framed as inefficiency. In freight, it is more accurately a form of compensation.
Documents persist because systems cannot fully rely on one another’s data. Human judgement fills the gaps. People reconcile discrepancies, interpret context, and ensure that what moves matches what was agreed.
This work keeps trade moving.
Before trying to eliminate it, it’s worth understanding why it exists.
The PDF did not create this reality. It allowed that reality to scale.
A quiet foundation worth noticing
In freight, the PDF was meant to be temporary: a practical bridge between incompatible systems.
Thirty years later, it remains part of the foundation of global trade.
Recognising that history matters. Not because the PDF needs defending, but because meaningful change in freight operations depends on understanding how work is actually done, not just how it looks on screen.


