AI that reads a sea-service letter — turning a messy PDF into structured vessel time
John C. Thomas
Founder, BlueWave Projects
A sea-service letter is one of the messiest documents in the maritime world. It is how a mariner proves the time they have spent aboard a vessel — and it is the gate to every USCG license and endorsement upgrade. It is also, almost always, an unstructured mess: a PDF or a phone photo of a letter on a company's letterhead, every operator formatting it differently, the vessel name buried in a paragraph, the tonnage written three ways, the dates in whatever style the office manager preferred.
Mariners hate entering this by hand. We built AI that reads it for them. Here is how, and why it is harder than it looks.
Why this is not a solved problem
"Extract fields from a document" sounds like solved territory — and for a structured form, it is. A sea-service letter is the opposite of a form. There is no fixed layout. One operator writes "M/V Pacific Hunter, 1,600 GRT, served as Master from 03/2022 to 11/2023, coastwise." The next writes a three-paragraph narrative with the vessel, tonnage, route, and days scattered through it. The next sends a slightly rotated scan where the tonnage is inside a header logo.
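The fields a mariner actually needs out of it are specific and consequential: the vessel, its tonnage, the capacity served in, the route, the start and end dates, and the resulting days of service.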
Get the tonnage or the days wrong and you can misadvise someone about an endorsement they are not actually eligible for yet. The cost of a confident wrong answer here is real.
The approach: vision in, structured data out
The pipeline is short, but every stage earns its place. First the document is normalized: a photo is de-skewed and made legible, a PDF is rendered. Then a vision-capable model reads the whole document with layout awareness rather than as flat OCR text, so the tonnage in a header and the dates in a paragraph are both in play.
The model is not asked to summarize the letter. It is given a strict schema — vessel, tonnage, capacity, route, start date, end date, computed days — and required to return data conforming to it, with a field left explicitly empty when the letter genuinely does not state it. That last part matters: the system has to be willing to say "the letter does not specify horsepower" instead of inventing a plausible number. An invented field is worse than a missing one.
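As a rough illustration of that contract (a minimal sketch, not the production schema), the Python below declares the seven fields named above and rejects any model output that adds or omits a key. The names `SeaServiceRecord` and `parse_model_output`, the JSON transport, and the ISO date format are all assumptions made for the example. The point it shows is that "not stated" is a legal value, `null`, while a missing or invented key is an error.

```python
import json
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SeaServiceRecord:
    """Hypothetical schema: every field may be None, meaning 'the letter does not say'."""
    vessel: Optional[str]
    tonnage_grt: Optional[float]
    capacity: Optional[str]
    route: Optional[str]
    start_date: Optional[date]
    end_date: Optional[date]
    days: Optional[int]


def parse_model_output(raw_json: str) -> SeaServiceRecord:
    """Accept only a JSON object with exactly the schema's keys."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(SeaServiceRecord)}
    if set(data) != expected:
        # Extra keys are invented fields; absent keys are silent omissions.
        raise ValueError(f"schema mismatch: {sorted(set(data) ^ expected)}")
    for key in ("start_date", "end_date"):
        if data[key] is not None:
            # Assumption: the model is told to emit ISO dates (YYYY-MM-DD).
            data[key] = date.fromisoformat(data[key])
    return SeaServiceRecord(**data)


# Illustrative output for the one-line letter quoted earlier. The day-level
# dates are made up for the example; that letter gives months only and no
# day count, so "days" is explicitly null rather than guessed.
raw = (
    '{"vessel": "M/V Pacific Hunter", "tonnage_grt": 1600, '
    '"capacity": "Master", "route": "coastwise", '
    '"start_date": "2022-03-01", "end_date": "2023-11-30", "days": null}'
)
record = parse_model_output(raw)
print(record.vessel, record.days)
```

Requiring every key to be present, even when its value is `null`, forces the model to make an explicit statement about each field instead of quietly skipping the hard ones.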
Finally, the extracted values are validated against what is mechanically checkable — dates in order, days consistent with the range, tonnage within plausible bounds — and anything that fails is flagged for the human rather than silently accepted.
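Those three checks are simple enough to sketch directly. The version below is an illustration under stated assumptions: the function name `validate`, the flag wording, and especially the tonnage bounds are placeholders rather than values from a real system. It returns a list of flags for the reviewer instead of raising, since a failed check is something to show the mariner, not a reason to discard the draft.

```python
from datetime import date
from typing import List, Optional

# Assumed plausibility bounds for gross tonnage; tune these for a real fleet.
MIN_GRT = 1.0
MAX_GRT = 300_000.0


def validate(
    start: Optional[date],
    end: Optional[date],
    days: Optional[int],
    tonnage_grt: Optional[float],
) -> List[str]:
    """Return human-readable flags; an empty list means nothing looked wrong."""
    flags: List[str] = []
    if start is not None and end is not None:
        if end < start:
            flags.append("end date precedes start date")
        elif days is not None and days > (end - start).days + 1:
            # Claimed days cannot exceed the inclusive calendar span.
            flags.append("days exceed the calendar span of the date range")
    if days is not None and days < 0:
        flags.append("days is negative")
    if tonnage_grt is not None and not (MIN_GRT <= tonnage_grt <= MAX_GRT):
        flags.append("tonnage outside plausible bounds")
    return flags


# Ten calendar days cannot contain thirty days of service.
print(validate(date(2022, 1, 1), date(2022, 1, 10), 30, 1600.0))
```

Fields the letter left empty are skipped rather than flagged here; whether a missing field should itself prompt the reviewer is a separate product decision.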
The human stays in the loop, on purpose
We do not auto-submit anything. The extraction is a draft the mariner reviews. The win is not "the AI did it"; the win is that the mariner goes from a blank form and a drawer full of letters to a pre-filled draft they correct in a minute instead of transcribing for twenty. The model does the tedious reading; the human does the judgment.
This is the same philosophy we apply everywhere we put a model in production: the AI shows up where a human is otherwise stuck doing transcription, and a deterministic layer checks its work. The model is fast and tireless and occasionally wrong; the validation and the human review are what make it trustworthy.
What I would tell another team building document AI
If you have a pile of messy real-world documents you wish were structured data, [we have built this before](https://bluewaveprojects.com/booking).