The Preflight Checklist Your Procurement Team Has Never Run
Why AI vendor risk demands a pilot's cognitive posture, not a compliance officer's checklist, and what that looks like in practice.
Jason Walker
State CISO, Florida
Picture this: a helicopter pilot walks to the aircraft, glances at the maintenance log, and asks the crew chief, "Is this thing safe?" The crew chief says, "Passed inspection last quarter." The pilot climbs in and spins up the rotors.
No professional aviator operates that way. Not once. Not ever.
And yet that is almost exactly how most organizations assess third-party AI risk right now. A vendor submits a SOC 2 report dated eight months ago, someone in procurement checks a box, and an autonomous agent with access to sensitive customer records gets greenlighted into production. We have handed the controls to a system we do not fully understand, and our "preflight" was a document review.
The aviation metaphor is not decorative here. It points at a cognitive problem that sits underneath every technical and policy conversation about AI risk, and until organizations address it at that level, no framework is going to save them.
What Most People Get Wrong
The dominant assumption in vendor risk management is that the goal is to determine whether a vendor is safe or unsafe. Safe vendors get approved. Unsafe vendors get rejected or remediated. The compliance team runs a questionnaire, scores the responses, and produces a rating. Leadership sees green, yellow, or red. Decision made.
This model worked tolerably well when the thing being assessed was a payroll processor or a cloud storage provider. Those systems fail in predictable ways. A database goes down. An API returns an error code. The failure is detectable, often immediate, and bounded.
AI systems do not fail that way. A large language model integrated into a customer service workflow does not crash when it goes wrong. It continues operating. It produces outputs that look plausible, sometimes are plausible, and occasionally are dangerously wrong in ways that only surface weeks later in a legal filing or a regulatory inquiry. The failure mode is probabilistic and deeply context-dependent. A point-in-time assessment cannot capture that. A questionnaire score cannot represent it.
The compliance mindset asks: did the vendor pass? The risk quantification mindset asks: what does it cost when this specific thing fails, and how often should I expect that to happen?
Those are completely different questions. And most procurement teams, most legal departments, and most executives have never been trained to ask the second one.
The Cognitive Posture That Actually Works
Managing 35 state agencies, I watch this play out constantly. An agency wants to deploy an AI-driven document processing tool. The vendor has solid references, a clean security posture on paper, and an enthusiastic sales team. Someone from the agency's IT shop asks me if we should approve it.
My answer is always a set of questions, not a rating. What data does this system touch? What is the financial exposure if that data is reconstructed by a third party? What is the operational cost if this tool hallucinates on a benefits determination and that determination gets acted on before a human catches it? What is the remediation timeline if we need to pull this out of a workflow at scale? What is the regulatory penalty structure if this tool produces a biased output in a protected classification decision?
None of those questions have yes or no answers. They all have dollar figures attached to them, or they should. And here is where the pilot analogy earns its keep.
A preflight checklist does not ask "is this aircraft safe?" It assigns a go/no-go threshold to each specific system. Hydraulic pressure: within range or not. Rotor blade condition: within tolerance or not. Fuel quantity: above minimums or not. Each item has a measurable criterion and a defined consequence for failure. The pilot accepts residual risk not by ignoring it, but by knowing its shape, its likelihood, and its cost. Situational awareness is not the absence of risk. It is the precise mapping of risk so that decisions can be made with integrity.
That is the cognitive posture that AI risk demands, and it is far more familiar to anyone with a military or aviation background than to the average compliance professional. In the cockpit, vague assurances are a liability. Quantified parameters are the only currency that matters.
What This Looks Like in Practice
When my team evaluates an AI vendor now, we tier the assessment by consequence, not by vendor size or contract value. A productivity tool that embeds a writing assistant into an email client gets a lightweight review focused on identity management and data routing. A tool that makes or informs eligibility decisions for state services gets a different treatment entirely: structured analysis of failure cost by category, human-in-the-loop control verification, and a defined financial threshold above which the risk is not acceptable regardless of vendor reputation.
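To make the tiering concrete, here is a minimal Python sketch of how a consequence-tiered assessment with an explicit exposure tolerance might be recorded. The tier names, review items, vendor name, and dollar figures are illustrative placeholders, not our actual categories or thresholds.

```python
from dataclasses import dataclass

# Hypothetical consequence tiers and the review depth each one triggers.
REVIEW_BY_TIER = {
    "productivity": ["identity management", "data routing"],
    "eligibility": [
        "failure-cost analysis by category",
        "human-in-the-loop control verification",
        "quantified exposure vs. explicit tolerance",
    ],
}

@dataclass
class AIVendorAssessment:
    vendor: str
    tier: str                  # consequence tier, not vendor size or contract value
    estimated_exposure: float  # worst-credible loss for this tier's failure modes, in dollars
    tolerance: float           # exposure ceiling set by leadership, in dollars

    def required_reviews(self) -> list[str]:
        return REVIEW_BY_TIER[self.tier]

    def go_no_go(self) -> str:
        # Reputation never enters this check; only quantified exposure does.
        return "go" if self.estimated_exposure <= self.tolerance else "no-go"

assessment = AIVendorAssessment(
    vendor="ExampleDocAI",         # placeholder vendor name
    tier="eligibility",
    estimated_exposure=4_200_000,  # illustrative figure only
    tolerance=1_500_000,
)
print(assessment.required_reviews())
print(assessment.go_no_go())  # "no-go", regardless of references or sales enthusiasm
```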
That last piece matters. "Not acceptable regardless of vendor reputation" is a sentence most procurement conversations never reach, because the frame is always approval or rejection, not quantified exposure against an explicit tolerance. Defining a tolerance forces executives to have a real conversation about appetite. It surfaces disagreements that a green-yellow-red rating buries.
On the research side, where I study FAIR quantification methods at the doctoral level, the pattern I keep returning to is how rarely organizations define loss event frequency before they define controls. They build the fence, then argue about whether the fence is tall enough, never having agreed on what they are keeping out or how often it tries to get in. Quantification disciplines that process. It forces specificity. It makes the implicit explicit, which is the only way to have an honest risk conversation with a board or a legislative oversight committee.
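For readers unfamiliar with the mechanics, a stripped-down Monte Carlo sketch of FAIR's basic decomposition, loss event frequency times loss magnitude, looks something like the following. The frequency, loss range, and uniform draw are placeholders for illustration; a real FAIR analysis uses calibrated estimates and properly fitted distributions.

```python
import random

def simulated_annual_losses(lef_per_year: float, loss_low: float, loss_high: float,
                            trials: int = 10_000) -> list[float]:
    """Simulate total annual loss: how often does the loss event occur, and what
    does each occurrence cost? (Simple binomial approximation of frequency and a
    uniform loss magnitude; real FAIR work uses calibrated distributions.)"""
    totals = []
    for _ in range(trials):
        events = sum(1 for _ in range(12) if random.random() < lef_per_year / 12)
        totals.append(sum(random.uniform(loss_low, loss_high) for _ in range(events)))
    return totals

# Illustrative parameters only: a hallucinated determination acted on before a
# human catches it, expected roughly twice a year, costing $50k-$400k per event.
losses = simulated_annual_losses(lef_per_year=2.0, loss_low=50_000, loss_high=400_000)
print(f"Expected annual loss: ${sum(losses) / len(losses):,.0f}")
```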
What You Should Do Differently Starting Now
Stop asking your AI vendors if they are secure. Start asking your own teams what a specific failure event costs you. Build that number before the vendor conversation starts, so you are negotiating against a defined position rather than a vague sense of discomfort.
For each AI integration you are currently running or evaluating, define three things: the most likely failure mode, the financial exposure attached to that failure, and the threshold above which you would pull the system from production. If you cannot answer all three, you are not ready to deploy the system. You are not ready to approve the vendor. And you are, to push the metaphor one final time, climbing into an aircraft where nobody ran the preflight.
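A minimal sketch of that three-question record, with hypothetical field names and no real figures, might look like this; the point is that an empty field means the preflight has not been run.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AIDeploymentReadiness:
    """The three answers that must exist before deployment or vendor approval."""
    system: str
    most_likely_failure_mode: Optional[str] = None  # e.g., a hallucinated eligibility output
    financial_exposure: Optional[float] = None      # dollars attached to that failure
    pull_threshold: Optional[float] = None          # exposure above which it leaves production

    def preflight_complete(self) -> bool:
        # If any answer is missing, nobody has run the preflight.
        return None not in (self.most_likely_failure_mode,
                            self.financial_exposure,
                            self.pull_threshold)

check = AIDeploymentReadiness(system="document-processing-tool")  # placeholder name
print(check.preflight_complete())  # False: not ready to deploy, not ready to approve
```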
The framework exists. The methodology exists. The missing ingredient is the will to ask precise questions instead of comfortable ones.
Pilots call that airmanship. Risk professionals should start calling it by the same name.