The Questions That Come Before the Platform
If a skills score touches pay or promotion, you're legally running a selection procedure, whether anyone told you or not. The questions I'd ask before buying anything.
☕ 6-minute read
This week, we took apart the quiz, then the system that the quiz lives in. Those of you who replied kept asking a version of the same question: okay, so what do we do instead?
It’s a very fair question. Two critique pieces in a row earn that. So this one is the practical turn: the questions I’d want answered before any skills measurement gets built or bought, and the order I’d work them in.
I’ll be upfront about where this comes from. I once helped spend months on a skills taxonomy before anyone in the room asked what decision it would change. Nobody could name one. That project taught me more about skills strategy than the taxonomy ever did.
🧭 TL;DR
The first question is never “how do we measure skill?” It’s “What decision will this measurement change?” If the answer is a nicer dashboard, you have a reporting project, not a skills strategy.
Rigor should match stakes. The moment a skills score touches pay, promotion, or hiring, it legally becomes a selection procedure with a much higher burden of proof. Almost nobody in L&D checks this.
Inferred skills are not validated skills. Put a human confirmation gate between what the AI guesses and any decision that matters, and build manager observation muscle before you depend on it.
🎯 The question that reorders everything
What decision does this measurement feed, and what does it cost to be wrong?
That one question sorts the whole field. Recommending a learning path is low-stakes. If the inference engine guesses wrong, someone gets a mediocre course suggestion. Fuzzy data is fine there, and the AI-driven tools are honestly good at it.
Deciding pay, promotion, or who stays in a restructure is a different universe. Same skills data, completely different burden. Most of the skills conversations I’ve sat in on never separated the two, and that blur is where programs quietly become dangerous.
So before the demo, before the RFP, I’d make everyone in the room finish this sentence: “Because of this data, we will decide ___ differently.” No answer, no project.
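For teams that want the rule written down, here is a minimal Python sketch of that intake check. Every name in it (`MeasurementRequest`, `triage`, the `HIGH_STAKES_DECISIONS` set) is hypothetical, and the list of high-stakes decision types is an assumption that your HR and legal partners would need to define.

```python
from dataclasses import dataclass
from enum import Enum


class Stakes(Enum):
    LOW = "low"    # e.g. recommending a learning path
    HIGH = "high"  # e.g. pay, promotion, hiring, restructuring


# Hypothetical list for illustration; a real one comes from HR and legal.
HIGH_STAKES_DECISIONS = {"pay", "promotion", "hiring", "restructure"}


@dataclass
class MeasurementRequest:
    name: str
    decision_changed: str  # completes "we will decide ___ differently"
    decision_type: str     # e.g. "learning_path" or "promotion"


def triage(request: MeasurementRequest) -> Stakes:
    """Reject requests that name no decision, then sort the rest by stakes."""
    if not request.decision_changed.strip():
        raise ValueError(f"{request.name}: no decision named, so no project")
    if request.decision_type in HIGH_STAKES_DECISIONS:
        return Stakes.HIGH
    return Stakes.LOW
```

A request with an empty `decision_changed` field never reaches design, and anything tagged high-stakes gets routed to the stricter track before a vendor is in the room.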
⚖️ The legal line almost nobody checks
Here’s the part that surprised me most when I went digging, and I’ve been doing this work a long time.
Under US employment law, any assessment that gates a consequential employment decision is a selection procedure. If it produces an adverse impact (a substantially lower selection rate for a protected group), the employer has to show it’s job-related and validated. None of this is new. The federal Uniform Guidelines on Employee Selection Procedures date to 1978, but skills platforms have made the rule newly easy to trip over. An AI infers a proficiency score, that score feeds into a promotion decision, and congratulations: your L&D tool is now a selection instrument that has never been validated.
And the risk didn’t go away when federal enforcement softened this year. It moved to private lawsuits and a growing patchwork of state laws. There’s a case currently working through federal court in which the software vendor itself, not just the employer, is being held liable for discriminatory screening. Buying from a big name doesn’t transfer the risk.
I’m not a lawyer, and this isn’t legal advice. The takeaway is simpler than the law: know which of your skills measures feed consequential decisions, and treat those few with a completely different level of rigor. Development data can be fuzzy. Decision data cannot.
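To make "adverse impact" concrete, here is a rough Python sketch of the four-fifths rule of thumb from the 1978 Uniform Guidelines: a group whose selection rate falls below 80% of the highest group's rate is generally treated as a signal worth investigating. The function names and the promotion counts are invented for illustration, and this is a screening heuristic, not a legal determination.

```python
def selection_rates(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each group to selected / considered."""
    rates = {}
    for group, (selected, considered) in outcomes.items():
        if considered <= 0:
            raise ValueError(f"{group}: nobody was considered")
        rates[group] = selected / considered
    return rates


def impact_ratios(outcomes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Divide each group's selection rate by the highest group's rate."""
    rates = selection_rates(outcomes)
    top = max(rates.values())
    if top == 0:
        # Nobody was selected in any group, so there is no gap to compare.
        return {group: 1.0 for group in rates}
    return {group: rate / top for group, rate in rates.items()}


def flagged_groups(outcomes: dict[str, tuple[int, int]], threshold: float = 0.8) -> list[str]:
    """Return groups whose impact ratio falls below the threshold."""
    return [g for g, ratio in impact_ratios(outcomes).items() if ratio < threshold]


# Made-up numbers: (promoted, considered) per group.
example = {"group_a": (30, 100), "group_b": (18, 100)}
print(flagged_groups(example))  # ['group_b'], since 0.18 / 0.30 = 0.6
```

If a skills score feeds promotions and a check like this flags a group, that is the moment to bring in counsel and a validation study, not the moment to tweak the dashboard.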
🎮 Two failure modes to design against
Even with the stakes sorted, two predictable things break these systems.
The first is gaming. There’s an old rule in measurement, usually called Goodhart’s law: when a measure becomes a target, it stops being a good measure. The moment a skills score drives budget or headcount, every manager has a reason to inflate it, and every team has a reason to teach to it. We’ve all watched completion rates get gamed. Skills scores are next, and the more money attached, the faster it happens.
The second is causation. When the dashboard shows skills went up and the KPI improved, we declare victory. But almost nobody runs a comparison group, so “the training drove the result” is usually a correlation wearing a results-shaped costume. The methods that fix this are old; economists were using them to evaluate training programs in the 1970s. We just don’t use them. Even a single matched comparison group for your flagship program puts you ahead of most of the field.
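One common way to use a comparison group is a difference-in-differences estimate: subtract the comparison group's change from the trained group's change, so that whatever lifted everyone is not credited to the program. The sketch below is a minimal illustration with invented KPI values. It assumes the two groups are well matched and would have trended alike without the training, which is the hard part in practice.

```python
from statistics import mean


def diff_in_diff(trained_before, trained_after, comparison_before, comparison_after):
    """Estimate the program effect as (trained change) - (comparison change)."""
    trained_change = mean(trained_after) - mean(trained_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return trained_change - comparison_change


# Made-up KPI scores for illustration only.
effect = diff_in_diff(
    trained_before=[50, 52, 48],     # mean 50
    trained_after=[60, 62, 58],      # mean 60, so +10
    comparison_before=[50, 51, 49],  # mean 50
    comparison_after=[57, 58, 56],   # mean 57, so +7
)
print(effect)  # 3
```

In this toy case the dashboard story would be "training lifted the KPI by 10." The comparison group says 7 of those points would have happened anyway.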
🚪 The gate, and the muscle
Two process moves carry most of the weight here.
First, the gate. AI-inferred skills should never flow straight into a consequential decision. The cleanest pattern I’ve seen in the current platforms: inferred skills land in a holding state, and a human has to confirm them before anything downstream can use them. Whatever system you run, you can insist on that gate. An inference is a hypothesis. Only a confirmed inference is data.
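As a sketch of what that gate could look like in code (not how any particular platform implements it), the Python below keeps every inferred skill in a holding status until a named reviewer confirms it. The class and function names are hypothetical, and letting low-stakes uses read unconfirmed records is an assumption drawn from the stakes split described earlier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SkillStatus(Enum):
    INFERRED = "inferred"    # AI guess, held for review
    CONFIRMED = "confirmed"  # a human has vouched for it
    REJECTED = "rejected"    # a human has reviewed and declined it


@dataclass
class SkillRecord:
    person: str
    skill: str
    status: SkillStatus = SkillStatus.INFERRED
    confirmed_by: Optional[str] = None

    def confirm(self, reviewer: str) -> None:
        """Record who confirmed the skill so the decision is traceable."""
        self.status = SkillStatus.CONFIRMED
        self.confirmed_by = reviewer


def usable_for(records: list[SkillRecord], high_stakes: bool) -> list[SkillRecord]:
    """High-stakes uses see only confirmed records; low-stakes uses see all but rejected."""
    if high_stakes:
        return [r for r in records if r.status is SkillStatus.CONFIRMED]
    return [r for r in records if r.status is not SkillStatus.REJECTED]
```

The design choice worth copying is that the filter lives in one place: a promotion workflow calls `usable_for(records, high_stakes=True)` and cannot see a guess, no matter how confident the model was.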
Second, the muscle. Every one of these systems quietly assumes a manager who can watch someone work, tell the difference between good and great, and bridge the gap. We covered this in the last piece, and it bears repeating as a sequencing rule: build manager observation and coaching practice before you depend on it, with real reps and feedback, not a module. The gate only works if the human in it knows what they’re looking at.
💡 What This All Means
The honest sequence, the one I’d run if I were standing this up today: name the decision, sort it by stakes, put a confirmation gate between inference and consequence, build the manager muscle, and only then go shopping for the platform.
One more thing, because I think it’s where credibility lives. The field doesn’t yet have a settled standard for most of this. The stakes-tiering idea is borrowed from medical education. There’s no authoritative playbook for skills data governance. Anyone who tells you otherwise is selling something. Saying “here’s what we know, here’s what we’re borrowing, here’s what nobody has proven yet” out loud is what separates a leader from a brochure.
We spent two weeks pulling apart what doesn’t work. This is the part we get to build.
🔧 From the workbench
These questions are becoming the intake layer of the L&D AI Operating System I’ve been building: every measurement request starts with “what decision changes” before anything gets designed. More on that soon.
If someone forwarded this to you, the full Learning, Upgraded newsletter is at learningupgraded.com. This wraps the three-part skills measurement series, and the replies have been shaping where it goes next.
What’s the last decision your skills data changed at your org?
—Eian


