
Structured Problem Solving

The discipline: you don’t reach “solution” until you’ve found the root cause — not the first plausible cause, the root cause. The systemic condition that, if fixed, prevents recurrence. The tools on this page enforce that discipline.

A3 Problem Solving

A one-page problem-solving document (named for the roughly 11×17” A3 paper size). The page constraint forces disciplined thinking before implementation.

Eight sections:

  1. Theme / Problem Statement: Quantified. “Zone 3 mis-pick rate 1.8% vs ≤0.5% target, $2,400/week in re-pick cost”
  2. Background: Why it matters to the operation; client SLA; risk if the trend continues
  3. Current Condition: Data and process map. Not anecdote: pick error counts by picker, SKU, shift, day
  4. Root Cause Analysis: Cannot proceed to countermeasures until this section is complete and agreed
  5. Countermeasures: Specific actions, not aspirations. Named, dated. “Separate SKUs A/B by ≥36 in by [date]”
  6. Implementation Plan: Named individuals, specific dates. No collective nouns
  7. Results Verification: Compare post-implementation data to baseline. Build a checkpoint into the plan
  8. Follow-Up Actions: Standardize what worked; identify the next PDCA cycle for any residual gap
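The gating rule in section 4 (no countermeasures until the root cause work is done) can be sketched as a simple guard. This is a hypothetical illustration; the `A3` class and its field names are invented for this sketch, not part of any tool:

```python
# Illustrative sketch: an A3 record that enforces the "sections 1-4 before
# section 5" gate described above. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class A3:
    problem_statement: str = ""   # section 1
    background: str = ""          # section 2
    current_condition: str = ""   # section 3
    root_cause: str = ""          # section 4
    countermeasures: list = field(default_factory=list)  # section 5

    def add_countermeasure(self, action: str, owner: str, due: str) -> None:
        # Gate: sections 1-4 must be filled in before section 5 is touched.
        for section in (self.problem_statement, self.background,
                        self.current_condition, self.root_cause):
            if not section.strip():
                raise ValueError("Complete sections 1-4 before adding countermeasures")
        self.countermeasures.append({"action": action, "owner": owner, "due": due})
```

Countermeasures here are named and dated on entry, matching the "named individuals, specific dates" requirement of section 6.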

When to use: When the same problem appears in the Tier 2 huddle for the third time. The discipline of completing sections 1–4 before touching section 5 is what separates A3 thinking from a meeting with notes.

Half an A3 is not an A3 — it’s an email with a fancy template.

5 Whys

The trap: stopping at the proximate cause (usually Why 2 or 3), producing a local fix that doesn’t prevent the next occurrence.

Worked example — Zone 3 mis-picks:

  Why 1: Why did picks go wrong? Pickers selected the wrong item from an adjacent slot. (Observable; most teams stop here)
  Why 2: Why the wrong item? Slot labels for SKUs A/B are similar, 6 inches apart on the same shelf. (Proximate cause; most DC problem-solving stops here)
  Why 3: Why adjacent slots? Similar-looking SKUs were slotted adjacent. (Getting closer)
  Why 4: Why no separation? The Q1 slotting logic had no check for item similarity before finalizing assignments. (Process failure)
  Why 5: Why no similarity check? The slotting process was designed 5 years ago for velocity/cube; look-alike risk was never identified because SKU proliferation added 340 new items to Zone 3 in 18 months. (Root cause)

Fix at Why 2: Better labels, separate these two SKUs — 2 hours of work, helps Zone 3 for these two SKUs only. Fix at Why 5: Add a look-alike proximity check to the slotting process for every zone — one day of work, prevents this class of error across the building for every future slotting event.

That is the difference between a local fix and a systemic fix.

Fishbone (Ishikawa) — DC-Adapted Categories


Brainstorming tool for root cause generation. Forces consideration of multiple cause categories before selecting which to investigate. Prevents anchoring on the first plausible explanation.

Six DC-adapted categories (vs. the manufacturing 6Ms):

  • Manpower: training gaps on seasonal temps; insufficient QC staffing during surge; supervisor vacation coverage with junior leads
  • Method: scan-verify step not consistently followed during surge; wave release timing creating a rush; no standard work for look-alike zones
  • Material: SKU packaging changed so three items now look identical; item master dimensions not updated
  • Machine: RF scanner batteries dying mid-shift causing scan bypasses; print-and-apply applying labels at 3° skew → barcodes unscannable → manual override entries
  • Measurement: returns classified by customer-stated reason (“wrong item”) rather than root cause (mis-pick vs. mis-ship vs. customer error), which masks whether this is a pick or a pack problem
  • Environment: poor lighting in Zone 4 creating read errors on look-alike packaging; 89°F in the pack area during June affecting concentration

The rule: The fishbone generates hypotheses — it does not solve problems. Build it in the meeting room; verify it on the floor. Branches with the most data support become the inputs to 5 Whys.
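Selecting which branches feed 5 Whys is just a tally of verified floor observations per branch. A minimal sketch, with illustrative data (the observation tags and counts below are invented for the example):

```python
# Illustrative sketch: rank fishbone branches by how many verified floor
# observations support each one, then take the top branches into 5 Whys.
from collections import Counter

# Each verified observation is tagged with the fishbone branch it supports.
observations = ["Material", "Method", "Material", "Manpower", "Material", "Method"]

support = Counter(observations)
five_whys_inputs = [branch for branch, n in support.most_common(2)]
print(five_whys_inputs)  # the two best-supported branches
```

The point of the tally is the discipline, not the code: a branch with zero floor evidence does not get investigated, no matter how plausible it sounded in the meeting room.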

Pareto Analysis

Always run the Pareto. If you’re having a CI prioritization conversation and nobody has a Pareto on the table, stop the meeting and go build one.

The 80/20 principle is empirically reliable enough in DC operations to treat as a starting assumption.

DC applications:

  • Mis-picks by SKU: typically 15–20 SKUs generate 70–80% of all mis-picks, so intervention can target only those
  • Customer complaints by reason: prioritizes between process failures (wrong item vs. missing item are different root causes)
  • Emergency replenishment trips by zone: a zone generating 3× the trips has a slotting problem (face qty, min/max triggers, or velocity shift)
  • Overtime hours by function: a function generating 55% of total overtime has a capacity or process mismatch not reflected in the schedule
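The Pareto cut itself is a few lines of arithmetic: sort causes by count, accumulate the share, and keep the vital few that cover roughly 80% of occurrences. A minimal sketch with invented mis-pick counts:

```python
# Sketch of the 80/20 cut: sort descending, accumulate, stop at the threshold.
def pareto_cut(counts: dict, threshold: float = 0.80) -> list:
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    vital, cumulative = [], 0
    for cause, n in ranked:
        vital.append(cause)
        cumulative += n
        if cumulative / total >= threshold:
            break
    return vital

# Illustrative mis-pick counts by SKU (not real data)
mispicks = {"SKU-A": 48, "SKU-B": 31, "SKU-C": 9, "SKU-D": 7, "SKU-E": 5}
print(pareto_cut(mispicks))  # the vital few covering ~80% of mis-picks
```

Run on real pick-error data, the output is the short SKU list that the targeted intervention should cover.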

FMEA (Failure Mode and Effects Analysis)

Failure Mode and Effects Analysis is the proactive tool: use it before a new process goes live, not after the failure occurs.

Structure for each process step:

RPN = Occurrence (O) × Severity (S) × Detection (D) — each rated 1–10

  • O: How likely the failure is to occur
  • S: How bad the downstream effect
  • D: How likely to be caught before causing damage (10 = nearly impossible to detect)

High-RPN DC examples:

  • WMS location mismatch during rack move: O 6, S 8, D 5, RPN 240. Mitigation: cycle count validation before the rack move is released
  • Barcode scan bypass during scanner fault: O 7, S 7, D 4, RPN 196. Mitigation: scanner fault auto-triggers a zone hold
  • Look-alike SKU slotted adjacent: O 5, S 6, D 6, RPN 180. Mitigation: proximity check in the slotting SOP
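The RPN arithmetic for the examples above is a straight product and sort (the data tuples mirror the table; nothing else is assumed):

```python
# RPN = O * S * D for each failure mode, ranked highest-first so the
# worst risks surface at the top of the FMEA review.
failure_modes = [
    ("WMS location mismatch during rack move", 6, 8, 5),
    ("Barcode scan bypass during scanner fault", 7, 7, 4),
    ("Look-alike SKU slotted adjacent", 5, 6, 6),
]

ranked = sorted(
    ((name, o * s * d) for name, o, s, d in failure_modes),
    key=lambda item: item[1],
    reverse=True,
)
for name, rpn in ranked:
    print(f"{rpn:>4}  {name}")
```

The sort matters more than the multiplication: FMEA review time goes to the top of the ranked list first.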

Best FMEA applications in DC:

  • New automation system go-live
  • Slotting redesigns moving large numbers of SKUs simultaneously
  • New carrier onboarding (label spec mismatch)
  • Seasonal surge prep (temp workforce doubles in 3 weeks)

CAPA (Corrective Action and Preventive Action)

Corrective Action and Preventive Action is the formal handoff from investigation to accountability.

Trigger: Root cause is agreed upon via 5 Whys or A3.

Two elements CAPA adds:

  1. Named owner — not “the team,” not “operations,” a specific person
  2. Verification step — someone independently confirms the action was taken and that it worked

“Did we do the action?” and “did the action fix the problem?” are different questions. The second one is the one that matters.

The CAPA tracker (maintained by the CI engineer) lists: problem, root cause, action, owner, due date, verification method.

  • Actions still open at the 30-day follow-up remain on the CAPA tracker
  • Actions still open at 60 days escalate to the DC manager’s agenda
  • Actions still open at 90 days trigger structural escalation
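The escalation ladder is a simple age check on each open action. A hypothetical sketch; the function and return strings are illustrative, not from any tracker tool:

```python
# Illustrative sketch of the CAPA escalation ladder by days open.
from datetime import date

def escalation_level(opened: date, today: date) -> str:
    days_open = (today - opened).days
    if days_open >= 90:
        return "structural escalation"
    if days_open >= 60:
        return "DC manager agenda"
    if days_open >= 30:
        return "CAPA tracker follow-up"
    return "owner follow-up"

print(escalation_level(date(2024, 1, 1), date(2024, 3, 15)))  # 74 days open
```

In practice the CI engineer runs this check against every open action at each tier review, so nothing ages off the radar silently.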

See Tier Huddle System for how CAPA tracker connects to the daily management cadence.
