
Structured Problem Solving

The discipline: you don’t reach “solution” until you’ve found the root cause — not the first plausible cause, the root cause. The systemic condition that, if fixed, prevents recurrence. The tools on this page enforce that discipline.

A3 Problem Solving

A one-page problem-solving document (named for the roughly 11×17” A3 paper size). The page constraint forces disciplined thinking before implementation.

Eight sections:

  1. Theme / Problem Statement: Quantified. “Zone 3 mis-pick rate 1.8% vs ≤0.5% target, $2,400/week in re-pick cost”
  2. Background: Why it matters to the operation; client SLA; risk if the trend continues
  3. Current Condition: Data and process map. Not anecdote: pick error counts by picker, SKU, shift, day
  4. Root Cause Analysis: Cannot proceed to countermeasures until this section is complete and agreed
  5. Countermeasures: Specific actions, not aspirations. Named, dated. “Separate SKUs A/B by ≥36 in by [date]”
  6. Implementation Plan: Named individuals, specific dates. No collective nouns
  7. Results Verification: Compare post-implementation data to baseline. Build a checkpoint into the plan
  8. Follow-Up Actions: Standardize what worked; identify the next PDCA cycle for any residual gap
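The gating rule in section 4 (no countermeasures until the root cause work is done) can be sketched as a simple guard. This is a hypothetical illustration; the `A3` class and its field names are invented for this sketch, not part of any tool:

```python
# Illustrative sketch: an A3 record that enforces the "sections 1-4 before
# section 5" gate described above. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class A3:
    problem_statement: str = ""   # section 1
    background: str = ""          # section 2
    current_condition: str = ""   # section 3
    root_cause: str = ""          # section 4
    countermeasures: list = field(default_factory=list)  # section 5

    def add_countermeasure(self, action: str, owner: str, due: str) -> None:
        # Gate: sections 1-4 must be filled in before section 5 is touched.
        for section in (self.problem_statement, self.background,
                        self.current_condition, self.root_cause):
            if not section.strip():
                raise ValueError("Complete sections 1-4 before adding countermeasures")
        self.countermeasures.append({"action": action, "owner": owner, "due": due})
```

Countermeasures here are named and dated on entry, matching the "named individuals, specific dates" requirement of section 6.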

When to use: When the same problem appears in the Tier 2 huddle for the third time. The discipline of completing sections 1–4 before touching section 5 is what separates A3 thinking from a meeting with notes.

Half an A3 is not an A3 — it’s an email with a fancy template.

5 Whys

The trap: stopping at the proximate cause (usually Why 2 or 3), producing a local fix that doesn’t prevent the next occurrence.

Worked example — Zone 3 mis-picks:

  Why 1: Why did picks go wrong? Pickers selected the wrong item from an adjacent slot. (Observable; most teams stop here)
  Why 2: Why the wrong item? Slot labels for SKUs A/B are similar, 6 inches apart on the same shelf. (Proximate cause; most DC problem-solving stops here)
  Why 3: Why adjacent slots? Similar-looking SKUs were slotted adjacent. (Getting closer)
  Why 4: Why no separation? The Q1 slotting logic had no check for item similarity before finalizing assignments. (Process failure)
  Why 5: Why no similarity check? The slotting process was designed 5 years ago for velocity/cube; look-alike risk was never identified because SKU proliferation added 340 new items to Zone 3 in 18 months. (Root cause)

Fix at Why 2: Better labels, separate these two SKUs — 2 hours of work, helps Zone 3 for these two SKUs only. Fix at Why 5: Add a look-alike proximity check to the slotting process for every zone — one day of work, prevents this class of error across the building for every future slotting event.

That is the difference between a local fix and a systemic fix.

Fishbone (Ishikawa) — DC-Adapted Categories


Brainstorming tool for root cause generation. Forces consideration of multiple cause categories before selecting which to investigate. Prevents anchoring on the first plausible explanation.

Six DC-adapted categories (vs. the manufacturing 6Ms):

  • Manpower: training gaps on seasonal temps; insufficient QC staffing during surge; supervisor vacation coverage with junior leads
  • Method: scan-verify step not consistently followed during surge; wave release timing creating a rush; no standard work for look-alike zones
  • Material: SKU packaging changed so three items now look identical; item master dimensions not updated
  • Machine: RF scanner batteries dying mid-shift causing scan bypasses; print-and-apply applying labels at 3° skew → barcodes unscannable → manual override entries
  • Measurement: returns classified by customer-stated reason (“wrong item”) rather than root cause (mis-pick vs. mis-ship vs. customer error), which masks whether this is a pick or a pack problem
  • Environment: poor lighting in Zone 4 creating read errors on look-alike packaging; 89°F in the pack area during June affecting concentration

The rule: The fishbone generates hypotheses — it does not solve problems. Build it in the meeting room; verify it on the floor. Branches with the most data support become the inputs to 5 Whys.
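Selecting which branches feed 5 Whys is just a tally of verified floor observations per branch. A minimal sketch, with illustrative data (the observation tags and counts below are invented for the example):

```python
# Illustrative sketch: rank fishbone branches by how many verified floor
# observations support each one, then take the top branches into 5 Whys.
from collections import Counter

# Each verified observation is tagged with the fishbone branch it supports.
observations = ["Material", "Method", "Material", "Manpower", "Material", "Method"]

support = Counter(observations)
five_whys_inputs = [branch for branch, n in support.most_common(2)]
print(five_whys_inputs)  # the two best-supported branches
```

The point of the tally is the discipline, not the code: a branch with zero floor evidence does not get investigated, no matter how plausible it sounded in the meeting room.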

Pareto Analysis

Always run the Pareto. If you’re having a CI prioritization conversation and nobody has a Pareto on the table, stop the meeting and go build one.

The 80/20 principle is empirically reliable enough in DC operations to treat as a starting assumption.

DC applications:

  • Mis-picks by SKU: typically 15–20 SKUs generate 70–80% of all mis-picks, so intervention can target only those
  • Customer complaints by reason: prioritizes between process failures (wrong item vs. missing item are different root causes)
  • Emergency replenishment trips by zone: a zone generating 3× the trips has a slotting problem (face qty, min/max triggers, or velocity shift)
  • Overtime hours by function: a function generating 55% of total overtime has a capacity or process mismatch not reflected in the schedule
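The Pareto cut itself is a few lines of arithmetic: sort causes by count, accumulate the share, and keep the vital few that cover roughly 80% of occurrences. A minimal sketch with invented mis-pick counts:

```python
# Sketch of the 80/20 cut: sort descending, accumulate, stop at the threshold.
def pareto_cut(counts: dict, threshold: float = 0.80) -> list:
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    vital, cumulative = [], 0
    for cause, n in ranked:
        vital.append(cause)
        cumulative += n
        if cumulative / total >= threshold:
            break
    return vital

# Illustrative mis-pick counts by SKU (not real data)
mispicks = {"SKU-A": 48, "SKU-B": 31, "SKU-C": 9, "SKU-D": 7, "SKU-E": 5}
print(pareto_cut(mispicks))  # the vital few covering ~80% of mis-picks
```

Run on real pick-error data, the output is the short SKU list that the targeted intervention should cover.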

FMEA (Failure Mode and Effects Analysis)

Failure Mode and Effects Analysis is the proactive tool: use it before a new process goes live, not after the failure occurs.

Structure for each process step:

RPN = Occurrence (O) × Severity (S) × Detection (D) — each rated 1–10

  • O: How likely the failure is to occur
  • S: How bad the downstream effect
  • D: How likely to be caught before causing damage (10 = nearly impossible to detect)

High-RPN DC examples:

  • WMS location mismatch during rack move: O 6, S 8, D 5, RPN 240. Mitigation: cycle count validation before the rack move is released
  • Barcode scan bypass during scanner fault: O 7, S 7, D 4, RPN 196. Mitigation: scanner fault auto-triggers a zone hold
  • Look-alike SKU slotted adjacent: O 5, S 6, D 6, RPN 180. Mitigation: proximity check in the slotting SOP
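The RPN arithmetic for the examples above is a straight product and sort (the data tuples mirror the table; nothing else is assumed):

```python
# RPN = O * S * D for each failure mode, ranked highest-first so the
# worst risks surface at the top of the FMEA review.
failure_modes = [
    ("WMS location mismatch during rack move", 6, 8, 5),
    ("Barcode scan bypass during scanner fault", 7, 7, 4),
    ("Look-alike SKU slotted adjacent", 5, 6, 6),
]

ranked = sorted(
    ((name, o * s * d) for name, o, s, d in failure_modes),
    key=lambda item: item[1],
    reverse=True,
)
for name, rpn in ranked:
    print(f"{rpn:>4}  {name}")
```

The sort matters more than the multiplication: FMEA review time goes to the top of the ranked list first.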

Best FMEA applications in DC:

  • New automation system go-live
  • Slotting redesigns moving large numbers of SKUs simultaneously
  • New carrier onboarding (label spec mismatch)
  • Seasonal surge prep (temp workforce doubles in 3 weeks)

CAPA (Corrective Action and Preventive Action)

Corrective Action and Preventive Action is the formal handoff from investigation to accountability.

Trigger: Root cause is agreed upon via 5 Whys or A3.

Two elements CAPA adds:

  1. Named owner — not “the team,” not “operations,” a specific person
  2. Verification step — someone independently confirms the action was taken and that it worked

“Did we do the action?” and “did the action fix the problem?” are different questions. The second one is the one that matters.

The CAPA tracker (maintained by the CI engineer) lists: problem, root cause, action, owner, due date, verification method.

  • Actions still open at the 30-day follow-up remain on the CAPA tracker
  • Actions still open at 60 days escalate to the DC manager’s agenda
  • Actions still open at 90 days trigger structural escalation
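The escalation ladder is a simple age check on each open action. A hypothetical sketch; the function and return strings are illustrative, not from any tracker tool:

```python
# Illustrative sketch of the CAPA escalation ladder by days open.
from datetime import date

def escalation_level(opened: date, today: date) -> str:
    days_open = (today - opened).days
    if days_open >= 90:
        return "structural escalation"
    if days_open >= 60:
        return "DC manager agenda"
    if days_open >= 30:
        return "CAPA tracker follow-up"
    return "owner follow-up"

print(escalation_level(date(2024, 1, 1), date(2024, 3, 15)))  # 74 days open
```

In practice the CI engineer runs this check against every open action at each tier review, so nothing ages off the radar silently.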

See Tier Huddle System for how CAPA tracker connects to the daily management cadence.
