Pod and workload triage
| Scenario | What it usually means | What to check next |
|---|---|---|
| Pod unhealthy but node healthy | The problem is often local to the workload, configuration or image. | Review pod details, owner workload, image tag and recent events. |
| Pod warnings but restart count stable | Warnings may be historical or low impact. | Compare warning age with current readiness before escalating. |
| Desired exceeds ready | Rollout may be blocked, partial or recovering slowly. | Open workload details and review the owning namespace and related events. |
Warning: Repeated warnings matter less than restart growth, readiness loss or a widening gap between desired and ready replicas.
Operator action: Move from pod details to workload details before concluding that the issue is isolated. Owner-level context often explains image, policy or rollout behavior.
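The priority order in the warning above can be sketched as a comparison of two workload snapshots. This is a minimal illustration, not a product feature: `WorkloadSnapshot`, its fields and `triage` are hypothetical names, and the counts are assumed to have been collected by the operator from the workload details view.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkloadSnapshot:
    """Hypothetical point-in-time view of one workload (not a real API type)."""
    restarts: int   # total container restarts across the workload's pods
    desired: int    # desired replicas
    ready: int      # ready replicas
    warnings: int   # warning events observed so far


def triage(previous: WorkloadSnapshot, current: WorkloadSnapshot) -> List[str]:
    """Return the signals that outrank repeated warnings, in priority order."""
    signals = []
    if current.restarts > previous.restarts:
        signals.append("restart growth")
    if current.ready < previous.ready:
        signals.append("readiness loss")
    previous_gap = previous.desired - previous.ready
    current_gap = current.desired - current.ready
    if current_gap > 0 and current_gap > previous_gap:
        signals.append("widening desired/ready gap")
    # Warnings are reported only when nothing more serious changed.
    if not signals and current.warnings > previous.warnings:
        signals.append("warnings only")
    return signals
```

A result of `["warnings only"]` corresponds to the "Pod warnings but restart count stable" row: compare warning age with current readiness before escalating.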
Node and scheduling review
| Scenario | Likely cause | Follow-up |
|---|---|---|
| Node pressure without pod failure | The node is stressed but workloads may still be serving. | Review allocatable values, recent scheduling events and whether only one namespace is affected. |
| Scheduling issues with healthy images | A placement or capacity problem is more likely than a registry failure. | Inspect node conditions, taints, namespace activity and related events. |
| One node stands out in alerts | The issue may be localized to runtime, pressure or an ownership hot spot. | Use related object tracing to identify which workloads cluster on that node. |
Example: A node can remain Ready while memory pressure appears in its conditions. In that case, use recent events and allocatable values together before deciding whether the issue is transient or persistent.
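One way to combine the two inputs from the example above is sketched below. The function name, its parameters and both thresholds (three recent pressure events, 90% of allocatable memory requested) are assumptions chosen for illustration; they are not documented defaults and should be tuned to the environment.

```python
def classify_memory_pressure(pressure_active: bool,
                             recent_pressure_events: int,
                             requested_bytes: int,
                             allocatable_bytes: int,
                             event_threshold: int = 3,
                             utilization_threshold: float = 0.9) -> str:
    """Combine recent events and allocatable headroom into one verdict.

    Thresholds are illustrative assumptions, not product defaults.
    """
    if allocatable_bytes <= 0:
        raise ValueError("allocatable_bytes must be positive")
    if not pressure_active:
        return "no pressure"
    # Signal 1: pressure keeps recurring in the recent event window.
    repeated = recent_pressure_events >= event_threshold
    # Signal 2: little allocatable memory is left for new placements.
    tight = requested_bytes / allocatable_bytes >= utilization_threshold
    if repeated and tight:
        return "likely persistent"
    if repeated or tight:
        return "inconclusive"
    return "likely transient"
```

Requiring both signals before calling the pressure persistent mirrors the guidance to read events and allocatable values together rather than acting on either alone.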
Events and access-related signals
| Scenario | Interpretation | Next step |
|---|---|---|
| Many warnings, little state change | The event stream is noisy but object health may be stable. | Filter by namespace or source in Observability and compare against current object state. |
| Access refusal with correct response shape | Routing is likely functioning and the denial is policy-driven. | Check WS110 / WS111 / WS112 access behavior and review access-related events. |
| Related object exists but details are limited | The view may show only the most relevant fields for the object class. | Cross-check the namespace, owner and recent events before escalating. |
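The "many warnings, little state change" check can be sketched as a filter over exported events followed by a comparison against current object health. The event dictionary keys (`type`, `namespace`, `source`, `object`), the `healthy_objects` set and the `min_warnings` cutoff are all hypothetical; the actual export format of the Observability view may differ.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Set


def noisy_but_healthy(events: Iterable[Mapping[str, str]],
                      healthy_objects: Set[str],
                      namespace: Optional[str] = None,
                      source: Optional[str] = None,
                      min_warnings: int = 5) -> List[str]:
    """List objects that emit many warnings while currently healthy."""
    counts: Counter = Counter()
    for event in events:
        if event["type"] != "Warning":
            continue
        if namespace is not None and event["namespace"] != namespace:
            continue
        if source is not None and event["source"] != source:
            continue
        counts[event["object"]] += 1
    # Keep only objects whose current state is healthy despite the noise.
    return sorted(name for name, n in counts.items()
                  if n >= min_warnings and name in healthy_objects)
```

Objects returned here are candidates for a noisy event stream rather than a real regression; objects with many warnings that are not in `healthy_objects` deserve the closer look.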
Escalation cues
- Escalate to platform when readiness drops across multiple workloads or nodes.
- Escalate to registry owners when image metadata is inconsistent across consumers.
- Escalate to security or access owners when policy denials are repeated, unexpected or affect standard entry paths.
- Escalate to networking when the response shape itself is wrong or the expected endpoint is not reached.
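The escalation cues above can be expressed as a small routing function. This is a sketch under stated assumptions: `Observations` and its fields are hypothetical, "multiple" is read as more than one, and "repeated" as more than one denial.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observations:
    """Hypothetical summary of triage findings; not a real API type."""
    workloads_losing_readiness: int = 0
    nodes_losing_readiness: int = 0
    image_metadata_inconsistent: bool = False
    policy_denials: int = 0
    denial_unexpected: bool = False
    denial_on_standard_path: bool = False
    response_shape_wrong: bool = False
    endpoint_reached: bool = True


def escalation_targets(obs: Observations) -> List[str]:
    """Map findings to the owning teams, in the order the cues are listed."""
    targets = []
    if obs.workloads_losing_readiness > 1 or obs.nodes_losing_readiness > 1:
        targets.append("platform")
    if obs.image_metadata_inconsistent:
        targets.append("registry owners")
    # A single expected denial on a non-standard path is not escalated.
    if obs.policy_denials > 0 and (obs.policy_denials > 1
                                   or obs.denial_unexpected
                                   or obs.denial_on_standard_path):
        targets.append("security or access owners")
    if obs.response_shape_wrong or not obs.endpoint_reached:
        targets.append("networking")
    return targets
```

Returning a list rather than a single owner reflects that several cues can apply at once, for example repeated denials together with an unreachable endpoint.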