Topics covered in this article:
Problem Statement
Pod(s) stuck in the init phase while installing or upgrading a Director OnPrem setup. This problem may also be encountered when the pods are restarted.
Symptoms
The Ingester pod is unable to reach the ready state (i.e., stuck at 0/1), due to which the Ruler pod is stuck in the init state.
NAME READY STATUS RESTARTS AGE
alertmanager-7d856964c-q4gz4 1/1 Running 0 12m
alertstore-b8dfc554d-7j7nd 1/1 Running 0 12m
alertstore-tablemanager-5b69b4b975-72767 1/1 Running 0 12m
cassandra-0 1/1 Running 0 12m
chat-server-cd9c7f54c-wgk6m 1/1 Running 0 12m
cloud-agent-5df98757c7-hrvp5 1/1 Running 0 12m
configs-65b5b67ff8-wj4bt 1/1 Running 0 12m
configs-db-985874f5b-2n2qc 1/1 Running 0 12m
consul-8658594966-h8769 1/1 Running 0 12m
directoronprem-nginx-ingress-controller-9fk4s 1/1 Running 0 12m
directoronprem-nginx-ingress-default-backend-7cc5ddb957-89ghm 1/1 Running 0 12m
directoronprem-nginx-ingress-default-backend-7cc5ddb957-ppwb5 1/1 Running 0 12m
distributor-5b96cfc474-cjxft 1/1 Running 0 12m
elastalert-779f895f-ssnwp 1/1 Running 0 12m
ingester-796876d6c-skhmt 0/1 Running 0 12m
maya-grafana-f64698964-w2l66 2/2 Running 0 12m
maya-io-54464cb789-jvrjf 1/1 Running 0 12m
maya-ui-69ff7587fd-v5489 1/1 Running 0 12m
memcached-64f94cd599-qg7wh 1/1 Running 0 12m
mysql-0 2/2 Running 0 12m
od-elasticsearch-logging-0 1/1 Running 0 12m
od-kibana-logging-96ccc7f6c-b8kp2 1/1 Running 0 12m
querier-647b5c8f7d-nlvff 1/1 Running 0 12m
ruler-5df869984b-wv8lj 0/1 Init:3/6 0 4m9s
table-manager-6457fdddb4-4k5sc 1/1 Running 0 12m
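A quick way to spot the stuck pods is to filter the `kubectl get pods` listing down to entries whose READY count is incomplete or whose STATUS is an init phase. The awk filter below is a sketch; in a live cluster you would pipe `kubectl get pods -n <director_namespace> --no-headers` into it instead of the sample lines shown here.

```shell
# not_ready: print pods whose READY column is incomplete (e.g. 0/1) or whose
# STATUS begins with "Init". Expects the standard `kubectl get pods --no-headers`
# column layout: NAME READY STATUS RESTARTS AGE.
not_ready() {
  awk '{ split($2, r, "/"); if (r[1] != r[2] || $3 ~ /^Init/) print $1, $2, $3 }'
}

# Self-contained demo on sample output; with a cluster, run:
#   kubectl get pods -n <director_namespace> --no-headers | not_ready
not_ready <<'EOF'
ingester-796876d6c-skhmt 0/1 Running 0 12m
querier-647b5c8f7d-nlvff 1/1 Running 0 12m
ruler-5df869984b-wv8lj 0/1 Init:3/6 0 4m9s
EOF
# -> ingester-796876d6c-skhmt 0/1 Running
# -> ruler-5df869984b-wv8lj 0/1 Init:3/6
```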
Describing the ingester pod shows a readiness probe failure with a 503 status code.
kubectl describe pod ingester-796876d6c-skhmt -n <director_namespace>
Sample Output:
Warning Unhealthy 9s kubelet, gke-mayavirtual-test-default-pool-7ea56867-7ogk Readiness probe failed: HTTP probe failed with statuscode: 503
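When the describe output is long, the probe failures can be grepped out directly. The filter below is a sketch; with a live cluster you would pipe the real `kubectl describe pod` output through it instead of the sample event lines.

```shell
# probe_failures: keep only the Unhealthy / probe-failure events from a
# `kubectl describe pod` dump. With a cluster, run:
#   kubectl describe pod ingester-796876d6c-skhmt -n <director_namespace> | probe_failures
probe_failures() {
  grep -E 'Unhealthy|probe failed'
}

# Self-contained demo on sample event lines:
probe_failures <<'EOF'
Normal   Pulled     10m  kubelet  Container image already present on machine
Warning  Unhealthy  9s   kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
EOF
```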
Troubleshooting
It is sometimes observed that, during an install or after an upgrade of Director OnPrem, the ingester pod does not reach the ready state and the related pods (in the case above, ruler-5df869984b-wv8lj) remain in the init state.
In such a situation the consul pod needs to be restarted. Since the pod is managed by a Deployment, deleting it causes Kubernetes to recreate it. Execute:
kubectl delete pod <pod_name> -n <director_namespace>
Example Command:
kubectl delete pod consul-8658594966-h8769 -n director
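Pod name suffixes change each time the Deployment recreates a pod, so instead of copying the name by hand you can extract the current consul pod name from the listing. A sketch (the awk pattern assumes the pod name starts with `consul-`, as in the listing above):

```shell
# Extract the current consul pod name from a `kubectl get pods` listing.
# With a cluster:
#   consul_pod=$(kubectl get pods -n <director_namespace> --no-headers | awk '/^consul-/ {print $1}')
#   kubectl delete pod "$consul_pod" -n <director_namespace>
# Self-contained demo on sample output:
awk '/^consul-/ {print $1}' <<'EOF'
cassandra-0 1/1 Running 0 12m
consul-8658594966-h8769 1/1 Running 0 12m
distributor-5b96cfc474-cjxft 1/1 Running 0 12m
EOF
# -> consul-8658594966-h8769
```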
If the problem persists, please contact our support team.