Troubleshooting Red Hat OpenShift problems
Use the troubleshooting information to diagnose and fix problems with Red Hat® OpenShift®.
RHEL subscription
If you encounter a problem with your subscription, use the following command to query all available subscriptions.
subscription-manager list --available --all
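If the query shows an available subscription that is not yet attached to the bastion host, you can attach it with the standard subscription-manager command. The pool ID is a placeholder that you take from the query output:
subscription-manager attach --pool=<pool_id>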
Load-balancer
To check that load balancing is working, use the following command from the bastion node. Load balancing is successful if the result is a full set of headers.
wget --no-check-certificate https://api.ocp.dallas.ibm.local:6443
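If you prefer to inspect the response headers directly, a curl equivalent of the same check (assuming curl is installed on the bastion node) is the following command. The -k flag skips certificate verification and -I requests only the headers:
curl -k -I https://api.ocp.dallas.ibm.local:6443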
Red Hat CoreOS
You do not need to connect to the nodes by using SSH, but if you need to, you can do so from the bastion node. The following example connects to the bootstrap server:
ssh core@192.168.133.9
If you get a host key error, try the following command, which disables fingerprint checking:
ssh -o StrictHostKeyChecking=no core@192.168.133.9
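If you need to reach a node directly from a jump server or workstation outside the cluster network, and your SSH key for the core user is available there, you can hop through the bastion node with the SSH ProxyJump option. The bastion host name is a placeholder:
ssh -J root@<bastion_host> core@192.168.133.9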
This example shows how to connect to a control-plane node from the bastion node to view the logs and change the permissions to make them readable. The directory name might be different.
ssh -i /root/.ssh/id_rsa core@192.168.133.12
sudo su
chmod 777 /var/log/pods/b2810e842791d83d48a4684295b7cd01/etcd-member/0.log
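If you prefer not to change file permissions, you can also read the log with elevated privileges, for example:
sudo tail -n 100 /var/log/pods/b2810e842791d83d48a4684295b7cd01/etcd-member/0.log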
This example shows how to download the log to the bastion node so that you can then copy it to a jump server or remote device where it is easier to read and parse. The directory name might be different.
scp -i /root/.ssh/id_rsa core@192.168.133.10:/var/log/pods/b2810e842791d83d48a4684295b7cd01/etcd-member/0.log 0.log
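From the jump server or remote device, you can then copy the downloaded file from the bastion node. The bastion host name and download location are placeholders:
scp root@<bastion_host>:/root/0.log .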
Red Hat OpenShift
Get a list of nodes and their status:
oc get nodes
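Other commonly used status checks are the standard oc commands for the cluster version and the cluster Operators; they are not specific to this deployment:
oc get clusterversion
oc get co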
When you use kubectl, the following order of precedence determines which kubeconfig file is used:
- Use the --kubeconfig flag, if specified.
- Use the KUBECONFIG environment variable, if specified.
- Use the $HOME/.kube/config file.
To export the kubeconfig that is created by the Red Hat OpenShift Installer to an environment variable, use the following command:
export KUBECONFIG=/opt/ocpinstall/auth/kubeconfig
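To confirm that the exported kubeconfig is being used, you can check which user and API server you are logged in to:
oc whoami
oc whoami --show-server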
Deleting the deployment
If you encounter a problem with your Terraform deployment, you can delete it with the following command.
terraform destroy
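Terraform reads its configuration and state from the directory where it runs, so you might need to run the command from the directory that contains the vSphere configuration, for example:
cd /opt/ocpinstall/installer/upi/vsphere
terraform destroy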
In some cases, Terraform might not be able to finish the automation. In these cases, you might need to delete your deployment manually by using vCenter and by deleting the Terraform state files.
- Remove the ocp folder and its contents in vCenter.
- Remove the ocp resource group.
- Remove the Terraform state file:
rm /opt/ocpinstall/installer/upi/vsphere/terraform.tfstate
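Terraform might also keep a backup of the previous state next to the state file. The backup file name is the Terraform default and might not be present; if it exists, remove it as well:
rm -f /opt/ocpinstall/installer/upi/vsphere/terraform.tfstate.backup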
After these steps, you can fix your deployment issue and redeploy the Red Hat OpenShift platform.
Generating new ignition files
Your ignition files are valid for 24 hours. You can generate new .ign files by completing the following steps:
- Remove the old state, configuration, and ign files:
cd /opt/ocpinstall
rm -R .openshift_install.log .openshift_install_state.json auth *.ign metadata.json
- Copy the Red Hat OpenShift install-config backup to install-config.yaml and re-create the ignition configs:
cp install-config.bak install-config.yaml
openshift-install create ignition-configs --dir=/opt/ocpinstall/
- Copy bootstrap.ign to the nginx home folder:
cp bootstrap.ign /usr/share/nginx/html
- Replace the primary section (the contents of master.ign, shown with cat master.ign) and the worker section (the contents of worker.ign, shown with cat worker.ign) in terraform.tfvars:
nano /opt/ocpinstall/installer/upi/vsphere/terraform.tfvars
// Ignition config for the control plane machines. You should copy the contents of the master.ign generated by the installer.
control_plane_ignition = <<END_OF_MASTER_IGNITION
<replace with new master.ign>
END_OF_MASTER_IGNITION
// Ignition config for the compute machines. You should copy the contents of the worker.ign generated by the installer.
compute_ignition = <<END_OF_WORKER_IGNITION
<replace with new worker.ign>
END_OF_WORKER_IGNITION
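After you update terraform.tfvars, you can check that the newly generated ignition files are valid JSON. This check assumes that python3 is available on the bastion node:
python3 -m json.tool /opt/ocpinstall/bootstrap.ign > /dev/null
python3 -m json.tool /opt/ocpinstall/master.ign > /dev/null
python3 -m json.tool /opt/ocpinstall/worker.ign > /dev/null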
Taking a snapshot of Red Hat OpenShift
You might want to stop and resume Red Hat OpenShift Cluster VMs during development or testing. You must consider the following before you take a snapshot.
During the installation of Red Hat OpenShift 4.x clusters, a bootstrap certificate is created that is used on the control-plane nodes to create certificate signing requests (CSRs) for kubelet client certificates (one for each node or kubelet). This certificate identifies each kubelet on each node. Because these certificates cannot be revoked, they are created with a short expiration time of 24 hours after cluster installation. All nodes other than the control-plane nodes have a service account token that is revocable. The bootstrap certificate is valid only for 24 hours after cluster installation. After the initial 24 hours, the certificates are renewed every 30 days.
The first control-plane kubelet certificate lasts for 24 hours before it is re-created. If you take a snapshot immediately after deployment, the control-plane kubelet does not yet have a 30-day client certificate. When the cluster is brought back up, the kubelet has missed its client certificate refresh window and the bootstrap credential can no longer be used, which renders the cluster unusable. In practice, a Red Hat OpenShift 4 cluster must run for at least 25 hours after installation before it can be shut down.
You can check the validity of the certificate by running the following command in the bastion host after deployment:
ssh -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no core@192.168.133.10 -- sudo openssl x509 -text -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem
Check the lifetime of the certificate in the Validity section of the output:
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            6a:73:78:19:f3:1e:8f:0c:a9:51:b8:53:f4:eb:29:8d:49:fa:7e:fd
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = kube-csr-signer_@1573016574
        Validity
            Not Before: Nov 6 08:22:00 2019 GMT
            Not After : Dec 6 04:57:43 2019 GMT
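To show only the validity dates without the rest of the certificate details, you can add the -dates option to the same openssl command:
ssh -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no core@192.168.133.10 -- sudo openssl x509 -noout -dates -in /var/lib/kubelet/pki/kubelet-client-current.pem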
For more information about shutting down the cluster after installation, see Enabling Red Hat OpenShift 4 Clusters to Stop and Resume Cluster VMs.
After the initial 24-hour certificate renewal, you can take a snapshot and resume the cluster at any time within the next 30 days. After 30 days, the expired certificates make the cluster snapshot unusable.
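If kubelet certificates rotate while the VMs are suspended, nodes might come back with pending CSRs after the cluster resumes. You can list and approve them with standard oc commands; the CSR name is a placeholder from the list output:
oc get csr
oc adm certificate approve <csr_name>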