Troubleshooting Red Hat OpenShift problems

Use the troubleshooting information to diagnose and fix problems with Red Hat® OpenShift®.

RHEL subscription

If you encounter a problem with your subscription, use the following command to query all available subscriptions.

subscription-manager list --available --all
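
If no valid subscription is attached, you can also check the overall registration and subscription status, for example:

subscription-manager status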

Load balancer

To check that load balancing is working, run the following command from the bastion node. Load balancing is working if the response returns a full set of headers.

wget --no-check-certificate https://api.ocp.dallas.ibm.local:6443
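
A similar check with curl, which prints only the response headers, might look like the following:

curl -k -I https://api.ocp.dallas.ibm.local:6443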

Red Hat CoreOS

You do not usually need to connect to the nodes by using SSH, but if you do, you can connect through the bastion node. The following example connects to the bootstrap server:

ssh core@192.168.133.9

If you get an error, try the following command, which disables checking of the host key fingerprint:

ssh -o StrictHostKeyChecking=no core@192.168.133.9
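
After you are connected to the bootstrap node, you can typically follow the bootstrap progress from its journal, for example:

journalctl -b -f -u bootkube.service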

This example shows how to connect to a control-plane node from the bastion node to view the logs and change the permissions so that the logs are readable. The pod directory name might differ in your environment.

ssh -i /root/.ssh/id_rsa core@192.168.133.12
sudo su
chmod 777 /var/log/pods/b2810e842791d83d48a4684295b7cd01/etcd-member/0.log

This example shows how to download the log to the bastion node, and from there to a jump server or remote device, where you can read and parse it. The pod directory name might differ in your environment.

scp -i /root/.ssh/id_rsa core@192.168.133.10:/var/log/pods/b2810e842791d83d48a4684295b7cd01/etcd-member/0.log 0.log
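
After the log file is on the bastion node or your remote device, you can inspect it with standard tools, for example:

grep -i error 0.log
less 0.log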

Red Hat OpenShift

Get a list of nodes and their status:

oc get nodes
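
Other commands that are often useful for a first diagnosis include listing the cluster operators, certificate signing requests, and pods across all namespaces, for example:

oc get clusteroperators
oc get csr
oc get pods --all-namespaces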

When you use kubectl, the following order of precedence determines which kubeconfig file is used:

  • The --kubeconfig flag, if it is specified.
  • The KUBECONFIG environment variable, if it is set.
  • The $HOME/.kube/config file.

To export the kubeconfig that is created by the Red Hat OpenShift Installer to an environment variable, use the following command:

export KUBECONFIG=/opt/ocpinstall/auth/kubeconfig
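
To confirm that the exported kubeconfig is picked up, you can check the current user and context, for example:

oc whoami
oc config current-context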

Deleting deployment

If you encounter a problem with your Terraform deployment, you can delete your deployment with the following command.

terraform destroy
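
If you run the command non-interactively, you can skip the confirmation prompt, for example:

terraform destroy -auto-approve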

In some cases, Terraform might not be able to finish the automation. In these cases, you might need to delete your deployment manually by using vCenter and by deleting the Terraform state files.

  1. Remove the ocp folder and its contents in vCenter.
  2. Remove the ocp resource group.
  3. Remove the Terraform state file by running rm /opt/ocpinstall/installer/upi/vsphere/terraform.tfstate.

After these steps, you can fix your deployment issue and redeploy the Red Hat OpenShift platform.
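
Terraform typically writes a terraform.tfstate.backup file next to the state file. If it is present, remove it as well so that no stale state is left behind, for example:

rm -f /opt/ocpinstall/installer/upi/vsphere/terraform.tfstate.backup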

Generating new ignition files

Your ignition files are valid for 24 hours. You can generate new .ign files by completing the following steps:

  1. Remove old state, configuration, and ign files:

    cd /opt/ocpinstall
    rm -R .openshift_install.log .openshift_install_state.json auth *.ign metadata.json
    
  2. Copy the Red Hat OpenShift install-config backup back to install-config.yaml and re-create the ignition configs:

    cp install-config.bak install-config.yaml
    openshift-install create ignition-configs --dir=/opt/ocpinstall/
    
  3. Copy bootstrap.ign to the NGINX web root folder.

    cp bootstrap.ign /usr/share/nginx/html
    
  4. In terraform.tfvars, replace the control plane section with the contents of master.ign and the compute section with the contents of worker.ign (one way to script this replacement is shown after these steps). Edit the file with nano /opt/ocpinstall/installer/upi/vsphere/terraform.tfvars:

    // Ignition config for the control plane machines. You should copy the contents of the master.ign generated by the installer.
    control_plane_ignition = <<END_OF_MASTER_IGNITION
    <replace with new master.ign>
    END_OF_MASTER_IGNITION
    
    // Ignition config for the compute machines. You should copy the contents of the worker.ign generated by the installer.
    compute_ignition = <<END_OF_WORKER_IGNITION
    <replace with new worker.ign>
    END_OF_WORKER_IGNITION
    

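If you prefer not to paste the contents manually, the following sketch shows one way to splice the regenerated master.ign between the END_OF_MASTER_IGNITION markers; the same pattern applies to worker.ign and END_OF_WORKER_IGNITION. This helper is illustrative and not part of the installer, so review the result before you use it.

cd /opt/ocpinstall/installer/upi/vsphere
awk '
  /<<END_OF_MASTER_IGNITION/ {            # keep the opening heredoc line
      print
      while ((getline line < "/opt/ocpinstall/master.ign") > 0) print line
      skip = 1; next
  }
  skip && /END_OF_MASTER_IGNITION/ { print; skip = 0; next }   # keep the closing marker
  skip { next }                           # drop the old ignition contents
  { print }
' terraform.tfvars > terraform.tfvars.new && mv terraform.tfvars.new terraform.tfvars
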
Taking a snapshot of Red Hat OpenShift

You might want to stop and resume Red Hat OpenShift cluster VMs during development or testing. Consider the following information before you take a snapshot.

During the installation of Red Hat OpenShift 4.x clusters, a bootstrap certificate is created that is used on the control-plane nodes to create certificate signing requests (CSRs) for the kubelet client certificates (one for each node or kubelet). These certificates identify each kubelet on each node. Because the certificates cannot be revoked, they are created with a short expiration time of 24 hours after cluster installation. All nodes other than the control-plane nodes have a service account token that is revocable. The bootstrap certificate itself is valid only for 24 hours after cluster installation. After the initial 24 hours, the kubelet client certificates are rotated on a 30-day cycle.

The first control-plane kubelet client certificate lasts for 24 hours before it is re-created. If you take a snapshot immediately after deployment, the control-plane kubelet does not yet have a 30-day client certificate. If the kubelet client certificate refresh window is missed while the cluster is stopped, the cluster becomes unusable because the bootstrap credential cannot be used when the cluster is brought back up. In practice, a Red Hat OpenShift 4 cluster must run for at least 25 hours after installation before it can be shut down.

You can check the validity of the certificate by running the following command on the bastion host after deployment:

ssh -i ~/.ssh/id_rsa -o StrictHostKeyChecking=no core@192.168.133.10 -- sudo openssl x509 -text -noout -in /var/lib/kubelet/pki/kubelet-client-current.pem

Check the Validity section of the output to confirm the lifetime of the certificate:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            6a:73:78:19:f3:1e:8f:0c:a9:51:b8:53:f4:eb:29:8d:49:fa:7e:fd
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN = kube-csr-signer_@1573016574
        Validity
            Not Before: Nov  6 08:22:00 2019 GMT
            Not After : Dec  6 04:57:43 2019 GMT

For more information about shutting down the cluster after installation, see Enabling Red Hat OpenShift 4 Clusters to Stop and Resume Cluster VMs.

After the initial 24-hour certificate renewal, a cluster snapshot can be resumed at any time within the next 30 days. After 30 days, the expired certificates make the cluster snapshot unusable.
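
If you resume the cluster from a snapshot and nodes remain in the NotReady state, pending certificate signing requests might need to be approved manually, for example:

oc get csr
oc adm certificate approve <csr_name>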