feat: enhance upgrade guide

This commit is contained in:
kurokobo 2022-02-05 16:59:52 +09:00
parent d9086682cb
commit 041aa9725a
3 changed files with 183 additions and 136 deletions

View file

@ -415,32 +415,25 @@ kubectl apply -f awx-secret-tls.yaml
## Additional Guides ## Additional Guides
- [📁 **Deploy Private Git Repository on Kubernetes**](git) - [📁 **Deploy Private Git Repository on Kubernetes**](git)
- To use AWX with SCM, this repository includes the manifests to deploy [Gitea](https://gitea.io/en-us/). - The guide to use AWX with SCM. This repository includes the manifests to deploy [Gitea](https://gitea.io/en-us/).
- See [📝`git/README.md`](git) for instructions.
- [📁 **Deploy Private Container Registry on Kubernetes**](registry) - [📁 **Deploy Private Container Registry on Kubernetes**](registry)
- To use Execution Environments in AWX (AWX-EE), we have to push the container image built with `ansible-builder` to the container registry. - The guide to use Execution Environments in AWX (AWX-EE).
- If we don't want to push our container images to Docker Hub or other cloud services, we can deploy a private container registry on K3s. - If we want to use our own Execution Environment built with Ansible Builder and don't want to push it to the public container registry e.g. Docker Hub, we can deploy a private container registry on K3s.
- See [📝`registry/README.md`](registry) for instructions.
- [📁 **Deploy Private Galaxy NG on Docker or Kubernetes** (Experimental)](galaxy) - [📁 **Deploy Private Galaxy NG on Docker or Kubernetes** (Experimental)](galaxy)
- Deploy our own Galaxy NG instance. - The guide to deploy our own Galaxy NG instance.
- **Note that the containerized implementation of Galaxy NG is not supported at this time.** - **Note that the containerized implementation of Galaxy NG is not officialy supported at this time.**
- **All information on the page is for development, testing and study purposes only.** - **All information on the guide is for development, testing and study purposes only.**
- See [📝`galaxy/README.md`](galaxy) for instructions.
- [📁 **Use SSL Certificate from Public ACME CA**](acme) - [📁 **Use SSL Certificate from Public ACME CA**](acme)
- To use a certificate from public ACME CA such as Let's Encrypt or ZeroSSL instead of Self-Signed certificate. - The guide to use a certificate from public ACME CA such as Let's Encrypt or ZeroSSL instead of Self-Signed certificate.
- See [📝`acme/README.md`](acme) for instructions.
- [📁 **Use Ansible Builder**](builder) - [📁 **Use Ansible Builder**](builder)
- Use Ansible Builder to build our own Execution Environment. - The guide to use Ansible Builder to build our own Execution Environment.
- See [📝`builder/README.md`](builder) for instructions.
- [📁 **Use Ansible Runner**](runner) - [📁 **Use Ansible Runner**](runner)
- Use Ansible Runner to run playbook using Execution Environment. - The guide to use Ansible Runner to run playbook using Execution Environment.
- See [📝`runner/README.md`](runner) for instructions.
- [📁 **Use Customized Pod Specification for your Execution Environment**](containergroup) - [📁 **Use Customized Pod Specification for your Execution Environment**](containergroup)
- We can customize the specification of the Pod of the Execution Environment using **Container Group**. - The guide to use customized Pod of the Execution Environment using **Container Group**.
- See [📝`containergroup/README.md`](containergroup) for instructions.
- [📁 **Tips**](tips) - [📁 **Tips**](tips)
- [📝Expose `/etc/hosts` to Pods on K3s](tips/expose-hosts.md) - [📝Expose `/etc/hosts` to Pods on K3s](tips/expose-hosts.md)
- [📝Redirect HTTP to HTTPS](tips/https-redirection.md) - [📝Redirect HTTP to HTTPS](tips/https-redirection.md)
- [📝Uninstall deployed resouces](tips/uninstall.md) - [📝Uninstall deployed resouces](tips/uninstall.md)
- [📝Deploy older version of AWX Operator](tips/deploy-older-operator.md) - [📝Deploy older version of AWX Operator](tips/deploy-older-operator.md)
- [📝Upgrade AWX Operator from 0.13.0 or earlier to 0.14.0 or later](tips/upgrade-operator.md) - [📝Upgrade AWX Operator and AWX](tips/upgrade-operator.md)

View file

@ -4,4 +4,4 @@
- [📝Redirect HTTP to HTTPS](https-redirection.md) - [📝Redirect HTTP to HTTPS](https-redirection.md)
- [📝Uninstall deployed resouces](uninstall.md) - [📝Uninstall deployed resouces](uninstall.md)
- [📝Deploy older version of AWX Operator](deploy-older-operator.md) - [📝Deploy older version of AWX Operator](deploy-older-operator.md)
- [📝Upgrade AWX Operator from 0.13.0 or earlier to 0.14.0 or later](upgrade-operator.md) - [📝Upgrade AWX Operator and AWX](upgrade-operator.md)

View file

@ -1,64 +1,194 @@
<!-- omit in toc --> <!-- omit in toc -->
# Upgrade AWX Operator from 0.13.0 or earlier to 0.14.0 or later # Upgrade AWX Operator and AWX
[As described in the documentation](https://github.com/ansible/awx-operator/blob/0.14.0/README.md#v0140), AWX Operator changed from cluster scope to namespace scope in `0.14.0`. Also, the Operator SDK `1.x` is used. This guide provides the procedure for the following three types of upgrading AWX Operator.
This means that upgrading from `0.13.0` or earlier to `0.14.0` or later requires a bit of finesse, such as cleaning the old AWX Operator. - Upgrade from `0.14.0` or later (e.g. from `0.14.0` to `0.15.0`)
- Upgrade from `0.13.0` (e.g. from `0.13.0` to `0.14.0`)
- Upgrade from `0.12.0` or earlier (e.g. from `0.12.0` to `0.13.0`)
Note that once you upgrade AWX Operator, your AWX will also be upgraded automatically to the version bundled with the upgraded AWX Operator as shown below.
| AWX Operator | AWX |
| - | - |
| 0.16.1 | 19.5.1 |
| 0.15.0 | 19.5.0 |
| 0.14.0 | 19.4.0 |
| 0.13.0 | 19.3.0 |
| 0.12.0 | 19.2.2 |
| 0.11.0 | 19.2.1 |
| 0.10.0 | 19.2.0 |
| 0.9.0 | 19.1.0 |
| 0.8.0 | 19.0.0 |
| 0.7.0 | 18.0.0 |
| 0.6.0 | 15.0.0 |
[There is `image_version` parameter for AWX resource to change which image will be used](https://github.com/ansible/awx-operator#deploying-a-specific-version-of-awx), but it appears that using a version of AWX other than the one bundled with the AWX Operator [is currently not supported](https://github.com/ansible/awx-operator#deploying-a-specific-version-of-awx). Conversely, if you want to upgrade AWX, you need to plan to upgrade AWX Operator first.
<!-- omit in toc --> <!-- omit in toc -->
## Table of Contents ## Table of Contents
- [Environment](#environment) - [✅ Take a backup of the old AWX instance](#-take-a-backup-of-the-old-awx-instance)
- [Procedure](#procedure) - [📝 Upgrade from `0.14.0` or later (e.g. from `0.14.0` to `0.15.0`)](#-upgrade-from-0140-or-later-eg-from-0140-to-0150)
- [Take a backup of the old AWX instance](#take-a-backup-of-the-old-awx-instance) - [📝 Upgrade from `0.13.0` (e.g. from `0.13.0` to `0.14.0`)](#-upgrade-from-0130-eg-from-0130-to-0140)
- [Delete the old AWX Operator](#delete-the-old-awx-operator) - [📝 Upgrade from `0.12.0` or earlier (e.g. from `0.12.0` to `0.13.0`)](#-upgrade-from-0120-or-earlier-eg-from-0120-to-0130)
- [(Optional) Delete the old AWX instance](#optional-delete-the-old-awx-instance) - [❓ Troubleshooting](#-troubleshooting)
- [Deploy the new AWX Operator](#deploy-the-new-awx-operator) - [New Pod gets stuck in `Pending` state](#new-pod-gets-stuck-in-pending-state)
- [Wait for the AWX instance to be upgraded](#wait-for-the-awx-instance-to-be-upgraded)
## Environment ## ✅ Take a backup of the old AWX instance
In this case, we will upgrade `0.13.0` to `0.14.0`.
The AWX Operator `0.13.0` resides in the `default` namespace and the related AWX instance resides in the `awx` namespace, as described in this repository prior to `0.13.0`. After the upgrade, everything related to the AWX Operator `0.14.0` will reside in the `awx` namespace.
| Phase / Resource | AWX Operator | AWX Instance |
| ---------------- | ------------------------------- | --------------------------- |
| Before Upgrade | `0.13.0` in `default` namespace | `19.3.0` in `awx` namespace |
| After Upgrade | `0.14.0` in `awx` namespace | `19.4.0` in `awx` namespace |
## Procedure
This is an overview of the procedures to be described in this case.
1. Take a backup of the old AWX instance
2. Delete the old AWX Operator
3. (Optional) Delete the old AWX instance
4. Deploy the new AWX Operator
5. Wait for the AWX instance to be upgraded
### Take a backup of the old AWX instance
Before performing the upgrade, make sure that you have a backup of your old AWX. Before performing the upgrade, make sure that you have a backup of your old AWX.
Refer [📝README: Backing up using AWX Operator](../README.md#backing-up-using-awx-operator) to take backup using AWX Operator. Refer [📝README: Backing up using AWX Operator](../README.md#backing-up-using-awx-operator) to take backup using AWX Operator.
### Delete the old AWX Operator ## 📝 Upgrade from `0.14.0` or later (e.g. from `0.14.0` to `0.15.0`)
First, remove the old AWX Operator that is running in the `default` namespace. In addition, remove Service Account, Cluster Role, and Cluster Role Binding that are required for old AWX Operator to work. If you are using AWX Operator `0.14.0` or later and want to upgrade to newer version, simply, deploy the new version of AWX Operator to the same namespace where the old AWX Operator is running.
```bash ```bash
kubectl delete deployment awx-operator # Prepare required files
kubectl delete serviceaccount awx-operator cd ~
kubectl delete clusterrolebinding awx-operator git clone https://github.com/ansible/awx-operator.git
kubectl delete clusterrole awx-operator cd awx-operator
git checkout 0.15.0 # Checkout the version to upgrade to
# Deploy AWX Operator
export NAMESPACE=awx # Specify the namespace where the old AWX Operator exists
make deploy
```
This will upgrade the AWX Operator first, after that, AWX will be also upgraded as well.
To monitor the progress of the deployment, check the logs of `deployments/awx-operator-controller-manager`:
```bash
kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
```
When the deployment completes successfully, the logs end with:
```txt
$ kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
...
----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, awx/awx) -----
PLAY RECAP *********************************************************************
localhost : ok=56 changed=0 unreachable=0 failed=0 skipped=35 rescued=0 ignored=0
----------
```
## 📝 Upgrade from `0.13.0` (e.g. from `0.13.0` to `0.14.0`)
If you are using AWX Operator `0.13.0` and want to upgrade to newer version, you should consider the big changes in AWX Operator in `0.14.0`. [As described in the documentation](https://github.com/ansible/awx-operator/blob/0.14.0/README.md#v0140), in `0.14.0`, AWX Operator changed from cluster scope to namespace scope. Also, the Operator SDK `1.x` is used.
This means that upgrading from `0.13.0` to `0.14.0` or later requires a bit of finesse, such as cleaning the old AWX Operator. **If you are using `0.12.0` or earlier and want to upgrade to `0.14.0` or later, I recommend you to [upgrade to `0.13.0` first](#-upgrade-from-0120-or-earlier-eg-from-0120-to-0130) and then come back to here to avoid unintended issue.**
In this guide, for example, perform upgrading from `0.13.0` to `0.14.0`. The AWX Operator `0.13.0` or earlier resides in the `default` namespace by default and the related AWX instance resides in the `awx` namespace, as described in this repository. After the upgrade, everything related to the AWX Operator `0.14.0` will reside in the `awx` namespace.
| Phase | AWX Operator | AWX Instance |
| ---------------- | ------------------------------- | --------------------------- |
| Before Upgrade | `0.13.0` in `default` namespace | `19.3.0` in `awx` namespace |
| After Upgrade | `0.14.0` in `awx` namespace | `19.4.0` in `awx` namespace |
To upgrade AWX Operator, remove the old AWX Operator that is running in the `default` namespace first. In addition, remove Service Account, Cluster Role, and Cluster Role Binding that are required for old AWX Operator to work.
```bash
kubectl -n default delete deployment awx-operator
kubectl -n default delete serviceaccount awx-operator
kubectl -n default delete clusterrolebinding awx-operator
kubectl -n default delete clusterrole awx-operator
``` ```
Since we removed only old AWX Operator, the old CRDs are still exist. Therefore, the old `awx` resource which means old AWX instance is still running in the `awx` namespace. Since we removed only old AWX Operator, the old CRDs are still exist. Therefore, the old `awx` resource which means old AWX instance is still running in the `awx` namespace.
### (Optional) Delete the old AWX instance Finally, deploy the new AWX Operator to the `awx` namespace.
This step should be performed if the K3s node does not have enough free resources to deploy a new AWX instance. ```bash
# Prepare required files
cd ~
git clone https://github.com/ansible/awx-operator.git
cd awx-operator
git checkout 0.14.0 # Checkout the version to upgrade to
# Deploy AWX Operator
export NAMESPACE=awx # Specify the namespace where the old AWX instance exists
make deploy
```
This will update the CRDs in the cluster and create the required Service Account, Roles, etc. in the `awx` namespace. Also, AWX Operator will start working. Once AWX Operator is up and running, it will start rolling out a new version of the AWX instance automatically based on the old `awx` resource definition.
To monitor the progress of the deployment, check the logs of `deployments/awx-operator-controller-manager`:
```bash
kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
```
When the deployment completes successfully, the logs end with:
```txt
$ kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
...
----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, awx/awx) -----
PLAY RECAP *********************************************************************
localhost : ok=56 changed=0 unreachable=0 failed=0 skipped=35 rescued=0 ignored=0
----------
```
## 📝 Upgrade from `0.12.0` or earlier (e.g. from `0.12.0` to `0.13.0`)
If you are using `0.12.0` or earlier and want to upgrade to newer version, simply, deploy the new version of AWX Operator. This procedure can be applicable for upgrading to up to `0.13.0`. **If you want to upgrade to `0.14.0` or later, I recommend you to upgrade to `0.13.0` by following this procedure first and then [perform upgrading to `0.14.0` or later](#-upgrade-from-0130-eg-from-0130-to-0140).**
```bash
# Specify the version to upgrade to in the URL
kubectl apply -f https://raw.githubusercontent.com/ansible/awx-operator/0.13.0/deploy/awx-operator.yaml
```
This will upgrade the AWX Operator first, after that, AWX will be also upgraded as well.
To monitor the progress of the deployment, check the logs of `deployment/awx-operator`:
```bash
kubectl logs -f deployment/awx-operator
```
When the deployment completes successfully, the logs end with:
```txt
$ kubectl logs -f deployment/awx-operator
...
--------------------------- Ansible Task Status Event StdOut -----------------
PLAY RECAP *********************************************************************
localhost : ok=54 changed=0 unreachable=0 failed=0 skipped=37 rescued=0 ignored=0
-------------------------------------------------------------------------------
```
## ❓ Troubleshooting
Some hists for when you got stuck during upgrade.
### New Pod gets stuck in `Pending` state
If the K3s node does not have enough free resources to deploy a new AWX instance, the new Pod for AWX gets stuck in `Pending` state.
```bash
$ kubectl -n awx get pods
NAME READY STATUS RESTARTS AGE
awx-7d74496d7d-d66dw 4/4 Running 0 19d
awx-84d5c45999-55gb4 0/4 Pending 0 10s 👈👈👈
```
Try running `kubectl -n awx describe pod <Pod Name>` and check the `Events` section at the end for the cause.
```bash
$ kubectl -n awx describe pod awx-84d5c45999-55gb4
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 106s default-scheduler 0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. 👈👈👈
Warning FailedScheduling 105s default-scheduler 0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory. 👈👈👈
```
This means that the node does not have enough CPU or memory resources to start the Pod.
During the AWX upgrade, a rollout of the Deployment resource will be performed and temporarily two AWX Pods will be running. This means that the required Resource Requests for CPU and memory will be doubled. During the AWX upgrade, a rollout of the Deployment resource will be performed and temporarily two AWX Pods will be running. This means that the required Resource Requests for CPU and memory will be doubled.
@ -70,7 +200,7 @@ kubectl -n awx delete deployment awx
Ensure that it is not the `awx` resource that should be deleted, but the `deployment` resource. If we accidentally delete the `awx` resource or any Secrets, we will not be able to upgrade successfully. Ensure that it is not the `awx` resource that should be deleted, but the `deployment` resource. If we accidentally delete the `awx` resource or any Secrets, we will not be able to upgrade successfully.
Now only PostgreSQL exists in our `awx` namespace. After a few minutes of waiting, our AWX Operator will successfully launch the new Deployment and the Pod for AWX.
```bash ```bash
$ kubectl -n awx get all $ kubectl -n awx get all
@ -84,79 +214,3 @@ service/awx-service ClusterIP 10.43.248.150 <none> 80/TCP 8m51
NAME READY AGE NAME READY AGE
statefulset.apps/awx-postgres 1/1 8m58s statefulset.apps/awx-postgres 1/1 8m58s
``` ```
### Deploy the new AWX Operator
Finally, deploy the new AWX Operator to the awx namespace.
```bash
# Prepare required files
cd ~
git clone https://github.com/ansible/awx-operator.git
cd awx-operator
git checkout 0.14.0
# Deploy AWX Operator
export NAMESPACE=awx
make deploy
```
This will update the CRDs and create the required Service Account, Roles, etc. in the `awx` namespace. Also, AWX Operator will start working.
### Wait for the AWX instance to be upgraded
Once AWX Operator is up and running, it will start rolling out a new version of the AWX instance automatically based on the old `awx` resource definition.
We can monitor the progress in the logs of `deployments/awx-operator-controller-manager`. Once this completed, the logs end with:
```txt
$ kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
...
----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, awx/awx) -----
PLAY RECAP *********************************************************************
localhost : ok=56 changed=0 unreachable=0 failed=0 skipped=35 rescued=0 ignored=0
----------
```
Now your new AWX instance and AWX Operator exist in `awx` namespace.
```bash
$ kubectl -n awx get awx,all,ingress,secrets
NAME AGE
awx.awx.ansible.com/awx 13m
NAME READY STATUS RESTARTS AGE
pod/awx-postgres-0 1/1 Running 0 13m
pod/awx-operator-controller-manager-68d787cfbd-59wr8 2/2 Running 0 3m42s
pod/awx-84d5c45999-xdspl 4/4 Running 0 3m23s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/awx-operator-controller-manager-metrics-service ClusterIP 10.43.81.63 <none> 8443/TCP 3m42s
service/awx-postgres ClusterIP None <none> 5432/TCP 13m
service/awx-service ClusterIP 10.43.248.150 <none> 80/TCP 13m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/awx-operator-controller-manager 1/1 1 1 3m42s
deployment.apps/awx 1/1 1 1 3m23s
NAME DESIRED CURRENT READY AGE
replicaset.apps/awx-operator-controller-manager-68d787cfbd 1 1 1 3m42s
replicaset.apps/awx-84d5c45999 1 1 1 3m23s
NAME READY AGE
statefulset.apps/awx-postgres 1/1 13m
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress.networking.k8s.io/awx-ingress <none> awx.example.com 192.168.0.100 80, 443 13m
NAME TYPE DATA AGE
secret/default-token-gq4k7 kubernetes.io/service-account-token 3 13m
secret/awx-admin-password Opaque 1 13m
secret/awx-broadcast-websocket Opaque 1 13m
secret/awx-secret-tls kubernetes.io/tls 2 13m
secret/awx-postgres-configuration Opaque 6 13m
secret/awx-secret-key Opaque 1 13m
secret/awx-token-vpc22 kubernetes.io/service-account-token 3 13m
secret/awx-operator-controller-manager-token-6m4k9 kubernetes.io/service-account-token 3 3m42s
secret/awx-app-credentials Opaque 3 13m
```