This article covers the steps that are necessary and sufficient to run and configure Oracle RAC 21.3 on a Kubernetes 1.26 cluster. As of April 2023 there were no precedents for doing this outside Oracle Cloud, so thousands of experiments had to be carried out, and many hypotheses were tested, accepted or discarded.
Today: the world's first rollout of RAC in Kubernetes, and without Docker.
uname -a
Linux r02 5.4.17-2136.300.7.el8uek.x86_64 #2 SMP Fri Oct 8 16:23:01 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
OEL 8.5 is already certified for Oracle RAC, so there is no need to install 7.9.
containerd --version
containerd github.com/containerd/containerd v1.6.4 212e8b6fa2f44b9c21b2798135fc6fb7c53efc16
runc --version
runc version 1.0.2
spec: 1.0.2-dev
go: go1.16.7
libseccomp: 2.5.1
The main tasks in setting up a RAC rollout:
- creating and configuring subnets in the Kubernetes cluster;
- configuring the kernel of the Kubernetes worker node and the kernel of the RAC container (namespaced and read-only);
- selecting and configuring the RAM used by RAC;
- setting up shared RAC storage in Kubernetes.
This rollout is not the only possible configuration; it is the simplest example of a working solution. Development and test environments can be deployed in a relatively simple way; for production, you should use other approaches to configuring and providing redundancy for the shared storage.
statefulset.yaml
Because PodSecurityPolicy has been deprecated since 1.21 and removed in 1.25, some of the settings are done at the namespace level.
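For reference, the namespace-level settings are typically done with Pod Security Admission labels. A minimal sketch (the namespace name is taken from the kubectl commands later in the article; the label values are an assumption based on the pod needing privileged mode):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: oracle-rac-213
  labels:
    # assumption: the RAC pod runs privileged, so the namespace must allow it
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```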
Create networks and adapters
RAC has historically required public and private networks on at least two adapters; virtual adapters are also acceptable, but it will not work with fewer than two.
In my opinion, any CNI can be configured to add extra adapters to a container at startup.
To reduce implementation time, I use the CNCF-approved Multus solution. When deploying Multus CNI in a Kubernetes cluster, consider restricting it to the nodes where the RAC containers will run: with incorrect settings after rolling out Multus CNI, containers may fail to start even when their manifests carry no Multus annotations. Pay attention right away to the file /etc/cni/net.d/00-multus.conf.
This may change in future versions of containerd, but today this file is regenerated every time the Kubernetes node starts, and cniVersion must be 0.3.1 for it to work correctly.
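For illustration, 00-multus.conf has roughly the following shape; only the cniVersion value matters for this point, while the delegate (flannel on this stand) and the kubeconfig path are assumptions:

```json
{
  "cniVersion": "0.3.1",
  "name": "multus-cni-network",
  "type": "multus",
  "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    { "cniVersion": "0.3.1", "name": "cbr0", "type": "flannel", "delegate": { "isDefaultGateway": true } }
  ]
}
```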
Add network definitions:
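The author's actual manifests are in the repository linked at the end of the article. As a hedged sketch, a Multus NetworkAttachmentDefinition for the public network could look like the following (macvlan, the master interface name and the IPAM type are assumptions), with a second, analogous definition for the private interconnect:

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: rac-public            # assumption: the name referenced from the pod annotation
  namespace: oracle-rac-213
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth1",
    "mtu": 1500,
    "ipam": { "type": "static" }
  }'
```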
I deliberately set mtu 1500 for the private network, knowing that jumbo frames (mtu 9000) are expected and that we would get a warning in the logs when deploying the grid. There is nothing wrong with this; it is just that jumbo frames would require a lot of work that is not in the current plans. It may well turn out that the 20% speedup from using jumbo frames would not affect overall performance anyway.
spec.securityContext.sysctls: this is where you can declare some of the kernel parameters for the container; the rest can be set at the level of the Kubernetes worker node.
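A sketch of what that pod-level block can look like in the statefulset; the specific values here are assumptions taken from Oracle's preinstall defaults, not from the author's manifest:

```yaml
spec:
  securityContext:
    sysctls:
    - name: kernel.shmmax
      value: "4398046511104"
    - name: kernel.shmall
      value: "1073741824"
    - name: kernel.shmmni
      value: "4096"
    - name: kernel.sem
      value: "250 32000 100 128"
```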
For these unsafe sysctls to be accepted, the kubelet on the worker node must allow them: add the following line to /var/lib/kubelet/config.yaml:
allowedUnsafeSysctls: [kernel.shm*, kernel.sem, net.*]
Add the following to /etc/kubernetes/manifests/kube-apiserver.yaml:
--feature-gates=ProcMountType=true
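In the static pod manifest this is just one more flag in the kube-apiserver command list, for example:

```yaml
spec:
  containers:
  - command:
    - kube-apiserver
    - --feature-gates=ProcMountType=true
    # ...the rest of the existing flags stay as they are...
```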
The Kubernetes API server will restart; for a few minutes the cluster's API resources will be unavailable. At this point it is important to understand what you are doing and why, otherwise you may lose your Kubernetes cluster.
Restart the kubelet; swap should still be disabled at this point.
You can also skip this step and simply add all the sysctls at the host level to /etc/sysctl.conf followed by sysctl -p; it is a matter of perfectionism. Everything required by the official Oracle documentation should be added there as well.
To speed up pulling the image (about 20Gi), it is better to set up a local image registry.
Then:
podman pull container-registry.oracle.com/database/rac:21.3.0.0
podman push --tls-verify=false docker-service.docker-registry:5000/container-registry.oracle.com/database/rac:21.3.0.0
To set up a mirror registry, add the name and port of your registry to the containerd config.toml:
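A hedged example of such an entry for containerd 1.6; the registry name is taken from the podman push command above, and plain HTTP without TLS verification is an assumption:

```toml
# /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker-service.docker-registry:5000"]
  endpoint = ["http://docker-service.docker-registry:5000"]

[plugins."io.containerd.grpc.v1.cri".registry.configs."docker-service.docker-registry:5000".tls]
  insecure_skip_verify = true
```

Restart containerd afterwards (systemctl restart containerd) for the change to take effect.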
In this example podman runs in its own container and its registry port is exposed on the Kubernetes cluster network, while containerd runs on the host network, which is why containerd refers to the port of the podman deployment's Service in Kubernetes.
securityContext:
privileged: true
There is no mistake here: the second securityContext block is declared at the container level. Compared with the official Oracle Docker configuration, this is a significant difference between how Docker and Kubernetes operate.
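To make the point concrete, a minimal sketch of how the two blocks coexist in the statefulset pod template (the container name is illustrative, the image path comes from the registry commands above):

```yaml
spec:
  template:
    spec:
      securityContext:          # pod-level block: sysctls, as shown earlier
        sysctls:
        - name: kernel.sem
          value: "250 32000 100 128"
      containers:
      - name: rac
        image: docker-service.docker-registry:5000/container-registry.oracle.com/database/rac:21.3.0.0
        securityContext:        # second, container-level block: privileged mode
          privileged: true
```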
Configuring RAM settings
The total amount of RAM on the described stand is not large, 20Gb, so the recommended minimum of 16Gb for the pod looks reasonable. kube-proxy, flannel and Multus also have to run on the node; the container registry does not have to, it only saves time.
It is very important to decide on which hardware you can and should use huge pages. In my personal opinion, based on Oracle's recommendations, the officially suggested value vm.nr_hugepages=16384 should not be used on servers with less than 64Gb of physical RAM. You can try, but most likely the Oracle database will not start. Just don't use it:
sysctl vm.nr_hugepages=0
The Oracle database server will automatically fall back to regular pages, which should be sufficient. You should then match the size of the SGA to the /dev/shm area allocated to the pod; here they are 3.8Gb and 4Gb respectively.
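In Kubernetes the size of /dev/shm inside the pod is usually controlled with a memory-backed emptyDir. A sketch matching the 4Gb figure above (the volume name and mount are assumptions; the author's manifest may do this differently):

```yaml
# fragment of the pod template in the statefulset
volumes:
- name: dshm
  emptyDir:
    medium: Memory
    sizeLimit: 4Gi
containers:
- name: rac
  volumeMounts:
  - name: dshm
    mountPath: /dev/shm
```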
Don't forget to disable transparent huge pages in all cases:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
(Note that "sudo echo never > …" would not work: the redirection is performed by the unprivileged shell, not by sudo.)
It is better to do this at the GRUB level in advance.
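For example, on OEL 8 that means adding the option to the kernel command line and regenerating the GRUB configuration:

```bash
# /etc/default/grub: append transparent_hugepage=never to GRUB_CMDLINE_LINUX,
# then regenerate the config and reboot
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```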
Preparing to run the Grid Infrastructure installation
In common_scripts, create an sh file (a sketch of such a script follows the description below):
Rename the ntp.conf file to avoid a cvu error; it is not clear why it is present in this image at all.
Encrypting the initial password and placing the private key are described in the Oracle documentation; just put those files here.
Next comes a trick. It is not yet possible to expose all the necessary kernel parameters inside the container, so, to get a clean cvu check, copy this part of /proc to a regular file system:
sudo cp -r /proc/sys/net /home/<…>
In the startup script, mount --bind then mounts this copy over the corresponding part of /proc inside the container, and we see no cvu errors in the checks or in the container logs.
kernel.sched_rt_runtime_us=-1 — to be honest, everything hinges on this "-1" :) The parameter allows root processes in the container to run processes with real-time priority. ASM from version 19.3 onward will not work any other way; 18.3 still does.
The system must be booted with DefaultCPUAccounting=no in /etc/systemd/system.conf, otherwise real-time scheduling will not work in the container.
/etc/security/limits.conf in the image contains an unrealistic memlock value of 128Gb, with which the database will not start; so, on a system with 20Gb for everything, we allocate 8Gb for ASM and 8Gb for Oracle.
Chronyd has replaced ntp, and it is started here.
This file must be executed within the first 60 seconds of the container's operation; time is allotted for that in the startup sequence.
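The real script is in the repository linked below; the following is only a hedged reconstruction based on the description above (the paths under /oradata/scripts and the limits values are assumptions):

```bash
#!/bin/bash
# ini.sh - sketch of the per-container preparation script

# rename ntp.conf so the cvu NTP check does not trip over an unused config
mv /etc/ntp.conf /etc/ntp.conf.org 2>/dev/null

# bind-mount the pre-copied kernel parameters over /proc/sys/net
# so the cvu checks see the expected values
mount --bind /oradata/scripts/net /proc/sys/net

# replace the unrealistic 128Gb memlock limit: 8Gb for grid (ASM), 8Gb for oracle
cat > /etc/security/limits.d/30-memlock.conf <<EOF
grid   soft memlock 8388608
grid   hard memlock 8388608
oracle soft memlock 8388608
oracle hard memlock 8388608
EOF

# chronyd has replaced ntp; start it so cluster time synchronization works
/usr/sbin/chronyd
```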
kubectl -n oracle-rac-213 exec -it oracle-rac-213-0 -- bash /oradata/scripts/ini.sh
Shared Storage
Deploy one or more NFS servers to share a directory with the storage files. If you have specialized NAS hardware, you can skip much of this section.
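On a plain Linux NFS server this amounts to exporting the directory; the export options below are an assumption, the path comes from the commands that follow:

```bash
# on the NFS server: export the directory that will hold the ASM disk files
echo '/home/oracle-rac-213 *(rw,sync,no_root_squash,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra
```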
Prepare two or more files on the NFS servers according to the Oracle documentation.
I use a slightly different but very fast solution:
cd /home/oracle-rac-213/
sudo fallocate -z -o 0 -l 23G asm_disk01.img
sudo fallocate -z -o 0 -l 23G asm_disk02.img
sudo fallocate -z -o 0 -l 23G asm_disk03.img
sudo fallocate -z -o 0 -l 23G asm_disk04.img
sudo fallocate -z -o 0 -l 23G asm_disk05.img
The fact that the blocks are zero is recorded in the file system metadata, so no data is actually written and the files are created almost instantly.
On the kubernetes worker node, mount the shared directory:
mkdir -p /mnt/ora
mount -t nfs 192.168.1.4:/home/oracle-rac-213 /mnt/ora
Mount each file as a device:
losetup -f /mnt/ora/asm_disk01.img
…and so on for each of the files.
Now we can declare all persistent volumes:
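The author's manifests are in the repository linked below; as a rough illustration only, one way to expose a loop device to the pod is a local block PersistentVolume pinned to the worker node (everything here except the device path, size and node name is an assumption):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: asm-disk01
spec:
  capacity:
    storage: 23Gi
  volumeMode: Block           # raw block device for ASM
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: asm-disks
  local:
    path: /dev/loop0          # the device created by losetup above
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: ["r02"]
```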
Everything must be applied.
Persistent volume claims are declared in the statefulset manifest.
Settings
I do not explicitly specify a DNS server; in Oracle's normal mode of operation it will use resolv.conf first and then the cluster DNS. The main thing in this chain is to provide the mapping between the host names in this list and their addresses; for the SCAN, specify all three addresses, 172.16.1.70 - 172.16.1.72. I used a DNS server from a network domain external to the Kubernetes cluster, so I had to specify everything twice, with and without the domain. The domain name inside the container is formed according to the Kubernetes rules: oracle-rac-213.svc.cluster.local.
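For the SCAN, the records therefore look roughly like this; the SCAN host name itself is an assumption, the addresses and domain are from the text, and the node public/VIP names are added in the same way:

```text
172.16.1.70  oracle-rac-213-scan  oracle-rac-213-scan.oracle-rac-213.svc.cluster.local
172.16.1.71  oracle-rac-213-scan  oracle-rac-213-scan.oracle-rac-213.svc.cluster.local
172.16.1.72  oracle-rac-213-scan  oracle-rac-213-scan.oracle-rac-213.svc.cluster.local
```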
Enabling swap
sudo fallocate -l 32G /swap.img
sudo chmod 600 /swap.img
sudo mkswap /swap.img
sudo swapon /swap.img
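To make the swap file survive reboots, and because the kubelet refuses to start with swap enabled by default, something along these lines is also needed; the article does not show this part, and NodeSwap is still an alpha feature gate in Kubernetes 1.26, so treat it as an assumption:

```bash
# persist the swap file across reboots
echo '/swap.img none swap sw 0 0' | sudo tee -a /etc/fstab

# /var/lib/kubelet/config.yaml - allow the kubelet to run with swap on:
#   failSwapOn: false
#   featureGates:
#     NodeSwap: true
#   memorySwap:
#     swapBehavior: UnlimitedSwap
sudo systemctl restart kubelet
```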
Launch and monitoring
Apply all the manifests, from the namespace creation through to the statefulset.
I sync them through Argo CD; a sketch of the corresponding Application is shown after the repository link below.
The Services for accessing the database and OEM, as well as all the manifests above, are available in my repository: https://github.com/itoracl/oracle-rac-213
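A hedged sketch of an Argo CD Application pointing at that repository (the project, path and sync policy are assumptions):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: oracle-rac-213
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/itoracl/oracle-rac-213
    targetRevision: HEAD
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: oracle-rac-213
  syncPolicy:
    automated: {}
```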
Accessing the database from outside the cluster will require additional configuration on your side; CMAN is one option and can and should be configured additionally.
Let's execute inside the container:
[grid@oracle-rac-213-0 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@oracle-rac-213-0 ~]$ crsctl status resource -w "TYPE co 'ora'" -t
Configuring Oracle Enterprise Manager Ports

export ORACLE_HOME=/u01/app/oracle/product/21.3.0/dbhome_1
export TNS_ADMIN=$ORACLE_HOME/network/admin
sqlplus sys/sys@orclcdb as sysdba

SQL*Plus: Release 21.0.0.0.0 - Production on Wed Apr 5 18:56:50 2023
Version 21.3.0.0.0
Copyright (c) 1982, 2021, Oracle. All rights reserved.

Connected to:
Oracle Database 21c Enterprise Edition Release 21.0.0.0.0 - Production
Version 21.3.0.0.0

SQL> show parameter dispatchers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dispatchers                          string      (PROTOCOL=TCP) (SERVICE=ORCLCD
                                                 BXDB)
max_dispatchers                      integer

SQL> alter session set container=CDB$ROOT;
Session altered.

SQL> exec DBMS_XDB_CONFIG.SETHTTPSPORT(5500);
PL/SQL procedure successfully completed.

SQL> alter session set container=orclpdb;
Session altered.

SQL> exec DBMS_XDB_CONFIG.SETHTTPSPORT(5501);
PL/SQL procedure successfully completed.

Don't forget to specify in advance, in the Kubernetes manifests, all the required OEM ports for all PDBs.
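The actual services are in the repository mentioned above; as a minimal sketch, exposing the two EM Express ports set above could look like this (the service name, selector and NodePort type are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: oracle-rac-213-oem
  namespace: oracle-rac-213
spec:
  type: NodePort
  selector:
    app: oracle-rac-213       # assumption: must match the statefulset pod labels
  ports:
  - name: oem-cdb
    port: 5500
    targetPort: 5500
  - name: oem-pdb
    port: 5501
    targetPort: 5501
```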
This solution will also work in cloud Kubernetes clusters on a provider's virtual machines, as long as the managed Kubernetes lets you change the API server, kubelet and CRI settings.
I hope I haven't forgotten anything.
Happy linux!