When I started this series, I made one point clear: tasks that we perform every day or on a regular basis should be automated. Doing so frees up time for other important work.
Linux server patching is one of those tasks that a DevOps engineer performs regularly on on-premises, cloud, and virtual servers. Remember the time before intelligent configuration management (CM) tools were on the market? Linux admins used to have sleepless nights patching hundreds of servers.
Ansible makes it seamless for us now!
In this article I will cover the following:
- Kernel Patching on Linux servers
- Upgrading all the installed packages on Linux servers
Let's get practical.
Kernel Patching on Linux servers
Let's start by looking at my playbook.
---
#### Ansible Playbook to perform Kernel Patching on RHEL/CentOS and Ubuntu/Debian Servers ####
- hosts: workers
  become: yes
  serial: 4
  tasks:
    - name: Task 1 - verify web/database processes are not running
      shell: if ps -eaf | egrep 'apache|http|nginx|mysql|postgresql|mariadb' | grep -v grep > /dev/null; then echo 'process_running'; else echo 'process_not_running'; fi
      ignore_errors: true
      register: app_process_check

    - name: Task 2 - decision point to start patching
      fail:
        msg: "{{ inventory_hostname }} has a running application. Please stop the application processes first, then attempt patching."
      when: app_process_check.stdout == "process_running"

    - name: Task 3 - upgrade kernel package on RHEL/CentOS server
      yum:
        name: kernel
        state: latest
      # Parentheses matter: "and" binds tighter than "or".
      # Ansible reports RHEL as 'RedHat' in ansible_distribution.
      when: app_process_check.stdout == "process_not_running" and (ansible_distribution == 'CentOS' or ansible_distribution == 'RedHat')
      register: yum_update

    - name: Task 4 - upgrade kernel package on Ubuntu server
      apt:
        update_cache: yes
        force_apt_get: yes
        cache_valid_time: 3600
        name: linux-image-generic
        state: latest
      when: app_process_check.stdout == "process_not_running" and (ansible_distribution == 'Ubuntu' or ansible_distribution == 'Debian')
      register: apt_update

    - name: Task 5 - check if reboot required after kernel update on CentOS/RedHat servers
      # Compares the most recently installed kernel package with the running kernel.
      shell: KERNEL_NEW=$(rpm -q --last kernel | head -1 | awk '{print $1}' | sed 's/kernel-//'); KERNEL_NOW=$(uname -r); if [[ $KERNEL_NEW != $KERNEL_NOW ]]; then echo "reboot_needed"; else echo "reboot_not_needed"; fi
      when: ansible_distribution == 'CentOS' or ansible_distribution == 'RedHat'
      ignore_errors: true
      register: reboot_required

    - name: Task 6 - Check if a reboot is required after kernel update on Ubuntu/Debian servers
      stat:
        path: /var/run/reboot-required
        get_md5: no
      when: ansible_distribution == 'Ubuntu' or ansible_distribution == 'Debian'
      register: reboot_required_file

    - name: Task 7 - Reboot CentOS/RedHat systems if kernel updated
      # "shutdown -r +1" schedules the reboot one minute ahead, so the task returns first.
      command: shutdown -r +1 "Rebooting CentOS/RedHat Servers After Kernel Patching"
      async: 0
      poll: 0
      # On Ubuntu/Debian hosts Task 5 was skipped, so reboot_required has no stdout
      # and this condition errors out; ignore_errors keeps the play going.
      when: reboot_required.stdout == "reboot_needed" and (ansible_distribution == 'CentOS' or ansible_distribution == 'RedHat')
      register: reboot_started
      ignore_errors: true

    - name: Task 8 - Reboot Ubuntu/Debian Servers if kernel updated
      reboot:
        msg: "Rebooting Ubuntu/Debian Servers After Kernel Patching"
        connect_timeout: 5
        reboot_timeout: 300
        pre_reboot_delay: 0
        post_reboot_delay: 30
        test_command: uptime
      # On CentOS/RedHat hosts Task 6 was skipped, so reboot_required_file has no stat
      # and this condition errors out; ignore_errors keeps the play going.
      when: reboot_required_file.stat.exists and (ansible_distribution == 'Ubuntu' or ansible_distribution == 'Debian')
      register: reboot_started_ubuntu
      ignore_errors: true

    - name: Task 9 - pause for 180 secs
      pause:
        minutes: 3

    - name: Task 10 - check if all the systems responding to ssh
      local_action:
        module: wait_for
        host: "{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}"
        port: 22
        search_regex: OpenSSH
        delay: 15
        timeout: 300
        state: started
With the serial parameter you can set a number, a percentage, or a list of numbers of hosts you want to manage at a time. Ansible completes the play on the specified number or percentage of hosts before starting the next batch of hosts.
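To make the three forms of serial concrete, here is a minimal sketch. The plays and the ping task are placeholders of mine for illustration; only the workers group name comes from the playbook above.

```yaml
---
# Form 1: a fixed number of hosts per batch.
- name: Patch four hosts at a time
  hosts: workers
  serial: 4
  tasks:
    - name: Placeholder task
      ping:

# Form 2: a percentage of the hosts matched by the play.
- name: Patch a quarter of the hosts at a time
  hosts: workers
  serial: "25%"
  tasks:
    - name: Placeholder task
      ping:

# Form 3: a list of batch sizes. The first batch has 1 host, the next 5,
# and the last entry is reused until no hosts remain.
- name: Start small, then widen the batches
  hosts: workers
  serial:
    - 1
    - 5
    - "50%"
  tasks:
    - name: Placeholder task
      ping:
```

The list form is handy for patching: a single canary host goes first, and the larger batches run only if it survives.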
Let us now understand all the tasks one by one.
Task 1 -> verify web/database processes are not running
Here we verify whether web or database services are running on the client node; if they are, we do not consider that system part of our upgrade process. It is not strictly necessary, but it is advisable to stop critical services before you upgrade the system. If you don't want this condition, you can exclude Tasks 1 & 2 and modify Tasks 3 & 4 accordingly (by removing the process-check part of the when condition).
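As a rough illustration of that modification, Tasks 3 and 4 without the process check might look like the sketch below. It assumes ansible_distribution reports RHEL as RedHat, and it uses the in test instead of chained or conditions, which is my own choice here.

```yaml
# Sketch: kernel upgrade tasks with the process-check condition removed.
- name: Upgrade kernel package on RHEL/CentOS server
  yum:
    name: kernel
    state: latest
  when: ansible_distribution in ['CentOS', 'RedHat']
  register: yum_update

- name: Upgrade kernel package on Ubuntu/Debian server
  apt:
    update_cache: yes
    force_apt_get: yes
    cache_valid_time: 3600
    name: linux-image-generic
    state: latest
  when: ansible_distribution in ['Ubuntu', 'Debian']
  register: apt_update
```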
Task 2 -> What if web/database processes are running on the servers
The play will fail on those nodes where web/database processes are running.
Task 3 -> Upgrade kernel package on RHEL/CentOS servers
This task upgrades the kernel on RHEL/CentOS servers where the above applications aren't running.
Task 4 -> Upgrade kernel package on Ubuntu/Debian servers
This task upgrades the kernel on Ubuntu/Debian servers where the above applications aren't running.
We have used a few of the apt module's parameters here.
update_cache=yes -> Run the equivalent of the apt-get update command on all servers.
force_apt_get=yes -> Force usage of apt-get instead of aptitude.
cache_valid_time=3600 -> Update the apt cache if it is older than the cache_valid_time (in seconds). We are setting it to 3600 seconds.
Task 5 -> Check if reboot required after kernel update on CentOS/RedHat servers
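This task checks whether a reboot is required on CentOS/RedHat servers after the kernel upgrade. It compares the most recently installed kernel package (from rpm -q --last kernel) with the running kernel (from uname -r); if they differ, the new kernel is not yet in use and a reboot is needed.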
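The one-line shell command in Task 5 is dense, so here is the same comparison spread over several lines. This is a readability sketch rather than a replacement; the changed_when line is my addition, so that a read-only check does not report a change.

```yaml
- name: Check if reboot required after kernel update on CentOS/RedHat servers
  shell: |
    # rpm -q --last lists kernel packages newest first; keep the first entry
    # and strip the "kernel-" prefix so it is comparable with uname -r.
    KERNEL_NEW=$(rpm -q --last kernel | head -1 | awk '{print $1}' | sed 's/kernel-//')
    # The kernel the server is currently running.
    KERNEL_NOW=$(uname -r)
    if [ "$KERNEL_NEW" != "$KERNEL_NOW" ]; then
      echo "reboot_needed"
    else
      echo "reboot_not_needed"
    fi
  register: reboot_required
  changed_when: false
  when: ansible_distribution in ['CentOS', 'RedHat']
```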
Task 6 -> Check if reboot required after kernel update on Ubuntu/Debian servers
This task checks whether a reboot is required on Ubuntu/Debian servers after the kernel upgrade by verifying the existence of the /var/run/reboot-required file.
We save the output of the stat module in a registered variable, which we use later to decide whether to reboot the server.
get_md5=no -> Skip computing the MD5 checksum of the file, since we only care whether the file exists. (This parameter is deprecated and has been removed from the stat module in newer Ansible releases; drop it if your version rejects it.)
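To see what the registered variable holds, a debug task can print the field that the later reboot decision relies on. This is a small sketch; the debug task is my addition and is not part of the original playbook.

```yaml
- name: Check if a reboot is required after kernel update
  stat:
    path: /var/run/reboot-required
  register: reboot_required_file

# stat always returns stat.exists for a host where it ran;
# it is true only when the flag file is present.
- name: Show the result of the check
  debug:
    msg: "Reboot required on {{ inventory_hostname }}: {{ reboot_required_file.stat.exists }}"
```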
Task 7 -> Reboot CentOS/RedHat systems if kernel updated and reboot required
This task instructs Ansible to reboot CentOS/RedHat systems if the kernel was updated and a reboot is required.
In this task I initiate the reboot by running the shutdown command.
Task 8 -> Reboot Ubuntu/Debian systems if kernel updated and reboot required
This task instructs Ansible to reboot Ubuntu/Debian systems if the kernel was updated and a reboot is required.
In this task I use Ansible's reboot module to initiate the reboot, so that you see both ways of doing this.
connect_timeout -> Maximum seconds to wait for a successful connection to the managed hosts before trying again.
reboot_timeout -> Maximum seconds to wait for the machine to reboot and respond to a test command.
pre_reboot_delay -> Seconds to wait before reboot.
post_reboot_delay -> Seconds to wait after the reboot command was successful before attempting to validate that the system rebooted successfully.
test_command -> Command to run on the rebooted host and expect success from, to determine that the machine is ready for further tasks.
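Since the reboot module works on both OS families, Tasks 7 and 8 could in principle be merged into one task. The sketch below is one possible way to do it, not the playbook's actual method; it assumes the registered names reboot_required and reboot_required_file from Tasks 5 and 6, and it guards each lookup with is defined because the check for the other OS family is skipped on any given host.

```yaml
- name: Reboot any server whose check reported that a reboot is needed
  reboot:
    msg: "Rebooting Servers After Kernel Patching"
    connect_timeout: 5
    reboot_timeout: 300
    pre_reboot_delay: 0
    post_reboot_delay: 30
    test_command: uptime
  # Each half is only evaluated when the matching check actually ran,
  # so no ignore_errors is needed.
  when: >
    (reboot_required.stdout is defined and
     reboot_required.stdout == "reboot_needed")
    or
    (reboot_required_file.stat is defined and
     reboot_required_file.stat.exists)
```

With this form the fatal-but-ignored errors on hosts of the other OS family would no longer appear in the run output.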
Task 9 -> pause for 180 secs
This task waits 3 minutes for the servers to come up after the reboot.
Here we are using Ansible's pause module. It pauses playbook execution for a set amount of time, or until a prompt is acknowledged.
You can use ctrl+c if you wish to advance a pause earlier than it is set to expire, or if you need to abort a playbook run entirely. To continue early, press ctrl+c and then C. To abort a playbook, press ctrl+c and then A.
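The prompt-based variant mentioned above can be sketched as follows; the prompt text is just an example of mine. Instead of a timer, the play waits until the operator presses Enter.

```yaml
- name: Wait until the operator confirms the servers are back
  pause:
    prompt: "Press Enter once the rebooted servers are back online"
```

This suits interactive runs only; in an unattended run there is nobody to acknowledge the prompt, so the timed pause is the safer choice.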
Task 10 -> check if all the systems responding to SSH
This task ensures that, after the pause of 3 minutes, the systems are accessible through SSH.
Here we are using the local_action keyword. It is not a module itself; it tells Ansible to run the module named under it on the controller node instead of on the managed host. That module here is wait_for.
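local_action is shorthand for delegating a task to the controller, so the same check can also be written with delegate_to. The sketch below is an equivalent form under that assumption; become: false is my addition, so that the play-level become does not try to escalate privileges on the controller.

```yaml
- name: Check if all the systems are responding to ssh
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    search_regex: OpenSSH   # succeed only once the SSH banner is seen
    delay: 15               # wait 15 seconds before the first probe
    timeout: 300            # give up after 5 minutes
    state: started
  delegate_to: localhost
  become: false
```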
Time to execute the playbook.
linux_patching $ ansible-playbook -i myinventory linux_server_patching.yml -kK
Let us decode the playbook run output.
- In Task 2, worker1 was dropped from the run because one of the web/DB services was running there.
- Task 3 skipped worker4 and worker3, as it was meant for CentOS/RHEL hosts, and worker2 was in ok state because it already had the latest kernel package.
- Task 4 skipped worker2 because of the OS difference, and worker3 was in ok state because it already had the latest kernel package. worker4, however, shows changed, which means the kernel upgrade ran there.
- In Task 7, worker2 was skipped as it did not require a reboot. The fatal errors for worker3 and worker4 can be ignored, as that task wasn't relevant for those two Ubuntu nodes.
- In Task 8, worker3 was skipped as it did not require a reboot, but worker4 shows changed, which means this node required a reboot. The fatal error for worker2 can be ignored, as that task wasn't relevant for that CentOS node.
- Task 9 paused for 3 minutes to let the reboot complete on the nodes. You can always "Continue Early" or "Abort" the pause.
- Task 10 confirmed that, after all the tasks ran and the reboot completed, the clients were reachable over SSH.
I took snapshots of the kernel version and uptime on the worker4 node before and after the upgrade to confirm the reboot.
Before the upgrade ->
After the upgrade ->
We can convert this playbook into an Ansible role as well. I did not do so because this was just about installing one kernel package, so creating a full-fledged role does not make much sense. But when you want to install multiple packages of different versions on client nodes, having a role is always recommended.
Updating all packages on Ubuntu / Debian Linux Servers
When you want to upgrade all the packages installed on the system, you can use the following playbook.
---
- name: Updating all packages on Ubuntu / Debian Linux Servers
  hosts: workers
  become: true
  become_user: root
  tasks:
    - name: Update apt repo and package cache
      apt:
        update_cache: yes
        force_apt_get: yes
        cache_valid_time: 3600

    - name: Upgrade all packages
      apt:
        upgrade: dist
        force_apt_get: yes

    - name: Check if a reboot is needed on all servers
      stat:
        path: /var/run/reboot-required
        get_md5: no
      register: reboot_required_file_existence

    - name: Reboot servers if kernel is updated
      reboot:
        msg: "Rebooting the servers after applying Kernel Updates"
        connect_timeout: 5
        reboot_timeout: 300
        pre_reboot_delay: 0
        post_reboot_delay: 30
        test_command: uptime
      when: reboot_required_file_existence.stat.exists

    - name: check if all the systems responding to ssh
      local_action:
        module: wait_for
        host: "{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}"
        port: 22
        search_regex: OpenSSH
        delay: 15
        timeout: 300
        state: started
Similarly, this can be achieved on CentOS and RHEL based systems by using the yum module.
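A minimal sketch of what that yum-based playbook might look like is shown below. It mirrors the apt playbook and reuses the kernel comparison from the kernel patching playbook for the reboot decision. The task names and the changed_when line are my own choices, and the reboot module is used here in place of the shutdown command.

```yaml
---
- name: Updating all packages on CentOS / RHEL Linux Servers
  hosts: workers
  become: true
  tasks:
    - name: Upgrade all packages
      yum:
        name: "*"          # every installed package
        state: latest
        update_cache: yes

    - name: Check if the newest installed kernel differs from the running one
      shell: |
        KERNEL_NEW=$(rpm -q --last kernel | head -1 | awk '{print $1}' | sed 's/kernel-//')
        if [ "$KERNEL_NEW" != "$(uname -r)" ]; then
          echo "reboot_needed"
        else
          echo "reboot_not_needed"
        fi
      register: reboot_required
      changed_when: false

    - name: Reboot servers if kernel is updated
      reboot:
        msg: "Rebooting the servers after applying Kernel Updates"
        reboot_timeout: 300
        post_reboot_delay: 30
      when: reboot_required.stdout == "reboot_needed"
```

Note that this only detects a new kernel; updates to other core packages may also call for a reboot, which this sketch does not cover.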
I have already covered individual package installations on Linux servers in a couple of my earlier articles, if you have been following the series from the start. :)
That's all for this article now.
Hope you like the article. Stay Tuned for more.
Thank you. Happy learning!