Ansible at Work - Patching Linux Servers

When I started this series, I made one thing clear: tasks that we perform every day or on a regular basis should be automated. Doing so frees up time for other important work.

Linux server patching is one of those tasks that a DevOps engineer performs regularly on on-prem, cloud, and virtual servers. Before intelligent configuration management (CM) tools were available, Linux admins used to have sleepless nights patching hundreds of servers.

Ansible makes it seamless for us now!

In this article I will cover the following:

  • Kernel Patching on Linux servers
  • Upgrading all the installed packages on Linux servers

Let's get hands-on.

Kernel Patching on Linux servers

Let's directly start by having a look at my playbook.

---
#### Ansible Playbook to perform Kernel Patching on RHEL/CentOS and Ubuntu/Debian Servers ####

- hosts: workers
  become: yes
  serial: 4

  tasks:

    - name:  Task 1 - verify web/database processes are not running
      shell: if ps -eaf | egrep 'apache|http|nginx|mysql|postgresql|mariadb'|grep -v grep > /dev/null ;then echo 'process_running';else echo 'process_not_running';fi
      ignore_errors: true
      register: app_process_check


    - name:  Task 2 - decision point to start patching
      fail:
        msg: "{{ inventory_hostname }} has running application processes. Please stop them first, then attempt patching."
      when: app_process_check.stdout == "process_running"


    - name:  Task 3 - upgrade kernel package on RHEL/CentOS server
      yum:
        name: kernel
        state: latest
      # Parentheses keep the distribution test grouped; Ansible reports RHEL as 'RedHat'.
      when: app_process_check.stdout == "process_not_running" and (ansible_distribution == 'CentOS' or ansible_distribution == 'RedHat')
      register: yum_update

    - name:  Task 4 - upgrade kernel package on Ubuntu server
      apt:
        update_cache: yes
        force_apt_get: yes
        cache_valid_time: 3600
        name: linux-image-generic
        state: latest
      when: app_process_check.stdout == "process_not_running" and (ansible_distribution == 'Ubuntu' or ansible_distribution == 'Debian')
      register: apt_update

    - name: Task 5 - check if reboot required after kernel update on CentOS/RedHat servers
      shell: KERNEL_NEW=$(rpm -q --last kernel |head -1 | awk '{print $1}' | sed 's/kernel-//'); KERNEL_NOW=$(uname -r); if [[ $KERNEL_NEW != $KERNEL_NOW ]]; then echo "reboot_needed"; else echo "reboot_not_needed"; fi
      when: ansible_distribution == 'CentOS' or ansible_distribution == 'RedHat'
      ignore_errors: true
      register: reboot_required

    - name: Task 6 - Check if a reboot is required after kernel update on Ubuntu/Debian servers
      stat:
        path: /var/run/reboot-required
        get_md5: no
      register: reboot_required_file
      when: ansible_distribution == 'Ubuntu' or ansible_distribution == 'Debian'

    - name: Task 7 - Reboot CentOS/RedHat systems if kernel updated
      command: shutdown -r +1  "Rebooting CentOS/RedHat Servers After Kernel Patching"
      async: 0
      poll: 0
      when: reboot_required.stdout == "reboot_needed" and (ansible_distribution == 'CentOS' or ansible_distribution == 'RedHat')
      register: reboot_started
      ignore_errors: true

    - name: Task 8 - Reboot Ubuntu/Debian Servers if kernel updated
      reboot:
        msg: "Rebooting Ubuntu/Debian Servers After Kernel Patching"
        connect_timeout: 5
        reboot_timeout: 300
        pre_reboot_delay: 0
        post_reboot_delay: 30
        test_command: uptime
      when: reboot_required_file.stat.exists and (ansible_distribution == 'Ubuntu' or ansible_distribution == 'Debian')
      register: reboot_started_ubuntu
      ignore_errors: true

    - name: Task 9 - pause for 180 secs
      pause:
        minutes: 3

    - name: Task 10 - check if all the systems responding to ssh
      local_action:
        module: wait_for
        host: "{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}"
        port: 22
        search_regex: OpenSSH
        delay: 15
        timeout: 300
        state: started

The serial parameter controls how many hosts Ansible manages at a time. You can set it to a number, a percentage, or a list of numbers. Ansible completes the play on each batch of hosts before starting the next batch; with serial: 4 in the playbook above, it patches four hosts at a time.
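
To make the three forms of serial concrete, here is a minimal sketch. The debug task is a hypothetical placeholder that I added only to make the batching visible; it is not part of the patching playbook.

```yaml
---
# Sketch only: uncomment one of the alternative serial lines to try it.
- hosts: workers
  gather_facts: no
  serial: 4                  # a number: four hosts per batch
  # serial: "50%"            # a percentage of the hosts in the play
  # serial: [1, 2, "50%"]    # a list: 1 host, then 2, then 50% for the remaining batches
  tasks:
    - name: Show the hosts in the current batch
      debug:
        msg: "{{ ansible_play_batch }}"
```

Starting with a small first batch, as in the list form, is a cheap way to catch a bad patch on one host before it reaches the rest.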

Let us now understand all the tasks one by one.

Task 1 -> verify web/database processes are not running

Here we verify whether web or database services are running on the client node; if they are, we leave that system out of the upgrade. This check is not mandatory, but it is advisable to stop critical services before you upgrade a system. If you don't want this condition, you can drop tasks 1 & 2 and modify tasks 3 & 4 accordingly (by removing the process check from their when conditions).
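
If you decide to drop the process check, the upgrade tasks could be reduced to something like the sketch below. It assumes the same workers group and package names as the playbook; note that Ansible reports RHEL as RedHat in ansible_distribution, which is why that value appears in the condition.

```yaml
---
# Sketch: kernel upgrade without the web/database process check.
- hosts: workers
  become: yes
  serial: 4
  tasks:
    - name: Upgrade kernel package on RHEL/CentOS server
      yum:
        name: kernel
        state: latest
      when: ansible_distribution in ['CentOS', 'RedHat']

    - name: Upgrade kernel package on Ubuntu/Debian server
      apt:
        update_cache: yes
        force_apt_get: yes
        cache_valid_time: 3600
        name: linux-image-generic
        state: latest
      when: ansible_distribution in ['Ubuntu', 'Debian']
```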

Task 2 -> What if web/database processes are running on the servers

The play will fail on those nodes where web/database processes are running.

Task 3 -> Upgrade kernel package on RHEL/CentOS servers

This task will start upgrading kernel on RHEL/CentOS servers where the above applications aren't running.

Task 4 -> Upgrade kernel package on Ubuntu/Debian servers

This task will start upgrading the kernel on Ubuntu/Debian servers where the above applications aren't running. We have used a few of the apt module's parameters here.

update_cache=yes –> Run the equivalent of apt-get update command on all servers.

force_apt_get=yes –> Force usage of apt-get instead of aptitude.

cache_valid_time=3600 –> Update the apt cache if it is older than cache_valid_time (in seconds). We are setting it to 3600 seconds.

Task 5 -> Check if reboot required after kernel update on CentOS/RedHat servers

This task will check for the reboot requirement on CentOS/RedHat servers after Kernel upgrade.

Task 6 -> Check if reboot required after kernel update on Ubuntu/Debian servers

This task checks whether a reboot is required on Ubuntu/Debian servers after the kernel upgrade by verifying the existence of the /var/run/reboot-required file. We register the output of the stat module and use it later to decide whether to reboot the server.

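get_md5=no –> This tells the stat module not to compute the MD5 checksum of the file, which we don't need for a simple existence check. Newer Ansible releases (2.9 onward, as far as I know) no longer accept this option, so drop it if the task fails with an unsupported parameter error.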
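
On a newer Ansible release where get_md5 is not accepted, a minimal sketch of the same check might look like this. The get_checksum: no line is my own addition to skip checksum calculation; it is optional and not part of the original playbook.

```yaml
---
# Sketch: reboot-required check without the get_md5 option.
- hosts: workers
  become: yes
  tasks:
    - name: Check if a reboot is required after kernel update on Ubuntu/Debian servers
      stat:
        path: /var/run/reboot-required
        get_checksum: no     # optional: existence is all we need
      register: reboot_required_file
      when: ansible_distribution in ['Ubuntu', 'Debian']

    - name: Show the result (hypothetical helper task)
      debug:
        msg: "Reboot required: {{ reboot_required_file.stat.exists }}"
      when: reboot_required_file.stat is defined
```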

Task 7 -> Reboot CentOS/RedHat systems if kernel updated and reboot required

This task instructs Ansible to reboot CentOS/RedHat systems if the kernel was updated and a reboot is required. In this task I am initiating the reboot by firing the shutdown command, which schedules the reboot one minute later (+1).

Task 8 -> Reboot Ubuntu/Debian systems if kernel updated and reboot required

This task instructs Ansible to reboot Ubuntu/Debian systems if the kernel was updated and a reboot is required. Here I am using Ansible's reboot module to initiate the reboot, so that you see both ways of doing this task.

connect_timeout -> Maximum seconds to wait for a successful connection to the managed hosts before trying again.

reboot_timeout -> Maximum seconds to wait for machine to reboot and respond to a test command.

pre_reboot_delay -> Seconds to wait before reboot.

post_reboot_delay -> Seconds to wait after the reboot command was successful before attempting to validate the system rebooted successfully.

test_command -> Command to run on the rebooted host and expect success from to determine the machine is ready for further tasks.

Task 9 -> pause for 180 secs

This task waits 3 minutes for the servers to come up after the reboot. Here we are using Ansible's pause module, which pauses playbook execution for a set amount of time, or until a prompt is acknowledged.

You can use ctrl+c if you wish to advance a pause earlier than it is set to expire or if you need to abort a playbook run entirely. To continue early press ctrl+c and then C. To abort a playbook press ctrl+c and then A.

Task 10 -> check if all the systems responding to SSH

This task ensures that the systems are accessible through SSH after the 3-minute pause. Here we are using local_action, which is a task keyword rather than a module: Ansible runs the module named under it on the controller node instead of on the managed host. That module here is wait_for.
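
The same check can also be written with delegate_to: localhost, which does the same job as local_action. This is a sketch rather than a drop-in replacement; the become: no line is my assumption that the controller does not need privilege escalation just to probe a port.

```yaml
---
# Sketch: SSH reachability check delegated to the controller node.
- hosts: workers
  gather_facts: no
  tasks:
    - name: Check if all the systems are responding to ssh
      wait_for:
        host: "{{ ansible_host | default(inventory_hostname) }}"
        port: 22
        search_regex: OpenSSH
        delay: 15
        timeout: 300
        state: started
      delegate_to: localhost   # run wait_for on the controller
      become: no               # assumption: no sudo needed on the controller
```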

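Time to execute the playbook. The -k and -K options make Ansible prompt for the SSH password and the privilege escalation (become) password.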

linux_patching $ ansible-playbook -i myinventory linux_server_patching.yml -kK

playbookrun-1.png

playbookrun-2.png

Let us decode the playbook run output.

  • In Task 2, worker1 was taken out of the play because one of the web/DB services was running there.
  • Task 3 skipped worker3 and worker4 because it targets CentOS/RHEL hosts; worker2 reported ok because it already had the latest kernel package.
  • Task 4 skipped worker2 because of the OS difference; worker3 reported ok because it already had the latest kernel package. worker4 reported changed, which means the kernel upgrade ran there.
  • In Task 7, worker2 was skipped because it did not require a reboot. The fatal errors for worker3 and worker4 can be ignored: Task 5 was skipped on those two Ubuntu nodes, so reboot_required has no stdout for the condition to read, and ignore_errors lets the play continue.
  • In Task 8, worker3 was skipped because it did not require a reboot, while worker4 reported changed, which means this node required a reboot. The fatal error for worker2 can be ignored for the same reason: Task 6 was skipped on that CentOS node.
  • Task 9 paused for 3 minutes to let the reboot complete on the nodes. You can always "Continue Early" or "Abort" the pause.
  • Task 10 confirmed that the clients were reachable over SSH after all the tasks and the reboot.
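
If you would rather not see the ignorable fatal errors in Tasks 7 and 8, one option is to test the distribution first and guard the registered variables. The sketch below is a variation under my own assumptions, not the run shown above: the default('') filter and the "is defined" test are additions, and the RHEL check is rewritten in POSIX shell.

```yaml
---
# Sketch: reboot checks and reboots with guarded conditions.
- hosts: workers
  become: yes
  tasks:
    - name: Check if reboot is required on CentOS/RedHat
      shell: >-
        [ "$(rpm -q --last kernel | head -1 | awk '{print $1}' | sed 's/kernel-//')" != "$(uname -r)" ]
        && echo reboot_needed || echo reboot_not_needed
      register: reboot_required
      changed_when: false
      when: ansible_distribution in ['CentOS', 'RedHat']

    - name: Check if reboot is required on Ubuntu/Debian
      stat:
        path: /var/run/reboot-required
      register: reboot_required_file
      when: ansible_distribution in ['Ubuntu', 'Debian']

    - name: Reboot CentOS/RedHat systems if kernel updated
      command: shutdown -r +1 "Rebooting CentOS/RedHat Servers After Kernel Patching"
      when:
        - ansible_distribution in ['CentOS', 'RedHat']
        - reboot_required.stdout | default('') == "reboot_needed"

    - name: Reboot Ubuntu/Debian systems if kernel updated
      reboot:
        msg: "Rebooting Ubuntu/Debian Servers After Kernel Patching"
        reboot_timeout: 300
      when:
        - ansible_distribution in ['Ubuntu', 'Debian']
        - reboot_required_file.stat is defined
        - reboot_required_file.stat.exists
```

With the conditions written as a list, a host that fails the distribution test is simply skipped, so ignore_errors is no longer needed on these two tasks.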

I took snapshots of the worker4 node's kernel version and uptime before and after the upgrade to confirm the reboot.

Before the upgrade ->

worker4_before_upgrade-2.png

After the upgrade ->

worker4_after_upgrade-2.png

We can convert this playbook into an Ansible role as well. I did not do that here because we are installing just one kernel package, so a full-fledged role would not add much. But when you want to install multiple packages of different versions on client nodes, having a role is always recommended.

Updating all packages on Ubuntu / Debian Linux Servers

When you want to upgrade all the packages installed on the system, you can use the following playbook.

---
- name: Updating all packages on Ubuntu / Debian Linux Servers
  hosts: workers
  become: true
  become_user: root
  tasks:
    - name: Update apt repo and package cache
      apt:
        update_cache: yes
        force_apt_get: yes
        cache_valid_time: 3600

    - name: Upgrade all packages
      apt:
        upgrade: dist
        force_apt_get: yes

    - name: Check if a reboot is needed on all servers
      stat:
        path: /var/run/reboot-required
        get_md5: no
      register: reboot_required_file_existence

    - name: Reboot servers if kernel is updated
      reboot:
        msg: "Rebooting the servers after applying Kernel Updates"
        connect_timeout: 5
        reboot_timeout: 300
        pre_reboot_delay: 0
        post_reboot_delay: 30
        test_command: uptime
      when: reboot_required_file_existence.stat.exists

    - name: check if all the systems responding to ssh
      local_action:
        module: wait_for
        host: "{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}"
        port: 22
        search_regex: OpenSSH
        delay: 15
        timeout: 300
        state: started

Similarly, this can be achieved on CentOS and RHEL based systems by using the yum module.
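
A minimal sketch of what that could look like is below. I have not run it as part of this article. It assumes the needs-restarting command (shipped with yum-utils) is installed on the nodes; needs-restarting -r exits with 1 when a reboot is needed and 0 when it is not.

```yaml
---
# Sketch: upgrade all packages on CentOS / RHEL servers.
- name: Updating all packages on CentOS / RHEL Linux Servers
  hosts: workers
  become: true
  tasks:
    - name: Upgrade all packages
      yum:
        name: '*'
        state: latest
        update_cache: yes

    - name: Check if a reboot is needed (assumes yum-utils is installed)
      command: needs-restarting -r
      register: reboot_check
      changed_when: false
      failed_when: reboot_check.rc not in [0, 1]   # 0 = no reboot, 1 = reboot needed

    - name: Reboot servers if required
      reboot:
        msg: "Rebooting the servers after applying updates"
        reboot_timeout: 300
        post_reboot_delay: 30
        test_command: uptime
      when: reboot_check.rc == 1
```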

I have already covered individual package installations on Linux servers in a couple of my earlier articles, if you have been following the series from the start. :)

That's all for this article now.

Hope you liked the article. Stay tuned for more.

Thank you. Happy learning!
