PostgreSQL DBA Basics — Setup Replication (Master — Slave Setup) on PostgreSQL 12

Monowar Mukul
11 min readMay 9, 2020

The master/slave database replication is a process of copying (syncing) data from a database on one server (the master) to a database on another server (the slaves). The main benefit of this process is to distribute databases to multiple machines, so when the master server has a problem, there is a backup machine with the same data available for handling requests without interruption.

PostgreSQL provides several ways to replicate a database. It can be used for backup purposes and to provide a high availability database server.

In this article, I will show you

i) How to install and configure PostgreSQL replication by using hot standby mode.
ii) Halt a slaver server and finally
iii) Failover testing

PostgreSQL -Replication (Master-Slave setup)

Prerequisite:
In my setup Operating system is OEL 7.6

++++++++++++++++++++++++++++++++++++++++
Step 1— Set up both the host [ One for Master and another for Slave Server
++++++++++++++++++++++++++++++++++++++++
In my case Master Server is oel75 with IP 192.168.56.101 — accepts connections from the client with reading and write permissions.
and Slave Server is devops with IP 192.168.56.110 — the standby server gets a copy of the data from the master server and allows read-only permission.

Postgres user has SSH User authentication between servers

ssh-keygen -b 4096
cd /var/lib/pgsql/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys
cat id_rsa.pub >> ~/.ssh/authorized_keys

and cat /etc/hosts file has entry for all the hosts

$ cat /etc/hosts
— -
192.168.56.101 oel75.localdomain oel75
192.168.56.110 devops.localdomain devops

Let’s check
++++++++++++++++++++
$ ssh devops
Last login: Sat Apr 11 13:44:35 2020
$ hostname
devops.localdomain
$ ssh oel75
Last login: Sat Apr 11 11:11:20 2020
$ hostname
oel75.localdomain

+++++++++++++++++++++++++++++++++++++
Step 2 — Validate PostgreSQL installation on Master and Slave Server
+++++++++++++++++++++++++++++++++++++
Make sure Primary database is up and postgres services are running fine

Note: For installation you can check my other video “Installing PostgreSQL in Oracle Enterprise Linux”

Install PostgreSQL on Linux

Login to the source Postgresql server and ‘become’ the postgres user.
[root@devops ~]# sudo su — postgres
Last login: Fri Apr 10 18:09:26 AEST 2020 on pts/2

$ sudo systemctl status postgresql-12
● postgresql-12.service — PostgreSQL 12 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-12.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2020–04–10 20:56:32 AEST; 3s ago
Note: If not up then start PostgreSQL services on the Master server.
/*
## As root, initialize and start PostgreSQL 12 on the Master
$ systemctl start postgresql-12
*/

+++++++++++++++
ON SLAVE SERVER
+++++++++++++++

$ sudo systemctl status postgresql-12

+++++++++++++++++++++++++++++++++++
Step 3— Stop PostgreSQL services on the Slave Server
+++++++++++++++++++++++++++++++++++

i. Stop PostgreSQL services

$ sudo systemctl stop postgresql-12
$ sudo systemctl status postgresql-12

● postgresql-12.service — PostgreSQL 12 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-12.service; disabled; vendor preset: disabled)
Active: inactive (dead)
## Preparing the environment

+++++++++++++++++++++++++++++++++++
Step 4— Create Replication User [ Master server]
+++++++++++++++++++++++++++++++++++

Create a user for replication in the Master server.

CREATE USER replication REPLICATION LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD ‘password123’;

+++++++++++++++++++++++++++++++++++
Step 5— Modify Parameters for replication on the Master Server
+++++++++++++++++++++++++++++++++++

Modifying listen_addresses parameter to allow a specific IP interface or all (using *) this parameter and a restart the PostgreSQL services instance to get the change into effect.
# as postgres user
$ psql -c “ALTER SYSTEM SET listen_addresses TO ‘*’”;
ALTER SYSTEM
Edit the authentication configuration file pg_hba.conf file of the Master.

$ cd /var/lib/pgsql/12/data
$ vi pg_hba.conf
host replication replication_user 192.168.56.110/32 md5
Reload the setting of the cluster.
[postgres@prod ~]$ psql -c “select pg_reload_conf()”

$ cd /var/lib/pgsql/12/data
$ vi pg_hba.conf
host replication replication_user 192.168.56.101/32 md5
$ echo “host replication replication_user 192.168.56.110 /32 md5” >> $PGDATA/pg_hba.conf
## Get the changes into effect through a reload.
++++++++++++++++++++
Edit the general configuration file:
++++++++++++++++++++
$ psql
psql (12.2)
Type “help” for help.
postgres=# alter system set wal_level = replica ;
ALTER SYSTEM
postgres=# alter system set max_wal_senders = 3 ;
ALTER SYSTEM
postgres=# alter system set max_wal_size = ‘200MB’;
ALTER SYSTEM
postgres=# alter system set wal_keep_segments = 8;
ALTER SYSTEM
postgres=# alter system set hot_standby = on;
ALTER SYSTEM
postgres=# alter system set hot_standby_feedback = on ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
postgres=# show hot_standby_feedback;
postgres=# show hot_standby;
postgres=# select pg_reload_conf();

++++++++++++++++++++++++++++
MAKE sure slave = postgres processes are stopped
++++++++++++++++++++++++++++
$ sudo systemctl status postgresql-12
[sudo] password for postgres:
● postgresql-12.service — PostgreSQL 12 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-12.service; disabled; vendor preset: disabled)
Active: inactive (dead)
On Slave: take a backup of current pg_hba.conf file
$ cp pg_hba.conf /var/lib/pgsql/12
On Slave: delete the contents of the data folder
rm -rf /var/lib/pgsql/12/data/

+++++++++++++++++++++++++++++++
Step 5 : take a primary base backup on the slave server
+++++++++++++++++++++++++++++++
In this case we are going to use pg_basebackup to backup the data directory of the Master from the Standby. To create all the required replication specific files and entries in the data directory we are going to use “-R”
We need to make sure that data directory is empty. For that either we can backup the current PostgreSQL cluster to a backup folder or delete all the files.

$ mv /var/lib/pgsql/12/data /var/lib/pgsql/12/data_old

$ pg_basebackup -h 192.168.56.101 -D /var/lib/pgsql/12/data -U replication -S “$(hostname -s)_slot” -X stream -P -v -R

where -R ==> pg_basebackup will create the replication specific files
“standby.signal” and append connection settings to postgresql.auto.conf in the output directory ( data directory ). The postgresql.auto.conf file will record the connection settings.

-D directory
— pgdata=directory
Directory to write the output to. pg_basebackup will create the directory and any parent directories if necessary. The directory may already exist but shoyld be empty ,

-P
— progress
Enables progress reporting. Turning this on will deliver an approximate progress report during the backup.

-v
— verbose
Enables verbose mode. Will output some extra steps during startup and shutdow

Password:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot “pg_basebackup_16834”
52363/52363 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: waiting for background process to finish streaming …
pg_basebackup: syncing data to disk …
pg_basebackup: base backup completed

+++++++++++++++++++++++++++++
Step 6 : Start the replication on the Standby.
+++++++++++++++++++++++++++++

Let’s now start the database server at standby side using below command.
[root@devops ~]# sudo systemctl start postgresql-12
[root@devops ~]# sudo systemctl status postgresql-12
● postgresql-12.service — PostgreSQL 12 database server
Loaded: loaded (/usr/lib/systemd/system/postgresql-12.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2020–04–10 23:31:46 AEST; 8s ago
Docs: https://www.postgresql.org/docs/12/static/
Process: 26451 ExecStartPre=/usr/pgsql-12/bin/postgresql-12-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
Main PID: 26457 (postmaster)
Memory: 12.8M
CGroup: /system.slice/postgresql-12.service
├─26457 /usr/pgsql-12/bin/postmaster -D /var/lib/pgsql/12/data/
├─26459 postgres: logger
├─26460 postgres: startup waiting for 000000010000000000000003
├─26463 postgres: checkpointer
├─26464 postgres: background writer
└─26465 postgres: stats collector

+++++++++++++++++++++++++++++
Step 7 : Verify the replication between the Master and the Standby.
+++++++++++++++++++++++++++++

In order to verify, run this command on the Master.

$ psql -x -c “select * from pg_stat_replication”

From Slave — check postgres log file to get the details of the standby and the lag between the Master and Slave

[root@devops log]# pwd
/var/lib/pgsql/12/data/log
[root@devops log]# ls
postgresql-Fri.log
[root@devops log]# tail -f postgresql-Fri.log
2020–04–10 23:20:58.504 AEST [16604] FATAL: password authentication failed for user “replication”
2020–04–10 23:20:58.504 AEST [16604] DETAIL: Password does not match for user “replication”.
Connection matched pg_hba.conf line 90: “host replication replication 192.168.56.110/32 md5”
2020–04–10 23:31:46.238 AEST [26460] LOG: database system was interrupted; last known up at 2020–04–10 23:24:05 AEST
2020–04–10 23:31:46.963 AEST [26460] WARNING: specified neither primary_conninfo nor restore_command
2020–04–10 23:31:46.963 AEST [26460] HINT: The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2020–04–10 23:31:46.963 AEST [26460] LOG: entering standby mode
2020–04–10 23:31:46.966 AEST [26460] LOG: redo starts at 0/2000028
2020–04–10 23:31:46.967 AEST [26460] LOG: consistent recovery state reached at 0/2000100
2020–04–10 23:31:46.968 AEST [26457] LOG: database system is ready to accept read only connections

We see the message “database system is ready to accept read only connections”.

More verification:

Let’s create database named testslv at master side.

postgres=# create database testslv;
CREATE DATABASE
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
— — — — — -+ — — — — — + — — — — — + — — — — — — -+ — — — — —
demodbd | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
demodbd2 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
dempdb | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
testslv | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(7 rows)

Let’s list the databases at slave side again.

postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
— — — — — -+ — — — — — + — — — — — + — — — — — — -+ — — — — — —
demodbd | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
demodbd2 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
dempdb | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
testslv | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(7 rows)

We can see here that replication is working as expected.

Let’s create a object (a test table named tstslv) in the master side, demodbd2 database

postgres=# \c demodbd2
You are now connected to database “demodbd2” as user “postgres”.
demodbd2=# create table tstslv (serial int);
CREATE TABLE
demodbd2=# insert into tstslv values (3);
INSERT 0 1
demodbd2=# insert into tstslv values (7);
INSERT 0 1

and see if it replicates to slave side.

$ psql
psql (12.2)
Type “help” for help.

postgres=# \c demodbd2
You are now connected to database “demodbd2” as user “postgres”.
demodbd2=# select count(*) from tstslv;
count
— — — -
2
(1 row)

How to know which one is my master server and which one is my slave server?

From the os level you will see that for the Master side there will be a walsender process and for the slave side, you will see a walreceiver process as below —

Master side:
$ ps -ef | grep postgres | grep “walsender”
postgres 18615 17262 0 11:26 ? 00:00:00 postgres: walsender replication 192.168.56.110(27254) streaming 0/3012CC0
postgres 21217 17350 0 11:59 pts/0 00:00:00 grep — color=auto walsender

Slave side:
$ ps -ef | grep postgres | grep “walreceiver”
postgres 27540 27532 0 11:26 ? 00:00:02 postgres: walreceiver streaming 0/3012CC0
postgres 30241 27551 0 12:00 pts/0 00:00:00 grep — color=auto walreceiver

postgres=# select * from pg_stat_replication;
-[ RECORD 1 ] — — + — — — — — — — — — — — — — — —
pid | 18615
usesysid | 32807
usename | replication
application_name | walreceiver
client_addr | 192.168.56.110
client_hostname |
client_port | 27254
backend_start | 2020–04–11 11:26:50.500601+10
backend_xmin | 566
state | streaming
sent_lsn | 0/3012CC0
write_lsn | 0/3012CC0
flush_lsn | 0/3012CC0
replay_lsn | 0/3012CC0
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2020–04–11 11:39:21.444918+10

postgres=# select pg_current_wal_lsn();
-[ RECORD 1 ] — — — + — — — — —
pg_current_wal_lsn | 0/3012CC0

From Slave:
demodbd2=# select pg_is_in_recovery();
pg_is_in_recovery
— — — — — — — — — -
t
(1 row)

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ] — — — — -+ — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
pid | 27540
status | streaming
receive_start_lsn | 0/3000000
receive_start_tli | 1
received_lsn | 0/3012CC0
received_tli | 1
last_msg_send_time | 2020–04–11 12:30:46.977852+10
last_msg_receipt_time | 2020–04–11 12:30:46.982434+10
latest_end_lsn | 0/3012CC0
latest_end_time | 2020–04–11 11:32:10.596368+10
slot_name |
sender_host | 192.168.56.101
sender_port | 5432
conninfo | user=replication password=******** dbname=replication host=192.168.56.101 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

postgres=# select pg_last_wal_receive_lsn();
-[ RECORD 1 ] — — — — — -+ — — — — —
pg_last_wal_receive_lsn | 0/3012CC0

postgres=# select pg_last_wal_replay_lsn();
-[ RECORD 1 ] — — — — — + — — — — —
pg_last_wal_replay_lsn | 0/3012CC0

++++++++++++++
TO PAUSE REPLAY
++++++++++++++
postgres=# select pg_wal_replay_pause();
-[ RECORD 1 ] — — — -+-
pg_wal_replay_pause |

NOTE: So we need to halt replay process from the Slave side not from the Master side

Master side:
postgres=# select pg_wal_replay_pause();
ERROR: recovery is not in progress
HINT: Recovery control functions can only be executed during recovery.

Do some change into the primary databas
======================
postgres=# \c demodbd2
You are now connected to database “demodbd2” as user “postgres”.
demodbd2=# create table test(col1 varchar);
CREATE TABLE
demodbd2=# checkpoint;
CHECKPOINT

From Slave:
postgres=# select pg_last_wal_receive_lsn();
-[ RECORD 1 ] — — — — — -+ — — — — —
pg_last_wal_receive_lsn | 0/3029C68

postgres=# select pg_last_wal_replay_lsn();
-[ RECORD 1 ] — — — — — + — — — — —
pg_last_wal_replay_lsn | 0/3012CC0

NOTE: transanction log is not flowing

This will halt replication. Transaction log will still flow from the master to slave. So data is protected on the slave side — in case Primary crash we will not loose data from the standby side

postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ] — — — — -+ — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
pid | 27540
status | streaming
receive_start_lsn | 0/3000000
receive_start_tli | 1
received_lsn | 0/3012CC0
received_tli | 1
last_msg_send_time | 2020–04–11 12:40:18.082925+10
last_msg_receipt_time | 2020–04–11 12:40:18.087661+10
latest_end_lsn | 0/3012CC0
latest_end_time | 2020–04–11 11:32:10.596368+10
slot_name |
sender_host | 192.168.56.101
sender_port | 5432
conninfo | user=replication password=******** dbname=replication host=192.168.56.101 port=5432 fallback_applicatio n_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

postgres=# select pg_wal_replay_resume();
-[ RECORD 1 ] — — — — +-
pg_wal_replay_resume |

postgreSQL will start to replay WAL again.

postgres=# select pg_wal_replay_resume();
-[ RECORD 1 ] — — — — +-
pg_wal_replay_resume |

postgres=# select pg_last_wal_replay_lsn();
-[ RECORD 1 ] — — — — — + — — — — —
pg_last_wal_replay_lsn | 0/3029CA0

++++++++++++++++
Perform Failover :
++++++++++++++++

There are multiple ways to do this:

i) Shut down the master and promote standby.
PART 1: Failover From Old Master to Old Slave (New Master)

Login to the source Postgresql server and ‘become’ the postgres user.

Validate from the standby server
+++++++++++++++++++++++++++++++++++
On Slave: Check postgres: wal receiver process

++++++++++++++++++++++++
Master sever [ Current Master] :
++++++++++++++++++++++++

delete or move the contents of the data folder

mv /var/lib/pgsql/12/data /var/lib/pgsql/12/data_11042020

++++++++++++++++++++++++
Slave server : Promote to New Master
++++++++++++++++++++++++
$ /usr/pgsql-12/bin/pg_ctl promote -D /var/lib/pgsql/12/data

$ /usr/pgsql-12/bin/pg_ctl promote -D /var/lib/pgsql/12/data
waiting for server to promote…. done
server promoted

from log file [ another session]
2020–04–11 13:45:18.795 AEST [27536] LOG: redo done at 0/3029C68
2020–04–11 13:45:18.795 AEST [27536] LOG: last completed transaction was at log time 2020–04–11 13:28:52.028904+10
2020–04–11 13:45:18.869 AEST [27536] LOG: selected new timeline ID: 2
2020–04–11 13:45:19.163 AEST [27536] LOG: archive recovery complete
2020–04–11 13:45:19.255 AEST [27532] LOG: database system is ready to accept connections

++++++++++++++++++++
On New Master : Validate services
++++++++++++++++++++
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
— — — — — — — — — -
f
(1 row)

On New Master (Old Slave) : Check log if issues with services

$ tail -40 postgresql-Wed.log

In this article, we have successfully completed setup replication and failover to the master server.

--

--

Monowar Mukul

Monowar Mukul is a Cloud Solution Architect Professional. /*The statements and opinions expressed here are my own & nothing with my present or past employer*/