PostgreSQL Switchover / Failover

Charmed PostgreSQL constantly monitors the cluster status and performs an automated failover if the Primary unit is lost. Sometimes a manual switchover is necessary, for example for hardware maintenance. Check the difference between them here.

A manual switchover is possible using the Juju action promote-to-primary.
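
For example (a sketch; the action schema may vary slightly between charm revisions), the available actions and the promote-to-primary parameters can be inspected with:

juju actions postgresql
juju show-action postgresql promote-to-primary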

Important: Charmed PostgreSQL has been designed to provide maximum guarantees for data survival in all corner cases; therefore, the allowed actions depend on the current state of the Juju unit.

Switchover

To switch over the PostgreSQL Primary (write endpoint) to a new Juju unit, run the Juju action promote-to-primary on unit x, which will be promoted to the new Primary:

juju run postgresql/x promote-to-primary scope=unit

Note: A manual switchover is possible on a healthy ‘Sync Standby’ unit only. Otherwise, Patroni will reject it and explain the reason.

Note: It is normal for the Juju leader unit and the PostgreSQL Primary to be different Juju units. Juju leader failover is fully automated and should be forced for educational purposes only! Do NOT trigger a Juju leader election to move the Primary.
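
For example, the current ‘Sync Standby’ can be confirmed with patronictl (using the same snap paths shown in the example later in this guide) before promoting it; the unit and member numbers below are illustrative:

juju ssh postgresql/0 'sudo -u snap_daemon patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list'
juju run postgresql/0 promote-to-primary scope=unit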

Failover

Charmed PostgreSQL doesn’t provide a manual failover action due to the lack of data safety guarantees. Advanced users can still perform one using patronictl and the Patroni REST API. At the same time, Charmed PostgreSQL allows cluster recovery via a full PostgreSQL/Patroni/Raft cluster re-initialization.
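
For reference only, such a manual failover could look like the sketch below, run from a unit shell; it bypasses the charm's safety checks. The candidate member name and IP are illustrative, and 8008 is the default Patroni REST API port:

sudo -u snap_daemon patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml failover --candidate postgresql-1 --force
curl -X POST -H 'Content-Type: application/json' -d '{"candidate": "postgresql-1"}' http://10.189.210.166:8008/failover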

Raft re-initialization

Warning: this is the worst-case recovery scenario, where the Primary and ALL Sync Standby units are lost simultaneously and their data cannot be recovered from disk. In this case Patroni cannot perform an automatic failover to the only remaining Replica unit(s). Patroni still provides read-only access to the data.

The manual failover procedure cannot guarantee that the latest SQL transactions are available on the Replica unit(s) (due to the replication lag behind the Primary)! Also, Raft cluster consensus is not possible when only one unit is left of a three-unit cluster.

The command to re-initialize the Raft cluster should only be executed when the charm is ready:

  • only one (the last) Juju unit is left in the Juju application
  • that unit has detected the Raft majority loss and reports the status: Raft majority loss, run: promote-to-primary

To re-initialize Raft and fix the Patroni/PostgreSQL cluster (when requested by the status above):

juju run postgresql/x promote-to-primary scope=unit force=true

Example of Raft re-initialization

Deploy PostgreSQL with 3 units:

> juju deploy postgresql -n 3 --config synchronous_node_count=1

> juju status 
Model       Controller  Cloud/Region         Version  SLA          Timestamp
postgresql  lxd         localhost/localhost  3.6.5    unsupported  14:50:19+02:00

App         Version  Status  Scale  Charm       Channel  Rev  Exposed  Message
postgresql  14.17    active      3  postgresql  14/edge  615  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        10.189.210.53   5432/tcp  
postgresql/1   active    idle   1        10.189.210.166  5432/tcp  
postgresql/2   active    idle   2        10.189.210.188  5432/tcp  Primary

Machine  State    Address         Inst id        Base          AZ  Message
0        started  10.189.210.53   juju-422c1a-0  ubuntu@22.04      Running
1        started  10.189.210.166  juju-422c1a-1  ubuntu@22.04      Running
2        started  10.189.210.188  juju-422c1a-2  ubuntu@22.04      Running

Find the current Primary/Standby/Replica:

> juju ssh postgresql/0
ubuntu@juju-422c1a-0:~$ sudo -u snap_daemon patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list
+ Cluster: postgresql (7499430436963402504) ---+-----------+----+-----------+
| Member       | Host           | Role         | State     | TL | Lag in MB |
+--------------+----------------+--------------+-----------+----+-----------+
| postgresql-0 | 10.189.210.53  | Sync Standby | streaming |  3 |         0 |
| postgresql-1 | 10.189.210.166 | Replica      | streaming |  3 |         0 |
| postgresql-2 | 10.189.210.188 | Leader       | running   |  3 |           |
+--------------+----------------+--------------+-----------+----+-----------+
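
The same topology can also be fetched from the Patroni REST API (a sketch; 8008 is Patroni's default REST API port):

> curl -s http://10.189.210.53:8008/cluster | python3 -m json.tool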

Kill the Leader and Sync Standby machines:

> lxc stop --force juju-422c1a-0  && lxc stop --force juju-422c1a-2

> juju status 
Model       Controller  Cloud/Region         Version  SLA          Timestamp
postgresql  lxd         localhost/localhost  3.6.5    unsupported  14:54:40+02:00

App         Version  Status  Scale  Charm       Channel  Rev  Exposed  Message
postgresql  14.17    active    1/3  postgresql  14/edge  615  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0   unknown   lost   0        10.189.210.53   5432/tcp  agent lost, see 'juju show-status-log postgresql/0'
postgresql/1*  active    idle   1        10.189.210.166  5432/tcp  <<<<<<<<< Replica unit left only
postgresql/2   unknown   lost   2        10.189.210.188  5432/tcp  agent lost, see 'juju show-status-log postgresql/2'

Machine  State    Address         Inst id        Base          AZ  Message
0        down     10.189.210.53   juju-422c1a-0  ubuntu@22.04      Running
1        started  10.189.210.166  juju-422c1a-1  ubuntu@22.04      Running
2        down     10.189.210.188  juju-422c1a-2  ubuntu@22.04      Running

At this stage it is recommended to restore the lost nodes; they will rejoin the cluster automatically once Juju detects their availability.
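
In this LXD-based example, restoring them would simply mean starting the stopped containers again, e.g.:

> lxc start juju-422c1a-0 juju-422c1a-2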

To start the Raft re-initialization instead, remove the DEAD machines as a signal to the charm that they cannot be restored/started and that there is no risk of split-brain:

> juju remove-machine --force 0 
WARNING This command will perform the following actions:
will remove machine 0
- will remove unit postgresql/0
- will remove storage pgdata/0
Continue [y/N]? y

> juju remove-machine --force 2
WARNING This command will perform the following actions:
will remove machine 2
- will remove unit postgresql/2
- will remove storage pgdata/2
Continue [y/N]? y

Check the status to confirm the Raft majority loss:

> juju status
...
Unit           Workload  Agent      Machine  Public address  Ports     Message
postgresql/1*  blocked   executing  1        10.189.210.166  5432/tcp  Raft majority loss, run: promote-to-primary
...

Start Raft re-initialization:

> juju run postgresql/1 promote-to-primary scope=unit force=true

Wait for the re-initialization to complete:

> juju status
...
Unit           Workload     Agent      Machine  Public address  Ports     Message
postgresql/1*  maintenance  executing  1        10.189.210.166  5432/tcp  (promote-to-primary) Reinitialising raft
...

At the end, the Primary unit is back:

> juju status
Model       Controller  Cloud/Region         Version  SLA          Timestamp
postgresql  lxd         localhost/localhost  3.6.5    unsupported  15:03:12+02:00

App         Version  Status  Scale  Charm       Channel  Rev  Exposed  Message
postgresql  14.17    active      1  postgresql  14/edge  615  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/1*  active    idle   1        10.189.210.166  5432/tcp  Primary

Machine  State    Address         Inst id        Base          AZ  Message
1        started  10.189.210.166  juju-422c1a-1  ubuntu@22.04      Running

Scale the application back to 3+ units to complete the HA recovery:

> juju add-unit postgresql -n 2
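
Optionally, re-check replication with the same patronictl command used earlier (a sketch; member names and IPs will differ in your environment):

> juju ssh postgresql/1 'sudo -u snap_daemon patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml list'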

The cluster reports a healthy status:

> juju status
Model       Controller  Cloud/Region         Version  SLA          Timestamp
postgresql  lxd         localhost/localhost  3.6.5    unsupported  15:09:56+02:00

App         Version  Status  Scale  Charm       Channel  Rev  Exposed  Message
postgresql  14.17    active      3  postgresql  14/edge  615  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/1*  active    idle   1        10.189.210.166  5432/tcp  Primary
postgresql/3   active    idle   3        10.189.210.124  5432/tcp  
postgresql/4   active    idle   4        10.189.210.178  5432/tcp  

Machine  State    Address         Inst id        Base          AZ  Message
1        started  10.189.210.166  juju-422c1a-1  ubuntu@22.04      Running
3        started  10.189.210.124  juju-422c1a-3  ubuntu@22.04      Running
4        started  10.189.210.178  juju-422c1a-4  ubuntu@22.04      Running