.. _citus:

Citus support
=============

Patroni makes it extremely simple to deploy `Multi-Node Citus`__ clusters.

__ https://docs.citusdata.com/en/stable/installation/multi_node.html

TL;DR
-----

There are only a few simple rules you need to follow:

1. The `Citus <https://github.com/citusdata/citus>`__ database extension to
   PostgreSQL must be available on all nodes. The absolute minimum supported
   Citus version is 10.0, but to take full advantage of transparent
   switchovers and restarts of workers we recommend using at least Citus 11.2.
2. The cluster name (``scope``) must be the same for all Citus nodes!
3. Superuser credentials must be the same on the coordinator and all worker
   nodes, and ``pg_hba.conf`` should allow superuser access between all nodes
   (see the sketch after the configuration snippet below).
4. :ref:`REST API <restapi_settings>` access should be allowed from worker
   nodes to the coordinator. E.g., credentials should be the same and, if
   configured, client certificates from worker nodes must be accepted by the
   coordinator.
5. Add the following section to ``patroni.yaml``:

.. code:: YAML

    citus:
      group: X  # 0 for coordinator and 1, 2, 3, etc. for workers
      database: citus  # must be the same on all nodes
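
To illustrate rule 3, a minimal ``pg_hba.conf`` entry could look like the
sketch below; the subnet and the ``scram-sha-256`` method are assumptions
matching the demo addresses used later on this page::

    # allow superuser ("postgres") connections between all Citus nodes
    host all postgres 172.27.0.0/24 scram-sha-256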

After that you just need to start Patroni and it will handle the rest:

0. Patroni will set ``bootstrap.dcs.synchronous_mode`` to :ref:`quorum <quorum_mode>`
   if it is not explicitly set to any other value.
1. The ``citus`` extension will be automatically added to ``shared_preload_libraries``.
2. If ``max_prepared_transactions`` isn't explicitly set in the global
   :ref:`dynamic configuration <dynamic_configuration>`, Patroni will
   automatically set it to ``2*max_connections``.
3. The ``citus.local_hostname`` GUC value will be adjusted from ``localhost`` to
   the value that Patroni uses to connect to the local PostgreSQL instance.
   This value sometimes has to differ from ``localhost`` because PostgreSQL
   might not be listening on it.
4. The ``citus.database`` will be automatically created, followed by
   ``CREATE EXTENSION citus``.
5. The current superuser :ref:`credentials <postgresql_settings>` will be added
   to the ``pg_dist_authinfo`` table to allow cross-node communication. Don't
   forget to update them if you later decide to change the superuser
   username/password/sslcert/sslkey!
6. The coordinator primary node will automatically discover worker primary
   nodes and add them to the ``pg_dist_node`` table using the
   ``citus_add_node()`` function (see the example query after this list).
7. Patroni will also maintain ``pg_dist_node`` in case a failover/switchover
   occurs on the coordinator or worker clusters.
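
You can verify the result of the worker discovery on the coordinator primary.
The query below targets the real ``pg_dist_node`` catalog, but the node IDs
and addresses are illustrative (they match the demo cluster used later on
this page)::

    postgres@coord1:~$ psql -d citus -c "SELECT nodeid, groupid, nodename, noderole FROM pg_dist_node ORDER BY groupid"
     nodeid | groupid |  nodename   | noderole
    --------+---------+-------------+----------
          1 |       0 | 172.27.0.4  | primary
          2 |       1 | 172.27.0.2  | primary
          3 |       2 | 172.27.0.7  | primary
    (3 rows)

Since Patroni v4.0.0, secondary nodes may also appear here with
``noderole = secondary`` (see the Secondary nodes section below).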

patronictl
----------

Coordinator and worker clusters are physically different PostgreSQL/Patroni
clusters that are only logically grouped together using the
`Citus <https://github.com/citusdata/citus>`__ database extension to
PostgreSQL. Therefore, in most cases it is not possible to manage them as a
single entity.

This results in two major differences in :ref:`patronictl` behaviour when
``patroni.yaml`` contains the ``citus`` section:
1. The ``list`` and ``topology`` commands by default output all members of the
   Citus formation (coordinators and workers). The new ``Group`` column
   indicates which Citus group they belong to.
2. All ``patronictl`` commands accept a new ``--group`` option. For some
   commands the default value of the group may be taken from ``patroni.yaml``.
   For example, :ref:`patronictl_pause` will enable the maintenance mode by
   default for the ``group`` that is set in the ``citus`` section, while for
   :ref:`patronictl_switchover` or :ref:`patronictl_remove` the group must be
   specified explicitly.

An example of :ref:`patronictl_list` output for the Citus cluster::

    postgres@coord1:~$ patronictl list demo
    + Citus cluster: demo ----------+----------------+---------+----+-----------+
    | Group | Member  | Host        | Role           | State   | TL | Lag in MB |
    +-------+---------+-------------+----------------+---------+----+-----------+
    |     0 | coord1  | 172.27.0.10 | Replica        | running |  1 |         0 |
    |     0 | coord2  | 172.27.0.6  | Quorum Standby | running |  1 |         0 |
    |     0 | coord3  | 172.27.0.4  | Leader         | running |  1 |           |
    |     1 | work1-1 | 172.27.0.8  | Quorum Standby | running |  1 |         0 |
    |     1 | work1-2 | 172.27.0.2  | Leader         | running |  1 |           |
    |     2 | work2-1 | 172.27.0.5  | Quorum Standby | running |  1 |         0 |
    |     2 | work2-2 | 172.27.0.7  | Leader         | running |  1 |           |
    +-------+---------+-------------+----------------+---------+----+-----------+

If we add the ``--group`` option, the output will change to::

    postgres@coord1:~$ patronictl list demo --group 0
    + Citus cluster: demo (group: 0, 7179854923829112860) -+-----------+
    | Member | Host        | Role           | State   | TL | Lag in MB |
    +--------+-------------+----------------+---------+----+-----------+
    | coord1 | 172.27.0.10 | Replica        | running |  1 |         0 |
    | coord2 | 172.27.0.6  | Quorum Standby | running |  1 |         0 |
    | coord3 | 172.27.0.4  | Leader         | running |  1 |           |
    +--------+-------------+----------------+---------+----+-----------+

    postgres@coord1:~$ patronictl list demo --group 1
    + Citus cluster: demo (group: 1, 7179854923881963547) -+-----------+
    | Member  | Host       | Role           | State   | TL | Lag in MB |
    +---------+------------+----------------+---------+----+-----------+
    | work1-1 | 172.27.0.8 | Quorum Standby | running |  1 |         0 |
    | work1-2 | 172.27.0.2 | Leader         | running |  1 |           |
    +---------+------------+----------------+---------+----+-----------+

Citus worker switchover
-----------------------

When a switchover is orchestrated for a Citus worker node, Citus offers the
opportunity to make the switchover close to transparent for the application.
Because the application connects to the coordinator, which in turn connects
to the worker nodes, it is possible with Citus to `pause` the SQL traffic on
the coordinator for the shards hosted on a worker node. The switchover then
happens while the traffic is held on the coordinator, and it resumes as soon
as the new primary worker node is ready to accept read-write queries.

An example of :ref:`patronictl_switchover` on the worker cluster::

    postgres@coord1:~$ patronictl switchover demo
    + Citus cluster: demo ----------+----------------+---------+----+-----------+
    | Group | Member  | Host        | Role           | State   | TL | Lag in MB |
    +-------+---------+-------------+----------------+---------+----+-----------+
    |     0 | coord1  | 172.27.0.10 | Replica        | running |  1 |         0 |
    |     0 | coord2  | 172.27.0.6  | Quorum Standby | running |  1 |         0 |
    |     0 | coord3  | 172.27.0.4  | Leader         | running |  1 |           |
    |     1 | work1-1 | 172.27.0.8  | Leader         | running |  1 |           |
    |     1 | work1-2 | 172.27.0.2  | Quorum Standby | running |  1 |         0 |
    |     2 | work2-1 | 172.27.0.5  | Quorum Standby | running |  1 |         0 |
    |     2 | work2-2 | 172.27.0.7  | Leader         | running |  1 |           |
    +-------+---------+-------------+----------------+---------+----+-----------+
    Citus group: 2
    Primary [work2-2]:
    Candidate ['work2-1'] []:
    When should the switchover take place (e.g. 2024-08-26T08:02 ) [now]:
    Current cluster topology
    + Citus cluster: demo (group: 2, 7179854924063375386) -+-----------+
    | Member  | Host       | Role           | State   | TL | Lag in MB |
    +---------+------------+----------------+---------+----+-----------+
    | work2-1 | 172.27.0.5 | Quorum Standby | running |  1 |         0 |
    | work2-2 | 172.27.0.7 | Leader         | running |  1 |           |
    +---------+------------+----------------+---------+----+-----------+
    Are you sure you want to switchover cluster demo, demoting current primary work2-2? [y/N]: y
    2024-08-26 07:02:40.33003 Successfully switched over to "work2-1"
    + Citus cluster: demo (group: 2, 7179854924063375386) ------+
    | Member  | Host       | Role    | State   | TL | Lag in MB |
    +---------+------------+---------+---------+----+-----------+
    | work2-1 | 172.27.0.5 | Leader  | running |  1 |           |
    | work2-2 | 172.27.0.7 | Replica | stopped |    |   unknown |
    +---------+------------+---------+---------+----+-----------+

    postgres@coord1:~$ patronictl list demo
    + Citus cluster: demo ----------+----------------+---------+----+-----------+
    | Group | Member  | Host        | Role           | State   | TL | Lag in MB |
    +-------+---------+-------------+----------------+---------+----+-----------+
    |     0 | coord1  | 172.27.0.10 | Replica        | running |  1 |         0 |
    |     0 | coord2  | 172.27.0.6  | Quorum Standby | running |  1 |         0 |
    |     0 | coord3  | 172.27.0.4  | Leader         | running |  1 |           |
    |     1 | work1-1 | 172.27.0.8  | Leader         | running |  1 |           |
    |     1 | work1-2 | 172.27.0.2  | Quorum Standby | running |  1 |         0 |
    |     2 | work2-1 | 172.27.0.5  | Leader         | running |  2 |           |
    |     2 | work2-2 | 172.27.0.7  | Quorum Standby | running |  2 |         0 |
    +-------+---------+-------------+----------------+---------+----+-----------+

And this is how it looks on the coordinator side::

    # The worker primary notifies the coordinator that it is going to execute "pg_ctl stop".
    2024-08-26 07:02:38,636 DEBUG: query(BEGIN, ())
    2024-08-26 07:02:38,636 DEBUG: query(SELECT pg_catalog.citus_update_node(%s, %s, %s, true, %s), (3, '172.19.0.7-demoted', 5432, 10000))
    # From this moment all application traffic on the coordinator to the worker group 2 is paused.

    # The old worker primary is assigned as a secondary.
    2024-08-26 07:02:40,084 DEBUG: query(SELECT pg_catalog.citus_update_node(%s, %s, %s, true, %s), (7, '172.19.0.7', 5432, 10000))

    # The future worker primary notifies the coordinator that it has acquired the leader lock in DCS and is about to run "pg_ctl promote".
    2024-08-26 07:02:40,085 DEBUG: query(SELECT pg_catalog.citus_update_node(%s, %s, %s, true, %s), (3, '172.19.0.5', 5432, 10000))

    # The new worker primary has just finished promoting and notifies the coordinator that it is ready to accept read-write traffic.
    2024-08-26 07:02:41,485 DEBUG: query(COMMIT, ())
    # From this moment the application traffic on the coordinator to the worker group 2 is unblocked.
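
The same switchover can also be performed non-interactively; a sketch using
the options of :ref:`patronictl_switchover` (skipping the confirmation
prompts)::

    postgres@coord1:~$ patronictl switchover demo --group 2 --candidate work2-1 --force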

Secondary nodes
---------------

Starting from Patroni v4.0.0, Citus secondary nodes without the ``noloadbalance``
:ref:`tag <tags_settings>` are also registered in ``pg_dist_node``. However, to
use secondary nodes for read-only queries, applications need to change the
`citus.use_secondary_nodes <https://docs.citusdata.com/en/latest/develop/api_guc.html#citus-use-secondary-nodes-enum>`__ GUC.
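
A minimal sketch of using them from an application session (the table name is
hypothetical):

.. code:: SQL

    -- route reads of distributed tables to secondary nodes for this session
    SET citus.use_secondary_nodes TO 'always';
    SELECT count(*) FROM my_distributed_table;  -- hypothetical distributed table

Note that such a session can only read distributed tables, because secondary
nodes are read-only replicas.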

Peek into DCS
-------------

The Citus cluster (coordinator and workers) is stored in DCS as a fleet of
Patroni clusters logically grouped together::

    /service/batman/              # scope=batman
    /service/batman/0/            # citus.group=0, coordinator
    /service/batman/0/initialize
    /service/batman/0/leader
    /service/batman/0/members/
    /service/batman/0/members/m1
    /service/batman/0/members/m2
    /service/batman/1/            # citus.group=1, worker
    /service/batman/1/initialize
    /service/batman/1/leader
    /service/batman/1/members/
    /service/batman/1/members/m3
    /service/batman/1/members/m4
    ...

Such an approach was chosen because, with most DCS implementations, it makes
it possible to fetch the entire Citus cluster with a single recursive read
request. Only Citus coordinator nodes read the whole tree, because they have
to discover worker nodes. Worker nodes read only the subtree of their own
group, and in some cases they may read the subtree of the coordinator group.
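
For example, assuming Etcd v3 is used as the DCS, the entire structure could
be fetched in a single request (a sketch; the key names follow the tree
above)::

    postgres@coord1:~$ etcdctl get /service/batman/ --prefix --keys-only
    /service/batman/0/initialize
    /service/batman/0/leader
    /service/batman/0/members/m1
    /service/batman/0/members/m2
    /service/batman/1/initialize
    /service/batman/1/leader
    /service/batman/1/members/m3
    /service/batman/1/members/m4
    ...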

Citus on Kubernetes
-------------------

Since Kubernetes doesn't support hierarchical structures, we had to include
the Citus group in the names of all K8s objects Patroni creates::

    batman-0-leader  # the leader config map for the coordinator
    batman-0-config  # the config map holding initialize, config, and history "keys"
    ...
    batman-1-leader  # the leader config map for worker group 1
    batman-1-config
    ...

I.e., the naming pattern is: ``${scope}-${citus.group}-${type}``.

All Kubernetes objects are discovered by Patroni using the `label selector`__,
therefore all Pods running Patroni with Citus, and the corresponding
Endpoints/ConfigMaps, must have matching labels, and Patroni must be
configured to use them via Kubernetes :ref:`settings <kubernetes_settings>`
or :ref:`environment variables <kubernetes_environment>`.

__ https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
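
To illustrate, the same discovery can be reproduced manually with a label
selector; the labels below match the Pod examples that follow::

    $ kubectl get pods,endpoints,configmaps -l application=patroni,citus-group=0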

A couple of examples of Patroni configuration using Pod environment variables:

1. For the coordinator cluster:

.. code:: YAML

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        application: patroni
        citus-group: "0"
        citus-type: coordinator
        cluster-name: citusdemo
      name: citusdemo-0-0
      namespace: default
    spec:
      containers:
      - env:
        - name: PATRONI_SCOPE
          value: citusdemo
        - name: PATRONI_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: PATRONI_KUBERNETES_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: PATRONI_KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: PATRONI_KUBERNETES_LABELS
          value: '{application: patroni}'
        - name: PATRONI_CITUS_DATABASE
          value: citus
        - name: PATRONI_CITUS_GROUP
          value: "0"

2. For the worker cluster from group 2:

.. code:: YAML

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        application: patroni
        citus-group: "2"
        citus-type: worker
        cluster-name: citusdemo
      name: citusdemo-2-0
      namespace: default
    spec:
      containers:
      - env:
        - name: PATRONI_SCOPE
          value: citusdemo
        - name: PATRONI_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: PATRONI_KUBERNETES_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: PATRONI_KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: PATRONI_KUBERNETES_LABELS
          value: '{application: patroni}'
        - name: PATRONI_CITUS_DATABASE
          value: citus
        - name: PATRONI_CITUS_GROUP
          value: "2"

As you may have noticed, both examples have the ``citus-group`` label set.
This label allows Patroni to identify objects as belonging to a certain Citus
group. In addition, there is the ``PATRONI_CITUS_GROUP`` environment variable,
which has the same value as the ``citus-group`` label. When Patroni creates
new Kubernetes objects (ConfigMaps or Endpoints), it automatically puts the
``citus-group: ${env.PATRONI_CITUS_GROUP}`` label on them:

.. code:: YAML

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: citusdemo-0-leader   # Is generated as ${env.PATRONI_SCOPE}-${env.PATRONI_CITUS_GROUP}-leader
      labels:
        application: patroni     # Is set from the ${env.PATRONI_KUBERNETES_LABELS}
        cluster-name: citusdemo  # Is automatically set from the ${env.PATRONI_SCOPE}
        citus-group: '0'         # Is automatically set from the ${env.PATRONI_CITUS_GROUP}

You can find a complete example of a Patroni deployment on Kubernetes with
Citus support in the `kubernetes`__ folder of the Patroni repository.

__ https://github.com/patroni/patroni/tree/master/kubernetes

There are two important files for you:

1. Dockerfile.citus
2. citus_k8s.yaml

Citus upgrades and PostgreSQL major upgrades
--------------------------------------------

First, please read about upgrading the Citus version in the `documentation`__.
There is one minor change in the process: when executing the upgrade, you
have to use :ref:`patronictl_restart` instead of ``systemctl restart`` to
restart PostgreSQL.

__ https://docs.citusdata.com/en/latest/admin_guide/upgrading_citus.html
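
For example, to perform such a restart for worker group 1 of the demo cluster
shown above (a sketch; the ``--group`` option is described in the patronictl
section)::

    postgres@coord1:~$ patronictl restart demo --group 1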

A PostgreSQL major upgrade with Citus is a bit more complex. You will have to
combine the techniques described in the Citus documentation about major
upgrades with the Patroni documentation about the :ref:`PostgreSQL major
upgrade <major_upgrade>`. Please keep in mind that a Citus cluster consists
of many Patroni clusters (coordinator and workers), and they all have to be
upgraded independently.
|