Merge branch 'master' of github.com:zalando/patroni into feature/citus-secondaries

This commit is contained in:
Alexander Kukushkin
2023-08-17 13:16:01 +02:00
61 changed files with 3953 additions and 1149 deletions

View File

@@ -173,4 +173,28 @@ jobs:
- uses: jakebailey/pyright-action@v1
with:
version: 1.1.317
version: 1.1.320
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.11
uses: actions/setup-python@v4
with:
python-version: 3.11
cache: pip
- name: Install dependencies
run: pip install tox
- name: Install package dependencies
run: |
sudo apt update \
&& sudo apt install -y \
latexmk texlive-latex-extra tex-gyre \
--no-install-recommends
- name: Generate documentation
run: tox -m docs

.gitignore vendored
View File

@@ -51,6 +51,7 @@ scm-source.json
docs/build/
docs/source/_static/
docs/source/_templates/
docs/modules/
# Pycharm IDE
.idea/

View File

@@ -19,3 +19,8 @@ formats:
- epub
- pdf
- htmlzip
python:
install:
- requirements: requirements.docs.txt
- requirements: requirements.txt

View File

@@ -1,182 +1,12 @@
.. _contributing:
Contributing guidelines
=======================
Contributing
============
Wanna contribute to Patroni? Yay - here is how!
Resources and information for developers can be found in the pages below.
Chatting
--------
.. toctree::
:maxdepth: 2
Just want to chat with other Patroni users? Looking for interactive troubleshooting help? Join us on channel `#patroni <https://postgresteam.slack.com/archives/C9XPYG92A>`__ in the `PostgreSQL Slack <https://pgtreats.info/slack-invite>`__.
Running tests
-------------
Requirements for running behave tests:
1. PostgreSQL packages need to be installed.
2. PostgreSQL binaries must be available in your `PATH`. You may need to add them to the path with something like `PATH=/usr/lib/postgresql/11/bin:$PATH python -m behave`.
3. If you'd like to test with external DCSs (e.g., Etcd, Consul, and Zookeeper) you'll need the packages installed and respective services running and accepting unencrypted/unprotected connections on localhost and default port. In the case of Etcd or Consul, the behave test suite could start them up if binaries are available in the `PATH`.
Install dependencies:
.. code-block:: bash
# You may want to use Virtualenv or specify pip3.
pip install -r requirements.txt
pip install -r requirements.dev.txt
After you have all dependencies installed, you can run the various test suites:
.. code-block:: bash
# You may want to use Virtualenv or specify python3.
# Run flake8 to check syntax and formatting:
python setup.py flake8
# Run the pytest suite in tests/:
python setup.py test
# Run the behave (https://behave.readthedocs.io/en/latest/) test suite in features/;
# modify DCS as desired (raft has no dependencies so is the easiest to start with):
DCS=raft python -m behave
Testing with tox
----------------
To run tox tests you only need to install one dependency (other than Python)
.. code-block:: bash
pip install tox>=4
If you wish to run `behave` tests then you also need docker installed.
Tox configuration in `tox.ini` has "environments" to run the following tasks:
* lint: Python code lint with `flake8`
* test: unit tests for all available python interpreters with `pytest`,
generates XML reports or HTML reports if a TTY is detected
* dep: detect package dependency conflicts using `pipdeptree`
* type: static type checking with `pyright`
* black: code formatting with `black`
* docker-build: build docker image used for the `behave` env
* docker-cmd: run arbitrary command with the above image
* docker-behave-etcd: run tox for behave tests with above image
* py*behave: run behave with available python interpreters (without docker, although
this is what is called inside docker containers)
* docs: build docs with `sphinx`
Running tox
^^^^^^^^^^^
To run the default env list; dep, lint, test, and docs, just run:
.. code-block:: bash
tox
The `test` envs can be run with the label `test`:
.. code-block:: bash
tox -m test
The `behave` docker tests can be run with the label `behave`:
.. code-block:: bash
tox -m behave
Similarly, docs has the label `docs`.
All other envs can be run with their respective env names:
.. code-block:: bash
tox -e lint
tox -e py39-test-lin
It is also possible to select partial env lists using `factors`. For example, if you want to run
all envs for python 3.10:
.. code-block:: bash
tox -f py310
This is equivalent to running all the envs listed below:
.. code-block:: bash
$ tox -l -f py310
py310-test-lin
py310-test-mac
py310-test-win
py310-type-lin
py310-type-mac
py310-type-win
py310-behave-etcd-lin
py310-behave-etcd-win
py310-behave-etcd-mac
You can list all configured combinations of environments with tox (>=v4) like so
.. code-block:: bash
tox l
The envs `test` and `docs` will attempt to open the HTML output files
when the job completes, if tox is run with an active terminal. This
is intended to be for benefit of the developer running this env locally.
It will attempt to run `open` on a mac and `xdg-open` on Linux.
To use a different command set the env var `OPEN_CMD` to the name or path of
the command. If this step fails it will not fail the run overall.
If you want to disable this facility set the env var `OPEN_CMD` to the `:` no-op command.
.. code-block:: bash
OPEN_CMD=: tox -m docs
Behave tests
^^^^^^^^^^^^
Behave tests with `-m behave` will build docker images based on PG_MAJOR version 11 through 15 and then run all
behave tests. This can take quite a long time to run so you might want to limit the scope to a select version of
Postgres or to a specific feature set or steps.
To specify the version of postgres include the full name of the dependent image build env that you want and then the
behave env name. For instance if you want Postgres 15 use:
.. code-block:: bash
tox -e pg14-docker-build,pg14-docker-behave-etcd-lin
If on the other hand you want to test a specific feature you can pass positional arguments to behave. This will run
the watchdog behave feature test scenario with all versions of Postgres.
.. code-block:: bash
tox -m behave -- features/watchdog.feature
Of course you can combine the two.
Reporting issues
----------------
If you have a question about patroni or have a problem using it, please read the :ref:`README <readme>` before filing an issue.
Also double check with the current issues on our `Issues Tracker <https://github.com/zalando/patroni/issues>`__.
Contributing a pull request
---------------------------
1) Submit a comment to the relevant issue or create a new issue describing your proposed change.
2) Do a fork, develop and test your code changes.
3) Include documentation
4) Submit a pull request.
You'll get feedback about your pull request as soon as possible.
Happy Patroni hacking ;-)
contributing_guidelines
Patroni API docs<modules/modules>

View File

@@ -20,10 +20,15 @@
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
from patroni.version import __version__
project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
module_dir = os.path.abspath(os.path.join(project_root, 'patroni'))
excludes = ['tests', 'setup.py', 'conf']
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
@@ -33,11 +38,21 @@ from patroni.version import __version__
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.intersphinx',
extensions = [
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.mathjax',
'sphinx.ext.ifconfig',
'sphinx.ext.viewcode']
# 'sphinx.ext.viewcode',
'sphinx_github_style', # Generate "View on GitHub" for source code
'sphinxcontrib.apidoc', # For generating module docs from code
'sphinx.ext.autodoc', # For generating module docs from docstrings
'sphinx.ext.napoleon', # For Google and Numpy formatted docstrings
]
apidoc_module_dir = module_dir
apidoc_output_dir = 'modules'
apidoc_excluded_paths = excludes
apidoc_separate_modules = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -107,6 +122,34 @@ if not on_rtd: # only import and set the theme if we're building docs locally
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Replace "source" links with "edit on GitHub" when using rtd theme
html_context = {
'display_github': True,
'github_user': 'zalando',
'github_repo': 'patroni',
'github_version': 'master',
'conf_py_path': '/docs/',
}
# sphinx-github-style options, https://sphinx-github-style.readthedocs.io/en/latest/index.html
# The name of the top-level package.
top_level = "patroni"
# The blob to link to on GitHub - any of "head", "last_tag", or "{blob}"
# linkcode_blob = 'head'
# The link to your GitHub repository formatted as https://github.com/user/repo
# If not provided, will attempt to create the link from the html_context dict
# linkcode_url = f"https://github.com/{html_context['github_user']}/" \
# f"{html_context['github_repo']}/{html_context['github_version']}"
# The text to use for the linkcode link
# linkcode_link_text: str = "View on GitHub"
# A linkcode_resolve() function to use for resolving the link target
# linkcode_resolve: types.FunctionType
# -- Options for HTMLHelp output ------------------------------------------
@@ -165,7 +208,6 @@ texinfo_documents = [
]
# -- Options for Epub output ----------------------------------------------
# Bibliographic Dublin Core info.
@@ -187,10 +229,57 @@ epub_copyright = copyright
epub_exclude_files = ['search.html']
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'python': ('https://docs.python.org/', None)}
# Remove these pages from index, references, toc trees, etc.
# If the builder is not 'html' then add the API docs modules index to pages to be removed.
exclude_from_builder = {
'latex': ['modules/modules'],
'epub': ['modules/modules'],
}
# Internal holding list, anything added here will always be excluded
_docs_to_remove = []
def builder_inited(app):
"""Run during Sphinx `builder-inited` phase.
Set a config value to builder name and add module docs to `docs_to_remove`.
"""
print(f'The builder is: {app.builder.name}')
app.add_config_value('builder', app.builder.name, 'env')
# Remove pages when builder matches any referenced in exclude_from_builder
if exclude_from_builder.get(app.builder.name):
_docs_to_remove.extend(exclude_from_builder[app.builder.name])
def env_get_outdated(app, env, added, changed, removed):
"""Run during Sphinx `env-get-outdated` phase.
Remove the items listed in `docs_to_remove` from known pages.
"""
added.difference_update(_docs_to_remove)
changed.difference_update(_docs_to_remove)
removed.update(_docs_to_remove)
return []
def doctree_read(app, doctree):
"""Run during Sphinx `doctree-read` phase.
Remove the items listed in `docs_to_remove` from the table of contents.
"""
from sphinx import addnodes
for toc_tree_node in doctree.traverse(addnodes.toctree):
for e in toc_tree_node['entries']:
ref = str(e[1])
if ref in _docs_to_remove:
toc_tree_node['entries'].remove(e)
# A possibility to have an own stylesheet, to add new rules or override existing ones
# For the latter case, the CSS specificity of the rules should be higher than the default ones
def setup(app):
@@ -198,3 +287,8 @@ def setup(app):
app.add_css_file('custom.css')
else:
app.add_stylesheet('custom.css')
# Run extra steps to remove module docs when running with a non-html builder
app.connect('builder-inited', builder_inited)
app.connect('env-get-outdated', env_get_outdated)
app.connect('doctree-read', doctree_read)

View File

@@ -0,0 +1,182 @@
.. _contributing_guidelines:
Contributing guidelines
=======================
Wanna contribute to Patroni? Yay - here is how!
Chatting
--------
Just want to chat with other Patroni users? Looking for interactive troubleshooting help? Join us on channel `#patroni <https://postgresteam.slack.com/archives/C9XPYG92A>`__ in the `PostgreSQL Slack <https://pgtreats.info/slack-invite>`__.
Running tests
-------------
Requirements for running behave tests:
1. PostgreSQL packages need to be installed.
2. PostgreSQL binaries must be available in your `PATH`. You may need to add them to the path with something like `PATH=/usr/lib/postgresql/11/bin:$PATH python -m behave`.
3. If you'd like to test with external DCSs (e.g., Etcd, Consul, and Zookeeper) you'll need the packages installed and the respective services running and accepting unencrypted/unprotected connections on localhost and the default port. In the case of Etcd or Consul, the behave test suite can start them up if binaries are available in the `PATH`.
Install dependencies:
.. code-block:: bash
# You may want to use Virtualenv or specify pip3.
pip install -r requirements.txt
pip install -r requirements.dev.txt
After you have all dependencies installed, you can run the various test suites:
.. code-block:: bash
# You may want to use Virtualenv or specify python3.
# Run flake8 to check syntax and formatting:
python setup.py flake8
# Run the pytest suite in tests/:
python setup.py test
# Run the behave (https://behave.readthedocs.io/en/latest/) test suite in features/;
# modify DCS as desired (raft has no dependencies so is the easiest to start with):
DCS=raft python -m behave
Testing with tox
----------------
To run tox tests you only need to install one dependency (other than Python):
.. code-block:: bash
pip install "tox>=4"
If you wish to run `behave` tests then you also need Docker installed.
The tox configuration in `tox.ini` has "environments" to run the following tasks:
* lint: Python code lint with `flake8`
* test: unit tests for all available python interpreters with `pytest`,
generates XML reports or HTML reports if a TTY is detected
* dep: detect package dependency conflicts using `pipdeptree`
* type: static type checking with `pyright`
* black: code formatting with `black`
* docker-build: build docker image used for the `behave` env
* docker-cmd: run arbitrary command with the above image
* docker-behave-etcd: run tox for behave tests with above image
* py*behave: run behave with available python interpreters (without docker, although
this is what is called inside docker containers)
* docs: build docs with `sphinx`
Running tox
^^^^^^^^^^^
To run the default env list (dep, lint, test, and docs), just run:
.. code-block:: bash
tox
The `test` envs can be run with the label `test`:
.. code-block:: bash
tox -m test
The `behave` docker tests can be run with the label `behave`:
.. code-block:: bash
tox -m behave
Similarly, docs has the label `docs`.
All other envs can be run with their respective env names:
.. code-block:: bash
tox -e lint
tox -e py39-test-lin
It is also possible to select partial env lists using `factors`. For example, if you want to run
all envs for Python 3.10:
.. code-block:: bash
tox -f py310
This is equivalent to running all the envs listed below:
.. code-block:: bash
$ tox -l -f py310
py310-test-lin
py310-test-mac
py310-test-win
py310-type-lin
py310-type-mac
py310-type-win
py310-behave-etcd-lin
py310-behave-etcd-win
py310-behave-etcd-mac
You can list all configured combinations of environments with tox (>=v4) like so:
.. code-block:: bash
tox l
The envs `test` and `docs` will attempt to open the HTML output files
when the job completes, if tox is run with an active terminal. This
is intended to be for the benefit of the developer running this env locally.
It will attempt to run `open` on macOS and `xdg-open` on Linux.
To use a different command, set the env var `OPEN_CMD` to the name or path of
the command. If this step fails, it will not fail the run overall.
If you want to disable this facility, set the env var `OPEN_CMD` to the `:` no-op command.
.. code-block:: bash
OPEN_CMD=: tox -m docs
Behave tests
^^^^^^^^^^^^
Behave tests run with `-m behave` will build docker images based on PG_MAJOR versions 11 through 15 and then run all
behave tests. This can take quite a long time to run, so you might want to limit the scope to a selected version of
Postgres or to a specific feature or set of steps.
To specify the version of Postgres, include the full name of the dependent image build env that you want, followed by the
behave env name. For instance, if you want Postgres 15, use:
.. code-block:: bash
tox -e pg15-docker-build,pg15-docker-behave-etcd-lin
If, on the other hand, you want to test a specific feature, you can pass positional arguments to behave. The following will run
the watchdog behave feature test scenario with all versions of Postgres.
.. code-block:: bash
tox -m behave -- features/watchdog.feature
Of course, you can combine the two, as shown below.
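For example, the following sketch (assuming the selected envs forward positional arguments to ``behave``, as the label-based invocation above does) runs only the watchdog feature against Postgres 15:

.. code-block:: bash

    tox -e pg15-docker-build,pg15-docker-behave-etcd-lin -- features/watchdog.feature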
Reporting issues
----------------
If you have a question about Patroni or have a problem using it, please read the :ref:`README <readme>` before filing an issue.
Also double-check the current issues on our `Issues Tracker <https://github.com/zalando/patroni/issues>`__.
Contributing a pull request
---------------------------
1) Submit a comment to the relevant issue or create a new issue describing your proposed change.
2) Fork the repository, then develop and test your code changes.
3) Include documentation.
4) Submit a pull request.
You'll get feedback about your pull request as soon as possible.
Happy Patroni hacking ;-)

View File

@@ -10,18 +10,58 @@ To deploy a Patroni cluster without using a pre-existing PostgreSQL instance, se
Procedure
---------
A Patroni cluster can be started with a data directory from a single-node PostgreSQL database. This is achieved by following closely these steps:
You can find below an overview of the steps for converting an existing Postgres cluster to a Patroni-managed cluster. In the steps we assume all nodes that are part of the existing cluster are currently up and running, and that you *do not* intend to change the Postgres configuration while the migration is ongoing. The steps:
1. Manually start PostgreSQL daemon
2. Create Patroni superuser and replication users as defined in the :ref:`authentication <postgresql_settings>` section of the Patroni configuration. If this user is created in SQL, the following queries achieve this:
#. Create the Postgres users as explained in the :ref:`authentication <postgresql_settings>` section of the Patroni configuration. You can find sample SQL commands to create the users in the code block below, in which you need to replace the usernames and passwords as per your environment. If you already have the relevant users, then you can skip this step.
.. code-block:: sql
.. code-block:: sql
CREATE USER $PATRONI_SUPERUSER_USERNAME WITH SUPERUSER ENCRYPTED PASSWORD '$PATRONI_SUPERUSER_PASSWORD';
CREATE USER $PATRONI_REPLICATION_USERNAME WITH REPLICATION ENCRYPTED PASSWORD '$PATRONI_REPLICATION_PASSWORD';
-- Patroni superuser
-- Replace PATRONI_SUPERUSER_USERNAME and PATRONI_SUPERUSER_PASSWORD accordingly
CREATE USER PATRONI_SUPERUSER_USERNAME WITH SUPERUSER ENCRYPTED PASSWORD 'PATRONI_SUPERUSER_PASSWORD';
3. Start Patroni (e.g. ``patroni /etc/patroni/patroni.yml``). It automatically detects that PostgreSQL daemon is already running but its configuration might be out-of-date.
4. Ask Patroni to restart the node with ``patronictl restart cluster-name node-name``. This step is only required if PostgreSQL configuration is out-of-date.
-- Patroni replication user
-- Replace PATRONI_REPLICATION_USERNAME and PATRONI_REPLICATION_PASSWORD accordingly
CREATE USER PATRONI_REPLICATION_USERNAME WITH REPLICATION ENCRYPTED PASSWORD 'PATRONI_REPLICATION_PASSWORD';
-- Patroni rewind user, if you intend to enable use_pg_rewind in your Patroni configuration
-- Replace PATRONI_REWIND_USERNAME and PATRONI_REWIND_PASSWORD accordingly
CREATE USER PATRONI_REWIND_USERNAME WITH ENCRYPTED PASSWORD 'PATRONI_REWIND_PASSWORD';
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO PATRONI_REWIND_USERNAME;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO PATRONI_REWIND_USERNAME;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO PATRONI_REWIND_USERNAME;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO PATRONI_REWIND_USERNAME;
#. Perform the following steps on all Postgres nodes, completing all steps on one node before proceeding with the next. Start with the primary node, then proceed with each standby node:
#. If you are running Postgres through systemd, then disable the Postgres systemd unit. This is required because Patroni manages starting and stopping the Postgres daemon.
#. Create a YAML configuration file for Patroni.
* **Note (specific for the primary node):** If you have replication slots being used for replication between cluster members, then it is recommended that you enable ``use_slots`` and configure the existing replication slots as permanent via the ``slots`` configuration item. Be aware that Patroni automatically creates replication slots for replication between members, and drops replication slots that it does not recognize, when ``use_slots`` is enabled. The idea of using permanent slots here is to allow your existing slots to persist while the migration to Patroni is in progress. See :ref:`YAML Configuration Settings <yaml_configuration>` for details.
#. Start Patroni using the ``patroni`` systemd service unit. It automatically detects that Postgres is already running and starts monitoring the instance.
#. Hand over the Postgres "start up procedure" to Patroni. In order to do that you need to restart the cluster members through the ``patronictl restart cluster-name member-name`` command (see the example commands after this list). For minimal downtime you might want to split this step into:
#. Immediate restart of the standby nodes.
#. Scheduled restart of the primary node within a maintenance window.
#. If you configured permanent slots in step ``1.2.``, then you should remove them from the ``slots`` configuration through the ``patronictl edit-config cluster-name member-name`` command once the ``restart_lsn`` of the slots created by Patroni has caught up with the ``restart_lsn`` of the original slots for the corresponding members. By removing the slots from the ``slots`` configuration you allow Patroni to drop the original slots from your cluster once they are not needed anymore. You can find below an example query to check the ``restart_lsn`` of a couple of slots, so you can compare them:
.. code-block:: sql
-- Assume original_slot_for_member_x is the name of the slot in your original
-- cluster for replicating changes to member X, and slot_for_member_x is the
-- slot created by Patroni for that purpose. You need restart_lsn of
-- slot_for_member_x to be >= restart_lsn of original_slot_for_member_x
SELECT slot_name,
restart_lsn
FROM pg_replication_slots
WHERE slot_name IN (
'original_slot_for_member_x',
'slot_for_member_x'
);
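As mentioned above, the hand-over restarts can be performed with ``patronictl``. The sketch below uses hypothetical cluster and member names (``my-cluster``, ``node1`` as the primary, ``node2`` as a standby) and a hypothetical maintenance-window timestamp:

.. code-block:: bash

    # Immediate restart of a standby node
    patronictl -c /etc/patroni/patroni.yml restart my-cluster node2 --force

    # Scheduled restart of the primary node within a maintenance window
    patronictl -c /etc/patroni/patroni.yml restart my-cluster node1 \
        --scheduled "2023-08-20T02:00:00+00:00" --force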
.. _major_upgrade:
@@ -30,14 +70,14 @@ Major Upgrade of PostgreSQL Version
The only possible way to do a major upgrade currently is:
1. Stop Patroni
2. Upgrade PostgreSQL binaries and perform `pg_upgrade <https://www.postgresql.org/docs/current/pgupgrade.html>`_ on the primary node
3. Update patroni.yml
4. Remove the initialize key from DCS or wipe complete cluster state from DCS. The second one could be achieved by running ``patronictl remove <cluster-name>``. It is necessary because pg_upgrade runs initdb which actually creates a new database with a new PostgreSQL system identifier.
5. If you wiped the cluster state in the previous step, you may wish to copy patroni.dynamic.json from old data dir to the new one. It will help you to retain some PostgreSQL parameters you had set before.
6. Start Patroni on the primary node.
7. Upgrade PostgreSQL binaries, update patroni.yml and wipe the data_dir on standby nodes.
8. Start Patroni on the standby nodes and wait for the replication to complete.
#. Stop Patroni
#. Upgrade PostgreSQL binaries and perform `pg_upgrade <https://www.postgresql.org/docs/current/pgupgrade.html>`_ on the primary node
#. Update patroni.yml
#. Remove the initialize key from DCS or wipe the complete cluster state from DCS. The latter can be achieved by running ``patronictl remove <cluster-name>``. This is necessary because pg_upgrade runs initdb, which actually creates a new database with a new PostgreSQL system identifier.
#. If you wiped the cluster state in the previous step, you may wish to copy patroni.dynamic.json from the old data dir to the new one. It will help you to retain some PostgreSQL parameters you had set before.
#. Start Patroni on the primary node.
#. Upgrade PostgreSQL binaries, update patroni.yml and wipe the data_dir on standby nodes.
#. Start Patroni on the standby nodes and wait for the replication to complete.
Running pg_upgrade on standby nodes is not supported by PostgreSQL. If you know what you are doing, you can try the rsync procedure described in https://www.postgresql.org/docs/current/pgupgrade.html instead of wiping data_dir on standby nodes. The safest way, however, is to let Patroni replicate the data for you.
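The commands below are a rough sketch of steps 1 through 6 on the primary node. They assume Debian-style paths, a cluster named ``my-cluster``, an upgrade from Postgres 14 to 15, and a freshly initialized new data directory; adjust everything to your environment and consult the pg_upgrade documentation before running anything:

.. code-block:: bash

    # 1. Stop Patroni
    sudo systemctl stop patroni
    # 2. With the new binaries installed and the new data directory initialized,
    #    run pg_upgrade (as the postgres user) on the primary node
    /usr/lib/postgresql/15/bin/pg_upgrade \
        -b /usr/lib/postgresql/14/bin -B /usr/lib/postgresql/15/bin \
        -d /var/lib/postgresql/14/main -D /var/lib/postgresql/15/main
    # 3. Update patroni.yml (data_dir, bin_dir, etc.), then
    # 4. wipe the cluster state from DCS
    patronictl -c /etc/patroni/patroni.yml remove my-cluster
    # 5. Optionally retain dynamic configuration from the old data directory
    cp /var/lib/postgresql/14/main/patroni.dynamic.json /var/lib/postgresql/15/main/
    # 6. Start Patroni on the primary node
    sudo systemctl start patroni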

View File

@@ -12,7 +12,7 @@ In both cases, it is important to be clear about the following concepts:
- You should run an odd number of etcd, ZooKeeper or Consul nodes: 3 or 5!
Synchronous Replication
----------------------------
-----------------------
To have a multi DC cluster that can automatically tolerate a zone drop, a minimum of 3 data centers is required.
@@ -27,7 +27,7 @@ Regarding postgres, we must deploy at least 2 nodes, in different DC. Then you h
This enables sync replication and the primary node will choose one of the nodes as synchronous.
Asynchronous Replication
----------------------------------
------------------------
With only two data centers it would be better to have two independent etcd clusters and run a Patroni :ref:`standby cluster <standby_cluster>` in the second data center. If the first site is down, you can MANUALLY promote the ``standby_cluster``.
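A manual promotion is typically done by removing the ``standby_cluster`` section from the dynamic configuration of the standby cluster; a minimal sketch, with a hypothetical cluster name:

.. code-block:: bash

    # Opens the dynamic configuration in an editor; delete the standby_cluster
    # section and save to promote the standby cluster
    patronictl -c /etc/patroni/patroni.yml edit-config standby-cluster-name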

View File

@@ -40,6 +40,13 @@ Currently supported PostgreSQL versions: 9.3 to 15.
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. ifconfig:: builder == 'html'
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
.. ifconfig:: builder != 'html'
* :ref:`genindex`
* :ref:`search`

View File

@@ -32,8 +32,11 @@ Configuration
Patroni Kubernetes :ref:`settings <kubernetes_settings>` and :ref:`environment variables <kubernetes_environment>` are described in the general chapters of the documentation.
.. _kubernetes_role_values:
Customize role label
^^^^^^^^^^^^^^^^^^^^
By default, Patroni will set corresponding labels on the pod it runs in based on the node's role, such as ``role=master``.
The label key and values can be customized via `kubernetes.role_label`, `kubernetes.leader_label_value`, `kubernetes.follower_label_value` and `kubernetes.standby_leader_label_value`.
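For example, assuming ``kubernetes.leader_label_value`` has been set to ``primary`` (a hypothetical configuration), the leader pod could then be selected via the customized label:

.. code-block:: bash

    # Select the leader pod using the customized role label
    kubectl get pods -l role=primary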

View File

@@ -44,7 +44,6 @@ Some of the PostgreSQL parameters **must hold the same values on the primary and
- **max_worker_processes**: 8
- **max_prepared_transactions**: 0
- **wal_level**: hot_standby
- **wal_log_hints**: on
- **track_commit_timestamp**: off
For the parameters below, PostgreSQL does not require equal values among the primary and all the replicas. However, considering the possibility of a replica to become the primary at any time, it doesn't really make sense to set them differently; therefore, **Patroni restricts setting their values to the** :ref:`dynamic configuration <dynamic_configuration>`.
@@ -62,6 +61,7 @@ There are some other Postgres parameters controlled by Patroni:
- **port** - is set either from ``postgresql.listen`` or from ``PATRONI_POSTGRESQL_LISTEN`` environment variable
- **cluster_name** - is set either from ``scope`` or from ``PATRONI_SCOPE`` environment variable
- **hot_standby: on**
- **wal_log_hints: on** - for Postgres 9.4 and newer.
To be on the safe side, parameters from the above lists are not written into ``postgresql.conf`` but passed as a list of arguments to ``pg_ctl start``, which gives them the highest precedence, even above `ALTER SYSTEM <https://www.postgresql.org/docs/current/static/sql-altersystem.html>`__

View File

@@ -3,6 +3,84 @@
Release notes
=============
Version 3.1.0
-------------
**Breaking changes**
- Changed semantic of ``restapi.keyfile`` and ``restapi.certfile`` (Alexander Kukushkin)
Previously Patroni was using ``restapi.keyfile`` and ``restapi.certfile`` as client certificates as a fallback if there were no respective configuration parameters in the ``ctl`` section.
.. warning::
If you enabled client certificate validation (``restapi.verify_client`` is set to ``required``), you also **must** provide **valid client certificates** in ``ctl.certfile``, ``ctl.keyfile``, and ``ctl.keyfile_password``. If not provided, Patroni will not work correctly.
**New features**
- Make Pod role label configurable (Waynerv)
Values can be customized using the ``kubernetes.leader_label_value``, ``kubernetes.follower_label_value`` and ``kubernetes.standby_leader_label_value`` parameters. This feature will be very useful when we change the ``master`` role to ``primary``. You can read more about the feature and migration steps :ref:`here <kubernetes_role_values>`.
**Improvements**
- Various improvements of ``patroni --validate-config`` (Alexander Kukushkin)
Improved parameter validation for different DCS, ``bootstrap.dcs`` , ``ctl``, ``restapi``, and ``watchdog`` sections.
- Start Postgres not in recovery if it crashed during recovery while Patroni is running (Alexander Kukushkin)
It may reduce recovery time and will help to prevent unnecessary timeline increments.
- Avoid unnecessary updates of ``/status`` key (Alexander Kukushkin)
When there are no permanent logical slots, Patroni was updating the ``/status`` key on every heartbeat loop, even when the LSN on the primary didn't move forward.
- Don't allow stale primary to win the leader race (Alexander Kukushkin)
If Patroni was hanging for a significant time due to lack of resources, it will additionally check that no other node promoted Postgres before acquiring the leader lock.
- Implemented visibility of certain PostgreSQL parameters validation (Alexander Kukushkin, Feike Steenbergen)
If validation of ``max_connections``, ``max_wal_senders``, ``max_prepared_transactions``, ``max_locks_per_transaction``, ``max_replication_slots``, or ``max_worker_processes`` failed, Patroni was using some sane default value. Now, in addition to that, it will also show a warning.
- Set permissions for files and directories created in ``PGDATA`` (Alexander Kukushkin)
All files created by Patroni had only owner read/write permissions. This behaviour was breaking backup tools that run under a different user and rely on group read permissions. Now Patroni honors permissions on ``PGDATA`` and correctly sets permissions on all directories and files it creates inside ``PGDATA``.
**Bugfixes**
- Run ``archive_command`` through shell (Waynerv)
Patroni might archive some WAL segments before doing crash recovery in single-user mode or before ``pg_rewind``. If the ``archive_command`` contains shell operators, like ``&&``, it didn't work with Patroni.
- Fixed "on switchover" shutdown checks (Polina Bungina)
It was possible that the specified candidate was still streaming and had not yet received the shutdown checkpoint, but the leader key was removed because some other nodes were healthy.
- Fixed "is primary" check (Alexander Kukushkin)
During the leader race, replicas were not able to recognize that Postgres on the old leader was still running as a primary.
- Fixed ``patronictl list`` (Alexander Kukushkin)
The Cluster name field was missing in ``tsv``, ``json``, and ``yaml`` output formats.
- Fixed ``pg_rewind`` behaviour after pause (Alexander Kukushkin)
Under certain conditions, Patroni wasn't able to join the false primary back to the cluster with ``pg_rewind`` after coming out of maintenance mode.
- Fixed bug in Etcd v3 implementation (Alexander Kukushkin)
Invalidate the internal KV cache if a key update is performed using the ``create_revision``/``mod_revision`` fields, due to a revision mismatch.
- Fixed behaviour of replicas in standby cluster in pause (Alexander Kukushkin)
When the leader key expires, replicas in a standby cluster will not follow the remote node but keep ``primary_conninfo`` as it is.
Version 3.0.4
-------------

View File

@@ -275,7 +275,6 @@ Config endpoint
"use_pg_rewind": true,
"parameters": {
"hot_standby": "on",
"wal_log_hints": "on",
"wal_level": "hot_standby",
"max_wal_senders": 5,
"max_replication_slots": 5,
@@ -302,7 +301,6 @@ Config endpoint
"use_pg_rewind": true,
"parameters": {
"hot_standby": "on",
"wal_log_hints": "on",
"wal_level": "hot_standby",
"max_wal_senders": 5,
"max_replication_slots": 5,
@@ -355,7 +353,6 @@ If you want to remove (reset) some setting just patch it with ``null``:
"hot_standby": "on",
"unix_socket_directories": ".",
"wal_level": "hot_standby",
"wal_log_hints": "on",
"max_wal_senders": 5,
"max_replication_slots": 5
}
@@ -369,7 +366,7 @@ The above call removes ``postgresql.parameters.max_connections`` from the dynami
.. code-block:: bash
$ curl -s -XPUT -d \
'{"maximum_lag_on_failover":1048576,"retry_timeout":10,"postgresql":{"use_slots":true,"use_pg_rewind":true,"parameters":{"hot_standby":"on","wal_log_hints":"on","wal_level":"hot_standby","unix_socket_directories":".","max_wal_senders":5}},"loop_wait":3,"ttl":20}' \
'{"maximum_lag_on_failover":1048576,"retry_timeout":10,"postgresql":{"use_slots":true,"use_pg_rewind":true,"parameters":{"hot_standby":"on","wal_level":"hot_standby","unix_socket_directories":".","max_wal_senders":5}},"loop_wait":3,"ttl":20}' \
http://localhost:8008/config | jq .
{
"ttl": 20,
@@ -381,7 +378,6 @@ The above call removes ``postgresql.parameters.max_connections`` from the dynami
"hot_standby": "on",
"unix_socket_directories": ".",
"wal_level": "hot_standby",
"wal_log_hints": "on",
"max_wal_senders": 5
},
"use_pg_rewind": true

View File

@@ -30,9 +30,15 @@ Log
Bootstrap configuration
-----------------------
.. note::
Once Patroni has initialized the cluster for the first time and settings have been stored in the DCS, all future
changes to the ``bootstrap.dcs`` section of the YAML configuration will not take effect! If you want to change
them, please use either ``patronictl edit-config`` or the Patroni :ref:`REST API <rest_api>`.
- **bootstrap**:
- **dcs**: This section will be written into `/<namespace>/<scope>/config` of the given configuration store after initializing of new cluster. The global dynamic configuration for the cluster. Under the ``bootstrap.dcs`` you can put any of the parameters described in the :ref:`Dynamic Configuration settings <dynamic_configuration>` and after Patroni initialized (bootstrapped) the new cluster, it will write this section into `/<namespace>/<scope>/config` of the configuration store. All later changes of ``bootstrap.dcs`` will not take any effect! If you want to change them please use either ``patronictl edit-config`` or Patroni :ref:`REST API <rest_api>`.
- **dcs**: This section will be written into `/<namespace>/<scope>/config` of the given configuration store after initializing the new cluster; it is the global dynamic configuration for the cluster. You can put any of the parameters described in the :ref:`Dynamic Configuration settings <dynamic_configuration>` under ``bootstrap.dcs`` and, after Patroni has initialized (bootstrapped) the new cluster, it will write this section into `/<namespace>/<scope>/config` of the configuration store.
- **method**: custom script to use for bootstrapping this cluster.
See :ref:`custom bootstrap methods documentation <custom_bootstrap>` for details.
@@ -43,17 +49,24 @@ Bootstrap configuration
- **- data-checksums**: Must be enabled when pg_rewind is needed on 9.3.
- **- encoding: UTF8**: default encoding for new databases.
- **- locale: UTF8**: default locale for new databases.
- **users**: Some additional users which need to be created after initializing new cluster
- **admin**: the name of user
- **password**: (optional) password for the user
- **options**: list of options for CREATE USER statement
- **- createrole**
- **- createdb**
- **users**: Some additional users which need to be created after initializing the new cluster; see :ref:`Bootstrap users configuration <bootstrap_users_configuration>` below.
- **post\_bootstrap** or **post\_init**: An additional script that will be executed after initializing the cluster. The script receives a connection string URL (with the cluster superuser as a user name). The ``PGPASSFILE`` environment variable is set to the location of the ``pgpass`` file.
.. _bootstrap_users_configuration:
Bootstrap users configuration
=============================
Users which need to be created after initializing the cluster:
- **admin**: the name of the user
- **password**: (optional) password for the user
- **options**: list of options for CREATE USER statement
- **- createrole**
- **- createdb**
.. _citus_settings:
Citus

View File

@@ -1,3 +1,8 @@
"""Patroni main entry point.
Implement ``patroni`` main daemon and expose its entry point.
"""
import logging
import os
import signal
@@ -16,8 +21,33 @@ logger = logging.getLogger(__name__)
class Patroni(AbstractPatroniDaemon):
"""Implement ``patroni`` command daemon.
:ivar version: Patroni version.
:ivar dcs: DCS object.
:ivar watchdog: watchdog handler, if configured to use watchdog.
:ivar postgresql: managed Postgres instance.
:ivar api: REST API server instance of this node.
:ivar request: wrapper for performing HTTP requests.
:ivar ha: HA handler.
:ivar tags: cache of custom tags configured for this node.
:ivar next_run: time when to run the next HA loop cycle.
:ivar scheduled_restart: when a restart has been scheduled to occur, if any. In that case, it should contain two keys:
* ``schedule``: timestamp when restart should occur;
* ``postmaster_start_time``: timestamp when Postgres was last started.
"""
def __init__(self, config: 'Config') -> None:
"""Create a :class:`Patroni` instance with the given *config*.
Get a connection to the DCS, configure watchdog (if required), set up Patroni interface with Postgres, configure
the HA loop and bring the REST API up.
.. note::
Expected to be instantiated and run through :func:`~patroni.daemon.abstract_main`.
:param config: Patroni configuration.
"""
from patroni.api import RestApiServer
from patroni.dcs import get_dcs
from patroni.ha import Ha
@@ -46,6 +76,17 @@ class Patroni(AbstractPatroniDaemon):
self.scheduled_restart: Dict[str, Any] = {}
def load_dynamic_configuration(self) -> None:
"""Load Patroni dynamic configuration.
Load dynamic configuration from the DCS if the `/config` key is available in the DCS, otherwise fall back to the
``bootstrap.dcs`` section from the configuration file.
If the DCS connection fails with a :class:`~patroni.exceptions.DCSError` exception, the attempt will be
retried every 5 seconds.
.. note::
This method is called only once, at the time when Patroni is started.
"""
from patroni.exceptions import DCSError
while True:
try:
@@ -74,25 +115,54 @@ class Patroni(AbstractPatroniDaemon):
if not isinstance(member, Member):
return
try:
_ = self.request(member, endpoint="/liveness")
_ = self.request(member, endpoint="/liveness", timeout=3)
logger.fatal("Can't start; there is already a node named '%s' running", self.config['name'])
sys.exit(1)
except Exception:
return
def get_tags(self) -> Dict[str, Any]:
"""Get tags configured for this node, if any.
Handle both predefined Patroni tags and custom defined tags.
.. note::
A custom tag is any tag added to the configuration ``tags`` section that is not one of ``clonefrom``,
``nofailover``, ``noloadbalance`` or ``nosync``.
For the Patroni predefined tags, the returned object will only contain them if they are enabled, as they
are all boolean values that default to disabled.
:returns: a dictionary of tags set for this node. The key is the tag name, and the value is the corresponding
tag value.
"""
return {tag: value for tag, value in self.config.get('tags', {}).items()
if tag not in ('clonefrom', 'nofailover', 'noloadbalance', 'nosync') or value}
@property
def nofailover(self) -> bool:
"""``True`` if ``tags.nofailover`` configuration is enabled for this node, else ``False``."""
return bool(self.tags.get('nofailover', False))
@property
def nosync(self) -> bool:
"""``True`` if ``tags.nosync`` configuration is enabled for this node, else ``False``."""
return bool(self.tags.get('nosync', False))
def reload_config(self, sighup: bool = False, local: Optional[bool] = False) -> None:
"""Apply new configuration values for ``patroni`` daemon.
Reload:
* Cached tags;
* Request wrapper configuration;
* REST API configuration;
* Watchdog configuration;
* Postgres configuration;
* DCS configuration.
:param sighup: whether the reload is related to a SIGHUP signal.
:param local: whether there have been changes to the local configuration file.
"""
try:
super(Patroni, self).reload_config(sighup, local)
if local:
@@ -107,14 +177,21 @@ class Patroni(AbstractPatroniDaemon):
logger.exception('Failed to reload config_file=%s', self.config.config_file)
@property
def replicatefrom(self):
def replicatefrom(self) -> Optional[str]:
"""Value of ``tags.replicatefrom`` configuration, if any."""
return self.tags.get('replicatefrom')
@property
def noloadbalance(self):
def noloadbalance(self) -> bool:
"""``True`` if ``tags.noloadbalance`` configuration is enabled for this node, else ``False``."""
return bool(self.tags.get('noloadbalance', False))
def schedule_next_run(self) -> None:
"""Schedule the next run of the ``patroni`` daemon main loop.
The next run is scheduled based on the previous run plus the value of the ``loop_wait`` configuration from DCS. If that has
already been exceeded, run the next cycle immediately.
"""
self.next_run += self.dcs.loop_wait
current_time = time.time()
nap_time = self.next_run - current_time
@@ -128,11 +205,21 @@ class Patroni(AbstractPatroniDaemon):
self.next_run = time.time()
def run(self) -> None:
"""Run ``patroni`` daemon process main loop.
Start the REST API and keep running HA cycles every ``loop_wait`` seconds.
"""
self.api.start()
self.next_run = time.time()
super(Patroni, self).run()
def _run_cycle(self) -> None:
"""Run a cycle of the ``patroni`` daemon main loop.
Run an HA cycle and schedule the next cycle run. If any dynamic configuration change request is detected, apply
the change and cache the new dynamic configuration values in the ``patroni.dynamic.json`` file under the Postgres
data directory.
"""
logger.info(self.ha.run_cycle())
if self.dcs.cluster and self.dcs.cluster.config and self.dcs.cluster.config.data \
@@ -145,6 +232,10 @@ class Patroni(AbstractPatroniDaemon):
self.schedule_next_run()
def _shutdown(self) -> None:
"""Perform shutdown of ``patroni`` daemon process.
Shut down the REST API and the HA handler.
"""
try:
self.api.shutdown()
except Exception:
@@ -156,18 +247,54 @@ class Patroni(AbstractPatroniDaemon):
def patroni_main(configfile: str) -> None:
"""Configure and start ``patroni`` main daemon process.
:param configfile: path to Patroni configuration file.
"""
from multiprocessing import freeze_support
# Windows executables created by PyInstaller are frozen, thus we need to enable frozen support for
# :mod:`multiprocessing` to avoid :class:`RuntimeError` exceptions.
freeze_support()
abstract_main(Patroni, configfile)
def process_arguments() -> Namespace:
"""Process command-line arguments.
Create a basic command-line parser through :func:`~patroni.daemon.get_base_arg_parser`, extend its capabilities by
adding these flags, and parse the command-line arguments:
* ``--validate-config`` -- used to validate the Patroni configuration file
* ``--generate-config`` -- used to generate Patroni configuration from a running PostgreSQL instance
* ``--generate-sample-config`` -- used to generate a sample Patroni configuration
.. note::
If running with ``--generate-config``, ``--generate-sample-config`` or ``--validate-config``, Patroni will exit
after generating or validating the configuration.
:returns: parsed arguments, if not running with ``--validate-config`` flag.
"""
from patroni.config_generator import generate_config
parser = get_base_arg_parser()
parser.add_argument('--validate-config', action='store_true', help='Run config validator and exit')
group = parser.add_mutually_exclusive_group()
group.add_argument('--validate-config', action='store_true', help='Run config validator and exit')
group.add_argument('--generate-sample-config', action='store_true',
help='Generate a sample Patroni yaml configuration file')
group.add_argument('--generate-config', action='store_true',
help='Generate a Patroni yaml configuration file for a running instance')
parser.add_argument('--dsn', help='Optional DSN string of the instance to be used as a source \
for config generation. Superuser connection is required.')
args = parser.parse_args()
if args.validate_config:
if args.generate_sample_config:
generate_config(args.configfile, True, None)
sys.exit(0)
elif args.generate_config:
generate_config(args.configfile, False, args.dsn)
sys.exit(0)
elif args.validate_config:
from patroni.validator import schema
from patroni.config import Config, ConfigParseError
@@ -181,6 +308,16 @@ def process_arguments() -> Namespace:
def main() -> None:
"""Main entrypoint of :mod:`patroni.__main__`.
Process command-line arguments, ensure :mod:`psycopg2` (or :mod:`psycopg`) meets the prerequisites, and start the
``patroni`` daemon process.
.. note::
If running through a Docker container, make the main process take care of init process duties and run
``patroni`` daemon as another process. In that case relevant signals received by the main process and forwarded
to ``patroni`` daemon process.
"""
from patroni import check_psycopg
args = process_arguments()
@@ -196,7 +333,13 @@ def main() -> None:
# Looks like we are in a docker, so we will act like init
def sigchld_handler(signo: int, stack_frame: Optional[FrameType]) -> None:
"""Handle ``SIGCHLD`` received by main process from ``patroni`` daemon when the daemon terminates.
:param signo: signal number.
:param stack_frame: current stack frame.
"""
try:
# log exit code of all children processes, and break loop when there is none left
while True:
ret = os.waitpid(-1, os.WNOHANG)
if ret == (0, 0):
@@ -206,7 +349,12 @@ def main() -> None:
except OSError:
pass
def passtochild(signo: int, stack_frame: Optional[FrameType]):
def passtochild(signo: int, stack_frame: Optional[FrameType]) -> None:
"""Forward a signal *signo* from main process to child process.
:param signo: signal number.
:param stack_frame: current stack frame.
"""
if pid:
os.kill(pid, signo)

View File

@@ -49,9 +49,23 @@ def check_access(func: Callable[['RestApiHandler'], None]) -> Callable[..., None
:Example:
@check_access
def do_PUT_foo():
pass
>>> class FooServer:
... def check_access(self, *args, **kwargs):
... print(f'In FooServer: {args[0].__class__.__name__}')
... return True
...
>>> class Foo:
... server = FooServer()
... @check_access
... def do_PUT_foo(self):
... print('In do_PUT_foo')
>>> f = Foo()
>>> f.do_PUT_foo()
In FooServer: Foo
In do_PUT_foo
"""
def wrapper(self: 'RestApiHandler', *args: Any, **kwargs: Any) -> None:
@@ -97,6 +111,7 @@ class RestApiHandler(BaseHTTPRequestHandler):
"""Write a response that is composed only of the HTTP status.
The response is written with these values separated by space:
* HTTP protocol version;
* *status_code*;
* description of *status_code*.
@@ -157,19 +172,19 @@ class RestApiHandler(BaseHTTPRequestHandler):
Modifies *response* before sending it to the client. Defines the ``patroni`` key, which is a
dictionary that contains the mandatory keys:
* ``version``: Patroni version, e.g. ``3.0.2``;
* ``scope``: value of ``scope`` setting from Patroni configuration.
* ``version``: Patroni version, e.g. ``3.0.2``;
* ``scope``: value of ``scope`` setting from Patroni configuration.
May also add the following optional keys, depending on the status of this Patroni/PostgreSQL node:
* ``tags``: tags that were set through Patroni configuration merged with dynamically applied tags;
* ``database_system_identifier``: ``Database system identifier`` from ``pg_controldata`` output;
* ``pending_restart``: ``True`` if PostgreSQL is pending to be restarted;
* ``scheduled_restart``: a dictionary with a single key ``schedule``, which is the timestamp for the scheduled
restart;
* ``watchdog_failed``: ``True`` if watchdog device is unhealthy;
* ``logger_queue_size``: log queue length if it is longer than expected;
* ``logger_records_lost``: number of log records that have been lost while the log queue was full.
* ``tags``: tags that were set through Patroni configuration merged with dynamically applied tags;
* ``database_system_identifier``: ``Database system identifier`` from ``pg_controldata`` output;
* ``pending_restart``: ``True`` if PostgreSQL is pending to be restarted;
* ``scheduled_restart``: a dictionary with a single key ``schedule``, which is the timestamp for the
scheduled restart;
* ``watchdog_failed``: ``True`` if watchdog device is unhealthy;
* ``logger_queue_size``: log queue length if it is longer than expected;
* ``logger_records_lost``: number of log records that have been lost while the log queue was full.
:param status_code: response HTTP status code.
:param response: represents the status of the PostgreSQL node, and is used as a basis for the HTTP response.
@@ -204,32 +219,54 @@ class RestApiHandler(BaseHTTPRequestHandler):
Is used for handling all health-check requests. E.g. "GET /(primary|replica|sync|async|etc...)".
The (optional) query parameters and the HTTP response status depend on the requested path:
* ``/``, ``primary``, or ``read-write``:
* HTTP status ``200``: if a primary with the leader lock.
* ``/standby-leader``:
* HTTP status ``200``: if holds the leader lock in a standby cluster.
* ``/leader``:
* HTTP status ``200``: if holds the leader lock.
* ``/replica``:
* Query parameters:
* ``lag``: only accept replication lag up to ``lag``. Accepts either an :class:`int`, which
represents lag in bytes, or a :class:`str` representing lag in human-readable format (e.g.
``10MB``).
* Any custom parameter: will attempt to match them against node tags.
* HTTP status ``200``: if up and running as a standby and without ``noloadbalance`` tag.
* ``/read-only``:
* HTTP status ``200``: if up and running and without ``noloadbalance`` tag.
* ``/synchronous`` or ``/sync``:
* HTTP status ``200``: if up and running as a synchronous standby.
* ``/read-only-sync``:
* HTTP status ``200``: if up and running as a synchronous standby or primary.
* ``/asynchronous``:
* Query parameters:
* ``lag``: only accept replication lag up to ``lag``. Accepts either an :class:`int`, which
represents lag in bytes, or a :class:`str` representing lag in human-readable format (e.g.
``10MB``).
* HTTP status ``200``: if up and running as an asynchronous standby.
* ``/health``:
* HTTP status ``200``: if up and running.
.. note::
@@ -333,16 +370,16 @@ class RestApiHandler(BaseHTTPRequestHandler):
def do_OPTIONS(self) -> None:
"""Handle an ``OPTIONS`` request.
Write a simple HTTP response that represents the current PostgreSQL status. Send only `200 OK` or
`503 Service Unavailable` as a response and nothing more, particularly no headers.
Write a simple HTTP response that represents the current PostgreSQL status. Send only ``200 OK`` or
``503 Service Unavailable`` as a response and nothing more, particularly no headers.
"""
self.do_GET(write_status_code_only=True)
def do_HEAD(self) -> None:
"""Handle a ``HEAD`` request.
Write a simple HTTP response that represents the current PostgreSQL status. Send only `200 OK` or
`503 Service Unavailable` as a response and nothing more, particularly no headers.
Write a simple HTTP response that represents the current PostgreSQL status. Send only ``200 OK`` or
``503 Service Unavailable`` as a response and nothing more, particularly no headers.
"""
self.do_GET(write_status_code_only=True)
@@ -350,11 +387,17 @@ class RestApiHandler(BaseHTTPRequestHandler):
"""Handle a ``GET`` request to ``/liveness`` path.
Write a simple HTTP response with HTTP status:
* ``200``:
* If the cluster is in maintenance mode; or
* If Patroni heartbeat loop is properly running;
* ``503`` if Patroni heartbeat loop last run was more than ``ttl`` setting ago on the primary (or twice the
value of ``ttl`` on a replica).
* ``503``:
* if Patroni heartbeat loop last run was more than ``ttl`` setting ago on the primary (or twice the
value of ``ttl`` on a replica).
"""
patroni: Patroni = self.server.patroni
is_primary = patroni.postgresql.role in ('master', 'primary') and patroni.postgresql.is_running()
@@ -371,10 +414,14 @@ class RestApiHandler(BaseHTTPRequestHandler):
"""Handle a ``GET`` request to ``/readiness`` path.
Write a simple HTTP response whose HTTP status can be:
* ``200``:
* If this Patroni node holds the DCS leader lock; or
* If this PostgreSQL instance is up and running;
* ``503``: if none of the previous conditions apply.
"""
patroni = self.server.patroni
if patroni.ha.is_leader():
@@ -397,8 +444,8 @@ class RestApiHandler(BaseHTTPRequestHandler):
def do_GET_cluster(self) -> None:
"""Handle a ``GET`` request to ``/cluster`` path.
Write an HTTP response with JSON content based on the output of :func:`cluster_as_json`, with HTTP status
``200`` and the JSON representation of the cluster topology.
Write an HTTP response with JSON content based on the output of :func:`~patroni.utils.cluster_as_json`, with
HTTP status ``200`` and the JSON representation of the cluster topology.
"""
cluster = self.server.patroni.dcs.get_cluster(True)
global_config = self.server.patroni.config.get_global_config(cluster)
@@ -412,11 +459,13 @@ class RestApiHandler(BaseHTTPRequestHandler):
The response contains a :class:`list` of failover/switchover events. Each item is a :class:`list` with the
following items:
* Timeline when the event occurred (:class:`int`);
* LSN at which the event occurred (:class:`int`);
* The reason for the event (:class:`str`);
* Timestamp when the new timeline was created (:class:`str`);
* Name of the involved Patroni node (:class:`str`).
"""
cluster = self.server.patroni.dcs.cluster or self.server.patroni.dcs.get_cluster()
self._write_json_response(200, cluster.history and cluster.history.lines or [])
@@ -443,32 +492,33 @@ class RestApiHandler(BaseHTTPRequestHandler):
The response contains the following items:
* ``patroni_version``: Patroni version without periods, e.g. ``030002`` for Patroni ``3.0.2``;
* ``patroni_postgres_running``: ``1`` if PostgreSQL is running, else ``0``;
* ``patroni_postmaster_start_time``: epoch timestamp since Postmaster was started;
* ``patroni_master``: ``1`` if this node holds the leader lock, else ``0``;
* ``patroni_primary``: same as ``patroni_master``;
* ``patroni_xlog_location``: ``pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')`` if leader, else ``0``;
* ``patroni_standby_leader``: ``1`` if standby leader node, else ``0``;
* ``patroni_replica``: ``1`` if a replica, else ``0``;
* ``patroni_sync_standby``: ``1`` if a sync replica, else ``0``;
* ``patroni_xlog_received_location``: ``pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/0')``;
* ``patroni_xlog_replayed_location``: ``pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/0)``;
* ``patroni_xlog_replayed_timestamp``: ``pg_last_xact_replay_timestamp``;
* ``patroni_xlog_paused``: ``pg_is_wal_replay_paused()``;
* ``patroni_postgres_server_version``: Postgres version without periods, e.g. ``150002`` for Postgres ``15.2``;
* ``patroni_cluster_unlocked``: ``1`` if no one holds the leader lock, else ``0``;
* ``patroni_failsafe_mode_is_active``: ``1`` if ``failsafe_mode`` is currently active, else ``0``;
* ``patroni_postgres_timeline``: PostgreSQL timeline based on current WAL file name;
* ``patroni_dcs_last_seen``: epoch timestamp when DCS was last contacted successfully;
* ``patroni_pending_restart``: ``1`` if this PostgreSQL node is pending a restart, else ``0``;
* ``patroni_is_paused``: ``1`` if Patroni is in maintenance node, else ``0``.
* ``patroni_version``: Patroni version without periods, e.g. ``030002`` for Patroni ``3.0.2``;
* ``patroni_postgres_running``: ``1`` if PostgreSQL is running, else ``0``;
* ``patroni_postmaster_start_time``: epoch timestamp since Postmaster was started;
* ``patroni_master``: ``1`` if this node holds the leader lock, else ``0``;
* ``patroni_primary``: same as ``patroni_master``;
* ``patroni_xlog_location``: ``pg_wal_lsn_diff(pg_current_wal_flush_lsn(), '0/0')`` if leader, else ``0``;
* ``patroni_standby_leader``: ``1`` if standby leader node, else ``0``;
* ``patroni_replica``: ``1`` if a replica, else ``0``;
* ``patroni_sync_standby``: ``1`` if a sync replica, else ``0``;
* ``patroni_xlog_received_location``: ``pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/0')``;
* ``patroni_xlog_replayed_location``: ``pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/0')``;
* ``patroni_xlog_replayed_timestamp``: ``pg_last_xact_replay_timestamp``;
* ``patroni_xlog_paused``: ``pg_is_wal_replay_paused()``;
* ``patroni_postgres_server_version``: Postgres version without periods, e.g. ``150002`` for Postgres
``15.2``;
* ``patroni_cluster_unlocked``: ``1`` if no one holds the leader lock, else ``0``;
* ``patroni_failsafe_mode_is_active``: ``1`` if ``failsafe_mode`` is currently active, else ``0``;
* ``patroni_postgres_timeline``: PostgreSQL timeline based on current WAL file name;
* ``patroni_dcs_last_seen``: epoch timestamp when DCS was last contacted successfully;
* ``patroni_pending_restart``: ``1`` if this PostgreSQL node is pending a restart, else ``0``;
* ``patroni_is_paused``: ``1`` if Patroni is in maintenance mode, else ``0``.
For PostgreSQL v9.6+ the response will also have the following:
* ``patroni_postgres_streaming``: ``1`` if Postgres is streaming from another node, else ``0``;
* ``patroni_postgres_in_archive_recovery``: ``1`` if Postgres isn't streaming and
there is ``restore_command`` available, else ``0``.
* ``patroni_postgres_streaming``: ``1`` if Postgres is streaming from another node, else ``0``;
* ``patroni_postgres_in_archive_recovery``: ``1`` if Postgres isn't streaming and
there is ``restore_command`` available, else ``0``.
"""
postgres = self.get_postgresql_status(True)
patroni = self.server.patroni
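As a hedged example, scraping the endpoint and filtering one of the documented gauges. The address and the ``requests`` client are assumptions; any HTTP client will do:

.. code-block:: python

    import requests  # assumed HTTP client

    metrics = requests.get('http://localhost:8008/metrics').text
    # keep only the samples for the documented 'patroni_postgres_running' gauge
    print([line for line in metrics.splitlines()
           if line.startswith('patroni_postgres_running')])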
@@ -667,7 +717,7 @@ class RestApiHandler(BaseHTTPRequestHandler):
def do_POST_reload(self) -> None:
"""Handle a ``POST`` request to ``/reload`` path.
Schedules a reload to Patroni and writes a response with HTTP status `202`.
Schedules a reload to Patroni and writes a response with HTTP status ``202``.
"""
self.server.patroni.sighup_handler()
self.write_response(202, 'reload scheduled')
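A minimal sketch of triggering the endpoint. The address is an assumption, and authentication may be required depending on the ``restapi`` configuration:

.. code-block:: python

    import requests

    resp = requests.post('http://localhost:8008/reload')
    print(resp.status_code, resp.text)  # expected: 202, 'reload scheduled'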
@@ -728,13 +778,17 @@ class RestApiHandler(BaseHTTPRequestHandler):
:param schedule: a string representing a timestamp, e.g. ``2023-04-14T20:27:00+00:00``.
:param action: the action to be scheduled (``restart``, ``switchover``, or ``failover``).
:returns: a tuple composed of 3 items
:returns: a tuple composed of 3 items:
* Suggested HTTP status code for a response:
* ``None``: if no issue was faced while parsing, leaving it up to the caller to decide the status; or
* ``400``: if no timezone information could be found in *schedule*; or
* ``422``: if *schedule* is invalid -- in the past or not parsable.
* An error message, if any error is faced, otherwise ``None``;
* Parsed *schedule*, if able to parse, otherwise ``None``.
"""
error = None
scheduled_at = None
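A sketch of the timezone rule described above, assuming :mod:`dateutil` (a Patroni dependency) as the parser:

.. code-block:: python

    from dateutil import parser

    value = parser.parse('2023-04-14T20:27:00+00:00')
    # a naive timestamp (no tzinfo) would be answered with HTTP 400
    print(value.tzinfo is not None)  # True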
@@ -761,25 +815,31 @@ class RestApiHandler(BaseHTTPRequestHandler):
Used to restart postgres (or schedule a restart), mainly by ``patronictl restart``.
The request body should be a JSON dictionary, and it can contain the following keys:
* ``schedule``: timestamp at which the restart should occur;
* ``role``: restart only nodes whose role is ``role``. Can be either:
* ``primary`` (or ``master``); or
* ``replica``.
* ``postgres_version``: restart only nodes whose PostgreSQL version is less than ``postgres_version``, e.g.
``15.2``;
* ``timeout``: if restart takes longer than ``timeout``, return an error and fail over to a replica;
* ``restart_pending``: if set, restart only nodes that have the ``pending restart`` flag.
Response HTTP status codes:
* ``200``: if successfully performed an immediate restart; or
* ``202``: if successfully scheduled a restart for later; or
* ``500``: if the cluster is in maintenance mode; or
* ``400``: if
* ``role`` value is invalid; or
* ``postgres_version`` value is invalid; or
* ``timeout`` is not a number, or lesser than ``0``; or
* request contains an unknown key; or
* exception is faced while performing an immediate restart.
* ``409``: if another restart was already previously scheduled; or
* ``503``: if any issue was found while performing an immediate restart; or
* HTTP status returned by :func:`parse_schedule`, if any error was observed while parsing the schedule.
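For illustration, a hedged request sketch exercising the documented body keys. The node address and values are hypothetical:

.. code-block:: python

    import requests

    body = {'role': 'replica', 'postgres_version': '15.2',
            'timeout': 30, 'restart_pending': True}
    resp = requests.post('http://localhost:8008/restart', json=body)
    print(resp.status_code, resp.text)  # 200 for an immediate successful restart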
@@ -857,6 +917,7 @@ class RestApiHandler(BaseHTTPRequestHandler):
Used to remove a scheduled restart of PostgreSQL.
Response HTTP status codes:
* ``200``: if a scheduled restart was removed; or
* ``404``: if no scheduled restart could be found.
"""
@@ -875,6 +936,7 @@ class RestApiHandler(BaseHTTPRequestHandler):
Used to remove a scheduled switchover in the cluster.
It writes a response, and the HTTP status code can be:
* ``200``: if a scheduled switchover was removed; or
* ``404``: if no scheduled switchover could be found; or
* ``409``: if not able to update the switchover info in the DCS.
@@ -896,11 +958,13 @@ class RestApiHandler(BaseHTTPRequestHandler):
"""Handle a ``POST`` request to ``/reinitialize`` path.
The request body may contain a JSON dictionary with the following key:
* ``force``: ``True`` if we want to cancel an already running task in order to reinit a replica.
Response HTTP status codes:
* ``200``: if the reinit operation has started; or
* ``503``: if any error is returned by :func:`Ha.reinitialize`.
* ``503``: if any error is returned by :func:`~patroni.ha.Ha.reinitialize`.
"""
request = self._read_json_content(body_is_optional=True)
@@ -924,11 +988,15 @@ class RestApiHandler(BaseHTTPRequestHandler):
:param candidate: name of the Patroni node to be promoted.
:param action: the action that is ongoing (``switchover`` or ``failover``).
:returns: a tuple composed of 2 items
:returns: a tuple composed of 2 items:
* Response HTTP status codes:
* ``200``: if the operation succeeded; or
* ``503``: if the operation failed or timed out.
* A status message about the operation.
"""
timeout = max(10, self.server.patroni.dcs.loop_wait)
for _ in range(0, timeout * 2):
@@ -987,12 +1055,14 @@ class RestApiHandler(BaseHTTPRequestHandler):
Handles manual failovers/switchovers, mainly from ``patronictl``.
The request body should be a JSON dictionary, and it can contain the following keys:
* ``leader``: name of the current leader in the cluster;
* ``candidate``: name of the Patroni node to be promoted;
* ``scheduled_at``: a string representing the timestamp when to execute the switchover/failover, e.g.
``2023-04-14T20:27:00+00:00``.
Response HTTP status codes:
* ``202``: if operation has been scheduled;
* ``412``: if operation is not possible;
* ``503``: if unable to register the operation to the DCS;
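A hedged sketch of scheduling a switchover through this endpoint. Node names and address are hypothetical:

.. code-block:: python

    import requests

    body = {'leader': 'node1', 'candidate': 'node2',
            'scheduled_at': '2023-04-14T20:27:00+00:00'}
    resp = requests.post('http://localhost:8008/switchover', json=body)
    print(resp.status_code, resp.text)  # 202 if the operation was scheduled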
@@ -1069,8 +1139,8 @@ class RestApiHandler(BaseHTTPRequestHandler):
def do_POST_citus(self) -> None:
"""Handle a ``POST`` request to ``/citus`` path.
Call :func:`CitusHandler.handle_event` to handle the request, then write a response with HTTP status code
``200``.
Call :func:`~patroni.postgresql.CitusHandler.handle_event` to handle the request, then write a response with
HTTP status code ``200``.
.. note::
If unable to parse the request body, then the request is silently discarded.
@@ -1086,18 +1156,21 @@ class RestApiHandler(BaseHTTPRequestHandler):
self.write_response(200, 'OK')
def parse_request(self) -> bool:
"""Override :func:`parse_request` method to enrich basic functionality of :class:`BaseHTTPRequestHandler`.
"""Override :func:`parse_request` to enrich basic functionality of :class:`~http.server.BaseHTTPRequestHandler`.
The original class can only invoke :func:`do_GET`, :func:`do_POST`, :func:`do_PUT`, etc. method implementations if
they are defined.
But we would like to have at least some simple routing mechanism, i.e.:
* ``GET /uri1/part2`` request should invoke :func:`do_GET_uri1()`
* ``POST /other`` should invoke :func:`do_POST_other()`
If the :func:`do_<REQUEST_METHOD>_<first_part_url>` method does not exist, we fall back to the original behavior.
:returns: ``True`` for success, ``False`` for failure; on failure, any relevant error response has already been
sent back.
sent back.
"""
ret = BaseHTTPRequestHandler.parse_request(self)
if ret:
@@ -1131,36 +1204,46 @@ class RestApiHandler(BaseHTTPRequestHandler):
Some of the values are collected by executing a query and others are taken from the state stored in memory.
:param retry: whether the query should be retried if it failed, or give up immediately
:returns: a dict with the status of Postgres/Patroni. The keys are:
* ``state``: Postgres state among ``stopping``, ``stopped``, ``stop failed``, ``crashed``, ``running``,
``starting``, ``start failed``, ``restarting``, ``restart failed``, ``initializing new cluster``,
``initdb failed``, ``running custom bootstrap script``, ``custom bootstrap failed``,
``creating replica``, or ``unknown``;
``starting``, ``start failed``, ``restarting``, ``restart failed``, ``initializing new cluster``,
``initdb failed``, ``running custom bootstrap script``, ``custom bootstrap failed``,
``creating replica``, or ``unknown``;
* ``postmaster_start_time``: ``pg_postmaster_start_time()``;
* ``role``: ``replica`` or ``master`` based on ``pg_is_in_recovery()`` output;
* ``server_version``: Postgres version without periods, e.g. ``150002`` for Postgres ``15.2``;
* ``xlog``: dictionary. Its structure depends on ``role``:
* If ``master``:
* ``location``: ``pg_current_wal_lsn()``
* ``location``: ``pg_current_wal_flush_lsn()``
* If ``replica``:
* ``received_location``: ``pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/0')``;
* ``replayed_location``: ``pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/0')``;
* ``replayed_timestamp``: ``pg_last_xact_replay_timestamp``;
* ``paused``: ``pg_is_wal_replay_paused()``;
* ``sync_standby``: ``True`` if replication mode is synchronous and this is a sync standby;
* ``timeline``: PostgreSQL primary node timeline;
* ``replication``: :class:`list` of :class:`dict` entries, one for each replication connection. Each entry
contains the following keys:
* ``application_name``: ``pg_stat_activity.application_name``;
* ``client_addr``: ``pg_stat_activity.client_addr``;
* ``state``: ``pg_stat_replication.state``;
* ``sync_priority``: ``pg_stat_replication.sync_priority``;
* ``sync_state``: ``pg_stat_replication.sync_state``;
* ``usename``: ``pg_stat_activity.usename``.
* ``pause``: ``True`` if cluster is in maintenance mode;
* ``cluster_unlocked``: ``True`` if cluster has no node holding the leader lock;
* ``failsafe_mode_is_active``: ``True`` if DCS failsafe mode is currently active;
* ``dcs_last_seen``: epoch timestamp at which the DCS was last reached by Patroni.
"""
postgresql = self.server.patroni.postgresql
cluster = self.server.patroni.dcs.cluster
@@ -1179,8 +1262,8 @@ class RestApiHandler(BaseHTTPRequestHandler):
" application_name, client_addr, w.state, sync_state, sync_priority"
" FROM pg_catalog.pg_stat_get_wal_senders() w, pg_catalog.pg_stat_get_activity(pid)) AS ri")
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name,
postgresql.wal_flush), retry=retry)[0]
result = {
'state': postgresql.state,
'postmaster_start_time': row[0],
@@ -1291,8 +1374,10 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
:param params: positional arguments to be used as parameters for *sql*.
:returns: a list of rows that were fetched from the database.
:raises psycopg.Error: if had issues while executing *sql*.
:raises PostgresConnectionException: if had issues while connecting to the database.
:raises:
:class:`psycopg.Error`: if had issues while executing *sql*.
:class:`~patroni.exceptions.PostgresConnectionException`: if had issues while connecting to the database.
"""
cursor = None
try:
@@ -1352,7 +1437,7 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
:param host: hostname to be checked.
:param port: port to be checked.
:rtype: Iterator[Union[IPv4Network, IPv6Network]] of *host* + *port* resolved to IP networks.
:yields: *host* + *port* resolved to IP networks.
"""
try:
for _, _, _, _, sa in socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM, socket.IPPROTO_TCP):
@@ -1366,8 +1451,7 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
.. note::
Only yields object if ``restapi.allowlist_include_members`` setting is enabled.
:rtype: Iterator[Union[IPv4Network, IPv6Network]] of each node ``restapi.connect_address`` resolved to an IP
network.
:yields: each node ``restapi.connect_address`` resolved to an IP network.
"""
cluster = self.patroni.dcs.cluster
if self.__allowlist_include_members and cluster:
@@ -1387,8 +1471,10 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
"""Ensure client has enough privileges to perform a given request.
Write a response back to the client if any issue is observed, and the HTTP status may be:
* ``401``: if ``Authorization`` header is missing or contains an invalid password;
* ``403``: if:
* ``restapi.allowlist`` was configured, but client IP is not in the allowed list; or
* ``restapi.allowlist_include_members`` is enabled, but client IP is not in the members list; or
* a client certificate is expected by the server, but is missing in the request.
@@ -1468,17 +1554,20 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
``host`` can be a hostname or IP address. It is the value of ``restapi.listen`` setting.
:param ssl_options: dictionary that may contain the following keys, depending on what has been configured in
``restapi`` section:
* ``certfile``: path to PEM certificate. If given, will start in HTTPS mode;
* ``keyfile``: path to key of ``certfile``;
* ``keyfile_password``: password for decrypting ``keyfile``;
* ``cafile``: path to CA file to validate client certificates;
* ``ciphers``: permitted cipher suites;
* ``verify_client``: value can be one among:
* ``none``: do not check client certificates;
* ``optional``: check client certificate only for unsafe REST API endpoints;
* ``required``: check client certificate for all REST API endpoints.
:raises ValueError: if any issue is faced while parsing *listen*.
:raises:
:class:`ValueError`: if any issue is faced while parsing *listen*.
"""
try:
host, port = split_host_port(listen, None)
@@ -1526,7 +1615,8 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
client_address: Tuple[str, int]) -> None:
"""Process a request to the REST API.
Wrapper for :func:`ThreadingMixIn.process_request_thread` that additionally:
Wrapper for :func:`~socketserver.ThreadingMixIn.process_request_thread` that additionally:
* Enable TCP keepalive;
* Perform SSL handshake (if an SSL socket).
@@ -1544,7 +1634,8 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
def shutdown_request(self, request: Union[socket.socket, Tuple[bytes, socket.socket]]) -> None:
"""Shut down a request to the REST API.
Wrapper for :func:`HTTPServer.shutdown_request` that additionally:
Wrapper for :func:`http.server.HTTPServer.shutdown_request` that additionally:
* Perform SSL shutdown handshake (if an SSL socket).
:param request: socket to handle the client request.
@@ -1592,7 +1683,7 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
:param value: list of IPs and/or networks contained in ``restapi.allowlist`` setting. Each item can be a host,
an IP, or a network in CIDR format.
:rtype: Iterator[Union[IPv4Network, IPv6Network]] of *host* + *port* resolved to IP networks.
:yields: each item of *value* resolved to an IP network.
"""
if isinstance(value, list):
for v in value:
@@ -1609,7 +1700,9 @@ class RestApiServer(ThreadingMixIn, HTTPServer, Thread):
"""Reload REST API configuration.
:param config: dictionary representing values under the ``restapi`` configuration section.
:raises ValueError: if ``listen`` key is not present in *config*.
:raises:
:class:`ValueError`: if ``listen`` key is not present in *config*.
"""
if 'listen' not in config: # changing config in runtime
raise ValueError('Can not find "restapi.listen" config')

View File

@@ -3,7 +3,7 @@
Provides a case insensitive :class:`dict` and :class:`set` object types.
"""
from collections import OrderedDict
from typing import Any, Collection, Dict, Iterator, MutableMapping, MutableSet, Optional
from typing import Any, Collection, Dict, Iterator, KeysView, MutableMapping, MutableSet, Optional
class CaseInsensitiveSet(MutableSet[str]):
@@ -187,6 +187,13 @@ class CaseInsensitiveDict(MutableMapping[str, Any]):
"""
return CaseInsensitiveDict({v[0]: v[1] for v in self._values.values()})
def keys(self) -> KeysView[str]:
"""Return a new view of the dict's keys.
:returns: a set-like object providing a view on the dict's keys
"""
return self._values.keys()
def __repr__(self) -> str:
"""Get a string representation of the dict.

View File

@@ -1,3 +1,4 @@
"""Facilities related to Patroni configuration."""
import json
import logging
import os
@@ -13,6 +14,7 @@ from . import PATRONI_ENV_PREFIX
from .collections import CaseInsensitiveDict
from .dcs import ClusterConfig, Cluster
from .exceptions import ConfigParseError
from .file_perm import pg_perm
from .postgresql.config import ConfigHandler
from .utils import deep_compare, parse_bool, parse_int, patch_config
@@ -34,121 +36,162 @@ _AUTH_ALLOWED_PARAMETERS = (
def default_validator(conf: Dict[str, Any]) -> List[str]:
"""Ensure *conf* is not empty.
Designed to be used as default validator for :class:`Config` objects, if no specific validator is provided.
:param conf: configuration to be validated.
:returns: an empty list -- :class:`Config` expects the validator to return a list of 0 or more issues found while
validating the configuration.
:raises:
:class:`ConfigParseError`: if *conf* is empty.
"""
if not conf:
raise ConfigParseError("Config is empty.")
return []
class GlobalConfig(object):
"""A class that wraps global configuration and provides convenient methods to access/check values.
"""A class that wrapps global configuration and provides convinient methods to access/check values.
It is instantiated by calling :func:`Config.global_config` method which picks either a
configuration from provided :class:`Cluster` object (the most up-to-date) or from the
local cache if :class::`ClusterConfig` is not initialized or doesn't have a valid config.
It is instantiated either by calling :func:`get_global_config` or :meth:`Config.get_global_config`, which picks
either a configuration from provided :class:`Cluster` object (the most up-to-date) or from the
local cache if :class:`ClusterConfig` is not initialized or doesn't have a valid config.
"""
def __init__(self, config: Dict[str, Any]) -> None:
"""Initialize :class:`GlobalConfig` object.
"""Initialize :class:`GlobalConfig` object with given *config*.
:param config: current configuration either from
:class:`ClusterConfig` or from :class:`Config.dynamic_configuration`
:class:`ClusterConfig` or from :func:`Config.dynamic_configuration`.
"""
self.__config = config
def get(self, name: str) -> Any:
"""Gets global configuration value by name.
"""Gets global configuration value by *name*.
:param name: parameter name
:returns: configuration value or `None` if it is missing
:param name: parameter name.
:returns: configuration value or ``None`` if it is missing.
"""
return self.__config.get(name)
def check_mode(self, mode: str) -> bool:
"""Checks whether the certain parameter is enabled.
:param mode: parameter name could be: synchronous_mode, failsafe_mode, pause, check_timeline, and so on
:returns: `True` if *mode* is enabled in the global configuration.
:param mode: parameter name, e.g. ``synchronous_mode``, ``failsafe_mode``, ``pause``, ``check_timeline``, and
so on.
:returns: ``True`` if parameter *mode* is enabled in the global configuration.
"""
return bool(parse_bool(self.__config.get(mode)))
@property
def is_paused(self) -> bool:
""":returns: `True` if cluster is in maintenance mode."""
"""``True`` if cluster is in maintenance mode."""
return self.check_mode('pause')
@property
def is_synchronous_mode(self) -> bool:
""":returns: `True` if synchronous replication is requested."""
"""``True`` if synchronous replication is requested."""
return self.check_mode('synchronous_mode')
@property
def is_synchronous_mode_strict(self) -> bool:
""":returns: `True` if at least one synchronous node is required."""
"""``True`` if at least one synchronous node is required."""
return self.check_mode('synchronous_mode_strict')
def get_standby_cluster_config(self) -> Union[Dict[str, Any], Any]:
""":returns: "standby_cluster" configuration."""
"""Get ``standby_cluster`` configuration.
:returns: a copy of ``standby_cluster`` configuration.
"""
return deepcopy(self.get('standby_cluster'))
@property
def is_standby_cluster(self) -> bool:
""":returns: `True` if global configuration has a valid "standby_cluster" section."""
"""``True`` if global configuration has a valid ``standby_cluster`` section."""
config = self.get_standby_cluster_config()
return isinstance(config, dict) and\
bool(config.get('host') or config.get('port') or config.get('restore_command'))
def get_int(self, name: str, default: int = 0) -> int:
"""Gets current value from the global configuration and trying to return it as int.
"""Gets current value of *name* from the global configuration and try to return it as :class:`int`.
:param name: name of the parameter
:param default: default value if *name* is not in the configuration or invalid
:returns: currently configured value from the global configuration or *default* if it is not set or invalid.
:param name: name of the parameter.
:param default: default value if *name* is not in the configuration or invalid.
:returns: currently configured value of *name* from the global configuration or *default* if it is not set or
invalid.
"""
ret = parse_int(self.get(name))
return default if ret is None else ret
@property
def min_synchronous_nodes(self) -> int:
""":returns: the minimal number of synchronous nodes based on whether strict mode is requested or not."""
"""The minimal number of synchronous nodes based on whether ``synchronous_mode_strict`` is enabled or not."""
return 1 if self.is_synchronous_mode_strict else 0
@property
def synchronous_node_count(self) -> int:
""":returns: currently configured value from the global configuration or 1 if it is not set or invalid."""
"""Currently configured value of ``synchronous_node_count`` from the global configuration.
Assume ``1`` if it is not set or invalid.
"""
return max(self.get_int('synchronous_node_count', 1), self.min_synchronous_nodes)
@property
def maximum_lag_on_failover(self) -> int:
""":returns: currently configured value from the global configuration or 1048576 if it is not set or invalid."""
"""Currently configured value of ``maximum_lag_on_failover`` from the global configuration.
Assume ``1048576`` if it is not set or invalid.
"""
return self.get_int('maximum_lag_on_failover', 1048576)
@property
def maximum_lag_on_syncnode(self) -> int:
""":returns: currently configured value from the global configuration or -1 if it is not set or invalid."""
"""Currently configured value of ``maximum_lag_on_syncnode`` from the global configuration.
Assume ``-1`` if it is not set or invalid.
"""
return self.get_int('maximum_lag_on_syncnode', -1)
@property
def primary_start_timeout(self) -> int:
""":returns: currently configured value from the global configuration or 300 if it is not set or invalid."""
"""Currently configured value of ``primary_start_timeout`` from the global configuration.
Assume ``300`` if it is not set or invalid.
.. note::
``master_start_timeout`` is still supported to keep backward compatibility.
"""
default = 300
return self.get_int('primary_start_timeout', default)\
if 'primary_start_timeout' in self.__config else self.get_int('master_start_timeout', default)
@property
def primary_stop_timeout(self) -> int:
""":returns: currently configured value from the global configuration or 300 if it is not set or invalid."""
"""Currently configured value of ``primary_stop_timeout`` from the global configuration.
Assume ``0`` if it is not set or invalid.
.. note::
``master_stop_timeout`` is still supported to keep backward compatibility.
"""
default = 0
return self.get_int('primary_stop_timeout', default)\
if 'primary_stop_timeout' in self.__config else self.get_int('master_stop_timeout', default)
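Taken together, a minimal usage sketch of the accessors above, assuming ``cluster`` was previously fetched from the DCS:

.. code-block:: python

    global_config = get_global_config(cluster)
    if not global_config.is_paused:
        lag_limit = global_config.maximum_lag_on_failover    # 1048576 unless overridden
        start_timeout = global_config.primary_start_timeout  # honours master_start_timeout too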
def get_global_config(cluster: Union[Cluster, None], default: Optional[Dict[str, Any]] = None) -> GlobalConfig:
def get_global_config(cluster: Optional[Cluster], default: Optional[Dict[str, Any]] = None) -> GlobalConfig:
"""Instantiates :class:`GlobalConfig` based on the input.
:param cluster: the currently known cluster state from DCS
:param default: default configuration, which will be used if there is no valid *cluster.config*
:returns: :class:`GlobalConfig` object
:param cluster: the currently known cluster state from DCS.
:param default: default configuration, which will be used if there is no valid *cluster.config*.
:returns: :class:`GlobalConfig` object.
"""
# Try to protect from the case when DCS was wiped out
if cluster and cluster.config and cluster.config.modify_version:
@@ -159,23 +202,29 @@ def get_global_config(cluster: Union[Cluster, None], default: Optional[Dict[str,
class Config(object):
"""
"""Handle Patroni configuration.
This class is responsible for:
1) Building and giving access to `effective_configuration` from:
* `Config.__DEFAULT_CONFIG` -- some sane default values
* `dynamic_configuration` -- configuration stored in DCS
* `local_configuration` -- configuration from `config.yml` or environment
1) Building and giving access to ``effective_configuration`` from:
2) Saving and loading `dynamic_configuration` into 'patroni.dynamic.json' file
* ``Config.__DEFAULT_CONFIG`` -- some sane default values;
* ``dynamic_configuration`` -- configuration stored in DCS;
* ``local_configuration`` -- configuration from `config.yml` or environment.
2) Saving and loading ``dynamic_configuration`` into 'patroni.dynamic.json' file
located in local_configuration['postgresql']['data_dir'] directory.
This is necessary to be able to restore `dynamic_configuration`
if DCS was accidentally wiped
This is necessary to be able to restore ``dynamic_configuration``
if DCS was accidentally wiped.
3) Loading of configuration file in the old format and converting it into new format
3) Loading of configuration file in the old format and converting it into new format.
4) Mimicking some of the `dict` interfaces to make it possible
to work with it as with the old `config` object.
4) Mimicking some ``dict`` interfaces to make it possible
to work with it as with the old ``config`` object.
:cvar PATRONI_CONFIG_VARIABLE: name of the environment variable that can be used to load Patroni configuration from.
:cvar __CACHE_FILENAME: name of the file used to cache dynamic configuration under Postgres data directory.
:cvar __DEFAULT_CONFIG: default configuration values for some Patroni settings.
"""
PATRONI_CONFIG_VARIABLE = PATRONI_ENV_PREFIX + 'CONFIGURATION'
@@ -193,21 +242,38 @@ class Config(object):
'recovery_min_apply_delay': ''
},
'postgresql': {
'bin_dir': '',
'use_slots': True,
'parameters': CaseInsensitiveDict({p: v[0] for p, v in ConfigHandler.CMDLINE_OPTIONS.items()
if p not in ('wal_keep_segments', 'wal_keep_size')})
if v[0] is not None and p not in ('wal_keep_segments', 'wal_keep_size')})
}
}
def __init__(self, configfile: str,
validator: Optional[Callable[[Dict[str, Any]], List[str]]] = default_validator) -> None:
"""Create a new instance of :class:`Config` and validate the loaded configuration using *validator*.
.. note::
Patroni will read configuration from these locations in this order:
* file or directory path passed as command-line argument (*configfile*), if it exists and the file or
files found in the directory can be parsed (see :meth:`~Config._load_config_path`), otherwise
* YAML file passed via the environment variable (see :cvar:`PATRONI_CONFIG_VARIABLE`), if the referenced
file exists and can be parsed, otherwise
* from configuration values defined as environment variables, see
:meth:`~Config._build_environment_configuration`.
:param configfile: path to Patroni configuration file.
:param validator: function used to validate Patroni configuration. It should receive a dictionary which
represents Patroni configuration, and return a list of zero or more error messages based on validation.
:raises:
:class:`ConfigParseError`: if any issue is reported by *validator*.
"""
self._modify_version = -1
self._dynamic_configuration = {}
self.__environment_configuration = self._build_environment_configuration()
# Patroni reads the configuration from the command-line argument if it exists, otherwise from the environment
self._config_file = configfile if configfile and os.path.exists(configfile) else None
if self._config_file:
self._local_configuration = self._load_config_file()
@@ -223,21 +289,48 @@ class Config(object):
self.__effective_configuration = self._build_effective_configuration({}, self._local_configuration)
self._data_dir = self.__effective_configuration.get('postgresql', {}).get('data_dir', "")
self._cache_file = os.path.join(self._data_dir, self.__CACHE_FILENAME)
self._load_cache()
if validator: # patronictl uses validator=None and we don't want to load anything from local cache in this case
self._load_cache()
self._cache_needs_saving = False
@property
def config_file(self) -> Union[str, None]:
def config_file(self) -> Optional[str]:
"""Path to Patroni configuration file, if any, else ``None``."""
return self._config_file
@property
def dynamic_configuration(self) -> Dict[str, Any]:
"""Deep copy of cached Patroni dynamic configuration."""
return deepcopy(self._dynamic_configuration)
def _load_config_path(self, path: str) -> Dict[str, Any]:
@property
def local_configuration(self) -> Dict[str, Any]:
"""Deep copy of cached Patroni local configuration.
:returns: copy of :attr:`~Config._local_configuration`
"""
If path is a file, loads the yml file pointed to by path.
If path is a directory, loads all yml files in that directory in alphabetical order
return deepcopy(dict(self._local_configuration))
@classmethod
def get_default_config(cls) -> Dict[str, Any]:
"""Deep copy default configuration.
:returns: copy of :attr:`~Config.__DEFAULT_CONFIG`
"""
return deepcopy(cls.__DEFAULT_CONFIG)
def _load_config_path(self, path: str) -> Dict[str, Any]:
"""Load Patroni configuration file(s) from *path*.
If *path* is a file, load the yml file pointed to by *path*.
If *path* is a directory, load all yml files in that directory in alphabetical order.
:param path: path to either a YAML configuration file, or to a folder containing YAML configuration files.
:returns: configuration after reading the configuration file(s) from *path*.
:raises:
:class:`ConfigParseError`: if *path* is invalid.
"""
if os.path.isfile(path):
files = [path]
@@ -256,14 +349,18 @@ class Config(object):
return overall_config
def _load_config_file(self) -> Dict[str, Any]:
"""Loads config.yaml from filesystem and applies some values which were set via ENV"""
"""Load configuration file(s) from filesystem and apply values which were set via environment variables.
:returns: final configuration after merging configuration file(s) and environment variables.
"""
if TYPE_CHECKING: # pragma: no cover
assert self._config_file is not None
config = self._load_config_path(self._config_file)
assert self.config_file is not None
config = self._load_config_path(self.config_file)
patch_config(config, self.__environment_configuration)
return config
def _load_cache(self) -> None:
"""Load dynamic configuration from ``patroni.dynamic.json``."""
if os.path.isfile(self._cache_file):
try:
with open(self._cache_file) as f:
@@ -272,14 +369,22 @@ class Config(object):
logger.exception('Exception when loading file: %s', self._cache_file)
def save_cache(self) -> None:
"""Save dynamic configuration to ``patroni.dynamic.json`` under Postgres data directory.
.. note::
``patroni.dynamic.jsonXXXXXX`` is created as a temporary file and then renamed to ``patroni.dynamic.json``,
where ``XXXXXX`` is a random suffix.
"""
if self._cache_needs_saving:
tmpfile = fd = None
try:
pg_perm.set_permissions_from_data_directory(self._data_dir)
(fd, tmpfile) = tempfile.mkstemp(prefix=self.__CACHE_FILENAME, dir=self._data_dir)
with os.fdopen(fd, 'w') as f:
fd = None
json.dump(self.dynamic_configuration, f)
tmpfile = shutil.move(tmpfile, self._cache_file)
os.chmod(self._cache_file, pg_perm.file_create_mode)
self._cache_needs_saving = False
except Exception:
logger.exception('Exception when saving file: %s', self._cache_file)
@@ -296,9 +401,16 @@ class Config(object):
# configuration could be either ClusterConfig or dict
def set_dynamic_configuration(self, configuration: Union[ClusterConfig, Dict[str, Any]]) -> bool:
"""Set dynamic configuration values with given *configuration*.
:param configuration: new dynamic configuration values. Supports :class:`dict` for backward compatibility.
:returns: ``True`` if changes have been detected between current dynamic configuration and the new dynamic
*configuration*, ``False`` otherwise.
"""
if isinstance(configuration, ClusterConfig):
if self._modify_version == configuration.modify_version:
return False # If the version didn't changed there is nothing to do
return False # If the version didn't change there is nothing to do
self._modify_version = configuration.modify_version
configuration = configuration.data
@@ -314,6 +426,14 @@ class Config(object):
return False
def reload_local_configuration(self) -> Optional[bool]:
"""Reload configuration values from the configuration file(s).
.. note::
Designed to be used when the user applies changes to the configuration file(s), so Patroni can use the new values
with a reload instead of a restart.
:returns: ``True`` if changes have been detected between the current local configuration and the newly loaded one.
"""
if self.config_file:
try:
configuration = self._load_config_file()
@@ -329,6 +449,38 @@ class Config(object):
@staticmethod
def _process_postgresql_parameters(parameters: Dict[str, Any], is_local: bool = False) -> Dict[str, Any]:
"""Process Postgres *parameters*.
.. note::
If *parameters* come from the local configuration (*is_local* is ``True``), discard any setting that is listed
under :attr:`~patroni.postgresql.config.ConfigHandler.CMDLINE_OPTIONS`, as those are supposed to be set only
through dynamic configuration.
When setting parameters from :attr:`~patroni.postgresql.config.ConfigHandler.CMDLINE_OPTIONS` through
dynamic configuration their value will be validated as per the validator defined in that very same
attribute entry. If the given value cannot be validated, a warning will be logged and the default value of
the GUC will be used instead.
Some parameters from :attr:`~patroni.postgresql.config.ConfigHandler.CMDLINE_OPTIONS` cannot be set at all,
not even through dynamic configuration:
* ``listen_addresses``: inferred from ``postgresql.listen`` local configuration or from
``PATRONI_POSTGRESQL_LISTEN`` environment variable;
* ``port``: inferred from ``postgresql.listen`` local configuration or from
``PATRONI_POSTGRESQL_LISTEN`` environment variable;
* ``cluster_name``: set through ``scope`` local configuration or through ``PATRONI_SCOPE`` environment
variable;
* ``hot_standby``: always enabled;
* ``wal_log_hints``: always enabled.
:param parameters: Postgres parameters to be processed. Should be the parsed YAML value of
``postgresql.parameters`` configuration, either from local or from dynamic configuration.
:param is_local: should be ``True`` if *parameters* refers to local configuration, or ``False`` if *parameters*
refers to dynamic configuration.
:returns: new value for ``postgresql.parameters`` after processing and validating *parameters*.
"""
pg_params: Dict[str, Any] = {}
for name, value in (parameters or {}).items():
@@ -338,13 +490,39 @@ class Config(object):
if ConfigHandler.CMDLINE_OPTIONS[name][1](value):
pg_params[name] = value
else:
logging.warning("postgresql parameter %s=%s failed validation, defaulting to %s",
name, value, ConfigHandler.CMDLINE_OPTIONS[name][0])
logger.warning("postgresql parameter %s=%s failed validation, defaulting to %s",
name, value, ConfigHandler.CMDLINE_OPTIONS[name][0])
return pg_params
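A hedged illustration of the rule above: ``max_connections`` is listed in :attr:`~patroni.postgresql.config.ConfigHandler.CMDLINE_OPTIONS`, so it is dropped from local configuration, while ``work_mem`` is kept (values are hypothetical):

.. code-block:: python

    params = Config._process_postgresql_parameters(
        {'max_connections': 500, 'work_mem': '64MB'}, is_local=True)
    print(params)  # {'work_mem': '64MB'}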
def _safe_copy_dynamic_configuration(self, dynamic_configuration: Dict[str, Any]) -> Dict[str, Any]:
config = deepcopy(self.__DEFAULT_CONFIG)
"""Create a copy of *dynamic_configuration*.
Merge *dynamic_configuration* with :attr:`__DEFAULT_CONFIG` (*dynamic_configuration* takes precedence), and
process ``postgresql.parameters`` from *dynamic_configuration* through :func:`_process_postgresql_parameters`,
if present.
.. note::
The following settings are not allowed in ``postgresql`` section as they are intended to be local
configuration, and are removed if present:
* ``connect_address``;
* ``proxy_address``;
* ``listen``;
* ``config_dir``;
* ``data_dir``;
* ``pgpass``;
* ``authentication``.
Besides that, any setting present in *dynamic_configuration* but absent from :attr:`__DEFAULT_CONFIG` is
discarded.
:param dynamic_configuration: Patroni dynamic configuration.
:returns: copy of *dynamic_configuration*, merged with default dynamic configuration and with some sanity checks
performed over it.
"""
config = self.get_default_config()
for name, value in dynamic_configuration.items():
if name == 'postgresql':
@@ -364,9 +542,25 @@ class Config(object):
@staticmethod
def _build_environment_configuration() -> Dict[str, Any]:
"""Get local configuration settings that were specified through environment variables.
:returns: dictionary containing the found environment variables and their values, respecting the expected
structure of Patroni configuration.
"""
ret: Dict[str, Any] = defaultdict(dict)
def _popenv(name: str) -> Union[str, None]:
def _popenv(name: str) -> Optional[str]:
"""Get value of environment variable *name*.
.. note::
*name* is prefixed with :data:`~patroni.PATRONI_ENV_PREFIX` when searching in the environment.
Also, the corresponding environment variable is removed from the environment upon reading its value.
:param name: name of the environment variable.
:returns: value of *name*, if present in the environment, otherwise ``None``.
"""
return os.environ.pop(PATRONI_ENV_PREFIX + name.upper(), None)
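Since ``_popenv`` is a nested helper, the following is illustrative only; it shows the pop-on-read behaviour described above:

.. code-block:: python

    import os

    os.environ['PATRONI_SCOPE'] = 'demo'
    print(_popenv('scope'))               # 'demo'
    print('PATRONI_SCOPE' in os.environ)  # False: the variable was consumed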
for param in ('name', 'namespace', 'scope'):
@@ -375,6 +569,23 @@ class Config(object):
ret[param] = value
def _fix_log_env(name: str, oldname: str) -> None:
"""Normalize a log related environment variable.
.. note::
Patroni used to support different names for log related environment variables in the past. As the
environment variables were renamed, this function takes care of mapping and normalizing the environment.
*name* is prefixed with :data:`~patroni.PATRONI_ENV_PREFIX` and ``LOG`` when searching in the
environment.
*oldname* is prefixed with :data:`~patroni.PATRONI_ENV_PREFIX` when searching in the environment.
If both *name* and *oldname* are set in the environment, *name* takes precedence.
:param name: new name of a log related environment variable.
:param oldname: original name of a log related environment variable.
:type oldname: str
"""
value = _popenv(oldname)
name = PATRONI_ENV_PREFIX + 'LOG_' + name.upper()
if value and name not in os.environ:
@@ -384,6 +595,15 @@ class Config(object):
_fix_log_env(name, oldname)
def _set_section_values(section: str, params: List[str]) -> None:
"""Get value of *params* environment variables that are related with *section*.
.. note::
The values are retrieved from the environment and updated directly into the dictionary returned by
:func:`_build_environment_configuration`.
:param section: configuration section the *params* belong to.
:param params: name of the Patroni settings.
"""
for param in params:
value = _popenv(section + '_' + param)
if value:
@@ -405,6 +625,7 @@ class Config(object):
if value:
ret['postgresql'].setdefault('bin_name', {})[binary] = value
# parse all values retrieved from the environment as Python objects, according to the expected type
for first, second in (('restapi', 'allowlist_include_members'), ('ctl', 'insecure')):
value = ret.get(first, {}).pop(second, None)
if value:
@@ -421,7 +642,13 @@ class Config(object):
if value is not None:
ret[first][second] = value
def _parse_list(value: str) -> Union[List[str], None]:
def _parse_list(value: str) -> Optional[List[str]]:
"""Parse an YAML list *value* as a :class:`list`.
:param value: YAML list as a string.
:returns: *value* as :class:`list`.
"""
if not (value.strip().startswith('-') or '[' in value):
value = '[{0}]'.format(value)
try:
@@ -437,7 +664,13 @@ class Config(object):
if value:
ret[first][second] = value
def _parse_dict(value: str) -> Union[Dict[str, Any], None]:
def _parse_dict(value: str) -> Optional[Dict[str, Any]]:
"""Parse an YAML dictionary *value* as a :class:`dict`.
:param value: YAML dictionary as a string.
:returns: *value* as :class:`dict`.
"""
if not value.strip().startswith('{'):
value = '{{{0}}}'.format(value)
try:
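Both helpers accept flow-style YAML fragments without the surrounding brackets/braces; an illustrative sketch with hypothetical values:

.. code-block:: python

    print(_parse_list('127.0.0.1, 10.0.0.0/8'))
    # ['127.0.0.1', '10.0.0.0/8']
    print(_parse_dict('sslmode: require, sslrootcert: /etc/ssl/ca.pem'))
    # {'sslmode': 'require', 'sslrootcert': '/etc/ssl/ca.pem'}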
@@ -454,9 +687,16 @@ class Config(object):
if value:
ret[first][second] = value
def _get_auth(name: str, params: Optional[Collection[str]] = None) -> Dict[str, str]:
def _get_auth(name: str, params: Collection[str] = _AUTH_ALLOWED_PARAMETERS[:2]) -> Dict[str, str]:
"""Get authorization related environment variables *params* from section *name*.
:param name: name of a configuration section that may contain authorization *params*.
:param params: the authorization settings that may be set under section *name*.
:returns: dictionary containing environment values for authorization *params* of section *name*.
"""
ret: Dict[str, str] = {}
for param in params or _AUTH_ALLOWED_PARAMETERS[:2]:
for param in params:
value = _popenv(name + '_' + param)
if value:
ret[param] = value
@@ -479,7 +719,7 @@ class Config(object):
for param in list(os.environ.keys()):
if param.startswith(PATRONI_ENV_PREFIX):
# PATRONI_(ETCD|CONSUL|ZOOKEEPER|EXHIBITOR|...)_(HOSTS?|PORT|..)
name, suffix = (param[8:].split('_', 1) + [''])[:2]
name, suffix = (param[len(PATRONI_ENV_PREFIX):].split('_', 1) + [''])[:2]
if suffix in ('HOST', 'HOSTS', 'PORT', 'USE_PROXIES', 'PROTOCOL', 'SRV', 'SRV_SUFFIX', 'URL', 'PROXY',
'CACERT', 'CERT', 'KEY', 'VERIFY', 'TOKEN', 'CHECKS', 'DC', 'CONSISTENCY',
'REGISTER_SERVICE', 'SERVICE_CHECK_INTERVAL', 'SERVICE_CHECK_TLS_SERVER_NAME',
@@ -510,14 +750,14 @@ class Config(object):
users = {}
for param in list(os.environ.keys()):
if param.startswith(PATRONI_ENV_PREFIX):
name, suffix = (param[8:].rsplit('_', 1) + [''])[:2]
name, suffix = (param[len(PATRONI_ENV_PREFIX):].rsplit('_', 1) + [''])[:2]
# PATRONI_<username>_PASSWORD=<password>, PATRONI_<username>_OPTIONS=<option1,option2,...>
# CREATE USER "<username>" WITH <OPTIONS> PASSWORD '<password>'
if name and suffix == 'PASSWORD':
password = os.environ.pop(param)
if password:
users[name] = {'password': password}
options = os.environ.pop(param[:-9] + '_OPTIONS', None)
options = os.environ.pop(param[:-9] + '_OPTIONS', None) # replace "_PASSWORD" with "_OPTIONS"
options = options and _parse_list(options)
if options:
users[name]['options'] = options
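A hedged example of the environment convention handled above; the user name, password, and options are hypothetical:

.. code-block:: python

    import os

    os.environ['PATRONI_admin_PASSWORD'] = 'secret'
    os.environ['PATRONI_admin_OPTIONS'] = 'createdb, createrole'
    # after processing: users == {'admin': {'password': 'secret',
    #                                       'options': ['createdb', 'createrole']}}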
@@ -528,6 +768,16 @@ class Config(object):
def _build_effective_configuration(self, dynamic_configuration: Dict[str, Any],
local_configuration: Dict[str, Union[Dict[str, Any], Any]]) -> Dict[str, Any]:
"""Build effective configuration by merging *dynamic_configuration* and *local_configuration*.
.. note::
*local_configuration* takes precedence over *dynamic_configuration* if a setting is defined in both.
:param dynamic_configuration: Patroni dynamic configuration.
:param local_configuration: Patroni local configuration.
:returns: the effective configuration after merging *dynamic_configuration* and *local_configuration*.
"""
config = self._safe_copy_dynamic_configuration(dynamic_configuration)
for name, value in local_configuration.items():
if name == 'citus': # remove invalid citus configuration
@@ -591,23 +841,57 @@ class Config(object):
return config
def get(self, key: str, default: Optional[Any] = None) -> Any:
"""Get effective value of ``key`` setting from Patroni configuration root.
Designed to work the same way as :func:`dict.get`.
:param key: name of the setting.
:param default: default value if *key* is not present in the effective configuration.
:returns: value of *key*, if present in the effective configuration, otherwise *default*.
"""
return self.__effective_configuration.get(key, default)
def __contains__(self, key: str) -> bool:
"""Check if setting *key* is present in the effective configuration.
Designed to work the same way as :func:`dict.__contains__`.
:param key: name of the setting to be checked.
:returns: ``True`` if setting *key* exists in effective configuration, else ``False``.
"""
return key in self.__effective_configuration
def __getitem__(self, key: str) -> Any:
"""Get value of setting *key* from effective configuration.
Designed to work the same way as :func:`dict.__getitem__`.
:param key: name of the setting.
:returns: value of setting *key*.
:raises:
:class:`KeyError`: if *key* is not present in effective configuration.
"""
return self.__effective_configuration[key]
def copy(self) -> Dict[str, Any]:
"""Get a deep copy of effective Patroni configuration.
:returns: a deep copy of the Patroni configuration.
"""
return deepcopy(self.__effective_configuration)
def get_global_config(self, cluster: Union[Cluster, None]) -> GlobalConfig:
def get_global_config(self, cluster: Optional[Cluster]) -> GlobalConfig:
"""Instantiate :class:`GlobalConfig` based on input.
Use the configuration from provided *cluster* (the most up-to-date) or from the
local cache if *cluster.config* is not initialized or doesn't have a valid config.
:param cluster: the currently known cluster state from DCS
:returns: :class:`GlobalConfig` object
:param cluster: the currently known cluster state from DCS.
:returns: :class:`GlobalConfig` object.
"""
return get_global_config(cluster, self._dynamic_configuration)

463
patroni/config_generator.py Normal file
View File

@@ -0,0 +1,463 @@
"""patroni ``--generate-config`` machinery."""
import abc
import logging
import os
import psutil
import socket
import sys
import yaml
from getpass import getuser, getpass
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional, Tuple, TYPE_CHECKING, Union
if TYPE_CHECKING: # pragma: no cover
from psycopg import Cursor
from psycopg2 import cursor
from . import psycopg
from .config import Config
from .exceptions import PatroniException
from .postgresql.config import ConfigHandler, parse_dsn
from .postgresql.misc import postgres_major_version_to_int
from .utils import get_major_version, parse_bool, patch_config, read_stripped
# Mapping between the libpq connection parameters and the environment variables.
# This dict should be kept in sync with `patroni.utils._AUTH_ALLOWED_PARAMETERS`
# (we use "username" in the Patroni config for some reason, other parameter names are the same).
_AUTH_ALLOWED_PARAMETERS_MAPPING = {
'user': 'PGUSER',
'password': 'PGPASSWORD',
'sslmode': 'PGSSLMODE',
'sslcert': 'PGSSLCERT',
'sslkey': 'PGSSLKEY',
'sslpassword': '',
'sslrootcert': 'PGSSLROOTCERT',
'sslcrl': 'PGSSLCRL',
'sslcrldir': 'PGSSLCRLDIR',
'gssencmode': 'PGGSSENCMODE',
'channel_binding': 'PGCHANNELBINDING'
}
_NO_VALUE_MSG = '#FIXME'
def get_address() -> Tuple[str, str]:
"""Try to get hostname and the ip address for it returned by :func:`~socket.gethostname`.
.. note::
Can also return a local IP.
:returns: tuple consisting of the hostname returned by :func:`~socket.gethostname`
and the first element in the sorted list of the addresses returned by :func:`~socket.getaddrinfo`.
Sorting guarantees it will prefer IPv4.
If an exception occurred, hostname and ip values are equal to :data:`~patroni.config_generator._NO_VALUE_MSG`.
"""
hostname = None
try:
hostname = socket.gethostname()
return hostname, sorted(socket.getaddrinfo(hostname, 0, socket.AF_UNSPEC, socket.SOCK_STREAM, 0),
key=lambda x: x[0])[0][4][0]
except Exception as err:
logging.warning('Failed to obtain address: %r', err)
return _NO_VALUE_MSG, _NO_VALUE_MSG
class AbstractConfigGenerator(abc.ABC):
"""Object representing the generated Patroni config.
:ivar output_file: full path to the output file to be used.
:ivar pg_major: integer representation of the major PostgreSQL version.
:ivar config: dictionary used for the generated configuration storage.
"""
_HOSTNAME, _IP = get_address()
def __init__(self, output_file: Optional[str]) -> None:
"""Set up the output file (if passed), helper vars and the minimal config structure.
:param output_file: full path to the output file to be used.
"""
self.output_file = output_file
self.pg_major = 0
self.config = self.get_template_config()
self.generate()
@classmethod
def get_template_config(cls) -> Dict[str, Any]:
"""Generate a template config for further extension (e.g. in the inherited classes).
:returns: dictionary with the values gathered from Patroni env, hopefully defined hostname and ip address
(otherwise set to :data:`~patroni.config_generator._NO_VALUE_MSG`), and some sane defaults.
"""
template_config: Dict[str, Any] = {
'scope': _NO_VALUE_MSG,
'name': cls._HOSTNAME,
'postgresql': {
'data_dir': _NO_VALUE_MSG,
'connect_address': _NO_VALUE_MSG + ':5432',
'listen': _NO_VALUE_MSG + ':5432',
'bin_dir': '',
'authentication': {
'superuser': {
'username': 'postgres',
'password': _NO_VALUE_MSG
},
'replication': {
'username': 'replicator',
'password': _NO_VALUE_MSG
}
}
},
'restapi': {
'connect_address': cls._IP + ':8008',
'listen': cls._IP + ':8008'
}
}
dynamic_config = Config.get_default_config()
# to properly dump CaseInsensitiveDict as YAML later
dynamic_config['postgresql']['parameters'] = dict(dynamic_config['postgresql']['parameters'])
config = Config('', None).local_configuration # Get values from env
config.setdefault('bootstrap', {})['dcs'] = dynamic_config
config.setdefault('postgresql', {})
del config['bootstrap']['dcs']['standby_cluster']
patch_config(template_config, config)
return template_config
@abc.abstractmethod
def generate(self) -> None:
"""Generate config and store in :attr:`~AbstractConfigGenerator.config`."""
def write_config(self) -> None:
"""Write current :attr:`~AbstractConfigGenerator.config` to the output file if provided, to stdout otherwise."""
if self.output_file:
dir_path = os.path.dirname(self.output_file)
if dir_path and not os.path.isdir(dir_path):
os.makedirs(dir_path)
with open(self.output_file, 'w', encoding='UTF-8') as output_file:
yaml.safe_dump(self.config, output_file, default_flow_style=False, allow_unicode=True)
else:
yaml.safe_dump(self.config, sys.stdout, default_flow_style=False, allow_unicode=True)
class SampleConfigGenerator(AbstractConfigGenerator):
"""Object representing the generated sample Patroni config.
Sane defaults are used based on the gathered PG version.
"""
@property
def get_auth_method(self) -> str:
"""Return the preferred authentication method for a specific PG version if provided or the default ``md5``.
:returns: :class:`str` value for the preferred authentication method.
"""
return 'scram-sha-256' if self.pg_major and self.pg_major >= 100000 else 'md5'
def _get_int_major_version(self) -> int:
"""Get major PostgreSQL version from the binary as an integer.
:returns: an integer PostgreSQL major version representation gathered from the PostgreSQL binary.
See :func:`~patroni.postgresql.misc.postgres_major_version_to_int` and
:func:`~patroni.utils.get_major_version`.
"""
postgres_bin = ((self.config.get('postgresql') or {}).get('bin_name') or {}).get('postgres', 'postgres')
return postgres_major_version_to_int(get_major_version(self.config['postgresql'].get('bin_dir'), postgres_bin))
def generate(self) -> None:
"""Generate sample config using some sane defaults and update :attr:`~AbstractConfigGenerator.config`."""
self.pg_major = self._get_int_major_version()
self.config['postgresql']['parameters'] = {'password_encryption': self.get_auth_method}
username = self.config["postgresql"]["authentication"]["replication"]["username"]
self.config['postgresql']['pg_hba'] = [
f'host all all all {self.get_auth_method}',
f'host replication {username} all {self.get_auth_method}'
]
# add version-specific configuration
wal_keep_param = 'wal_keep_segments' if self.pg_major < 130000 else 'wal_keep_size'
self.config['bootstrap']['dcs']['postgresql']['parameters'][wal_keep_param] = \
ConfigHandler.CMDLINE_OPTIONS[wal_keep_param][0]
self.config['bootstrap']['dcs']['postgresql']['use_pg_rewind'] = True
if self.pg_major >= 110000:
self.config['postgresql']['authentication'].setdefault(
'rewind', {'username': 'rewind_user'}).setdefault('password', _NO_VALUE_MSG)
class RunningClusterConfigGenerator(AbstractConfigGenerator):
"""Object representing the Patroni config generated using information gathered from the running instance.
:ivar dsn: DSN string for the local instance to get GUC values from (if provided).
:ivar parsed_dsn: DSN string parsed into a dictionary (see :func:`~patroni.postgresql.config.parse_dsn`).
"""
def __init__(self, output_file: Optional[str] = None, dsn: Optional[str] = None) -> None:
"""Additionally store the passed dsn (if any) in both original and parsed version and run config generation.
:param output_file: full path to the output file to be used.
:param dsn: DSN string for the local instance to get GUC values from.
:raises:
:exc:`~patroni.exceptions.PatroniException`: if DSN parsing failed.
"""
self.dsn = dsn
self.parsed_dsn = {}
super().__init__(output_file)
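A hypothetical invocation; the DSN and output path are examples only. Note that :meth:`generate` runs from the base ``__init__`` and raises :exc:`~patroni.exceptions.PatroniException` if the instance cannot be reached:

.. code-block:: python

    generator = RunningClusterConfigGenerator(
        output_file='/tmp/patroni.yml',
        dsn='host=localhost port=5432 user=postgres')
    generator.write_config()  # dump the gathered configuration as YAML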
@property
def _get_hba_conn_types(self) -> Tuple[str, ...]:
"""Return the connection types allowed.
If :attr:`~RunningClusterConfigGenerator.pg_major` is defined and is PostgreSQL 16 or newer, the ``include``
directives are additionally allowed.
:returns: tuple of the connection methods allowed.
"""
allowed_types = ('local', 'host', 'hostssl', 'hostnossl', 'hostgssenc', 'hostnogssenc')
if self.pg_major and self.pg_major >= 160000:
allowed_types += ('include', 'include_if_exists', 'include_dir')
return allowed_types
@property
def _required_pg_params(self) -> List[str]:
"""PG configuration prameters that have to be always present in the generated config.
:returns: list of the parameter names.
"""
return ['hba_file', 'ident_file', 'config_file', 'data_directory'] + \
list(ConfigHandler.CMDLINE_OPTIONS.keys())
def _get_bin_dir_from_running_instance(self) -> str:
"""Define the directory postgres binaries reside using postmaster's pid executable.
:returns: path to the PostgreSQL binaries directory.
:raises:
:exc:`~patroni.exceptions.PatroniException`: if:
* pid could not be obtained from the ``postmaster.pid`` file; or
* :exc:`OSError` occurred during ``postmaster.pid`` file handling; or
* no process with the obtained postmaster pid exists.
"""
postmaster_pid = None
data_dir = self.config['postgresql']['data_dir']
try:
with open(f"{data_dir}/postmaster.pid", 'r') as pid_file:
postmaster_pid = pid_file.readline()
if not postmaster_pid:
raise PatroniException('Failed to obtain postmaster pid from postmaster.pid file')
postmaster_pid = int(postmaster_pid.strip())
except OSError as err:
raise PatroniException(f'Error while reading postmaster.pid file: {err}')
try:
return os.path.dirname(psutil.Process(postmaster_pid).exe())
except psutil.NoSuchProcess:
raise PatroniException("Obtained postmaster pid doesn't exist.")
@contextmanager
def _get_connection_cursor(self) -> Iterator[Union['cursor', 'Cursor[Any]']]:
"""Get cursor for the PG connection established based on the stored information.
:raises:
:exc:`~patroni.exceptions.PatroniException`: if :exc:`psycopg.Error` occurred.
"""
try:
conn = psycopg.connect(dsn=self.dsn,
password=self.config['postgresql']['authentication']['superuser']['password'])
with conn.cursor() as cur:
yield cur
conn.close()
except psycopg.Error as e:
raise PatroniException(f'Failed to establish PostgreSQL connection: {e}')
def _set_pg_params(self, cur: Union['cursor', 'Cursor[Any]']) -> None:
"""Extend :attr:`~RunningClusterConfigGenerator.config` with the actual PG GUCs values.
The following GUC values are set:
* Non-internal parameters whose source is a configuration file, the postmaster command line, or an
environment variable;
* The always required parameters (see :meth:`~RunningClusterConfigGenerator._required_pg_params`).
:param cur: connection cursor to use.
"""
cur.execute("SELECT name, current_setting(name) FROM pg_settings "
"WHERE context <> 'internal' "
"AND source IN ('configuration file', 'command line', 'environment variable') "
"AND category <> 'Write-Ahead Log / Recovery Target' "
"AND setting <> '(disabled)' "
"OR name = ANY(%s)", (self._required_pg_params,))
helper_dict = dict.fromkeys(['port', 'listen_addresses'])
self.config['postgresql'].setdefault('parameters', {})
for param, value in cur.fetchall():
if param == 'data_directory':
self.config['postgresql']['data_dir'] = value
elif param == 'cluster_name' and value:
self.config['scope'] = value
elif param in ('archive_command', 'restore_command',
'archive_cleanup_command', 'recovery_end_command',
'ssl_passphrase_command', 'hba_file',
'ident_file', 'config_file'):
# write commands to the local config due to security implications
# write hba/ident/config_file to local config to ensure they are not removed later
self.config['postgresql']['parameters'][param] = value
elif param in helper_dict:
helper_dict[param] = value
else:
self.config['bootstrap']['dcs']['postgresql']['parameters'][param] = value
connect_port = self.parsed_dsn.get('port', os.getenv('PGPORT', helper_dict['port']))
self.config['postgresql']['connect_address'] = f'{self._IP}:{connect_port}'
self.config['postgresql']['listen'] = f'{helper_dict["listen_addresses"]}:{helper_dict["port"]}'
def _set_su_params(self) -> None:
"""Extend :attr:`~RunningClusterConfigGenerator.config` with the superuser auth information.
The information set is based on the options used for the connection.
"""
su_params: Dict[str, str] = {}
for conn_param, env_var in _AUTH_ALLOWED_PARAMETERS_MAPPING.items():
val = self.parsed_dsn.get(conn_param, os.getenv(env_var))
if val:
su_params[conn_param] = val
patroni_env_su_username = ((self.config.get('authentication') or {}).get('superuser') or {}).get('username')
patroni_env_su_pwd = ((self.config.get('authentication') or {}).get('superuser') or {}).get('password')
# because we use "username" in the config for some reason
su_params['username'] = su_params.pop('user', patroni_env_su_username) or getuser()
su_params['password'] = su_params.get('password', patroni_env_su_pwd) or \
getpass('Please enter the user password:')
self.config['postgresql']['authentication'] = {
'superuser': su_params,
'replication': {'username': _NO_VALUE_MSG, 'password': _NO_VALUE_MSG}
}
def _set_conf_files(self) -> None:
"""Extend :attr:`~RunningClusterConfigGenerator.config` with ``pg_hba.conf`` and ``pg_ident.conf`` content.
.. note::
This function only defines ``postgresql.pg_hba`` and ``postgresql.pg_ident`` when
``hba_file`` and ``ident_file`` are set to the defaults. These files may be
located outside of ``PGDATA``, in which case Patroni may not have write permissions for them.
:raises:
:exc:`~patroni.exceptions.PatroniException`: if :exc:`OSError` occurred during the conf files handling.
"""
default_hba_path = os.path.join(self.config['postgresql']['data_dir'], 'pg_hba.conf')
if self.config['postgresql']['parameters']['hba_file'] == default_hba_path:
try:
self.config['postgresql']['pg_hba'] = list(
filter(lambda i: i and i.split()[0] in self._get_hba_conn_types, read_stripped(default_hba_path)))
except OSError as err:
raise PatroniException(f'Failed to read pg_hba.conf: {err}')
default_ident_path = os.path.join(self.config['postgresql']['data_dir'], 'pg_ident.conf')
if self.config['postgresql']['parameters']['ident_file'] == default_ident_path:
try:
self.config['postgresql']['pg_ident'] = [i for i in read_stripped(default_ident_path)
if i and not i.startswith('#')]
except OSError as err:
raise PatroniException(f'Failed to read pg_ident.conf: {err}')
if not self.config['postgresql']['pg_ident']:
del self.config['postgresql']['pg_ident']
def _enrich_config_from_running_instance(self) -> None:
"""Extend :attr:`~RunningClusterConfigGenerator.config` with the values gathered from the running instance.
Retrieve the following information from the running PostgreSQL instance:
* superuser auth parameters (see :meth:`~RunningClusterConfigGenerator._set_su_params`);
* some GUC values (see :meth:`~RunningClusterConfigGenerator._set_pg_params`);
* ``postgresql.connect_address``, ``postgresql.listen``;
* ``postgresql.pg_hba`` and ``postgresql.pg_ident`` (see :meth:`~RunningClusterConfigGenerator._set_conf_files`)
Also redefine ``scope`` with the ``cluster_name`` GUC value if it is set.
:raises:
:exc:`~patroni.exceptions.PatroniException`: if the provided user doesn't have superuser privileges.
"""
self._set_su_params()
with self._get_connection_cursor() as cur:
self.pg_major = getattr(cur.connection, 'server_version', 0)
if not parse_bool(cur.connection.info.parameter_status('is_superuser')):
raise PatroniException('The provided user does not have superuser privilege')
self._set_pg_params(cur)
self._set_conf_files()
def generate(self) -> None:
"""Generate config using the info gathered from the specified running PG instance.
Result is written to :attr:`~RunningClusterConfigGenerator.config`.
"""
if self.dsn:
self.parsed_dsn = parse_dsn(self.dsn) or {}
if not self.parsed_dsn:
raise PatroniException('Failed to parse DSN string')
self._enrich_config_from_running_instance()
self.config['postgresql']['bin_dir'] = self._get_bin_dir_from_running_instance()
def generate_config(output_file: str, sample: bool, dsn: Optional[str]) -> None:
"""Generate Patroni configuration file.
Gather all the available non-internal GUC values that have the configuration file, postmaster command line,
or an environment variable as their source and store them in the appropriate part of the Patroni configuration
(``postgresql.parameters`` or ``bootstrap.dcs.postgresql.parameters``). Either the provided DSN (takes precedence)
or PG ENV vars will be used for the connection. If the password is not provided, it is requested via a prompt.
The created configuration contains:
* ``scope``: ``cluster_name`` GUC value or ``PATRONI_SCOPE`` ENV variable value if available.
* ``name``: ``PATRONI_NAME`` ENV variable value if set, otherwise hostname.
* ``bootstrap.dcs``: section with all the parameters (incl. the majority of PG GUCs) set to their default values
defined by Patroni and adjusted by the source instance's configuration values.
* ``postgresql.parameters``: the source instance's ``archive_command``, ``restore_command``,
``archive_cleanup_command``, ``recovery_end_command``, ``ssl_passphrase_command``, ``hba_file``, ``ident_file``,
``config_file`` GUC values.
* ``postgresql.bin_dir``: path to Postgres binaries gathered from the running instance or, if not available,
the value of ``PATRONI_POSTGRESQL_BIN_DIR`` ENV variable. Otherwise, an empty string.
* ``postgresql.data_dir``: the value gathered from the corresponding PG GUC.
* ``postgresql.listen``: source instance's ``listen_addresses`` and ``port`` GUC values.
* ``postgresql.connect_address``: if possible, generated from the connection params.
* ``postgresql.authentication``:
* superuser and replication users defined (if possible, usernames are set from the respective Patroni ENV vars,
otherwise the default ``postgres`` and ``replicator`` values are used).
If not a sample config, either DSN or PG ENV vars are used to define superuser authentication parameters.
* rewind user is defined only for a sample config, if the PG version can be determined and is >= 11
(if possible, the username is set from the respective Patroni ENV var).
* ``bootstrap.dcs.postgresql.use_pg_rewind`` set to ``True`` for a sample config only.
* ``postgresql.pg_hba`` defaults or the lines gathered from the source instance's ``hba_file``.
* ``postgresql.pg_ident`` the lines gathered from the source instance's ``ident_file``.
:param output_file: Full path to the configuration file to be used. If not provided, result is sent to ``stdout``.
:param sample: optional flag. If set, no source instance is used and a config with sane defaults is generated.
:param dsn: Optional DSN string for the local instance to get GUC values from.
"""
try:
if sample:
config_generator = SampleConfigGenerator(output_file)
else:
config_generator = RunningClusterConfigGenerator(output_file, dsn)
config_generator.write_config()
except PatroniException as e:
sys.exit(str(e))
except Exception as e:
sys.exit(f'Unexpected exception: {e}')
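# A hedged usage sketch: generate a Patroni config from a running local
# instance. The output path and DSN are placeholders, and the import path
# is an assumption about where this module lives in the source tree.
from patroni.config_generators import generate_config

generate_config('/tmp/patroni.yml', sample=False,
                dsn='host=127.0.0.1 port=5432 user=postgres')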

View File

@@ -36,7 +36,7 @@ from collections import defaultdict
from contextlib import contextmanager
from prettytable import ALL, FRAME, PrettyTable
from urllib.parse import urlparse
from typing import Any, Dict, Generator, Iterator, List, Optional, Union, Tuple, TYPE_CHECKING
from typing import Any, Dict, Iterator, List, Optional, Union, Tuple, TYPE_CHECKING
if TYPE_CHECKING: # pragma: no cover
from psycopg import Cursor
from psycopg2 import cursor
@@ -1816,7 +1816,7 @@ def resume(obj: Dict[str, Any], cluster_name: str, group: Optional[int], wait: b
@contextmanager
def temporary_file(contents: bytes, suffix: str = '', prefix: str = 'tmp') -> Generator[str, None, None]:
def temporary_file(contents: bytes, suffix: str = '', prefix: str = 'tmp') -> Iterator[str]:
"""Create a temporary file with specified contents that persists for the context.
:param contents: binary string that will be written to the file.
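# A minimal sketch of such a context manager (an assumption, not necessarily
# the exact implementation): the generator yields exactly once, which is why
# the weaker Iterator[str] annotation in the hunk above is sufficient.
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator

@contextmanager
def temporary_file(contents: bytes, suffix: str = '', prefix: str = 'tmp') -> Iterator[str]:
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix)
    try:
        os.write(fd, contents)
        os.close(fd)
        yield path  # the file persists for the duration of the with-block
    finally:
        os.unlink(path)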

File diff suppressed because it is too large

View File

@@ -643,12 +643,8 @@ class Consul(AbstractDCS):
return self._client.kv.put(self.history_path, value)
@catch_consul_errors
def _delete_leader(self) -> bool:
cluster = self.cluster
if cluster and isinstance(cluster.leader, Leader) and\
cluster.leader.name == self._name and isinstance(cluster.leader.version, int):
return self._client.kv.delete(self.leader_path, cas=cluster.leader.version)
return True
def _delete_leader(self, leader: Leader) -> bool:
return self._client.kv.delete(self.leader_path, cas=int(leader.version))
@catch_consul_errors
def set_sync_state_value(self, value: str, version: Optional[int] = None) -> Union[int, bool]:

View File

@@ -809,7 +809,7 @@ class Etcd(AbstractEtcd):
return bool(self.retry(self._client.write, self.initialize_path, sysid, prevExist=(not create_new)))
@catch_etcd_errors
def _delete_leader(self) -> bool:
def _delete_leader(self, leader: Leader) -> bool:
return bool(self._client.delete(self.leader_path, prevValue=self._name))
@catch_etcd_errors

View File

@@ -228,7 +228,7 @@ class Etcd3Client(AbstractEtcdClientWithFailover):
return self.http.urlopen
def _handle_server_response(self, response: urllib3.response.HTTPResponse) -> Dict[str, Any]:
data: Union[bytes, str] = response.data
data = response.data
try:
data = data.decode('utf-8')
ret: Dict[str, Any] = json.loads(data)
@@ -912,11 +912,10 @@ class Etcd3(AbstractEtcd):
return self.retry(self._client.put, self.initialize_path, sysid, create_revision='0' if create_new else None)
@catch_etcd_errors
def _delete_leader(self) -> bool:
cluster = self.cluster
if cluster and isinstance(cluster.leader, Leader) and cluster.leader.name == self._name:
return self._client.deleterange(self.leader_path, mod_revision=cluster.leader.version)
return True
def _delete_leader(self, leader: Leader) -> bool:
fields = build_range_request(self.leader_path)
compare = {'key': fields['key'], 'target': 'VALUE', 'value': base64_encode(self._name)}
return bool(self._client.txn(compare, {'request_delete_range': fields}))
@catch_etcd_errors
def cancel_initialization(self) -> bool:

View File

@@ -134,6 +134,8 @@ class K8sConfig(object):
config: Dict[str, Any] = yaml.safe_load(f)
context = context or config['current-context']
if TYPE_CHECKING: # pragma: no cover
assert isinstance(context, str)
context_value = self._get_by_name(config, 'context', context)
if TYPE_CHECKING: # pragma: no cover
assert isinstance(context_value, dict)
@@ -1306,11 +1308,11 @@ class Kubernetes(AbstractDCS):
if cluster and cluster.config and cluster.config.version else None
return self.patch_or_create_config({self._INITIALIZE: sysid}, resource_version)
def _delete_leader(self) -> bool:
def _delete_leader(self, leader: Leader) -> bool:
"""Unused"""
raise NotImplementedError # pragma: no cover
def delete_leader(self, last_lsn: Optional[int] = None) -> bool:
def delete_leader(self, leader: Optional[Leader], last_lsn: Optional[int] = None) -> bool:
ret = False
kind = self._kinds.get(self.leader_path)
if kind and (kind.metadata.annotations or {}).get(self._LEADER) == self._name:

View File

@@ -446,7 +446,7 @@ class Raft(AbstractDCS):
def initialize(self, create_new: bool = True, sysid: str = '') -> bool:
return self._sync_obj.set(self.initialize_path, sysid, prevExist=(not create_new)) is not False
def _delete_leader(self) -> bool:
def _delete_leader(self, leader: Leader) -> bool:
return self._sync_obj.delete(self.leader_path, prevValue=self._name, timeout=1)
def cancel_initialization(self) -> bool:

View File

@@ -466,7 +466,7 @@ class ZooKeeper(AbstractDCS):
return False
return True
def _delete_leader(self) -> bool:
def _delete_leader(self, leader: Leader) -> bool:
self._client.restart()
return True

95
patroni/file_perm.py Normal file
View File

@@ -0,0 +1,95 @@
"""Helper object that helps with figuring out file and directory permissions based on permissions of PGDATA.
:var logger: logger of this module.
:var pg_perm: instance of the :class:`__FilePermissions` object.
"""
import logging
import os
import stat
logger = logging.getLogger(__name__)
class __FilePermissions:
"""Helper class for managing permissions of directories and files under PGDATA.
Execute :meth:`set_permissions_from_data_directory` to figure out which permissions should be used for files and
directories under PGDATA based on permissions of PGDATA root directory.
"""
# Mode mask for data directory permissions that only allows the owner to
# read/write directories and files -- mask 077.
__PG_MODE_MASK_OWNER = stat.S_IRWXG | stat.S_IRWXO
# Mode mask for data directory permissions that also allows group read/execute -- mask 027.
__PG_MODE_MASK_GROUP = stat.S_IWGRP | stat.S_IRWXO
# Default mode for creating directories -- mode 700.
__PG_DIR_MODE_OWNER = stat.S_IRWXU
# Mode for creating directories that allows group read/execute -- mode 750.
__PG_DIR_MODE_GROUP = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
# Default mode for creating files -- mode 600.
__PG_FILE_MODE_OWNER = stat.S_IRUSR | stat.S_IWUSR
# Mode for creating files that allows group read -- mode 640.
__PG_FILE_MODE_GROUP = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP
def __init__(self) -> None:
"""Create a :class:`__FilePermissions` object and set default permissions."""
self.__set_owner_permissions()
self.__set_umask()
def __set_umask(self) -> None:
"""Set umask value based on calculations.
.. note::
Should only be called once either :meth:`__set_owner_permissions`
or :meth:`__set_group_permissions` has been executed.
"""
try:
os.umask(self.__pg_mode_mask)
except Exception as e:
logger.error('Can not set umask to %03o: %r', self.__pg_mode_mask, e)
def __set_owner_permissions(self) -> None:
"""Make directories/files accessible only by the owner."""
self.__pg_dir_create_mode = self.__PG_DIR_MODE_OWNER
self.__pg_file_create_mode = self.__PG_FILE_MODE_OWNER
self.__pg_mode_mask = self.__PG_MODE_MASK_OWNER
def __set_group_permissions(self) -> None:
"""Make directories/files accessible by the owner and readable by group."""
self.__pg_dir_create_mode = self.__PG_DIR_MODE_GROUP
self.__pg_file_create_mode = self.__PG_FILE_MODE_GROUP
self.__pg_mode_mask = self.__PG_MODE_MASK_GROUP
def set_permissions_from_data_directory(self, data_dir: str) -> None:
"""Set new permissions based on provided *data_dir*.
:param data_dir: reference to PGDATA to calculate permissions from.
"""
try:
st = os.stat(data_dir)
if (st.st_mode & self.__PG_DIR_MODE_GROUP) == self.__PG_DIR_MODE_GROUP:
self.__set_group_permissions()
else:
self.__set_owner_permissions()
except Exception as e:
logger.error('Can not check permissions on %s: %r', data_dir, e)
else:
self.__set_umask()
@property
def dir_create_mode(self) -> int:
"""Directory permissions."""
return self.__pg_dir_create_mode
@property
def file_create_mode(self) -> int:
"""File permissions."""
return self.__pg_file_create_mode
pg_perm = __FilePermissions()
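# A hedged usage sketch mirroring how the config handler applies these modes
# (the PGDATA path is a placeholder). For reference, the masks above are
# octal 077/027 and the create modes 700/750 (dirs) and 600/640 (files).
import os
from patroni.file_perm import pg_perm

data_dir = '/var/lib/postgresql/data'  # placeholder PGDATA path
pg_perm.set_permissions_from_data_directory(data_dir)  # picks owner or group modes, sets umask
os.chmod(os.path.join(data_dir, 'postgresql.auto.conf'), pg_perm.file_create_mode)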

View File

@@ -83,11 +83,7 @@ class Failsafe(object):
def __init__(self, dcs: AbstractDCS) -> None:
self._lock = RLock()
self._dcs = dcs
self._last_update = 0
self._name = None
self._conn_url = None
self._api_url = None
self._slots = None
self._reset_state()
def update(self, data: Dict[str, Any]) -> None:
with self._lock:
@@ -97,13 +93,20 @@ class Failsafe(object):
self._api_url = data['api_url']
self._slots = data.get('slots')
def _reset_state(self) -> None:
self._last_update = 0
self._name = None
self._conn_url = None
self._api_url = None
self._slots = None
@property
def leader(self) -> Optional[Leader]:
with self._lock:
if self._last_update + self._dcs.ttl > time.time() and self._name:
return Leader('', '', RemoteMember.from_name_and_data(self._name, {'api_url': self._api_url,
'conn_url': self._conn_url,
'slots': self._slots}))
return Leader('', '', RemoteMember(self._name, {'api_url': self._api_url,
'conn_url': self._conn_url,
'slots': self._slots}))
def update_cluster(self, cluster: Cluster) -> Cluster:
# Enrich the cluster with the real leader if there was a ping from it
@@ -130,6 +133,8 @@ class Failsafe(object):
def set_is_active(self, value: float) -> None:
with self._lock:
self._last_update = value
if not value:
self._reset_state()
class Ha(object):
@@ -142,8 +147,8 @@ class Ha(object):
self.cluster = Cluster.empty()
self.global_config = self.patroni.config.get_global_config(None)
self.old_cluster = Cluster.empty()
self._is_leader = False
self._is_leader_lock = RLock()
self._leader_expiry = 0
self._leader_expiry_lock = RLock()
self._failsafe = Failsafe(patroni.dcs)
self._was_paused = False
self._leader_timeline = None
@@ -188,12 +193,27 @@ class Ha(object):
return self.global_config.is_standby_cluster
def is_leader(self) -> bool:
with self._is_leader_lock:
return self._is_leader > time.time()
""":returns: `True` if the current node is the leader, based on expiration set when it last held the key."""
with self._leader_expiry_lock:
return self._leader_expiry > time.time()
def set_is_leader(self, value: bool) -> None:
with self._is_leader_lock:
self._is_leader = time.time() + self.dcs.ttl if value else 0
"""Update the current node's view of it's own leadership status.
Will update the expiry timestamp to match the dcs ttl if setting leadership to true,
otherwise will set the expiry to the past to immediately invalidate.
:param value: is the current node the leader.
"""
with self._leader_expiry_lock:
self._leader_expiry = time.time() + self.dcs.ttl if value else 0
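# A worked illustration of the expiry arithmetic above (ttl in seconds):
import time

ttl = 30
leader_expiry = time.time() + ttl    # set_is_leader(True)
assert leader_expiry > time.time()   # is_leader() while the lease is fresh
leader_expiry = 0                    # set_is_leader(False): expiry in the past
assert not leader_expiry > time.time()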
def sync_mode_is_active(self) -> bool:
"""Check whether synchronous replication is requested and already active.
:returns: ``True`` if the primary already put its name into the ``/sync`` key in DCS.
"""
return self.is_synchronous_mode() and not self.cluster.sync.is_empty
def load_cluster_from_dcs(self) -> None:
cluster = self.dcs.get_cluster()
@@ -442,16 +462,23 @@ class Ha(object):
"""Handle the case when postgres isn't running.
Depending on the state of Patroni, DCS cluster view, and pg_controldata the following could happen:
- if ``primary_start_timeout`` is 0 and this node owns the leader lock, the lock
will be voluntarily released if there are healthy replicas to take it over.
- if postgres was running as a ``primary`` and this node owns the leader lock, postgres is started as primary.
- crash recover in a single-user mode is executed in the following cases:
- postgres was running as ``primary`` wasn't ``shut down`` cleanly and there is no leader in DCS
- postgres was running as ``replica`` wasn't ``shut down in recovery`` (cleanly)
and we need to run ``pg_rewind`` to join back to the cluster.
- ``pg_rewind`` is executed if it is necessary, or optinally, the data directory could
be removed if it is allowed by configuration.
- after ``crash recovery`` and/or ``pg_rewind`` are executed, postgres is started in recovery.
- if ``primary_start_timeout`` is 0 and this node owns the leader lock, the lock
will be voluntarily released if there are healthy replicas to take it over.
- if postgres was running as a ``primary`` and this node owns the leader lock, postgres is started as primary.
- crash recovery in single-user mode is executed in the following cases:
- postgres was running as ``primary`` and wasn't ``shut down`` cleanly and there is no leader in DCS
- postgres was running as ``replica`` and wasn't ``shut down in recovery`` (cleanly)
and we need to run ``pg_rewind`` to join back to the cluster.
- ``pg_rewind`` is executed if it is necessary, or optionally, the data directory could
be removed if it is allowed by configuration.
- after ``crash recovery`` and/or ``pg_rewind`` are executed, postgres is started in recovery.
:returns: action message, describing what was performed.
"""
@@ -460,7 +487,7 @@ class Ha(object):
if timeout == 0:
# We are requested to prefer failing over to restarting primary. But see first if there
# is anyone to fail over to.
if self.is_failover_possible(self.cluster.members):
if self.is_failover_possible():
self.watchdog.disable()
logger.info("Primary crashed. Failing over.")
self.demote('immediate')
@@ -560,7 +587,7 @@ class Ha(object):
if refresh:
self.load_cluster_from_dcs()
is_leader = self.state_handler.is_leader()
is_leader = self.state_handler.is_primary()
node_to_follow = self._get_node_to_follow(self.cluster)
@@ -629,11 +656,20 @@ class Ha(object):
promoting standbys that were guaranteed to be replicating synchronously.
"""
if self.is_synchronous_mode():
current = CaseInsensitiveSet(self.cluster.sync.members)
sync = self.cluster.sync
if sync.is_empty:
# corner case: we need to explicitly enable synchronous mode by updating the
# ``/sync`` key with the current leader name and empty members. Otherwise it
# would never be automatically enabled if there are no eligible candidates.
sync = self.dcs.write_sync_state(self.state_handler.name, None, version=sync.version)
if not sync:
return logger.warning("Updating sync state failed")
logger.info("Enabled synchronous replication")
current = CaseInsensitiveSet(sync.members)
picked, allow_promote = self.state_handler.sync_handler.current_state(self.cluster)
if picked != current:
sync = self.cluster.sync
# update synchronous standby list in dcs temporarily to point to common nodes in current and picked
sync_common = current & allow_promote
if sync_common != current:
@@ -739,7 +775,7 @@ class Ha(object):
"""
if not self.is_paused():
if not self.watchdog.is_running and not self.watchdog.activate():
if self.state_handler.is_leader():
if self.state_handler.is_primary():
self.demote('immediate')
return 'Demoting self because watchdog could not be activated'
else:
@@ -755,7 +791,7 @@ class Ha(object):
self._async_response.reset()
return 'Promotion cancelled because the pre-promote script failed'
if self.state_handler.is_leader():
if self.state_handler.is_primary():
# Inform the state handler about its primary role.
# It may be unaware of it if postgres is promoted manually.
self.state_handler.set_role('master')
@@ -777,6 +813,9 @@ class Ha(object):
self.state_handler.sync_handler.set_synchronous_standby_names(
CaseInsensitiveSet('*') if self.global_config.is_synchronous_mode_strict else CaseInsensitiveSet())
if self.state_handler.role not in ('master', 'promoted', 'primary'):
# reset failsafe state when promote
self._failsafe.set_is_active(0)
def before_promote():
self.notify_citus_coordinator('before_promote')
@@ -802,6 +841,8 @@ class Ha(object):
return _MemberStatus.unknown(member)
def fetch_nodes_statuses(self, members: List[Member]) -> List[_MemberStatus]:
if not members:
return []
pool = ThreadPool(len(members))
results = pool.map(self.fetch_node_status, members) # Run API calls on members in parallel
pool.close()
@@ -839,7 +880,7 @@ class Ha(object):
data['slots'] = self.state_handler.slots()
except Exception:
logger.exception('Exception when called state_handler.slots()')
members = [RemoteMember.from_name_and_data(name, {'api_url': url})
members = [RemoteMember(name, {'api_url': url})
for name, url in failsafe.items() if name != self.state_handler.name]
if not members:  # A single node cluster
return True
@@ -854,6 +895,7 @@ class Ha(object):
"""Returns if instance with an wal should consider itself unhealthy to be promoted due to replication lag.
:param wal_position: Current wal position.
:returns True when node is lagging
"""
lag = (self.cluster.last_lsn or 0) - wal_position
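# A worked example of the lag arithmetic above, assuming the default
# maximum_lag_on_failover of 1048576 bytes (1 MiB):
last_lsn, wal_position = 0x4000000, 0x3000000  # cluster at 0/4000000, node at 0/3000000
lag = last_lsn - wal_position
assert lag == 16 * 1024 * 1024  # 16 MiB behind, so the node considers itself lagging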
@@ -880,51 +922,50 @@ class Ha(object):
# Prepare list of nodes to run check against
members = [m for m in members if m.name != self.state_handler.name and not m.nofailover and m.api_url]
if members:
for st in self.fetch_nodes_statuses(members):
if st.failover_limitation() is None:
if st.in_recovery is False:
logger.warning('Primary (%s) is still alive', st.member.name)
for st in self.fetch_nodes_statuses(members):
if st.failover_limitation() is None:
if st.in_recovery is False:
logger.warning('Primary (%s) is still alive', st.member.name)
return False
if my_wal_position < st.wal_position:
logger.info('Wal position of %s is ahead of my wal position', st.member.name)
# In synchronous mode the former leader might be still accessible and even be ahead of us.
# We should not disqualify ourselves from the leader race in such a situation.
if not self.sync_mode_is_active() or not self.cluster.sync.leader_matches(st.member.name):
return False
if my_wal_position < st.wal_position:
logger.info('Wal position of %s is ahead of my wal position', st.member.name)
# In synchronous mode the former leader might be still accessible and even be ahead of us.
# We should not disqualify ourselves from the leader race in such a situation.
if not self.is_synchronous_mode() or self.cluster.sync.is_empty\
or not self.cluster.sync.leader_matches(st.member.name):
return False
logger.info('Ignoring the former leader being ahead of us')
logger.info('Ignoring the former leader being ahead of us')
return True
def is_failover_possible(self, members: List[Member], check_synchronous: Optional[bool] = True,
cluster_lsn: Optional[int] = 0) -> bool:
"""Checks whether one of the members from the list can possibly win the leader race.
def is_failover_possible(self, *, cluster_lsn: int = 0, exclude_failover_candidate: bool = False) -> bool:
"""Checks whether any of the cluster members is allowed to promote and is healthy enough for that.
:param members: list of members to check
:param check_synchronous: consider only members that are known to be listed in /sync key when sync replication.
:param cluster_lsn: to calculate replication lag and exclude member if it is lagging
:returns: `True` if there are members eligible to be the new leader
:param cluster_lsn: to calculate replication lag and exclude member if it is lagging.
:param exclude_failover_candidate: if ``True``, exclude :attr:`failover.candidate` from the members
list against which the failover possibility checks are run.
:returns: `True` if there are members eligible to become the new leader.
"""
candidates = self.get_failover_candidates(exclude_failover_candidate)
if self.is_synchronous_mode() and self.cluster.failover and self.cluster.failover.candidate and not candidates:
logger.warning('Failover candidate=%s does not match with sync_standbys=%s',
self.cluster.failover.candidate, self.cluster.sync.sync_standby)
elif not candidates:
logger.warning('manual failover: candidates list is empty')
ret = False
cluster_timeline = self.cluster.timeline
members = [m for m in members if m.name != self.state_handler.name and not m.nofailover and m.api_url]
if check_synchronous and self.is_synchronous_mode() and not self.cluster.sync.is_empty:
members = [m for m in members if self.cluster.sync.matches(m.name)]
if members:
for st in self.fetch_nodes_statuses(members):
not_allowed_reason = st.failover_limitation()
if not_allowed_reason:
logger.info('Member %s is %s', st.member.name, not_allowed_reason)
elif cluster_lsn and st.wal_position < cluster_lsn or\
not cluster_lsn and self.is_lagging(st.wal_position):
logger.info('Member %s exceeds maximum replication lag', st.member.name)
elif self.check_timeline() and (not st.timeline or st.timeline < cluster_timeline):
logger.info('Timeline %s of member %s is behind the cluster timeline %s',
st.timeline, st.member.name, cluster_timeline)
else:
ret = True
else:
logger.warning('manual failover: members list is empty')
for st in self.fetch_nodes_statuses(candidates):
not_allowed_reason = st.failover_limitation()
if not_allowed_reason:
logger.info('Member %s is %s', st.member.name, not_allowed_reason)
elif cluster_lsn and st.wal_position < cluster_lsn or \
not cluster_lsn and self.is_lagging(st.wal_position):
logger.info('Member %s exceeds maximum replication lag', st.member.name)
elif self.check_timeline() and (not st.timeline or st.timeline < cluster_timeline):
logger.info('Timeline %s of member %s is behind the cluster timeline %s',
st.timeline, st.member.name, cluster_timeline)
else:
ret = True
return ret
def manual_failover_process_no_leader(self) -> Optional[bool]:
@@ -932,7 +973,7 @@ class Ha(object):
:returns: - `True` if the current node is the best candidate to become the new leader
- `None` if the current node is running as a primary and the requested candidate doesn't exist
"""
"""
failover = self.cluster.failover
if TYPE_CHECKING: # pragma: no cover
assert failover is not None
@@ -943,7 +984,7 @@ class Ha(object):
# Remove failover key if the node to failover has terminated to avoid waiting for it indefinitely
# In order to avoid attempts to delete this key from all nodes only the primary is allowed to do it.
if not self.cluster.get_member(failover.candidate, fallback_to_leader=False)\
and self.state_handler.is_leader():
and self.state_handler.is_primary():
logger.warning("manual failover: removing failover key because failover candidate is not running")
self.dcs.manual_failover('', '', version=failover.version)
return None
@@ -973,9 +1014,8 @@ class Ha(object):
# try to pick some other members to failover and check that they are healthy
if failover.leader:
if self.state_handler.name == failover.leader: # I was the leader
# exclude me and desired member which is unhealthy (failover.candidate can be None)
members = [m for m in self.cluster.members if m.name not in (failover.candidate, failover.leader)]
if self.is_failover_possible(members): # check that there are healthy members
# exclude desired member which is unhealthy if it was specified
if self.is_failover_possible(exclude_failover_candidate=bool(failover.candidate)):
return False
else: # I was the leader and it looks like currently I am the only healthy member
return True
@@ -990,6 +1030,7 @@ class Ha(object):
"""Performs a series of checks to determine that the current node is the best candidate.
If a manual failover/switchover is requested, this calls the :func:`manual_failover_process_no_leader` method.
:returns: `True` if the current node is among the best candidates to become the new leader.
"""
if time.time() - self._released_leader_key_timestamp < self.dcs.ttl:
@@ -1002,7 +1043,7 @@ class Ha(object):
if ret is not None: # continue if we just deleted the stale failover key as a leader
return ret
if self.state_handler.is_leader():
if self.state_handler.is_primary():
if self.is_paused():
# in pause leader is the healthiest only when no initialize or sysid matches with initialize!
return not self.cluster.initialize or self.state_handler.sysid == self.cluster.initialize
@@ -1028,8 +1069,8 @@ class Ha(object):
if self.cluster.failover:
# When doing a switchover in synchronous mode only synchronous nodes and former leader are allowed to race
if self.is_synchronous_mode() and self.cluster.failover.leader and \
not self.cluster.sync.is_empty and not self.cluster.sync.matches(self.state_handler.name, True):
if self.sync_mode_is_active() and not self.cluster.sync.matches(self.state_handler.name, True) and \
self.cluster.failover.leader:
return False
return self.manual_failover_process_no_leader() or False
@@ -1046,12 +1087,11 @@ class Ha(object):
if failsafe_members and self.state_handler.name not in failsafe_members:
return False
# Race among not only existing cluster members, but also all known members from the failsafe config
all_known_members += [RemoteMember.from_name_and_data(name, {'api_url': url})
for name, url in failsafe_members.items()]
all_known_members += [RemoteMember(name, {'api_url': url}) for name, url in failsafe_members.items()]
all_known_members += self.cluster.members
# When in sync mode, only last known primary and sync standby are allowed to promote automatically.
if self.is_synchronous_mode() and not self.cluster.sync.is_empty:
if self.sync_mode_is_active():
if not self.cluster.sync.matches(self.state_handler.name, True):
return False
# pick between synchronous candidates so we minimize unnecessary failovers/demotions
@@ -1064,7 +1104,7 @@ class Ha(object):
def _delete_leader(self, last_lsn: Optional[int] = None) -> None:
self.set_is_leader(False)
self.dcs.delete_leader(last_lsn)
self.dcs.delete_leader(self.cluster.leader, last_lsn)
self.dcs.reset_cluster()
def release_leader_key_voluntarily(self, last_lsn: Optional[int] = None) -> None:
@@ -1075,13 +1115,15 @@ class Ha(object):
def demote(self, mode: str) -> Optional[bool]:
"""Demote PostgreSQL running as primary.
:param mode: One of offline, graceful or immediate.
offline is used when connection to DCS is not available.
graceful is used when failing over to another node due to user request. May only be called running async.
immediate is used when we determine that we are not suitable for primary and want to failover quickly
without regard for data durability. May only be called synchronously.
immediate-nolock is used when find out that we have lost the lock to be primary. Need to bring down
PostgreSQL as quickly as possible without regard for data durability. May only be called synchronously.
:param mode: One of offline, graceful, immediate or immediate-nolock.
``offline`` is used when connection to DCS is not available.
``graceful`` is used when failing over to another node due to user request. May only be called
running async.
``immediate`` is used when we determine that we are not suitable for primary and want to failover
quickly without regard for data durability. May only be called synchronously.
``immediate-nolock`` is used when we find out that we have lost the lock to be primary. We need to bring
down PostgreSQL as quickly as possible without regard for data durability. May only be called
synchronously.
"""
mode_control = {
'offline': dict(stop='fast', checkpoint=False, release=False, offline=True, async_req=False), # noqa: E241,E501
@@ -1102,9 +1144,7 @@ class Ha(object):
# It could happen if Postgres is still archiving the backlog of WAL files.
# If we know that there are replicas that received the shutdown checkpoint
# location, we can remove the leader key and allow them to start leader race.
# for a manual failover/switchover with a candidate, we should check the requested candidate only
if self.is_failover_possible(self.get_failover_candidates(), cluster_lsn=checkpoint_location):
if self.is_failover_possible(cluster_lsn=checkpoint_location):
self.state_handler.set_role('demoted')
with self._async_executor:
self.release_leader_key_voluntarily(checkpoint_location)
@@ -1194,7 +1234,7 @@ class Ha(object):
:returns: action message if demote was initiated, None if no action was taken"""
failover = self.cluster.failover
if not failover or (self.is_paused() and not self.state_handler.is_leader()):
if not failover or (self.is_paused() and not self.state_handler.is_primary()):
return
if (failover.scheduled_at and not
@@ -1206,19 +1246,11 @@ class Ha(object):
if not failover.candidate or failover.candidate != self.state_handler.name:
if not failover.candidate and self.is_paused():
logger.warning('Failover is possible only to a specific candidate in a paused state')
elif self.is_failover_possible():
ret = self._async_executor.try_run_async('manual failover: demote', self.demote, ('graceful',))
return ret or 'manual failover: demoting myself'
else:
if self.is_synchronous_mode():
members = self.get_failover_candidates(check_sync=True)
if failover.candidate and not members:
logger.warning('Failover candidate=%s does not match with sync_standbys=%s',
failover.candidate, self.cluster.sync.sync_standby)
else:
members = self.get_failover_candidates()
if self.is_failover_possible(members, False): # check that there are healthy members
ret = self._async_executor.try_run_async('manual failover: demote', self.demote, ('graceful',))
return ret or 'manual failover: demoting myself'
else:
logger.warning('manual failover: no healthy members found, failover is not possible')
logger.warning('manual failover: no healthy members found, failover is not possible')
else:
logger.warning('manual failover: I am already the leader, no need to failover')
else:
@@ -1272,7 +1304,7 @@ class Ha(object):
def process_healthy_cluster(self) -> str:
if self.has_lock():
if self.is_paused() and not self.state_handler.is_leader():
if self.is_paused() and not self.state_handler.is_primary():
if self.cluster.failover and self.cluster.failover.candidate == self.state_handler.name:
return 'waiting to become primary after promote...'
@@ -1304,7 +1336,7 @@ class Ha(object):
else:
# Either there is no connection to DCS or someone else acquired the lock
logger.error('failed to update leader lock')
if self.state_handler.is_leader():
if self.state_handler.is_primary():
if self.is_paused():
return 'continue to run as primary after failing to update leader lock in DCS'
self.demote('immediate-nolock')
@@ -1473,13 +1505,11 @@ class Ha(object):
self._async_executor.run_async(self._do_reinitialize, args=(cluster, ))
def handle_long_action_in_progress(self) -> str:
"""
Figure out what to do with the task AsyncExecutor is performing.
"""
"""Figure out what to do with the task AsyncExecutor is performing."""
if self.has_lock() and self.update_lock():
if self._async_executor.scheduled_action == 'doing crash recovery in a single user mode':
time_left = self.global_config.primary_start_timeout - (time.time() - self._crash_recovery_started)
if time_left <= 0 and self.is_failover_possible(self.cluster.members):
if time_left <= 0 and self.is_failover_possible():
logger.info("Demoting self because crash recovery is taking too long")
self.state_handler.cancellable.cancel(True)
self.demote('immediate')
@@ -1543,7 +1573,7 @@ class Ha(object):
self.cancel_initialization()
if result is None:
if not self.state_handler.is_leader():
if not self.state_handler.is_primary():
return 'waiting for end of recovery after bootstrap'
self.state_handler.set_role('master')
@@ -1570,8 +1600,7 @@ class Ha(object):
return 'initialized a new cluster'
def handle_starting_instance(self) -> Optional[str]:
"""Starting up PostgreSQL may take a long time. In case we are the leader we may want to
fail over to."""
"""Starting up PostgreSQL may take a long time. In case we are the leader we may want to fail over to."""
# Check if we are in startup, when paused defer to main loop for manual failovers.
if not self.state_handler.check_for_startup() or self.is_paused():
@@ -1591,7 +1620,7 @@ class Ha(object):
time_left = timeout - self.state_handler.time_in_state()
if time_left <= 0:
if self.is_failover_possible(self.cluster.members):
if self.is_failover_possible():
logger.info("Demoting self because primary startup is taking too long")
self.demote('immediate')
return 'stopped PostgreSQL because of startup timeout'
@@ -1611,7 +1640,8 @@ class Ha(object):
def set_start_timeout(self, value: Optional[int]) -> None:
"""Sets timeout for starting as primary before eligible for failover.
Must be called when async_executor is busy or in the main thread."""
Must be called when async_executor is busy or in the main thread.
"""
self._start_timeout = value
def _run_cycle(self) -> str:
@@ -1727,7 +1757,7 @@ class Ha(object):
elif self.cluster.is_unlocked() and not self.is_paused():
# "bootstrap", but data directory is not empty
if not self.state_handler.cb_called and self.state_handler.is_running() \
and not self.state_handler.is_leader():
and not self.state_handler.is_primary():
self._join_aborted = True
logger.error('No initialize key in DCS and PostgreSQL is running as replica, aborting start')
logger.error('Please first start Patroni on the node running as primary')
@@ -1770,7 +1800,7 @@ class Ha(object):
create_slots = self._sync_replication_slots(False)
if not self.state_handler.cb_called:
if not is_promoting and not self.state_handler.is_leader():
if not is_promoting and not self.state_handler.is_primary():
self._rewind.trigger_check_diverged_lsn()
self.state_handler.call_nowait(CallbackAction.ON_START)
@@ -1795,7 +1825,7 @@ class Ha(object):
def _handle_dcs_error(self) -> str:
if not self.is_paused() and self.state_handler.is_running():
if self.state_handler.is_leader():
if self.state_handler.is_primary():
if self.is_failsafe_mode() and self.check_failsafe_topology():
self.set_is_leader(True)
self._failsafe.set_is_active(time.time())
@@ -1816,7 +1846,9 @@ class Ha(object):
"""Handles replication slots.
:param dcs_failed: bool, indicates that communication with DCS failed (get_cluster() or update_leader())
:returns: list[str], replication slots names that should be copied from the primary"""
:returns: list[str], replication slot names that should be copied from the primary
"""
slots: List[str] = []
@@ -1869,9 +1901,8 @@ class Ha(object):
# If we know that there are replicas that received the shutdown checkpoint
# location, we can remove the leader key and allow them to start leader race.
# for a manual failover/switchover with a candidate, we should check the requested candidate only
if self.is_failover_possible(self.get_failover_candidates(), cluster_lsn=checkpoint_location):
self.dcs.delete_leader(checkpoint_location)
if self.is_failover_possible(cluster_lsn=checkpoint_location):
self.dcs.delete_leader(self.cluster.leader, checkpoint_location)
status['deleted'] = True
else:
self.dcs.write_leader_optime(checkpoint_location)
@@ -1888,7 +1919,7 @@ class Ha(object):
if not self.state_handler.is_running():
if self.is_leader() and not status['deleted']:
checkpoint_location = self.state_handler.latest_checkpoint_location()
self.dcs.delete_leader(checkpoint_location)
self.dcs.delete_leader(self.cluster.leader, checkpoint_location)
self.touch_member()
else:
# XXX: what about when Patroni is started as the wrong user that has access to the watchdog device
@@ -1907,15 +1938,16 @@ class Ha(object):
return self.dcs.watch(leader_version, timeout)
def wakeup(self) -> None:
"""Call of this method will trigger the next run of HA loop if there is
no "active" leader watch request in progress.
"""Trigger the next run of HA loop if there is no "active" leader watch request in progress.
This usually happens on the leader or if the node is running an async action."""
self.dcs.event.set()
def get_remote_member(self, member: Union[Leader, Member, None] = None) -> RemoteMember:
""" In case of standby cluster this will tel us from which remote
member to stream. Config can be both patroni config or
cluster.config.data
"""Get remote member node to stream from.
In case of standby cluster this will tell us from which remote member to stream. Config can be both patroni
config or cluster.config.data.
"""
data: Dict[str, Any] = {}
cluster_params = self.global_config.get_standby_cluster_config()
@@ -1929,25 +1961,32 @@ class Ha(object):
data['conn_kwargs'] = conn_kwargs
name = member.name if member else 'remote_member:{}'.format(uuid.uuid1())
return RemoteMember.from_name_and_data(name, data)
return RemoteMember(name, data)
def get_failover_candidates(self, check_sync: bool = False) -> List[Member]:
"""Return list of candidates for either manual or automatic failover.
def get_failover_candidates(self, exclude_failover_candidate: bool) -> List[Member]:
"""Return a list of candidates for either manual or automatic failover.
Mainly used to later be passed to ``Ha.is_failover_possible()``.
Exclude non-sync members when in synchronous mode, the current node (its checks are always performed earlier)
and the candidate if required. If failover candidate exclusion is not requested and a candidate is specified
in the /failover key, return the candidate only.
The result is further evaluated in the caller :func:`Ha.is_failover_possible` to check if any member is actually
healthy enough and is allowed to promote.
:param check_sync: if ``True``, also check against the sync key members
:param exclude_failover_candidate: if ``True``, exclude :attr:`failover.candidate` from the candidates.
:returns: a list of ``Member`` objects or an empty list if there is no candidate available
:returns: a list of :class:`Member` objects or an empty list if there is no candidate available.
"""
failover = self.cluster.failover
if check_sync:
exclude = [self.state_handler.name] + ([failover.candidate] if failover and exclude_failover_candidate else [])
def is_eligible(node: Member) -> bool:
# TODO: allow manual failover (=no leader specified) to async node
# every sync_standby or the candidate specified if is in sync_standbys
return [m for m in self.cluster.members
if self.cluster.sync.matches(m.name)
and (not failover or not failover.candidate or m.name == failover.candidate)]
else:
# every member or the candidate specified
return [m for m in self.cluster.members
if not failover or not failover.candidate or m.name == failover.candidate]
if self.sync_mode_is_active() and not self.cluster.sync.matches(node.name):
return False
# Don't spend time checking "nofailover" nodes.
# We also don't need nodes in the list which we can't query via the API.
return node.name not in exclude and \
not node.nofailover and bool(node.api_url) and \
(not failover or not failover.candidate or node.name == failover.candidate)
return list(filter(is_eligible, self.cluster.members))

View File

@@ -21,17 +21,17 @@ _LOGGER = logging.getLogger(__name__)
def debug_exception(self: logging.Logger, msg: object, *args: Any, **kwargs: Any) -> None:
"""Add full stack trace info to debug log messages and partial to others.
Handle :func:`exception` calls for *self*.
Handle :func:`~self.exception` calls for *self*.
.. note::
* If *self* log level is set to ``DEBUG``, then issue a ``DEBUG`` message with the complete stack trace;
* If *self* log level is ``INFO`` or higher, then issue an ``ERROR`` message with only the last line of
the stack trace.
:param self: logger for which :func:`exception` will be processed.
:param self: logger for which :func:`~self.exception` will be processed.
:param msg: the message related to the exception to be logged.
:param args: positional arguments to be passed to :func:`self.debug` or :func:`loger_obj.error`.
:param kwargs: keyword arguments to be passed to :func:`self.debug` or :func:`loger_obj.error`.
:param args: positional arguments to be passed to :func:`~self.debug` or :func:`~self.error`.
:param kwargs: keyword arguments to be passed to :func:`~self.debug` or :func:`~self.error`.
"""
kwargs.pop("exc_info", False)
if self.isEnabledFor(logging.DEBUG):
@@ -44,16 +44,16 @@ def debug_exception(self: logging.Logger, msg: object, *args: Any, **kwargs: Any
def error_exception(self: logging.Logger, msg: object, *args: Any, **kwargs: Any) -> None:
"""Add full stack trace info to error messages.
Handle :func:`exception` calls for *self*.
Handle :func:`~self.exception` calls for *self*.
.. note::
* By default issue an ``ERROR`` message with the complete stack trace. If you do not want to show the complete
stack trace, call with ``exc_info=False``.
stack trace, call with ``exc_info=False``.
:param self: logger for which :func:`exception` will be processed.
:param self: logger for which :func:`~self.exception` will be processed.
:param msg: the message related to the exception to be logged.
:param args: positional arguments to be passed to :func:`loger_obj.error`.
:param kwargs: keyword arguments to be passed to :func:`loger_obj.error`.
:param args: positional arguments to be passed to :func:`~self.error`.
:param kwargs: keyword arguments to be passed to :func:`~self.error`.
"""
exc_info = kwargs.pop("exc_info", True)
self.error(msg, *args, exc_info=exc_info, **kwargs)
@@ -140,7 +140,7 @@ class ProxyHandler(logging.Handler):
def emit(self, record: logging.LogRecord) -> None:
"""Emit each log record that is handled.
Will push the log record down to :func:`handle` method of the currently configured log handler.
Will push the log record down to :func:`~logging.Handler.handle` method of the currently configured log handler.
:param record: the record that was emitted.
"""
@@ -203,7 +203,7 @@ class PatroniLogger(Thread):
self._root_logger.addHandler(self._proxy_handler)
def update_loggers(self) -> None:
"""Configure loggers' log level as defined in ``log.loggers` section of Patroni configuration.
"""Configure loggers' log level as defined in ``log.loggers`` section of Patroni configuration.
.. note::
It creates logger objects that are not defined yet in the log manager.
@@ -281,7 +281,8 @@ class PatroniLogger(Thread):
.. note::
It is used to remove different handlers that were configured previous to a reload in the configuration,
e.g. if we are switching from :class:`RotatingFileHandler` to class:`StreamHandler` and vice-versa.
e.g. if we are switching from :class:`~logging.handlers.RotatingFileHandler` to
:class:`~logging.StreamHandler` and vice-versa.
"""
while True:
with self.log_handler_lock:

View File

@@ -12,7 +12,7 @@ from datetime import datetime
from dateutil import tz
from psutil import TimeoutExpired
from threading import current_thread, Lock
from typing import Any, Callable, Dict, Generator, List, Optional, Union, Tuple, TYPE_CHECKING
from typing import Any, Callable, Dict, Iterator, List, Optional, Union, Tuple, TYPE_CHECKING
from .bootstrap import Bootstrap
from .callback_executor import CallbackAction, CallbackExecutor
@@ -57,8 +57,8 @@ class Postgresql(object):
TL_LSN = ("CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 "
"ELSE ('x' || pg_catalog.substr(pg_catalog.pg_{0}file_name("
"pg_catalog.pg_current_{0}_{1}()), 1, 8))::bit(32)::int END, " # primary timeline
"CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 "
"ELSE pg_catalog.pg_{0}_{1}_diff(pg_catalog.pg_current_{0}_{1}(), '0/0')::bigint END, " # write_lsn
"CASE WHEN pg_catalog.pg_is_in_recovery() THEN 0 ELSE "
"pg_catalog.pg_{0}_{1}_diff(pg_catalog.pg_current_{0}{2}_{1}(), '0/0')::bigint END, " # wal(_flush)?_lsn
"pg_catalog.pg_{0}_{1}_diff(pg_catalog.pg_last_{0}_replay_{1}(), '0/0')::bigint, "
"pg_catalog.pg_{0}_{1}_diff(COALESCE(pg_catalog.pg_last_{0}_receive_{1}(), '0/0'), '0/0')::bigint, "
"pg_catalog.pg_is_in_recovery() AND pg_catalog.pg_is_{0}_replay_paused()")
@@ -120,9 +120,9 @@ class Postgresql(object):
if self.is_running(): # we are "joining" already running postgres
self.set_state('running')
self.set_role('master' if self.is_leader() else 'replica')
self.set_role('master' if self.is_primary() else 'replica')
# postpone writing postgresql.conf for 12+ because recovery parameters are not yet known
if self.major_version < 120000 or self.is_leader():
if self.major_version < 120000 or self.is_primary():
self.config.write_postgresql_conf()
hba_saved = self.config.replace_pg_hba()
ident_saved = self.config.replace_pg_ident()
@@ -159,6 +159,11 @@ class Postgresql(object):
def wal_name(self) -> str:
return 'wal' if self._major_version >= 100000 else 'xlog'
@property
def wal_flush(self) -> str:
"""For PostgreSQL 9.6 onwards we want to use pg_current_wal_flush_lsn()/pg_current_xlog_flush_location()."""
return '_flush' if self._major_version >= 90600 else ''
@property
def lsn_name(self) -> str:
return 'lsn' if self._major_version >= 100000 else 'location'
@@ -173,6 +178,7 @@ class Postgresql(object):
"""Returns the monitoring query with a fixed number of fields.
The query text is constructed based on current state in DCS and PostgreSQL version:
1. function names depend on version. wal/lsn for v10+ and xlog/location for pre v10.
2. for primary we query timeline_id (extracted from pg_walfile_name()) and pg_current_wal_lsn()
3. for replicas we query pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), and pg_is_wal_replay_paused()
@@ -182,7 +188,8 @@ class Postgresql(object):
7. if sync replication is enabled we query pg_stat_replication and aggregate the result.
In addition to that we get current values of synchronous_commit and synchronous_standby_names GUCs.
If some conditions are not satisfied we simply put static values instead. E.g., NULL, 0, '', and so on."""
If some conditions are not satisfied we simply put static values instead. E.g., NULL, 0, '', and so on.
"""
extra = ", " + (("pg_catalog.current_setting('synchronous_commit'), "
"pg_catalog.current_setting('synchronous_standby_names'), "
@@ -211,7 +218,7 @@ class Postgresql(object):
else:
extra = "0, NULL, NULL, NULL, NULL, NULL, NULL" + extra
return ("SELECT " + self.TL_LSN + ", {2}").format(self.wal_name, self.lsn_name, extra)
return ("SELECT " + self.TL_LSN + ", {3}").format(self.wal_name, self.lsn_name, self.wal_flush, extra)
@property
def available_gucs(self) -> CaseInsensitiveSet:
@@ -474,14 +481,14 @@ class Postgresql(object):
""":returns: a result set of 'SELECT * FROM pg_stat_replication'."""
return self._cluster_info_state_get('pg_stat_replication') or []
def replication_state_from_parameters(self, is_leader: bool, receiver_state: Optional[str],
def replication_state_from_parameters(self, is_primary: bool, receiver_state: Optional[str],
restore_command: Optional[str]) -> Optional[str]:
"""Figure out the replication state from input parameters.
.. note::
This method may only be called when Postgres is up and running and queries are successfully executed.
:is_leader: `True` if postgres is not running in recovery
:is_primary: `True` if postgres is not running in recovery
:receiver_state: value from `pg_stat_get_wal_receiver.state` or None if Postgres is older than 9.6
:restore_command: value of ``restore_command`` GUC for PostgreSQL 12+ or
`postgresql.recovery_conf.restore_command` if it is set in Patroni configuration
@@ -490,7 +497,7 @@ class Postgresql(object):
- 'streaming' if replica is streaming according to the `pg_stat_wal_receiver` view;
- 'in archive recovery' if replica isn't streaming and there is a `restore_command`
"""
if self._major_version >= 90600 and not is_leader:
if self._major_version >= 90600 and not is_primary:
if receiver_state == 'streaming':
return 'streaming'
# For Postgres older than 12 we get `restore_command` from Patroni config, otherwise we check GUC
@@ -505,11 +512,11 @@ class Postgresql(object):
:returns: ``streaming``, ``in archive recovery``, or ``None``
"""
return self.replication_state_from_parameters(self.is_leader(),
return self.replication_state_from_parameters(self.is_primary(),
self._cluster_info_state_get('receiver_state'),
self._cluster_info_state_get('restore_command'))
def is_leader(self) -> bool:
def is_primary(self) -> bool:
try:
return bool(self._cluster_info_state_get('timeline'))
except PostgresConnectionException:
@@ -999,7 +1006,7 @@ class Postgresql(object):
@contextmanager
def get_replication_connection_cursor(self, host: Optional[str] = None, port: int = 5432,
**kwargs: Any) -> Generator[Union['cursor', 'Cursor[Any]'], None, None]:
**kwargs: Any) -> Iterator[Union['cursor', 'Cursor[Any]']]:
conn_kwargs = self.config.replication.copy()
conn_kwargs.update(host=host, port=int(port) if port else None, user=conn_kwargs.pop('username'),
connect_timeout=3, replication=1, options='-c statement_timeout=2000')
@@ -1157,9 +1164,9 @@ class Postgresql(object):
return ret
@staticmethod
def _wal_position(is_leader: bool, wal_position: int,
def _wal_position(is_primary: bool, wal_position: int,
received_location: Optional[int], replayed_location: Optional[int]) -> int:
return wal_position if is_leader else max(received_location or 0, replayed_location or 0)
return wal_position if is_primary else max(received_location or 0, replayed_location or 0)
def timeline_wal_position(self) -> Tuple[int, int, Optional[int]]:
# This method could be called from different threads (simultaneously with some other `_query` calls).
@@ -1195,7 +1202,7 @@ class Postgresql(object):
return None
def last_operation(self) -> int:
return self._wal_position(self.is_leader(), self._cluster_info_state_get('wal_position') or 0,
return self._wal_position(self.is_primary(), self._cluster_info_state_get('wal_position') or 0,
self.received_location(), self.replayed_location())
def configure_server_parameters(self) -> None:

View File

@@ -453,11 +453,13 @@ class CitusHandler(Thread):
"""Returns the tuple(i, task), where `i` - is the task index in the self._tasks list
Tasks are picked by following priorities:
1. If there is already a transaction in progress, pick a task
that will change the already affected worker primary.
2. If the coordinator address should be changed - pick a task
with group=0 (coordinators are always in group 0).
3. Pick a task that is the oldest (first from the self._tasks)"""
3. Pick a task that is the oldest (first from the self._tasks)
"""
with self._condition:
if self._in_flight:
@@ -710,7 +712,7 @@ class CitusHandler(Thread):
parameters['wal_level'] = 'logical'
def ignore_replication_slot(self, slot: Dict[str, str]) -> bool:
if isinstance(self._config, dict) and self._postgresql.is_leader() and\
if isinstance(self._config, dict) and self._postgresql.is_primary() and\
slot['type'] == 'logical' and slot['database'] == self._config['database']:
m = CITUS_SLOT_NAME_RE.match(slot['name'])
return bool(m and {'move': 'pgoutput', 'split': 'citus'}.get(m.group(1)) == slot['plugin'])

View File

@@ -6,14 +6,16 @@ import socket
import stat
import time
from contextlib import contextmanager
from urllib.parse import urlparse, parse_qsl, unquote
from types import TracebackType
from typing import Any, Collection, Dict, List, Optional, Union, Tuple, Type, TYPE_CHECKING
from typing import Any, Collection, Dict, Iterator, List, Optional, Union, Tuple, Type, TYPE_CHECKING
from .validator import recovery_parameters, transform_postgresql_parameter_value, transform_recovery_parameter_value
from ..collections import CaseInsensitiveDict, CaseInsensitiveSet
from ..dcs import Leader, Member, RemoteMember, slot_name_from_member_name
from ..exceptions import PatroniFatalException
from ..file_perm import pg_perm
from ..utils import compare_values, parse_bool, parse_int, split_host_port, uri, validate_directory, is_subpath
from ..validator import IntValidator, EnumValidator
@@ -367,6 +369,30 @@ class ConfigHandler(object):
configuration.append('pg_ident.conf')
return configuration
def set_file_permissions(self, filename: str) -> None:
"""Set permissions of file *filename* according to the expected permissions if it resides under PGDATA.
.. note::
Do nothing if the file is not under PGDATA.
:param filename: path to a file whose permissions might need to be adjusted.
"""
if is_subpath(self._postgresql.data_dir, filename):
pg_perm.set_permissions_from_data_directory(self._postgresql.data_dir)
os.chmod(filename, pg_perm.file_create_mode)
@contextmanager
def config_writer(self, filename: str) -> Iterator[ConfigWriter]:
"""Create :class:`ConfigWriter` object and set permissions on a *filename*.
:param filename: path to a config file.
:yields: :class:`ConfigWriter` object.
"""
with ConfigWriter(filename) as writer:
yield writer
self.set_file_permissions(filename)
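A self-contained sketch of the write-then-fix-permissions pattern that ``config_writer`` implements, with a plain :func:`open` and :func:`os.chmod` standing in for ``ConfigWriter`` and ``pg_perm`` (the path and mode are illustrative):

.. code-block:: python

    import os
    from contextlib import contextmanager
    from typing import Iterator, TextIO

    @contextmanager
    def config_writer(filename: str) -> Iterator[TextIO]:
        # Write through the file object, then align permissions once the
        # writer has flushed and closed (as set_file_permissions() does).
        with open(filename, 'w') as f:
            yield f
        os.chmod(filename, 0o600)  # stand-in for pg_perm.file_create_mode

    with config_writer('/tmp/override.conf') as f:
        f.write("max_connections = 100\n")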
def save_configuration_files(self, check_custom_bootstrap: bool = False) -> bool:
"""
copy postgresql.conf to postgresql.conf.backup to be able to retrieve configuration files
@@ -380,6 +406,7 @@ class ConfigHandler(object):
backup_file = os.path.join(self._postgresql.data_dir, f + '.backup')
if os.path.isfile(config_file):
shutil.copy(config_file, backup_file)
self.set_file_permissions(backup_file)
except IOError:
logger.exception('unable to create backup copies of configuration files')
return True
@@ -393,9 +420,11 @@ class ConfigHandler(object):
if not os.path.isfile(config_file):
if os.path.isfile(backup_file):
shutil.copy(backup_file, config_file)
self.set_file_permissions(config_file)
# Previously we didn't back up pg_ident.conf; if the file is missing just create an empty one
elif f == 'pg_ident.conf':
open(config_file, 'w').close()
self.set_file_permissions(config_file)
except IOError:
logger.exception('unable to restore configuration files from backup')
@@ -409,7 +438,7 @@ class ConfigHandler(object):
if self._postgresql.enforce_hot_standby_feedback:
configuration['hot_standby_feedback'] = 'on'
with ConfigWriter(self._postgresql_conf) as f:
with self.config_writer(self._postgresql_conf) as f:
include = self._config.get('custom_conf') or self._postgresql_base_conf_name
f.writeline("include '{0}'\n".format(ConfigWriter.escape(include)))
for name, value in sorted((configuration).items()):
@@ -439,6 +468,7 @@ class ConfigHandler(object):
if not self.hba_file and not self._config.get('pg_hba'):
with open(self._pg_hba_conf, 'a') as f:
f.write('\n{}\n'.format('\n'.join(config)))
self.set_file_permissions(self._pg_hba_conf)
return True
def replace_pg_hba(self) -> Optional[bool]:
@@ -458,14 +488,14 @@ class ConfigHandler(object):
self.local_replication_address['host'], self.local_replication_address['port'],
0, socket.SOCK_STREAM, socket.IPPROTO_TCP)})
with ConfigWriter(self._pg_hba_conf) as f:
with self.config_writer(self._pg_hba_conf) as f:
for address, t in addresses.items():
f.writeline((
'{0}\treplication\t{1}\t{3}\ttrust\n'
'{0}\tall\t{2}\t{3}\ttrust'
).format(t, self.replication['username'], self._superuser.get('username') or 'all', address))
elif not self.hba_file and self._config.get('pg_hba'):
with ConfigWriter(self._pg_hba_conf) as f:
with self.config_writer(self._pg_hba_conf) as f:
f.writelines(self._config['pg_hba'])
return True
@@ -478,7 +508,7 @@ class ConfigHandler(object):
"""
if not self.ident_file and self._config.get('pg_ident'):
with ConfigWriter(self._pg_ident_conf) as f:
with self.config_writer(self._pg_ident_conf) as f:
f.writelines(self._config['pg_ident'])
return True
@@ -800,9 +830,11 @@ class ConfigHandler(object):
if self._postgresql.major_version >= 120000:
if parse_bool(recovery_params.pop('standby_mode', None)):
open(self._standby_signal, 'w').close()
self.set_file_permissions(self._standby_signal)
else:
self._remove_file_if_exists(self._standby_signal)
open(self._recovery_signal, 'w').close()
self.set_file_permissions(self._recovery_signal)
def restart_required(name: str) -> bool:
if self._postgresql.major_version >= 140000:
@@ -813,8 +845,7 @@ class ConfigHandler(object):
self._current_recovery_params = CaseInsensitiveDict({n: [v, restart_required(n), self._postgresql_conf]
for n, v in recovery_params.items()})
else:
with ConfigWriter(self._recovery_conf) as f:
os.chmod(self._recovery_conf, stat.S_IWRITE | stat.S_IREAD)
with self.config_writer(self._recovery_conf) as f:
self._write_recovery_params(f, recovery_params)
def remove_recovery_conf(self) -> None:
@@ -843,6 +874,7 @@ class ConfigHandler(object):
if overwrite:
try:
with open(self._auto_conf, 'w') as f:
self.set_file_permissions(self._auto_conf)
for raw_line in lines:
f.write(raw_line)
except Exception:

View File

@@ -2,7 +2,7 @@ import logging
from contextlib import contextmanager
from threading import Lock
from typing import Any, Dict, Generator, Union, TYPE_CHECKING
from typing import Any, Dict, Iterator, Union, TYPE_CHECKING
if TYPE_CHECKING: # pragma: no cover
from psycopg import Connection as Connection3, Cursor
from psycopg2 import connection, cursor
@@ -44,7 +44,7 @@ class Connection(object):
@contextmanager
def get_connection_cursor(**kwargs: Any) -> Generator[Union['cursor', 'Cursor[Any]'], None, None]:
def get_connection_cursor(**kwargs: Any) -> Iterator[Union['cursor', 'Cursor[Any]']]:
conn = psycopg.connect(**kwargs)
with conn.cursor() as cur:
yield cur

View File

@@ -280,7 +280,7 @@ class Rewind(object):
"""After promote issue a CHECKPOINT from a new thread and asynchronously check the result.
If the CHECKPOINT failed, just check that the timeline in pg_control was updated."""
if self._state != REWIND_STATUS.CHECKPOINT and self._postgresql.is_leader():
if self._state != REWIND_STATUS.CHECKPOINT and self._postgresql.is_primary():
with self._checkpoint_task_lock:
if self._checkpoint_task:
with self._checkpoint_task:

View File

@@ -1,15 +1,20 @@
"""Replication slot handling.
Provides classes for the creation, monitoring, management and synchronisation of PostgreSQL replication slots.
"""
import logging
import os
import shutil
from collections import defaultdict
from contextlib import contextmanager
from threading import Condition, Thread
from typing import Any, Dict, Generator, List, Optional, Union, Tuple, TYPE_CHECKING
from typing import Any, Dict, Iterator, List, Optional, Union, Tuple, TYPE_CHECKING, Collection
from .connection import get_connection_cursor
from .misc import format_lsn, fsync_dir
from ..dcs import Cluster, Leader
from ..file_perm import pg_perm
from ..psycopg import OperationalError
if TYPE_CHECKING: # pragma: no cover
@@ -42,9 +47,17 @@ def compare_slots(s1: Dict[str, Any], s2: Dict[str, Any], dbid: str = 'database'
class SlotsAdvanceThread(Thread):
"""Daemon process :class:``Thread`` object for advancing logical replication slots on replicas.
This ensures that slot advancing queries sent to postgres do not block the main loop.
"""
def __init__(self, slots_handler: 'SlotsHandler') -> None:
super(SlotsAdvanceThread, self).__init__()
"""Create and start a new thread for handling slot advance queries.
:param slots_handler: The calling class instance for reference to slot information attributes.
"""
super().__init__()
self.daemon = True
self._slots_handler = slots_handler
@@ -58,6 +71,13 @@ class SlotsAdvanceThread(Thread):
self.start()
def sync_slot(self, cur: Union['cursor', 'Cursor[Any]'], database: str, slot: str, lsn: int) -> None:
"""Execute a ``pg_replication_slot_advance`` query and store success for scheduled synchronisation task.
:param cur: database connection cursor.
:param database: name of the database associated with the slot.
:param slot: name of the slot to be synchronised.
:param lsn: last known LSN position.
"""
failed = copy = False
try:
cur.execute("SELECT pg_catalog.pg_replication_slot_advance(%s, %s)", (slot, format_lsn(lsn)))
@@ -79,6 +99,11 @@ class SlotsAdvanceThread(Thread):
self._scheduled.pop(database)
def sync_slots_in_database(self, database: str, slots: List[str]) -> None:
"""Synchronise slots for a single database.
:param database: name of the database.
:param slots: list of slot names to synchronise.
"""
with self._slots_handler.get_local_connection_cursor(dbname=database, options='-c statement_timeout=0') as cur:
for slot in slots:
with self._condition:
@@ -87,6 +112,7 @@ class SlotsAdvanceThread(Thread):
self.sync_slot(cur, database, slot, lsn)
def sync_slots(self) -> None:
"""Synchronise slots for all scheduled databases."""
with self._condition:
databases = list(self._scheduled.keys())
for database in databases:
@@ -99,6 +125,12 @@ class SlotsAdvanceThread(Thread):
logger.error('Failed to advance replication slots in database %s: %r', database, e)
def run(self) -> None:
"""Thread main loop entrypoint.
.. note::
Thread will wait until a sync is scheduled from outside, normally triggered during the HA loop or a wakeup
call.
"""
while True:
with self._condition:
if not self._scheduled:
@@ -107,6 +139,14 @@ class SlotsAdvanceThread(Thread):
self.sync_slots()
def schedule(self, advance_slots: Dict[str, Dict[str, int]]) -> Tuple[bool, List[str]]:
"""Trigger a synchronisation of slots.
This is the main entrypoint for the Patroni HA loop wakeup call.
:param advance_slots: dictionary containing slots that need to be advanced.
:return: tuple of failure status and a list of slots to be copied.
"""
with self._condition:
for database, values in advance_slots.items():
self._scheduled[database].update(values)
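The ``advance_slots`` argument groups slots by database and maps each slot name to a target LSN; a sketch of what a caller might pass (names and values are illustrative):

.. code-block:: python

    # {database: {slot_name: target_lsn}}, as consumed by the loop above.
    advance_slots = {
        'appdb': {'cdc_slot': 123456789},
        'reporting': {'bi_slot': 987654321},
    }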
@@ -118,15 +158,27 @@ class SlotsAdvanceThread(Thread):
return ret
def on_promote(self) -> None:
"""Reset state of the daemon."""
with self._condition:
self._scheduled.clear()
self._failed = False
self._copy_slots = []
class SlotsHandler(object):
class SlotsHandler:
"""Handler for managing and storing information on replication slots in PostgreSQL.
:ivar pg_replslot_dir: system location path of the PostgreSQL replication slots.
:ivar _logical_slots_processing_queue: yet to be processed logical replication slots on the primary
"""
def __init__(self, postgresql: 'Postgresql') -> None:
"""Create an instance with storage attributes for replication slots and schedule the first synchronisation.
:param postgresql: Calling class instance providing interface to PostgreSQL.
"""
self._force_readiness_check = False
self._schedule_load_slots = False
self._postgresql = postgresql
self._advance = None
self._replication_slots: Dict[str, Dict[str, Any]] = {} # already existing replication slots
@@ -135,23 +187,46 @@ class SlotsHandler(object):
self.schedule()
def _query(self, sql: str, *params: Any) -> Union['cursor', 'Cursor[Any]']:
"""Helper method for :meth:`Postgresql.query`.
:param sql: SQL statement to execute.
:param params: parameters to pass through to :meth:`Postgresql.query`.
:returns: query response.
"""
return self._postgresql.query(sql, *params, retry=False)
@staticmethod
def _copy_items(src: Dict[str, Any], dst: Dict[str, Any], keys: Optional[List[str]] = None) -> None:
def _copy_items(src: Dict[str, Any], dst: Dict[str, Any], keys: Optional[Collection[str]] = None) -> None:
"""Select values from *src* dictionary to update in *dst* dictionary for optional supplied *keys*.
:param src: source dictionary that *keys* will be looked up from.
:param dst: destination dictionary to be updated.
:param keys: optional list of keys to be looked up in the source dictionary.
"""
dst.update({key: src[key] for key in keys or ('datoid', 'catalog_xmin', 'confirmed_flush_lsn')})
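A worked example of ``_copy_items``; with *keys* omitted the default triple is copied (a standalone restatement of the one-liner above):

.. code-block:: python

    from typing import Any, Collection, Dict, Optional

    def _copy_items(src: Dict[str, Any], dst: Dict[str, Any],
                    keys: Optional[Collection[str]] = None) -> None:
        # Same logic as the static method above.
        dst.update({key: src[key] for key in keys or ('datoid', 'catalog_xmin', 'confirmed_flush_lsn')})

    src = {'datoid': 16384, 'catalog_xmin': 755, 'confirmed_flush_lsn': 12345, 'plugin': 'pgoutput'}
    dst = {'type': 'logical'}
    _copy_items(src, dst)
    assert dst == {'type': 'logical', 'datoid': 16384, 'catalog_xmin': 755, 'confirmed_flush_lsn': 12345}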
def process_permanent_slots(self, slots: List[Dict[str, Any]]) -> Dict[str, int]:
"""This methods solves three problems at once (I know, it is weird).
"""Process replication slot information from the host and prepare information used in subsequent cluster tasks.
.. note::
This method solves three problems.
The ``cluster_info_query`` from :class:`Postgresql` is executed every HA loop and returns information
about all replication slots that exist on the current host.
Based on this information perform the following actions:
1. For the primary we want to expose to DCS permanent logical slots, therefore build (and return) a dict
that maps permanent logical slot names to ``confirmed_flush_lsn``.
2. Detect if one of the previously known permanent slots is missing and schedule a resync.
3. Update the local cache with the fresh ``catalog_xmin`` and ``confirmed_flush_lsn`` for every known slot.
The cluster_info_query from `Postgresql` is executed every HA loop and returns
information about all replication slots that exists on the current host.
Based on this information we perform the following actions:
1. For the primary we want to expose to DCS permanent logical slots, therefore the method
builds (and returns) a dict, that maps permanent logical slot names and confirmed_flush_lsns.
2. This method also detects if one of the previously known permanent slots got missing and schedules resync.
3. Updates the local cache with the fresh catalog_xmin and confirmed_flush_lsn for every known slot.
This info is used when performing the check of logical slot readiness on standbys.
:param slots: replication slot information that exists on the current host.
:return: dictionary of logical slot names to ``confirmed_flush_lsn``.
"""
ret: Dict[str, int] = {}
@@ -173,13 +248,23 @@ class SlotsHandler(object):
return ret
def load_replication_slots(self) -> None:
"""Query replication slot information from the database and store it for processing by other tasks.
.. note::
Only supported from PostgreSQL version 9.4 onwards.
Store replication slot ``name``, ``type``, ``plugin``, ``database`` and ``datoid``.
If PostgreSQL version is 10 or newer also store ``catalog_xmin`` and ``confirmed_flush_lsn``.
When using logical slots, store information separately for slot synchronisation on replica nodes.
"""
if self._postgresql.major_version >= 90400 and self._schedule_load_slots:
replication_slots: Dict[str, Dict[str, Any]] = {}
extra = ", catalog_xmin, pg_catalog.pg_wal_lsn_diff(confirmed_flush_lsn, '0/0')::bigint"\
extra = ", catalog_xmin, pg_catalog.pg_wal_lsn_diff(confirmed_flush_lsn, '0/0')::bigint" \
if self._postgresql.major_version >= 100000 else ""
skip_temp_slots = ' WHERE NOT temporary' if self._postgresql.major_version >= 100000 else ''
cursor = self._query('SELECT slot_name, slot_type, plugin, database, datoid'
'{0} FROM pg_catalog.pg_replication_slots{1}'.format(extra, skip_temp_slots))
cursor = self._query(f'SELECT slot_name, slot_type, plugin, database, datoid'
f'{extra} FROM pg_catalog.pg_replication_slots{skip_temp_slots}')
for r in cursor:
value = {'type': r[1]}
if r[1] == 'logical':
@@ -195,16 +280,34 @@ class SlotsHandler(object):
self._force_readiness_check = False
def ignore_replication_slot(self, cluster: Cluster, name: str) -> bool:
"""Check if slot *name* should not be managed by Patroni.
:param cluster: cluster state information object.
:param name: name of the slot to ignore
:returns: ``True`` if slot *name* matches any slot specified in the ``ignore_slots`` configuration,
otherwise the result of :meth:`CitusHandler.ignore_replication_slot`.
"""
slot = self._replication_slots[name]
if cluster.config:
for matcher in cluster.config.ignore_slots_matchers:
if ((matcher.get("name") is None or matcher["name"] == name)
and all(not matcher.get(a) or matcher[a] == slot.get(a) for a in ('database', 'plugin', 'type'))):
if (
(matcher.get("name") is None or matcher["name"] == name)
and all(not matcher.get(a) or matcher[a] == slot.get(a)
for a in ('database', 'plugin', 'type'))
):
return True
return self._postgresql.citus_handler.ignore_replication_slot(slot)
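For illustration, a matcher entry such as might appear in an ``ignore_slots`` configuration, together with a slot the loop above would ignore (all values hypothetical):

.. code-block:: python

    # A matcher without "name" acts as a wildcard on the slot name;
    # unset matcher attributes are wildcards too.
    matcher = {'database': 'appdb', 'plugin': 'pgoutput', 'type': 'logical'}
    slot = {'type': 'logical', 'plugin': 'pgoutput', 'database': 'appdb', 'datoid': 16384}

    assert (matcher.get('name') is None
            and all(not matcher.get(a) or matcher[a] == slot.get(a)
                    for a in ('database', 'plugin', 'type')))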
def drop_replication_slot(self, name: str) -> Tuple[bool, bool]:
"""Returns a tuple(active, dropped)"""
"""Drop a named slot from Postgres.
:param name: name of the slot to be dropped.
:returns: a tuple of ``active`` and ``dropped``. ``active`` is ``True`` if the slot is active,
``dropped`` is ``True`` if the slot was successfully dropped. If the slot was not found return
``False`` for both.
"""
cursor = self._query(('WITH slots AS (SELECT slot_name, active'
' FROM pg_catalog.pg_replication_slots WHERE slot_name = %s),'
' dropped AS (SELECT pg_catalog.pg_drop_replication_slot(slot_name),'
@@ -217,7 +320,19 @@ class SlotsHandler(object):
return row
def _drop_incorrect_slots(self, cluster: Cluster, slots: Dict[str, Any], paused: bool) -> None:
# drop old replication slots which are not presented in desired slots
"""Compare required slots and configured as permanent slots with those found, dropping extraneous ones.
.. note::
Slots that are not contained in *slots* will be dropped.
Slots can be filtered out with ``ignore_slots`` configuration.
Slots that have matching names but do not match attributes in *slots* will also be dropped.
:param cluster: cluster state information object.
:param slots: dictionary of desired slot names as keys with slot attributes as a dictionary value, if known.
:param paused: ``True`` if the patroni cluster is currently in a paused state.
"""
# drop old replication slots which are not present in the desired slots.
for name in set(self._replication_slots) - set(slots):
if not paused and not self.ignore_replication_slot(cluster, name):
active, dropped = self.drop_replication_slot(name)
@@ -229,6 +344,8 @@ class SlotsHandler(object):
logger.debug("Unable to drop unknown replication slot '%s', slot is still active", name)
else:
logger.error("Failed to drop replication slot '%s'", name)
# drop slots with matching names but attributes that do not match, e.g. `plugin` or `database`.
for name, value in slots.items():
if name in self._replication_slots and not compare_slots(value, self._replication_slots[name]):
logger.info("Trying to drop replication slot '%s' because value is changing from %s to %s",
@@ -240,31 +357,57 @@ class SlotsHandler(object):
self._schedule_load_slots = True
def _ensure_physical_slots(self, slots: Dict[str, Any]) -> None:
"""Create any missing physical replication *slots*.
Any failures are logged but do not interrupt the creation of the remaining *slots*.
:param slots: A dictionary mapping slot name to slot attributes. This method only considers a slot
if the value is a dictionary with the key ``type`` and a value of ``physical``.
"""
immediately_reserve = ', true' if self._postgresql.major_version >= 90600 else ''
for name, value in slots.items():
if name not in self._replication_slots and value['type'] == 'physical':
try:
self._query(("SELECT pg_catalog.pg_create_physical_replication_slot(%s{0})"
" WHERE NOT EXISTS (SELECT 1 FROM pg_catalog.pg_replication_slots"
" WHERE slot_type = 'physical' AND slot_name = %s)").format(
immediately_reserve), name, name)
self._query(f"SELECT pg_catalog.pg_create_physical_replication_slot(%s{immediately_reserve})"
f" WHERE NOT EXISTS (SELECT 1 FROM pg_catalog.pg_replication_slots"
f" WHERE slot_type = 'physical' AND slot_name = %s)",
name, name)
except Exception:
logger.exception("Failed to create physical replication slot '%s'", name)
self._schedule_load_slots = True
@contextmanager
def get_local_connection_cursor(self, **kwargs: Any) -> Generator[Union['cursor', 'Cursor[Any]'], None, None]:
def get_local_connection_cursor(self, **kwargs: Any) -> Iterator[Union['cursor', 'Cursor[Any]']]:
"""Create a new database connection to local server.
Create a non-blocking connection cursor to avoid the situation where execution of a
``pg_replication_slot_advance`` query takes longer than the HA loop timeout, which could cause a false
failure state.
:param kwargs: Any keyword arguments to pass to :func:`psycopg.connect`.
:yields: connection cursor object, note implementation varies depending on version of :mod:`psycopg`.
"""
conn_kwargs = self._postgresql.config.local_connect_kwargs
conn_kwargs.update(kwargs)
with get_connection_cursor(**conn_kwargs) as cur:
yield cur
def _ensure_logical_slots_primary(self, slots: Dict[str, Any]) -> None:
"""Create any missing logical replication *slots* on the primary.
If the logical slot already exists, copy state information into the replication slots structure stored in the
class instance.
:param slots: a dictionary mapping slot names to their attributes. This method only considers a slot
if its value is a dictionary with the key ``type`` set to ``logical``.
"""
# Group logical slots to be created by database name
logical_slots: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)
for name, value in slots.items():
if value['type'] == 'logical':
# If the logical slot already exists, copy some information about it into the original structure
if self._replication_slots.get(name, {}).get('datoid'):
self._copy_items(self._replication_slots[name], value)
else:
@@ -286,27 +429,51 @@ class SlotsHandler(object):
self._schedule_load_slots = True
def schedule_advance_slots(self, slots: Dict[str, Dict[str, int]]) -> Tuple[bool, List[str]]:
"""Wrapper to ensure slots advance daemon thread is started if not already.
:param slots: dictionary containing slot information.
:return: tuple with the result of the scheduling of slot advancement: ``failed`` and list of slots to copy.
"""
if not self._advance:
self._advance = SlotsAdvanceThread(self)
return self._advance.schedule(slots)
def _ensure_logical_slots_replica(self, cluster: Cluster, slots: Dict[str, Any]) -> List[str]:
"""Update logical *slots* on replicas.
If the logical slot already exists, copy state information into the replication slots structure stored in the
class instance. Slots that exist are also advanced if their ``confirmed_flush_lsn`` is greater than the stored
state of the slot.
As logical slots can only be created when the primary is available, pass the list of slots that need to be
copied back to the caller. They will be created on replicas with :meth:`SlotsHandler.copy_logical_slots`.
:param cluster: object containing stateful information for the cluster.
:param slots: A dictionary mapping slot name to slot attributes. This method only considers a slot
if the value is a dictionary with the key ``type`` and a value of ``logical``.
:returns: list of slots to be copied from the primary.
"""
# Group logical slots to be advanced by database name
advance_slots: Dict[str, Dict[str, int]] = defaultdict(dict)
create_slots: List[str] = [] # And collect logical slots to be created on the replica
create_slots: List[str] = [] # Collect logical slots to be created on the replica
for name, value in slots.items():
if value['type'] == 'logical':
# If the logical already exists, copy some information about it into the original structure
if self._replication_slots.get(name, {}).get('datoid'):
self._copy_items(self._replication_slots[name], value)
if cluster.slots and name in cluster.slots:
try: # Skip slots that doesn't need to be advanced
if value['confirmed_flush_lsn'] < int(cluster.slots[name]):
advance_slots[value['database']][name] = int(cluster.slots[name])
except Exception as e:
logger.error('Failed to parse "%s": %r', cluster.slots[name], e)
elif cluster.slots and name in cluster.slots: # We want to copy only slots with feedback in a DCS
create_slots.append(name)
if value['type'] != 'logical':
continue
# If the logical slot already exists, copy some information about it into the original structure
if self._replication_slots.get(name, {}).get('datoid'):
self._copy_items(self._replication_slots[name], value)
if cluster.slots and name in cluster.slots:
try: # Skip slots that don't need to be advanced
if value['confirmed_flush_lsn'] < int(cluster.slots[name]):
advance_slots[value['database']][name] = int(cluster.slots[name])
except Exception as e:
logger.error('Failed to parse "%s": %r', cluster.slots[name], e)
elif cluster.slots and name in cluster.slots: # We want to copy only slots with feedback in a DCS
create_slots.append(name)
error, copy_slots = self.schedule_advance_slots(advance_slots)
if error:
@@ -315,6 +482,20 @@ class SlotsHandler(object):
def sync_replication_slots(self, cluster: Cluster, nofailover: bool,
replicatefrom: Optional[str] = None, paused: bool = False) -> List[str]:
"""During the HA loop read, check and alter replication slots found in the cluster.
Read physical and logical slots found on the primary, then compare to those configured in the DCS.
Drop any slots that do not match those required by configuration and are not configured as permanent.
Create any missing physical slots. If this node is the leader, also create logical slots; otherwise, if
logical slots are known and active, create them on replica nodes.
:param cluster: object containing stateful information for the cluster.
:param nofailover: ``True`` if this node has been tagged to not be a failover candidate.
:param replicatefrom: the tag containing the node to replicate from.
:param paused: ``True`` if the cluster is in maintenance mode.
:returns: list of logical replication slots names that should be copied from the primary.
"""
ret = []
if self._postgresql.major_version >= 90400 and cluster.config:
try:
@@ -327,7 +508,7 @@ class SlotsHandler(object):
self._ensure_physical_slots(slots)
if self._postgresql.is_leader():
if self._postgresql.is_primary():
self._logical_slots_processing_queue.clear()
self._ensure_logical_slots_primary(slots)
elif cluster.slots and slots:
@@ -342,7 +523,17 @@ class SlotsHandler(object):
return ret
@contextmanager
def _get_leader_connection_cursor(self, leader: Leader) -> Generator[Union['cursor', 'Cursor[Any]'], None, None]:
def _get_leader_connection_cursor(self, leader: Leader) -> Iterator[Union['cursor', 'Cursor[Any]']]:
"""Create a new database connection to the leader.
.. note::
Uses the rewind user credentials because that user has enough permissions to read files from PGDATA.
Sets the options ``connect_timeout`` to ``3`` and ``statement_timeout`` to ``2000``.
:param leader: object with information on the leader
:yields: connection cursor object, note implementation varies depending on version of ``psycopg``.
"""
conn_kwargs = leader.conn_kwargs(self._postgresql.config.rewind_credentials)
conn_kwargs['dbname'] = self._postgresql.database
with get_connection_cursor(connect_timeout=3, options="-c statement_timeout=2000", **conn_kwargs) as cur:
@@ -351,16 +542,16 @@ class SlotsHandler(object):
def check_logical_slots_readiness(self, cluster: Cluster, replicatefrom: Optional[str]) -> bool:
"""Determine whether all known logical slots are synchronised from the leader.
1) Retrieve the current ``catalog_xmin`` value for the physical slot from the cluster leader, and
2) using previously stored list of "unready" logical slots, those which have yet to be checked hence have no
stored slot attributes,
3) store logical slot ``catalog_xmin`` when the physical slot ``catalog_xmin`` becomes valid.
1) Retrieve the current ``catalog_xmin`` value for the physical slot from the cluster leader, and
2) using previously stored list of "unready" logical slots, those which have yet to be checked hence have no
stored slot attributes,
3) store logical slot ``catalog_xmin`` when the physical slot ``catalog_xmin`` becomes valid.
:param cluster: object containing stateful information for the cluster.
:param replicatefrom: name of the member that should be used to replicate from.
:param cluster: object containing stateful information for the cluster.
:param replicatefrom: name of the member that should be used to replicate from.
:returns: ``False`` if any issue while checking logical slots readiness, ``True`` otherwise.
"""
:returns: ``False`` if any issue is faced while checking logical slots readiness, ``True`` otherwise.
"""
catalog_xmin = None
if self._logical_slots_processing_queue and cluster.leader:
slot_name = cluster.get_my_slot_name_on_primary(self._postgresql.name, replicatefrom)
@@ -445,6 +636,11 @@ class SlotsHandler(object):
logger.info('Logical slot %s is safe to be used after a failover', name)
def copy_logical_slots(self, cluster: Cluster, create_slots: List[str]) -> None:
"""Create logical replication slots on standby nodes.
:param cluster: object containing stateful information for the cluster.
:param create_slots: list of slot names to copy from the primary.
"""
leader = cluster.leader
if not leader:
return
@@ -471,31 +667,48 @@ class SlotsHandler(object):
logger.error("Failed to copy logical slots from the %s via postgresql connection: %r", leader.name, e)
if copy_slots and self._postgresql.stop():
pg_perm.set_permissions_from_data_directory(self._postgresql.data_dir)
for name, value in copy_slots.items():
slot_dir = os.path.join(self._postgresql.slots_handler.pg_replslot_dir, name)
slot_dir = os.path.join(self.pg_replslot_dir, name)
slot_tmp_dir = slot_dir + '.tmp'
if os.path.exists(slot_tmp_dir):
shutil.rmtree(slot_tmp_dir)
os.makedirs(slot_tmp_dir)
os.chmod(slot_tmp_dir, pg_perm.dir_create_mode)
fsync_dir(slot_tmp_dir)
with open(os.path.join(slot_tmp_dir, 'state'), 'wb') as f:
slot_filename = os.path.join(slot_tmp_dir, 'state')
with open(slot_filename, 'wb') as f:
os.chmod(slot_filename, pg_perm.file_create_mode)
f.write(value['data'])
f.flush()
os.fsync(f.fileno())
if os.path.exists(slot_dir):
shutil.rmtree(slot_dir)
os.rename(slot_tmp_dir, slot_dir)
os.chmod(slot_dir, pg_perm.dir_create_mode)
fsync_dir(slot_dir)
self._logical_slots_processing_queue[name] = None
fsync_dir(self._postgresql.slots_handler.pg_replslot_dir)
fsync_dir(self.pg_replslot_dir)
self._postgresql.start()
def schedule(self, value: Optional[bool] = None) -> None:
"""Schedule the loading of slot information from the database.
:param value: the optional value can be used to unschedule if set to ``False`` or force it to be ``True``.
If it is omitted the value will be ``True`` if this PostgreSQL node supports slot replication.
"""
if value is None:
value = self._postgresql.major_version >= 90400
self._schedule_load_slots = self._force_readiness_check = value
def on_promote(self) -> None:
"""Entry point from HA cycle used when a standby node is to be promoted to primary.
.. note::
If logical replication slot synchronisation is enabled then slot advancement will be triggered.
If any logical slots that were copied are yet to be confirmed as ready a warning message will be logged.
"""
if self._advance:
self._advance.on_promote()

View File

@@ -182,7 +182,7 @@ class _ReplicaList(List[_Replica]):
swapping, but only if lag on this member is exceeding a threshold (``maximum_lag_on_syncnode``).
:ivar max_lsn: maximum value of ``_Replica.lsn`` among all values. In case there is just one
element in the list we take value of ``pg_current_wal_lsn()``.
element in the list we take value of ``pg_current_wal_flush_lsn()``.
"""
def __init__(self, postgresql: 'Postgresql', cluster: Cluster) -> None:
@@ -283,12 +283,13 @@ END;$$""")
self._ready_replicas[replica.application_name] = replica.pid
def current_state(self, cluster: Cluster) -> Tuple[CaseInsensitiveSet, CaseInsensitiveSet]:
"""Finds best candidates to be the synchronous standbys.
"""Find the best candidates to be the synchronous standbys.
Current synchronous standby is always preferred, unless it has disconnected or does not want to be a
synchronous standby any longer.
Standbys are selected based on values from the global configuration:
- `maximum_lag_on_syncnode`: would help swapping an unhealthy sync replica in case it stops
responding (or hangs). Please set the value high enough so it won't unnecessarily swap sync
standbys during high loads. Any value less than or equal to 0 keeps the behavior backward compatible.
@@ -338,7 +339,7 @@ END;$$""")
sync_param = next(iter(sync), None)
if not (self._postgresql.config.set_synchronous_standby_names(sync_param)
and self._postgresql.state == 'running' and self._postgresql.is_leader()) or has_asterisk:
and self._postgresql.state == 'running' and self._postgresql.is_primary()) or has_asterisk:
return
time.sleep(0.1) # Usually it takes 1ms to reload postgresql.conf, but we will give it 100ms

View File

@@ -178,10 +178,11 @@ class ValidatorFactory:
:returns: the Patroni validator object that corresponds to the specification found in *validator*.
:raises :class:`ValidatorFactoryNoType`: if *validator* contains no ``type`` key.
:raises :class:`ValidatorFactoryInvalidType`: if ``type`` key from *validator* contains an invalid value.
:raises :class:`ValidatorFactoryInvalidSpec`: if *validator* contains an invalid set of attributes for the
given ``type``.
:raises:
:class:`ValidatorFactoryNoType`: if *validator* contains no ``type`` key.
:class:`ValidatorFactoryInvalidType`: if ``type`` key from *validator* contains an invalid value.
:class:`ValidatorFactoryInvalidSpec`: if *validator* contains an invalid set of attributes for the given
``type``.
:Example:
@@ -265,7 +266,8 @@ def _read_postgres_gucs_validators_file(file: str) -> Dict[str, Any]:
:returns: the YAML content parsed into a Python object. If any issue is faced while reading/parsing the file, then
return ``None``.
:raises :class:`InvalidGucValidatorsFile`: if faces an issue while reading or parsing *file*.
:raises:
:class:`InvalidGucValidatorsFile`: if an issue is faced while reading or parsing *file*.
"""
try:
with open(file, encoding='UTF-8') as stream:
@@ -462,11 +464,13 @@ def transform_postgresql_parameter_value(version: int, name: str, value: Any,
:param value: value of the Postgres GUC.
:param available_gucs: a set of all GUCs available in Postgres *version*. Each item is the name of a Postgres
GUC. Used for a couple purposes:
* Disallow writing GUCs to ``postgresql.conf`` that does not exist in Postgres *version*;
* Avoid ignoring GUC *name* if it does not have a validator in ``parameters``, but is a valid GUC in Postgres
*version*.
:returns: The return value may be one among
* Disallow writing GUCs to ``postgresql.conf`` that do not exist in Postgres *version*;
* Avoid ignoring GUC *name* if it does not have a validator in ``parameters``, but is a valid GUC in
Postgres *version*.
:returns: The return value may be one among:
* The original *value* if *name* seems to be an extension GUC (contains a period '.'); or
* ``None`` if **name** is a recovery GUC; or
* *value* transformed to the expected format for GUC *name* in Postgres *version* using validators defined in
@@ -490,10 +494,11 @@ def transform_recovery_parameter_value(version: int, name: str, value: Any,
:param value: value of the Postgres recovery GUC.
:param available_gucs: a set of all GUCs available in Postgres *version*. Each item is the name of a Postgres
GUC. Used for a couple purposes:
* Disallow writing GUCs to ``recovery.conf`` (or ``postgresql.conf`` depending on *version*), that does not
exist in Postgres *version*;
* Avoid ignoring recovery GUC *name* if it does not have a validator in ``recovery_parameters``, but is a valid
GUC in Postgres *version*.
* Disallow writing GUCs to ``recovery.conf`` (or ``postgresql.conf`` depending on *version*) that do not
exist in Postgres *version*;
* Avoid ignoring recovery GUC *name* if it does not have a validator in ``recovery_parameters``, but is a
valid GUC in Postgres *version*.
:returns: *value* transformed to the expected format for recovery GUC *name* in Postgres *version* using validators
defined in ``recovery_parameters``. It can also return ``None``. See :func:`_transform_parameter_value`.

View File

@@ -1,7 +1,8 @@
"""Abstraction layer for ``psycopg`` module.
"""Abstraction layer for :mod:`psycopg` module.
This module is able to handle both ``pyscopg2`` and ``psycopg3``, and it exposes a common interface for both.
``psycopg2`` takes precedence. ``psycopg3`` will only be used if ``psycopg2`` is either absent or older than ``2.5.4``.
This module is able to handle both :mod:`psycopg2` and :mod:`psycopg`, and it exposes a common interface for both.
:mod:`psycopg2` takes precedence. :mod:`psycopg` will only be used if :mod:`psycopg2` is either absent or older than
``2.5.4``.
"""
from typing import Any, Optional, TYPE_CHECKING, Union
if TYPE_CHECKING: # pragma: no cover
@@ -28,7 +29,7 @@ try:
"""Quote *value* as a SQL literal.
.. note::
*value* is quoted through ``psycopg`` adapters.
*value* is quoted through :mod:`psycopg2` adapters.
:param value: value to be quoted.
:param conn: if a connection is given then :func:`quote_literal` checks if any special handling based on server
@@ -44,14 +45,14 @@ except ImportError:
from psycopg import connect as __connect, sql, Error, DatabaseError, OperationalError, ProgrammingError
def _connect(dsn: Optional[str] = None, **kwargs: Any) -> 'Connection[Any]':
"""Call ``psycopg.connect`` with ``dsn`` and ``**kwargs``.
"""Call :func:`psycopg.connect` with *dsn* and ``**kwargs``.
.. note::
Will create ``server_version`` attribute in the returning connection, so it keeps compatibility with the
object that would be returned by ``psycopg2.connect``.
object that would be returned by :func:`psycopg2.connect`.
:param dsn: DSN to call ``psycopg.connect`` with.
:param kwargs: keyword arguments to call ``psycopg.connect`` with.
:param dsn: DSN to call :func:`psycopg.connect` with.
:param kwargs: keyword arguments to call :func:`psycopg.connect` with.
:returns: a connection to the database.
"""
@@ -89,11 +90,11 @@ def connect(*args: Any, **kwargs: Any) -> Union['connection', 'Connection[Any]']
It also enforces ``search_path=pg_catalog`` for non-replication connections to mitigate security issues as
Patroni relies on superuser connections.
:param args: positional arguments to call ``connect`` function from ``psycopg`` module.
:param kwargs: keyword arguments to call ``connect`` function from ``psycopg`` module.
:param args: positional arguments to call :func:`~psycopg.connect` function from :mod:`psycopg` module.
:param kwargs: keyword arguments to call :func:`~psycopg.connect` function from :mod:`psycopg` module.
:returns: a connection to the database. Can be either a :class:`psycopg.Connection` if using ``psycopg3``, or a
:class:`psycopg2.extensions.connection` if using ``psycopg2``.
:returns: a connection to the database. Can be either a :class:`psycopg.Connection` if using :mod:`psycopg`, or a
:class:`psycopg2.extensions.connection` if using :mod:`psycopg2`.
"""
if kwargs and 'replication' not in kwargs and kwargs.get('fallback_application_name') != 'Patroni ctl':
options = [kwargs['options']] if 'options' in kwargs else []
@@ -109,7 +110,7 @@ def quote_ident(value: Any, conn: Optional[Union['cursor', 'connection', 'Connec
:param value: value to be quoted.
:param conn: connection to evaluate the returning string into. Can be either a :class:`psycopg.Connection` if
using ``psycopg3``, or a :class:`psycopg2.extensions.connection` if using ``psycopg2``.
using :mod:`psycopg`, or a :class:`psycopg2.extensions.connection` if using :mod:`psycopg2`.
:returns: *value* quoted as a SQL identifier.
"""

View File

@@ -34,10 +34,11 @@ class PatroniRequest(object):
"""Create a new :class:`PatroniRequest` instance with given *config*.
:param config: Patroni YAML configuration.
:param insecure: how to deal with SSL certs verification
:param insecure: how to deal with SSL certs verification:
* If ``True`` it will perform REST API requests without verifying SSL certs; or
* If ``False`` it will perform REST API requests and verify SSL certs; or
* If ``None`` it will behave according to the value of ``ctl -> insecure`` configuration; or
* If ``None`` it will behave according to the value of ``ctl.insecure`` configuration; or
* If none of the above applies, then it falls back to ``False``.
"""
self._insecure = insecure
@@ -51,7 +52,7 @@ class PatroniRequest(object):
:param config: Patroni YAML configuration.
:param name: name of the setting value to be retrieved.
:returns: value of ``ctl -> *name*`` if present, ``None`` otherwise.
:returns: value of ``ctl.*name*`` if present, ``None`` otherwise.
"""
return config.get('ctl', {}).get(name, default)
@@ -83,12 +84,13 @@ class PatroniRequest(object):
:param config: Patroni YAML configuration.
:param name: prefix of the Patroni SSL related setting name. Currently, supports these:
* ``cert``: gets translated to ``certfile``
* ``key``: gets translated to ``keyfile``
Will attempt to fetch the requested key first from ``ctl`` section.
:returns: value of ``ctl -> *name*file`` if present, ``None`` otherwise.
:returns: value of ``ctl.*name*file`` if present, ``None`` otherwise.
"""
value = self._get_ctl_value(config, name + 'file')
self._apply_pool_param(name + '_file', value)
@@ -99,13 +101,13 @@ class PatroniRequest(object):
Configure these HTTP headers for requests:
* ``authorization``: based on Patroni' CTL or REST API authentication config;
* ``user-agent``: based on `patroni.utils.USER_AGENT`.
* ``authorization``: based on Patroni's CTL or REST API authentication config;
* ``user-agent``: based on ``patroni.utils.USER_AGENT``.
Also configure SSL related settings for requests:
* ``ca_certs`` is configured if ``ctl -> cacert`` or ``restapi -> cafile`` is available;
* ``cert``, ``key`` and ``key_password`` are configured if ``ctl -> certfile`` is available.
* ``ca_certs`` is configured if ``ctl.cacert`` or ``restapi.cafile`` is available;
* ``cert``, ``key`` and ``key_password`` are configured if ``ctl.certfile`` is available.
:param config: Patroni YAML configuration.
"""

View File

@@ -16,6 +16,7 @@ import platform
import random
import re
import socket
import subprocess
import sys
import tempfile
import time
@@ -129,8 +130,8 @@ def parse_bool(value: Any) -> Union[bool, None]:
.. note::
The parsing is case-insensitive, and takes into consideration these values:
* ``on``, ``true``, ``yes``, and ``1`` as ``True``.
* ``off``, ``false``, ``no``, and ``0`` as ``False``.
* ``on``, ``true``, ``yes``, and ``1`` as ``True``.
* ``off``, ``false``, ``no``, and ``0`` as ``False``.
:param value: value to be parsed to :class:`bool`.
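Expected behaviour, sketched as calls consistent with the note above (``None`` for unrecognised input follows from the return type):

.. code-block:: python

    from patroni.utils import parse_bool

    assert parse_bool('ON') is True       # parsing is case-insensitive
    assert parse_bool('no') is False
    assert parse_bool(1) is True          # non-string input is stringified first (assumed)
    assert parse_bool('maybe') is None    # unrecognised values parse to None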
@@ -245,14 +246,16 @@ def convert_to_base_unit(value: Union[int, float], unit: str, base_unit: Optiona
"""Convert *value* as a *unit* of compute information or time to *base_unit*.
:param value: value to be converted to the base unit.
:param unit: unit of *value*. Accepts these units (case sensitive)
* For space: ``B``, ``kB``, ``MB``, ``GB``, or ``TB``;
* For time: ``d``, ``h``, ``min``, ``s``, ``ms``, or ``us``.
:param unit: unit of *value*. Accepts these units (case sensitive):
* For space: ``B``, ``kB``, ``MB``, ``GB``, or ``TB``;
* For time: ``d``, ``h``, ``min``, ``s``, ``ms``, or ``us``.
:param base_unit: target unit in the conversion. May contain the target unit with an associated value, e.g
``512MB``. Accepts these units (case sensitive)
* For space: ``B``, ``kB``, or ``MB``;
* For time: ``ms``, ``s``, or ``min``.
``512MB``. Accepts these units (case sensitive):
* For space: ``B``, ``kB``, or ``MB``;
* For time: ``ms``, ``s``, or ``min``.
:returns: *value* in *unit* converted to *base_unit*. Returns ``None`` if *unit* or *base_unit* is invalid.
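Worked examples consistent with the description above (a sketch; space conversions assume the 1024-based units PostgreSQL uses):

.. code-block:: python

    from patroni.utils import convert_to_base_unit

    convert_to_base_unit(1, 'GB', 'MB')     # -> 1024
    convert_to_base_unit(2, 'h', 'min')     # -> 120
    convert_to_base_unit(512, 'MB', '8kB')  # -> 65536, i.e. 512MB in 8kB pages
    convert_to_base_unit(1, 'GB', 'foo')    # -> None, invalid base unit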
@@ -402,7 +405,8 @@ def compare_values(vartype: str, unit: Optional[str], old_value: Any, new_value:
"""Check if *old_value* and *new_value* are equivalent after parsing them as *vartype*.
:param vartype: the target type to parse *old_value* and *new_value* before comparing them. Accepts any of the
following (case sensitive)
following (case sensitive):
* ``bool``: parse values using :func:`parse_bool`; or
* ``integer``: parse values using :func:`parse_int`; or
* ``real``: parse values using :func:`parse_real`; or
@@ -459,7 +463,7 @@ def compare_values(vartype: str, unit: Optional[str], old_value: Any, new_value:
def _sleep(interval: Union[int, float]) -> None:
"""Wrap :func:`time.sleep`.
"""Wrap :func:`~time.sleep`.
:param interval: Delay execution for a given number of seconds. The argument may be a floating point number for
subsecond precision.
@@ -467,6 +471,18 @@ def _sleep(interval: Union[int, float]) -> None:
time.sleep(interval)
def read_stripped(file_path: str) -> Iterator[str]:
"""Iterate over stripped lines in the given file.
:param file_path: path to the file to read from
:yields: each line from the given file stripped
"""
with open(file_path) as f:
for line in f:
yield line.strip()
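A hypothetical use of ``read_stripped`` (the path is illustrative):

.. code-block:: python

    from patroni.utils import read_stripped

    # Collect non-empty, non-comment lines from an assumed config fragment.
    lines = [line for line in read_stripped('/etc/patroni/extra.conf')
             if line and not line.startswith('#')]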
class RetryFailedError(PatroniException):
"""Maximum number of attempts exhausted in retry operation."""
@@ -536,6 +552,7 @@ class Retry(object):
"""Set next cycle delay.
It will be the minimum value between:
* current delay with ``backoff``; or
* ``max_delay``.
"""
@@ -549,10 +566,14 @@ class Retry(object):
def ensure_deadline(self, timeout: float, raise_ex: Optional[Exception] = None) -> bool:
"""Calculates, sets, and checks the remaining deadline time.
:param timeout: if the *deadline* is smaller than the provided *timeout* value raise *raise_ex* exception
:param raise_ex: the exception object that will be raised if the *deadline* is smaller than provided *timeout*
:returns: `False` if *deadline* is smaller than a provided *timeout* and *raise_ex* isn't set. Otherwise `True`
:raises Exception: if calculated deadline is smaller than provided *timeout*
:param timeout: if the *deadline* is smaller than the provided *timeout* value raise *raise_ex* exception.
:param raise_ex: the exception object that will be raised if the *deadline* is smaller than provided *timeout*.
:returns: ``False`` if *deadline* is smaller than a provided *timeout* and *raise_ex* isn't set. Otherwise
``True``.
:raises:
:class:`Exception`: *raise_ex* if calculated deadline is smaller than provided *timeout*.
"""
self.deadline = self.stoptime - time.time()
if self.deadline < timeout:
@@ -565,9 +586,10 @@ class Retry(object):
"""Call a function *func* with arguments ``*args`` and ``*kwargs`` in a loop.
*func* will be called until one of the following conditions is met:
* It completes without throwing one of the configured ``retry_exceptions``; or
* ``max_retries`` is exceeded.; or
* ``deadline`` is exceeded.
* It completes without throwing one of the configured ``retry_exceptions``; or
* ``max_retries`` is exceeded; or
* ``deadline`` is exceeded.
.. note::
* It will set loop stop time based on ``deadline`` attribute.
@@ -576,9 +598,10 @@ class Retry(object):
:param func: function to call.
:param args: positional arguments to call *func* with.
:params kwargs: keyword arguments to call *func* with.
:raises :class:`RetryFailedError`
* If ``max_tries`` is exceeded; or
* If ``deadline`` is exceeded.
:raises:
:class:`RetryFailedError`:
* If ``max_tries`` is exceeded; or
* If ``deadline`` is exceeded.
"""
self.reset()
@@ -613,7 +636,8 @@ def polling_loop(timeout: Union[int, float], interval: Union[int, float] = 1) ->
:param timeout: for how long (in seconds) from now it should keep returning values.
:param interval: for how long to sleep before returning a new value.
:rtype: Iterator[:class:`int`] with current iteration counter, starting from ``0``.
:yields: current iteration counter, starting from ``0``.
"""
start_time = time.time()
iteration = 0
@@ -627,14 +651,16 @@ def polling_loop(timeout: Union[int, float], interval: Union[int, float] = 1) ->
def split_host_port(value: str, default_port: Optional[int]) -> Tuple[str, int]:
"""Extract host(s) and port from *value*.
:param value: string from where host(s) and port will be extracted. Accepts either of these formats
* ``host:port``; or
* ``host1,host2,...,hostn:port``.
:param value: string from where host(s) and port will be extracted. Accepts either of these formats:
* ``host:port``; or
* ``host1,host2,...,hostn:port``.
Each ``host`` portion of *value* can be either:
* A FQDN; or
* An IPv4 address; or
* An IPv6 address, with or without square brackets.
* A FQDN; or
* An IPv4 address; or
* An IPv6 address, with or without square brackets.
:param default_port: if no port can be found in *value*, use *default_port* instead.
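Illustrative results, assuming the formats described above:

.. code-block:: python

    from patroni.utils import split_host_port

    split_host_port('db1:5432', None)          # -> ('db1', 5432)
    split_host_port('db1,db2,db3:5432', None)  # -> ('db1,db2,db3', 5432)
    split_host_port('[::1]:5432', None)        # -> ('::1', 5432), brackets stripped
    split_host_port('db1', 5432)               # -> ('db1', 5432), default port used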
@@ -669,18 +695,23 @@ def uri(proto: str, netloc: Union[List[str], Tuple[str, Union[int, str]], str],
:param proto: the URI protocol.
:param netloc: the URI host(s) and port. Can be specified in either of these ways:
* A :class:`list` or :class:`tuple`. The second item should be a port, and the first item should be composed of
hosts in either of these formats:
* ``host``; or
* ``host1,host2,...,hostn``.
* A :class:`str` in either of these formats:
* ``host:port``; or
* ``host1,host2,...,hostn:port``.
In all cases, each ``host`` portion of *netloc* can be either:
* An FQDN; or
* An IPv4 address; or
* An IPv6 address, with or without square brackets.
* An FQDN; or
* An IPv4 address; or
* An IPv6 address, with or without square brackets.
:param path: the URI path.
:param user: the authenticating user, if any.
@@ -698,10 +729,11 @@ def uri(proto: str, netloc: Union[List[str], Tuple[str, Union[int, str]], str],
def iter_response_objects(response: HTTPResponse) -> Iterator[Dict[str, Any]]:
"""Iterate over the chunks of a :class:`HTTPResponse` and yield each JSON document that is found along the way.
"""Iterate over the chunks of a :class:`~urllib3.response.HTTPResponse` and yield each JSON document that is found.
:param response: the HTTP response from which JSON documents will be retrieved.
:rtype: Iterator[:class:`dict`] with current JSON document.
:yields: current JSON document.
"""
prev = ''
decoder = JSONDecoder()
@@ -730,33 +762,36 @@ def iter_response_objects(response: HTTPResponse) -> Iterator[Dict[str, Any]]:
def cluster_as_json(cluster: 'Cluster', global_config: Optional['GlobalConfig'] = None) -> Dict[str, Any]:
"""Get a JSON representation of *cluster*.
:param cluster: the :class:`Cluster` object to be parsed as JSON.
:param global_config: optional :class:`GlobalConfig` object to check the cluster state.
:param cluster: the :class:`~patroni.dcs.Cluster` object to be parsed as JSON.
:param global_config: optional :class:`~patroni.config.GlobalConfig` object to check the cluster state.
If not provided, it will be instantiated from the ``Cluster.config``.
:returns: JSON representation of *cluster*.
These are the possible keys in the returning object depending on the available information in *cluster*:
* ``members``: list of members in the cluster. Each value is a :class:`dict` that may have the following keys:
* ``name``: the name of the host (unique in the cluster). The ``members`` list is sorted by this key;
* ``role``: ``leader``, ``standby_leader``, ``sync_standby``, or ``replica``;
* ``state``: ``stopping``, ``stopped``, ``stop failed``, ``crashed``, ``running``, ``starting``,
``start failed``, ``restarting``, ``restart failed``, ``initializing new cluster``, ``initdb failed``,
``running custom bootstrap script``, ``custom bootstrap failed``, or ``creating replica``;
* ``api_url``: REST API URL based on ``restapi->connect_address`` configuration;
* ``host``: PostgreSQL host based on ``postgresql->connect_address``;
* ``port``: PostgreSQL port based on ``postgresql->connect_address``;
* ``timeline``: PostgreSQL current timeline;
* ``pending_restart``: ``True`` if PostgreSQL is pending to be restarted;
* ``scheduled_restart``: scheduled restart timestamp, if any;
* ``tags``: any tags that were set for this member;
* ``lag``: replication lag, if applicable;
* ``pause``: ``True`` if cluster is in maintenance mode;
* ``scheduled_switchover``: if a switchover has been scheduled, then it contains this entry with these keys:
* ``at``: timestamp when switchover was scheduled to occur;
* ``from``: name of the member to be demoted;
* ``to``: name of the member to be promoted.
* ``members``: list of members in the cluster. Each value is a :class:`dict` that may have the following keys:
* ``name``: the name of the host (unique in the cluster). The ``members`` list is sorted by this key;
* ``role``: ``leader``, ``standby_leader``, ``sync_standby``, or ``replica``;
* ``state``: ``stopping``, ``stopped``, ``stop failed``, ``crashed``, ``running``, ``starting``,
``start failed``, ``restarting``, ``restart failed``, ``initializing new cluster``, ``initdb failed``,
``running custom bootstrap script``, ``custom bootstrap failed``, or ``creating replica``;
* ``api_url``: REST API URL based on ``restapi->connect_address`` configuration;
* ``host``: PostgreSQL host based on ``postgresql->connect_address``;
* ``port``: PostgreSQL port based on ``postgresql->connect_address``;
* ``timeline``: PostgreSQL current timeline;
* ``pending_restart``: ``True`` if PostgreSQL is pending to be restarted;
* ``scheduled_restart``: scheduled restart timestamp, if any;
* ``tags``: any tags that were set for this member;
* ``lag``: replication lag, if applicable;
* ``pause``: ``True`` if cluster is in maintenance mode;
* ``scheduled_switchover``: if a switchover has been scheduled, then it contains this entry with these keys:
* ``at``: timestamp when switchover was scheduled to occur;
* ``from``: name of the member to be demoted;
* ``to``: name of the member to be promoted.
"""
if not global_config:
from patroni.config import get_global_config
@@ -832,15 +867,18 @@ def validate_directory(d: str, msg: str = "{} {}") -> None:
If the directory does not exist, :func:`validate_directory` will attempt to create it.
:param d: the directory to be checked.
:param msg: a message to be thrown when raising :class:`PatroniException`, if any issue is faced. It must contain
2 placeholders to be used by :func:`format`:
* The first placeholder will be replaced with path *d*;
* The second placeholder will be replaced with the error condition.
:param msg: a message to be thrown when raising :class:`~patroni.exceptions.PatroniException`, if any issue is
faced. It must contain 2 placeholders to be used by :func:`format`:
:raises :class:`PatroniException`: if any issue is observed while validating *d*. Can be thrown in these situations
* *d* did not exist, and :func:`validate_directory` was not able to create it; or
* *d* is an existing directory, but Patroni is not able to write to that directory; or
* *d* is an existing file, not a directory.
* The first placeholder will be replaced with path *d*;
* The second placeholder will be replaced with the error condition.
:raises:
:class:`~patroni.exceptions.PatroniException`: if any issue is observed while validating *d*. Can be thrown if:
* *d* did not exist, and :func:`validate_directory` was not able to create it; or
* *d* is an existing directory, but Patroni is not able to write to that directory; or
* *d* is an existing file, not a directory.
"""
if not os.path.exists(d):
try:
@@ -895,13 +933,22 @@ def keepalive_socket_options(timeout: int, idle: int, cnt: int = 3) -> Iterator[
:param idle: value for ``TCP_KEEPIDLE``.
:param cnt: value for ``TCP_KEEPCNT``.
:rtype: Iterator[Tuple[:class:`int`, :class:`int`, :class:`int`]] of all keepalive related socket options to be
set. The first item in the tuple is the protocol, the second item is the option, and the third item is the
value to be used. The return values depend on the platform:
* ``Windows``: yield ``SO_KEEPALIVE``;
* ``Linux``: yield ``SO_KEEPALIVE``, ``TCP_USER_TIMEOUT``, ``TCP_KEEPIDLE`, ``TCP_KEEPINTVL``, and
``TCP_KEEPCNT``;
* ``MacOS``: yield ``SO_KEEPALIVE``, ``TCP_KEEPIDLE`, ``TCP_KEEPINTVL``, and ``TCP_KEEPCNT``
:yields: all keepalive related socket options to be set. The first item in the tuple is the protocol, the second
item is the option, and the third item is the value to be used. The return values depend on the platform:
* ``Windows``:
* ``SO_KEEPALIVE``.
* ``Linux``:
* ``SO_KEEPALIVE``;
* ``TCP_USER_TIMEOUT``;
* ``TCP_KEEPIDLE``;
* ``TCP_KEEPINTVL``;
* ``TCP_KEEPCNT``.
* ``MacOS``:
* ``SO_KEEPALIVE``;
* ``TCP_KEEPIDLE``;
* ``TCP_KEEPINTVL``;
* ``TCP_KEEPCNT``.
"""
yield (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
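A sketch of applying the yielded options to a freshly created socket:

.. code-block:: python

    import socket
    from patroni.utils import keepalive_socket_options

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Apply every (level, option, value) triple yielded for this platform.
    for level, option, value in keepalive_socket_options(timeout=10, idle=5):
        sock.setsockopt(level, option, value)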
@@ -939,7 +986,7 @@ def enable_keepalive(sock: socket.socket, timeout: int, idle: int, cnt: int = 3)
:param idle: value for ``TCP_KEEPIDLE``.
:param cnt: value for ``TCP_KEEPCNT``.
:returns: output of :func:`socket.ioctl` if we are on Windows, nothing otherwise.
:returns: output of :func:`~socket.ioctl` if we are on Windows, nothing otherwise.
"""
SIO_KEEPALIVE_VALS = getattr(socket, 'SIO_KEEPALIVE_VALS', None)
if SIO_KEEPALIVE_VALS is not None: # Windows
@@ -953,23 +1000,27 @@ def enable_keepalive(sock: socket.socket, timeout: int, idle: int, cnt: int = 3)
def unquote(string: str) -> str:
"""Unquote a fully quoted *string*.
:param string: The string to be checked for quoting.
:returns: The string with quotes removed, if it is a fully quoted single string, or the original string if quoting
is not detected, or unquoting was not possible.
:Examples:
A *string* with quotes will have those quotes removed
>>> unquote('"a quoted string"')
'a quoted string'
A *string* with multiple quotes will be returned as is
>>> unquote('"a multi" "quoted string"')
'"a multi" "quoted string"'
So will a *string* with unbalanced quotes
>>> unquote('unbalanced "quoted string')
'unbalanced "quoted string'
:param string: The string to be checked for quoting.
:returns: The string with quotes removed, if it is a fully quoted single string,
or the original string if quoting is not detected, or unquoting was not possible.
"""
try:
ret = split(string)
@@ -977,3 +1028,36 @@ def unquote(string: str) -> str:
except ValueError:
ret = string
return ret
def get_major_version(bin_dir: Optional[str] = None, bin_name: str = 'postgres') -> str:
"""Get the major version of PostgreSQL.
It is based on the output of ``postgres --version``.
:param bin_dir: path to the PostgreSQL binaries directory. If ``None`` or an empty string, it will use the first
*bin_name* binary that is found by the subprocess in the ``PATH``.
:param bin_name: name of the postgres binary to call (``postgres`` by default).
:returns: the PostgreSQL major version.
:raises:
:exc:`~patroni.exceptions.PatroniException`: if the postgres binary call failed due to :exc:`OSError`.
:Example:
* Returns ``9.6`` for PostgreSQL 9.6.24
* Returns ``15`` for PostgreSQL 15.2
"""
if not bin_dir:
binary = bin_name
else:
binary = os.path.join(bin_dir, bin_name)
try:
version = subprocess.check_output([binary, '--version']).decode()
except OSError as e:
raise PatroniException(f'Failed to get postgres version: {e}')
version = re.match(r'^[^\s]+ [^\s]+ (\d+)(\.(\d+))?', version)
if TYPE_CHECKING: # pragma: no cover
assert version is not None
return '.'.join([version.group(1), version.group(3)]) if int(version.group(1)) < 10 else version.group(1)
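A rough usage sketch of the relocated helper, with illustrative paths:

.. code-block:: python

    from patroni.utils import get_major_version

    # Pre-10 servers report two-part majors, e.g. '9.6' for 9.6.24;
    # newer servers report a single number, e.g. '15' for 15.2.
    print(get_major_version('/usr/lib/postgresql/15/bin'))  # '15'

    # With no bin_dir, the first 'postgres' binary on PATH is used.
    print(get_major_version())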

View File

@@ -3,26 +3,26 @@
This module contains facilities for validating configuration of Patroni processes.
:var schema: configuration schema of the daemon launched by `patroni` command.
:var schema: configuration schema of the daemon launched by ``patroni`` command.
"""
import os
import re
import shutil
import socket
import subprocess
from typing import Any, Dict, Union, Iterator, List, Optional as OptionalType, Tuple, TYPE_CHECKING
from typing import Any, Dict, Union, Iterator, List, Optional as OptionalType, Tuple
from .collections import CaseInsensitiveSet
from .dcs import dcs_modules
from .exceptions import ConfigParseError
from .utils import parse_int, split_host_port, data_directory_is_empty
from .utils import parse_int, split_host_port, data_directory_is_empty, get_major_version
def data_directory_empty(data_dir: str) -> bool:
"""Check if PostgreSQL data directory is empty.
:param data_dir: path to the PostgreSQL data directory to be checked.
:returns: ``True`` if the data directory is empty.
"""
if os.path.isfile(os.path.join(data_dir, "global", "pg_control")):
@@ -33,12 +33,14 @@ def data_directory_empty(data_dir: str) -> bool:
def validate_connect_address(address: str) -> bool:
"""Check if options related to connection address were properly configured.
:param address: address to be validated in the format
``host:ip``.
:param address: address to be validated in the format ``host:port``.
:returns: ``True`` if the address is valid.
:raises :class:`patroni.exceptions.ConfigParseError`:
* If the address is not in the expected format; or
* If the host is set to not allowed values (``127.0.0.1``, ``0.0.0.0``, ``*``, ``::1``, or ``localhost``).
:raises:
:class:`~patroni.exceptions.ConfigParseError`:
* If the address is not in the expected format; or
* If the host is set to not allowed values (``127.0.0.1``, ``0.0.0.0``, ``*``, ``::1``, or ``localhost``).
"""
try:
host, _ = split_host_port(address, 1)
@@ -52,20 +54,25 @@ def validate_connect_address(address: str) -> bool:
def validate_host_port(host_port: str, listen: bool = False, multiple_hosts: bool = False) -> bool:
"""Check if host(s) and port are valid and available for usage.
:param host_port: the host(s) and port to be validated. It can be in either of these formats
:param host_port: the host(s) and port to be validated. It can be in either of these formats:
* ``host:port``, if *multiple_hosts* is ``False``; or
* ``host_1,host_2,...,host_n:port``, if *multiple_hosts* is ``True``.
:param listen: if the address is expected to be available for binding. ``False`` means it expects to connect to that
address, and ``True`` that it expects to bind to that address.
:param multiple_hosts: if *host_port* can contain multiple hosts.
:returns: ``True`` if the host(s) and port are valid.
:raises: :class:`patroni.exceptions.ConfigParserError`:
* If the *host_port* is not in the expected format; or
* If ``*`` was specified along with more hosts in *host_port*; or
* If we are expecting to bind to an address that is already in use; or
* If we are not able to connect to an address that we are expecting to do so; or
* If :class:`socket.gaierror` is thrown by socket module when attempting to connect to the given address(es).
:raises:
:class:`~patroni.exceptions.ConfigParseError`:
* If the *host_port* is not in the expected format; or
* If ``*`` was specified along with more hosts in *host_port*; or
* If we are expecting to bind to an address that is already in use; or
* If we are not able to connect to an address that we are expecting to do so; or
* If :class:`~socket.gaierror` is thrown by socket module when attempting to connect to the given
address(es).
"""
try:
hosts, port = split_host_port(host_port, 1)
@@ -105,6 +112,7 @@ def validate_host_port_list(value: List[str]) -> bool:
Call :func:`validate_host_port` with each item in *value*.
:param value: list of host(s) and port items to be validated.
:returns: ``True`` if all items are valid.
"""
assert all([validate_host_port(v) for v in value]), "didn't pass the validation"
@@ -117,6 +125,7 @@ def comma_separated_host_port(string: str) -> bool:
Call :func:`validate_host_port_list` with a list represented by the CSV *string*.
:param string: comma-separated list of host and port items.
:returns: ``True`` if all items in the CSV string are valid.
"""
return validate_host_port_list([s.strip() for s in string.split(",")])
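A hedged sketch of how these validators compose; the addresses are made up, and both helpers perform real socket operations, so they only pass against reachable endpoints:

.. code-block:: python

    from patroni.validator import comma_separated_host_port, validate_host_port

    # Validates each item of the CSV string via validate_host_port_list();
    # a malformed or unreachable item raises ConfigParseError (or trips
    # the assert inside validate_host_port_list).
    comma_separated_host_port('10.0.0.1:2379,10.0.0.2:2379')

    # A single pair; listen=True would additionally try to bind to it.
    validate_host_port('10.0.0.1:5432')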
@@ -128,7 +137,7 @@ def validate_host_port_listen(host_port: str) -> bool:
Call :func:`validate_host_port` with *listen* set to ``True``.
:param host_port: the host and port to be validated. Must be in the format
`host:ip`.
``host:port``.
:returns: ``True`` if the host and port are valid and available for binding.
"""
@@ -141,8 +150,9 @@ def validate_host_port_listen_multiple_hosts(host_port: str) -> bool:
Call :func:`validate_host_port` with both *listen* and *multiple_hosts* set to ``True``.
:param host_port: the host(s) and port to be validated. It can be in either of these formats
* `host:ip`; or
* `host_1,host_2,...,host_n:port`
* ``host:port``; or
* ``host_1,host_2,...,host_n:port``
:returns: ``True`` if the host(s) and port are valid and available for binding.
"""
@@ -153,8 +163,11 @@ def is_ipv4_address(ip: str) -> bool:
"""Check if *ip* is a valid IPv4 address.
:param ip: the IP to be checked.
:returns: ``True`` if the IP is an IPv4 address.
:raises :class:`patroni.exceptions.ConfigParserError`: if *ip* is not a valid IPv4 address.
:raises:
:class:`~patroni.exceptions.ConfigParseError`: if *ip* is not a valid IPv4 address.
"""
try:
socket.inet_aton(ip)
@@ -167,8 +180,11 @@ def is_ipv6_address(ip: str) -> bool:
"""Check if *ip* is a valid IPv6 address.
:param ip: the IP to be checked.
:returns: ``True`` if the IP is an IPv6 address.
:raises :class:`patroni.exceptions.ConfigParserError`: if *ip* is not a valid IPv6 address.
:raises:
:class:`~patroni.exceptions.ConfigParseError`: if *ip* is not a valid IPv6 address.
"""
try:
socket.inet_pton(socket.AF_INET6, ip)
@@ -187,31 +203,6 @@ def get_bin_name(bin_name: str) -> str:
return (schema.data.get('postgresql', {}).get('bin_name', {}) or {}).get(bin_name, bin_name)
def get_major_version(bin_dir: OptionalType[str] = None) -> str:
"""Get the major version of PostgreSQL.
It is based on the output of ``postgres --version``.
:param bin_dir: path to PostgreSQL binaries directory. If ``None`` it will use the first ``postgres`` binary that
is found by subprocess in the ``PATH``.
:returns: the PostgreSQL major version.
:Example:
* Returns `9.6` for PostgreSQL 9.6.24
* Returns `15` for PostgreSQL 15.2
"""
if not bin_dir:
binary = get_bin_name('postgres')
else:
binary = os.path.join(bin_dir, get_bin_name('postgres'))
version = subprocess.check_output([binary, '--version']).decode()
version = re.match(r'^[^\s]+ [^\s]+ (\d+)(\.(\d+))?', version)
if TYPE_CHECKING: # pragma: no cover
assert version is not None
return '.'.join([version.group(1), version.group(3)]) if int(version.group(1)) < 10 else version.group(1)
def validate_data_dir(data_dir: str) -> bool:
"""Validate the value of ``postgresql.data_dir`` configuration option.
@@ -222,14 +213,17 @@ def validate_data_dir(data_dir: str) -> bool:
* Point to a non-empty directory that seems to contain a valid PostgreSQL data directory.
:param data_dir: the value of ``postgresql.data_dir`` configuration option.
:returns: ``True`` if the PostgreSQL data directory is valid.
:raises :class:`patroni.exceptions.ConfigParserError`:
* If no *data_dir* was given; or
* If *data_dir* is a file and not a directory; or
* If *data_dir* is a non-empty directory and:
* ``PG_VERSION`` file is not available in the directory
* ``pg_wal``/``pg_xlog`` is not available in the directory
* ``PG_VERSION`` content does not match the major version reported by ``postgres --version``
:raises:
:class:`~patroni.exceptions.ConfigParseError`:
* If no *data_dir* was given; or
* If *data_dir* is a file and not a directory; or
* If *data_dir* is a non-empty directory and:
* ``PG_VERSION`` file is not available in the directory
* ``pg_wal``/``pg_xlog`` is not available in the directory
* ``PG_VERSION`` content does not match the major version reported by ``postgres --version``
"""
if not data_dir:
raise ConfigParseError("is an empty string")
@@ -246,7 +240,7 @@ def validate_data_dir(data_dir: str) -> bool:
raise ConfigParseError("data dir for the cluster is not empty, but doesn't contain"
" \"{}\" directory".format(waldir))
bin_dir = schema.data.get("postgresql", {}).get("bin_dir", None)
major_version = get_major_version(bin_dir)
major_version = get_major_version(bin_dir, get_bin_name('postgres'))
if pgversion != major_version:
raise ConfigParseError("data_dir directory postgresql version ({}) doesn't match with "
"'postgres --version' output ({})".format(pgversion, major_version))
@@ -270,11 +264,12 @@ def validate_binary_name(bin_name: str) -> bool:
:returns: ``True`` if all the conditions are met.
:raises :class:`patroni.exceptions.ConfigParserError`: if:
* *bin_name* is not set; or
* the path join of the ``postgresql.bin_dir`` plus *bin_name* does not exist; or
* the path join as above is not executable; or
* the *bin_name* cannot be found in the system PATH
:raises:
:class:`~patroni.exceptions.ConfigParseError`: if:
* *bin_name* is not set; or
* the path join of the ``postgresql.bin_dir`` plus *bin_name* does not exist; or
* the path join as above is not executable; or
* the *bin_name* cannot be found in the system PATH
"""
if not bin_name:
@@ -301,7 +296,7 @@ class Result(object):
.. note::
``error`` attribute is only set if ``status`` is failed.
``error`` attribute is only set if *status* is failed.
:param status: if the validation succeeded.
:param error: error message related to the validation that was performed, if the validation failed.
@@ -348,8 +343,8 @@ class Case(object):
"url": str,
})
That will check that ``host`` configuration, if given, is valid based on ``validate_host_port`` function, and
will also check that ``url`` configuration, if given, is a ``str`` instance.
That will check that ``host`` configuration, if given, is valid based on :func:`validate_host_port`, and will
also check that ``url`` configuration, if given, is a ``str`` instance.
"""
self._schema = schema
@@ -375,7 +370,7 @@ class Or(object):
The outer :class:`Or` is used to define that ``host`` and ``hosts`` are possible options in this scope.
The inner :class:`Or` in the ``hosts`` key value is used to define that ``hosts`` option is valid if either of
the functions ``comma_separated_host_port`` or ``validate_host_port`` succeed to validate it.
:func:`comma_separated_host_port` or :func:`validate_host_port` succeed to validate it.
"""
self.args = args
@@ -417,12 +412,12 @@ class Directory(object):
self.contains_executable = contains_executable
def _check_executables(self, path: OptionalType[str] = None) -> Iterator[Result]:
"""Check that all executables from contains_executable list exist within the given directory or within PATH.
"""Check that all executables from contains_executable list exist within the given directory or within ``PATH``.
:param path: optional path to the base directory against which executables will be validated.
If not provided, check within PATH.
:rtype: Iterator[:class:`Result`] objects with the error message containing the name of the executable,
if any check fails.
If not provided, check within ``PATH``.
:yields: objects with the error message containing the name of the executable, if any check fails.
"""
for program in self.contains_executable or []:
if not shutil.which(program, path=path):
@@ -432,8 +427,9 @@ class Directory(object):
"""Check if the expected paths and executables can be found under *name* directory.
:param name: path to the base directory against which paths and executables will be validated.
Check against PATH if name is not provided.
:rtype: Iterator[:class:`Result`] objects with the error message related to the failure, if any check fails.
Check against ``PATH`` if name is not provided.
:yields: objects with the error message related to the failure, if any check fails.
"""
if not name:
yield from self._check_executables()
@@ -481,12 +477,13 @@ class Schema(object):
be performed against each one of them. The validations will be performed whenever the :class:`Schema` object is
called, or its :func:`validate` method is called.
:ivar validator: validator of the configuration schema. Can be any of these
:ivar validator: validator of the configuration schema. Can be any of these:
* :class:`str`: defines that a string value is required; or
* :class:`type`: any subclass of `type`, defines that a value of the given type is required; or
* `callable`: any callable object, defines that validation will follow the code defined in the callable
object. If the callable object contains an ``expected_type`` attribute, then it will check if the
configuration value is of the expected type before calling the code of the callable object; or
* :class:`type`: any subclass of :class:`type`, defines that a value of the given type is required; or
* ``callable``: any callable object, defines that validation will follow the code defined in the callable
object. If the callable object contains an ``expected_type`` attribute, then it will check if the
configuration value is of the expected type before calling the code of the callable object; or
* :class:`list`: list representing one or more values in the configuration; or
* :class:`dict`: dictionary representing the YAML configuration tree.
"""
@@ -503,11 +500,12 @@ class Schema(object):
nodes, when it performs checks of the actual setting values.
:param validator: validator of the configuration schema. Can be any of these:
* :class:`str`: defines that a string value is required; or
* :class:`type`: any subclass of :class:`type`, defines that a value of the given type is required; or
* `callable`: Any callable object, defines that validation will follow the code defined in the callable
object. If the callable object contains an ``expected_type`` attribute, then it will check if the
configuration value is of the expected type before calling the code of the callable object; or
* ``callable``: Any callable object, defines that validation will follow the code defined in the callable
object. If the callable object contains an ``expected_type`` attribute, then it will check if the
configuration value is of the expected type before calling the code of the callable object; or
* :class:`list`: list representing it expects to contain one or more values in the configuration; or
* :class:`dict`: dictionary representing the YAML configuration tree.
@@ -515,18 +513,22 @@ class Schema(object):
to stop.
If *validator* is a :class:`dict`, then you should follow these rules:
* For the keys it can be either:
* A :class:`str` instance. It will be the name of the configuration option; or
* An :class:`Optional` instance. The ``name`` attribute of that object will be the name of the
configuration option, and that class makes this configuration option as optional to the
user, allowing it to not be specified in the YAML; or
configuration option, and that class makes this configuration option as optional to the
user, allowing it to not be specified in the YAML; or
* An :class:`Or` instance. The ``args`` attribute of that object will contain a tuple of
configuration option names. At least one of them should be specified by the user in the YAML;
* For the values it can be either:
* A new :class:`dict` instance. It will represent a new level in the YAML configuration tree; or
* A :class:`Case` instance. This is required if the key of this value is an :class:`Or` instance,
and the :class:`Case` instance is used to map each of the ``args`` in :class:`Or` to their
corresponding base validator in :class:`Case`; or
and the :class:`Case` instance is used to map each of the ``args`` in :class:`Or` to their
corresponding base validator in :class:`Case`; or
* An :class:`Or` instance with one or more base validators; or
* A :class:`list` instance with a single item which is the base validator; or
* A base validator.
@@ -549,15 +551,16 @@ class Schema(object):
})
This sample schema defines that your YAML configuration follows these rules:
* It must contain an ``application_name`` entry which value should be a :class:`str` instance;
* It must contain a ``bind.host`` entry which value should be valid as per function ``validate_host``;
* It must contain a ``bind.port`` entry which value should be an :class:`int` instance;
* It must contain a ``aliases`` entry which value should be a :class:`list` of :class:`str` instances;
* It may optionally contain a ``data_directory`` entry, with a value which should be a string;
* It must contain at least one of ``log_to_file`` or ``log_to_db``, with a value which should be a
:class:`bool` instance;
* It must contain a ``version`` entry which value should be either an :class:`int` or a :class:`float`
instance.
* It must contain an ``application_name`` entry whose value should be a :class:`str` instance;
* It must contain a ``bind.host`` entry whose value should be valid as per function ``validate_host``;
* It must contain a ``bind.port`` entry whose value should be an :class:`int` instance;
* It must contain an ``aliases`` entry whose value should be a :class:`list` of :class:`str` instances;
* It may optionally contain a ``data_directory`` entry, whose value should be a string;
* It must contain at least one of ``log_to_file`` or ``log_to_db``, with a value that should be a
:class:`bool` instance;
* It must contain a ``version`` entry whose value should be either an :class:`int` or a :class:`float`
instance.
"""
self.validator = validator
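A condensed sketch of the rules above, using an invented two-key schema:

.. code-block:: python

    from patroni.validator import Optional, Schema

    schema = Schema({
        'scope': str,            # required; value must be a string
        Optional('ttl'): int,    # optional; value must be an integer
    })

    # validate() returns a list of error strings, empty when data conforms.
    print(schema.validate({'scope': 'demo', 'ttl': 30}))  # []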
@@ -565,6 +568,7 @@ class Schema(object):
"""Perform validation of data using the rules defined in this schema.
:param data: configuration to be validated against ``validator``.
:returns: list of errors identified while validating the *data*, if any.
"""
errors: List[str] = []
@@ -579,14 +583,15 @@ class Schema(object):
It first checks that *data* argument type is compliant with the type of ``validator`` attribute.
Additionally:
* If ``validator`` attribute is a callable object, calls it to validate *data* argument. Before doing so, if
`validator` contains an ``expected_type`` attribute, check if *data* argument is compliant with that
expected type.
* If ``validator`` attribute is an iterable object (:class:`dict`, :class:`list`, :class:`Directory` or
:class:`Or`), then it iterates over it to validate each of the corresponding entries in *data* argument.
* If ``validator`` attribute is a callable object, calls it to validate *data* argument. Before doing so, if
`validator` contains an ``expected_type`` attribute, check if *data* argument is compliant with that
expected type.
* If ``validator`` attribute is an iterable object (:class:`dict`, :class:`list`, :class:`Directory` or
:class:`Or`), then it iterates over it to validate each of the corresponding entries in *data* argument.
:param data: configuration to be validated against ``validator``.
:rtype: Iterator[:class:`Result`] objects with the error message related to the failure, if any check fails.
:yields: objects with the error message related to the failure, if any check fails.
"""
self.data = data
@@ -627,7 +632,7 @@ class Schema(object):
Only :class:`dict`, :class:`list`, :class:`Directory` and :class:`Or` objects are considered iterable objects.
:rtype: Iterator[:class:`Result`] objects with the error message related to the failure, if any check fails.
:yields: objects with the error message related to the failure, if any check fails.
"""
if isinstance(self.validator, dict):
if not isinstance(self.data, dict):
@@ -655,7 +660,7 @@ class Schema(object):
def iter_dict(self) -> Iterator[Result]:
"""Iterate over a :class:`dict` based ``validator`` to validate the corresponding entries in ``data``.
:rtype: Iterator[:class:`Result`] objects with the error message related to the failure, if any check fails.
:yields: objects with the error message related to the failure, if any check fails.
"""
# One key in `validator` attribute (`key` variable) can be mapped to one or more keys in `data` attribute (`d`
# variable), depending on the `key` type.
@@ -678,12 +683,12 @@ class Schema(object):
path=(d + ("." + v.path if v.path else "")), level=v.level, data=v.data)
def iter_or(self) -> Iterator[Result]:
"""Perform all validations defined in an `Or` object for a given configuration option.
"""Perform all validations defined in an :class:`Or` object for a given configuration option.
This method can be only called against leaf nodes in the configuration tree. :class:`Or` objects defined in the
``validator`` keys will be handled by :func:`iter_dict` method.
:rtype: Iterator[:class:`Result`] objects with the error message related to the failure, if any check fails.
:yields: objects with the error message related to the failure, if any check fails.
"""
results: List[Result] = []
for a in self.validator.args:
@@ -709,7 +714,7 @@ class Schema(object):
:param key: key from the ``validator`` attribute.
:rtype: Iterator[str], keys that should be used to access corresponding value in the ``data`` attribute.
:yields: keys that should be used to access corresponding value in the ``data`` attribute.
"""
# If the key was defined as a `str` object in `validator` attribute, then it is already the final key to access
# the `data` dictionary.
@@ -736,12 +741,11 @@ class Schema(object):
def _get_type_name(python_type: Any) -> str:
"""Get a user friendly name for a given Python type.
"""Get a user-friendly name for a given Python type.
:param python_type: Python type whose user-friendly name should be taken.
Returns:
User friendly name of the given Python type.
:returns: User-friendly name of the given Python type.
"""
types: Dict[Any, str] = {str: 'a string', int: 'an integer', float: 'a number',
bool: 'a boolean', list: 'an array', dict: 'a dictionary'}
@@ -762,11 +766,11 @@ def assert_(condition: bool, message: str = "Wrong value") -> None:
class IntValidator(object):
"""Validate an integer setting.
:cvar expected_type: the expect Python type for an integer setting (:class:`int`).
:cvar expected_type: the expected Python type for an integer setting (:class:`int`).
:ivar min: minimum allowed value for the setting, if any.
:ivar max: maximum allowed value for the setting, if any.
:ivar base_unit: the base unit to convert the value to before checking if it's within `min` and `max` range.
:ivar raise_assert: if an ``assert`` call should be performed regarding expected type and valid range.
:ivar base_unit: the base unit to convert the value to before checking if it's within *min* and *max* range.
:ivar raise_assert: if an ``assert`` test should be performed regarding expected type and valid range.
"""
expected_type = int
@@ -778,7 +782,7 @@ class IntValidator(object):
:param min: minimum allowed value for the setting, if any.
:param max: maximum allowed value for the setting, if any.
:param base_unit: the base unit to convert the value to before checking if it's within *min* and *max* range.
:param raise_assert: if an ``assert`` call should be performed regarding expected type and valid range.
:param raise_assert: if an ``assert`` test should be performed regarding expected type and valid range.
"""
self.min = min
self.max = max
@@ -789,11 +793,13 @@ class IntValidator(object):
"""Check if *value* is a valid integer and within the expected range.
.. note::
If ``raise_assert`` is ``True`` and *value* is not valid, then an ``AssertionError`` will be triggered.
If ``raise_assert`` is ``True`` and *value* is not valid, then an :class:`AssertionError` will be triggered.
:param value: value to be checked against the rules defined for this :class:`IntValidator` instance.
:returns: ``True`` if *value* is valid and within the expected range.
"""
value = parse_int(value, self.base_unit) or ""
value = parse_int(value, self.base_unit)
ret = isinstance(value, int)\
and (self.min is None or value >= self.min)\
and (self.max is None or value <= self.max)
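For instance, a sketch of this validator in isolation (the bounds are illustrative):

.. code-block:: python

    from patroni.validator import IntValidator

    # Accept TCP ports; with raise_assert=False an invalid value simply
    # makes the call return False instead of raising AssertionError.
    port_validator = IntValidator(min=1, max=65535, raise_assert=False)
    print(port_validator(5432))   # True
    print(port_validator(70000))  # False, above max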

View File

@@ -2,4 +2,4 @@
:var __version__: the current Patroni version.
"""
__version__ = '3.0.4'
__version__ = '3.1.0'

5
requirements.docs.txt Normal file
View File

@@ -0,0 +1,5 @@
sphinx>=4
sphinx_rtd_theme
sphinxcontrib-apidoc
sphinx-github-style
pyyaml

View File

@@ -118,6 +118,21 @@ class MockCursor(object):
'"state":"streaming","sync_state":"async","sync_priority":0}]'
now = datetime.datetime.now(tzutc)
self.results = [(now, 0, '', 0, '', False, now, 'streaming', None, replication_info)]
elif sql.startswith('SELECT name, current_setting(name) FROM pg_settings'):
self.results = [('data_directory', 'data'),
('hba_file', os.path.join('data', 'pg_hba.conf')),
('ident_file', os.path.join('data', 'pg_ident.conf')),
('max_connections', 42),
('max_locks_per_transaction', 73),
('max_prepared_transactions', 0),
('max_replication_slots', 21),
('max_wal_senders', 37),
('track_commit_timestamp', 'off'),
('wal_level', 'replica'),
('listen_addresses', '6.6.6.6'),
('port', 1984),
('archive_command', 'my archive command'),
('cluster_name', 'my_cluster')]
elif sql.startswith('SELECT name, setting'):
self.results = [('wal_segment_size', '2048', '8kB', 'integer', 'internal'),
('wal_block_size', '8192', None, 'integer', 'internal'),
@@ -163,11 +178,20 @@ class MockCursor(object):
pass
class MockConnectionInfo(object):
def parameter_status(self, param_name):
if param_name == 'is_superuser':
return 'on'
return '0'
class MockConnect(object):
server_version = 99999
autocommit = False
closed = 0
info = MockConnectionInfo()
def cursor(self):
return MockCursor(self)

View File

@@ -36,6 +36,7 @@ class MockPostgresql(object):
pending_restart = True
wal_name = 'wal'
lsn_name = 'lsn'
wal_flush = '_flush'
POSTMASTER_START_TIME = 'pg_catalog.pg_postmaster_start_time()'
TL_LSN = 'CASE WHEN pg_catalog.pg_is_in_recovery()'
citus_handler = Mock()

View File

@@ -85,6 +85,7 @@ class TestConfig(unittest.TestCase):
@patch('os.path.exists', Mock(return_value=True))
@patch('os.remove', Mock(side_effect=IOError))
@patch('os.close', Mock(side_effect=IOError))
@patch('os.chmod', Mock())
@patch('shutil.move', Mock(return_value=None))
@patch('json.dump', Mock())
def test_save_cache(self):

View File

@@ -0,0 +1,334 @@
import os
import psutil
import socket
import unittest
from . import MockConnect, MockCursor, MockConnectionInfo
from copy import deepcopy
from mock import MagicMock, Mock, PropertyMock, mock_open, patch
from patroni.__main__ import main as _main
from patroni.config import Config
from patroni.config_generator import AbstractConfigGenerator, get_address
from patroni.utils import patch_config
from . import psycopg_connect
@patch('patroni.psycopg.connect', psycopg_connect)
@patch('socket.getaddrinfo', Mock(return_value=[(0, 0, 0, 0, ('1.9.8.4', 1984))]))
@patch('builtins.open', MagicMock())
@patch('subprocess.check_output', Mock(return_value=b"postgres (PostgreSQL) 16.2"))
@patch('psutil.Process.exe', Mock(return_value='/bin/dir/from/running/postgres'))
@patch('psutil.Process.__init__', Mock(return_value=None))
class TestGenerateConfig(unittest.TestCase):
no_value_msg = '#FIXME'
_HOSTNAME = socket.gethostname()
_IP = sorted(socket.getaddrinfo(_HOSTNAME, 0, socket.AF_UNSPEC, socket.SOCK_STREAM, 0), key=lambda x: x[0])[0][4][0]
def setUp(self):
self.maxDiff = None
os.environ['PATRONI_SCOPE'] = 'scope_from_env'
os.environ['PATRONI_POSTGRESQL_BIN_DIR'] = '/bin/from/env'
os.environ['PATRONI_SUPERUSER_USERNAME'] = 'su_user_from_env'
os.environ['PATRONI_SUPERUSER_PASSWORD'] = 'su_pwd_from_env'
os.environ['PATRONI_REPLICATION_USERNAME'] = 'repl_user_from_env'
os.environ['PATRONI_REPLICATION_PASSWORD'] = 'repl_pwd_from_env'
os.environ['PATRONI_REWIND_USERNAME'] = 'rewind_user_from_env'
os.environ['PGUSER'] = 'pguser_from_env'
os.environ['PGPASSWORD'] = 'pguser_pwd_from_env'
os.environ['PATRONI_RESTAPI_CONNECT_ADDRESS'] = 'localhost:8080'
os.environ['PATRONI_RESTAPI_LISTEN'] = 'localhost:8080'
os.environ['PATRONI_POSTGRESQL_BIN_POSTGRES'] = 'custom_postgres_bin_from_env'
self.environ = deepcopy(os.environ)
dynamic_config = Config.get_default_config()
dynamic_config['postgresql']['parameters'] = dict(dynamic_config['postgresql']['parameters'])
del dynamic_config['standby_cluster']
dynamic_config['postgresql']['parameters']['wal_keep_segments'] = 8
dynamic_config['postgresql']['use_pg_rewind'] = True
self.config = {
'scope': self.environ['PATRONI_SCOPE'],
'name': self._HOSTNAME,
'bootstrap': {
'dcs': dynamic_config
},
'postgresql': {
'connect_address': self.no_value_msg + ':5432',
'data_dir': self.no_value_msg,
'listen': self.no_value_msg + ':5432',
'pg_hba': ['host all all all md5',
f'host replication {self.environ["PATRONI_REPLICATION_USERNAME"]} all md5'],
'authentication': {'superuser': {'username': self.environ['PATRONI_SUPERUSER_USERNAME'],
'password': self.environ['PATRONI_SUPERUSER_PASSWORD']},
'replication': {'username': self.environ['PATRONI_REPLICATION_USERNAME'],
'password': self.environ['PATRONI_REPLICATION_PASSWORD']},
'rewind': {'username': self.environ['PATRONI_REWIND_USERNAME']}},
'bin_dir': self.environ['PATRONI_POSTGRESQL_BIN_DIR'],
'bin_name': {'postgres': self.environ['PATRONI_POSTGRESQL_BIN_POSTGRES']},
'parameters': {'password_encryption': 'md5'}
},
'restapi': {
'connect_address': self.environ['PATRONI_RESTAPI_CONNECT_ADDRESS'],
'listen': self.environ['PATRONI_RESTAPI_LISTEN']
}
}
def _set_running_instance_config_vals(self):
# values are taken from tests/__init__.py
conf = {
'scope': 'my_cluster',
'bootstrap': {
'dcs': {
'postgresql': {
'parameters': {
'max_connections': 42,
'max_locks_per_transaction': 73,
'max_replication_slots': 21,
'max_wal_senders': 37,
'wal_level': 'replica',
'wal_keep_segments': None
},
'use_pg_rewind': None
}
}
},
'postgresql': {
'connect_address': f'{self._IP}:bar',
'listen': '6.6.6.6:1984',
'data_dir': 'data',
'bin_dir': '/bin/dir/from/running',
'parameters': {
'archive_command': 'my archive command',
'hba_file': os.path.join('data', 'pg_hba.conf'),
'ident_file': os.path.join('data', 'pg_ident.conf'),
'password_encryption': None
},
'authentication': {
'superuser': {
'username': 'foobar',
'password': 'qwerty',
'channel_binding': 'prefer',
'gssencmode': 'prefer',
'sslmode': 'prefer'
},
'replication': {
'username': self.no_value_msg,
'password': self.no_value_msg
},
'rewind': None
},
}
}
patch_config(self.config, conf)
def _get_running_instance_open_res(self):
hba_content = '\n'.join(self.config['postgresql']['pg_hba'] + ['#host all all all md5',
' host all all all md5',
'',
'hostall all all md5'])
ident_content = '\n'.join(['# something very interesting', ' '])
self.config['postgresql']['pg_hba'] += ['host all all all md5']
return [
mock_open(read_data=hba_content)(),
mock_open(read_data=ident_content)(),
mock_open(read_data='1984')(),
mock_open()()
]
@patch('os.makedirs')
@patch('yaml.safe_dump')
def test_generate_sample_config_pre_13_dir_creation(self, mock_config_dump, mock_makedir):
with patch('sys.argv', ['patroni.py', '--generate-sample-config', '/foo/bar.yml']), \
patch('subprocess.check_output', Mock(return_value=b"postgres (PostgreSQL) 9.4.3")) as pg_bin_mock, \
self.assertRaises(SystemExit) as e:
_main()
self.assertEqual(e.exception.code, 0)
self.assertEqual(self.config, mock_config_dump.call_args[0][0])
mock_makedir.assert_called_once()
pg_bin_mock.assert_called_once_with([os.path.join(self.environ['PATRONI_POSTGRESQL_BIN_DIR'],
self.environ['PATRONI_POSTGRESQL_BIN_POSTGRES']),
'--version'])
@patch('os.makedirs', Mock())
@patch('yaml.safe_dump')
def test_generate_sample_config_16(self, mock_config_dump):
conf = {
'bootstrap': {
'dcs': {
'postgresql': {
'parameters': {
'wal_keep_size': '128MB',
'wal_keep_segments': None
},
}
}
},
'postgresql': {
'parameters': {
'password_encryption': 'scram-sha-256'
},
'pg_hba': ['host all all all scram-sha-256',
f'host replication {self.environ["PATRONI_REPLICATION_USERNAME"]} all scram-sha-256'],
'authentication': {
'rewind': {
'username': self.environ['PATRONI_REWIND_USERNAME'],
'password': self.no_value_msg}
},
}
}
patch_config(self.config, conf)
with patch('sys.argv', ['patroni.py', '--generate-sample-config', '/foo/bar.yml']), \
self.assertRaises(SystemExit) as e:
_main()
self.assertEqual(e.exception.code, 0)
self.assertEqual(self.config, mock_config_dump.call_args[0][0])
@patch('os.makedirs', Mock())
@patch('yaml.safe_dump')
def test_generate_config_running_instance_16(self, mock_config_dump):
self._set_running_instance_config_vals()
with patch('builtins.open', Mock(side_effect=self._get_running_instance_open_res())), \
patch('sys.argv', ['patroni.py', '--generate-config',
'--dsn', 'host=foo port=bar user=foobar password=qwerty']), \
self.assertRaises(SystemExit) as e:
_main()
self.assertEqual(e.exception.code, 0)
self.assertEqual(self.config, mock_config_dump.call_args[0][0])
@patch('os.makedirs', Mock())
@patch('yaml.safe_dump')
def test_generate_config_running_instance_16_connect_from_env(self, mock_config_dump):
self._set_running_instance_config_vals()
# su auth params and connect host from env
os.environ['PGCHANNELBINDING'] = \
self.config['postgresql']['authentication']['superuser']['channel_binding'] = 'disable'
conf = {
'scope': 'my_cluster',
'bootstrap': {
'dcs': {
'postgresql': {
'parameters': {
'max_connections': 42,
'max_locks_per_transaction': 73,
'max_replication_slots': 21,
'max_wal_senders': 37,
'wal_level': 'replica',
'wal_keep_segments': None
},
'use_pg_rewind': None
}
}
},
'postgresql': {
'connect_address': f'{self._IP}:1984',
'authentication': {
'superuser': {
'username': self.environ['PGUSER'],
'password': self.environ['PGPASSWORD'],
'gssencmode': None,
'sslmode': None
},
},
}
}
patch_config(self.config, conf)
with patch('builtins.open', Mock(side_effect=self._get_running_instance_open_res())), \
patch('sys.argv', ['patroni.py', '--generate-config']), \
patch.object(MockConnect, 'server_version', PropertyMock(return_value=160000)), \
self.assertRaises(SystemExit) as e:
_main()
self.assertEqual(e.exception.code, 0)
self.assertEqual(self.config, mock_config_dump.call_args[0][0])
def test_generate_config_running_instance_errors(self):
# 1. Wrong DSN format
with patch('sys.argv', ['patroni.py', '--generate-config', '--dsn', 'host:foo port:bar user:foobar']), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Failed to parse DSN string', e.exception.code)
# 2. User is not a superuser
with patch('sys.argv', ['patroni.py',
'--generate-config', '--dsn', 'host=foo port=bar user=foobar password=pwd_from_dsn']), \
patch.object(MockCursor, 'rowcount', PropertyMock(return_value=0), create=True), \
patch.object(MockConnectionInfo, 'parameter_status', Mock(return_value='off')), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('The provided user does not have superuser privilege', e.exception.code)
# 3. Error while calling postgres --version
with patch('subprocess.check_output', Mock(side_effect=OSError)), \
patch('sys.argv', ['patroni.py', '--generate-sample-config']), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Failed to get postgres version:', e.exception.code)
with patch('sys.argv', ['patroni.py', '--generate-config']):
# 4. empty postmaster.pid
with patch('builtins.open', Mock(side_effect=[mock_open(read_data='hba_content')(),
mock_open(read_data='ident_content')(),
mock_open(read_data='')()])), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Failed to obtain postmaster pid from postmaster.pid file', e.exception.code)
# 5. Failed to open postmaster.pid
with patch('builtins.open', Mock(side_effect=[mock_open(read_data='hba_content')(),
mock_open(read_data='ident_content')(),
OSError])), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Error while reading postmaster.pid file', e.exception.code)
# 6. Invalid postmaster pid
with patch('builtins.open', Mock(side_effect=[mock_open(read_data='hba_content')(),
mock_open(read_data='ident_content')(),
mock_open(read_data='1984')()])), \
patch('psutil.Process.__init__', Mock(return_value=None)), \
patch('psutil.Process.exe', Mock(side_effect=psutil.NoSuchProcess(1984))), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn("Obtained postmaster pid doesn't exist", e.exception.code)
# 7. Failed to open pg_hba
with patch('builtins.open', Mock(side_effect=OSError)), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Failed to read pg_hba.conf', e.exception.code)
# 8. Failed to open pg_ident
with patch('builtins.open', Mock(side_effect=[mock_open(read_data='hba_content')(), OSError])), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Failed to read pg_ident.conf', e.exception.code)
# 9. Failed PG connection
from . import psycopg
with patch('patroni.psycopg.connect', side_effect=psycopg.Error), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Failed to establish PostgreSQL connection', e.exception.code)
# 10. An unexpected error
with patch.object(AbstractConfigGenerator, '__init__', side_effect=psycopg.Error), \
self.assertRaises(SystemExit) as e:
_main()
self.assertIn('Unexpected exception', e.exception.code)
def test_get_address(self):
with patch('socket.getaddrinfo', Mock(side_effect=Exception)), \
patch('logging.warning') as mock_warning:
self.assertEqual(get_address(), (self.no_value_msg, self.no_value_msg))
self.assertIn('Failed to obtain address: %r', mock_warning.call_args_list[0][0])

View File

@@ -197,9 +197,10 @@ class TestConsul(unittest.TestCase):
@patch.object(consul.Consul.KV, 'delete', Mock(return_value=True))
def test_delete_leader(self):
self.c.delete_leader()
leader = self.c.get_cluster().leader
self.c.delete_leader(leader)
self.c._name = 'other'
self.c.delete_leader()
self.c.delete_leader(leader)
@patch.object(consul.Consul.KV, 'put', Mock(return_value=True))
def test_initialize(self):

View File

@@ -313,7 +313,7 @@ class TestEtcd(unittest.TestCase):
self.assertFalse(self.etcd.cancel_initialization())
def test_delete_leader(self):
self.assertFalse(self.etcd.delete_leader())
self.assertFalse(self.etcd.delete_leader(self.etcd.get_cluster().leader))
def test_delete_cluster(self):
self.assertFalse(self.etcd.delete_cluster())

View File

@@ -298,9 +298,10 @@ class TestEtcd3(BaseTestEtcd3):
self.etcd3.cancel_initialization()
def test_delete_leader(self):
self.etcd3.delete_leader()
leader = self.etcd3.get_cluster().leader
self.etcd3.delete_leader(leader)
self.etcd3._name = 'other'
self.etcd3.delete_leader()
self.etcd3.delete_leader(leader)
def test_delete_cluster(self):
self.etcd3.delete_cluster()
@@ -312,7 +313,7 @@ class TestEtcd3(BaseTestEtcd3):
self.etcd3.set_sync_state_value('', 1)
def test_delete_sync_state(self):
self.etcd3.delete_sync_state()
self.etcd3.delete_sync_state('1')
def test_watch(self):
self.etcd3.set_ttl(10)

33
tests/test_file_perm.py Normal file
View File

@@ -0,0 +1,33 @@
import unittest
import stat
from mock import Mock, patch
from patroni.file_perm import pg_perm
class TestFilePermissions(unittest.TestCase):
@patch('os.stat')
@patch('os.umask')
@patch('patroni.file_perm.logger.error')
def test_set_umask(self, mock_logger, mock_umask, mock_stat):
mock_umask.side_effect = Exception
mock_stat.return_value.st_mode = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP
pg_perm.set_permissions_from_data_directory('test')
# umask is called with PG_MODE_MASK_GROUP
self.assertEqual(mock_umask.call_args[0][0], stat.S_IWGRP | stat.S_IRWXO)
self.assertEqual(mock_logger.call_args[0][0], 'Can not set umask to %03o: %r')
mock_umask.reset_mock()
mock_stat.return_value.st_mode = stat.S_IRWXU
pg_perm.set_permissions_from_data_directory('test')
# umask is called with PG_MODE_MASK_OWNER (permissions changed from group to owner)
self.assertEqual(mock_umask.call_args[0][0], stat.S_IRWXG | stat.S_IRWXO)
@patch('os.stat', Mock(side_effect=FileNotFoundError))
@patch('patroni.file_perm.logger.error')
def test_set_permissions_from_data_directory(self, mock_logger):
pg_perm.set_permissions_from_data_directory('test')
self.assertEqual(mock_logger.call_args[0][0], 'Can not check permissions on %s: %r')
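The assertions above follow PostgreSQL's group-access convention; below is a rough paraphrase of the umask choice being tested, not the module's actual code:

.. code-block:: python

    import os
    import stat

    def umask_for_data_directory(data_dir: str) -> int:
        # Group read/execute on the data directory allows group access
        # (umask 027); otherwise everything is restricted to the owner
        # (umask 077), matching the two assertions in test_set_umask.
        mode = os.stat(data_dir).st_mode
        if mode & (stat.S_IRGRP | stat.S_IXGRP):
            return stat.S_IWGRP | stat.S_IRWXO   # 0o027
        return stat.S_IRWXG | stat.S_IRWXO       # 0o077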

View File

@@ -162,7 +162,7 @@ def run_async(self, func, args=()):
@patch.object(Postgresql, 'is_running', Mock(return_value=MockPostmaster()))
@patch.object(Postgresql, 'is_leader', Mock(return_value=True))
@patch.object(Postgresql, 'is_primary', Mock(return_value=True))
@patch.object(Postgresql, 'timeline_wal_position', Mock(return_value=(1, 10, 1)))
@patch.object(Postgresql, '_cluster_info_state_get', Mock(return_value=10))
@patch.object(Postgresql, 'data_directory_empty', Mock(return_value=False))
@@ -224,7 +224,7 @@ class TestHa(PostgresInit):
@patch.object(Postgresql, 'received_timeline', Mock(return_value=None))
def test_touch_member(self):
self.p._major_version = 110000
self.p.is_leader = false
self.p.is_primary = false
self.p.timeline_wal_position = Mock(return_value=(0, 1, 0))
self.p.replica_cached_timeline = Mock(side_effect=Exception)
with patch.object(Postgresql, '_cluster_info_state_get', Mock(return_value='streaming')):
@@ -320,7 +320,7 @@ class TestHa(PostgresInit):
@patch.object(Rewind, 'rewind_or_reinitialize_needed_and_possible', Mock(return_value=True))
@patch.object(Rewind, 'can_rewind', PropertyMock(return_value=True))
def test_crash_recovery_before_rewind(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.is_running = false
self.p.controldata = lambda: {'Database cluster state': 'in archive recovery',
'Database system identifier': SYSID}
@@ -365,7 +365,7 @@ class TestHa(PostgresInit):
@patch.object(Cluster, 'is_unlocked', Mock(return_value=False))
def test_start_as_readonly(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.is_healthy = true
self.ha.has_lock = true
self.p.controldata = lambda: {'Database cluster state': 'in production', 'Database system identifier': SYSID}
@@ -383,11 +383,11 @@ class TestHa(PostgresInit):
def test_promoted_by_acquiring_lock(self):
self.ha.is_healthiest_node = true
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'promoted self to leader by acquiring session lock')
def test_promotion_cancelled_after_pre_promote_failed(self):
self.p.is_leader = false
self.p.is_primary = false
self.p._pre_promote = false
self.ha._is_healthiest_node = true
self.assertEqual(self.ha.run_cycle(), 'promoted self to leader by acquiring session lock')
@@ -402,7 +402,7 @@ class TestHa(PostgresInit):
@patch.object(Cluster, 'is_unlocked', Mock(return_value=False))
def test_long_promote(self):
self.ha.has_lock = true
self.p.is_leader = false
self.p.is_primary = false
self.p.set_role('primary')
self.assertEqual(self.ha.run_cycle(), 'no action. I am (postgresql0), the leader with the lock')
@@ -413,7 +413,7 @@ class TestHa(PostgresInit):
def test_follow_new_leader_after_failing_to_obtain_lock(self):
self.ha.is_healthiest_node = true
self.ha.acquire_lock = false
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'following new leader after trying and failing to obtain lock')
def test_demote_because_not_healthiest(self):
@@ -422,21 +422,20 @@ class TestHa(PostgresInit):
def test_follow_new_leader_because_not_healthiest(self):
self.ha.is_healthiest_node = false
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'following a different leader because i am not the healthiest node')
@patch.object(Cluster, 'is_unlocked', Mock(return_value=False))
def test_promote_because_have_lock(self):
self.ha.has_lock = true
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'promoted self to leader because I had the session lock')
def test_promote_without_watchdog(self):
self.ha.has_lock = true
self.p.is_leader = true
with patch.object(Watchdog, 'activate', Mock(return_value=False)):
self.assertEqual(self.ha.run_cycle(), 'Demoting self because watchdog could not be activated')
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'Not promoting self because watchdog could not be activated')
def test_leader_with_lock(self):
@@ -462,12 +461,12 @@ class TestHa(PostgresInit):
self.assertEqual(self.ha.run_cycle(), 'demoted self because failed to update leader lock in DCS')
with patch.object(Ha, '_get_node_to_follow', Mock(side_effect=DCSError('foo'))):
self.assertEqual(self.ha.run_cycle(), 'demoted self because failed to update leader lock in DCS')
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'not promoting because failed to update leader lock in DCS')
@patch.object(Cluster, 'is_unlocked', Mock(return_value=False))
def test_follow(self):
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'no action. I am (postgresql0), a secondary, and following a leader ()')
self.ha.patroni.replicatefrom = "foo"
self.p.config.check_recovery_conf = Mock(return_value=(True, False))
@@ -484,13 +483,13 @@ class TestHa(PostgresInit):
def test_follow_in_pause(self):
self.ha.is_paused = true
self.assertEqual(self.ha.run_cycle(), 'PAUSE: continue to run as primary without lock')
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'PAUSE: no action. I am (postgresql0)')
@patch.object(Rewind, 'rewind_or_reinitialize_needed_and_possible', Mock(return_value=True))
@patch.object(Rewind, 'can_rewind', PropertyMock(return_value=True))
def test_follow_triggers_rewind(self):
self.p.is_leader = false
self.p.is_primary = false
self.ha._rewind.trigger_check_diverged_lsn()
self.ha.cluster = get_cluster_initialized_with_leader()
self.assertEqual(self.ha.run_cycle(), 'running pg_rewind from leader')
@@ -544,7 +543,7 @@ class TestHa(PostgresInit):
self.ha.global_config = self.ha.patroni.config.get_global_config(self.ha.cluster)
self.ha.update_failsafe({'name': 'leader', 'api_url': 'http://127.0.0.1:8008/patroni',
'conn_url': 'postgres://127.0.0.1:5432/postgres', 'slots': {'foo': 1000}})
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'DCS is not accessible')
def test_no_dcs_connection_replica_failsafe_not_enabled_but_active(self):
@@ -552,7 +551,7 @@ class TestHa(PostgresInit):
self.ha.cluster = get_cluster_initialized_with_leader()
self.ha.update_failsafe({'name': 'leader', 'api_url': 'http://127.0.0.1:8008/patroni',
'conn_url': 'postgres://127.0.0.1:5432/postgres', 'slots': {'foo': 1000}})
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'DCS is not accessible')
def test_update_failsafe(self):
@@ -591,9 +590,9 @@ class TestHa(PostgresInit):
self.ha.cluster = get_cluster_not_initialized_without_leader()
self.e.initialize = true
self.assertEqual(self.ha.bootstrap(), 'trying to bootstrap a new cluster')
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(), 'waiting for end of recovery after bootstrap')
self.p.is_leader = true
self.p.is_primary = true
self.ha.is_synchronous_mode = true
self.assertEqual(self.ha.run_cycle(), 'running post_bootstrap')
self.assertEqual(self.ha.run_cycle(), 'initialized a new cluster')
@@ -613,7 +612,6 @@ class TestHa(PostgresInit):
self.ha.cluster = get_cluster_not_initialized_without_leader()
self.e.initialize = true
self.ha.bootstrap()
self.p.is_leader = true
with patch.object(Watchdog, 'activate', Mock(return_value=False)), \
patch('patroni.ha.logger.error') as mock_logger:
self.assertEqual(self.ha.post_bootstrap(), 'running post_bootstrap')
@@ -745,10 +743,8 @@ class TestHa(PostgresInit):
self.assertEqual('PAUSE: no action. I am (postgresql0), the leader with the lock', self.ha.run_cycle())
def test_manual_failover_from_leader_in_synchronous_mode(self):
self.p.is_leader = true
self.ha.has_lock = true
self.ha.is_synchronous_mode = true
self.ha.is_failover_possible = false
self.ha.process_sync_replication = Mock()
self.ha.cluster = get_cluster_initialized_with_leader(Failover(0, self.p.name, 'a', None), (self.p.name, None))
self.assertEqual('no action. I am (postgresql0), the leader with the lock', self.ha.run_cycle())
@@ -757,7 +753,7 @@ class TestHa(PostgresInit):
self.assertEqual('manual failover: demoting myself', self.ha.run_cycle())
def test_manual_failover_process_no_leader(self):
self.p.is_leader = false
self.p.is_primary = false
self.ha.cluster = get_cluster_initialized_without_leader(failover=Failover(0, '', self.p.name, None))
self.ha.cluster = get_cluster_initialized_without_leader(failover=Failover(0, '', 'leader', None))
self.p.set_role('replica')
@@ -781,7 +777,7 @@ class TestHa(PostgresInit):
def test_manual_failover_process_no_leader_in_synchronous_mode(self):
self.ha.is_synchronous_mode = true
self.p.is_leader = false
self.p.is_primary = false
# switchover to a specific node, which name doesn't match our name (postgresql0)
self.ha.cluster = get_cluster_initialized_without_leader(failover=Failover(0, 'leader', 'other', None))
@@ -841,14 +837,14 @@ class TestHa(PostgresInit):
self.assertEqual(self.ha.run_cycle(), 'PAUSE: continue to run as primary without lock')
self.ha.cluster = get_cluster_initialized_without_leader(failover=Failover(0, 'leader', 'blabla', None))
self.assertEqual('PAUSE: acquired session lock as a leader', self.ha.run_cycle())
self.p.is_leader = false
self.p.is_primary = false
self.p.set_role('replica')
self.ha.cluster = get_cluster_initialized_without_leader(failover=Failover(0, 'leader', self.p.name, None))
self.assertEqual(self.ha.run_cycle(), 'PAUSE: promoted self to leader by acquiring session lock')
def test_is_healthiest_node(self):
self.ha.is_failsafe_mode = true
self.ha.state_handler.is_leader = false
self.p.is_primary = false
self.ha.patroni.nofailover = False
self.ha.fetch_node_status = get_node_status()
self.ha.dcs._last_failsafe = {'foo': ''}
@@ -862,7 +858,7 @@ class TestHa(PostgresInit):
self.assertFalse(self.ha.is_healthiest_node())
def test__is_healthiest_node(self):
self.p.is_leader = false
self.p.is_primary = false
self.ha.cluster = get_cluster_initialized_without_leader(sync=('postgresql1', self.p.name))
self.ha.global_config = self.ha.patroni.config.get_global_config(self.ha.cluster)
self.assertTrue(self.ha._is_healthiest_node(self.ha.old_cluster.members))
@@ -961,7 +957,7 @@ class TestHa(PostgresInit):
self.assertTrue(self.ha.restart_matches("replica", "9.5.2", False))
def test_process_healthy_cluster_in_pause(self):
self.p.is_leader = false
self.p.is_primary = false
self.ha.is_paused = true
self.p.name = 'leader'
self.ha.cluster = get_cluster_initialized_with_leader()
@@ -972,7 +968,7 @@ class TestHa(PostgresInit):
@patch('patroni.postgresql.mtime', Mock(return_value=1588316884))
@patch('builtins.open', mock_open(read_data='1\t0/40159C0\tno recovery target specified\n'))
def test_process_healthy_standby_cluster_as_standby_leader(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.name = 'leader'
self.ha.cluster = get_standby_cluster_initialized_with_only_leader()
self.p.config.check_recovery_conf = Mock(return_value=(False, False))
@@ -984,7 +980,7 @@ class TestHa(PostgresInit):
self.assertEqual(self.ha.run_cycle(), 'promoted self to a standby leader because i had the session lock')
def test_process_healthy_standby_cluster_as_cascade_replica(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.name = 'replica'
self.ha.cluster = get_standby_cluster_initialized_with_only_leader()
self.assertEqual(self.ha.run_cycle(),
@@ -994,7 +990,7 @@ class TestHa(PostgresInit):
@patch.object(Cluster, 'is_unlocked', Mock(return_value=True))
def test_process_unhealthy_standby_cluster_as_standby_leader(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.name = 'leader'
self.ha.cluster = get_standby_cluster_initialized_with_only_leader()
self.ha.sysid_valid = true
@@ -1004,13 +1000,13 @@ class TestHa(PostgresInit):
@patch.object(Rewind, 'rewind_or_reinitialize_needed_and_possible', Mock(return_value=True))
@patch.object(Rewind, 'can_rewind', PropertyMock(return_value=True))
def test_process_unhealthy_standby_cluster_as_cascade_replica(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.name = 'replica'
self.ha.cluster = get_standby_cluster_initialized_with_only_leader()
self.assertTrue(self.ha.run_cycle().startswith('running pg_rewind from remote_member:'))
def test_recover_unhealthy_leader_in_standby_cluster(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.name = 'leader'
self.p.is_running = false
self.p.follow = false
@@ -1019,7 +1015,7 @@ class TestHa(PostgresInit):
@patch.object(Cluster, 'is_unlocked', Mock(return_value=True))
def test_recover_unhealthy_unlocked_standby_cluster(self):
self.p.is_leader = false
self.p.is_primary = false
self.p.name = 'leader'
self.p.is_running = false
self.p.follow = false
@@ -1079,7 +1075,7 @@ class TestHa(PostgresInit):
check_calls([(update_lock, True), (demote, True)])
self.ha.has_lock = false
self.p.is_leader = false
self.p.is_primary = false
self.assertEqual(self.ha.run_cycle(),
'no action. I am (postgresql0), a secondary, and following a leader (leader)')
check_calls([(update_lock, False), (demote, False)])
@@ -1213,7 +1209,7 @@ class TestHa(PostgresInit):
self.ha.is_synchronous_mode = true
mock_set_sync = self.p.sync_handler.set_synchronous_standby_names = Mock()
self.p.is_leader = false
self.p.is_primary = false
self.p.set_role('replica')
self.ha.has_lock = true
mock_write_sync = self.ha.dcs.write_sync_state = Mock(return_value=SyncState.empty())
@@ -1236,7 +1232,7 @@ class TestHa(PostgresInit):
def test_unhealthy_sync_mode(self):
self.ha.is_synchronous_mode = true
self.p.is_leader = false
self.p.is_primary = false
self.p.set_role('replica')
self.p.name = 'other'
self.ha.cluster = get_cluster_initialized_without_leader(sync=('leader', 'other2'))
@@ -1267,7 +1263,7 @@ class TestHa(PostgresInit):
self.ha.is_synchronous_mode = true
self.p.name = 'other'
self.p.is_leader = false
self.p.is_primary = false
self.p.set_role('replica')
mock_restart = self.p.restart = Mock(return_value=True)
self.ha.cluster = get_cluster_initialized_with_leader(sync=('leader', 'other'))
@@ -1298,6 +1294,19 @@ class TestHa(PostgresInit):
mock_restart.assert_called_once()
self.ha.dcs.get_cluster.assert_not_called()
def test_enable_synchronous_mode(self):
self.ha.is_synchronous_mode = true
self.ha.has_lock = true
self.p.name = 'leader'
self.ha.dcs.write_sync_state = Mock(return_value=SyncState.empty())
with patch('patroni.ha.logger.info') as mock_logger:
self.ha.run_cycle()
self.assertEqual(mock_logger.call_args[0][0], 'Enabled synchronous replication')
self.ha.dcs.write_sync_state = Mock(return_value=None)
with patch('patroni.ha.logger.warning') as mock_logger:
self.ha.run_cycle()
self.assertEqual(mock_logger.call_args[0][0], 'Updating sync state failed')
def test_effective_tags(self):
self.ha._disable_sync = True
self.assertEqual(self.ha.get_effective_tags(), {'foo': 'bar', 'nosync': True})
@@ -1370,7 +1379,7 @@ class TestHa(PostgresInit):
@patch('sys.exit', return_value=1)
def test_abort_join(self, exit_mock):
self.ha.cluster = get_cluster_not_initialized_without_leader()
self.p.is_leader = false
self.p.is_primary = false
self.ha.run_cycle()
exit_mock.assert_called_once_with(1)
@@ -1421,6 +1430,7 @@ class TestHa(PostgresInit):
@patch('os.open', Mock())
@patch('os.fsync', Mock())
@patch('os.close', Mock())
@patch('os.chmod', Mock())
@patch('os.rename', Mock())
@patch('patroni.postgresql.Postgresql.is_starting', Mock(return_value=False))
@patch('builtins.open', mock_open())
@@ -1429,7 +1439,7 @@ class TestHa(PostgresInit):
@patch.object(SlotsHandler, 'sync_replication_slots', Mock(return_value=['ls']))
def test_follow_copy(self):
self.ha.cluster.config.data['slots'] = {'ls': {'database': 'a', 'plugin': 'b'}}
self.p.is_leader = false
self.p.is_primary = false
self.assertTrue(self.ha.run_cycle().startswith('Copying logical slots'))
def test_acquire_lock(self):
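A note on the bare `false` in the assignments above: it is the module-level stub that test_ha.py defines (a callable returning `False`), not the boolean, so the assignment replaces the mocked method with something that can still be called. A minimal sketch of the pattern, with the helper redefined locally:

    from unittest.mock import Mock

    def false(*args, **kwargs):
        return False

    p = Mock()
    p.is_primary = false            # the node now reports itself as a replica
    assert p.is_primary() is False  # still callable, unlike assigning plain False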

View File

@@ -306,25 +306,25 @@ class TestKubernetesConfigMaps(BaseTestKubernetes):
self.k.touch_member({'state': 'running', 'role': 'replica'})
mock_patch_namespaced_pod.assert_called()
self.assertEqual(mock_patch_namespaced_pod.call_args.args[2].metadata.labels['isMaster'], 'false')
self.assertEqual(mock_patch_namespaced_pod.call_args.args[2].metadata.labels['tmp_role'], 'replica')
self.assertEqual(mock_patch_namespaced_pod.call_args[0][2].metadata.labels['isMaster'], 'false')
self.assertEqual(mock_patch_namespaced_pod.call_args[0][2].metadata.labels['tmp_role'], 'replica')
self.k.touch_member({'state': 'running', 'role': 'standby-leader'})
mock_patch_namespaced_pod.assert_called()
self.assertEqual(mock_patch_namespaced_pod.call_args.args[2].metadata.labels['isMaster'], 'false')
self.assertEqual(mock_patch_namespaced_pod.call_args.args[2].metadata.labels['tmp_role'], 'standby-leader')
self.assertEqual(mock_patch_namespaced_pod.call_args[0][2].metadata.labels['isMaster'], 'false')
self.assertEqual(mock_patch_namespaced_pod.call_args[0][2].metadata.labels['tmp_role'], 'standby-leader')
self.k._name = 'p-0'
self.k.touch_member({'role': 'primary'})
mock_patch_namespaced_pod.assert_called()
self.assertEqual(mock_patch_namespaced_pod.call_args.args[2].metadata.labels['isMaster'], 'true')
self.assertEqual(mock_patch_namespaced_pod.call_args.args[2].metadata.labels['tmp_role'], 'master')
self.assertEqual(mock_patch_namespaced_pod.call_args[0][2].metadata.labels['isMaster'], 'true')
self.assertEqual(mock_patch_namespaced_pod.call_args[0][2].metadata.labels['tmp_role'], 'master')
def test_initialize(self):
self.k.initialize()
def test_delete_leader(self):
self.k.delete_leader(1)
self.k.delete_leader(self.k.get_cluster().leader, 1)
def test_cancel_initialization(self):
self.k.cancel_initialization()
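The move from `call_args.args`/`call_args.kwargs` to plain indexing throughout this file is most plausibly a compatibility fix rather than a style change: the `.args` and `.kwargs` properties of mock `call` objects exist only on Python 3.8 and newer, while `call_args[0]` (the positional-argument tuple) and `call_args[1]` (the keyword-argument dict) work on every interpreter. A quick demonstration:

    from unittest.mock import Mock

    m = Mock()
    m('default', 'p-0', body={'metadata': None})

    assert m.call_args[0] == ('default', 'p-0')            # positional args tuple
    assert m.call_args[1] == {'body': {'metadata': None}}  # keyword args dict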

View File

@@ -40,6 +40,7 @@ class MockFrozenImporter(object):
@patch('time.sleep', Mock())
@patch('subprocess.call', Mock(return_value=0))
@patch('patroni.psycopg.connect', psycopg_connect)
@patch('urllib3.PoolManager.request', Mock(side_effect=Exception))
@patch.object(ConfigHandler, 'append_pg_hba', Mock())
@patch.object(ConfigHandler, 'write_postgresql_conf', Mock())
@patch.object(ConfigHandler, 'write_recovery_conf', Mock())
@@ -63,6 +64,7 @@ class TestPatroni(unittest.TestCase):
self.assertRaises(SystemExit, _main)
@patch('pkgutil.iter_importers', Mock(return_value=[MockFrozenImporter()]))
@patch('urllib3.PoolManager.request', Mock(side_effect=Exception))
@patch('sys.frozen', Mock(return_value=True), create=True)
@patch.object(HTTPServer, '__init__', Mock())
@patch.object(etcd.Client, 'read', etcd_read)
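Patching `urllib3.PoolManager.request` with `side_effect=Exception` at class scope guarantees these startup tests never perform real HTTP calls: anything that slips past the other mocks fails fast instead of hanging on the network. A minimal reproduction of the effect (the URL is arbitrary):

    from unittest.mock import Mock, patch
    import urllib3

    with patch('urllib3.PoolManager.request', Mock(side_effect=Exception)):
        try:
            urllib3.PoolManager().request('GET', 'http://127.0.0.1:8008/health')
        except Exception:
            pass  # every request raises immediately; no real I/O happens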

View File

@@ -346,8 +346,7 @@ class TestPostgresql(BaseTestPostgresql):
@patch.object(Postgresql, 'start', Mock())
def test_follow(self):
self.p.call_nowait(CallbackAction.ON_START)
m = RemoteMember.from_name_and_data('1', {'restore_command': '2', 'primary_slot_name': 'foo',
'conn_kwargs': {'host': 'bar'}})
m = RemoteMember('1', {'restore_command': '2', 'primary_slot_name': 'foo', 'conn_kwargs': {'host': 'bar'}})
self.p.follow(m)
with patch.object(Postgresql, 'ensure_major_version_is_known', Mock(return_value=False)):
self.assertIsNone(self.p.follow(m))
@@ -364,11 +363,11 @@ class TestPostgresql(BaseTestPostgresql):
self.assertRaises(psycopg.ProgrammingError, self.p.query, 'blabla')
@patch.object(Postgresql, 'pg_isready', Mock(return_value=STATE_REJECT))
def test_is_leader(self):
self.assertTrue(self.p.is_leader())
def test_is_primary(self):
self.assertTrue(self.p.is_primary())
self.p.reset_cluster_info_state(None)
with patch.object(Postgresql, '_query', Mock(side_effect=RetryFailedError(''))):
self.assertFalse(self.p.is_leader())
self.assertFalse(self.p.is_primary())
@patch.object(Postgresql, 'controldata', Mock(return_value={'Database cluster state': 'shut down',
'Latest checkpoint location': '0/1ADBC18',
@@ -462,7 +461,7 @@ class TestPostgresql(BaseTestPostgresql):
self.assertIsNone(self.p.call_nowait(CallbackAction.ON_START))
@patch.object(Postgresql, 'is_running', Mock(return_value=MockPostmaster()))
def test_is_leader_exception(self):
def test_is_primary_exception(self):
self.p.start()
self.p.query = Mock(side_effect=psycopg.OperationalError("not supported"))
self.assertTrue(self.p.stop())
@@ -523,8 +522,9 @@ class TestPostgresql(BaseTestPostgresql):
def test_save_configuration_files(self):
self.p.config.save_configuration_files()
@patch('os.path.isfile', Mock(side_effect=[False, True]))
@patch('shutil.copy', Mock(side_effect=IOError))
@patch('os.path.isfile', Mock(side_effect=[False, True, False, True]))
@patch('shutil.copy', Mock(side_effect=[None, IOError]))
@patch('os.chmod', Mock())
def test_restore_configuration_files(self):
self.p.config.restore_configuration_files()
@@ -955,3 +955,10 @@ class TestPostgresql2(BaseTestPostgresql):
gucs = self.p.available_gucs
self.assertIsInstance(gucs, CaseInsensitiveSet)
self.assertEqual(gucs, mock_available_gucs.return_value)
def test_cluster_info_query(self):
self.assertIn('diff(pg_catalog.pg_current_wal_flush_lsn(', self.p.cluster_info_query)
self.p._major_version = 90600
self.assertIn('diff(pg_catalog.pg_current_xlog_flush_location(', self.p.cluster_info_query)
self.p._major_version = 90500
self.assertIn('diff(pg_catalog.pg_current_xlog_location(', self.p.cluster_info_query)
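The new test ties the WAL-position function in `cluster_info_query` to the server generation: PostgreSQL 10 renamed the `pg_current_xlog_*` family to `pg_current_wal_*`, and the flush-location variant only appeared in 9.6. A sketch of the dispatch the three assertions imply (the helper name and cutoffs are assumptions derived from the assertions, not Patroni's code):

    def wal_flush_lsn_function(major_version: int) -> str:
        """Pick the WAL position function for an integer server version."""
        if major_version >= 100000:    # PostgreSQL 10+: xlog -> wal rename
            return 'pg_catalog.pg_current_wal_flush_lsn()'
        if major_version >= 90600:     # 9.6 introduced the flush variant
            return 'pg_catalog.pg_current_xlog_flush_location()'
        return 'pg_catalog.pg_current_xlog_location()'

    assert 'xlog_flush' in wal_flush_lsn_function(90600)
    assert 'wal_flush' in wal_flush_lsn_function(150000)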

View File

@@ -142,25 +142,25 @@ class TestRaft(unittest.TestCase):
raft._citus_group = '1'
self.assertTrue(raft.manual_failover('foo', 'bar'))
raft._citus_group = '0'
self.assertTrue(raft.take_leader())
cluster = raft.get_cluster()
self.assertIsInstance(cluster, Cluster)
self.assertIsInstance(cluster.workers[1], Cluster)
leader = cluster.leader
self.assertTrue(raft.delete_leader(leader))
self.assertTrue(raft._sync_obj.set(raft.status_path, '{"optime":1234567,"slots":{"ls":12345}}'))
leader = raft.get_cluster().leader
raft.get_cluster()
self.assertTrue(raft.update_leader(leader, '1', failsafe={'foo': 'bat'}))
self.assertTrue(raft._sync_obj.set(raft.failsafe_path, '{"foo"}'))
self.assertTrue(raft._sync_obj.set(raft.status_path, '{'))
raft.get_citus_coordinator()
self.assertTrue(raft.delete_sync_state())
self.assertTrue(raft.delete_leader())
self.assertTrue(raft.set_history_value(''))
self.assertTrue(raft.delete_cluster())
raft._citus_group = '1'
self.assertTrue(raft.delete_cluster())
raft._citus_group = None
raft.get_cluster()
self.assertTrue(raft.take_leader())
raft.get_cluster()
raft.watch(None, 0.001)
raft._sync_obj.destroy()
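Worth noting: the test deliberately writes malformed JSON to `failsafe_path` (`'{"foo"}'`) and `status_path` (`'{'`) and still drives `get_cluster()` afterwards, so the Raft DCS layer must tolerate undecodable payloads rather than crash the heartbeat loop. A hedged sketch of such a defensive parse (not Patroni's actual implementation):

    import json

    def safe_load(raw):
        try:
            return json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            return None  # malformed payloads degrade to "no data", not an error

    assert safe_load('{"optime":1234567,"slots":{"ls":12345}}')['optime'] == 1234567
    assert safe_load('{"foo"}') is None  # the exact payload written above
    assert safe_load('{') is None        # truncated document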

View File

@@ -253,8 +253,8 @@ class TestRewind(BaseTestPostgresql):
mock_logger_info.call_args[0])
mock_logger_info.reset_mock()
mock_subprocess_call.assert_called_once()
self.assertEqual(mock_subprocess_call.call_args.args[0], ['command 000000000000000000000000'])
self.assertEqual(mock_subprocess_call.call_args.kwargs['shell'], True)
self.assertEqual(mock_subprocess_call.call_args[0][0], ['command 000000000000000000000000'])
self.assertEqual(mock_subprocess_call.call_args[1]['shell'], True)
mock_subprocess_call.reset_mock()
# failed archive_command call
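The `000000000000000000000000` argument in the assertions above reads as a syntactically valid WAL segment name: 24 hex digits made of the timeline, log and segment counters, eight digits each. A sketch of the naming rule:

    def wal_segment_name(timeline: int, log: int, seg: int) -> str:
        # TTTTTTTTXXXXXXXXYYYYYYYY: timeline, log id and segment number,
        # each rendered as eight uppercase hex digits
        return '%08X%08X%08X' % (timeline, log, seg)

    assert wal_segment_name(0, 0, 0) == '000000000000000000000000'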

View File

@@ -48,7 +48,7 @@ class TestSlotsHandler(BaseTestPostgresql):
self.s.sync_replication_slots(cluster, False)
mock_debug.assert_called_once()
self.p.set_role('replica')
with patch.object(Postgresql, 'is_leader', Mock(return_value=False)), \
with patch.object(Postgresql, 'is_primary', Mock(return_value=False)), \
patch.object(SlotsHandler, 'drop_replication_slot') as mock_drop:
self.s.sync_replication_slots(cluster, False, paused=True)
mock_drop.assert_not_called()
@@ -67,6 +67,23 @@ class TestSlotsHandler(BaseTestPostgresql):
with patch.object(Postgresql, 'major_version', PropertyMock(return_value=90618)):
self.s.sync_replication_slots(cluster, False)
def test_cascading_replica_sync_replication_slots(self):
"""Test sync with a cascading replica so physical slots are present on a replica."""
config = ClusterConfig(1, {'slots': {'ls': {'database': 'a', 'plugin': 'b'}}}, 1)
cascading_replica = Member(0, 'test-2', 28, {
'state': 'running', 'conn_url': 'postgres://replicator:rep-pass@127.0.0.1:5436/postgres',
'tags': {'replicatefrom': 'postgresql0'}
})
cluster = Cluster(True, config, self.leader, 0,
[self.me, self.other, self.leadermem, cascading_replica],
None, SyncState.empty(), None, {'ls': 10}, None)
self.p.set_role('replica')
with patch.object(Postgresql, '_query') as mock_query, \
patch.object(Postgresql, 'is_primary', Mock(return_value=False)):
mock_query.return_value = [('ls', 'logical', 'b', 'a', 5, 12345, 105)]
ret = self.s.sync_replication_slots(cluster, False)
self.assertEqual(ret, [])
def test_process_permanent_slots(self):
config = ClusterConfig(1, {'slots': {'ls': {'database': 'a', 'plugin': 'b'}},
'ignore_slots': [{'name': 'blabla'}]}, 1)
@@ -89,7 +106,7 @@ class TestSlotsHandler(BaseTestPostgresql):
"confirmed_flush_lsn": 12345, "catalog_xmin": 105}])
self.assertEqual(self.p.slots(), {})
@patch.object(Postgresql, 'is_leader', Mock(return_value=False))
@patch.object(Postgresql, 'is_primary', Mock(return_value=False))
def test__ensure_logical_slots_replica(self):
self.p.set_role('replica')
self.cluster.slots['ls'] = 12346
@@ -116,7 +133,7 @@ class TestSlotsHandler(BaseTestPostgresql):
@patch.object(Postgresql, 'stop', Mock(return_value=True))
@patch.object(Postgresql, 'start', Mock(return_value=True))
@patch.object(Postgresql, 'is_leader', Mock(return_value=False))
@patch.object(Postgresql, 'is_primary', Mock(return_value=False))
def test_check_logical_slots_readiness(self):
self.s.copy_logical_slots(self.cluster, ['ls'])
with patch.object(MockCursor, '__iter__', Mock(return_value=iter([('postgresql0', None)]))), \
@@ -130,7 +147,7 @@ class TestSlotsHandler(BaseTestPostgresql):
@patch.object(Postgresql, 'stop', Mock(return_value=True))
@patch.object(Postgresql, 'start', Mock(return_value=True))
@patch.object(Postgresql, 'is_leader', Mock(return_value=False))
@patch.object(Postgresql, 'is_primary', Mock(return_value=False))
def test_on_promote(self):
self.s.schedule_advance_slots({'foo': {'bar': 100}})
self.s.copy_logical_slots(self.cluster, ['ls'])
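The cascading-replica scenario added above hinges on the `replicatefrom` tag: a member that feeds from another replica rather than from the leader still needs its physical slots handled on that replica. A toy predicate for what qualifies as cascading (assumed logic, for illustration only):

    def is_cascading_replica(tags: dict, leader_name: str) -> bool:
        # A member replicating from some node other than the leader cascades.
        source = tags.get('replicatefrom')
        return source is not None and source != leader_name

    assert is_cascading_replica({'replicatefrom': 'postgresql0'}, 'leader')
    assert not is_cascading_replica({}, 'leader')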

View File

@@ -202,7 +202,7 @@ class TestZooKeeper(unittest.TestCase):
mock_logger.assert_called_once()
def test_delete_leader(self):
self.assertTrue(self.zk.delete_leader())
self.assertTrue(self.zk.delete_leader(self.zk.get_cluster().leader))
def test_set_failover_value(self):
self.zk.set_failover_value('')

35
tox.ini
View File

@@ -77,6 +77,7 @@ platform =
{[common]platforms}
allowlist_externals =
rm
true
{env:OPEN_CMD}
[testenv:dep]
@@ -174,24 +175,52 @@ platform =
{[common]platforms}
[testenv:docs-{lin,mac,win}]
description = Build Sphinx documentation
description = Build Sphinx documentation in HTML format
labels =
docs
deps =
sphinx>=4
sphinx_rtd_theme
-r requirements.docs.txt
-r requirements.txt
psycopg[binary]
psycopg2-binary
commands =
sphinx-build \
-d "{envtmpdir}{/}doctree" docs "{toxworkdir}{/}docs_out" \
--color -b html \
-T -E -W --keep-going \
{posargs}
commands_post =
- {tty:{env:OPEN_CMD} "{toxworkdir}{/}docs_out{/}index.html":true:}
allowlist_externals =
true
{env:OPEN_CMD}
platform =
{[common]platforms}
[testenv:pdf-{lin,mac,win}]
description = Build Sphinx documentation in PDF format
labels =
docs
deps =
-r requirements.docs.txt
-r requirements.txt
psycopg[binary]
psycopg2-binary
commands =
python -m sphinx -T -E -b latex -d _build/doctrees -D language=en . pdf
- latexmk -r pdf/latexmkrc -cd -C pdf/Patroni.tex
latexmk -r pdf/latexmkrc -cd -pdf -f -dvi- -ps- -jobname=Patroni -interaction=nonstopmode pdf/Patroni.tex
commands_post =
- {tty:{env:OPEN_CMD} "pdf{/}Patroni.pdf":true:}
allowlist_externals =
true
latexmk
{env:OPEN_CMD}
platform =
{[common]platforms}
change_dir = docs
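Since both environments carry the `docs` label, `tox -m docs` (tox >= 4 label selection) should build the HTML and PDF outputs in one invocation, while `tox -e docs-lin` or `tox -e pdf-lin` (substitute `mac`/`win` as appropriate) builds a single format; the PDF environment additionally expects `latexmk` and a LaTeX distribution on the host, as installed in the CI job.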
[flake8]
max-line-length = 120
ignore = D401,W503