The logical slot on a replica is safe to use when the physical replica
slot on the primary:
1. has a nonzero/non-null `catalog_xmin`
2. has a `catalog_xmin` that is not newer (greater) than the `catalog_xmin` of any slot on the standby
3. has a `catalog_xmin` that is known to have overtaken the `catalog_xmin` of logical slots on the primary observed during check `1`
If check `1` doesn't hold, Patroni will run an additional check of whether `hot_standby_feedback` is actually in effect and show a warning if it is not.
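The three conditions above can be sketched as a small predicate. This is an illustrative helper, not Patroni's actual code; the function name, the parameters, and the exact comparison semantics (treating xmins as plain integers) are assumptions:

```python
# Hypothetical sketch of the three safety conditions; names and comparison
# semantics are assumptions, not Patroni's real implementation.
def logical_slot_safe_on_replica(physical_catalog_xmin,
                                 standby_catalog_xmins,
                                 primary_catalog_xmins_at_check):
    """Return True when a logical slot on the replica is safe to use."""
    # 1. the physical slot on the primary reports a catalog_xmin at all
    if not physical_catalog_xmin:
        return False
    # 2. it is not newer (greater) than the catalog_xmin of any slot on the standby
    if any(physical_catalog_xmin > x for x in standby_catalog_xmins):
        return False
    # 3. it has overtaken the catalog_xmin of the primary's logical slots
    #    observed while check 1 was performed
    return all(physical_catalog_xmin >= x for x in primary_catalog_xmins_at_check)
```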
A couple of times we have seen in the wild that the database for the permanent logical slots was changed in the Patroni config.
This resulted in the following situation.
On the primary:
1. The slot must be dropped before it can be created in a different DB.
2. Patroni fails to drop it because the slot is in use.
On the replica:
1. Patroni notices that the slot exists in the wrong DB and successfully drops it.
2. Patroni copies the existing slot from the primary by name, which requires a Postgres restart.
The loop repeats for as long as the "wrong" slot exists on the primary.
Basically, replicas are continuously restarting, which badly affects availability.
In order to solve the problem, we will perform additional checks while copying replication slot files from the primary and discard them if `slot_type`, `database`, or `plugin` don't match our expectations.
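The validation described above boils down to a field-by-field comparison. A minimal sketch (the field names match `pg_replication_slots`, but the function itself is illustrative, not Patroni's code):

```python
# Illustrative check: a copied slot file is discarded unless slot_type,
# database, and plugin all match what the configuration expects.
def slot_matches_expectations(copied, expected):
    return all(copied.get(key) == expected.get(key)
               for key in ('slot_type', 'database', 'plugin'))

# hypothetical example: the database was changed in the Patroni config
copied = {'slot_type': 'logical', 'database': 'foo', 'plugin': 'test_decoding'}
expected = {'slot_type': 'logical', 'database': 'bar', 'plugin': 'test_decoding'}
# database differs, so the copied slot file must be discarded
```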
Effectively, this PR consists of a few changes:
1. The easy part:
If permanent logical slots are defined in the global configuration, Patroni on the primary will not only create them, but also periodically update DCS with the current values of `confirmed_flush_lsn` for all these slots.
In order to reduce the number of interactions with DCS, the new `/status` key was introduced. It contains a JSON object with `optime` and `slots` keys. For backward compatibility, `/optime/leader` will still be updated if there are members running an old version of Patroni in the cluster.
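The payload of the `/status` key could look roughly like this; the slot name and LSN values below are made up, only the `{"optime": ..., "slots": {...}}` shape comes from the text:

```python
import json

# Illustrative /status payload; values are invented for the example.
status = {
    'optime': 67108992,                       # leader WAL position
    'slots': {'my_logical_slot': 67108992},   # confirmed_flush_lsn per permanent slot
}
payload = json.dumps(status, separators=(',', ':'))
```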
2. The tricky part:
On replicas that are eligible for a failover, Patroni creates the logical replication slot by copying the slot file from the primary and restarting the replica. In order to copy the slot file, Patroni opens a connection to the primary with `rewind` or `superuser` credentials and calls the `pg_read_binary_file()` function.
When the logical slot already exists on the replica, Patroni periodically calls the `pg_replication_slot_advance()` function, which moves the slot forward.
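In SQL terms, the two primitives look roughly as follows. The slot name, the file path, and the target LSN are illustrative, and these are not necessarily the exact statements Patroni issues:

```python
# Illustrative SQL for the two primitives described above.
slot = 'my_logical_slot'

# read the slot's state file from the primary
# (requires superuser or rewind credentials)
copy_sql = f"SELECT pg_read_binary_file('pg_replslot/{slot}/state')"

# periodically move the already-existing slot forward on the replica
advance_sql = f"SELECT pg_replication_slot_advance('{slot}', '0/4000100')"
```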
3. Additional requirements:
In order to ensure that the primary doesn't clean up tuples from `pg_catalog` that are required for logical decoding, Patroni enables `hot_standby_feedback` on replicas with logical slots and on cascading replicas if they are used for streaming by replicas with logical slots.
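The cascading `hot_standby_feedback` rule can be sketched as walking the streaming chain up from every member with logical slots, stopping before the primary. This is a hypothetical helper; the function and parameter names are ours, not Patroni's:

```python
# Hypothetical sketch: enable hot_standby_feedback on every member with
# logical slots and on every cascading member it streams through.
def members_needing_feedback(upstream, has_logical_slots):
    """upstream maps member -> member it streams from (None for the primary)."""
    result = set()
    for member, slots in has_logical_slots.items():
        node = member if slots else None
        # walk up the streaming chain, stopping before the primary itself
        while node is not None and upstream.get(node) is not None:
            result.add(node)
            node = upstream[node]
    return result
```

For example, if `replica1` has logical slots and streams from `cascade`, which streams from `primary`, then both `replica1` and `cascade` need `hot_standby_feedback`.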
4. When logical slots are copied to the replica, there is a timeframe during which it might not be safe to use them after promotion. Right now there is no protection against promoting such a replica, but Patroni will show a warning with the names of the slots that might not be safe to use.
Compatibility.
The `pg_replication_slot_advance()` function is only available starting from PostgreSQL 11. For older Postgres versions Patroni will refuse to create the logical slot on the primary.
The old "permanent slots" feature, which creates logical slots right after promotion and before allowing connections, was removed.
Close: https://github.com/zalando/patroni/issues/1749
So far Patroni compared the old value (in `pg_settings`) with the new value (from the Patroni configuration or from DCS) in order to figure out whether a reload or restart is required when a parameter has been changed. If a given parameter was missing from `pg_settings`, Patroni ignored it and did not write it into `postgresql.conf`.
If Postgres was not running, no validation was performed and parameters and values were written into the config as is.
It is not a very common mistake, but people tend to mistype parameter names or values.
Also, it happens that some parameters are removed in specific Postgres versions and some new ones are added (e.g. `checkpoint_segments` was replaced with `min_wal_size` and `max_wal_size` in 9.5, and `wal_keep_segments` was replaced with `wal_keep_size` in 13).
Writing nonexistent parameters or invalid values into `postgresql.conf` makes Postgres unstartable.
This change doesn't solve the issue 100%, but it comes very close to that goal.
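Version-aware validation can be sketched with a tiny catalog of parameter lifetimes. The catalog below is an assumption covering only the two renames mentioned above (version numbers use the numeric `server_version_num` convention, e.g. 130000 for 13):

```python
# Minimal sketch of version-aware parameter validation; the catalog is an
# assumption covering just the renames mentioned in the text.
REMOVED = {'checkpoint_segments': 90500, 'wal_keep_segments': 130000}
ADDED = {'min_wal_size': 90500, 'max_wal_size': 90500, 'wal_keep_size': 130000}

def parameter_valid(name, server_version):
    """Return False for parameters that don't exist in this Postgres version."""
    if name in REMOVED and server_version >= REMOVED[name]:
        return False
    if name in ADDED and server_version < ADDED[name]:
        return False
    return True
```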
The only python-etcd3 client that works directly via gRPC still supports only a single endpoint, which is not very nice for high availability.
Since Patroni is already using a heavily hacked version of python-etcd with smart retries and auto-discovery out of the box, I decided to enhance the existing code with limited support of the v3 protocol via gRPC-gateway.
Unfortunately, watches via gRPC-gateway require us to open and keep a second connection to etcd.
Known limitations:
* The minimum supported version is 3.0.4. On earlier versions transactions don't work due to bugs in grpc-gateway, and without transactions we can't do atomic operations, i.e. take leader locks.
* Watches work only starting from 3.1.0.
* Authentication works only starting from 3.3.0.
* gRPC-gateway does not support authentication using TLS Common Name. This is because gRPC-proxy terminates TLS from its client so all the clients share a cert of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
The official Python kubernetes client contains a lot of auto-generated code and is therefore very heavy, but we need only a small fraction of it.
The naive implementation that covers all the API methods we use takes about 250 LoC, and about half of it is responsible for the handling of configuration files.
Disadvantage: if somebody was using `patronictl` outside of the pod (on their own machine), it might not work anymore (depending on the environment).
- ``GET /replica?lag=<max-lag>``: replica check endpoint.
- ``GET /asynchronous?lag=<max-lag>`` or ``GET /async?lag=<max-lag>``: asynchronous standby check endpoint.
Checks replication latency and returns status code **200** only when the latency is below the specified value. For performance reasons, the `leader_optime` key from DCS is used as the leader WAL position when computing latency on the replica. Please note that the value in `leader_optime` might be a couple of seconds old (depending on `loop_wait`).
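The check itself reduces to a single comparison. A sketch with invented names (Patroni actually returns 503 for failed health checks, which is assumed here as well):

```python
# Sketch of the check behind GET /replica?lag=<max-lag>: compare the
# replica's WAL position with leader_optime from DCS.
def replica_check_status(leader_optime, replica_wal_position, max_lag):
    """Return the HTTP status code for the replica health check."""
    lag = leader_optime - replica_wal_position
    return 200 if lag <= max_lag else 503
```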
Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
The standby cluster doesn't know about leader elections in the main cluster, and therefore the usual mechanisms of detecting divergences don't work. For example, it could happen that the standby cluster is ahead of the new primary of the main cluster and must be rewound.
There is a way to know that a new timeline has been created: check for the presence of a history file in `pg_wal`. If the new file is there, we will start the usual procedures of making sure that we can continue streaming, or will run `pg_rewind`.
When making a decision whether the running replica is able to stream from the new primary or must be rewound we should use replayed location, therefore we extract received and replayed independently.
Reuse the part of the query that extracts the timeline and locations in the REST API.
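The history-file check can be sketched as follows. Postgres names timeline history files after the timeline id, zero-padded to 8 hex digits (e.g. `00000002.history`); the function name and signature are ours:

```python
import os

# Illustrative check for "a new timeline appeared in pg_wal": look for the
# history file of the next timeline after the one we are currently on.
def new_timeline_detected(pg_wal_dir, current_timeline):
    history_file = os.path.join(pg_wal_dir, '%08X.history' % (current_timeline + 1))
    return os.path.exists(history_file)
```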
So far Patroni parsed `recovery.conf` or queried `pg_settings` in order to get the current values of recovery parameters. On PostgreSQL earlier than 12 it could easily happen that the value of `primary_conninfo` in `recovery.conf` had nothing to do with reality. Luckily for us, on PostgreSQL 9.6+ there is a `pg_stat_wal_receiver` view, which contains the current values of `primary_conninfo` and `primary_slot_name`. The password field is masked, though, but this is fine, because authentication happens only when the connection is opened. All other parameters are compared as usual.
Another advantage of `pg_stat_wal_receiver` is that it contains the current timeline; therefore, on 9.6+ we don't need to use the replication connection trick if the walreceiver process is alive.
If there is no walreceiver process or it is not streaming, we fall back to the old methods.
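Comparing `primary_conninfo` values while ignoring the masked password can be sketched like this. The parsing here is deliberately naive (space-separated `key=value` pairs), not Patroni's real conninfo parser:

```python
# Sketch: compare two conninfo strings while ignoring the password field,
# which pg_stat_wal_receiver reports masked.
def conninfo_matches(stored, reported):
    def parse(value):
        pairs = dict(kv.split('=', 1) for kv in value.split())
        pairs.pop('password', None)  # masked in pg_stat_wal_receiver
        return pairs
    return parse(stored) == parse(reported)
```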
When Patroni is trying to figure out whether pg_rewind is necessary, it could write the content of the history file from the primary into the log. The history file grows with every failover/switchover and eventually starts taking too many lines in the log, most of which are not very useful.
Instead of showing the raw data, we will show only 3 lines before the current replica timeline and 2 lines after.
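The trimming can be sketched as a window around the current replica timeline. This is an illustrative helper over already-parsed history lines, not Patroni's actual code:

```python
# Sketch: keep only 3 lines before the current replica timeline, the current
# one, and 2 lines after it; lines are (timeline, lsn, reason) tuples.
def trim_history(lines, current_timeline):
    for i, (tl, _, _) in enumerate(lines):
        if tl >= current_timeline:
            return lines[max(0, i - 3):i + 3]
    return lines[-3:]
```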
## Feature: Postgres stop timeout
A switchover/failover operation hangs on the signal_stop (or checkpoint) call when the postmaster doesn't respond or hangs for some reason (issue described in [1371](https://github.com/zalando/patroni/issues/1371)). This leads to service loss for an extended period of time, until the hung postmaster starts responding or is killed by some other actor.
### master_stop_timeout
The number of seconds Patroni is allowed to wait when stopping Postgres; effective only when `synchronous_mode` is enabled. When set to a value > 0 and `synchronous_mode` is enabled, Patroni sends SIGKILL to the postmaster if the stop operation runs longer than `master_stop_timeout`. Set the value according to your durability/availability tradeoff. If the parameter is not set or is set to a value <= 0, `master_stop_timeout` does not apply.
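The decision described above can be captured in one hypothetical helper (the function name and parameters are ours):

```python
# Hypothetical sketch of the master_stop_timeout decision: SIGKILL the
# postmaster only when synchronous_mode is on, the timeout is > 0, and the
# stop operation has already run longer than the timeout.
def should_send_sigkill(master_stop_timeout, synchronous_mode, elapsed_seconds):
    if not synchronous_mode or master_stop_timeout is None or master_stop_timeout <= 0:
        return False
    return elapsed_seconds > master_stop_timeout
```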
Upon the start of Patroni and Postgres, make sure that `unix_socket_directories` and `stats_temp_directory` exist, or try to create them. Patroni will exit if it fails to create them.
Close https://github.com/zalando/patroni/issues/863
Recently it has happened twice that people tried to deploy a new cluster while the Postgres data directory wasn't empty and also wasn't valid. In this case Patroni was still creating the initialize key in DCS and trying to start Postgres.
Now it will complain about a non-empty, invalid Postgres data directory and exit.
Close https://github.com/zalando/patroni/issues/1216
The /history endpoint shows the content of the `history` key in DCS.
The /cluster endpoint shows all cluster members and some service info, like pending and scheduled restarts or switchovers.
In addition to that, implement `patronictl history`.
Close #586, Close #675, Close #1133
It is possible that some config files are not controlled by Patroni, and when somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that Postgres will also be reloaded. But that didn't happen when there were no changes in the `postgresql` section of the Patroni config.
For example, one might replace `ssl_cert_file` and `ssl_key_file` on the filesystem; starting from PostgreSQL 10 picking them up just requires a reload, but Patroni wasn't doing it.
In addition to that, fix the issue with the handling of `wal_buffers`: the default value depends on `shared_buffers` and `wal_segment_size`, and therefore Patroni was exposing `pending_restart` when the new value in the config was explicitly set to -1 (the default).
Close https://github.com/zalando/patroni/issues/1198
* use `SHOW primary_conninfo` instead of parsing the config file on pg12
* strip out standby and recovery parameters from postgresql.auto.conf before starting PostgreSQL 12
Patroni config remains backward compatible.
Although, for example, `restore_command` was converted to a GUC starting from PostgreSQL 12, in the Patroni configuration you can still keep it in the `postgresql.recovery_conf` section.
If you put it into `postgresql.parameters.restore_command`, that will also work, but it is important not to mix both ways:
```yaml
# is OK
postgresql:
  parameters:
    restore_command: my_restore_command
    archive_cleanup_command: my_archive_cleanup_command

# is OK
postgresql:
  recovery_conf:
    restore_command: my_restore_command
    archive_cleanup_command: my_archive_cleanup_command

# is NOT OK
postgresql:
  parameters:
    restore_command: my_restore_command
  recovery_conf:
    archive_cleanup_command: my_archive_cleanup_command
```
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package
This is just a careful refactoring, without functional changes.