patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-27 10:20:10 +00:00

Author	SHA1	Message	Date
Alexander Kukushkin	0a1d9b0a25	Get rid of the distutils module dependency (#1146 ) We are using only one function from there, `find_executable()`, and it is better to implement a similar function in Patroni rather than add the `distutils` module to requirements.txt	2019-08-26 09:38:47 +02:00
Alexander Kukushkin	278bf9852b	Release 1.6.0 (#1131 ) * Implement missing tests and do a few minor fixes * Bump version to 1.6.0 * Update release notes	2019-08-05 15:08:04 +02:00
Alexander Kukushkin	0b1b1e3b54	Compatibility with postgresql 12 (#1068 ) * use `SHOW primary_conninfo` instead of parsing config file on pg12 * strip out standby and recovery parameters from postgresql.auto.conf before starting postgres 12. The Patroni config remains backward compatible. Although, for example, `restore_command` was converted to a GUC starting from postgresql 12, in the Patroni configuration you can still keep it in the `postgresql.recovery_conf` section. If you put it into `postgresql.parameters.restore_command`, that will also work, but it is important not to mix both ways: ```yaml # is OK postgresql: parameters: restore_command: my_restore_command archive_cleanup_command: my_archive_cleanup_command # is OK postgresql: recovery_conf: restore_command: my_restore_command archive_cleanup_command: my_archive_cleanup_command # is NOT ok postgresql: parameters: restore_command: my_restore_command recovery_conf: archive_cleanup_command: my_archive_cleanup_command ```	2019-08-02 16:00:55 +02:00
Alexander Kukushkin	4a24b79b73	IPv6 support (#1122 ) fixes https://github.com/zalando/patroni/issues/1121	2019-08-02 11:34:29 +02:00
Rafia Sabih	5cc3afc037	Enhance dialogues for scheduled switchover and restart (#1119 ) In case of a scheduled switchover or restart, mention the time, if any, when confirming.	2019-08-02 11:21:26 +02:00
Alexander Kukushkin	1a6db4f5af	Reverse logic around checkpoint_after_promote (#1084 ) It will be set to false in the JSON only until the checkpoint actually happened. The next improvement of `bba9066315`	2019-06-17 10:42:31 +02:00
Alexander Kukushkin	fad6d26a3a	Small refactoring of postgresql/bootstrap (#1086 ) the main purpose of this PR is simplifying #1068 It is mostly necessary for future support of pg12, where there will be no recovery.conf anymore, but `keep_existing_recovery_conf` parameter still needs to be supported due to backward compatibility.	2019-06-17 10:41:13 +02:00
Alexander Kukushkin	37f03790cc	Implement two-step logging (#1080 ) A few times we observed that Patroni HA loop was blocked for a few minutes due to not being able to write logs to stderr. This is a very rare condition which we hit so far only on k8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into the in-memory queue and later they are asynchronously flushed into the stderr or file from a separate thread. The maximum queue size is configurable and the default value is 1000. This should be enough to keep more than one hour of log messages with default settings and when Patroni cluster operates normally (without big issues). If we hit the maximum size of the queue, further logs will be discarded until the queue size is reduced. The number of discarded messages will be reported into the log later. In addition to that, the number of non-flushed and discarded messages (if there are any) will be reported via Patroni REST API as: ```json "logger_queue_size": X, "logger_records_lost": Y ```	2019-06-13 14:18:49 +02:00
wilfriedroset	2384d9e735	Add API route /health (#1079 ) close #119	2019-06-11 15:22:52 +02:00
Alexander Kukushkin	a4bd6a9b4b	Refactor postgresql class (#1060 ) * Convert postgresql.py into a package * Factor out cancellable process into a separate class * Factor out connection handler into a separate class * Move postmaster into postgresql package * Factor out pg_rewind into a separate class * Factor out bootstrap into a separate class * Factor out slots handler into a separate class * Factor out postgresql config handler into a separate class * Move callback_executor into postgresql package This is just a careful refactoring, without code changes.	2019-05-21 16:02:47 +02:00
Alexander Kukushkin	e54dfa508d	Consider sync node as healthy even when the former leader is ahead (#1059 ) Fixes https://github.com/zalando/patroni/issues/1054	2019-05-13 16:32:53 +02:00
Alexander Kukushkin	4b48653d09	More standby cluster bugfixes (#1053 ) 1. Use the default port 5432 when only standby_cluster.host is defined 2. Check that standby_cluster replica can be bootstrapped without connection to the standby_cluster leader against `create_replica_methods` defined in the `standby_cluster` config instead of the `postgresql` section. 3. Don't fall back to the create_replica_methods defined in the `postgresql` section when bootstrapping a member of the standby cluster. 4. Make sure we specify the database when connecting to the leader.	2019-05-13 14:19:22 +02:00
Alexander Kukushkin	c7db134a20	Make psycopg2 version parsing more robust (#1052 ) Non-released versions can have unparsable suffixes, for example 2.8.3.dev0	2019-05-13 14:18:56 +02:00
Alexander Kukushkin	bba9066315	Make it possible to run pg_rewind without superuser on pg11+ (#1035 ) * expose the current patroni version in DCS * expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed * other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT * define `postgresql.authentication.rewind` with credentials for pg_rewind in patroni configuration files. * create user for pg_rewind if postgres is 11+ * grant execute on functions required for pg_rewind to rewind user	2019-05-02 14:07:26 +02:00
Alexander Kukushkin	f0b784fe7f	Manage pg_ident.conf with Patroni (#1037 ) This functionality works similarly to the `pg_hba`: If the `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf, however, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed from outside and not update the file.	2019-04-23 16:16:53 +02:00
Julien Riou	663026c34c	Use SSLContext to wrap REST API socket (#1039 ) Using `ssl.wrap_socket` is deprecated and was still allowing soon-to-be-deprecated protocols like TLS 1.1. Now using `SSLContext.create_default_context()` to produce a secure SSL context to wrap the REST API server's socket.	2019-04-23 11:23:22 +02:00
Alexander Kukushkin	51b085a76d	Don't wait until the previous callback finishes if kill failed (#1036 ) Such a wait was happening in the main thread and blocking the HA loop. After all, the executor thread was doing absolutely the same.	2019-04-15 15:49:06 +02:00
Alexander Kukushkin	7c0c9599fc	Remove psycopg2 from requirements (#1023 ) Recently released psycopg2 split into two different packages, psycopg2, and psycopg2-binary which could be installed at the same time into the same place on the filesystem. In order to decrease dependency hell problem, we let a user choose how to install psycopg2. There are a few options available and it is reflected in the documentation. This PR also changes the following behavior: * `pip install patroni` will fail if psycopg2 is not installed * Patroni will check psycopg2 upon start and fail if it can't be found or outdated. Closes https://github.com/zalando/patroni/issues/1021	2019-04-15 14:30:16 +02:00
Pavlo Golub	b53a29c022	Fix unit-tests for Windows (#1014 ) Closes #1013	2019-04-02 13:58:17 +02:00
Alexander Kukushkin	e38fe78b56	Fix callbacks behavior (mostly for standby cluster) (#998 ) First of all, this patch changes the behavior of `on_start`/`on_restart` callbacks: they will be called only when postgres is started or restarted without role changes. If the member is promoted or demoted, only the `on_role_change` callback will be executed. Before that, `on_role_change` was never called for standby leader, only `on_start`/`on_restart`, and with a wrong role argument. In addition to that, the REST API will return standby_leader role for the leader of the standby cluster. Closes https://github.com/zalando/patroni/issues/988	2019-03-29 10:28:07 +01:00
Alexander Kukushkin	680444ae13	Reduce lock time taken by dcs.get_cluster() (#989 ) `dcs.cluster` and `dcs.get_cluster()` are using the same lock resource and therefore when get_cluster call is slow due to the slowness of DCS it was also affecting the `dcs.cluster` call, which in return was making health-check requests slow.	2019-03-12 22:37:11 +01:00
Alexander Kukushkin	92720882aa	Reset is_leader flag for every removal of leader key (#990 ) This is the next improvement of #777	2019-03-12 22:10:46 +01:00
Alexander Kukushkin	13c88e8b7a	Replace self-execute with multiprocessing.Process (#994 ) In addition to that transfer postmaster pid to Patroni process with the help of multiprocessing.Pipe instead of using stdin-stdout pipes. Closes https://github.com/zalando/patroni/issues/992	2019-03-12 10:40:37 +01:00
Alexander Kukushkin	4a4258fc3f	Mock external resources (#995 ) unit tests should not accidentally hit running Postgres, DCS or filesystem unless we want it explicitly.	2019-03-12 10:39:42 +01:00
Alexander Kukushkin	c64d51f79c	Better support for static etcd cluster (#986 ) If `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in `etcd.hosts` and avoid doing topology discovery. Such a mode might be useful when you know that you connect to the etcd cluster via a set of proxies or when the etcd cluster has static topology.	2019-03-07 11:36:36 +01:00
Alexander Kukushkin	e670122c80	Use create_replica_methods from standby_cluster for replica bootstrap (#981 ) It might happen that the standby cluster is configured to be created and replay WAL files from the different source than when it is running not in standby mode. This is necessary to avoid writing WAL files and backups into the old place after promotion. The easiest way to achieve such behavior is passing RemoteMember object to `Postgresql.clone` method instead of the usual Member object.	2019-02-21 11:37:50 +01:00
Alexander Kukushkin	0c516de147	Create headless service associated with $SCOPE-config endpoint (#958 ) If there is no service defined, k8s assumes that the endpoint is orphaned and removes it. Patroni tries to create the service only if use_endpoints is enabled, in the following cases: 1. Upon start 2. When it tries to (re-)create the config endpoint. If for some reason creation of the service has failed, Patroni will retry it on every cycle of HA loop. Usually it fails due to lack of permissions and if you don't want to give such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest.	2019-02-15 13:35:04 +01:00
Alexander Kukushkin	739329b590	Make it possible to automatically reinit the former master (#948 ) If the pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is wiping the data directory and reinitializing. So far Patroni was able to remove the data directory only after failed attempt to run pg_rewind. This commit fixes it. If the `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically. Fixes: https://github.com/zalando/patroni/issues/941	2019-01-30 12:37:21 +01:00
Alexander Kukushkin	2c128520cf	Python34 compatibility (#933 ) and some other minor fixes. Closes https://github.com/zalando/patroni/issues/932	2019-01-16 14:40:05 +01:00
Alexander Kukushkin	381a5b80d2	Release 1.5.4 (#931 ) * Bump version * Update release notes * Make it possible to configure registration of Service in Consul via env variables	2019-01-15 12:14:19 +01:00
Alexander Kukushkin	71dae6a905	Optionally consider node not healthy if it is not on the latest timeline (#892 ) The latest timeline is calculated from the `/history` key in DCS. In case there is no such key or it contains some garbage we consider the node healthy. Closes https://github.com/zalando/patroni/issues/890	2019-01-15 11:16:30 +01:00
Alexander Kukushkin	e080ded44b	Make logging configurable via YAML file (#927 ) It allows changing logging settings in runtime by updating config and doing reload or sending `SIGHUP` to the Patroni process. Important! Environment configuration names related to logging were renamed and documentation accordingly updated. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other variables related to logging, which were introduced only recently (between releases), will stop working. I think it is ok, since we didn't release the new version yet and therefore it is very unlikely that somebody is using them except authors of corresponding PRs. Example of log section in the config file: ```yaml log: dir: /where/to/write/patroni/logs # if not specified, write logs to stderr file_size: 50000000 # 50MB file_num: 10 # keep history of 10 files dateformat: '%Y-%m-%d %H:%M:%S' loggers: # increase log verbosity for etcd.client and urllib3 etcd.client: DEBUG urllib3: DEBUG ```	2019-01-15 08:42:13 +01:00
Alexander Kukushkin	994863c18d	Refactor wait_for_user_backends_to_close method (#917 ) 1. Log only debug level messages on any kind of error 2. Update regexp for matching postgres aux processes to make it compatible with postgres 11 Fixes https://github.com/zalando/patroni/issues/914	2019-01-14 14:55:45 +01:00
Dmitry Dolgov	11f7ceb521	Do not check types of standby_cluster configuration (#924 ) Simply allow valid keys	2019-01-14 14:16:15 +01:00
Alexander Kukushkin	f1d7ccf36e	Make sure we refresh session at least once per HA loop (#880 ) Fixes https://github.com/zalando/patroni/issues/879	2018-12-03 16:35:14 +01:00
Alexander Kukushkin	9bf074acfb	Compatibility with python3 (#883 ) Change of `loop_wait` was causing Patroni to disconnect from zookeeper and never reconnect back. The error was happening only with python3 due to a difference in implementation of `select.select` function.	2018-11-30 11:40:34 +01:00
Alexander Kukushkin	fb01aaebc5	Compatibility with kazoo-2.6.0 (#872 ) Recently 2.6.0 was released, which changes the way the create_connection method is called. Before, it was passing two arguments, and in the new version all argument names are specified explicitly.	2018-11-19 14:26:20 +01:00
Alexander Kukushkin	0f666e69f3	Prefix system tables, views and functions with pg_catalog (#845 ) and implement missing unit tests	2018-11-01 16:17:40 +01:00
Alexander Kukushkin	2efd97baab	Permanent replication slots (#819 ) Permanent replication slots are preserved on failover/switchover, that is Patroni on the new primary will create configured replication slots right after doing promote. Slots could be configured with the help of `patronictl edit-config`. The initial configuration could be also done in the `bootstrap.dcs` ```yaml slots: permanent_physical_1: type: physical permanent_logical_1: type: logical database: foo plugin: pgoutput ``` It is the responsibility of the operator to make sure that there are no clashes in names between replication slots automatically created by Patroni for members and permanent replication slots. Closes https://github.com/zalando/patroni/issues/656	2018-10-31 11:37:42 +01:00
Alexander Kukushkin	f70edefc65	A few bugfixes in the "standby cluster" workflow (#823 ) * Always run `pg_rewind` against the remote master * Always use the remote master as the source when "recovering" stopped standby leader * Use remote master as the source when "recovering" the node in the unhealthy cluster * Use the local dbname as the fallback when doing `pg_rewind` from the remote master * `no_replication_slot` is the allowed key in the `RemoteMember` object * Make it possible to "bootstrap" the new `standby_cluster` with existing (and valid) data directory. There is one prerequisite though, there should be no `patroni.dynamic.json` file in it!	2018-10-09 13:30:48 +02:00
Alexander Kukushkin	76d1b4cfd8	Minor fixes (#808 ) * Use `shutil.move` instead of `os.replace`, which is available only from 3.3 * Introduce standby-leader health-check and consul service * Improve unit tests, some lines were not covered * rename `assertEquals` -> `assertEqual`, due to deprecation warning	2018-09-19 16:32:33 +02:00
Pavel Kirillov	2e9cb412e4	Register service in consul (#802 ) Register service 'scope_name' with tag 'master' or 'replica'. Example with scope 'pgsql-pgpi': ```[root@pgpi1 ~]# host -t SRV pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul. pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul. [root@pgpi1 ~]# host -t SRV master.pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: master.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi2.node.dc.consul. [root@pgpi1 ~]# host -t SRV replica.pgsql-pgpi.service.consul. 127.0.0.1 Using domain server: Name: 127.0.0.1 Address: 127.0.0.1#53 Aliases: replica.pgsql-pgpi.service.consul has SRV record 1 1 5432 pgpi1.node.dc.consul.``` Fixes: https://github.com/zalando/patroni/issues/771	2018-09-07 15:17:56 +02:00
Dmitry Dolgov	dd7c3c349f	[WIP] Standby cluster implementation (#679 ) Implementation of "standby cluster" described in #657. Standby cluster consists of a "standby leader", that replicates from a "remote master" (which is not a part of current patroni cluster and can be anywhere), and cascade replicas, that replicate from the corresponding standby leader. "Standby leader" behaves pretty much like a regular leader, which means that it holds a leader lock in DCS; if it disappears, there will be an election of a new "standby leader". One can define such a cluster using the section "standby_cluster" in patroni config file. This section provides parameters for standby cluster, that will be applied only once during bootstrap and can be changed only through DCS.	2018-09-07 10:10:56 +02:00
Alexander Kukushkin	4ca8a6e506	Make retries of calls to DCS consistent across implementations (#805 ) in addition to that do a small refactoring of zookeeper and consul and try to improve the stability of AT	2018-09-06 08:37:26 +02:00
wilfriedroset	0136f252ab	Add patronictl -k/--insecure flag and support for restapi cert (#790 ) Fixes https://github.com/zalando/patroni/issues/785	2018-08-29 16:08:13 +02:00
Alexander Kukushkin	90cf930036	Refactor REST API health-checks (#779 ) Make it more readable and easy to understand. Mostly it is needed to implement https://github.com/zalando/patroni/issues/772	2018-08-29 11:35:22 +02:00
Alexander Kukushkin	87e9aab04c	Improve tests (#778 ) * Implement missing unit-tests * Add acceptance tests for ISSUE #776 * Update list of classifiers, keywords and authors	2018-08-29 11:29:37 +02:00
Alexander Kukushkin	0c1ae6fbeb	Respond 200 to master health-check only if update_lock was successful (#713 ) If Patroni gets partitioned it starts receiving stale information from DCS. We can't use this information to determine that we have the leader key. Instead, we will record in Ha object the actual state of acquire/update lock and report as a leader only if it was successful. P.S. despite responding with 200 on `GET /master` postgres was still running read-only.	2018-08-03 17:00:01 +02:00
Alexander Kukushkin	8a3b78ca7b	Rest api thread can raise an exception during shutdown (#711 ) catch it and report	2018-06-14 13:17:50 +02:00
Dmitry Dolgov	f0d23b0b14	Merge pull request #706 from zalando/feature/rename-create-replica-method Rename create_replica_method to create_replica_methods	2018-06-12 14:16:54 +02:00
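
The two-step logging described in commit 37f03790cc (records go into a bounded in-memory queue first, and a separate thread flushes them to the real destination) can be illustrated with a minimal Python sketch. This is not Patroni's actual code: the names `BoundedQueueHandler`, `Flusher`, `records_lost`, and `demo` are hypothetical and chosen only for this example, and the queue is kept tiny so the drop behavior is visible.

```python
import io
import logging
import queue
import threading


class BoundedQueueHandler(logging.Handler):
    """Step one: store records in memory instead of writing them out.

    Hypothetical class for illustration; it never blocks the caller.
    """

    def __init__(self, max_size=1000):
        super().__init__()
        self.queue = queue.Queue(max_size)
        self.records_lost = 0  # records discarded because the queue was full

    def emit(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            # Dropping is preferred over blocking the calling thread.
            self.records_lost += 1

    def stop(self):
        # A None sentinel tells the flusher thread to exit.
        self.queue.put(None)


class Flusher(threading.Thread):
    """Step two: drain the queue and pass records to the real handler."""

    def __init__(self, source, target):
        super().__init__(daemon=True)
        self.source = source
        self.target = target

    def run(self):
        while True:
            record = self.source.queue.get()
            if record is None:
                break
            self.target.handle(record)


def demo():
    handler = BoundedQueueHandler(max_size=2)
    log = logging.getLogger("two_step_demo")
    log.propagate = False
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    for i in range(3):
        log.info("message %d", i)  # the third record does not fit
    stream = io.StringIO()
    flusher = Flusher(handler, logging.StreamHandler(stream))
    flusher.start()
    handler.stop()
    flusher.join()
    log.removeHandler(handler)
    return stream.getvalue(), handler.records_lost
```

Calling `demo()` queues two records, drops the third, and flushes the queued ones from the background thread. The point of the design is that the thread calling `log.info()` never waits on a slow stderr or file; the cost is that records can be lost when the queue is full, which is why the count is tracked.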


568 Commits