patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-27 18:20:05 +00:00

Author	SHA1	Message	Date
Alexander Kukushkin	278bf9852b	Release 1.6.0 (#1131 ) * Implement missing tests and do a few minor fixes * Bump version to 1.6.0 * Update release notes	2019-08-05 15:08:04 +02:00
Don Seiler	5cb7d1bdc1	Grammar fixes for SETTINGS.rst (#1106 )	2019-07-26 09:34:42 +02:00
Jan Tomsa	7d1a5cad03	Allow to specify consul consistency mode (#1094 ) Allow users to specify consul consistency mode. This option will be passed to the Consul client as kwargs https://github.com/zalando/patroni/blob/master/patroni/dcs/consul.py#L213. The library will then enforce the selected consistency level https://python-consul.readthedocs.io/en/latest/#consul More about consistency mode here https://www.consul.io/api/features/consistency.html	2019-07-01 11:02:26 +02:00
Alexander Kukushkin	37f03790cc	Implement two-step logging (#1080 ) A few times we observed that the Patroni HA loop was blocked for a few minutes due to not being able to write logs to stderr. This is a very rare condition which we have hit so far only on k8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into the in-memory queue and later they are asynchronously flushed to stderr or a file from a separate thread. The maximum queue size is configurable and the default value is 1000. This should be enough to keep more than one hour of log messages with default settings and when the Patroni cluster operates normally (without big issues). If we hit the maximum size of the queue, further logs will be discarded until the queue size is reduced. The number of discarded messages will be reported into the log later. In addition to that, the number of non-flushed and discarded messages (if there are any) will be reported via Patroni REST API as: ```json "logger_queue_size": X, "logger_records_lost": Y ```	2019-06-13 14:18:49 +02:00
Alexander Kukushkin	bba9066315	Make it possible to run pg_rewind without superuser on pg11+ (#1035 ) * expose the current patroni version in DCS * expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed * other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT * define `postgresql.authentication.rewind` with credentials for pg_rewind in patroni configuration files. * create user for pg_rewind if postgres is 11+ * grant execute on functions required for pg_rewind to rewind user	2019-05-02 14:07:26 +02:00
Alexander Kukushkin	f0b784fe7f	Manage pg_ident.conf with Patroni (#1037 ) This functionality works similarly to the `pg_hba`: If the `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf, however, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed from outside and not update the file.	2019-04-23 16:16:53 +02:00
Alexander Kukushkin	e38fe78b56	Fix callbacks behavior (mostly for standby cluster) (#998 ) First of all, this patch changes the behavior of `on_start`/`on_restart` callbacks, they will be called only when postgres is started or restarted without role changes. If the member is promoted or demoted, only the `on_role_change` callback will be executed. Before that, `on_role_change` was never called for standby leader, only `on_start`/`on_restart` and with a wrong role argument. In addition to that, the REST API will return standby_leader role for the leader of the standby cluster. Closes https://github.com/zalando/patroni/issues/988	2019-03-29 10:28:07 +01:00
Alexander Kukushkin	c64d51f79c	Better support for static etcd cluster (#986 ) If the `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in the `etcd.hosts` and avoid doing topology discovery. Such a mode might be useful when you know that you connect to the etcd cluster via a set of proxies or when the etcd cluster has a static topology.	2019-03-07 11:36:36 +01:00
Alexander Kukushkin	739329b590	Make it possible to automatically reinit the former master (#948 ) If the pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is wiping the data directory and reinitializing. So far Patroni was able to remove the data directory only after failed attempt to run pg_rewind. This commit fixes it. If the `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically. Fixes: https://github.com/zalando/patroni/issues/941	2019-01-30 12:37:21 +01:00
Étienne M	bd2c54581a	Add ETCD_(PROTOCOL\|USERNAME\|PASSWORD) env variables (#947 ) Fix #944	2019-01-30 12:36:50 +01:00
Étienne M	93d157dea3	Document how to start Patroni with an existing data directory (#918 )	2019-01-30 12:35:57 +01:00
Alexander Kukushkin	e080ded44b	Make logging configurable via YAML file (#927 ) It allows changing logging settings in runtime by updating config and doing reload or sending `SIGHUP` to the Patroni process. Important! Environment configuration names related to logging were renamed and documentation accordingly updated. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other variables related to logging, which were introduced only recently (between releases), will stop working. I think it is ok, since we didn't release the new version yet and therefore it is very unlikely that somebody is using them except authors of corresponding PRs. Example of log section in the config file: ```yaml log: dir: /where/to/write/patroni/logs # if not specified, write logs to stderr file_size: 50000000 # 50MB file_num: 10 # keep history of 10 files dateformat: '%Y-%m-%d %H:%M:%S' loggers: # increase log verbosity for etcd.client and urllib3 etcd.client: DEBUG urllib3: DEBUG ```	2019-01-15 08:42:13 +01:00
Kostiantyn Nemchenko	ce9c7bdadc	Describe Consul registration parameters (#870 ) Mention Consul related parameters introduced by #802	2018-11-21 12:00:40 +01:00
alago197	a13cc8b847	Update SETTINGS.rst with connect_address clarifications (#858 )	2018-11-12 16:56:51 +01:00
Alexander Kukushkin	2efd97baab	Permanent replication slots (#819 ) Permanent replication slots are preserved on failover/switchover, that is Patroni on the new primary will create configured replication slots right after doing promote. Slots could be configured with the help of `patronictl edit-config`. The initial configuration could be also done in the `bootstrap.dcs` ```yaml slots: permanent_physical_1: type: physical permanent_logical_1: type: logical database: foo plugin: pgoutput ``` It is the responsibility of the operator to make sure that there are no clashes in names between replication slots automatically created by Patroni for members and permanent replication slots. Closes https://github.com/zalando/patroni/issues/656	2018-10-31 11:37:42 +01:00
Alexander Kukushkin	534829d617	Release 1.5.0 (#809 ) Update release notes and bump version	2018-09-20 16:29:00 +02:00
Dmitry Dolgov	dd7c3c349f	[WIP] Standby cluster implementation (#679 ) Implementation of "standby cluster" described in #657. Standby cluster consists of a "standby leader", that replicates from a "remote master" (which is not a part of current patroni cluster and can be anywhere), and cascade replicas, that replicate from the corresponding standby leader. "Standby leader" behaves pretty much like a regular leader, which means that it holds a leader lock in DCS; if it disappears, there will be an election of a new "standby leader". One can define such a cluster using the section "standby_cluster" in patroni config file. This section provides parameters for standby cluster, that will be applied only once during bootstrap and can be changed only through DCS.	2018-09-07 10:10:56 +02:00
wilfriedroset	0136f252ab	Add patronictl -k/--insecure flag and support for restapi cert (#790 ) Fixes https://github.com/zalando/patroni/issues/785	2018-08-29 16:08:13 +02:00
alago197	936a4238fb	Update some descriptions for the REST API endpoints (#729 ) * Update some descriptions for the REST API endpoints By @alago197	2018-07-10 15:40:53 +02:00
Oleksii Kliukin	41e5f58f2b	Describe synchronous_mode_strict (#710 ) * Describe synchronous_mode_strict Per https://github.com/zalando/patroni/issues/709	2018-06-13 11:12:22 +02:00
erthalion	3d80e49b38	Rename also in settings docs	2018-06-12 13:28:30 +02:00
erthalion	d037aa8afd	Rename create_replica_method to create_replica_methods To make it clear that it's actually an array	2018-06-12 11:33:13 +02:00
Kostiantyn Nemchenko	3110090154	Minor corrections to the documentation. (#654 )	2018-04-16 15:46:46 +02:00
Oleksii Kliukin	4202ad853a	Minor corrections to the documentation. (#599 )	2018-01-10 16:10:12 +01:00
Oleksii Kliukin	84d804e579	Release notes 1.4 (#597 ) Document Kubernetes parameters, environment variables. Describe how Patroni uses Kubernetes.	2018-01-10 11:17:08 +01:00
Alexander Kukushkin	b6425cab85	Allow to specify multiple hosts for etcd (#589 ) This list will be used for initial discovery of etcd cluster members. If for some reason this list of hosts has been exhausted during work, Patroni will return to the initial list. In addition to that, improve ipv6 compatibility by using a special function for splitting host and port. Fixes https://github.com/zalando/patroni/issues/523	2018-01-04 10:25:06 +01:00
Alexander Kukushkin	2e86fe5991	Consul dc (#559 ) Make it possible to specify dc for consul as PATRONI_CONSUL_DC environment variable and update documentation accordingly.	2017-11-10 11:21:47 +01:00
Alexander Kukushkin	8e9c62d002	Make it possible to change Consul session checks (#543 ) If list of checks is not specified, Consul will use "serfHealth" in addition to TTL based created by Patroni. There are some cases when people want to sacrifice fast detection of network partitioning in favor of ability to tolerate network lags. Fixes https://github.com/zalando/patroni/issues/522	2017-10-12 15:01:31 +02:00
Alexander Kukushkin	3919b322f4	Release 1.3.4 (#515 ) Fix documentation and update release notes	2017-09-08 10:56:09 +02:00
Alexander Kukushkin	5ef01cfdfa	Advanced configuration for Consul (#506 ) * possibility to specify client certs and cacert * possibility to specify token * compatibility with python-consul-0.7.1	2017-08-24 07:56:12 +02:00
Ants Aasma	70d718a058	Simplify watchdog code (#452 ) * Only activate watchdog while master and not paused We don't really need the protections while we are not master. This way we only need to tickle the watchdog when we are updating leader key or while demotion is happening. As implemented we might fail to notice to shut down the watchdog if someone demotes postgres and removes leader key behind Patroni's back. There are probably other similar cases. Basically if the administrator is being actively stupid they might get unexpected restarts. That seems fine. * Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni. Changes watchdog timeout during the next keepalive when ttl is changed. Watchdog driver and requirement can also be switched online. When watchdog mode is `required` and watchdog setup does not work then the effect is similar to nofailover. Add watchdog_failed to status API to signify this. This is True only when watchdog does not work AND it is required. * Reset implementation when config changed while active. * Add watchdog safety margin configuration Defaults to 5 seconds. Basically this is the maximum amount of time that can pass between the calls to `dcs.update_leader()` and `watchdog.keepalive()`, which are called right after each other. Should be safe for pretty much any sane scenario and allows the default settings to not trigger watchdog when DCS is not responding. * Cancel bootstrap if watchdog activation fails The system would have demoted itself anyway the next HA loop. Doing it in bootstrap gives at least some other node chance to try bootstrapping in the hope that it is configured correctly. If all nodes are unable to activate they will continue to try until the disk is filled with moved datadirs. Perhaps not ideal behavior, but as the situation is unlikely to resolve itself without administrator intervention it doesn't seem too bad.	2017-07-27 12:16:11 +02:00
Oleksii Kliukin	895eefaa51	Document bootstrapping and replica creation (#478 ) Describe parameters around custom replica creation and bootstrap	2017-07-19 12:25:50 +02:00
jouir	4ca94a5dab	Add config_dir option for configuration files location (#466 ) On debian, the configuration files (postgresql.conf, pg_hba.conf, etc) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could override existing configuration files where they are used to be. The default is to store configuration files in the data directory. This setting is targeting custom installations like debian and any others moving configuration files out of the data directory. Fixes #465	2017-07-04 16:14:17 +02:00
Alexander Kukushkin	b576e69362	Manage pg_hba.conf via patroni config or dynamic_configuration (#458 ) So far Patroni was populating pg_hba.conf only when running bootstrap code and after that it was not very handy to manage its content, because it was necessary to log in to every node, change pg_hba.conf manually and run pg_ctl reload. This commit intends to fix it and give Patroni control over pg_hba.conf. It is possible to define pg_hba.conf content via `postgresql.pg_hba` in the patroni configuration file or in the `DCS/config` (dynamic configuration). If the `hba_file` is defined in the `postgresql.parameters`, Patroni will ignore `postgresql.pg_hba`.	2017-06-23 12:38:25 +02:00
Alexander Kukushkin	681b6b507b	Support unix sockets when connecting to a local postgres cluster (#457 ) For backward compatibility this feature is not enabled by default. To enable it you have to set `postgresql.use_unix_socket: true`. If the feature is enabled, and `unix_socket_directories` is defined and non-empty, Patroni will use the first suitable value from it to connect to the local postgres cluster. If the `unix_socket_directories` is not defined, Patroni will assume that the default value should be used and will not pass `host` to command line arguments and omit it from the connection url. Solves: https://github.com/zalando/patroni/issues/61 In addition to the above, this commit solves a couple of bugs: * manual failover with pg_rewind in a pause state was broken * psycopg2 (or libpq, I am not really sure what exactly) doesn't mark the cursor's connection as closed when we use a unix socket and an `OperationalError` occurs. We will close such a connection on our own.	2017-06-22 11:47:57 +02:00
Oleksii Kliukin	fb89e75ce4	Make patroni documentation available on patroni.readthedocs.io. (#373 ) Run sphinx-quickstart and some workarounds. Sphinx is a logical choice because our docs are already in .rst.	2016-12-20 18:22:57 +01:00
Alexander Kukushkin	d138a8db17	AT for master_start_timeout + minor fixes (#361 )	2016-12-09 12:02:41 +01:00
Ants Aasma	1290b30b84	Introduce starting state and master start timeout. (#295 ) Previously pg_ctl waited for a timeout and then happily trodded on considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not and caused a race condition that resulted in either a failover or a crash recovery or a crash recovery interrupted by failover and a missed rewind. This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests. * Introduce timeout parameter to restart. When restart timeout is set master becomes eligible for failover after that timeout expires regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when node is a standby.	2016-12-08 14:44:27 +01:00
Alexander Kukushkin	b299b12f58	Various configuration parameters for etcd (#358 ) * Add https and auth support for etcd Also implement support of PATRONI_ETCD_URL and PATRONI_ETCD_SRV environment variables * Implement etcd.proxy etcd.cacert, etcd.cert and etcd.key support Now it should be possible to set up fully encrypted connection to etcd with authorization.	2016-12-06 16:40:21 +01:00
Ants Aasma	7e53a604d4	Add synchronous replication support. (#314 ) Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions. To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over. We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions. On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS. When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock. Added acceptance tests and documentation. Implementation by @ants with extensive review by @CyberDem0n.	2016-10-19 16:12:51 +02:00
Alejandro Martínez	48a6af6994	Add post_init configuration parameter on bootstrap (#296 ) * Add bootstrap post_init configuration parameter * Add documentation By @zenitraM	2016-09-28 15:42:23 +02:00
Alejandro Martínez	f58ff3a96f	Document custom_conf parameter	2016-09-01 17:59:47 +02:00
Alejandro Martínez	1fb562e118	Add custom_conf parameter documentation	2016-08-31 15:38:42 +02:00
Ants Aasma	494887f47e	Enable configuration of PostgreSQL binary locations. (#263 ) Adds a bin_dir parameter to PostgreSQL settings that will be prefixed to all command invocations.	2016-08-18 14:06:11 +02:00
Alexander Kukushkin	659f7617f5	New option: remove_data_directory_on_rewind_failure One more try to fix pg_rewind	2016-07-05 12:11:15 +02:00
Alexander Kukushkin	0318749b56	bugfix: api must report role=master during pg_ctl stop In addition to that, make the pg_ctl --timeout option configurable. If the stop or start didn't succeed within the given timeout when demoting master, the role will be forcibly changed to 'unknown' and all needed callbacks executed.	2016-06-28 14:14:42 +02:00
Alexander Kukushkin	25f20ca7d7	Fix documentation	2016-06-14 10:13:47 +02:00
Alexander Kukushkin	7244739e26	Fix link to the libpq-pgpass.html	2016-06-09 12:10:37 +02:00
Alexander Kukushkin	5314433b70	Merge branch 'feature/dynamic-configuration' of github.com:zalando/patroni into feature/environment-configuration	2016-06-09 11:09:30 +02:00
Alexander Kukushkin	23c5040ce5	Update documentation	2016-06-08 12:35:53 +02:00


51 Commits
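
Note on 37f03790cc (two-step logging): the message describes log records going into a bounded in-memory queue first and being flushed from a separate thread, with overflow counted rather than blocking the caller. The following is a minimal sketch of that idea using only the Python standard library. It is not Patroni's actual code; the class names `BoundedQueueHandler` and `QueueFlusher` and the `max_size` argument are hypothetical and chosen for illustration only.

```python
import logging
import queue
import threading


class BoundedQueueHandler(logging.Handler):
    """Step 1 (hypothetical name): enqueue records without ever blocking."""

    def __init__(self, max_size=1000):
        super().__init__()
        self.records = queue.Queue(max_size)
        self.records_lost = 0  # records dropped because the queue was full

    def emit(self, record):
        try:
            self.records.put_nowait(record)
        except queue.Full:
            # Discard instead of blocking the caller (e.g. an HA loop).
            self.records_lost += 1


class QueueFlusher(threading.Thread):
    """Step 2 (hypothetical name): drain the queue into the real handler."""

    def __init__(self, source, target):
        super().__init__(daemon=True)
        self.source = source
        self.target = target

    def run(self):
        while True:
            record = self.source.records.get()
            if record is None:  # sentinel pushed by stop()
                break
            self.target.handle(record)

    def stop(self):
        self.source.records.put(None)
        self.join()


if __name__ == "__main__":
    handler = BoundedQueueHandler(max_size=1000)
    flusher = QueueFlusher(handler, logging.StreamHandler())  # stderr
    flusher.start()

    log = logging.getLogger("sketch")
    log.setLevel(logging.INFO)
    log.addHandler(handler)
    log.info("written to the queue first, flushed to stderr later")

    flusher.stop()
```

In this sketch `handler.records.qsize()` and `handler.records_lost` play the role of the `logger_queue_size` and `logger_records_lost` values mentioned in the commit message; how Patroni actually computes and exposes them is not shown here.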