We are using only one function from there, `find_executable()`, and it is better to implement a similar function in Patroni than to add the `distutils` module to requirements.txt
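A minimal sketch of such a helper, assuming a plain PATH lookup similar to `distutils.spawn.find_executable` (the exact behaviour of the Patroni version may differ):

```python
import os
import sys


def find_executable(executable, path=None):
    # Search PATH (or an explicitly given path string) for the executable
    # and return its full path, or None if it cannot be found.
    _, ext = os.path.splitext(executable)
    if sys.platform == 'win32' and ext != '.exe':
        executable += '.exe'

    if os.path.isfile(executable):
        return executable

    if path is None:
        path = os.environ.get('PATH', os.defpath)

    for d in path.split(os.pathsep):
        candidate = os.path.join(d, executable)
        if os.path.isfile(candidate):
            return candidate
```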
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package
This is just careful refactoring, without any functional changes.
* Use `shutil.move` instead of `os.replace`, which is only available from Python 3.3 onwards
* Introduce standby-leader health-check and consul service
* Improve unit tests; some lines were not covered
* Rename `assertEquals` -> `assertEqual` due to a deprecation warning
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DSC, in case if disappears there will be an election of a new "standby
leader".
One can define such a cluster using the "standby_cluster" section in the
Patroni config file. This section provides parameters for the standby cluster;
they are applied only once during bootstrap and can afterwards be changed only
through the DCS.
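A hedged illustration of what such a section might look like once the config is loaded; the parameter names (`host`, `port`, `primary_slot_name`) are used here only as examples and may not match the full list of supported options:

```python
import yaml

# Illustrative only: the standby_cluster keys shown below are assumptions,
# not the authoritative list of options.
config = yaml.safe_load("""
scope: standby-demo
standby_cluster:
  host: remote-master.example.com    # address of the remote master
  port: 5432
  primary_slot_name: standby_demo    # optional replication slot on the remote master
""")

print(config['standby_cluster']['host'])
```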
* Use scope from config file when listing members
* Add version command to patronictl
* Only delete the leader key on shutdown when we hold the lock, to avoid exceptions when the leader key does not exist
* Add a timestamp option to list command.
* YAML format for patronictl output
* Fix API request to get version
Make it possible to cancel a running task if you want to reinitialize a replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel the already running task if an attempt to trigger reinitialize has failed
2. use the `--force` argument with `patronictl reinit`
* Show information about scheduled failover and maintenance mode when showing list of cluster members. Fixes https://github.com/zalando/patroni/issues/557
* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match with postgres 10 defaults.
* Fix name of contributor in release notes.
The current UI to change cluster configuration is somewhat unfriendly: it involves a curl command, knowing the REST API endpoint, knowing the specific syntax to call it with, and writing a JSON document. I added two commands in this branch to make this a bit easier, `show-config` and `edit-config` (names are merely placeholders, any opinions on better ones?).
* `patronictl show-config clustername` fetches the config from DCS, formats it as YAML and outputs it.
* `patronictl edit-config clustername` fetches the config, formats it as YAML, invokes $EDITOR on it, then shows the user the diff and, after confirmation, applies the changed config to DCS, guarding against concurrent modifications (a rough sketch of this flow follows the list below).
* `patronictl edit-config clustername --set synchronous_mode=true --set postgresql.use_slots=true` will set the specific key-value pairs.
There are also some UI capabilities I'm less sure of, but I included them here as I have already implemented them.
* If output is a tty then the diffs are colored. I'm not sure if this feature is cool enough to pull the weight of adding a dependency on cdiff. Or maybe someone knows of another, more task-focused diff coloring library?
* `patronictl edit-config clustername --pg work_mem=100MB` - Shorthand for `--set postgresql.parameters.work_mem=100MB`
* `patronictl edit-config clustername --apply changes.yaml` - apply changes from a yaml file.
* `patronictl edit-config clustername --replace new-config.yaml` - replace config with new version.
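A rough sketch of the edit/diff/apply flow behind `edit-config`, assuming hypothetical `fetch_config`/`apply_config` helpers and a DCS that can reject a write when the config was changed concurrently:

```python
import difflib

import click   # patronictl already uses click
import yaml


def edit_config(fetch_config, apply_config):
    # fetch_config() -> (config_dict, version); apply_config(yaml_text, version)
    # fails on concurrent modification. Both helpers are hypothetical here.
    before, version = fetch_config()
    before_text = yaml.safe_dump(before, default_flow_style=False)

    # Invoke $EDITOR; click.edit() returns None if the editor exits without saving.
    after_text = click.edit(before_text) or before_text

    diff = difflib.unified_diff(before_text.splitlines(True),
                                after_text.splitlines(True),
                                'before', 'after')
    click.echo(''.join(diff))

    if after_text != before_text and click.confirm('Apply these changes?'):
        # Compare-and-set against the version we originally read,
        # so concurrent edits are not silently overwritten.
        apply_config(after_text, version)
```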
Previously pg_ctl waited for a timeout and then happily trod on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.
This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop it and release the leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting out the master start timeout we still handle manual failover requests.
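A simplified sketch of that decision logic (the function and attribute names below are illustrative, not the actual Ha code):

```python
def handle_starting_master(start_elapsed, master_start_timeout,
                           failover_candidate_available):
    """Illustrative only: decide what to do while the master is 'starting'.

    start_elapsed -- seconds since the PostgreSQL start was initiated.
    """
    if master_start_timeout == 0 and failover_candidate_available:
        # Do not wait at all: fail over as soon as a candidate exists.
        return 'release_leader_key_and_failover'

    if start_elapsed > master_start_timeout and failover_candidate_available:
        # Waited long enough: stop postgres and release the leader lock.
        return 'stop_postgres_and_release_leader_key'

    # Keep waiting; manual failover requests are still handled here.
    return 'keep_waiting'
```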
* Introduce a timeout parameter to restart.
When the restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
Previously replicas were always watching the leader key (even if postgres
was not running there). It was not a big issue, but it was not possible to
interrupt such a watch when postgres started up or stopped successfully. It
also delayed the update_member call, so we had stale information in the DCS
for up to `loop_wait` seconds. This commit changes that behavior. If the
async_executor is busy starting, stopping, or restarting postgres, we will
not watch the leader key but instead wait for an event from the
async_executor for up to `loop_wait` seconds. The async executor fires such
an event only if the function it was calling returned something that
evaluates to boolean True.
Such functionality is really needed to change the way we decide whether
pg_rewind is necessary. That decision requires a running local postgres,
so it is really important for us to get such a notification as soon as
possible.
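A condensed sketch of that waiting behaviour, assuming a hypothetical AsyncExecutor built around a `threading.Event`:

```python
import threading


class AsyncExecutor(object):
    """Illustrative only: runs one function in the background and fires an
    event when that function returns a truthy value, so the main loop can
    wake up before loop_wait expires."""

    def __init__(self):
        self.event = threading.Event()

    def run_async(self, func, args=()):
        self.event.clear()
        threading.Thread(target=self._wrapper, args=(func, args)).start()

    def _wrapper(self, func, args):
        if func(*args):        # e.g. postgres started or stopped successfully
            self.event.set()   # wake up the main loop immediately


def wait_for_background_task(executor, loop_wait):
    # While postgres is starting/stopping we do not watch the leader key;
    # instead we wait for the executor's event, at most loop_wait seconds.
    return executor.event.wait(loop_wait)
```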
* Replace pytz.UTC with dateutil.tz.tzutc; it helps to reduce memory usage by more than 4 MB
* Fix check of python version: 0x0300000 => 0x3000000
* Update leader key before restart and demote
Adds a new configuration variable synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user-visible transactions.
To enforce the synchronous failover guarantee Patroni stores the current synchronous replication state in the DCS, using strict ordering: first enable synchronous replication, then publish the information. A standby can use this to verify that it was indeed a synchronous standby before the master failed and is therefore allowed to fail over.
We can't enable multiple standbys as synchronous, letting PostgreSQL pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner, or allow PostgreSQL to pick one without the possibility of lost transactions.
On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.
When the synchronous standby goes away or disconnects, a new one is picked and Patroni switches the master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.
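A schematic of the strict ordering mentioned above (the method names on `postgresql` and `dcs` are placeholders, not the real API):

```python
def publish_synchronous_state(postgresql, dcs, standby_name):
    # Illustrative ordering only; the real method names differ.
    # Step 1: make the master actually wait for the chosen standby ...
    postgresql.set_synchronous_standby_names(standby_name)

    # Step 2: ... and only afterwards advertise it in the DCS.  A standby
    # that finds itself advertised here knows it really was synchronous
    # when the master failed, so promoting it cannot lose transactions.
    dcs.write_sync_state(leader=postgresql.name, sync_standby=standby_name)
```

Doing it in the opposite order could advertise a standby that the master was not yet waiting for, which would break the zero-lost-transactions guarantee.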
Added acceptance tests and documentation.
Implementation by @ants with extensive review by @CyberDem0n.
When Patroni was "joining" an already running postgres it was not calling
callbacks, which in some cases caused issues (a callback could be used to
change routing/load-balancer configuration or to assign/remove a floating (service) IP).
In addition to that, we should `start` postgres instead of `restart`-ing
it when doing recovery, because in this case the 'on_start' callback should
be called instead of 'on_restart'.
In the original code we were parsing/deparsing url-style connection
strings back and forth. That was not really resource-hungry, but rather
annoying. It was also not really obvious how to switch all local
connections to unix sockets (which would be preferable).
This commit isolates the different use-cases of working with connection
strings and minimizes the amount of code parsing and deparsing them. It also
introduces one new helper method in the `Member` object: `conn_kwargs`.
This method can accept a dict with credentials (username and password) as a
parameter. It returns a dict that can be used by `psycopg2.connect` or for
building connection urls for pg_rewind, pg_basebackup or some other replica
creation methods. Params for the local connection are built in the
`_local_connect_kwargs` method and can easily be switched to a unix socket later.
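A minimal sketch of what `conn_kwargs` might look like (the `Member` fields and defaults below are simplified assumptions):

```python
class Member(object):
    # Illustrative only: the real Member object is built from DCS data.
    def __init__(self, name, host, port=5432):
        self.name = name
        self.host = host
        self.port = port

    def conn_kwargs(self, auth=None):
        """Return a dict suitable for psycopg2.connect(**kwargs) or for
        building a connection string for pg_rewind/pg_basebackup."""
        ret = {'host': self.host, 'port': self.port, 'database': 'postgres'}
        if auth:  # e.g. {'username': 'replicator', 'password': 'secret'}
            ret['user'] = auth.get('username')
            ret['password'] = auth.get('password')
        return ret
```

Usage would then be along the lines of `psycopg2.connect(**member.conn_kwargs({'username': 'replicator', 'password': 'secret'}))`.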
There is no point in trying to update the topology until the original request
has been performed. Also, for us it is more important to execute the original
request than to keep the topology of the etcd cluster in sync.
In addition to that, implement the same retry-timeout logic in the
`machines` property that is already used in the `api_execute` method.
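A rough sketch of the retry-timeout idea (this is not the actual python-etcd or Patroni code):

```python
import random
import time


def execute_with_retry(request, nodes, retry_timeout):
    """Illustrative only: keep retrying the original request against the
    known etcd nodes until it succeeds or retry_timeout expires; refreshing
    the cluster topology can wait until after the request has been served."""
    deadline = time.time() + retry_timeout
    while True:
        try:
            return request(random.choice(nodes))
        except Exception:
            if time.time() > deadline:
                raise
            time.sleep(1)
```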
- We don't want to export the RestApi object, since it initializes the
  socket and listens on it.
- Change get_dcs so that an explicit scope passed to it takes
  priority over the one in the configuration file.
Originally Exhibitor was supported inside the ZooKeeper class, and the
configuration for Exhibitor was also taken from the `zookeeper` section of
the yaml config file. In fact, Exhibitor just extends ZooKeeper; this is now
reflected in the code, and Exhibitor also got its own section in the
config.yaml file. This will make it easier to configure Exhibitor hosts and
port via environment variables once PR#211 is merged.
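A hedged example of what a separate section might look like after this change; the key names (`hosts`, `port`, `poll_interval`) are illustrative assumptions:

```python
import yaml

# Illustrative only: the exhibitor keys shown here are assumptions,
# not the authoritative list of options.
config = yaml.safe_load("""
exhibitor:
  hosts: [exhibitor1.example.com, exhibitor2.example.com]
  port: 8181            # Exhibitor REST API port used to discover ZooKeeper nodes
  poll_interval: 300    # how often to re-read the ZooKeeper topology, in seconds
""")

print(config['exhibitor']['hosts'])
```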