Commit Graph

85 Commits

Author SHA1 Message Date
Alexander Kukushkin
0a1d9b0a25 Get rid of distutils module dependency (#1146)
We use only one function from it, `find_executable()`, and it is better to implement a similar function in Patroni than to add the `distutils` module to requirements.txt.
2019-08-26 09:38:47 +02:00
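
A stand-in for `find_executable()`, as described in 0a1d9b0a25, could look roughly like the sketch below. It is not the code from the commit: the Windows `.exe` handling and the fallback to `os.defpath` are assumptions made for illustration.

```python
import os
import sys


def find_executable(executable, path=None):
    """Return the path of `executable` found on `path`, or None (illustrative sketch)."""
    # Assumption: fall back to $PATH, then to the platform default search path.
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    # Assumption: on Windows, executables normally carry an .exe suffix.
    if sys.platform == "win32" and not executable.lower().endswith(".exe"):
        executable += ".exe"
    # A name that already points at an existing file is returned unchanged.
    if os.path.isfile(executable):
        return executable
    # Otherwise probe every directory of the search path in order.
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, executable)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A call such as `find_executable("pg_ctl")` returns the first match on the search path, or `None` when nothing is found.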
Alexander Kukushkin
278bf9852b Release 1.6.0 (#1131)
* Implement missing tests and do a few minor fixes
* Bump version to 1.6.0
* Update release notes
2019-08-05 15:08:04 +02:00
Rafia Sabih
5cc3afc037 Enhance dialogues for scheduled switchover and restart (#1119)
Enhance dialogue for switchover and restart

For a scheduled switchover or restart, mention the scheduled time, if any, when asking for confirmation.
2019-08-02 11:21:26 +02:00
Alexander Kukushkin
a4bd6a9b4b Refactor postgresql class (#1060)
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package

This is just a careful refactoring, without code changes.
2019-05-21 16:02:47 +02:00
Pavlo Golub
b53a29c022 Fix unit-tests for Windows (#1014)
Closes #1013
2019-04-02 13:58:17 +02:00
Alexander Kukushkin
4a4258fc3f Mock external resources (#995)
Unit tests should not accidentally hit a running Postgres, DCS or the filesystem unless we explicitly want them to.
2019-03-12 10:39:42 +01:00
Alexander Kukushkin
0f666e69f3 Prefix system tables, views and functions with pg_catalog (#845)
and implement missing unit tests
2018-11-01 16:17:40 +01:00
Alexander Kukushkin
76d1b4cfd8 Minor fixes (#808)
* Use `shutil.move` instead of `os.replace`, which is available only from 3.3
* Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* Rename `assertEquals` -> `assertEqual`, due to deprecation warning
2018-09-19 16:32:33 +02:00
Dmitry Dolgov
dd7c3c349f [WIP] Standby cluster implementation (#679)
Implementation of the "standby cluster" described in #657. A standby cluster consists
of a "standby leader" that replicates from a "remote master" (which is not
part of the current Patroni cluster and can be anywhere), and cascade replicas
that replicate from the corresponding standby leader. The "standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DCS; if it disappears, a new "standby leader" will be elected.
One can define such a cluster using the "standby_cluster" section in the Patroni
config file. This section provides parameters for the standby cluster that are
applied only once during bootstrap and can be changed only through DCS.
2018-09-07 10:10:56 +02:00
wilfriedroset
0136f252ab Add patronictl -k/--insecure flag and support for restapi cert (#790)
Fixes https://github.com/zalando/patroni/issues/785
2018-08-29 16:08:13 +02:00
Alexander Kukushkin
87e9aab04c Improve tests (#778)
* Implement missing unit-tests
* Add acceptance tests for ISSUE #776
* Update list of classifiers, keywords and authors
2018-08-29 11:29:37 +02:00
Alexander Kukushkin
a1e5c8e1cb A few improvements in patronictl (#601)
* make switchover work with an old patroni
* exclude leader from candidates when interactively running failover
2018-01-17 15:33:08 +01:00
Alexander Kukushkin
18786464a1 Rename failover to switchover and make new failover work without leader (#588)
In addition to that implement /switchover endpoint as an alias to /failover endpoint and implement more checks like:
* candidate must be provided for a failover
* switchover can't be scheduled in a pause state
* and so on

Fixes https://github.com/zalando/patroni/issues/585
Fixes https://github.com/zalando/patroni/issues/520
2018-01-05 15:17:56 +01:00
Alexander Kukushkin
3a96ffa718 Expose pause state of every member to DCS and via REST (#592)
and implement patronictl pause|resume --wait on top of that

Fixes https://github.com/zalando/patroni/issues/349
2018-01-05 15:16:45 +01:00
Alexander Kukushkin
6b01d2787f More improvements in patronictl (#590)
Make specifying cluster_name optional for some more commands.
If it is not specified, its value is taken from the config file.
2018-01-04 12:26:13 +01:00
Ants Aasma
15d1767402 Some improvements to patronictl (#571)
* Use scope from config file when listing members

* Add version command to patronictl

* Only delete leader on shutdown when we have the lock to avoid exceptions when leader key does not exist

* Add a timestamp option to list command.

* YAML format for patronictl output

* Fix API request to get version
2018-01-04 10:35:22 +01:00
Alexander Kukushkin
0e01bb33bb Improve patronictl reinit (#576)
Make it possible to cancel a running task if you want to reinitialize replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel already running task if an attempt to trigger reinitialize has failed
2. if you are using `--force` argument with `patronictl reinit`
2018-01-04 10:31:44 +01:00
Alexander Kukushkin
bd847fd2cc Patronictl extended info (#567)
* Show information about scheduled failover and maintenance mode when showing list of cluster members. Fixes https://github.com/zalando/patroni/issues/557

* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match with postgres 10 defaults.
* Fix name of contributor in release notes.
2017-11-28 12:10:05 +01:00
Alexander Kukushkin
e3a01727a9 Implement missing tests and add pg-10 support to wale_restore (#446)
In addition to that, get rid of two modules and fix the formatting of tests.
2017-05-22 12:01:02 +02:00
Ants Aasma
644b741969 Add config editing to patronictl (#428)
Current UI to change cluster configuration is somewhat unfriendly, involving a curl command, knowing the REST API endpoint, knowing the specific syntax to call it with and writing a JSON document. I added two commands in this branch to make this a bit easier, `show-config` and `edit-config` (names are merely placeholders, any opinions on better ones?).

* `patronictl show-config clustername` fetches the config from DCS, formats it as YAML and outputs it.

* `patronictl edit-config clustername` fetches the config, formats it as YAML, invokes $EDITOR on it, then shows user the diff and after confirmation applies the changed config to DCS, guarding for concurrent modifications.

* `patronictl edit-config clustername --set synchronous_mode=true --set postgresql.use_slots=true` will set the specific key-value pairs.

There are also some UI capabilities I'm less sure of, but included them here as I already implemented them.

* If output is a tty then the diffs are colored. I'm not sure if this feature is cool enough to pull the weight of adding a dependency on cdiff. Or maybe someone knows of another more task focused diff coloring library?
* `patronictl edit-config clustername --pg work_mem=100MB` - Shorthand for `--set postgresql.parameters.work_mem=100MB`
* `patronictl edit-config clustername --apply changes.yaml` - apply changes from a yaml file.
* `patronictl edit-config clustername --replace new-config.yaml` - replace config with new version.
2017-05-19 16:25:21 +02:00
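
The `--set key=value` behaviour described in 644b741969 amounts to writing a value into a nested config at a dotted path. The sketch below shows one way to do that. The helper names `set_path_value` and `apply_set_options` are hypothetical, and the value parsing (only `true`/`false` become booleans) is a simplifying assumption, not what patronictl actually does.

```python
def set_path_value(config, path, value):
    """Set config['a']['b']['c'] = value for the dotted path 'a.b.c'."""
    keys = path.split(".")
    node = config
    # Walk down to the parent of the last key, creating dicts as needed.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def apply_set_options(config, pairs):
    """Apply a list of 'dotted.path=value' strings to config in place."""
    for pair in pairs:
        path, _, raw = pair.partition("=")
        # Assumption: only true/false are converted; everything else stays a string.
        value = {"true": True, "false": False}.get(raw.lower(), raw)
        set_path_value(config, path, value)
    return config


config = {"postgresql": {"use_slots": False}}
apply_set_options(config, [
    "synchronous_mode=true",
    "postgresql.use_slots=true",
    "postgresql.parameters.work_mem=100MB",
])
print(config)
```

Under this scheme, the `--pg work_mem=100MB` shorthand from the commit message would simply prepend `postgresql.parameters.` to the key before applying it.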
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily carried on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not and caused a race condition that resulted in either a failover or a crash recovery or a crash recovery interrupted by failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.

* Introduce timeout parameter to restart.

When restart timeout is set master becomes eligible for failover after that timeout expires regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when node is a standby.
2016-12-08 14:44:27 +01:00
Alexander Kukushkin
038b5aed72 Improve leader watch functionality (#356)
Previously replicas were always watching the leader key (even if
postgres was not running there). It was not a big issue, but it
was not possible to interrupt such a watch when postgres
started up or stopped successfully. It also delayed the update_member
call, so we had stale information in DCS for up to `loop_wait`
seconds. This commit changes that behavior. If the async_executor is
busy starting, stopping or restarting postgres, we do not watch the
leader key but wait for an event from the async_executor for up to
`loop_wait` seconds. The async executor fires such an event only if the
function it was calling returned something that evaluates to
boolean True.

This functionality is needed to change the way we decide
whether pg_rewind is necessary. That decision will require a local
postgres to be running, so it is important for us to get such a
notification as soon as possible.
2016-11-22 16:22:30 +01:00
Alexander Kukushkin
66543f41a3 BUGFIX: Try all cluster members when doing pause/resume (#351)
Previously it was not possible to pause/resume unhealthy cluster.
2016-11-04 18:43:19 +02:00
Alexander Kukushkin
37b020e7a3 Various bugfixes and improvements: (#346)
* Replace pytz.UTC with dateutil.tz.tzutc, it helps to reduce memory by more than 4Mb...

* fix check of python version: 0x0300000 => 0x3000000

* Update leader key before restart and demote
2016-11-04 18:42:56 +02:00
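
The version-check fix in 37b020e7a3 comes down to one missing hex digit. The commit message does not show the surrounding comparison, so `is_python3` below is a hypothetical helper that assumes the check compares `sys.hexversion` against the constant.

```python
import sys

WRONG = 0x0300000  # seven hex digits: the same number as 0x00300000
RIGHT = 0x3000000  # the same number as 0x03000000, i.e. Python 3.0.0

# sys.hexversion packs major, minor, micro, release level and serial;
# Python 2.7.0 final is encoded as 0x020700F0.
PY27 = 0x020700F0


def is_python3(hexversion, threshold):
    # Hypothetical form of the check; the real comparison is not shown in the commit.
    return hexversion >= threshold


print(is_python3(PY27, WRONG))  # True: Python 2.7 is misdetected as Python 3
print(is_python3(PY27, RIGHT))  # False: the corrected constant works
```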
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.

To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over.

We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
Alexander Kukushkin
67c4b6b105 Make --dcs and --config-file options global (#322) 2016-09-27 16:25:24 +02:00
Alexander Kukushkin
0b1bfeca5b Make sure that we are running and testing latest versions of everything (#303) 2016-09-19 13:32:53 +02:00
Murat Kabilov
799d4c9bb8 Disable command renamed to pause 2016-08-29 14:30:19 +02:00
Murat Kabilov
22e4af3fb1 Fix failover in the paused state 2016-08-29 12:04:30 +02:00
Murat Kabilov
3d1fe3fa49 Introduce is_paused method in the Cluster 2016-08-29 09:29:49 +02:00
Murat Kabilov
89ef5da5ae Add tests for api; add checks for ctl and api for the paused state case 2016-08-29 08:36:35 +02:00
Murat Kabilov
4e61ef06a8 Add coverage in requirements
Add some tests for patroni ctl
2016-08-24 18:08:23 +02:00
Alexander Kukushkin
fa7aa71092 Always call on_start callback when starting Patroni (#262)
When Patroni was "joining" an already running postgres it was not calling
callbacks, which in some cases caused issues (a callback could be used to
change routing/load-balancer or assign/remove a floating (service) IP).

In addition to that, we should `start` postgres instead of `restart`-ing
it when doing recovery, because in this case the 'on_start' callback should
be called instead of 'on_restart'.
2016-08-18 09:35:13 +02:00
Oleksii Kliukin
179131893e Merge branch 'master' into feature/ctl_scaffolding 2016-08-10 11:49:08 +02:00
Alexander Kukushkin
8ef7178ddf Refactor code dealing with database connection string/params (#255)
In the original code we were parsing/deparsing url-style connection
strings back and forth. That was not really resource greedy but rather
annoying. Also it was not really obvious how to switch all local
connections to unix-sockets (preferably).

This commit isolates different use-cases of working with connection
strings and minimizes amount of code parsing and deparsing them. Also it
introduces one new helper method in the `Member` object - `conn_kwargs`.
This method can accept as a parameter dict object with credentials
(username and password). As a result it returns dict object which could
be used by `psycopg2.connect` or for building connection urls for
pg_rewind, pg_basebackup or some other replica creation methods.

Params for local connections are built in the `_local_connect_kwargs`
method and could easily be changed to unix-socket later.
2016-08-10 10:19:52 +02:00
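
The idea behind the `conn_kwargs` helper described in 8ef7178ddf can be sketched as a standalone function. This is not Patroni's implementation: the function signature, the defaults for port and database name, and the key names of the credentials dict are assumptions for illustration.

```python
from urllib.parse import urlparse


def conn_kwargs(conn_url, auth=None):
    """Turn a url-style connection string into keyword arguments (sketch)."""
    parsed = urlparse(conn_url)
    kwargs = {
        "host": parsed.hostname,
        # Assumption: fall back to the default PostgreSQL port and database.
        "port": parsed.port or 5432,
        "dbname": parsed.path.lstrip("/") or "postgres",
    }
    # Credentials are kept out of the URL and merged in only when provided.
    if auth:
        if auth.get("username") is not None:
            kwargs["user"] = auth["username"]
        if auth.get("password") is not None:
            kwargs["password"] = auth["password"]
    return kwargs


print(conn_kwargs("postgres://10.0.0.1:5433/postgres",
                  {"username": "replicator", "password": "secret"}))
```

The resulting dict uses parameter names that `psycopg2.connect` accepts as keyword arguments, and the same dict could be formatted back into a URL for tools such as pg_rewind or pg_basebackup.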
Alexander Kukushkin
413a84836b Update etcd topology only after original request succeed (#254)
There is no point in trying to update the topology until the original request
has been performed. Also, for us it is more important to execute the original
request than to keep the topology of the etcd cluster in sync.

In addition to that, implement the same retry-timeout logic in the
`machines` property that is already used in the `api_execute` method.
2016-08-10 10:17:37 +02:00
Murat Kabilov
a47a2bceff Manage scheduled restarts using patronictl (#248)
Manage scheduled restarts using patronictl
2016-08-09 12:54:48 +02:00
Oleksii Kliukin
ac7abfdd74 Minor fixes, address final rounds of code review. 2016-08-09 10:00:46 +02:00
Oleksii Kliukin
9fd01f6af4 Remove unused imports. 2016-08-08 16:48:14 +02:00
Oleksii Kliukin
d9102d2703 Remove the necessity of creating a RESTAPI object.
- We don't want to export RestApi object, since it initializes the
  socket and listens on it.
- Change get_dcs, so that the explicit scope passed to it will take
  priority over the one in the configuration file.
2016-08-08 16:15:57 +02:00
Oleksii Kliukin
53f991df0f More code-review related fixes
- Add missing delete_cluster.
- Simplify parts of the code by removing exception handlers where
  they are not needed.
- Fix typos.
2016-08-08 15:30:33 +02:00
Oleksii Kliukin
eeb8f1b694 Further address code reviews.
- Fix the issue in ctl that would result in setting the listen_address to True.
- Minor stylistic issues.
- Add unit-tests.
2016-08-08 12:21:01 +02:00
Alexander Kukushkin
17f317665f Merge pull request #221 from zalando/feature/patronictl-auth
patronictl will send authorization header if it is configured
2016-06-16 12:57:14 +02:00
Alexander Kukushkin
010a2961cb Merge pull request #220 from zalando/feature/patronictl-newconf
Feature/patronictl newconf
2016-06-16 12:56:47 +02:00
Alexander Kukushkin
9f5276dd2b patronictl will send authorization header if it is configured
username:password can be configured in the 'restapi' section of config
file or via environment
2016-06-16 12:16:16 +02:00
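
Commit 9f5276dd2b does not say which authorization scheme is used. Assuming HTTP Basic authentication, turning a configured `username:password` string into a request header could look like the sketch below; `basic_auth_header` is a hypothetical name, not a patronictl function.

```python
import base64


def basic_auth_header(credentials):
    """Build an HTTP Basic Authorization header from 'username:password'."""
    # Assumption: HTTP Basic auth, i.e. the base64-encoded credentials string.
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


print(basic_auth_header("user:pass"))
```

The returned dict can be passed as extra headers to whichever HTTP client issues the REST API request.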
Alexander Kukushkin
bd6070e2b0 Make patronictl use config.py for loading config_file
config.py is not only loading config_file but also can build
configuration from environment variables.
2016-06-16 08:50:44 +02:00
Alexander Kukushkin
57807ff337 Don't expose replication user/passwd in DCS 2016-06-15 09:34:04 +02:00
Alexander Kukushkin
5f4e582660 Merge branch 'master' of github.com:zalando/patroni into feature/dynamic-configuration 2016-06-09 11:04:28 +02:00
Alexander Kukushkin
50d118c3aa Split ZooKeeper and Exhibitor
Originally Exhibitor was supported in the ZooKeeper class and the
configuration for Exhibitor was also taken from the `zookeeper` section in
the yaml config file. In fact, Exhibitor just extends ZooKeeper; this is now
reflected in the code, and Exhibitor got its own section in
the config.yaml file. It will make it easier to configure Exhibitor
hosts and port via environment variables once PR#211 is merged.
2016-06-08 19:21:18 +02:00
Alexander Kukushkin
b3ada161cf Implement possibility to configure retry_timeout globally
Previously it was hardcoded all over the place.
2016-05-31 10:30:53 +02:00