106 Commits

Author SHA1 Message Date
Alexander Kukushkin
fce889cd04 Compatibility with psycopg 3.0 (#2088)
By default `psycopg2` is preferred. The `psycopg>=3.0` will be used only if `psycopg2` is not available or its version is too old.
2021-11-19 14:32:54 +01:00
Alexander Kukushkin
9b263dc6c9 Move find_executable to utils (#1799)
We don't want to import the whole patroni.ctl into the patroni
2020-12-16 18:58:54 +01:00
Pavlo Golub
e27ff480d0 Allow custom pager support in patronictl edit-config (#1696)
Fixes #1695
2020-09-16 15:21:52 +02:00
Alexander Kukushkin
0a1f389686 Release 2.0.0 (#1680)
* update release notes
* bump version
* change the default alignment in patronictl table output to `left`
* add missing tests
* add missing pieces to the documentation
2020-09-02 15:35:04 +02:00
Alexander Kukushkin
3341c898ff Add Etcd v3 protocol support via api gRPC-gateway (#1162)
The only python-etcd3 client working directly via gRPC still supports only a single endpoint, which is not very nice for high-availability.

Since Patroni is already using a heavily hacked version of python-etcd with smart retries and auto-discovery out-of-the-box, I decided to enhance the existing code with limited support of v3 protocol via gRPC-gateway.

Unfortunately, watches via gRPC-gateway requires us to open and keep the second connection to the etcd.

Known limitations:
* The very minimal supported version is 3.0.4. On earlier versions transactions don't work due to bugs in grpc-gateway. Without transactions we can't do atomic operations, i.e. leader locks.
* Watches work only starting from 3.1.0
* Authentication works only starting from 3.3.0
* gRPC-gateway does not support authentication using TLS Common Name. This is because gRPC-proxy terminates TLS from its client so all the clients share a cert of the proxy: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/authentication.md#using-tls-common-name
2020-07-31 14:33:40 +02:00
Alexander Kukushkin
cbff544b9c Implement patronictl flush switchover (#1554)
It includes implementing the `DELETE /switchover` REST API endpoint.

Close https://github.com/zalando/patroni/issues/1376
2020-06-25 16:27:57 +02:00
Maxim Fedotov
623b594539 patronictl add ability to print ASCII topology (#1576)
Example:
```bash
$ patronictl topology
+ Cluster: batman (6834835313225022118) -----+---------+----+-----------+------------------------------------------------+
| Member          |      Host      |   Role  |  State  | TL | Lag in MB | Tags                                           |
+-----------------+----------------+---------+---------+----+-----------+------------------------------------------------+
| postgresql0     | localhost:5432 |  Leader | running |  2 |           |                                                |
| + postgresql1   | localhost:5433 | Replica | running |  2 |       0.0 |                                                |
|   + postgresql2 | localhost:5434 | Replica | running |  2 |       0.0 | {nofailover: true, replicatefrom: postgresql1} |
+-----------------+----------------+---------+---------+----+-----------+------------------------------------------------+
```
2020-06-12 15:23:42 +02:00
Alexander Kukushkin
fe23d1f2d0 Release 1.6.5 (#1503)
* bump version
* update release notes
* implement missing unit-tests and format code.
2020-04-23 16:02:01 +02:00
Alexander Kukushkin
337f9efc9e Improve patronictl list output (#1486)
The redundant column `Column` will be presented in the table header.

Depending on output format `Tags` are serialized differently:
* For *pretty* format YAML is used, every element on the new line
* For *tsv* format for YAML is also used, but all elements and on the same line (similar to JSON)
* For *json* and *yaml* formats `Tags` are serialized into an appropriate format.

<details><summary>Examples of output in pretty formats:</summary>

```bash
$ patronictl list
+ Cluster: batman (6813309862653668387) +---------+----+-----------+---------------------+
|    Member   |      Host      |  Role  |  State  | TL | Lag in MB | Tags                |
+-------------+----------------+--------+---------+----+-----------+---------------------+
| postgresql0 | 127.0.0.1:5432 | Leader | running |  3 |           | clonefrom: true     |
|             |                |        |         |    |           | noloadbalance: true |
|             |                |        |         |    |           | nosync: true        |
+-------------+----------------+--------+---------+----+-----------+---------------------+
| postgresql1 | 127.0.0.1:5433 |        | running |  3 |       0.0 |                     |
+-------------+----------------+--------+---------+----+-----------+---------------------+

$ patronictl list badclustername
+ Cluster: badclustername (uninitialized) ------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
```
</details>

<details><summary>Example in tsv format:</summary>

```bash
Cluster Member  Host    Role    State   TL      Lag in MB       Pending restart Tags
batman  postgresql0     127.0.0.1:5432  Leader  running 2
batman  postgresql1     127.0.0.1:5433          running 2       0               {clonefrom: true, nofailover: true, noloadbalance: true, replicatefrom: postgresql0}
batman  postgresql2     127.0.0.1:5434          running 2       0       *       {replicatefrom: postgres1}
```
</details>

In addition to that, `patronictl list` command will stop showing keys with empty values in `json` and `yaml` formats.
<details><summary>Examples:</summary>

```yaml
$ patronictl list -f yaml
- Cluster: batman
  Host: 127.0.0.1:5432
  Member: postgresql0
  Role: Leader
  State: running
  TL: 2
- Cluster: batman
  Host: 127.0.0.1:5433
  Lag in MB: 0
  Member: postgresql1
  State: running
  TL: 2
  Tags:
    clonefrom: true
    nofailover: true
    noloadbalance: true
    replicatefrom: postgresql0
- Cluster: batman
  Host: 127.0.0.1:5434
  Lag in MB: 0
  Member: postgresql2
  Pending restart: '*'
  State: running
  TL: 2
  Tags:
    replicatefrom: postgres1
```

```json
$ patronictl list -f json | jq .
[
  {
    "Cluster": "batman",
    "Member": "postgresql0",
    "Host": "127.0.0.1:5432",
    "Role": "Leader",
    "State": "running",
    "TL": 2
  },
  {
    "Cluster": "batman",
    "Member": "postgresql1",
    "Host": "127.0.0.1:5433",
    "State": "running",
    "TL": 2,
    "Lag in MB": 0,
    "Tags": {
      "nofailover": true,
      "noloadbalance": true,
      "replicatefrom": "postgresql0",
      "clonefrom": true
    }
  },
  {
    "Cluster": "batman",
    "Member": "postgresql2",
    "Host": "127.0.0.1:5434",
    "State": "running",
    "TL": 2,
    "Lag in MB": 0,
    "Pending restart": "*",
    "Tags": {
      "replicatefrom": "postgres1"
    }
  }
]
```
</details>
2020-04-15 12:19:18 +02:00
Alexander Kukushkin
27cda08ece Improve unit-tests (#1479)
* tests were failing on windows and macos
* improve coverage
2020-04-09 10:34:35 +02:00
Alexander Kukushkin
369a93ce2a Handle cases when conn_url is not defined (#1482)
On K8s when one of the Patroni pods in starting there is valid annotation yet, which could cause failure in patronictl.
In addition to that handle cases if port isn't specified in the standby_cluster configuration.

Close https://github.com/zalando/patroni/issues/1100
Close https://github.com/zalando/patroni/issues/1463
2020-04-09 10:34:12 +02:00
Kaarel Moppel
d58006319b Patronictl - fail if a config file is specified explicitly but not found (#1467)
$ python3 patronictl.py -c postgresql0.yml list
Error: Provided config file postgresql0.yml not existing or no read rights. Check the -c/--config-file parameter
2020-04-01 15:52:43 +02:00
Michael Banck
f419d73465 Set postgresql.pgpass to ./pgpass (#1386)
This avoids test failures if $HOME is not available (fixes: #1385).
2020-02-13 15:06:36 +01:00
Igor Yanchenko
26b6e00575 wait option for patronictl reinit implemented (#1339)
Wait to finish `reinit` if `--wait` option is used.
Every 2 seconds it pulls the status from Patroni REST API and reports to console.
2019-12-20 12:05:39 +01:00
Alexander Kukushkin
0693fe7dd0 Housekeeping (#1315)
* Reduce memory usage by patroni init process
* More cleanup in setup.py
* Implement missing tests
2019-12-04 11:28:46 +01:00
Igor Yanchenko
726ee46111 Implemented patroni --version (#1291)
That required a refactoring of `Config` and `Patroni` classes. Now one has to explicitely create the instance of `Config` before creating `Patroni`.

The Config file can optionally call the validate function.
2019-12-02 12:14:19 +01:00
Alexander Kukushkin
412c720d3a Avoid importing all DCS modules (#1286)
We will try to import only the module which has a configuration section.
I.e. if there is only zookeeper section in the config, Patroni will try to import only `patroni.dcs.zookeeper` and skip `etcd`, `consul`, and `kubernetes`.
This approach has two benefits:
1. When there are no dependencies installed Patroni was showing INFO messages `Failed to import smth`, which looks scary.
2. It reduces memory usage, because sometimes dependencies are heavy.
2019-11-21 14:39:37 +01:00
Alexander Kukushkin
2f9a48fae4 Release 1.6.1 (#1281)
* Bump version to 1.6.1
* Update release notes
2019-11-15 12:48:00 +01:00
Alexander Kukushkin
3f711650a7 Fix compatibility with python 3.4&3.5 (#1248)
Close https://github.com/zalando/patroni/issues/1247
2019-10-25 15:01:49 +02:00
Alexander Kukushkin
367d787ff9 Implement /history and /cluster endpoints (#1191)
The /history endpoint shows the content of the `history` key in DCS
The /cluster endpoint show all cluster members and some service info like pending and scheduled restarts or switchovers.

In addition to that implement `patronictl history`

Close #586
Close #675
Close #1133
2019-10-22 17:19:02 +02:00
Alexander Kukushkin
b666f5e4ed Refactor Patroni REST API communication (#1197)
* make it possible to use client certificates with REST API
* define a separate PatroniRequest class which handles all communication
* refactor patronictl to use the new class
* make Ha to use the new class instead of calling requests.get. The old call wasn't taking into account certificates and basic-auth

Close #898
2019-10-11 10:16:33 +02:00
Alexander Kukushkin
0a1d9b0a25 Get rid from distutils module dependency (#1146)
We are using only one function from there, `find_executable()` and it is better to implement a similar function in Patroni rather than add `distutils` module into requirements.txt
2019-08-26 09:38:47 +02:00
Alexander Kukushkin
278bf9852b Release 1.6.0 (#1131)
* Implement missing tests and do a few minor fixes
* Bump version to 1.6.0
* Update release notes
2019-08-05 15:08:04 +02:00
Rafia Sabih
5cc3afc037 Enhance dialogues for scheduled switchover and restart (#1119)
Enhance dialogue for switchover and restart

In case of schedule switchover or restart, mention time if any, when confirming.
2019-08-02 11:21:26 +02:00
Alexander Kukushkin
a4bd6a9b4b Refactor postgresql class (#1060)
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package

This is just a careful refactoring, without code changes.
2019-05-21 16:02:47 +02:00
Pavlo Golub
b53a29c022 Fix unit-tests for Windows (#1014)
Closes #1013
2019-04-02 13:58:17 +02:00
Alexander Kukushkin
4a4258fc3f Mock external resources (#995)
unit tests should not accidentally hit running Postgres, DCS or filesystem unless we want it explicitly.
2019-03-12 10:39:42 +01:00
Alexander Kukushkin
0f666e69f3 Prefix system tables, views and functions with pg_catalog (#845)
and implement missing unit tests
2018-11-01 16:17:40 +01:00
Alexander Kukushkin
76d1b4cfd8 Minor fixes (#808)
* Use `shutil.move` instead of `os.replace`, which is available only from 3.3
*  Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* rename `assertEquals` -> `assertEqual`, due to deprecation warning
2018-09-19 16:32:33 +02:00
Dmitry Dolgov
dd7c3c349f [WIP] Standby cluster implementation (#679)
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DSC, in case if disappears there will be an election of a new "standby
leader".
One can define such a cluster using the section "standby_cluster" in patroni
config file. This section provides parameters for standby cluster, that will be
applied only once during bootstrap and can be changed only through DSC.
2018-09-07 10:10:56 +02:00
wilfriedroset
0136f252ab Add patronictl -k/--insecure flag and suport for restapi cert (#790)
Fixes https://github.com/zalando/patroni/issues/785
2018-08-29 16:08:13 +02:00
Alexander Kukushkin
87e9aab04c Improve tests (#778)
* Implement missing unit-tests
* Add acceptance tests for ISSUE #776
* Update list of classifiers, keywords and authors
2018-08-29 11:29:37 +02:00
Alexander Kukushkin
a1e5c8e1cb A few iprovements in patronictl (#601)
* make switchover work with an old patroni
* exclude leader from candidates when interactively running failover
2018-01-17 15:33:08 +01:00
Alexander Kukushkin
18786464a1 Rename failover to switchover and make new failover work without leader (#588)
In addition to that implement /switchover endpoint as an alias to /failover endpoint and implement more checks like:
* candidate must be provided for a failover
* switchover can't be scheduled in a pause state
* and so on

Fixes https://github.com/zalando/patroni/issues/585
Fixes https://github.com/zalando/patroni/issues/520
2018-01-05 15:17:56 +01:00
Alexander Kukushkin
3a96ffa718 Expose pause state of every member to DCS and via REST (#592)
and implement patronictl pause|resume --wait on top of that

Fixes https://github.com/zalando/patroni/issues/349
2018-01-05 15:16:45 +01:00
Alexander Kukushkin
6b01d2787f More improvements in patronictl (#590)
Make specifying cluster_name optional for some more commands.
If it is not specified, it's value would be taken from config file.
2018-01-04 12:26:13 +01:00
Ants Aasma
15d1767402 Some improvements to patronictl (#571)
* Use scope from config file when listing members

* Add version command to patronictl

* Only delete leader on shutdown when we have the lock to avoid exceptions when leader key does not exist

* Add a timestamp option to list command.

* YAML format for patronictl output

* Fix API request to get version
2018-01-04 10:35:22 +01:00
Alexander Kukushkin
0e01bb33bb Improve patronictl reinit (#576)
Make it possible to cancel a running task if you want to reinitialize replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel already running task if an attempt to trigger reinitialize has failed
2. if you are using `--force` argument with `patronictl reinit`
2018-01-04 10:31:44 +01:00
Alexander Kukushkin
bd847fd2cc Patronictl extended info (#567)
* Show information about scheduled failover and maintenance mode when showing list of cluster members. Fixes https://github.com/zalando/patroni/issues/557

* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match with postgres 10 defaults.
* Fix name of contributor in release notes.
2017-11-28 12:10:05 +01:00
Alexander Kukushkin
e3a01727a9 Implement missing tests and add pg-10 support to wale_restore(#446)
in addition to that get rid from two modules and fix formatting of tests
2017-05-22 12:01:02 +02:00
Ants Aasma
644b741969 Add config editing to patronictl (#428)
Current UI to change cluster configuration is somewhat unfriendly, involving a curl command, knowing the REST API endpoint, knowing the specific syntax to call it with and writing a JSON document. I added two commands in this branch to make this a bit easier, `show-config` and `edit-config` (names are merely placeholders, any opinions on better ones?).

* `patronictl show-config clustername` fetches the config from DCS, formats it as YAML and outputs it.

* `patronictl edit-config clustername` fetches the config, formats it as YAML, invokes $EDITOR on it, then shows user the diff and after confirmation applies the changed config to DCS, guarding for concurrent modifications.

* `patronictl edit-config clustername --set synchronous_mode=true --set postgresql.use_slots=true` will set the specific key-value pairs.

There are also some UI capabilities I'm less sure of, but included them here as I already implemented them.

* If output is a tty then the diffs are colored. I'm not sure if this feature is cool enough to pull the weight of adding a dependency on cdiff. Or maybe someone knows of another more task focused diff coloring library?
* `patronictl edit-config clustername --pg work_mem=100MB` - Shorthand for `--set postgresql.parameters.work_mem=100MB`
* `patronictl edit-config clustername --apply changes.yaml` - apply changes from a yaml file.
* `patronictl edit-config clustername --replace new-config.yaml` - replace config with new version.
2017-05-19 16:25:21 +02:00
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily trodded on considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not and caused a race condition that resulted in either a failover or a crash recovery or a crash recovery interrupted by failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.

* Introduce timeout parameter to restart.

When restart timeout is set master becomes eligible for failover after that timeout expires regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when node is a standby.
2016-12-08 14:44:27 +01:00
Alexander Kukushkin
038b5aed72 Improve leader watch functionality (#356)
Previously replicas were always watching for leader key (even if the
postgres was not in the running there). It was not a big issue, but it
was not possible to interrupt such watch in cases if the postgres
started up or stopped successfully. Also it was delaying update_member
call and we had kind of stale information in DCS up to `loop_wait`
seconds. This commit changes such behavior. If the async_executor is
busy by starting/stopping or restarting postgres we will not watch for
leader key but waiting for event from async_executor up to `loop_wait`
seconds. Async executor will fire such event only in case if the
function it was calling returned something what could be evaluated to
boolean True.

Such functionality is really needed to change the way how we are making
decision about necessity of pg_rewind. It will require to have a local
postgres running and for us it is really important to get such
notification as soon as possible.
2016-11-22 16:22:30 +01:00
Alexander Kukushkin
66543f41a3 BUGFIX: Try all cluster members when doing pause/resume (#351)
Previously it was not possible to pause/resume unhealthy cluster.
2016-11-04 18:43:19 +02:00
Alexander Kukushkin
37b020e7a3 Various bugfixes and improvements: (#346)
* Replace pytz.UTC with dateutil.tz.tzutc, it helps to reduce memory by more than 4Mb...

* fix check of python version: 0x0300000 => 0x3000000

* Update leader key before restart and demote
2016-11-04 18:42:56 +02:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.

To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over.

We can't enable multiple standbys as synchronous, allowing PostreSQL to pick one because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
Alexander Kukushkin
67c4b6b105 Make --dcs and --config-file options global (#322) 2016-09-27 16:25:24 +02:00
Alexander Kukushkin
0b1bfeca5b Make sure that we are running and testing latest versions of everything (#303) 2016-09-19 13:32:53 +02:00
Murat Kabilov
799d4c9bb8 Disable command renamed to pause 2016-08-29 14:30:19 +02:00
Murat Kabilov
22e4af3fb1 Fix failover in the paused state 2016-08-29 12:04:30 +02:00