Commit Graph

2288 Commits

Author SHA1 Message Date
Polina Bungina
5fca21849c Merge branch 'feature/failover-switchover-definition' into refactor/failover-limitations-checks 2023-08-27 22:12:15 +02:00
Polina Bungina
439d292c60 Add timezone to the scheduled option example 2023-08-27 21:58:47 +02:00
Polina Bungina
7024d0a987 Yet another refactoring 2023-08-27 21:57:18 +02:00
Polina Bungina
293c1e4cd3 Apply suggestions from code review 2023-08-27 20:19:00 +02:00
Polina Bungina
7606df7196 Apply suggestions from code review (docs)
Co-authored-by: Matt Baker <93600443+matthbakeredb@users.noreply.github.com>
2023-08-25 13:01:09 +02:00
Polina Bungina
cc076c40aa Move members failover status check to a separate func 2023-08-25 10:01:46 +02:00
Polina Bungina
9ff6663e38 A couple of minor improvements 2023-08-24 14:40:50 +02:00
Polina Bungina
4fa666e78c Test_restart refactoring 2023-08-24 13:15:08 +02:00
Polina Bungina
bcc6a9bd93 Factor new code out to a class 2023-08-22 14:42:30 +02:00
Polina Bungina
c4032f4ce8 Test 2023-08-22 10:43:03 +02:00
Polina Bungina
8fbf1b05da Address review 2023-08-21 15:27:39 +02:00
Polina Bungina
fac0d76081 Fix test_ctl coverage 2023-08-21 15:12:24 +02:00
Polina Bungina
4bdc0c7f7f Merge branch 'master' into feature/failover-switchover-definition 2023-08-21 13:29:18 +02:00
Polina Bungina
fe6c864536 Address review 2023-08-21 12:45:19 +02:00
Polina Bungina
13cfe0af36 Pin sphinx_rtd_theme to >1 (#2825)
Earlier versions are incompatible with sphinx>7
2023-08-21 07:55:02 +02:00
Polina Bungina
7319d12026 Remove accidentally added .DS_Store (#2826)
And extend .gitignore
2023-08-21 07:50:45 +02:00
Polina Bungina
2ec9834c60 Update api examples (#2824)
* Add failsafe_mode_is_active to /patroni and /metrics
* Add patroni_primary to /metrics
* Add examples showing that failsafe_mode_is_active and cluster_unlocked
  are only shown for /patroni when the value is "true"
* Update /patroni and /config examples
2023-08-18 16:13:13 +02:00
Alexander Kukushkin
2be64e5131 Don't return logical slots for standby cluster (#2816)
Cluster.get_replication_slots() didn't take into account that there cannot be logical replication slots on replicas in a standby cluster. It only skipped logical slots for the standby_leader, while replicas still expected that they would have to copy them over.

Also on replicas in a standby cluster these logical slots were falsely added to the `_replication_slots` dict.
2023-08-18 13:36:32 +02:00
Alexander Kukushkin
93be10a655 Remove Python 2 install instructions from docs/README (#2822)
docs/README.rst mostly duplicates README.rst and should be changed as well. Also remove the test/coverage badges.

followup on #2821
2023-08-17 16:17:34 +02:00
Alexander Kukushkin
366829e379 Refactor Connection class (#2815)
1. stop using the same cursor all the time, it creates problems when not carefully used from different threads.
2. introduce query() method in the Connection class and make it return a result set when it is possible.
3. refactor most of the code that is relying (directly or indirectly) on the Connection object to use the query() method as much as possible.

This refactoring reduces code complexity and will help with the future introduction of a separate database connection for the REST API thread. That, in turn, will improve reliability when the system is under significant stress: simple monitoring queries can take seconds to execute, and the REST API starts blocking the main thread.
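
A minimal sketch of the cursor-per-call idea, assuming a much simpler class than the real one: `sqlite3` stands in for the PostgreSQL driver so the example is self-contained, and the `members` table is made up for illustration.

```python
import sqlite3
from typing import Any, List, Tuple


class Connection:
    """Illustrative wrapper; sqlite3 stands in for the real PostgreSQL driver."""

    def __init__(self, database: str = ":memory:") -> None:
        self._conn = sqlite3.connect(database)

    def query(self, sql: str, *params: Any) -> List[Tuple[Any, ...]]:
        # A new cursor per call: no cursor state is shared between callers.
        cursor = self._conn.cursor()
        try:
            cursor.execute(sql, params)
            # Statements that produce no rows leave description set to None.
            return cursor.fetchall() if cursor.description else []
        finally:
            cursor.close()


conn = Connection()
conn.query("CREATE TABLE members (name TEXT, lag INTEGER)")
conn.query("INSERT INTO members VALUES (?, ?)", "node1", 0)
rows = conn.query("SELECT name, lag FROM members WHERE lag = ?", 0)
```

Callers get a result set back from `query()` and never hold on to a cursor themselves.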
2023-08-17 15:42:11 +02:00
Jelte Fennema
899cad1c0f Remove Python 2 install instructions from README (#2821) 2023-08-17 13:18:37 +02:00
Israel
a4ac4963d1 Fix IntValidator regarding validation of value 0 (#2818)
Previous to this commit `IntValidator` would always consider the value `0` invalid, even if in the allowed range.

The problem was that `parse_int` was returning `0` in the following line:

```python
value = parse_int(value, self.base_unit) or ""
```

However, because `0` is falsy, the `or ""` was turning it into an empty string.

As `parse_int` returns either an `int` if able to parse, or `None` otherwise, the `isinstance(value, int)` is enough to error out when not a valid `int`.
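
The pitfall can be reproduced in isolation. The sketch below uses a simplified stand-in for `parse_int` (without the `base_unit` handling); `validate_old` and `validate_new` are hypothetical names used only for illustration.

```python
from typing import Any, Optional


def parse_int(value: Any) -> Optional[int]:
    """Simplified stand-in: return an int if the value parses, otherwise None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def validate_old(value: Any, min_val: int, max_val: int) -> bool:
    # 0 is falsy, so `0 or ""` evaluates to "" and the int check below fails.
    parsed = parse_int(value) or ""
    return isinstance(parsed, int) and min_val <= parsed <= max_val


def validate_new(value: Any, min_val: int, max_val: int) -> bool:
    # parse_int returns either an int or None, so isinstance() alone is enough.
    parsed = parse_int(value)
    return isinstance(parsed, int) and min_val <= parsed <= max_val
```

With an allowed range of 0 to 10, `validate_old(0, 0, 10)` rejects the value while `validate_new(0, 0, 10)` accepts it.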

Closes #2817
2023-08-17 12:55:42 +02:00
Alexander Kukushkin
704d36815a Explicitly enable synchronous mode (#2820)
Close https://github.com/zalando/patroni/issues/2819

Co-authored-by: Polina Bungina <27892524+hughcapet@users.noreply.github.com>
2023-08-17 12:33:15 +02:00
Israel
4138d0b830 Add docstrings to patroni.config (#2708)
Besides adding docstrings to `patroni.config`, a few side changes
have been applied:

* Reference `config_file` property instead of internal attribute
`_config_file` in method `_load_config_file`;
* Have `_AUTH_ALLOWED_PARAMETERS[:2]` as default value of `params`
argument in method `_get_auth` instead of using
`params or _AUTH_ALLOWED_PARAMETERS[:2]` in the body;
* Use `len(PATRONI_ENV_PREFIX)` instead of a hard-coded `8` when
removing the prefix from environment variable names;
* Fix documentation of `wal_log_hints` setting. The previous docs
mentioned it was a dynamic setting that could be changed. However
it is managed by Patroni, which forces `on` value.
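
The prefix change can be illustrated as follows. This is a sketch, not the actual code: the value of `PATRONI_ENV_PREFIX` is assumed to be `"PATRONI_"` (which matches the previously hard-coded length of 8), and `strip_env_prefix` is a hypothetical helper.

```python
from typing import Dict

PATRONI_ENV_PREFIX = "PATRONI_"  # assumed value, 8 characters long


def strip_env_prefix(environ: Dict[str, str]) -> Dict[str, str]:
    """Keep only prefixed variables and drop the prefix from their names."""
    return {
        # len() keeps the slice correct even if the prefix ever changes.
        name[len(PATRONI_ENV_PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(PATRONI_ENV_PREFIX)
    }


settings = strip_env_prefix({"PATRONI_NAME": "node1", "HOME": "/root"})
```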

References: PAT-123.
2023-08-17 11:19:49 +02:00
Matt Baker
b7ea511511 Generate API docs from code with sphinx autodoc (#2699)
Expanding on the addition of docstrings in code, this adds python module API docs to sphinx documentation.

A developer can preview what this might look like by running this locally:

```
tox -m docs
```

The option `-W` is added to the tox env so that warning messages are considered errors.

Adds doc generation using the above method to the test GitHub workflow to catch documentation problems on PRs.

Some docstrings have been reformatted and fixed to satisfy errors generated with the above setup.
2023-08-17 10:27:33 +02:00
Israel
badf1da183 Add docstrings to patroni.__main__ (#2701)
References: PAT-117.
2023-08-15 09:05:25 +02:00
Alexander Kukushkin
6a75b1591b Use pg_current_wal_flush_lsn() starting from 9.6 (#2813)
Due to historical reasons (not available before 9.6) we used `pg_current_wal_lsn()`/`pg_current_xlog_location()` functions to get current WAL LSN on the primary. But, this LSN is not necessarily synced to disk, and could be lost if the primary node crashed.
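
As an illustration only, the choice of function by server version could look like the sketch below. `current_wal_lsn_function` is a hypothetical helper, not the code from this commit; the function names it returns are the ones PostgreSQL provides in the respective versions.

```python
def current_wal_lsn_function(server_version_num: int) -> str:
    """Return the name of the function reporting the current WAL position on a primary."""
    if server_version_num >= 100000:
        # PostgreSQL 10 renamed "xlog" to "wal" and "location" to "lsn".
        return "pg_current_wal_flush_lsn"
    if server_version_num >= 90600:
        # The flush variant first appeared in 9.6 under the old naming.
        return "pg_current_xlog_flush_location"
    # Before 9.6 only the write position is available.
    return "pg_current_xlog_location"
```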
2023-08-15 09:01:37 +02:00
Polina Bungina
ba204884d8 Pass bare action name to should_run_scheduled_action() 2023-08-14 20:36:46 +02:00
Matt Baker
82d2ef4878 Make docs more clear on changes to the bootstrap.dcs section of YAML config (#2811)
It seems that a common pitfall for new users of Patroni is that the `bootstrap.dcs` section is only used to initialize the configuration in DCS. This moves the comment about this to an info block so it is more visible to the reader.
2023-08-11 10:31:31 +02:00
Mark Pekala
b83f1c0f44 [Refactor] Rename _is_leader to _leader_expiry (#2807) 2023-08-11 10:30:20 +02:00
Alexander Kukushkin
9209a5a133 Refactor delete_leader interface (#2810)
Similar to https://github.com/zalando/patroni/pull/2690, but it mostly helps the Consul implementation.
2023-08-11 10:19:29 +02:00
Polina Bungina
3734ecc851 Implement generate-config (#2786)
New patroni.py option that makes it possible to

* generate patroni.yml configuration file with the values from a running cluster
* generate a sample patroni.yml configuration file
2023-08-09 17:46:53 +02:00
Polina Bungina
46bc1ded8e Merge remote-tracking branch 'origin/master' into feature/failover-switchover-definition 2023-08-09 15:28:06 +02:00
Alexander Kukushkin
713244975c Silence useless warnings in patronictl (#2808)
Close https://github.com/zalando/patroni/issues/2805
2023-08-09 14:48:18 +02:00
Alexander Kukushkin
efaba9f183 Rename Postgresql.is_leader() to is_primary() (#2809)
It'll help to avoid confusion with the Ha.is_leader() method.
2023-08-09 14:47:53 +02:00
Polina Bungina
d3da80196e Replace removeprefix() 2023-08-09 12:54:33 +02:00
Polina Bungina
01830ecbd1 Check if candidate is the same as the leader specified in api 2023-08-09 12:34:53 +02:00
Polina Bungina
3c1f2ff7a4 Implement _get_failover_action_name() 2023-08-09 12:34:47 +02:00
Alexander Kukushkin
c052789c56 Merge branch 'master' of github.com:zalando/patroni into feature/failover-switchover-definition 2023-08-08 17:00:50 +02:00
Polina Bungina
f24db395c6 Refactor is_failover_possible() (#2804)
* Refactor is_failover_possible()

Move all the members filtering inside the function.

* Remove check_synchronous parameter
* Add sync_mode_is_active() method and use it everywhere where it is appropriate
* Reduce nesting

---------

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-08-08 11:50:02 +02:00
Matt Baker
9dd177e5c9 Add docstrings to patroni.postgresql.slots.py (#2778)
Also, made some small code changes to satisfy formatting and pylint.
2023-08-08 08:54:55 +02:00
Matt Baker
eb100fd586 Add docs to patroni.dcs.__init__.py (#2777)
Also, made some small code changes to satisfy formatting and pylint.
2023-08-08 08:49:12 +02:00
Polina Bungina
444021e6b8 Fix docs
Failover section was accidentally moved too far
Add table title
2023-08-05 22:53:12 +02:00
ChenChangAo
a74985f41d reset failsafe state when promote (#2803)
Consider the following scenario (with failsafe_mode enabled):

0. node1 (primary) - node2 (replica)
1. stop all etcd nodes; wait ttl seconds; start all etcd nodes (node2's failsafe state will contain the info about node1)
2. switchover to node2 (node2's failsafe state still contains the info about node1)
3. stop all etcd nodes; wait ttl seconds; start all etcd nodes
4. node2 will demote because it considers node1 to be the primary

Resetting failsafe state when running as a primary fixes the issue.
2023-08-04 13:55:56 +02:00
Alexander Kukushkin
da9aaf6cdf Mock request() method when running tests (#2802)
1. Unit tests should not really try accessing any resources.
2. Not doing so results in significant execution time of unit tests on Windows

In addition to that perform a request with timeout 3s. Usually this is more than enough to figure out whether resource is accessible.

Followup on #2724
2023-08-04 07:37:49 +02:00
Alexander Kukushkin
84aac437c1 Release v3.1.0 (#2801)
- bump pyright and resolve reported issues
- bump Patroni version
- update release notes
v3.1.0
2023-08-03 13:02:29 +02:00
Israel
48e3d31e1d Refactor docs about migration to Patroni (#2796)
This PR is an attempt at refactoring the docs about migration to Patroni.

These are a few enhancements that we propose through this PR:

* Docs used to mention the procedure can only be performed in a single-node cluster. We changed that so the procedure considers a cluster composed of primary and standbys;
* Teach how to deal with pre-existing replication slots;
* Explain how to create the user for `pg_rewind`, if the user intends to enable `use_pg_rewind`.

References: PAT-143.
2023-08-03 09:01:16 +02:00
Alexander Kukushkin
01d07f86cd Set permissions for files and directories created in PGDATA (#2781)
Postgres supports two types of permissions:
1. owner only
2. group readable

By default the first one is used because it provides better security. But, sometimes people want to run a backup tool with the user that is different from postgres. In this case the second option becomes very useful. Unfortunately it didn't work correctly because Patroni was creating files with owner access only permissions.

This PR changes the behavior: permissions on files and directories created by Patroni are now calculated based on the permissions of PGDATA, i.e. they get group-readable access when necessary.
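
A rough sketch of how such a calculation might look, assuming POSIX permissions and the modes PostgreSQL itself uses for the two layouts (0700/0600 for owner only, 0750/0640 for group readable). `modes_for_pgdata` is a hypothetical helper, not the function added in this PR.

```python
import os
import stat
import tempfile
from typing import Tuple


def modes_for_pgdata(pgdata: str) -> Tuple[int, int]:
    """Return (dir_mode, file_mode) matching the permissions of PGDATA."""
    pgdata_mode = stat.S_IMODE(os.stat(pgdata).st_mode)
    group_bits = stat.S_IRGRP | stat.S_IXGRP
    if (pgdata_mode & group_bits) == group_bits:
        # Group-readable layout: directories 0750, files 0640.
        return 0o750, 0o640
    # Owner-only layout: directories 0700, files 0600.
    return 0o700, 0o600


# Demonstrate both layouts on a throwaway directory standing in for PGDATA.
with tempfile.TemporaryDirectory() as pgdata:
    os.chmod(pgdata, 0o750)
    group_modes = modes_for_pgdata(pgdata)
    os.chmod(pgdata, 0o700)
    owner_modes = modes_for_pgdata(pgdata)
```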

Close #1899
Close #1901
2023-08-02 13:15:43 +02:00
Matt Baker
b6fc4bc393 Replace instances of typing Generator[X, None, None] with Iterator[X] (#2799) 2023-08-02 12:36:29 +02:00
Israel
018a2f4dd9 Enhance docs of slots dynamic configuration (#2797)
The docs of `slots` configuration used to have this mention:

```
my_slot_name: the name of replication slot. If the permanent slot name
matches with the name of the current primary it will not be created.
Everything else is the responsibility of the operator to make sure that
there are no clashes in names between replication slots automatically
created by Patroni for members and permanent replication slots.
```

However that is not true in the sense that Patroni does not check for
clashes between `my_slot_name` and the name of replication slots created
for replicating changes among members. If you specify a slot name that
clashes with the name of a replication slot used by a member, it turns
out Patroni will make the slot permanent in the primary even if the member
key expires from the DCS.

Through this commit we also enhance the docs in terms of explaining that
physical permanent slots are maintained only in the primary, while logical
replication slots are copied from primary to standbys.

Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
2023-08-01 15:40:07 +02:00