Commit Graph

21 Commits

Author SHA1 Message Date
Alexander Kukushkin
93eb4edbe6 Reformat imports with isort (#3123)
Besides that:
1. Introduce `setup.py isort` for quick check
2. Introduce GH actions to check imports
2024-08-13 17:53:59 +02:00
Polina Bungina
ae53260030 Extend behave tests with nostream feature (#3036)
Check state and permanent logical replication slots behaviour
2024-03-29 12:54:40 +01:00
Alexander Kukushkin
2ac1efea54 Optimize priority failover behave tests (#3004)
1. get rid of useless sleep calls
2. call `POST /failover` on the node where we want to failover to
2024-01-15 12:03:14 +01:00
Polina Bungina
71ccf91e36 Don't filter out contradictory nofailover tag (#2992)
* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in the _validate_failover_tags(): nofailover value should be casted to boolean like it is done when accessed in other places
2024-01-02 09:30:18 +01:00
Mark Pekala
f5ee67fa1c Feature: failover priority (#2780)
The priority is configured with `failover_priority` tag. Possible values are from `0` till infinity, where `0` means that the node will never become the leader, which is the same as `nofailover` tag set to `true`. As a result, in the configuration file one should set only one of `failover_priority` or `nofailover` tags.

The failover priority kicks in only when there are more than one node have the same receive/replay LSN and are ahead of other nodes in the cluster. In this case the node with higher value of `failover_priority` is preferred. If there is a node with higher values of receive/replay LSN, it will become the new leader even if it has lower value of `failover_priority` (except when priority is set to 0).

Close https://github.com/zalando/patroni/issues/2759
2023-10-24 12:22:48 +02:00
Alexander Kukushkin
c5fffb3c97 Further work on permanent physical slots (#2891)
- Fixed issues with has_permanent_slots() method. It didn't took into account the case of permanent physical slots for members, falsely concluding that there are no permanent slots.
- Write to the status key only LSNs for permanent slots (not just for slots that exist on the primary).
  - Include pg_current_wal_flush_lsn() to slots feedback, so that slots on standby nodes could be advanced
- Improved behave tests:
  - Verify that permanent slots are properly created on standby nodes
  - Verify that permanent slots are properly advanced, including DCS failsafe mode
  - Verify that only permanent slots are written to the `/status`
2023-10-23 08:24:28 +02:00
Alexander Kukushkin
f77073c8e1 Speed up dcs failsafe behave tests (#2890)
- get rid from sleeps
- reduce retry_timeout
- avoid graceful Patroni shut down while DCS is "paused", just kill
  Patroni and after that gracefully stop postgres
- don't try to delete Pod when Patroni is killed. If K8s API is paused it takes ages

The run time on my laptop is reduced from 2m to 1m28s.
2023-09-28 10:44:11 +02:00
Alexander Kukushkin
7e89583ec7 Please new flake8 (#2789)
it stopped liking lack of space character between `,` and `\`
```python
foo,\
    bar
```
2023-07-31 09:08:46 +02:00
Mark Pekala
412c51ddf1 Prevent splitbrain from duplicate names in configuration (#2724)
When starting check if node with the same is registered in DCS and try to query it's REST API.
If REST API is accessible exit with the error.

Close #1804
2023-07-11 07:43:57 +02:00
Polina Bungina
3fe2a7868a Ignore D401 in flake8-docstrings (#2627)
* Ignore D401 in flake8-docstrings
* Fix newly reported flake8 issues, ignore the old W503 rule
* rely on concatenation of adjecent strings
* Format behave scripts
* Reformat ha.py according to new rules

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-04-03 09:52:22 +02:00
Alexander Kukushkin
4c3af2d1a0 Change master->primary/leader/member (#2541)
keep as much backward compatibility as possible.

Following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set.
8. updated the documentation and some examples

Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
2023-01-27 07:40:24 +01:00
Alexander Kukushkin
4d77b444dc Enforce search_path=pg_catalog for non-replication connections (#2496)
There is a known [vector of attact](https://pganalyze.com/blog/5mins-postgres-security-patch-releases-pgspot-pghostile) by creating functions and/or operators in a public scheme with the same name and signature as corresponding objects in `pg_catalog`.

Since Patroni is heavily relying on superuser connections we want to mitigate it by enforcing `search_path=pg_catalog` for all connections created by Patroni (except replication connections). It is achieved by introducing a new function, that wraps psycopg.connect() and appends ` -c search_path=pg_catalog` to `options` parameter.

In addition to that, we set connection.autocommit to True before returning it.
2022-12-20 09:56:14 +01:00
Alexander Kukushkin
fce889cd04 Compatibility with psycopg 3.0 (#2088)
By default `psycopg2` is preferred. The `psycopg>=3.0` will be used only if `psycopg2` is not available or its version is too old.
2021-11-19 14:32:54 +01:00
krishna
b3dc765e6d Choose synchronous nodes based on replication lag (#1786)
This commit makes it possible to configure the maximum lag (`maximum_lag_on_syncnode`) after which Patroni will "demote" the node from synchronous and replace it with another node.

The previous implementation always tried to stick to the same synchronous nodes (even if they are not optimal ones).
2021-02-02 15:45:02 +01:00
Ants Aasma
a70b46ef13 Add watchdog support on Linux (#343)
Ensures that system gets rebooted before TTL runs out.

Initial version. Open questions:

    Do we want to disable watchdog while we are not master?
2017-06-01 16:53:46 +02:00
Alexander Kukushkin
d138a8db17 AT for master_start_timeout + minor fixes (#361) 2016-12-09 12:02:41 +01:00
Alexander Kukushkin
4594bc98da Increase timeouts when running AT on travis (#324)
* Increase timeouts two times when running AT on travis
* Make up to 3 attempts to download DCS
* Get rid from hard-coded names
2016-09-28 15:13:09 +02:00
Alexander Kukushkin
24a2ea6cef Refactor acceptance tests to make them work against ZooKeeper
and make it easier to implement controllers for new DCS, i.e. consul
2016-04-10 10:37:43 +02:00
Alexander Kukushkin
5f6beae22f Enforce data-type checks for step matcher
and increase default timeout for patroni start
2016-03-11 14:46:14 +01:00
Alexander Kukushkin
8b81d270bc BUGFIX: Assertion Failed: Steps must be unicode 2016-03-11 13:47:57 +01:00
Alexander Kukushkin
30d3982d25 Acceptance tests with behave 2016-03-11 12:56:29 +01:00