* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to boolean, as is done when it is accessed in other places
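
The precedence described above can be sketched roughly as follows. This is only an illustration: `parse_bool` and `is_failover_allowed` are hypothetical helpers written for this example, not Patroni's actual code, and the accepted boolean spellings are an assumption.

```python
def parse_bool(value):
    """Interpret common boolean spellings; return None when unrecognized."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in ('true', 'yes', 'on', '1'):
        return True
    if text in ('false', 'no', 'off', '0'):
        return False
    return None


def is_failover_allowed(tags):
    """Decide whether a node may be promoted, based on its tags (hypothetical helper)."""
    nofailover = parse_bool(tags.get('nofailover'))
    if nofailover is not None:
        # nofailover wins whenever it is provided, whatever failover_priority says.
        return not nofailover
    try:
        priority = int(tags.get('failover_priority', 1))
    except (TypeError, ValueError):
        priority = 1  # assumed fallback for this sketch
    return priority > 0
```

Note that the string `'true'` is cast to a boolean before the comparison, which is the point of the last bullet.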
1. extract `GlobalConfig` class to its own module
2. make the module instantiate the `GlobalConfig` object on load and replace its own entry in `sys.modules` with this instance
3. don't pass `GlobalConfig` object around, but use `patroni.global_config` module everywhere.
4. move `ignore_slots_matchers`, `max_timelines_history`, and `permanent_slots` from `ClusterConfig` to `GlobalConfig`.
5. add `use_slots` property to global_config and remove duplicated code from `Cluster` and `Postgresql.ConfigHandler`.
Besides that, improve the readability of a couple of checks in ha.py and the formatting of the `/config` key when saved from patronictl.
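
A minimal sketch of the "module replaced by an instance" technique from step 2 is shown below. The class name `GlobalConfigSketch`, the module name `global_config_sketch`, and the default for `use_slots` are assumptions made for this illustration and do not reproduce the real `patroni.global_config` implementation.

```python
import sys
import types


class GlobalConfigSketch(types.ModuleType):
    """Hypothetical config object that also acts as a module."""

    def __init__(self, name):
        super().__init__(name)
        self._config = {}

    def update(self, config):
        """Replace the cached configuration."""
        self._config = dict(config)

    @property
    def use_slots(self):
        # Assumed default for this sketch: slots are used unless disabled.
        return bool(self._config.get('postgresql', {}).get('use_slots', True))


# Inside a real module this would be keyed by __name__; a fixed, hypothetical
# name keeps the sketch runnable as a standalone script.
sys.modules['global_config_sketch'] = GlobalConfigSketch('global_config_sketch')

import global_config_sketch  # resolves to the instance registered above

global_config_sketch.update({'postgresql': {'use_slots': False}})
print(global_config_sketch.use_slots)
```

Because the import system consults `sys.modules` first, every importer gets the same instance, so properties such as `use_slots` can be read as if they were module attributes.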
The priority is configured with the `failover_priority` tag. Possible values range from `0` to infinity, where `0` means that the node will never become the leader, which is the same as the `nofailover` tag set to `true`. As a result, one should set only one of the `failover_priority` or `nofailover` tags in the configuration file.
The failover priority kicks in only when more than one node has the same receive/replay LSN and these nodes are ahead of the other nodes in the cluster. In this case the node with the higher value of `failover_priority` is preferred. If there is a node with a higher receive/replay LSN, it will become the new leader even if it has a lower value of `failover_priority` (except when the priority is set to 0).
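
The ordering rule can be illustrated with a small sketch. It is a centralized simplification: in Patroni each node makes its own decision during the leader race, and the function name and the member dictionaries below are hypothetical.

```python
def pick_failover_candidate(members):
    """Return the name of the preferred candidate, or None if nobody is eligible.

    Each member is a dict with hypothetical keys: name, lsn, failover_priority.
    """
    # A priority of 0 excludes the node, no matter how far ahead it is.
    eligible = [m for m in members if m['failover_priority'] > 0]
    if not eligible:
        return None
    # LSN is compared first; the priority only breaks ties between equal LSNs.
    best = max(eligible, key=lambda m: (m['lsn'], m['failover_priority']))
    return best['name']


members = [
    {'name': 'a', 'lsn': 100, 'failover_priority': 1},
    {'name': 'b', 'lsn': 100, 'failover_priority': 5},
]
print(pick_failover_candidate(members))
```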
Close https://github.com/zalando/patroni/issues/2759
Despite being validated by `IntValidator`, some GUCs couldn't be cast directly to `int` because they include a unit suffix. Example: `128MB`.
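
To illustrate why a plain `int()` cast fails and what a suffix-aware conversion could look like, here is a minimal sketch. `parse_size` is a hypothetical helper, not the fix applied in Patroni; it converts to bytes and returns values without a unit unchanged, which ignores the per-GUC base unit that Postgres applies.

```python
import re

# Memory units accepted by Postgres; unit names are case-sensitive.
_UNITS = {'B': 1, 'kB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3, 'TB': 1024 ** 4}


def parse_size(value):
    """Return the value in bytes, or None if it cannot be parsed (hypothetical helper)."""
    match = re.fullmatch(r'\s*(-?\d+)\s*([A-Za-z]*)\s*', str(value))
    if not match:
        return None
    number, unit = match.groups()
    if not unit:
        # No suffix: returned as-is in this sketch.
        return int(number)
    if unit not in _UNITS:
        return None
    return int(number) * _UNITS[unit]


print(parse_size('128MB'))
```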
Close https://github.com/zalando/patroni/issues/2879
Postgres supports two types of permissions:
1. owner only
2. group readable
By default the first one is used because it provides better security. But sometimes people want to run a backup tool as a user other than postgres. In this case the second option becomes very useful. Unfortunately it didn't work correctly, because Patroni was creating files with owner-only permissions.
This PR changes the behavior: permissions on files and directories created by Patroni are now calculated from the permissions of PGDATA, i.e., they get group-readable access when necessary.
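
The idea of deriving modes from PGDATA can be sketched as follows. `modes_for_pgdata` is a hypothetical helper; the 0750/0640 and 0700/0600 pairs are the modes Postgres itself uses with and without group access, and the temporary directory stands in for a real data directory.

```python
import os
import stat
import tempfile


def modes_for_pgdata(pgdata):
    """Return (directory_mode, file_mode) matching the data directory's permissions."""
    pgdata_mode = stat.S_IMODE(os.stat(pgdata).st_mode)
    if pgdata_mode & stat.S_IRGRP:
        # Group-readable data directory: new files should be group-readable too.
        return 0o750, 0o640
    return 0o700, 0o600


# Demonstration on a throwaway directory (POSIX permissions assumed).
with tempfile.TemporaryDirectory() as pgdata:
    os.chmod(pgdata, 0o750)
    group_modes = modes_for_pgdata(pgdata)
    os.chmod(pgdata, 0o700)
    owner_modes = modes_for_pgdata(pgdata)
```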
Close #1899
Close #1901
In issue #2735 it was discussed that there should be some warning around PostgreSQL parameters that do not pass validation.
This commit ensures something is logged for parameters that fail validation and therefore fall back to default values.
Close #2735
Close #2740
- added pyrightconfig.json with typeCheckingMode=strict
- added type hints to all files except api.py
- added type stubs for dns, etcd, consul, kazoo, pysyncobj and other modules
- added type stubs for psycopg2 and urllib3 with some little fixes
- fixed most of the issues reported by pyright
- remaining issues will be addressed later, along with enabling CI linting task
Configuration parameter is `kubernetes.retriable_http_codes` or `PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES` environment variable.
These status codes are added to the default list of 500, 503, 504.
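
As a rough sketch, combining the configured codes with the defaults amounts to a set union. The function name and the accepted input shapes (a single integer or a list) are assumptions for this example.

```python
DEFAULT_RETRIABLE_HTTP_CODES = frozenset({500, 503, 504})


def retriable_http_codes(configured=None):
    """Return the default codes extended by the configured ones (hypothetical helper)."""
    if configured is None:
        extra = []
    elif isinstance(configured, int):
        extra = [configured]
    else:
        extra = list(configured)
    return DEFAULT_RETRIABLE_HTTP_CODES | {int(code) for code in extra}


print(sorted(retriable_http_codes([429])))
```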
Close https://github.com/zalando/patroni/issues/2536
keep as much backward compatibility as possible.
Following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set.
8. updated the documentation and some examples
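
Item 7 can be pictured with a short sketch. `translate_legacy_timeouts` is a hypothetical helper that works on a plain dictionary; it is not the code used in Patroni.

```python
def translate_legacy_timeouts(config):
    """Copy master_(start|stop)_timeout to primary_(start|stop)_timeout if the latter is unset."""
    result = dict(config)
    for action in ('start', 'stop'):
        old_key = f'master_{action}_timeout'
        new_key = f'primary_{action}_timeout'
        if old_key in result and new_key not in result:
            result[new_key] = result[old_key]
    return result


print(translate_legacy_timeouts({'master_start_timeout': 300}))
```

An explicitly configured `primary_*` value is never overwritten, which keeps old configuration files working without changing the meaning of new ones.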
Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```
Where 0 is the Citus group for the coordinator and 1, 2, etc. are worker groups.
Such a hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).
The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group.
Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters.
Since there are no hierarchical structures on K8s, we will use the Citus group suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader # the leader config map for the coordinator
batman-0-config # the config map holding initialize, config, and history "keys"
...
batman-1-leader # the leader config map for worker group 1
batman-1-config
...
```
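
The two naming schemes shown above can be sketched with a pair of hypothetical helpers; the function names and parameters are invented for this illustration.

```python
def dcs_group_path(namespace, scope, group):
    """Build the DCS prefix of one Citus group, e.g. /service/batman/0/."""
    return f"/{namespace.strip('/')}/{scope}/{group}/"


def k8s_object_name(scope, group, suffix):
    """Build the name of a K8s object for one Citus group, e.g. batman-0-leader."""
    return f'{scope}-{group}-{suffix}'


print(dcs_group_path('service', 'batman', 0))
print(k8s_object_name('batman', 1, 'leader'))
```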
Citus integration is enabled from patroni.yaml:
```yaml
citus:
database: citus
group: 0 # 0 is for coordinator, 1, 2, etc are for workers
```
If enabled, Patroni will create the database, create the citus extension in it, and INSERT INTO `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', and 'sslkey' for the superuser, if they are defined in the Patroni configuration file.
When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.
Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that "citus" is listed in the first place
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- `wal_level` - automatically set to `logical`. It is used by Citus to move/split shards. Under the hood Citus creates and removes replication slots, and Patroni automatically adds them to the `ignore_slots` configuration to avoid accidental removal.
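
The three adjustments can be sketched on a plain dictionary of parameters. `adjust_citus_gucs` is a hypothetical helper, and the fallback of 100 for `max_connections` is the Postgres default, assumed here for the case when the parameter is not set.

```python
def adjust_citus_gucs(parameters):
    """Return a copy of the parameters with the Citus-related GUCs adjusted."""
    params = dict(parameters)

    # "citus" must come first in shared_preload_libraries.
    raw = params.get('shared_preload_libraries', '')
    libs = [lib.strip() for lib in raw.split(',') if lib.strip()]
    libs = ['citus'] + [lib for lib in libs if lib != 'citus']
    params['shared_preload_libraries'] = ','.join(libs)

    # Unset or zero max_prepared_transactions becomes max_connections * 2.
    if int(params.get('max_prepared_transactions') or 0) == 0:
        params['max_prepared_transactions'] = int(params.get('max_connections', 100)) * 2

    # Logical WAL level is needed for moving/splitting shards.
    params['wal_level'] = 'logical'
    return params


adjusted = adjust_citus_gucs({
    'shared_preload_libraries': 'pg_stat_statements, citus',
    'max_connections': 50,
})
print(adjusted)
```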
The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using the
`citus_add_node()` and `citus_update_node()` functions.
Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of restart or switchover, it calls the `POST /citus` endpoint on the coordinator and the Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back up, the worker performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors and causes only a short latency spike.
All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and makes it possible to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
If configured, only IPs that match the rules will be allowed to call unsafe endpoints.
In addition, it is possible to automatically include the IPs of cluster members in the list.
If neither of the above is configured the old behavior is retained.
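
The check can be sketched with the standard `ipaddress` module. The function name and its parameters are hypothetical; rules are assumed to be single addresses or CIDR networks.

```python
import ipaddress


def is_allowed(client_ip, allowlist=(), member_ips=()):
    """Return True if the client may call unsafe endpoints (hypothetical helper)."""
    rules = list(allowlist) + list(member_ips)
    if not rules:
        # Nothing configured: keep the old behavior and do not restrict by IP.
        return True
    address = ipaddress.ip_address(client_ip)
    # A bare address is treated as a single-host network.
    return any(address in ipaddress.ip_network(rule, strict=False) for rule in rules)


print(is_allowed('10.0.1.5', ['10.0.0.0/16']))
```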
Partially address https://github.com/zalando/patroni/issues/1734
It is now also possible to point the configuration path to a directory instead of a file.
Patroni will find all yml files in the directory and apply them in sorted order.
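
A minimal sketch of the discovery step follows. Both helpers are hypothetical, and the recursive merge is an assumption about how "apply in sorted order" could combine files (later files override earlier ones); parsing the YAML itself is left out.

```python
import os
import tempfile


def find_config_files(path):
    """Return the files to load: every *.yml in a directory (sorted), or the file itself."""
    if os.path.isdir(path):
        names = sorted(n for n in os.listdir(path) if n.endswith('.yml'))
        return [os.path.join(path, n) for n in names]
    return [path]


def deep_merge(base, override):
    """Merge two nested dicts; values from override win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as conf_dir:
    for name in ('20-local.yml', '10-base.yml', 'notes.txt'):
        open(os.path.join(conf_dir, name), 'w').close()
    found = [os.path.basename(p) for p in find_config_files(conf_dir)]
print(found)
```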
Close https://github.com/zalando/patroni/issues/1669
* new node can join the cluster dynamically and become a part of consensus
* it is also possible to join only the Patroni cluster (without adding the node to the raft); to do that, comment out or remove `raft.self_addr`
* when the node joins the cluster it is using values from `raft.partner_addrs` only for initial discovery.
* It is possible to run Patroni and Postgres on two nodes plus one node with `patroni_raft_controller` (without Patroni and Postgres). In such a setup one can temporarily lose one node without affecting the primary.
That required a refactoring of the `Config` and `Patroni` classes. Now one has to explicitly create the instance of `Config` before creating `Patroni`.
The Config file can optionally call the validate function.
It is possible that some config files are not controlled by Patroni. When somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that postgres will also be reloaded, but that didn't happen when there were no changes in the postgresql section of the Patroni config.
For example, one might replace ssl_cert_file and ssl_key_file on the filesystem; starting from PostgreSQL 10 this requires just a reload, but Patroni wasn't doing it.
In addition to that, fix the issue with the handling of `wal_buffers`. The default value depends on `shared_buffers` and `wal_segment_size`, and therefore Patroni was exposing pending_restart when the new value in the config was explicitly set to -1 (default).
Close https://github.com/zalando/patroni/issues/1198
It allows changing logging settings at runtime by updating the config and doing a reload or sending `SIGHUP` to the Patroni process.
Important! Environment configuration names related to logging were renamed and the documentation was updated accordingly. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other logging-related variables, which were introduced only recently (between releases), will stop working. I think this is OK: since we haven't released the new version yet, it is very unlikely that anybody is using them except the authors of the corresponding PRs.
Example of log section in the config file:
```yaml
log:
dir: /where/to/write/patroni/logs # if not specified, write logs to stderr
file_size: 50000000 # 50MB
file_num: 10 # keep history of 10 files
dateformat: '%Y-%m-%d %H:%M:%S'
loggers: # increase log verbosity for etcd.client and urllib3
etcd.client: DEBUG
urllib3: DEBUG
```
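
How such a section might map onto the standard `logging` module is sketched below. Both helpers are hypothetical, the log file name `patroni.log` and the fallback values for `file_size` and `file_num` are placeholders chosen for this example, and the message format is an assumption.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_handler(log_config):
    """Create a rotating file handler if 'dir' is set, otherwise log to stderr."""
    if log_config.get('dir'):
        handler = RotatingFileHandler(
            os.path.join(log_config['dir'], 'patroni.log'),  # file name is an assumption
            maxBytes=int(log_config.get('file_size', 25000000)),
            backupCount=int(log_config.get('file_num', 4)),
        )
    else:
        handler = logging.StreamHandler()  # StreamHandler defaults to stderr
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(levelname)s: %(message)s',
        datefmt=log_config.get('dateformat'),
    ))
    return handler


def apply_logger_levels(log_config):
    """Set per-logger verbosity from the 'loggers' mapping."""
    for name, level in log_config.get('loggers', {}).items():
        logging.getLogger(name).setLevel(level)


apply_logger_levels({'loggers': {'etcd.client': 'DEBUG', 'urllib3': 'DEBUG'}})
```

Since both functions only read the configuration dictionary, they can be called again after a reload to pick up changed settings.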
* Use `shutil.move` instead of `os.replace`, which is available only from 3.3
* Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* rename `assertEquals` -> `assertEqual`, due to deprecation warning
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DCS; if it disappears, a new "standby
leader" will be elected.
One can define such a cluster using the section "standby_cluster" in patroni
config file. This section provides parameters for standby cluster, that will be
applied only once during bootstrap and can be changed only through DCS.
* Use ConfigMaps or Endpoints for leader elections and to keep cluster state
* Label pods with a postgres role
* change the behavior of pip install. From now on it will not install all dependencies; you have to explicitly specify the DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`
On Debian, the configuration files (postgresql.conf, pg_hba.conf, etc.) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could override existing configuration files where they used to be.
The default is to store configuration files in the data directory. This setting targets custom installations, such as Debian and any others that move configuration files out of the data directory.
Fixes #465
Previously pg_ctl waited for a timeout and then happily carried on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by a failover and a missed rewind.
This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.
* Introduce timeout parameter to restart.
When the restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
* Add https and auth support for etcd
Also implement support of PATRONI_ETCD_URL and PATRONI_ETCD_SRV
environment variables
* Implement etcd.proxy, etcd.cacert, etcd.cert and etcd.key support
Now it should be possible to set up a fully encrypted connection to etcd
with authorization.