93 Commits

Author SHA1 Message Date
Polina Bungina
6e1f9f7a6e Prepare repo migration (#3085) 2024-06-17 09:04:43 +02:00
Grigory Smolkin
b09af642e6 Disable WAL streaming on standby node via new boolean tag "nostream" (#2842)
Add support for ``nostream`` tag. If set to ``true`` the node will not use replication protocol to stream WAL. It will rely instead on archive recovery (if ``restore_command`` is configured) and ``pg_wal``/``pg_xlog`` polling. It also disables copying and synchronization of permanent logical replication slots on the node itself and all its cascading replicas. Setting this tag on primary node has no effect.
2024-03-20 10:10:53 +01:00
Polina Bungina
71ccf91e36 Don't filter out contradictory nofailover tag (#2992)
* Ensure that nofailover will always be used if both nofailover and
failover_priority tags are provided
* Call _validate_failover_tags from reload_local_configuration() as well
* Properly check values in _validate_failover_tags(): the nofailover value should be cast to boolean, as is done wherever else it is accessed
2024-01-02 09:30:18 +01:00
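The precedence rule from #2992 can be sketched roughly as follows. This is an illustration only, not Patroni's actual code: the helper name `effective_nofailover` is hypothetical, and treating a non-positive `failover_priority` as "never fail over" is an assumption.

```python
from typing import Any, Mapping


def effective_nofailover(tags: Mapping[str, Any]) -> bool:
    """Hypothetical helper: decide whether a node is excluded from failover."""
    nofailover = tags.get("nofailover")
    if nofailover is not None:
        # nofailover always wins when both tags are present; cast it to bool
        return bool(nofailover)
    priority = tags.get("failover_priority")
    # Assumption: a priority of 0 or less means "never promote this node"
    return priority is not None and priority <= 0
```

With both tags set, e.g. `{"nofailover": False, "failover_priority": 0}`, the sketch follows `nofailover` and keeps the node eligible.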
Alexander Kukushkin
6d98944e73 Add warning to the sample config about bootstrap section (#2925)
People often try to change it and then ask why it doesn't work.
2023-10-23 10:03:18 +02:00
GuanqunYang193
ce187bec38 Remove user creation related docs (#2920)
* Remove user creation related docs
* remove template
2023-10-23 08:29:09 +02:00
Stan Bogatkin
480b8dbf95 Fix typo in yml files (#2760)
Users statement was mentioned twice in templates - fix this simple typo by removing duplicates.
2023-07-17 14:55:06 +02:00
Polina Bungina
6c8a3b0d25 Remove bootstrap.pg_hba (#2684)
* Remove bootstrap.pg_hba
* Extend docs for postgresql.pg_hba/pg_ident
* Add postgresql.pg_hba/pg_ident to dynamic config docs

---------

Co-authored-by: Alexander Kukushkin <cyberdemn@gmail.com>
2023-05-24 09:01:56 +02:00
Alexander Kukushkin
4c3af2d1a0 Change master->primary/leader/member (#2541)
keep as much backward compatibility as possible.

Following changes were made:
1. All internal checks are performed as `role in ('master', 'primary')`
2. All internal variables/functions/methods are renamed
3. `GET /metrics` endpoint returns `patroni_primary` in addition to `patroni_master`.
4. Logs are changed to use leader/primary/member/remote depending on the context
5. Unit-tests are using only role = 'primary' instead of 'master' to verify that 1 works.
6. patronictl still supports old syntax, but also accepts `--leader` and `--primary`.
7. `master_(start|stop)_timeout` is automatically translated to `primary_(start|stop)_timeout` if the last one is not set.
8. updated the documentation and some examples

Future plan: in the next major release switch role name from `master` to `primary` and maybe drop `master` altogether.
The Kubernetes implementation will require more work and keep two labels in parallel. Label values should probably be configurable as described in https://github.com/zalando/patroni/issues/2495.
2023-01-27 07:40:24 +01:00
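Points 1 and 7 of #2541 could look roughly like the sketch below. The function names `translate_legacy_timeouts` and `is_primary` are hypothetical and only illustrate the backward-compatibility idea; Patroni's real implementation may differ.

```python
from typing import Any, Dict


def translate_legacy_timeouts(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: copy master_*_timeout to primary_*_timeout if unset."""
    result = dict(config)
    for action in ("start", "stop"):
        old_key = f"master_{action}_timeout"
        new_key = f"primary_{action}_timeout"
        # The new name takes precedence; the old one is only a fallback
        if new_key not in result and old_key in result:
            result[new_key] = result[old_key]
    return result


def is_primary(role: str) -> bool:
    # Both spellings are accepted, mirroring `role in ('master', 'primary')`
    return role in ("master", "primary")
```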
Alexander Kukushkin
4872ac51e0 Citus integration (#2504)
Citus cluster (coordinator and workers) will be stored in DCS as a fleet of Patroni logically grouped together:
```
/service/batman/
/service/batman/0/
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/
/service/batman/1/
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m1
/service/batman/1/members/m2
...
```

Where 0 is a Citus group for coordinator and 1, 2, etc are worker groups.

Such hierarchy allows reading the entire Citus cluster with a single call to DCS (except Zookeeper).

The get_cluster() method will be reading the entire Citus cluster on the coordinator because it needs to discover workers. For the worker cluster it will be reading the subtree of its own group.
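
A minimal sketch of how the prefix to read could be chosen, assuming the key layout shown above. The helper `citus_read_prefix` and its parameters are hypothetical names used only for illustration:

```python
def citus_read_prefix(namespace: str, scope: str, group: int) -> str:
    """Hypothetical helper: DCS prefix a Patroni node reads in get_cluster()."""
    base = f"{namespace.rstrip('/')}/{scope}/"
    if group == 0:
        # Coordinator: read the whole Citus cluster to discover workers
        return base
    # Worker: read only the subtree of its own group
    return f"{base}{group}/"
```

For the example layout, the coordinator would read `/service/batman/` and a node in worker group 1 would read `/service/batman/1/`.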

Besides that we introduce a new method get_citus_coordinator(). It will be used only by worker clusters.

Since there are no hierarchical structures on K8s, we will use the Citus group suffix on all objects that Patroni creates.
E.g.
```
batman-0-leader  # the leader config map for the coordinator
batman-0-config  # the config map holding initialize, config, and history "keys"
...
batman-1-leader  # the leader config map for worker group 1
batman-1-config
...
```

Citus integration is enabled from patroni.yaml:
```yaml
citus:
  database: citus
  group: 0  # 0 is for coordinator, 1, 2, etc are for workers
```

If enabled, Patroni will create the database and the citus extension in it, and INSERT INTO `pg_dist_authinfo` the information required for Citus nodes to communicate with each other, i.e. 'password', 'sslcert', 'sslkey' for the superuser if they are defined in the Patroni configuration file.

When the new Citus coordinator/worker is bootstrapped, Patroni adds `synchronous_mode: on` to the `bootstrap.dcs` section.

Besides that, Patroni takes over management of some Postgres GUCs:
- `shared_preload_libraries` - Patroni ensures that "citus" is placed first in the list
- `max_prepared_transactions` - if not set or set to 0, Patroni changes the value to `max_connections*2`
- wal_level - automatically set to logical. It is used by Citus to move/split shards. Under the hood Citus is creating/removing replication slots and they are automatically added by Patroni to the `ignore_slots` configuration to avoid accidental removal.
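
The three adjustments above can be sketched as a pure function over a parameter dictionary. This is an illustration under assumptions, not Patroni's code: `apply_citus_gucs` is a hypothetical name, values are kept as strings, and a fallback of 100 for a missing `max_connections` is assumed.

```python
from typing import Dict


def apply_citus_gucs(params: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: return a copy of params with Citus-related GUCs enforced."""
    result = dict(params)

    # Put "citus" first in shared_preload_libraries, keeping the other entries in order
    raw = result.get("shared_preload_libraries", "")
    libs = [lib.strip() for lib in raw.split(",") if lib.strip()]
    libs = ["citus"] + [lib for lib in libs if lib != "citus"]
    result["shared_preload_libraries"] = ",".join(libs)

    # If unset or 0, use max_connections * 2 (assumed fallback: max_connections = 100)
    if int(result.get("max_prepared_transactions", 0) or 0) == 0:
        max_connections = int(result.get("max_connections", 100))
        result["max_prepared_transactions"] = str(max_connections * 2)

    # Citus needs logical decoding to move/split shards
    result["wal_level"] = "logical"
    return result
```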

The coordinator primary actively discovers worker primary nodes and registers/updates them in the `pg_dist_node` table using
citus_add_node() and citus_update_node() functions.

Patroni running on the coordinator provides the new REST API endpoint: `POST /citus`. It is used by workers to facilitate controlled switchovers and restarts of worker primaries.
When the worker primary needs to shut down Postgres because of a restart or switchover, it calls the `POST /citus` endpoint on the coordinator, and Patroni on the coordinator starts a transaction and calls `citus_update_node(nodeid, 'host-demoted', port)` in order to pause client connections that work with the given worker.
Once the new leader is elected or Postgres is started back, it performs another call to the `POST /citus` endpoint, which does another `citus_update_node()` call with the actual hostname and port and commits the transaction. After the transaction is committed, the coordinator reestablishes connections to the worker node and client connections are unblocked.
If clients don't run long transactions, the operation finishes without client-visible errors, with only a short latency spike.

All operations on `pg_dist_node` are serialized by Patroni on the coordinator. This gives more control and makes it possible to ROLLBACK a transaction in progress if its lifetime exceeds a certain threshold and there are other worker nodes that should be updated.
2023-01-24 16:14:58 +01:00
Alexander Kukushkin
ed47224540 Improve behaviour of the insecure option (#2476)
It didn't work correctly when client certificates were used for REST API requests.
2022-12-06 17:24:57 +01:00
Alexander Kukushkin
8f8e9c9b81 Introduce postgresql.proxy_address (#2437)
It will be written to member key in DCS as the `proxy_url` and could be used/useful for service discovery.
2022-10-24 10:23:06 +02:00
Alexander Kukushkin
729f1dddc8 Compatibility with PostgreSQL 15 beta1 (#2299)
* update postgresql/validator.py
* pg_rewind doesn't like if there are unix sockets in PGDATA
* pg_rewind now supports --config-file option
2022-05-19 15:36:09 +02:00
Bastien Wirtz
38d84b1d15 Make sure no substitution attempt is made when params is empty. (#2212)
Close #2209
2022-02-14 15:20:38 +01:00
Tommy Li
294bb43bf1 Add Patroni's default postgres configs to the sample yml files (#1909)
This adds the default Postgres settings enforced by Patroni to the `postgres{n}.yml` files provided in the repo. The documentation does call out the defaults that Patroni will set, but it can be missed if you download postgres0.yml and use that as a starting point. Hopefully the extra commented out configs serve as a visual cue to save the next person from the same mistake :)
2021-04-20 09:45:43 +02:00
Sergey Dudoladov
950eff27ad Optional fencing script (pre_promote) (#1099)
Call a fencing script after acquiring the leader lock. If the script didn't finish successfully - don't promote but remove leader key

Close https://github.com/zalando/patroni/issues/1567
2020-09-01 07:50:39 +02:00
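The control flow described in #1099 can be sketched as below. It is a rough illustration, not Patroni's implementation: `try_promote` and its callback parameters are hypothetical, and the script is assumed to signal success with exit code 0.

```python
import subprocess
from typing import Callable, Sequence


def try_promote(pre_promote_cmd: Sequence[str],
                promote: Callable[[], None],
                release_leader_key: Callable[[], None]) -> bool:
    """Hypothetical helper, called after the leader lock has been acquired."""
    exit_code = subprocess.call(list(pre_promote_cmd))
    if exit_code != 0:
        # Fencing failed: do not promote, give up the leader key instead
        release_leader_key()
        return False
    promote()
    return True
```

Releasing the key on failure lets another node attempt the promotion instead of leaving the cluster with a leader that never promoted.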
Alexander Kukushkin
bfbc4860d5 PoC: Patroni on pure RAFT (#375)
* new node can join the cluster dynamically and become a part of consensus
 * it is also possible to join only Patroni cluster (without adding the node to the raft), just comment or remove `raft.self_addr` for that
 * when the node joins the cluster it is using values from `raft.partner_addrs` only for initial discovery.
* It is possible to run Patroni and Postgres on two nodes plus one node with `patroni_raft_controller` (without Patroni and Postgres). In such setup one can temporarily lose one node without affecting the primary.
2020-07-29 15:34:44 +02:00
Igor Yanchenko
cd96a10dd2 Provide an example of using multiple etcd endpoints in yaml files (#1289) 2019-11-21 13:29:05 +01:00
Alexander Kukushkin
bba9066315 Make it possible to run pg_rewind without superuser on pg11+ (#1035)
* expose the current patroni version in DCS
* expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed
* other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT
* define `postgresql.authentication.rewind` with credentials for pg_rewind in patroni configuration files.
* create user for pg_rewind if postgres is 11+
* grant execute on functions required for pg_rewind to rewind user
2019-05-02 14:07:26 +02:00
vilajit
f6d29081c9 Enabling kerberos support (#1015)
* make it possible to create users without passwords
* put `krbsrvname` into the connection string if it is specified in the config
* update postgres?.yml example files to mention `krbsrvname`
2019-04-29 09:02:04 +02:00
Dmitry Dolgov
dd7c3c349f [WIP] Standby cluster implementation (#679)
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DCS; if it disappears, there will be an election of a new "standby
leader".
One can define such a cluster using the section "standby_cluster" in patroni
config file. This section provides parameters for standby cluster, that will be
applied only once during bootstrap and can be changed only through DCS.
2018-09-07 10:10:56 +02:00
wilfriedroset
0136f252ab Add patronictl -k/--insecure flag and support for restapi cert (#790)
Fixes https://github.com/zalando/patroni/issues/785
2018-08-29 16:08:13 +02:00
Don Seiler
50a8114d0b Use enforced minimums in postgresX.yml files (#730)
Fix the discrepancy for the values of max_wal_senders and max_replication_slots between the sample postgres.yml files and hard-coded defaults in Patroni, bumping the former to 10.
Contributed by @dtseiler
2018-07-04 10:08:54 +02:00
Ants Aasma
70d718a058 Simplify watchdog code (#452)
* Only activate watchdog while master and not paused

We don't really need the protections while we are not master. This way
we only need to tickle the watchdog when we are updating leader key or
while demotion is happening.

As implemented we might fail to notice to shut down the watchdog if
someone demotes postgres and removes leader key behind Patroni's back.
There are probably other similar cases. Basically if the administrator
is being actively stupid they might get unexpected restarts. That seems
fine.

* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni.

Changes watchdog timeout during the next keepalive when ttl is changed. Watchdog driver and requirement can also be switched online.

When watchdog mode is `required` and watchdog setup does not work then the effect is similar to nofailover. Add watchdog_failed to status API to signify this. This is True only when watchdog does not work **AND** it is required.

* Reset implementation when config changed while active.

* Add watchdog safety margin configuration

Defaults to 5 seconds. Basically this is the maximum amount of time
that can pass between the calls to `dcs.update_leader()` and
`watchdog.keepalive()`, which are called right after each other. Should
be safe for pretty much any sane scenario and allows the default
settings to not trigger watchdog when DCS is not responding.

* Cancel bootstrap if watchdog activation fails

The system would have demoted itself anyway the next HA loop. Doing it
in bootstrap gives at least some other node chance to try bootstrapping
in the hope that it is configured correctly.

If all nodes are unable to activate they will continue to try until the
disk is filled with moved datadirs. Perhaps not ideal behavior, but as
the situation is unlikely to resolve itself without administrator
intervention it doesn't seem too bad.
2017-07-27 12:16:11 +02:00
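Two of the rules from #452 can be sketched as small functions. Both names are hypothetical, and the timeout formula (TTL minus the safety margin) is an assumption about how the margin is applied rather than something stated in the commit message.

```python
def watchdog_failed(mode: str, watchdog_working: bool) -> bool:
    # True only when the watchdog does not work AND it is required
    return mode == "required" and not watchdog_working


def watchdog_timeout(ttl: int, safety_margin: int = 5) -> int:
    # Assumption: the watchdog should fire safety_margin seconds
    # before the leader key would expire
    return ttl - safety_margin
```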
jouir
4ca94a5dab Add config_dir option for configuration files location (#466)
On Debian, the configuration files (postgresql.conf, pg_hba.conf, etc) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could then override the existing configuration files in their usual location.

The default is to store configuration files in the data directory. This setting is targeting custom installations like debian and any others moving configuration files out of the data directory.

Fixes #465
2017-07-04 16:14:17 +02:00
Ants Aasma
a70b46ef13 Add watchdog support on Linux (#343)
Ensures that system gets rebooted before TTL runs out.

Initial version. Open questions:

    Do we want to disable watchdog while we are not master?
2017-06-01 16:53:46 +02:00
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily trod on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.

* Introduce timeout parameter to restart.

When restart timeout is set, master becomes eligible for failover after that timeout expires regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when node is a standby.
2016-12-08 14:44:27 +01:00
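The decision made while PostgreSQL is in the `starting` state, as described in #295, can be sketched like this. The function and parameter names are hypothetical; only the rules themselves come from the commit message.

```python
def should_give_up_leadership(seconds_since_start: float,
                              master_start_timeout: float,
                              failover_possible: bool) -> bool:
    """Hypothetical helper: should a starting master release the leader lock?"""
    if not failover_possible:
        # No candidate to take over: keep waiting for PostgreSQL to start
        return False
    if master_start_timeout == 0:
        # Zero timeout: fail over as soon as there is a candidate
        return True
    # Otherwise give PostgreSQL master_start_timeout seconds to come up
    return seconds_since_start >= master_start_timeout
```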
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.

To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over.

We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
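The failover guarantee from #314 can be sketched as a check against the synchronous state published in DCS. This is a simplified illustration: `may_acquire_leader_lock` and its parameters are hypothetical, and letting the previous master reclaim the lock even when a synchronous standby exists is an assumption.

```python
from typing import Optional


def may_acquire_leader_lock(node: str,
                            last_leader: str,
                            sync_standby: Optional[str]) -> bool:
    """Hypothetical helper: may `node` take the leader lock in synchronous mode?"""
    if node == last_leader:
        # Assumption: the previous master holds every committed transaction
        # and may always reclaim the lock
        return True
    # A standby qualifies only if it was the published synchronous standby
    return sync_standby is not None and node == sync_standby
```

When no synchronous standby is published (`sync_standby is None`), only the previous master passes the check, matching the behaviour described for `synchronous_standby_names=''`.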
Alejandro Martínez
48a6af6994 Add post_init configuration parameter on bootstrap (#296)
* Add bootstrap post_init configuration parameter
* Add documentation

By @zenitraM
2016-09-28 15:42:23 +02:00
Alexander Kukushkin
0b1bfeca5b Make sure that we are running and testing latest versions of everything (#303) 2016-09-19 13:32:53 +02:00
Alexander Kukushkin
ef0b3c2296 Bring all configs to the new format (#265)
The v1.0 has been released more than one month ago and the new version
is coming. It doesn't make a lot of sense to keep configuration files in
the old format anymore.
In addition to that I've also commented out all the lines enabling and
configuring "archiving" to avoid incidents like here:
https://github.com/zalando/patroni/issues/264
2016-08-23 11:46:16 +02:00
Ants Aasma
494887f47e Enable configuration of PostgreSQL binary locations. (#263)
Adds a bin_dir parameter to PostgreSQL settings that will be prefixed to all command invocations.
2016-08-18 14:06:11 +02:00
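The effect of `bin_dir` from #263 can be illustrated with a one-line helper. The name `pg_command` is hypothetical, and falling back to the bare command name (resolved via PATH) when `bin_dir` is empty is an assumption.

```python
import os


def pg_command(bin_dir: str, name: str) -> str:
    """Hypothetical helper: build the path of a PostgreSQL binary such as pg_ctl."""
    # An empty bin_dir leaves the bare command name, to be resolved via PATH
    return os.path.join(bin_dir, name) if bin_dir else name
```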
Alexander Kukushkin
5314433b70 Merge branch 'feature/dynamic-configuration' of github.com:zalando/patroni into feature/environment-configuration 2016-06-09 11:09:30 +02:00
Alexander Kukushkin
5f4e582660 Merge branch 'master' of github.com:zalando/patroni into feature/dynamic-configuration 2016-06-09 11:04:28 +02:00
Alexander Kukushkin
50d118c3aa Split ZooKeeper and Exhibitor
Originally Exhibitor was supported in the ZooKeeper class and
configuration for Exhibitor was taken also from `zookeeper` section in
the yaml config file. In fact, Exhibitor just extends ZooKeeper and now
it is reflected in the code and also Exhibitor got its own section in
the config.yaml file. It will make it easier to configure Exhibitor
hosts and port via environment variables when PR#211 will be merged.
2016-06-08 19:21:18 +02:00
Alexander Kukushkin
b7d87f7d07 Implement possibility to configure Patroni via environment 2016-06-08 10:15:24 +02:00
Alexander Kukushkin
b3ada161cf Implement possibility to configure retry_timeout globally
Previously it was hardcoded all over the place.
2016-05-31 10:30:53 +02:00
Alexander Kukushkin
7827951c8c Dynamic configuration 2016-05-25 14:17:05 +02:00
Alexander Kukushkin
eabfd82a5d Implement Consul support 2016-04-27 10:59:01 +02:00
Feike Steenbergen
28d5de17e1 Remove pg_hba injection and filtering
Previously we explicitly injected a replication record into pg_hba.conf.
This doesn't allow users to explicitly write their configurations.

This change will just write the lines specified by the user.
2016-04-20 11:06:36 +02:00
Alexander Kukushkin
24a2ea6cef Refactor acceptance tests to make them work against ZooKeeper
and make it easier to implement controllers for new DCS, i.e. consul
2016-04-10 10:37:43 +02:00
Oleksii Kliukin
d426a795c3 Merge pull request #122 from zalando/feature/replica_without_the_master
Run replicas without the master
2016-02-04 21:32:17 +01:00
Oleksii Kliukin
abaef49670 Disable auth in order to use patronictl with the default configuration, remove obsolete replication_methods like. 2016-01-29 09:05:52 +01:00
Oleksii Kliukin
c650dc092e Follow the node in the replicatefrom if present.
Rename the follow_the_leader to just follow, since the node to
be followed is not necessarily a leader anymore. Extend the code
that manages replication slots to the non-master nodes if they
are mentioned in at least one replicatefrom tag.
Add the 3rd configuration in order to be able to run cascading
replicas.
2015-12-30 18:33:23 +01:00
Oleksii Kliukin
d0c84c87ba Fix the formatting, add the missing changes to configuration files. 2015-12-09 13:56:37 +01:00
Oleksii Kliukin
14b8dfa3e8 Make create_replica_method a YAML array.
Make sure the absence of this key or empty value in it is handled
correctly. Update tests and sample configuration files.
2015-11-25 10:29:17 +01:00
Oleksii Kliukin
f3d9edb57f also add -p 1 to the restore commands provided with sample yaml files. 2015-11-24 15:32:44 +01:00
Oleksii Kliukin
87a5646ad0 Merge branch 'restore/movebasebackup' of https://github.com/pgexperts/patroni into pgexperts-restore/movebasebackup 2015-11-16 12:04:32 +01:00
Alexander Kukushkin
57f19fb149 Merge pull request #80 from zalando/feature/nofailover
Feature/nofailover
2015-11-16 10:21:56 +01:00
Alexander Kukushkin
f0a6c86caa Make it possible to specify custom options for initdb
In the initial implementation we were using the only option
--encoding=UTF8. In order to have pg_rewind working with postgresql-9.3
we have to enable data-checksums. The naive approach was to enable it
globally, but taking into account some performance degradation it's better
not to do it but make it possible to configure it.

In addition to that fix all problems with setting up password of default
postgres user: execute CREATE ROLE | ALTER ROLE depending on content of
pg_authid
2015-11-11 15:59:34 +01:00
Josh Berkus
30aa83c5b2 Fixed failing tests, pep8 issues. 2015-11-02 17:51:01 -08:00