237 Commits

Author SHA1 Message Date
Alexander Kukushkin
13c88e8b7a Replace self-execute with multiprocessing.Process (#994)
In addition to that transfer postmaster pid to Patroni process with the help of multiprocessing.Pipe instead of using stdin-stdout pipes.
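
A minimal sketch of the idea, not the code from this commit: a child started with `multiprocessing.Process` launches a command and reports its pid to the parent through a `multiprocessing.Pipe`. The function names and the choice of the `fork` start method (Unix only) are assumptions made for illustration.

```python
import multiprocessing
import subprocess
import sys


def _launch(conn, cmd):
    """Runs in the child: start the command and send its pid to the parent."""
    proc = subprocess.Popen(cmd)
    conn.send(proc.pid)
    conn.close()
    proc.wait()


def start_and_get_pid(cmd):
    """Start cmd via an intermediate process and return the pid it reports."""
    # Assumption: the "fork" start method keeps the sketch short (Unix only).
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    worker = ctx.Process(target=_launch, args=(child_conn, cmd))
    worker.start()
    child_conn.close()  # the parent keeps only the receiving end
    pid = parent_conn.recv()
    worker.join()
    return pid
```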

Closes https://github.com/zalando/patroni/issues/992
2019-03-12 10:40:37 +01:00
Alexander Kukushkin
4a4258fc3f Mock external resources (#995)
unit tests should not accidentally hit running Postgres, DCS or filesystem unless we want it explicitly.
2019-03-12 10:39:42 +01:00
Alexander Kukushkin
739329b590 Make it possible to automatically reinit the former master (#948)
If the pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is wiping the data directory and reinitializing.

So far Patroni was able to remove the data directory only after a failed attempt to run pg_rewind. This commit fixes it.
If the `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically.

Fixes: https://github.com/zalando/patroni/issues/941
2019-01-30 12:37:21 +01:00
Alexander Kukushkin
381a5b80d2 Release 1.5.4 (#931)
* Bump version
* Update release notes
* Make it possible to configure registration of Service in Consul via env variables
2019-01-15 12:14:19 +01:00
Alexander Kukushkin
0f666e69f3 Prefix system tables, views and functions with pg_catalog (#845)
and implement missing unit tests
2018-11-01 16:17:40 +01:00
Alexander Kukushkin
2efd97baab Permanent replication slots (#819)
Permanent replication slots are preserved on failover/switchover, that is Patroni on the new primary will create configured replication slots right after doing promote.

Slots could be configured with the help of `patronictl edit-config`.
The initial configuration could be also done in the `bootstrap.dcs`

```yaml
slots:
  permanent_physical_1:
    type: physical
  permanent_logical_1:
    type: logical
    database: foo
    plugin: pgoutput
```

It is the responsibility of the operator to make sure that there are no clashes in names between replication slots automatically created by Patroni for members and permanent replication slots.

Closes https://github.com/zalando/patroni/issues/656
2018-10-31 11:37:42 +01:00
Alexander Kukushkin
76d1b4cfd8 Minor fixes (#808)
* Use `shutil.move` instead of `os.replace`, which is available only from 3.3
* Introduce standby-leader health-check and consul service
* Improve unit tests, some lines were not covered
* rename `assertEquals` -> `assertEqual`, due to deprecation warning
2018-09-19 16:32:33 +02:00
Dmitry Dolgov
dd7c3c349f [WIP] Standby cluster implementation (#679)
Implementation of "standby cluster" described in #657. Standby cluster consists
of a "standby leader", that replicates from a "remote master" (which is not a
part of current patroni cluster and can be anywhere), and cascade replicas,
that replicate from the corresponding standby leader. "Standby leader" behaves
pretty much like a regular leader, which means that it holds a leader lock in
DCS; if it disappears, there will be an election of a new "standby
leader".
One can define such a cluster using the section "standby_cluster" in patroni
config file. This section provides parameters for standby cluster, that will be
applied only once during bootstrap and can be changed only through DCS.
2018-09-07 10:10:56 +02:00
Dmitry Dolgov
f0d23b0b14 Merge pull request #706 from zalando/feature/rename-create-replica-method
Rename create_replica_method to create_replica_methods
2018-06-12 14:16:54 +02:00
Alexander Kukushkin
e939304001 Take and apply some parameters from controldata when starting as replica (#703)
* Take and apply some parameters from controldata when starting as replica

https://www.postgresql.org/docs/10/static/hot-standby.html#HOT-STANDBY-ADMIN
There is a set of parameters whose values on the replica must not be smaller than on the primary, otherwise the replica will refuse to start:
* max_connections
* max_prepared_transactions
* max_locks_per_transaction
* max_worker_processes

It might happen that the values of these parameters in the global configuration are not set high enough, which makes it impossible to start a replica without human intervention. Usually it happens when we bootstrap a new cluster from the basebackup.

As a solution to this problem we will take the values of the above parameters from the pg_controldata output and, if the values in the global configuration are not high enough, apply the values taken from pg_controldata and set the `pending_restart` flag.
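
The comparison can be sketched roughly as follows. This is an illustration under assumptions, not the code from this commit: the function name is hypothetical, and the labels are the ones pg_controldata prints for these four settings.

```python
# Labels printed by pg_controldata mapped to postgresql.conf parameter names.
CONTROLDATA_TO_GUC = {
    "max_connections setting": "max_connections",
    "max_prepared_xacts setting": "max_prepared_transactions",
    "max_locks_per_xact setting": "max_locks_per_transaction",
    "max_worker_processes setting": "max_worker_processes",
}


def adjust_parameters(controldata_output, configured):
    """Return (effective parameters, pending_restart flag)."""
    effective = dict(configured)
    pending_restart = False
    for line in controldata_output.splitlines():
        label, _, value = line.partition(":")
        guc = CONTROLDATA_TO_GUC.get(label.strip())
        if guc is None or guc not in effective:
            continue
        required = int(value.strip())
        # Raise the configured value if it is lower than what controldata reports.
        if int(effective[guc]) < required:
            effective[guc] = required
            pending_restart = True
    return effective, pending_restart
```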
2018-06-12 14:04:32 +02:00
Alexander Kukushkin
e405e4e03c Workaround to sporadic unit-test failures (#696)
Fixes https://github.com/zalando/patroni/issues/691
2018-06-12 14:00:10 +02:00
erthalion
d037aa8afd Rename create_replica_method to create_replica_methods
To make it clear that it's actually an array
2018-06-12 11:33:13 +02:00
Oleksii Kliukin
4ce539ba1b Allow options to the basebackup built-in method. (#604)
Options should be specified in the basebackup section, which is optional.
2018-05-18 12:18:35 +02:00
Alexander Kukushkin
5296336f4a BUGFIX: postmaster start can fail if pid from postmaster.pid is alive (#681)
Upon start postmaster process performs various safety checks if there is a postmaster.pid file in the data directory. Although Patroni already detected that the running process corresponding to the postmaster.pid is not a postmaster, the new postmaster might fail to start, because it thinks that postmaster.pid is already locked.
Important!!! Unlink of postmaster.pid isn't an option in this case, because it has a lot of nasty race conditions.
Luckily there is a workaround to this problem, we can pass the pid from postmaster.pid in the `PG_GRANDPARENT_PID` environment variable and postmaster will ignore it.

You are more likely to hit this problem if you run Patroni and postgres in a docker container.
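
A rough sketch of the workaround, assuming the pid is on the first line of postmaster.pid. The function name is hypothetical, and the check that the pid does not belong to a real postmaster is left out.

```python
import os


def postmaster_env(data_dir, base_env=None):
    """Build the environment used to start postmaster.

    If postmaster.pid exists, pass the pid from its first line in
    PG_GRANDPARENT_PID so that the new postmaster ignores it.
    """
    env = dict(os.environ if base_env is None else base_env)
    pid_file = os.path.join(data_dir, "postmaster.pid")
    try:
        with open(pid_file) as f:
            pid = int(f.readline().strip())
    except (OSError, ValueError):
        return env  # no usable pid file, nothing to pass
    env["PG_GRANDPARENT_PID"] = str(pid)
    return env
```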
2018-05-18 11:18:27 +02:00
Alexander Kukushkin
3afd26101b Single user mode was waiting for user input and never finish (#634)
Regression was introduced in https://github.com/zalando/patroni/pull/576
2018-03-02 22:22:43 +01:00
Alexander Kukushkin
c04e7a1798 Write bootstrap.pg_hba into a pg_hba.conf after custom bootstrap (#632)
Fixes https://github.com/zalando/patroni/issues/631
2018-02-26 18:48:56 +01:00
Alexander Kukushkin
03c2a85d23 Expose current timeline in DCS and via API (#591)
It is very easy to get current timeline on the master by executing
```sql
SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int
```

Unfortunately the same method doesn't work when postgres is_in_recovery. Therefore we will use a replication connection for that on the replicas. In order to avoid opening and closing a replication connection on every HA loop we will cache the result if its value matches the timeline of the master.

Also this PR introduces a new key in DCS: `/history`. It will contain a json serialized object with timeline history in a format similar to the usual history files. The differences are:
* Second column is the absolute wal position in bytes, instead of LSN
* Optionally there might be a fourth column - timestamp, (mtime of history file)
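
For illustration, the same two conversions in Python: the timeline is the first 8 hex digits of the WAL file name, and an LSN is turned into an absolute position in bytes. The function names are hypothetical and this is only a sketch of the arithmetic.

```python
def timeline_from_walfile_name(name):
    """The first 8 hex digits of a WAL file name are the timeline ID."""
    return int(name[:8], 16)


def lsn_to_bytes(lsn):
    """Convert an LSN such as '29/1A000178' to an absolute position in bytes."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)
```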
2018-01-05 15:25:56 +01:00
Alexander Kukushkin
2b8618b027 Minimize amount of SELECTS issued by Patroni on every loop (#584)
On every iteration of the HA loop Patroni needs to call pg_is_in_recovery() and calculate the absolute wal_position. It was doing two separate SELECT statements for that. In the case of the master it was doing even three queries (wal_position two times).
We will issue one SELECT for every HA loop and cache the results.
2018-01-04 11:17:43 +01:00
Alexander Kukushkin
0e01bb33bb Improve patronictl reinit (#576)
Make it possible to cancel a running task if you want to reinitialize replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel already running task if an attempt to trigger reinitialize has failed
2. if you are using `--force` argument with `patronictl reinit`
2018-01-04 10:31:44 +01:00
Alexander Kukushkin
bd847fd2cc Patronictl extended info (#567)
* Show information about scheduled failover and maintenance mode when showing list of cluster members. Fixes https://github.com/zalando/patroni/issues/557

* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match with postgres 10 defaults.
* Fix name of contributor in release notes.
2017-11-28 12:10:05 +01:00
Ants Aasma
5da0e12353 Factor out postmaster process (#561)
Introduces a PostmasterProcess object that identifies a running process via pid and start time.
When pid file is parsed and the correct process identified this object is passed around.
When the process goes away we try to find a new one in case somebody restarted postgres behind our back.
2017-11-23 14:36:23 +01:00
Alexander Kukushkin
a89a902f4a Bump version and write release notes (#560)
and implement missing unit-tests
2017-11-10 11:48:50 +01:00
Ants Aasma
7367b7c74a Verify process start time when checking if postgres is running. (#549)
After a crash that doesn't clean up postmaster.pid there could be a new process with the same pid resulting in a false positive for is_running(), which will lead to all kinds of bad behavior.
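
The check can be sketched like this. It is a simplified illustration: the class and function names are hypothetical, and the 3-second tolerance is an assumption, not a value taken from this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessIdentity:
    pid: int
    start_time: float  # seconds since the epoch


def is_same_process(recorded, observed, tolerance=3.0):
    """True if observed is the process recorded earlier and not a reused pid."""
    if observed is None or recorded.pid != observed.pid:
        return False
    # Same pid but a very different start time means the pid was reused.
    return abs(recorded.start_time - observed.start_time) <= tolerance
```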

Fixes #548
2017-11-09 15:36:05 +01:00
Alexander Kukushkin
34db670331 Improve test coverage 2017-10-12 15:03:13 +02:00
Alexander Kukushkin
cfdda23e27 Fix pg_rewind behaviour (#524)
When Patroni does calculation whether it should run pg_rewind or not, it relies on pg_controldata output or gets necessary information from replication connection.
In some cases (for example, when postgres running as a master was killed), we can't use the pg_controldata output immediately and have to try to start postgres first. Such a start could fail with the following error:
```
LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
LOG,00000,"database system was interrupted; last known up at 2017-09-16 22:35:22 UTC",,,,,,,,,""
LOG,00000,"restored log file ""00000006.history"" from archive",,,,,,,,,""
LOG,00000,"entering standby mode",,,,,,,,,""
2017-09-18 08:00:39.433 UTC,,,57,,59bf7d26.39,4,,2017-09-18 08:00:38 UTC,,0,LOG,00000,"restored log file ""00000006.history"" from archive",,,,,,,,,""
FATAL,XX000,"requested timeline 6 is not a child of this server's history","Latest checkpoint is at 29/1A000178 on timeline 5, but in the history of the requested timeline, the server forked off from that timeline at 29/1A000140.",,,,,,,,""
LOG,00000,"startup process (PID 57) exited with exit code 1",,,,,,,,,""
LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
LOG,00000,"database system is shut down",,,,,,,,,""
```
In this case controldata will still have `Database cluster state: in production`
All further attempts to start postgres will fail. Such a situation can be fixed only by starting postgres not in recovery. For safety we will do it in single-user mode.

The second problem is: if postgres was running as master, but later we started it and stopped it, then pg_controldata will report:
```
Database cluster state:               shut down in recovery
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
```

And this info can't be used for calculations. In this case we should use
`Latest checkpoint location` and `Latest checkpoint's TimeLineID`
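
A sketch of this fallback, assuming the pg_controldata output is parsed into a dict keyed by the printed labels. The function names are hypothetical.

```python
def parse_controldata(output):
    """Turn the 'Label:   value' lines of pg_controldata output into a dict."""
    data = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            data[key.strip()] = value.strip()
    return data


def local_position(data):
    """Return (timeline, lsn) to use when deciding about pg_rewind."""
    lsn = data.get("Minimum recovery ending location", "0/0")
    timeline = data.get("Min recovery ending loc's timeline", "0")
    if lsn == "0/0" or timeline == "0":
        # Unusable values: fall back to the latest checkpoint.
        lsn = data["Latest checkpoint location"]
        timeline = data["Latest checkpoint's TimeLineID"]
    return int(timeline), lsn
```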
2017-09-29 14:21:19 +02:00
Alexander Kukushkin
8a584f7a61 Set pgpass explicitly to /tmp/pgpass0 when running unit-tests (#518)
If $HOME is set to a non-existing directory (which would e.g. be the case on an official Debian package autobuilder) some tests were failing
2017-09-12 16:07:20 +02:00
Alexander Kukushkin
6300ec4dbf Implement missing tests for watchdog (#487)
and fix one bug
2017-07-27 12:41:46 +02:00
Ants Aasma
70d718a058 Simplify watchdog code (#452)
* Only activate watchdog while master and not paused

We don't really need the protections while we are not master. This way
we only need to tickle the watchdog when we are updating leader key or
while demotion is happening.

As implemented we might fail to notice to shut down the watchdog if
someone demotes postgres and removes leader key behind Patroni's back.
There are probably other similar cases. Basically if the administrator
is being actively stupid they might get unexpected restarts. That seems
fine.

* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni.

Changes watchdog timeout during the next keepalive when ttl is changed. Watchdog driver and requirement can also be switched online.

When watchdog mode is `required` and watchdog setup does not work then the effect is similar to nofailover. Add watchdog_failed to status API to signify this. This is True only when watchdog does not work **AND** it is required.

* Reset implementation when config changed while active.

* Add watchdog safety margin configuration

Defaults to 5 seconds. Basically this is the maximum amount of time
that can pass between the calls to `dcs.update_leader()` and
`watchdog.keepalive()`, which are called right after each other. Should
be safe for pretty much any sane scenario and allows the default
settings to not trigger watchdog when DCS is not responding.

* Cancel bootstrap if watchdog activation fails

The system would have demoted itself anyway the next HA loop. Doing it
in bootstrap gives at least some other node chance to try bootstrapping
in the hope that it is configured correctly.

If all nodes are unable to activate they will continue to try until the
disk is filled with moved datadirs. Perhaps not ideal behavior, but as
the situation is unlikely to resolve itself without administrator
intervention it doesn't seem too bad.
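
The safety margin described above boils down to a little arithmetic. In this sketch the assumption is that the watchdog timeout is the leader key `ttl` minus the safety margin; the function name is hypothetical.

```python
def watchdog_timeout(ttl, safety_margin=5):
    """Seconds after the last keepalive at which the watchdog would fire."""
    # Assumption: the watchdog expires safety_margin seconds before the ttl.
    timeout = ttl - safety_margin
    if timeout <= 0:
        raise ValueError("safety_margin must be smaller than ttl")
    return timeout
```

With `ttl=30` and the default margin this gives a watchdog timeout of 25 seconds.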
2017-07-27 12:16:11 +02:00
Alexander Kukushkin
e2feac87bc Block callbacks during bootstrap (#483)
It wasn't a big issue when on_start was called during normal bootstrap
with initdb, because usually such a process is very fast. But the situation
changes when we run a custom bootstrap, because a long time might pass
between the cluster becoming connectable and the end of recovery and promote.

Actually situation was even worse than that, on_start was called with
the `replica` argument and later on_role_change was never called,
because promote wasn't performed by Patroni.

As a solution for this problem we will block any callbacks during
bootstrap and explicitly call on_start after leader lock was taken.
2017-07-24 14:19:19 +02:00
Alexander Kukushkin
cb360f089c Restart postgres after custom bootstrap if hba_file is defined in configuration (#482)
In addition to that always use absolute paths to config files.

Fixes https://github.com/zalando/patroni/issues/481
2017-07-22 09:46:05 +02:00
Alexander Kukushkin
d5b3d94377 Custom bootstrap (#454)
The task of restoring a cluster from backup or cloning an existing cluster into a new one has been floating around for some time. It was kind of possible to achieve it with a lot of manual actions, but that was very error-prone. So I came up with the idea of making the way we bootstrap a new cluster configurable.

In short - we want to run a custom script instead of running initdb.
2017-07-18 15:12:58 +02:00
Alexander Kukushkin
acc6d7c2c2 Watchdog unit-tests, bugfixes and questions (#449)
Implement missing unit-tests and drop unused code
2017-07-11 10:00:30 +02:00
jouir
4ca94a5dab Add config_dir option for configuration files location (#466)
On debian, the configuration files (postgresql.conf, pg_hba.conf, etc) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could override existing configuration files where they used to be.

The default is to store configuration files in the data directory. This setting is targeting custom installations like debian and any others moving configuration files out of the data directory.

Fixes #465
2017-07-04 16:14:17 +02:00
Alexander Kukushkin
b576e69362 Manage pg_hba.conf via patroni config or dynamic_configuration (#458)
So far Patroni was populating pg_hba.conf only when running bootstrap code and after that it was not very handy to manage its content, because it was necessary to log in to every node, change pg_hba.conf manually and run pg_ctl reload.

This commit intends to fix it and give Patroni control over pg_hba.conf. It is possible to define pg_hba.conf content via `postgresql.pg_hba` in the patroni configuration file or in the `DCS/config` (dynamic configuration).

If the `hba_file` is defined in the `postgresql.parameters`, Patroni will ignore `postgresql.pg_hba`.
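
The precedence rule can be sketched as follows, assuming the `postgresql` section is available as a dict and `pg_hba` is a list of lines. The function name is hypothetical.

```python
def render_pg_hba(postgresql_config):
    """Return pg_hba.conf content, or None if Patroni should not write it."""
    parameters = postgresql_config.get("parameters", {})
    if parameters.get("hba_file"):
        return None  # a custom hba_file wins, postgresql.pg_hba is ignored
    lines = postgresql_config.get("pg_hba")
    if not lines:
        return None  # nothing configured, leave the file alone
    return "\n".join(lines) + "\n"
```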
2017-06-23 12:38:25 +02:00
Alexander Kukushkin
681b6b507b Support unix sockets when connecting to a local postgres cluster (#457)
For backward compatibility this feature is not enabled by default. To enable it you have to set `postgresql.use_unix_socket: true`.
If the feature is enabled, and `unix_socket_directories` is defined and non-empty, Patroni will use the first suitable value from it to connect to the local postgres cluster.
If the `unix_socket_directories` is not defined, Patroni will assume that default value should be used and will not pass `host` to command line arguments and omit it from connection url.
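
A sketch of these rules, under two assumptions that are not stated above: a "suitable" value is an absolute path, and if no value is suitable we fall back to TCP. The function name and the `tcp_host` parameter are hypothetical.

```python
def local_connect_kwargs(postgresql_config, tcp_host="localhost"):
    """Return the host-related connection kwargs for the local postgres."""
    if not postgresql_config.get("use_unix_socket"):
        return {"host": tcp_host}  # feature is disabled by default
    parameters = postgresql_config.get("parameters", {})
    directories = parameters.get("unix_socket_directories")
    if directories is None:
        return {}  # omit host, the default socket directory will be used
    for directory in directories.split(","):
        directory = directory.strip()
        # Assumption: "suitable" means an absolute path.
        if directory.startswith("/"):
            return {"host": directory}
    return {"host": tcp_host}  # assumption: nothing suitable, use TCP
```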

Solves: https://github.com/zalando/patroni/issues/61

In addition to the above, this commit fixes a couple of bugs:
* manual failover with pg_rewind in a pause state was broken
* psycopg2 (or libpq, I am not really sure what exactly) doesn't mark the cursor's connection as closed when we use a unix socket and an `OperationalError` occurs. We will close such a connection on our own.
2017-06-22 11:47:57 +02:00
Alexander Kukushkin
5bd9aa7547 BUGFIX: pg_rewind wasn't working when data page checksum is not enabled (#456)
pg_controldata output depends on postgres major version and in some cases some of the parameters are prefixed by 'Current ' for old postgres versions.

Bug was introduced by commit 37c1552.
Fixes https://github.com/zalando/patroni/issues/455
2017-06-16 10:25:54 +02:00
Alexander Kukushkin
e3a01727a9 Implement missing tests and add pg-10 support to wale_restore (#446)
in addition to that get rid of two modules and fix formatting of tests
2017-05-22 12:01:02 +02:00
Alexander Kukushkin
cd84dc82b6 Implement postgresql-10 support (#444)
Mainly it handles rename of xlog to wal.
In the API and inside DCS it is still named xlog (for compatibility).

* Address feedback
2017-05-19 17:04:53 +02:00
Alexander Kukushkin
7633b19213 Support change of superuser and replication credentials on reload (#445)
Fixes: https://github.com/zalando/patroni/issues/353
and: https://github.com/zalando/patroni/issues/443
2017-05-19 16:32:35 +02:00
Alexander Kukushkin
37c1552c0a Smart pg_rewind (#417)
Previously we were running pg_rewind only in a limited number of cases:
 * when we knew postgres was a master (no recovery.conf in data dir)
 * when we were doing a manual switchover to a specific node (no
   guarantee that this node is the most up-to-date)
 * when a given node has nofailover tag (it could be ahead of new master)

This approach was kind of working in most of the cases, but sometimes we
were executing pg_rewind when it was not necessary and in some other
cases we were not executing it although it was needed.

The main idea of this PR is to first figure out whether we really need
to run pg_rewind by analyzing the timeline ID, LSN and history file on the
master and replica, and to run it only if it's needed.
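
A simplified sketch of such a decision, not the exact logic of this PR. The assumptions: the master's history is given as a mapping from timeline ID to the LSN where that timeline ended, and a replica on a newer or unknown timeline always needs a rewind. All names are hypothetical.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '29/1A000140' to an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def needs_rewind(local_timeline, local_lsn, master_timeline, master_history):
    """Decide whether the local node has diverged from the master."""
    if local_timeline == master_timeline:
        return False  # same timeline, plain streaming is enough
    if local_timeline > master_timeline:
        return True  # assumption: we are on a timeline the master never saw
    switchpoint = master_history.get(local_timeline)
    if switchpoint is None:
        return True  # our timeline is not an ancestor of the master's
    # We diverged only if we wrote WAL past the point where the master forked.
    return lsn_to_int(local_lsn) > lsn_to_int(switchpoint)
```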
2017-05-19 16:32:06 +02:00
Alexander Kukushkin
44a7142a9d Synchronous mode strict (#435)
If synchronous_mode_strict==true then '*' will be written as synchronous_standby_names when the last replication host dies.
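
As a sketch, the rule amounts to the following. The function name is hypothetical, and the empty value for the non-strict case is an assumption based on how synchronous replication is disabled when no standby is available.

```python
def synchronous_standby_names(sync_candidate, strict):
    """Value to write as synchronous_standby_names."""
    if sync_candidate:
        return sync_candidate
    # No replica left: '*' keeps commits waiting, '' disables sync replication.
    return "*" if strict else ""
```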
2017-04-27 14:32:15 +02:00
Alexander Kukushkin
1c5d5f1dae BUGFIX: pg_drop_replication_slot may not be called if slot is active (#427)
Default value of wal_sender_timeout is 60 seconds while we are trying to remove replication slot after 30 seconds (ttl=30). That means postgres might think that slot is still active and does nothing. Patroni at the same time was thinking that it was removed successfully.

If the drop replication slot query didn't return a single row, we must fetch the list of existing physical replication slots from postgres on the next iteration of the HA loop.

Fixes: issue #425
2017-04-18 12:45:24 +02:00
Alexander Kukushkin
3ece35c0a6 Reassemble postgresql parameters when major version became known (#395)
* Reassemble postgresql parameters when major version became known

Otherwise we were writing some "unknown" parameters into postgresql.conf
and postgres was refusing to start. Only 9.3 was affected.

In addition to that move rename of wal_level from hot_standby to replica
into get_server_parameters method. Now this rename is handled in a
single place.

* Bump etcd and consul versions
2017-02-16 17:07:21 +01:00
Alexander Kukushkin
a5e79bce9d * bugfix: pass arguments to a callback 2016-12-16 15:44:04 +01:00
Alexander Kukushkin
8c0712047e Serialize callback execution (#366)
If the previous callback is still running - kill it
Also it will fix a problem of zombie processes when executing callbacks from the main thread.
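
A much simplified sketch of the idea, without the threading a real implementation would likely need. The class name and interface are hypothetical.

```python
import subprocess
import sys


class CallbackExecutor:
    """Runs at most one callback process at a time."""

    def __init__(self):
        self._process = None

    def call(self, cmd):
        previous = self._process
        if previous is not None:
            if previous.poll() is None:
                previous.kill()  # the previous callback is still running
            previous.wait()  # reap the child so no zombie is left behind
        self._process = subprocess.Popen(cmd)
        return self._process
```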
2016-12-16 14:29:53 +01:00
Alexander Kukushkin
1e984c3f00 Take a max from xlog_receive and xlog_replay (#363) 2016-12-12 16:27:36 +01:00
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily trodded on considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not and caused a race condition that resulted in either a failover or a crash recovery or a crash recovery interrupted by failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.

* Introduce timeout parameter to restart.

When restart timeout is set master becomes eligible for failover after that timeout expires regardless of master_start_time. Immediate restart calls will wait for this timeout to pass, even when node is a standby.
2016-12-08 14:44:27 +01:00
Alexander Kukushkin
c6417b2558 Add postgres-9.6 support (#357)
Starting from 9.6 we need wal_level = 'replica', which is an alias for 'hot_standby'. It was working before without problems, but if somebody changes wal_level to replica, Patroni will expose the pending_restart flag, although a restart in this case is not necessary.

* bump versions of consul and etcd to the latest for travis integration-tests
2016-11-25 12:35:01 +01:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.

To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over.

We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
Alejandro Martínez
48a6af6994 Add post_init configuration parameter on bootstrap (#296)
* Add bootstrap post_init configuration parameter
* Add documentation

By @zenitraM
2016-09-28 15:42:23 +02:00