75 Commits

Author SHA1 Message Date
Oleksii Kliukin
b165183503 Reset is_leader status on demote (#777)
Make sure a demoted cluster member stops responding with code 200 to the /master API call.

Issue a new minor release.

Fixes https://github.com/zalando/patroni/issues/776
2018-08-14 17:08:08 +02:00
Oleksii Kliukin
5e7345a2ca Release notes 1.4.5 (#762)
bump version, update release notes
2018-08-03 17:02:11 +02:00
alago197
936a4238fb Update some descriptions for the REST API endpoints (#729)
* Update some descriptions for the REST API endpoints

By @alago197
2018-07-10 15:40:53 +02:00
Oleksii Kliukin
41e5f58f2b Describe synchronous_mode_strict (#710)
* Describe synchronous_mode_strict

Per https://github.com/zalando/patroni/issues/709
2018-06-13 11:12:22 +02:00
erthalion
3d80e49b38 Rename also in settings docs 2018-06-12 13:28:30 +02:00
erthalion
d037aa8afd Rename create_replica_method to create_replica_methods
To make it clear that it's actually an array
2018-06-12 11:33:13 +02:00
Alexander Kukushkin
1de7c78c04 Release 1.4.4 (#683)
bump version and update release notes
2018-05-22 14:46:19 +02:00
Oleksii Kliukin
4ce539ba1b Allow options to the basebackup built-in method. (#604)
Options should be specified in the basebackup section, which is optional.
2018-05-18 12:18:35 +02:00
Kostiantyn Nemchenko
3110090154 Minor corrections to the documentation. (#654) 2018-04-16 15:46:46 +02:00
Reinhard Nägele
20138af37a Link to official Helm chart (#660)
Changes the link from my outdated fork to the official Helm chart which is now up to date.
2018-04-16 15:45:53 +02:00
Dave Cramer
38ad394308 Use the word primary in favour of master (#663)
Primary is a better alternative.
2018-04-16 01:29:51 +02:00
Don Seiler
140618abd2 Missing a word (#647)
In re Issue #639
2018-04-04 13:40:46 +02:00
Josh Berkus
3c05e2e984 Added references to the Slack channel in Readme and in contributing.rst. (#653) 2018-04-04 13:39:43 +02:00
bradnicholson
ca679a93b8 Make deleting recovery.conf optional. (#638)
pgBackRest's restore command generates the appropriate recovery.conf based
on the parameters you provide to pgBackRest. When pgBackRest's restore command is called
via Patroni's custom bootstrap, Patroni deletes that recovery.conf. Specifying the recovery.conf
information in the patroni.yml is less than ideal. It prevents leveraging pgBackRest's
work to ensure recovery.conf files are properly generated. It can also lead to transient
config data in the patroni.yml under certain restore cases, such as a PITR restore
of Cluster B to Cluster A, where the restore_command in A needs to reference B.

The parameter is optional.  The default behavior is to delete the recovery.conf.

Fixes https://github.com/zalando/patroni/issues/637
2018-03-09 15:35:29 +01:00
Alexander Kukushkin
f500dbb0ff Release 1.4.3 (#635)
Bump version and update release notes
2018-03-05 10:10:17 +01:00
Andy Newton
f748de3b29 Make log level configurable from environment variables (#622)
* `PATRONI_LOGLEVEL` - sets the general logging level
* `PATRONI_REQUESTS_LOGLEVEL` - sets the logging level for all HTTP requests e.g. Kubernetes API calls
2018-03-05 09:50:45 +01:00
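The two environment variables from the commit above (#622) could be consumed roughly as follows. This is a minimal sketch, not Patroni's actual code: the helper name `configure_logging`, the fallback levels, and the choice of the `urllib3` logger for HTTP requests are all assumptions made for illustration.

```python
import logging
import os


def configure_logging(environ=os.environ):
    """Apply log levels taken from environment variables.

    Hypothetical helper: the defaults ("INFO", "WARNING") and the use of
    the "urllib3" logger for HTTP requests are assumptions of this sketch.
    """
    general = environ.get("PATRONI_LOGLEVEL", "INFO").upper()
    requests_level = environ.get("PATRONI_REQUESTS_LOGLEVEL", "WARNING").upper()

    # Logger.setLevel accepts level names as strings.
    logging.getLogger().setLevel(general)
    logging.getLogger("urllib3").setLevel(requests_level)
    return general, requests_level
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out without touching the real environment.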
Alexander Kukushkin
c95dd990cc Release 1.4.2 (#619)
* Bump version to 1.4.2
* Update release notes
2018-01-30 16:44:42 +01:00
Alexander Kukushkin
a1e5c8e1cb A few improvements in patronictl (#601)
* make switchover work with an old patroni
* exclude leader from candidates when interactively running failover
2018-01-17 15:33:08 +01:00
Oleksii Kliukin
4202ad853a Minor corrections to the documentation. (#599) 2018-01-10 16:10:12 +01:00
Oleksii Kliukin
84d804e579 Release notes 1.4 (#597)
Document Kubernetes parameters and environment variables. Describe how Patroni uses Kubernetes.
2018-01-10 11:17:08 +01:00
Oleksii Kliukin
d14d9f669a Document pip-related installation options. (#595)
* Remove redundant requirements of Mac OS.

* Clarify how to run the example in getting started.
2018-01-08 13:59:31 +01:00
Alexander Kukushkin
b6425cab85 Allow to specify multiple hosts for etcd (#589)
This list will be used for initial discovery of etcd cluster members.
If for some reason this list of hosts is exhausted while Patroni is running, Patroni will return to the initial list.

In addition to that improve ipv6 compatibility by using a special function for splitting host and port.

Fixes https://github.com/zalando/patroni/issues/523
2018-01-04 10:25:06 +01:00
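The IPv6-aware host and port splitting mentioned in the commit above (#589) can be illustrated with a small sketch. The function name `split_host_port_sketch` and its exact rules are assumptions for illustration; the commit message does not describe how Patroni's own function behaves.

```python
def split_host_port_sketch(value, default_port):
    """Split "host:port" into (host, port), tolerating IPv6 literals.

    Hypothetical helper written for this example only.
    """
    value = value.strip()
    if value.startswith("["):
        # Bracketed IPv6 literal, e.g. "[::1]:2379" or "[::1]".
        host, _, rest = value[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif value.count(":") == 1:
        # Hostname or IPv4 address with an explicit port.
        host, _, port = value.partition(":")
    else:
        # No colon (plain host) or several colons (bare IPv6, no port).
        host, port = value, ""
    return host, int(port) if port else default_port
```

A naive `value.split(":")` would break on every IPv6 address, which is why the bracketed and bare forms are handled separately here.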
Alexander Kukushkin
062c55f99c Update readthedocs config (#580)
* Get Patroni version from patroni/version.py
* Update copyright to match with the LICENSE file

Fixes https://github.com/zalando/patroni/issues/519
2017-12-20 14:28:12 +01:00
Alexander Kukushkin
bd847fd2cc Patronictl extended info (#567)
* Show information about scheduled failover and maintenance mode when showing list of cluster members. Fixes https://github.com/zalando/patroni/issues/557

* Fix postgres version check functions (postgres 10 and above compatibility) and apply pep8 formatting to the tests.
* Bump some configuration parameters to match with postgres 10 defaults.
* Fix name of contributor in release notes.
2017-11-28 12:10:05 +01:00
Alexander Kukushkin
a89a902f4a Bump version and write release notes (#560)
and implement missing unit-tests
2017-11-10 11:48:50 +01:00
Alexander Kukushkin
2e86fe5991 Consul dc (#559)
Make it possible to specify dc for consul as PATRONI_CONSUL_DC environment variable and update documentation accordingly.
2017-11-10 11:21:47 +01:00
Alexander Kukushkin
7c000f1519 Update releases.rst 2017-10-12 15:03:13 +02:00
Alexander Kukushkin
1e856e4ec6 Update release notes 2017-10-12 15:03:13 +02:00
Alexander Kukushkin
ae1a8f8942 Update release notes 2017-10-12 15:03:13 +02:00
Alexander Kukushkin
8e9c62d002 Make it possible to change Consul session checks (#543)
If the list of checks is not specified, Consul will use "serfHealth" in addition to the TTL-based check created by Patroni.
In some cases people want to sacrifice fast detection of network partitioning in favor of the ability to tolerate network lag.

Fixes https://github.com/zalando/patroni/issues/522
2017-10-12 15:01:31 +02:00
Alexander Kukushkin
3919b322f4 Release 1.3.4 (#515)
Fix documentation and update release notes
2017-09-08 10:56:09 +02:00
Alexander Kukushkin
5ef01cfdfa Advanced configuration for Consul (#506)
* possibility to specify client certs and cacert
* possibility to specify token
* compatibility with python-consul-0.7.1
2017-08-24 07:56:12 +02:00
Oleksii Kliukin
895e46885a Patroni 1.3
- add release notes
- update the version
2017-07-27 15:58:31 +02:00
Ants Aasma
70d718a058 Simplify watchdog code (#452)
* Only activate watchdog while master and not paused

We don't really need the protections while we are not master. This way
we only need to tickle the watchdog when we are updating leader key or
while demotion is happening.

As implemented we might fail to notice that the watchdog should be shut
down if someone demotes postgres and removes the leader key behind Patroni's
back. There are probably other similar cases. Basically if the administrator
is being actively stupid they might get unexpected restarts. That seems
fine.

* Add configuration change support. Change MODE_REQUIRED to disable leader eligibility instead of closing Patroni.

Changes watchdog timeout during the next keepalive when ttl is changed. Watchdog driver and requirement can also be switched online.

When watchdog mode is `required` and watchdog setup does not work then the effect is similar to nofailover. Add watchdog_failed to status API to signify this. This is True only when watchdog does not work **AND** it is required.

* Reset implementation when config changed while active.

* Add watchdog safety margin configuration

Defaults to 5 seconds. Basically this is the maximum amount of time
that can pass between the calls to `dcs.update_leader()` and
`watchdog.keepalive()`, which are called right after each other. Should
be safe for pretty much any sane scenario and allows the default
settings to not trigger watchdog when DCS is not responding.

* Cancel bootstrap if watchdog activation fails

The system would have demoted itself anyway the next HA loop. Doing it
in bootstrap gives at least some other node chance to try bootstrapping
in the hope that it is configured correctly.

If all nodes are unable to activate they will continue to try until the
disk is filled with moved datadirs. Perhaps not ideal behavior, but as
the situation is unlikely to resolve itself without administrator
intervention it doesn't seem too bad.
2017-07-27 12:16:11 +02:00
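One way to read the safety margin described in the commit above (#452) is that the watchdog timeout is the leader key TTL minus the margin, so the node is reset before the key can expire. The sketch below encodes that reading. The formula and the function name `watchdog_timeout` are assumptions; the commit message only gives the 5 second default.

```python
def watchdog_timeout(ttl, safety_margin=5):
    """Seconds the watchdog may go without a keepalive before it fires.

    Assumed relation: timeout = ttl - safety_margin. The default of 5
    comes from the commit message; the formula itself is an assumption.
    """
    timeout = ttl - safety_margin
    if timeout <= 0:
        raise ValueError("safety_margin must be smaller than ttl")
    return timeout
```

With a TTL of 30 seconds and the default margin, the watchdog would fire after 25 seconds without a keepalive.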
Oleksii Kliukin
895eefaa51 Document bootstrapping and replica creation (#478)
Describe parameters around custom replica creation and bootstrap
2017-07-19 12:25:50 +02:00
jouir
4ca94a5dab Add config_dir option for configuration files location (#466)
On Debian, the configuration files (postgresql.conf, pg_hba.conf, etc) are not stored in the data directory. It would be great to be able to configure the location of this separate directory. Patroni could override existing configuration files where they used to be.

The default is to store configuration files in the data directory. This setting targets custom installations like Debian and any others that move configuration files out of the data directory.

Fixes #465
2017-07-04 16:14:17 +02:00
Alexander Kukushkin
b576e69362 Manage pg_hba.conf via patroni config or dynamic_configuration (#458)
So far Patroni was populating pg_hba.conf only when running bootstrap code, and after that it was not very handy to manage its content, because it was necessary to log in to every node, change pg_hba.conf manually and run pg_ctl reload.

This commit intends to fix it and give Patroni control over pg_hba.conf. It is possible to define pg_hba.conf content via `postgresql.pg_hba` in the patroni configuration file or in the `DCS/config` (dynamic configuration).

If the `hba_file` is defined in the `postgresql.parameters`, Patroni will ignore `postgresql.pg_hba`.
2017-06-23 12:38:25 +02:00
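The precedence rule from the commit above (#458), where a custom `hba_file` in `postgresql.parameters` causes `postgresql.pg_hba` to be ignored, can be sketched as a small lookup. The function name `effective_pg_hba` and the dict layout standing in for the parsed configuration are assumptions for illustration.

```python
def effective_pg_hba(config):
    """Return the pg_hba lines to write, or None if they should be ignored.

    `config` is a plain dict standing in for the parsed Patroni
    configuration (hypothetical layout for this sketch).
    """
    postgresql = config.get("postgresql", {})
    if postgresql.get("parameters", {}).get("hba_file"):
        # A custom hba_file takes over, so postgresql.pg_hba is ignored.
        return None
    return postgresql.get("pg_hba")
```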
Alexander Kukushkin
681b6b507b Support unix sockets when connecting to a local postgres cluster (#457)
For backward compatibility this feature is not enabled by default. To enable it you have to set `postgresql.use_unix_socket: true`.
If the feature is enabled and `unix_socket_directories` is defined and non-empty, Patroni will use the first suitable value from it to connect to the local postgres cluster.
If `unix_socket_directories` is not defined, Patroni will assume that the default value should be used; it will not pass `host` in the command line arguments and will omit it from the connection url.

Solves: https://github.com/zalando/patroni/issues/61

In addition to the above, this commit fixes a couple of bugs:
* manual failover with pg_rewind in a pause state was broken
* psycopg2 (or libpq, I am not really sure what exactly) doesn't mark the cursor's connection as closed when we use a unix socket and an `OperationalError` occurs. We will close such a connection on our own.
2017-06-22 11:47:57 +02:00
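The host selection described in the commit above (#457) can be sketched as follows. The function name, the `tcp_host` argument, the reading of "suitable" as "an absolute path", and the fallback to TCP for an empty value are all assumptions; the commit message does not spell these details out.

```python
def local_connection_host(use_unix_socket, parameters, tcp_host):
    """Pick the `host` value for connecting to the local postgres.

    Returns None when `host` should be omitted so that the client
    library falls back to its default socket directory.
    """
    if not use_unix_socket:
        return tcp_host
    directories = parameters.get("unix_socket_directories")
    if directories is None:
        # Not defined: assume the default and do not pass `host` at all.
        return None
    for directory in directories.split(","):
        directory = directory.strip()
        # Assumption: "suitable" means an absolute filesystem path.
        if directory.startswith("/"):
            return directory
    # Assumption: defined but empty or unusable, so fall back to TCP.
    return tcp_host
```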
Ants Aasma
a70b46ef13 Add watchdog support on Linux (#343)
Ensures that system gets rebooted before TTL runs out.

Initial version. Open questions:

    Do we want to disable watchdog while we are not master?
2017-06-01 16:53:46 +02:00
Oleksii Kliukin
2559ba8ca2 Release notes for version 1.2
In addition to writing the notes, modify the way changes are presented, adding some custom CSS.
2017-01-05 11:31:26 +01:00
Oleksii Kliukin
fb89e75ce4 Make patroni documentation available on patroni.readthedocs.io. (#373)
Run sphinx-quickstart and some workarounds.
Sphinx is a logical choice because our docs are already in .rst.
2016-12-20 18:22:57 +01:00
Alexander Kukushkin
574e1dba04 Update diagram 2016-12-20 16:32:00 +01:00
Alejandro Martínez
27dd9b4c6e Add HA loop diagram (#352) 2016-12-20 16:26:45 +01:00
Alexander Kukushkin
d138a8db17 AT for master_start_timeout + minor fixes (#361) 2016-12-09 12:02:41 +01:00
Ants Aasma
1290b30b84 Introduce starting state and master start timeout. (#295)
Previously pg_ctl waited for a timeout and then happily trod on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it actually was not, and caused a race condition that resulted in either a failover, a crash recovery, or a crash recovery interrupted by failover and a missed rewind.

This change adds a master_start_timeout parameter and introduces a new state for the main run_cycle loop: starting. When master_start_timeout is zero we will fail over as soon as there is a failover candidate. Otherwise PostgreSQL will be started, but once master_start_timeout expires we will stop and release leader lock if failover is possible. Once failover succeeds or fails (no leader and no one to take the role) we continue with normal processing. While we are waiting for the master timeout we handle manual failover requests.

* Introduce timeout parameter to restart.

When a restart timeout is set, the master becomes eligible for failover after that timeout expires, regardless of master_start_timeout. Immediate restart calls will wait for this timeout to pass, even when the node is a standby.
2016-12-08 14:44:27 +01:00
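The `master_start_timeout` rule from the commit above (#295) reduces to a small decision while the node is in the `starting` state. The sketch below is a simplified reading of the commit message; the function and argument names are hypothetical and the real HA loop handles more cases, such as manual failover requests.

```python
def should_release_leader_lock(seconds_starting, master_start_timeout, failover_possible):
    """Decide whether a starting master should stop and give up the lock.

    With master_start_timeout == 0 this is True as soon as a failover
    candidate exists; otherwise only once the timeout has expired.
    """
    if not failover_possible:
        # Nobody can take over, so keep waiting for PostgreSQL to start.
        return False
    return seconds_starting >= master_start_timeout
```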
Alexander Kukushkin
b299b12f58 Various configuration parameters for etcd (#358)
* Add https and auth support for etcd

Also implement support of PATRONI_ETCD_URL and PATRONI_ETCD_SRV
environment variables

* Implement etcd.proxy, etcd.cacert, etcd.cert and etcd.key support

Now it should be possible to set up fully encrypted connection to etcd
with authorization.
2016-12-06 16:40:21 +01:00
Ants Aasma
7e53a604d4 Add synchronous replication support. (#314)
Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions.

To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over.

We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one, because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until the next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions.

On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS.

When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock.

Added acceptance tests and documentation.

Implementation by @ants with extensive review by @CyberDem0n.
2016-10-19 16:12:51 +02:00
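The failover guarantee described in the commit above (#314) can be sketched as an eligibility check against the synchronous state published in the DCS. The function name and the two-field state are hypothetical. Letting the former master re-acquire the lock while a synchronous standby exists is an assumption, since the commit message only states that case for when no synchronous standby is set.

```python
def may_take_leader_lock(node, leader, sync_standby):
    """Return True if `node` may acquire the leader lock in synchronous_mode.

    `leader` and `sync_standby` stand for the state the old master
    published in the DCS (hypothetical layout). `sync_standby` is None
    when synchronous replication was disabled for lack of a standby.
    """
    allowed = {leader}
    if sync_standby is not None:
        # Only the standby that was replicating synchronously qualifies.
        allowed.add(sync_standby)
    return node in allowed
```

For example, with `leader="node1"` and `sync_standby="node2"`, an asynchronous `node3` is refused, which is what prevents the loss of acknowledged transactions.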
Alejandro Martínez
48a6af6994 Add post_init configuration parameter on bootstrap (#296)
* Add bootstrap post_init configuration parameter
* Add documentation

By @zenitraM
2016-09-28 15:42:23 +02:00
Alexander Kukushkin
39d16fe2f9 Merge pull request #281 from CartoDB/feature/add_custom_conf_location
Add configuration parameter to specify a path to a custom postgresql.base.conf and disable its backup
2016-09-05 13:53:04 +02:00
Alejandro Martínez
f58ff3a96f Document custom_conf parameter 2016-09-01 17:59:47 +02:00