1673 Commits

Author SHA1 Message Date
Sergey Dudoladov
4cc095f913 first commit 2019-09-12 16:58:07 +02:00
anikin-aa
3937a8d4fc Fix status code for GET /replica, when replica is starting (#1152)
Close #772, #1128
2019-08-26 11:18:13 +02:00
Soulou
53d32f1457 Allow lower values for postgresql configuration (#1148)
* Default values have not been changed
* These minimal values still work properly to boot a (small) cluster

Fixes #1142
2019-08-26 10:48:36 +02:00
Alexander Kukushkin
0a1d9b0a25 Get rid of the distutils module dependency (#1146)
We are using only one function from there, `find_executable()`, and it is better to implement a similar function in Patroni than to add the `distutils` module to requirements.txt
2019-08-26 09:38:47 +02:00
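A minimal sketch of what such a replacement for `find_executable()` could look like. This is an illustration written for this note, not the function that was merged into Patroni; the Windows `.exe` handling in particular is an assumption.

```python
import os
import sys


def find_executable(executable, path=None):
    """Return the full path of `executable` found on `path`, or None."""
    # A path that already points at a file is returned unchanged.
    if os.path.isfile(executable):
        return executable
    if path is None:
        path = os.environ.get("PATH", os.defpath)
    # Assumption: on Windows, executables normally carry an .exe suffix.
    if sys.platform == "win32" and not executable.lower().endswith(".exe"):
        executable += ".exe"
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, executable)
        if os.path.isfile(candidate):
            return candidate
    return None
```

On Python 3 only, `shutil.which()` from the standard library covers the same need.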
Alexander Kukushkin
3aa3bc3237 Pass statement_timeout=0 in PGOPTIONS when doing pg_rewind (#1155)
It might happen that statement_timeout on the server is set to some small value and one of the statements executed by pg_rewind is canceled.

I have already proposed a patch fixing pg_rewind itself, but it would also be good to have a workaround in Patroni.
2019-08-26 08:43:05 +02:00
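The workaround can be sketched as building the child process environment before launching pg_rewind. The helper name `rewind_env` is hypothetical; only the `PGOPTIONS` variable and the `statement_timeout=0` setting come from the commit message.

```python
import os


def rewind_env(base_env=None):
    """Return a copy of the environment with statement_timeout disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # libpq forwards PGOPTIONS to the server as command-line options;
    # "-c name=value" sets a run-time parameter for the session.
    env["PGOPTIONS"] = "-c statement_timeout=0"
    return env
```

The returned dictionary would then be passed as `env=` to `subprocess.call()` or a similar function when starting pg_rewind. Note that this sketch overwrites any `PGOPTIONS` value that is already set.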
msvechla
0d0c4c0a30 Add PATRONICTL_CONFIG_FILE Environment Variable (#1150)
add a `PATRONICTL_CONFIG_FILE` environment variable, which allows configuring the --config-file flag from the environment.
2019-08-26 08:42:24 +02:00
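A rough sketch of how a command-line tool might resolve its configuration file from the flag and the environment. The function name, the precedence (explicit flag first, then the variable) and the default path are assumptions made for illustration, not confirmed patronictl behaviour.

```python
import os

# Assumed default location, used here only for illustration.
DEFAULT_CONFIG_FILE = os.path.expanduser("~/.config/patroni/patronictl.yaml")


def resolve_config_file(cli_value=None, environ=None):
    """Pick the config file: explicit flag, then environment, then default."""
    environ = os.environ if environ is None else environ
    if cli_value:
        return cli_value
    return environ.get("PATRONICTL_CONFIG_FILE", DEFAULT_CONFIG_FILE)
```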
Alexander
e9a5d25ef3 Synchronous commit is disabled for rewind user GRANTs (#1145)
SET local synchronous_commit = 'local' before running GRANT
2019-08-23 17:03:02 +02:00
Will Colton
0f7c8b7b09 Fix a command in the docker readme. (#1138)
Fixes #1139
2019-08-06 15:49:32 +02:00
Alexander Kukushkin
278bf9852b Release 1.6.0 (#1131)
* Implement missing tests and do a few minor fixes
* Bump version to 1.6.0
* Update release notes
v1.6.0
2019-08-05 15:08:04 +02:00
Alexander Kukushkin
0b1b1e3b54 Compatibility with postgresql 12 (#1068)
* use `SHOW primary_conninfo` instead of parsing config file on pg12
* strip out standby and recovery parameters from postgresql.auto.conf before starting postgres 12

Patroni config remains backward compatible.
Although, for example, `restore_command` became a GUC starting from postgresql 12, in the Patroni configuration you can still keep it in the `postgresql.recovery_conf` section.
If you put it into `postgresql.parameters.restore_command`, that will also work, but it is important not to mix both ways:
```yaml
# is OK
postgresql:
  parameters:
    restore_command: my_restore_command
    archive_cleanup_command: my_archive_cleanup_command

# is OK
postgresql:
  recovery_conf:
    restore_command: my_restore_command
    archive_cleanup_command: my_archive_cleanup_command

# is NOT ok
postgresql:
  parameters:
    restore_command: my_restore_command
  recovery_conf:
    archive_cleanup_command: my_archive_cleanup_command
```
2019-08-02 16:00:55 +02:00
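The "do not mix both ways" rule from the YAML above can be expressed as a small check over the parsed configuration. This is a sketch under assumptions: the function name is hypothetical and `RECOVERY_PARAMETERS` lists only the two settings used in the example, not the full set of recovery parameters.

```python
# Illustrative subset; the real list of recovery parameters is longer.
RECOVERY_PARAMETERS = {"restore_command", "archive_cleanup_command"}


def mixes_recovery_settings(config):
    """Return True if recovery settings appear in both sections at once."""
    postgresql = config.get("postgresql") or {}
    in_parameters = RECOVERY_PARAMETERS & set(postgresql.get("parameters") or {})
    in_recovery_conf = RECOVERY_PARAMETERS & set(postgresql.get("recovery_conf") or {})
    return bool(in_parameters) and bool(in_recovery_conf)
```

Applied to the three YAML variants above, the check returns `False` for the first two and `True` for the last one.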
Alexander Kukushkin
4a24b79b73 IPv6 support (#1122)
fixes https://github.com/zalando/patroni/issues/1121
2019-08-02 11:34:29 +02:00
Rafia Sabih
5cc3afc037 Enhance dialogues for scheduled switchover and restart (#1119)
Enhance dialogue for switchover and restart

In case of a scheduled switchover or restart, mention the time, if any, when confirming.
2019-08-02 11:21:26 +02:00
Don Seiler
5cb7d1bdc1 Grammar fixes for SETTINGS.rst (#1106) 2019-07-26 09:34:42 +02:00
Alexander Kukushkin
9a94c54cb0 Do a smart comparison of actual and desired primary_conninfo (#1089)
On every replica Patroni periodically opens the recovery.conf file and checks that it contains the correct primary_conninfo.
So far the correctness check was pretty dumb: it was basically doing a full match of strings. That could lead to a restart of the replica when Patroni is "joining" an already running postgres process.
Instead of using string comparison, we parse the actual primary_conninfo value from the file and check that all parameters match the desired values.

In addition to that, we stop reading and parsing recovery.conf on every iteration if the modification time didn't change.
2019-07-01 13:01:45 +02:00
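The parameter-by-parameter comparison described above can be sketched as follows. Both function names are hypothetical, and the parser is a simplified take on the libpq `keyword=value` format (single-quoted values, backslash escapes), not Patroni's actual implementation.

```python
import re

# keyword = 'quoted value'  or  keyword = bare_value
_PAIR = re.compile(r"\s*(\w+)\s*=\s*(?:'((?:[^'\\]|\\.)*)'|(\S*))")


def parse_conninfo(value):
    """Split a conninfo string into a dict of keyword -> value."""
    params = {}
    for key, quoted, bare in _PAIR.findall(value):
        raw = quoted if quoted else bare
        # Undo backslash escapes such as \' and \\.
        params[key] = re.sub(r"\\(.)", r"\1", raw)
    return params


def conninfo_matches(actual, desired):
    """Check that every desired parameter has the same value in `actual`."""
    parsed = parse_conninfo(actual)
    return all(parsed.get(key) == str(val) for key, val in desired.items())
```

With this approach, two strings that differ only in parameter order or quoting are treated as equal, so they would not trigger a restart.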
Jan Tomsa
7d1a5cad03 Allow to specify consul consistency mode (#1094)
Allow users to specify consul consistency mode.
This option will be passed to the Consul client as kwargs https://github.com/zalando/patroni/blob/master/patroni/dcs/consul.py#L213.
The library will then enforce the selected consistency level https://python-consul.readthedocs.io/en/latest/#consul

More about consistency mode here https://www.consul.io/api/features/consistency.html
2019-07-01 11:02:26 +02:00
Jan Tomsa
1d3fd3ac0b Enable debug logging for GET/OPTIONS API calls together with latency 2019-07-01 10:50:39 +02:00
wilfriedroset
47c3dc2352 patronictl checks if --config value exists (#736) (#1092)
Be verbose about the configuration file when the given filename does not exist, instead of silently ignoring it (which can lead to misunderstanding)
2019-06-27 14:58:07 +02:00
wilfriedroset
78999a2d62 Add fallback value for editor_cmd (#532) (#1091)
Fixes #532
2019-06-27 14:56:57 +02:00
Alexander Kukushkin
1a6db4f5af Reverse logic around checkpoint_after_promote (#1084)
It will be set to false in the JSON only until the checkpoint actually happened.

The next improvement of bba9066315
2019-06-17 10:42:31 +02:00
Alexander Kukushkin
fad6d26a3a Small refactoring of postgresql/bootstrap (#1086)
the main purpose of this PR is simplifying #1068

It is mostly necessary for future support of pg12, where there will be no recovery.conf anymore, but the `keep_existing_recovery_conf` parameter still needs to be supported for backward compatibility.
2019-06-17 10:41:13 +02:00
Alexander Kukushkin
37f03790cc Implement two-step logging (#1080)
A few times we observed that the Patroni HA loop was blocked for a few minutes due to not being able to write logs to stderr. This is a very rare condition which we have hit so far only on k8s. This commit makes Patroni resilient to this kind of problem. All log messages are first written into an in-memory queue and later asynchronously flushed to stderr or a file from a separate thread.

The maximum queue size is configurable and the default value is 1000. This should be enough to keep more than one hour of log messages with default settings and when the Patroni cluster operates normally (without big issues).

If we hit the maximum size of the queue, further logs will be discarded until the queue size is reduced. The number of discarded messages will be reported in the log later.

In addition to that, the number of non-flushed and discarded messages (if there are any) will be reported via Patroni REST API as:
```json
"logger_queue_size": X,
"logger_records_lost": Y
```
2019-06-13 14:18:49 +02:00
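The two-step idea (enqueue without blocking, flush from another thread, count what is dropped) can be sketched with the standard library. The class and method names are hypothetical and this is not Patroni's logger; it only illustrates the mechanism described in the commit message.

```python
import logging
import queue


class QueueingHandler(logging.Handler):
    """Buffer records in memory; a worker thread flushes them later."""

    def __init__(self, target, max_queue_size=1000):
        super().__init__()
        self._target = target                      # handler doing the real I/O
        self._records = queue.Queue(maxsize=max_queue_size)
        self.records_lost = 0

    def emit(self, record):
        try:
            self._records.put_nowait(record)       # never blocks the caller
        except queue.Full:
            self.records_lost += 1                 # discard, but keep count

    @property
    def queue_size(self):
        return self._records.qsize()

    def flush_once(self):
        """Drain whatever is queued right now into the target handler."""
        while True:
            try:
                record = self._records.get_nowait()
            except queue.Empty:
                return
            self._target.handle(record)

    def run_forever(self, stop_event, interval=0.1):
        """Loop body for the flushing thread; stop_event is a threading.Event."""
        while not stop_event.is_set():
            self.flush_once()
            stop_event.wait(interval)
        self.flush_once()
```

A daemon thread running `run_forever` would do the slow writes, so a blocked stderr delays only that thread. The standard library also ships `logging.handlers.QueueHandler` and `QueueListener`, which cover the enqueue and flush steps but not the lost-record counter.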
Alexander Kukushkin
83e62c2723 Simplify postgresql.reload_config (and fix some bugs) (#1087)
The `compare_values` function is already smart enough to handle all tricky cases when the unit is not only `B` or `kB`, but also blocks. Therefore we don't need to query `wal_segment_size` from pg_settings.
2019-06-13 14:01:37 +02:00
Daniel Kucera
638b560cf8 log exceptions caught in Retry (#1081)
Log the final exception when either the number of attempts or the timeout is reached. It will hopefully help to debug some issues when communication with the DCS fails
2019-06-11 15:27:14 +02:00
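As a sketch of the behaviour described above, a generic retry helper can log the last exception before re-raising it. The function name, parameters and log message are assumptions for illustration; Patroni's own `Retry` class has a different interface.

```python
import logging
import time

logger = logging.getLogger(__name__)


def retry(func, max_tries=3, deadline=10.0, delay=0.1):
    """Call func() until it succeeds, tries run out, or the deadline passes."""
    stop_time = time.monotonic() + deadline
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except Exception as exc:
            out_of_tries = attempt >= max_tries
            out_of_time = time.monotonic() + delay > stop_time
            if out_of_tries or out_of_time:
                # Log the final failure before giving up, so it is not lost.
                logger.warning("Retry got exception: %s", exc)
                raise
            time.sleep(delay)
```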
wilfriedroset
2384d9e735 Add API route /health (#1079)
close #119
2019-06-11 15:22:52 +02:00
Kostiantyn Nemchenko
dcd605ebc8 Update existing_data.rst (#1071) 2019-06-11 15:15:48 +02:00
Alexander Kukushkin
75926b442e Open trust in pg_hba.conf during bootstrap to localhost (#1075)
previously it was open by mistake to unix_socket only

Fixes https://github.com/zalando/patroni/issues/1072
2019-06-11 15:14:57 +02:00
Alexander Kukushkin
3b90cc1931 Fix some small issues in the Dockerfile (#1074)
* symlink ~/.config/patroni/patronictl.yaml
* comment out rewind section
2019-05-28 14:47:46 +02:00
Alexander Kukushkin
b291291ff6 Make compare_values compatible with pg12 (#1061)
* integer variables could be specified as floats: work_mem = '30.1GB'
* memory-based units could be in B and MB
* time-based variables now allow microseconds (us)
* vacuum_cost_delay and autovacuum_vacuum_cost_delay converted to real with base unit ms
2019-05-21 16:20:22 +02:00
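To illustrate why unit-aware comparison matters, here is a small sketch that normalises memory settings to bytes before comparing them. It handles only memory units and fractional values; the function names are hypothetical and Patroni's `compare_values` covers more cases (time units, blocks, per-parameter base units).

```python
import re

# Multipliers relative to one byte (PostgreSQL memory units are powers of 1024).
MEMORY_UNITS = {"B": 1, "kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def memory_to_bytes(value):
    """Convert a string such as '30.1GB' to a whole number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*", value)
    if not match or match.group(2) not in MEMORY_UNITS:
        raise ValueError("unrecognised memory value: %r" % value)
    return round(float(match.group(1)) * MEMORY_UNITS[match.group(2)])


def same_memory_value(left, right):
    """Compare two memory settings regardless of the unit they use."""
    return memory_to_bytes(left) == memory_to_bytes(right)
```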
Alexander Kukushkin
a4bd6a9b4b Refactor postgresql class (#1060)
* Convert postgresql.py into a package
* Factor out cancellable process into a separate class
* Factor out connection handler into a separate class
* Move postmaster into postgresql package
* Factor out pg_rewind into a separate class
* Factor out bootstrap into a separate class
* Factor out slots handler into a separate class
* Factor out postgresql config handler into a separate class
* Move callback_executor into postgresql package

This is just a careful refactoring, without code changes.
2019-05-21 16:02:47 +02:00
Alexander Kukushkin
f1f2389146 A couple of small improvements in acceptance tests (#1057)
* Keep basebackup and wal_archive next to PGDATA in the data directory
* Test bootstrap of standby cluster nodes with custom scripts
2019-05-13 16:33:19 +02:00
Alexander Kukushkin
e54dfa508d Consider sync node as healthy even when the former leader is ahead (#1059)
Fixes https://github.com/zalando/patroni/issues/1054
2019-05-13 16:32:53 +02:00
Alexander Kukushkin
4b48653d09 More standby cluster bugfixes (#1053)
1. Use the default port 5432 when only standby_cluster.host is defined.
2. When checking whether a standby_cluster replica can be bootstrapped without a connection to the standby_cluster leader, use the `create_replica_methods` defined in the `standby_cluster` config instead of the `postgresql` section.
3. Don't fall back to the create_replica_methods defined in the `postgresql` section when bootstrapping a member of the standby cluster.
4. Make sure we specify the database when connecting to the leader.
2019-05-13 14:19:22 +02:00
Alexander Kukushkin
c7db134a20 Make psycopg2 version parsing more robust (#1052)
non-released version can have unparsable suffixes, for example 2.8.3.dev0
2019-05-13 14:18:56 +02:00
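One way to tolerate such suffixes is to keep only the leading numeric components of the version string. The sketch below is an illustration, not the code from the commit; the function name is hypothetical.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    # Drop anything after the first space, then walk the dotted pieces.
    for piece in version.split(" ")[0].split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break                      # stop at the first non-numeric piece
        parts.append(int(match.group()))
    return tuple(parts)
```

The resulting tuples compare naturally, so a minimum-version check becomes `parse_version(installed) >= (2, 5, 4)`, where the threshold shown is only an example.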
Alexander Kukushkin
bba9066315 Make it possible to run pg_rewind without superuser on pg11+ (#1035)
* expose the current patroni version in DCS
* expose `checkpoint_after_promote` flag in DCS as an indicator that pg_rewind could be safely executed
* other nodes will wait until this flag is set instead of connecting as superuser and issuing the CHECKPOINT
* define `postgresql.authentication.rewind` with credentials for pg_rewind in patroni configuration files.
* create user for pg_rewind if postgres is 11+
* grant execute on functions required for pg_rewind to rewind user
2019-05-02 14:07:26 +02:00
vilajit
f6d29081c9 Enabling kerberos support (#1015)
* make it possible to create users without passwords
* put `krbsrvname` into the connection string if it is specified in the config
* update postgres?.yml example files to mention `krbsrvname`
2019-04-29 09:02:04 +02:00
Alexander Kukushkin
f0b784fe7f Manage pg_ident.conf with Patroni (#1037)
This functionality works similarly to the `pg_hba`:
If the `postgresql.pg_ident` is defined in the config file or DCS, Patroni will write its value to pg_ident.conf, however, if `postgresql.parameters.ident_file` is defined, Patroni will assume that pg_ident is managed from outside and not update the file.
2019-04-23 16:16:53 +02:00
Julien Riou
663026c34c Use SSLContext to wrap REST API socket (#1039)
Using `ssl.wrap_socket` is deprecated and was still allowing soon-to-be-deprecated protocols like TLS 1.1.
Now using `ssl.create_default_context()` to produce a secure SSL context to wrap the REST API server's socket.
2019-04-23 11:23:22 +02:00
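A minimal sketch of the context-based approach for a server socket, using only the standard `ssl` module. The helper name and its arguments are hypothetical; how Patroni wires the context into its REST API server is not shown in this log.

```python
import ssl


def make_server_context(certfile=None, keyfile=None):
    """Build an SSLContext with the library's default secure settings."""
    # Purpose.CLIENT_AUTH yields a context suited to server-side sockets.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    if certfile:
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context
```

A listening socket would then be wrapped with `context.wrap_socket(sock, server_side=True)` instead of the deprecated module-level `ssl.wrap_socket()`.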
Alexander Kukushkin
51b085a76d Don't wait until the previous callback finishes if kill failed (#1036)
Such a wait was happening in the main thread and blocking the HA loop.
After all, the executor thread was doing exactly the same.
2019-04-15 15:49:06 +02:00
Cameron Daniel
74893e7b84 Reload Consul config on SIGHUP (#1030)
It is especially useful when somebody is changing the value of `token`
2019-04-15 15:00:38 +02:00
Julien Riou
0e0b364302 Add read-only and read-write API routes (#1038)
Patroni is great behind a load balancer like HAProxy. API routes are handy to direct the traffic to a valid node depending on its role. We could have one route for writes directed to the primary node and one route for reads directed to the replicas. In a two-node setup, when we lose a node, there's one host left: the primary. Then the read traffic will be dropped, even if the primary can handle it. This commit adds read-only and read-write routes. The read-only route enables reads on the primary. The read-write route is an alias for the '/master', '/primary' or '/leader' route.
2019-04-15 14:49:54 +02:00
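The routing rules described above can be summarised as a small decision function. This is a sketch under assumptions: the exact route paths `/read-only` and `/read-write`, the role names, and the 200/503 status codes are illustrative and not taken from the Patroni source.

```python
# Routes that should succeed only on the node accepting writes.
WRITE_ROUTES = {"/master", "/primary", "/leader", "/read-write"}


def health_status(route, role):
    """Return the HTTP status a load balancer health check would see."""
    if route in WRITE_ROUTES:
        healthy = role == "master"
    elif route == "/replica":
        healthy = role == "replica"
    elif route == "/read-only":
        # Reads may be served by replicas and, as a fallback, by the primary.
        healthy = role in ("master", "replica")
    else:
        raise ValueError("unknown route: %r" % route)
    return 200 if healthy else 503
```

In the two-node scenario from the commit message, the surviving primary still answers 200 on the read-only route, so read traffic keeps flowing.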
Alexander Kukushkin
51b02e65a5 Don't set archive_mode to off during the custom bootstrap (#1025)
As @ants complained on Slack, it breaks the workflow when the same location is used for bootstrap and further archiving.
For the pg_upgrade we can achieve the same results by modifying postgres config in the upgrade script, which is part of spilo.
2019-04-15 14:31:02 +02:00
Alexander Kukushkin
7c0c9599fc Remove psycopg2 from requirements (#1023)
The recently released psycopg2 was split into two different packages, psycopg2 and psycopg2-binary, which can be installed at the same time into the same place on the filesystem. To reduce the dependency hell problem, we let the user choose how to install psycopg2. There are a few options available, and they are reflected in the documentation.

This PR also changes the following behavior:
* `pip install patroni` will fail if psycopg2 is not installed
* Patroni will check psycopg2 upon start and fail if it can't be found or is outdated.

Closes https://github.com/zalando/patroni/issues/1021
2019-04-15 14:30:16 +02:00
joe-tss
cf0d712450 Grant deletecollection to patroni serviceaccount (#1033)
Closes #1032
2019-04-11 15:17:06 +02:00
Sharoon Thomas
d418a9449e scheduled_at needs a value for manual_failover (#1024)
The variable scheduled_at may be undefined for a manual failover when an exception is raised while trying to switch over via the REST API.
2019-04-08 12:35:15 +02:00
Alexander Kukushkin
6909ce0c7a Release 1.5.6 (#1020)
* Update release notes
* Bump version
v1.5.6
2019-04-03 14:44:31 +02:00
Alexander Kukushkin
cd9d9ca0c3 Make sure we don't enforce ssl_version (#1010)
Fixes https://github.com/zalando/patroni/issues/1009
2019-04-02 16:49:32 +02:00
Alexander Kukushkin
a0a2da238e Couple of minor improvements (#1019)
1. Fix race condition on shutdown. It is very annoying when you cancel behave tests but postgres remains running.
2. Dump pg_controldata output to logs when "recovering" stopped postgres. It will help to investigate some annoying issues.
2019-04-02 16:49:21 +02:00
Pavlo Golub
b53a29c022 Fix unit-tests for Windows (#1014)
Closes #1013
2019-04-02 13:58:17 +02:00
Alexander Kukushkin
e38fe78b56 Fix callbacks behavior (mostly for standby cluster) (#998)
First of all, this patch changes the behavior of `on_start`/`on_restart` callbacks: they will be called only when postgres is started or restarted without role changes. If the member is promoted or demoted, only the `on_role_change` callback will be executed.
Before that, `on_role_change` was never called for the standby leader, only `on_start`/`on_restart`, and with a wrong role argument.

In addition to that, the REST API will return standby_leader role for the leader of the standby cluster.

Closes https://github.com/zalando/patroni/issues/988
2019-03-29 10:28:07 +01:00
Lukas Vogel
e059e30560 Ignore VSCode files (#1007)
Fixes #1008
2019-03-22 16:26:37 -04:00