Commit Graph

1627 Commits

Author SHA1 Message Date
Alexander Kukushkin
a0a2da238e Couple of minor improvements (#1019)
1. Fix race condition on shutdown. It is very annoying when you cancel behave tests but postgres remains running.
2. Dump pg_controldata output to logs when "recovering" stopped postgres. It will help to investigate some annoying issues.
2019-04-02 16:49:21 +02:00
Pavlo Golub
b53a29c022 Fix unit-tests for Windows (#1014)
Closes #1013
2019-04-02 13:58:17 +02:00
Alexander Kukushkin
e38fe78b56 Fix callbacks behavior (mostly for standby cluster) (#998)
First of all, this patch changes the behavior of the `on_start`/`on_restart` callbacks: they will be called only when postgres is started or restarted without a role change. If the member is promoted or demoted, only the `on_role_change` callback will be executed.
Before that, `on_role_change` was never called for the standby leader, only `on_start`/`on_restart`, and with a wrong role argument.

In addition to that, the REST API will return standby_leader role for the leader of the standby cluster.

Closes https://github.com/zalando/patroni/issues/988
2019-03-29 10:28:07 +01:00
Lukas Vogel
e059e30560 Ignore VSCode files (#1007)
Fixes #1008
2019-03-22 16:26:37 -04:00
Alexander Kukushkin
680444ae13 Reduce lock time taken by dcs.get_cluster() (#989)
`dcs.cluster` and `dcs.get_cluster()` use the same lock. When a `get_cluster()` call was slow because the DCS was slow, it also blocked the `dcs.cluster` call, which in turn made health-check requests slow.
2019-03-12 22:37:11 +01:00
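The locking change described in commit 680444ae13 can be illustrated with a minimal sketch. This is not Patroni's actual code: `DcsSketch` and its `loader` argument are hypothetical names, and the sketch assumes the fix amounts to doing the slow DCS request outside the lock and holding the lock only while the cached value is swapped.

```python
import threading


class DcsSketch:
    """Hypothetical stand-in for a DCS client; not Patroni's actual class."""

    def __init__(self, loader):
        self._loader = loader          # slow callable that talks to the DCS
        self._lock = threading.Lock()  # guards only the cached value
        self._cluster = None

    @property
    def cluster(self):
        # Cheap read: the lock is held just long enough to copy a reference.
        with self._lock:
            return self._cluster

    def get_cluster(self):
        # The slow network call happens outside the lock...
        fresh = self._loader()
        # ...and the lock is taken only to swap in the result.
        with self._lock:
            self._cluster = fresh
        return fresh
```

With this layout a health check that reads `cluster` returns the previously cached view immediately, even while another thread is still waiting for the DCS inside `get_cluster()`.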
Alexander Kukushkin
92720882aa Reset is_leader flag for every removal of leader key (#990)
This is the next improvement of #777
2019-03-12 22:10:46 +01:00
Alexander Kukushkin
13c88e8b7a Replace self-execute with multiprocessing.Process (#994)
In addition to that transfer postmaster pid to Patroni process with the help of multiprocessing.Pipe instead of using stdin-stdout pipes.

Closes https://github.com/zalando/patroni/issues/992
2019-03-12 10:40:37 +01:00
Alexander Kukushkin
4a4258fc3f Mock external resources (#995)
Unit tests should not accidentally hit a running Postgres, DCS, or the filesystem unless we explicitly want them to.
2019-03-12 10:39:42 +01:00
Ants Aasma
2204454094 Remove unnecessary usage of relpath (#1002)
`os.path.relpath` depends on being able to resolve the working directory.
This will fail if Patroni is started in a directory that is later unlinked from the filesystem, creating an unnecessary exception when loading from DCS.
2019-03-11 12:31:06 +01:00
Alexander Kukushkin
9e19b43869 Rename cluster name to demo (#1000)
* Assign hostname to haproxy container
* Tune vim config
2019-03-11 10:57:26 +01:00
Julien Tachoires
13f7aede61 Wait for callback end if it could not be killed (#985)
This could happen if the script is running under sudo.
2019-03-11 10:35:36 +01:00
Julien Tachoires
8a7ba57457 Clean target directory when pg_wal|pg_xlog is a symlink (#997)
2019-03-11 10:33:46 +01:00
Alexander Kukushkin
c64d51f79c Better support for static etcd cluster (#986)
If `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in `etcd.hosts` and avoid doing topology discovery. This mode might be useful when you know that you connect to the etcd cluster via a set of proxies or when the etcd cluster has a static topology.
2019-03-07 11:36:36 +01:00
Alexander Kukushkin
f0990532dc Update docker-compose demo cluster (#980)
1. Multi-stage build with an extensive cleanup of useless files and optional image compression
2. Start three-node etcd cluster
3. Start three-node Patroni cluster
4. One container with haproxy
5. All container names are prefixed with "demo-" and don't have suffixes
6. Decommission the dev_patroni_cluster.sh script; docker-compose is now the de facto standard.
7. Provide more examples in the docker/README.md
2019-03-07 11:32:03 +01:00
Alexander Kukushkin
e670122c80 Use create_replica_methods from standby_cluster for replica bootstrap (#981)
It might happen that the standby cluster is configured to be created from, and to replay WAL files from, a different source than the one it uses when it is not running in standby mode. This is necessary to avoid writing WAL files and backups into the old place after promotion.

The easiest way to achieve such behavior is passing RemoteMember object to `Postgresql.clone` method instead of the usual Member object.
2019-02-21 11:37:50 +01:00
Alexander Kukushkin
c6e70a9910 Release 1.5.5 (#979)
* Bump version
* Update release notes
v1.5.5
2019-02-15 16:14:39 +01:00
Alexander Kukushkin
0ec1760397 Don't write primary_conninfo into recovery.conf for wal only standby cluster (#971)
It is useless and makes postgres generate a lot of errors.
2019-02-15 13:35:34 +01:00
Alexander Kukushkin
317721991b Fix handling of PATRONI_*_PASSWORD environment variables (#970)
Bug was introduced in https://github.com/zalando/patroni/pull/947 and https://github.com/zalando/patroni/pull/500
2019-02-15 13:35:22 +01:00
Alexander Kukushkin
0c516de147 Create headless service associated with $SCOPE-config endpoint (#958)
If there is no service defined, k8s assumes that the endpoint is orphaned and removes it.
Patroni tries to create the service only if use_endpoints is enabled, in the following cases:
1. Upon start
2. When it tries to (re-)create the config endpoint

If for some reason creation of the service has failed, Patroni will retry it on every cycle of HA loop. Usually it fails due to lack of permissions and if you don't want to give such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest.
2019-02-15 13:35:04 +01:00
Michael Banck
073074f83e Run coverage as python -m coverage (#968)
Depending on the platform the coverage binary might not always be available under the standard name.
2019-02-13 16:02:12 +01:00
Michael Banck
345e6d3131 Copy away output directories of failed acceptance tests. (#967)
And dump logs on Travis only from failed features.
2019-02-13 16:00:15 +01:00
Michael Banck
d01a9bdcd5 Change base port for acceptance tests from 5440 to 5360 (#966)
2019-02-13 15:59:13 +01:00
Alexander Kukushkin
10bbf0c3c5 Always use replication=1, otherwise it is considered a logical walsender (#952)
Fixes: https://github.com/zalando/patroni/issues/894
Fixes: https://github.com/zalando/patroni/issues/951
2019-01-30 12:38:51 +01:00
Alexander Kukushkin
4304560ce2 Adjust read timeout for leader watch blocking query (#950)
According to the Consul documentation, the actual response timeout is increased by a small random amount of additional wait time, added to the supplied maximum wait time, to spread out the wake-up time of any concurrent requests. This adds up to wait/16 of additional time to the maximum duration.
In our case we will add wait/15 or 1 second, whichever is bigger.

Fixes: https://github.com/zalando/patroni/issues/945
2019-01-30 12:38:24 +01:00
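The arithmetic from commit 4304560ce2 can be written out as a small sketch. The function name `read_timeout` is hypothetical and not taken from Patroni; only the "wait/15 or 1 second, whichever is bigger" rule comes from the commit message.

```python
def read_timeout(wait_seconds):
    """Client-side read timeout for a blocking query with the given wait."""
    # Consul may add up to wait/16 of jitter on top of the requested wait.
    # Adding wait/15 (or at least 1 second) leaves a small margin above that.
    return wait_seconds + max(wait_seconds / 15.0, 1.0)
```

For a 30 second wait this gives 32 seconds, which stays above Consul's worst case of 30 + 30/16 = 31.875 seconds. For short waits the 1 second floor dominates.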
Alexander Kukushkin
254ee2acfc Show information about timelines in patronictl list (#949)
This information will help to detect stale replicas.

In addition to that, Host will include ':{port}' if the port value isn't the default or if more than one member is running on the same host.

Fixes: https://github.com/zalando/patroni/issues/942
2019-01-30 12:37:51 +01:00
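The host formatting rule from commit 254ee2acfc can be sketched roughly as follows. This is an illustration rather than the code used by `patronictl`: `format_hosts` and `DEFAULT_PORT` are hypothetical names, and 5432 is assumed to be the default port.

```python
from collections import Counter

DEFAULT_PORT = 5432  # assumption: PostgreSQL's usual default port


def format_hosts(members):
    """Take a list of (host, port) tuples and return display strings."""
    # Count how many members share each host name or address.
    per_host = Counter(host for host, _ in members)
    result = []
    for host, port in members:
        # Show the port when it is non-default or the host is shared.
        if port != DEFAULT_PORT or per_host[host] > 1:
            result.append('{0}:{1}'.format(host, port))
        else:
            result.append(host)
    return result
```

Members that are alone on their host and use the default port keep the short form, so the common single-instance-per-host layout looks the same as before.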
Alexander Kukushkin
739329b590 Make it possible to automatically reinit the former master (#948)
If pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is wiping the data directory and reinitializing.

So far, Patroni was able to remove the data directory only after a failed attempt to run pg_rewind. This commit fixes that.
If `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically.

Fixes: https://github.com/zalando/patroni/issues/941
2019-01-30 12:37:21 +01:00
Étienne M
bd2c54581a Add ETCD_(PROTOCOL|USERNAME|PASSWORD) env variables (#947)
Fix #944
2019-01-30 12:36:50 +01:00
Maxim Ivanov
f0b12b7e2e Document create_replicas_methods in standby_cluster section (#939)
Fixes https://github.com/zalando/patroni/issues/935
2019-01-30 12:36:24 +01:00
Étienne M
93d157dea3 Document how to start Patroni with an existing data directory (#918)
2019-01-30 12:35:57 +01:00
Alexander Kukushkin
2c128520cf Python34 compatibility (#933)
and some other minor fixes.

Closes https://github.com/zalando/patroni/issues/932
2019-01-16 14:40:05 +01:00
Alexander Kukushkin
381a5b80d2 Release 1.5.4 (#931)
* Bump version
* Update release notes
* Make it possible to configure registration of Service in Consul via env variables
v1.5.4
2019-01-15 12:14:19 +01:00
Alexander Kukushkin
71dae6a905 Optionally consider node not healthy if it is not on the latest timeline (#892)
The latest timeline is calculated from the `/history` key in DCS. In case there is no such key or it contains some garbage we consider the node healthy.
Closes https://github.com/zalando/patroni/issues/890
2019-01-15 11:16:30 +01:00
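The rule from commit 71dae6a905 (a missing or garbled `/history` key means the node is treated as healthy) can be sketched as below. The function names are hypothetical. The layout of the `/history` value, a JSON list of `[timeline, lsn, reason, ...]` entries with the current timeline being one past the last recorded switch, is an assumption made for this illustration and should be checked against the Patroni source.

```python
import json


def latest_timeline(history_value):
    """Return the latest timeline derived from the raw /history value, or None."""
    try:
        lines = json.loads(history_value)
        # Assumed layout: [[timeline, lsn, reason, ...], ...]. The cluster is
        # on the timeline that follows the last recorded switch.
        return int(lines[-1][0]) + 1
    except (TypeError, ValueError, IndexError, KeyError):
        # Missing key, invalid JSON, or unexpected structure.
        return None


def is_on_latest_timeline(node_timeline, history_value):
    latest = latest_timeline(history_value)
    if latest is None:
        # No usable history: consider the node healthy.
        return True
    return node_timeline >= latest
```

Returning `True` when the history cannot be parsed keeps the check from blocking a failover just because the key is absent or damaged.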
Alexander Kukushkin
cf34fb3934 Relax requirements on superuser credentials (#930)
libpq allows opening a connection without explicitly specifying either a username or a password. Depending on the situation, it relies either on the `pgpass` file or on the trust authentication method in pg_hba.conf.

Since pg_rewind is also using libpq, it could work the same way.

Fixes https://github.com/zalando/patroni/issues/928
2019-01-15 11:15:35 +01:00
Alexander Kukushkin
e080ded44b Make logging configurable via YAML file (#927)
It allows changing logging settings at runtime by updating the config and doing a reload or sending `SIGHUP` to the Patroni process.
Important! Environment configuration names related to logging were renamed and the documentation was updated accordingly. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other logging-related variables, which were introduced only recently (between releases), will stop working. I think it is ok, since we didn't release the new version yet and therefore it is very unlikely that somebody is using them except the authors of the corresponding PRs.

Example of log section in the config file:
```yaml
log:
  dir: /where/to/write/patroni/logs  # if not specified, write logs to stderr
  file_size: 50000000  # 50MB
  file_num: 10  # keep history of 10 files
  dateformat: '%Y-%m-%d %H:%M:%S'
  loggers:  # increase log verbosity for etcd.client and urllib3
    etcd.client: DEBUG
    urllib3: DEBUG
```
2019-01-15 08:42:13 +01:00
jouir
dec3656f6e Redirect HTTPServer exceptions to logger (#900) (#925)
By default they were written to stdout
2019-01-15 08:37:06 +01:00
Alexander Kukushkin
1e2d89fa58 Apply 5 second backoff when loading global config upon start (#922)
It doesn't make much sense to hammer DCS when we are just starting up.
Fixes https://github.com/zalando/patroni/issues/919
2019-01-14 14:55:56 +01:00
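A fixed backoff of the kind mentioned in commit 1e2d89fa58 could look like the sketch below. It is only an illustration: `load_with_backoff` and its parameters are hypothetical, and catching every `Exception` is a simplification that real code would likely narrow to DCS-specific errors.

```python
import time


def load_with_backoff(load, delay=5, sleep=time.sleep):
    """Call load() until it succeeds, pausing `delay` seconds after each failure."""
    while True:
        try:
            return load()
        except Exception:
            # Wait before retrying instead of hammering the DCS in a tight loop.
            sleep(delay)
```

Passing `sleep` as a parameter is a design choice made here so that the pause can be replaced in tests.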
Alexander Kukushkin
994863c18d Refactor wait_for_user_backends_to_close method (#917)
1. Log only debug level messages on any kind of error
2. Update regexp for matching postgres aux processes to make it compatible with postgres 11

Fixes https://github.com/zalando/patroni/issues/914
2019-01-14 14:55:45 +01:00
Alexander Kukushkin
3fce982909 Set archive_mode to off during the custom bootstrap (#911)
We want to avoid archiving WALs and history files until the cluster is fully functional. It should really help if the custom bootstrap involves pg_upgrade.
2019-01-14 14:55:23 +01:00
Lucas Capistrant
d306092cbc Explicitly secure rw perms for recovery.conf at creation time (#910)
We don't want anybody except patroni/postgres user reading this file, it contains replication user and password.
2019-01-14 14:22:04 +01:00
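Creating a file with owner-only permissions from the start, as commit d306092cbc describes for recovery.conf, can be sketched like this on a POSIX system. `write_private_file` is a hypothetical helper, not the function Patroni uses.

```python
import os
import stat


def write_private_file(path, content):
    """Create (or truncate) `path` so that only the owner can read and write it."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode passed to os.open applies at creation time, so there is no
    # window in which the file exists with wider permissions.
    fd = os.open(path, flags, stat.S_IRUSR | stat.S_IWUSR)
    with os.fdopen(fd, 'w') as f:
        f.write(content)
```

Note that the mode only takes effect when the file is newly created. If `path` already exists with wider permissions, a separate `os.chmod` call would be needed.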
Dmitry Dolgov
11f7ceb521 Do not check types of standby_cluster configuration (#924)
Simply allow valid keys
2019-01-14 14:16:15 +01:00
bradnicholson
05a13839aa Update replica_bootstrap.rst (#915)
Add some docs about replication slots for standby clusters
2019-01-14 12:57:21 +01:00
Étienne M
04ac199fc8 Single quotes are mandatory around each host in PATRONI_ETCD_HOSTS (#926)
Otherwise YAML parser fails
2019-01-14 11:56:15 +01:00
anikin-aa
0be8a9527b possibility to set logdatefmt via env. (#904)
The value could be configured via `PATRONI_LOGDATEFMT` environment variable. The default value is `%Y-%m-%d %H:%M:%S`
2019-01-14 08:46:38 +01:00
Pavel Kirillov
929ff08bfd Service deregister timeout must be in Go time format (#893)
2018-12-21 15:42:15 +01:00
Cody Coons
7bc8d0aac9 Removed stderr pipe to stdout on pg_ctl process (#896)
Inheriting stderr from the main Patroni process allows all Postgres logs to be seen along with all patroni logs. This is very useful in a container environment as Patroni and Postgres logs may be consumed using standard tools (docker logs, kubectl, etc).

In addition to that, this change fixes a bug with Patroni not being able to catch the postmaster pid when postgres was writing some warnings into stderr.
2018-12-21 15:41:34 +01:00
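The effect of commit 7bc8d0aac9 can be illustrated with a small sketch. It assumes the pid is read from the child's stdout, while stderr is left unset so that the child inherits the parent's stderr. `read_pid_from_child` is a hypothetical helper, not Patroni's code.

```python
import subprocess
import sys


def read_pid_from_child(cmd):
    """Run cmd, parse an integer from its stdout, and let stderr pass through."""
    # stderr is not redirected, so the child inherits this process's stderr and
    # warnings written there cannot get mixed into the value being parsed.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    return int(out.decode().strip())


if __name__ == '__main__':
    child = [sys.executable, '-c',
             'import sys; sys.stderr.write("WARNING: noisy\\n"); print(4242)']
    print(read_pid_from_child(child))
```

Had stderr been redirected with `stderr=subprocess.STDOUT`, the warning line would arrive on the same pipe and the integer parse would fail.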
Lucas Capistrant
f3da6de129 Add ability to configure app logs to be written to a file (#903)
It gives users the option to send Patroni application logs to a File instead of Standard Out. There are three environment variables that can be set to enable and configure file logging.
1. `PATRONI_FILE_LOG_DIR`: Path to a directory that is writeable by the executing user. Having this variable set is what activates file logging.
2. `PATRONI_FILE_LOG_NUM`: This is a rolling file logger. This variable dictates how many log files are retained.
3. `PATRONI_FILE_LOG_SIZE`: This variable dictates the size at which the logs will roll.

If `PATRONI_FILE_LOG_DIR` is not set, then Patroni will log to stderr (the default behavior does not change).

Closes https://github.com/zalando/patroni/issues/902
2018-12-21 15:38:29 +01:00
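The three variables from commit f3da6de129 map naturally onto Python's `RotatingFileHandler`. The sketch below is an illustration only: `build_log_handler`, the `patroni.log` file name, and the fallback values for size and count are assumptions. The variable names are those from this commit, and commit e080ded44b later renamed the logging-related environment variables.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def build_log_handler(environ=os.environ):
    """Return a rotating file handler if a log directory is set, else stderr."""
    log_dir = environ.get('PATRONI_FILE_LOG_DIR')
    if not log_dir:
        # No directory configured: keep the default of logging to stderr.
        return logging.StreamHandler()
    # The fallback values below are illustrative assumptions.
    max_bytes = int(environ.get('PATRONI_FILE_LOG_SIZE', 25000000))
    backups = int(environ.get('PATRONI_FILE_LOG_NUM', 4))
    # The file name 'patroni.log' is also an assumption.
    return RotatingFileHandler(os.path.join(log_dir, 'patroni.log'),
                               maxBytes=max_bytes, backupCount=backups)
```

The handler opens the log file when it is constructed, so the directory has to exist and be writable by the executing user.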
Alexander Kukushkin
491f230711 Release 1.5.3 (#889)
* Bump version
* Update release notes
v1.5.3
2018-12-03 17:12:53 +01:00
Alexander Kukushkin
f1d7ccf36e Make sure we refresh session at least once per HA loop (#880)
Fixes https://github.com/zalando/patroni/issues/879
2018-12-03 16:35:14 +01:00
Kostiantyn Nemchenko
96ea01bee4 Fix kubernetes demo files (#885)
- Update postgres docker image to the latest 11 version.

- Remove empty lines inside the `RUN` command to make the Dockerfile compatible with future docker versions.

- Set the `PATRONI_KUBERNETES_POD_IP` environment variable, which is required when _use_endpoints_ is enabled. Otherwise, the `KeyError` is raised [here](https://github.com/zalando/patroni/blob/master/patroni/dcs/kubernetes.py#L95).

- Set `EDITOR` environment variable to make configuration changes via `patronictl edit-config`.
2018-12-03 15:46:25 +01:00
Alexander Kukushkin
e684ca66e5 Compatibility with postgres 9.3 (#882)
Use replication=1 instead of replication='database' when opening a replication connection; 9.3 doesn't understand the latter.
2018-11-30 15:46:05 +01:00