`ssl.wrap_socket` is deprecated and still allowed soon-to-be-deprecated protocols like TLS 1.1.
Now we use `ssl.create_default_context()` to produce a secure SSL context for wrapping the REST API server's socket.
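A minimal sketch of the new approach (certificate paths and the port are illustrative):

```python
import socket
import ssl

# unlike the old ssl.wrap_socket() helper, a default context comes with
# secure settings: SSLv2/SSLv3 disabled, sane cipher selection, etc.
ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
ctx.load_cert_chain(certfile='/etc/patroni/server.crt',
                    keyfile='/etc/patroni/server.key')

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('', 8008))
sock.listen(5)

# wrap the listening socket; accepted connections are TLS-protected
secure_sock = ctx.wrap_socket(sock, server_side=True)
```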
Patroni works great behind a load balancer like HAProxy. API routes are handy for directing traffic to a valid node depending on its role: one route for writes directed to the primary node and one route for reads directed to the replicas. In a two-node setup, when we lose a node, there's only one host left: the primary. Read traffic would then be dropped, even though the primary can handle it. This commit adds read-only and read-write routes. The read-only route enables reads on the primary; the read-write route is an alias for the '/master', '/primary', or '/leader' route.
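For illustration, a hedged `haproxy.cfg` fragment that health-checks the new routes; hostnames, ports, and tuning values are placeholders:

```
listen postgres_write
    bind *:5000
    option httpchk GET /read-write
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008

listen postgres_read
    bind *:5001
    option httpchk GET /read-only
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 maxconn 100 check port 8008
    server node2 node2:5432 maxconn 100 check port 8008
```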
As @ants complained on Slack, it breaks the workflow when the same location is used both for bootstrap and for further archiving.
For pg_upgrade we can achieve the same result by modifying the postgres config in the upgrade script, which is part of Spilo.
The recently released psycopg2 was split into two different packages, psycopg2 and psycopg2-binary, which could be installed at the same time into the same place on the filesystem. To reduce the dependency hell, we let the user choose how to install psycopg2. The available options are reflected in the documentation.
This PR also changes the following behavior:
* `pip install patroni` will fail if psycopg2 is not installed
* Patroni will check psycopg2 upon start and fail if it can't be found or is outdated (a sketch of such a check is shown below)
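A hedged sketch of what such a startup check could look like; the minimum version shown is an assumption, not necessarily the one Patroni enforces:

```python
import sys

MIN_PSYCOPG2 = (2, 5, 4)  # assumed minimum version, for illustration only

def check_psycopg2():
    try:
        import psycopg2
    except ImportError:
        sys.exit('FATAL: psycopg2 module is not installed')
    # psycopg2.__version__ looks like '2.7.7 (dt dec pq3 ext lo64)'
    version = tuple(int(x) for x in psycopg2.__version__.split(' ')[0].split('.'))
    if version < MIN_PSYCOPG2:
        sys.exit('FATAL: psycopg2 is outdated, at least %s is required'
                 % '.'.join(map(str, MIN_PSYCOPG2)))
```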
Closes https://github.com/zalando/patroni/issues/1021
1. Fix a race condition on shutdown. It is very annoying when you cancel behave tests but postgres remains running.
2. Dump `pg_controldata` output to the logs when "recovering" a stopped postgres, as sketched below. It will help to investigate some annoying issues.
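A minimal sketch of such a dump, assuming the data directory path is known:

```python
import logging
import subprocess

logger = logging.getLogger(__name__)

def log_pg_controldata(data_dir):
    # run pg_controldata against the stopped cluster and mirror its
    # output into our own log for later investigation
    try:
        output = subprocess.check_output(['pg_controldata', data_dir],
                                         universal_newlines=True)
        logger.info('pg_controldata:\n%s', output)
    except (OSError, subprocess.CalledProcessError) as e:
        logger.warning('Failed to run pg_controldata: %r', e)
```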
First of all, this patch changes the behavior of the `on_start`/`on_restart` callbacks: they will be called only when postgres is started or restarted without a role change. When the member is promoted or demoted, only the `on_role_change` callback will be executed.
Before that, `on_role_change` was never called for a standby leader; only `on_start`/`on_restart` were, and with a wrong role argument.
In addition to that, the REST API will return the `standby_leader` role for the leader of a standby cluster.
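For reference, the affected callbacks are configured under the `postgresql` section; the script path below is illustrative:

```yaml
postgresql:
  callbacks:
    on_start: /usr/local/bin/patroni_callback.sh
    on_restart: /usr/local/bin/patroni_callback.sh
    on_role_change: /usr/local/bin/patroni_callback.sh
```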
Closes https://github.com/zalando/patroni/issues/988
`dcs.cluster` and `dcs.get_cluster()` use the same lock, so when a `get_cluster` call is slow due to DCS slowness, it also blocks the `dcs.cluster` call, which in turn makes health-check requests slow.
In addition to that, transfer the postmaster pid to the Patroni process with the help of `multiprocessing.Pipe` instead of using stdin/stdout pipes.
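A minimal sketch of the pid handover via `multiprocessing.Pipe`; the child below only stands in for the code that actually starts postgres:

```python
import multiprocessing

def start_postmaster(conn):
    # the real code would start postgres here and report its pid;
    # we send a placeholder value back through the pipe instead
    conn.send(12345)
    conn.close()

if __name__ == '__main__':
    reader, writer = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=start_postmaster, args=(writer,))
    proc.start()
    postmaster_pid = reader.recv()  # arrives over the pipe, not via stdout
    proc.join()
    print('postmaster pid:', postmaster_pid)
```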
Closes https://github.com/zalando/patroni/issues/992
`os.path.relpath` depends on being able to resolve the current working directory.
This fails if Patroni was started in a directory that is later unlinked from the filesystem, raising an unnecessary exception when loading from DCS.
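The failure mode is easy to reproduce:

```python
import os
import tempfile

d = tempfile.mkdtemp()
os.chdir(d)
os.rmdir(d)  # the current working directory no longer exists

# with no explicit start, relpath() calls os.getcwd(), which now raises
os.path.relpath('/etc')  # FileNotFoundError on Python 3
```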
If `etcd.use_proxies` is set to true, Patroni will stick to the list of hosts specified in `etcd.hosts` and avoid doing topology discovery. This mode might be useful when you know you connect to the etcd cluster via a set of proxies, or when the etcd cluster has a static topology.
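Example configuration (hostnames are placeholders):

```yaml
etcd:
  use_proxies: true
  hosts:
    - etcd-proxy1:2379
    - etcd-proxy2:2379
```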
1. Multi-stage build with an extensive cleanup of useless files and optional image compression
2. Start three-node etcd cluster
3. Start three-node Patroni cluster
4. One container with haproxy
5. All container names are prefixed with "demo-" and don't have suffixes
6. Decommission the dev_patroni_cluster.sh script; docker-compose is now the de facto standard.
7. Provide more examples in the docker/README.md
It might happen that the standby cluster is configured to be created from, and to replay WAL files from, a source different from the one used when it is not running in standby mode. This is necessary to avoid writing WAL files and backups into the old location after promotion.
The easiest way to achieve this behavior is passing a `RemoteMember` object to the `Postgresql.clone` method instead of the usual `Member` object.
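A hedged sketch, assuming `RemoteMember` is built from a member name and a data dict (constructor details may differ; `postgresql` stands for Patroni's `Postgresql` instance):

```python
from patroni.dcs import RemoteMember

source = RemoteMember('clone_source', {
    'conn_url': 'postgres://replicator@archive-host:5432/postgres',
    # replay WAL from the bootstrap location, not the running cluster's one
    'restore_command': 'cp /wal_archive/%f %p',
})
postgresql.clone(source)  # instead of the usual Member
```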
If there is no service defined, k8s assumes that the endpoint is orphaned and removes it.
Patroni tries to create the service (only when `use_endpoints` is enabled) in the following cases:
1. Upon start
2. When it tries to (re-)create the config endpoint
If for some reason creation of the service failed, Patroni will retry it on every cycle of the HA loop. It usually fails due to a lack of permissions; if you don't want to grant such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest, as in the example below.
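A hedged example of such a manifest; note the absence of a selector, so k8s leaves the Patroni-managed endpoint alone (names and labels are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: patroni-demo
  labels:
    application: patroni
    cluster-name: patroni-demo
spec:
  # no selector: Patroni manages the endpoint subsets itself
  ports:
  - port: 5432
    targetPort: 5432
```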
According to the Consul documentation, the actual response timeout is increased by a small random amount of additional wait time added to the supplied maximum wait time, to spread out the wake-up times of any concurrent requests. It adds up to wait/16 of additional time to the maximum duration.
In our case we will add wait/15 or 1 second, whichever is bigger.
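In code, the effective client-side timeout could be derived like this (a sketch, with `wait` in seconds):

```python
def effective_timeout(wait):
    # Consul adds up to wait/16 of jitter on top of the supplied maximum
    # wait time, so pad our timeout by wait/15 (but at least 1 second)
    return wait + max(wait / 15.0, 1)

assert effective_timeout(30) == 32.0  # 30s wait -> 2s of padding
```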
Fixes: https://github.com/zalando/patroni/issues/945
This information will help to detect stale replicas.
In addition to that, Host will include ':{port}' if the port value isn't the default or if more than one member is running on the same host.
Fixes: https://github.com/zalando/patroni/issues/942
If pg_rewind is disabled or can't be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is to wipe the data directory and reinitialize.
So far Patroni was able to remove the data directory only after a failed attempt to run pg_rewind. This commit fixes that.
If `postgresql.remove_data_directory_on_diverged_timelines` is set, Patroni will wipe the data directory and reinitialize the former master automatically.
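The corresponding setting:

```yaml
postgresql:
  remove_data_directory_on_diverged_timelines: true
```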
Fixes: https://github.com/zalando/patroni/issues/941
The latest timeline is calculated from the `/history` key in DCS. If there is no such key, or it contains garbage, we consider the node healthy.
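A hedged sketch of that calculation, assuming `/history` stores a JSON array of `[timeline, lsn, reason, ...]` rows:

```python
import json

def latest_timeline(history_value):
    # return the latest timeline, or None when the key is missing or
    # contains garbage -- in which case the node is considered healthy
    try:
        history = json.loads(history_value)
        return int(history[-1][0]) + 1  # last switchover row + 1
    except (TypeError, ValueError, IndexError):
        return None
```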
Closes https://github.com/zalando/patroni/issues/890
libpq allows opening a connection without explicitly specifying either a username or a password. Depending on the situation, it relies either on the `pgpass` file or on the `trust` authentication method in `pg_hba.conf`.
Since pg_rewind also uses libpq, it can work the same way.
Fixes https://github.com/zalando/patroni/issues/928
It allows changing logging settings at runtime by updating the config and doing a reload, or by sending `SIGHUP` to the Patroni process.
Important! Environment variable names related to logging were renamed and the documentation updated accordingly. For compatibility reasons Patroni still accepts `PATRONI_LOGLEVEL` and `PATRONI_FORMAT`, but some other logging-related variables that were introduced only recently (between releases) will stop working. This should be fine, since the new version hasn't been released yet and it is very unlikely that anybody besides the authors of the corresponding PRs is using them.
Example of log section in the config file:
```yaml
log:
  dir: /where/to/write/patroni/logs  # if not specified, write logs to stderr
  file_size: 50000000  # 50MB
  file_num: 10  # keep a history of 10 files
  dateformat: '%Y-%m-%d %H:%M:%S'
  loggers:  # increase log verbosity for etcd.client and urllib3
    etcd.client: DEBUG
    urllib3: DEBUG
```
1. Log only debug-level messages on any kind of error
2. Update the regexp for matching postgres aux processes to make it compatible with postgres 11
Fixes https://github.com/zalando/patroni/issues/914
We want to avoid archiving WAL and history files until the cluster is fully functional. It should really help when a custom bootstrap involves pg_upgrade.