Previously, Patroni would die after receiving any exception from etcd
other than RetryFailedError or etcd.EtcdException.
We have observed etcd raising an AttributeError on some
occasions. With this change, we demote ourselves on such exceptions
instead of terminating.
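A minimal sketch of the new behaviour, assuming a simplified HA loop. RetryFailedError and EtcdException are local stand-ins, and Node, run_cycle and touch_dcs are hypothetical names, not Patroni's own code:

```python
import logging

logger = logging.getLogger(__name__)


class RetryFailedError(Exception):
    """Local stand-in for Patroni's RetryFailedError (assumption for this sketch)."""


class EtcdException(Exception):
    """Local stand-in for etcd.EtcdException (assumption for this sketch)."""


class Node:
    """Hypothetical node that only remembers its current role."""

    def __init__(self):
        self.role = "master"

    def demote(self):
        self.role = "replica"


def run_cycle(node, touch_dcs):
    """Run one hypothetical HA iteration; demote instead of dying on DCS errors."""
    try:
        touch_dcs()
        return "ok"
    except (RetryFailedError, EtcdException):
        # The exceptions that were already expected before this change.
        logger.error("DCS is not accessible")
    except Exception:
        # Anything else, e.g. an AttributeError from inside the etcd client,
        # used to terminate the process; now it is logged and we demote.
        logger.exception("unexpected error while accessing the DCS")
    node.demote()
    return "demoted"
```

Catching the broad Exception last keeps the expected DCS errors on their quieter logging path while still preventing an unexpected error from killing the process.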
Some refactoring to reuse common code paths.
A dsn option has been added; it is useful in scripts, like so:
psql -d "$(patronictl dsn alpha)"
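A sketch of how such a DSN could be assembled, assuming the leader is the member whose role is "master" and that members expose host and port fields. build_dsn, leader_dsn and the member layout are hypothetical; the quoting follows libpq's keyword/value connection-string rules:

```python
def _quote(value):
    """Quote a libpq keyword/value parameter only when it needs it."""
    if value and not any(c in value for c in " '\\"):
        return value
    # libpq requires single quotes and backslashes inside quoted values
    # to be escaped with a backslash.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(host, port, dbname="postgres", user=None):
    """Build a libpq keyword/value connection string (hypothetical helper)."""
    params = {"host": host, "port": str(port), "dbname": dbname}
    if user:
        params["user"] = user
    return " ".join(f"{key}={_quote(val)}" for key, val in params.items())


def leader_dsn(members):
    """Return the DSN of the member whose role is 'master' (hypothetical layout)."""
    leader = next((m for m in members if m["role"] == "master"), None)
    if leader is None:
        raise LookupError("cluster has no leader")
    return build_dsn(leader["host"], leader["port"])
```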
Restarting has been extended to allow restarting based on role.
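Role-based restart boils down to filtering the member list before issuing restarts. A minimal sketch, assuming role values "master" and "replica" plus a catch-all "any"; members_to_restart and the member dicts are hypothetical:

```python
VALID_ROLES = {"master", "replica", "any"}


def members_to_restart(members, role="any"):
    """Return the names of members a role-filtered restart would touch."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return [m["name"] for m in members if role == "any" or m["role"] == role]
```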
By default, haproxy sends an OPTIONS request, which we didn't
handle until now. In addition, all haproxy checks that don't
examine the response body close the connection as soon as the status
code is obtained. Such behavior breaks BaseHTTPRequestHandler,
namely handle_one_request, which doesn't check for a connection reset
by peer and throws this error to a higher level. Since we don't
call this function directly, there is no place in our code to catch
it; therefore, we have to patch this function in the base class.
In addition, patch the StreamRequestHandler finish() function in
order to handle the connection reset error.
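The sketch below illustrates both fixes. Unlike the commit, which patches the base class, it overrides handle_one_request() and finish() in a hypothetical subclass (HealthHandler). Recent Python 3 versions already ignore socket errors when StreamRequestHandler.finish() flushes, so that override is only defensive here. probe() is a hypothetical helper that serves a single request on a free local port:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical health-check handler, not Patroni's actual REST API class."""

    def _respond(self, status):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()

    def do_GET(self):
        self._respond(200)
        self.wfile.write(b"ok\n")

    def do_OPTIONS(self):
        # haproxy's default httpchk probe; answer it like GET, minus the body.
        self._respond(200)

    def handle_one_request(self):
        # The peer may hang up as soon as it has read the status line; treat
        # that as the end of the connection instead of letting it propagate.
        try:
            super().handle_one_request()
        except (ConnectionResetError, BrokenPipeError):
            self.close_connection = True

    def finish(self):
        # Flushing/closing the streams can hit the same reset; ignore it.
        try:
            super().finish()
        except (ConnectionResetError, BrokenPipeError):
            pass

    def log_message(self, format, *args):
        pass  # keep the example quiet


def probe(method="OPTIONS", path="/"):
    """Serve exactly one request on a free local port and return its status."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    worker = threading.Thread(target=server.handle_request)
    worker.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request(method, path)
        status = conn.getresponse().status
        conn.close()
    finally:
        worker.join(5)
        server.server_close()
    return status
```

Subclassing keeps the workaround local to the health-check server, at the cost of having to use this handler everywhere the fix is needed.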
Re-read the cluster from DCS right after the failover to supply
the correct new values to the API thread. Fix a typo.
In the initial implementation we were passing only the
--encoding=UTF8 option. In order to have pg_rewind working with postgresql-9.3
we have to enable data checksums. The naive approach was to enable them
globally, but given the performance degradation they cause, it is better
not to do so and instead make it configurable.
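A sketch of turning a configurable option list into initdb flags, assuming a config shape where each item is either a bare flag name (e.g. "data-checksums") or a one-entry mapping (e.g. {"encoding": "UTF8"}). initdb_args and that shape are hypothetical; --encoding=UTF8 is kept as the default unless overridden:

```python
def initdb_args(options):
    """Translate a hypothetical option list into initdb command-line flags."""
    args = []
    for item in options:
        if isinstance(item, dict):
            # {"encoding": "UTF8"} -> --encoding=UTF8
            args.extend(f"--{key}={value}" for key, value in item.items())
        else:
            # "data-checksums" -> --data-checksums
            args.append(f"--{item}")
    # Preserve the old behaviour when no encoding was configured.
    if not any(arg.startswith("--encoding") for arg in args):
        args.append("--encoding=UTF8")
    return args
```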
In addition, fix all problems with setting up the password of the default
postgres user: execute CREATE ROLE or ALTER ROLE depending on the content of
pg_authid.
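A sketch of the CREATE ROLE / ALTER ROLE choice. The caller is assumed to run ROLE_EXISTS_SQL (psycopg2-style placeholder) and pass in whether a row came back. The helper names and the WITH LOGIN clause are illustrative assumptions. Because ALTER ROLE cannot take bound parameters, the password is embedded as a quoted literal, which assumes standard_conforming_strings is on (the default since PostgreSQL 9.1):

```python
# Existence check to run first; "%s" is a driver placeholder (assumption).
ROLE_EXISTS_SQL = "SELECT 1 FROM pg_authid WHERE rolname = %s"


def quote_ident(name):
    """Double-quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def password_statement(role_exists, username, password):
    """Pick CREATE ROLE or ALTER ROLE depending on the pg_authid lookup."""
    verb = "ALTER" if role_exists else "CREATE"
    return f"{verb} ROLE {quote_ident(username)} WITH LOGIN PASSWORD {quote_literal(password)}"
```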
Previously, the leader key was watched for changes after a failover. This resulted in a delay
of up to 10 seconds before a healthy failover was reported back to the client.
With this patch, we no longer rely on the role of a member as registered in the DCS.
For managing Patroni clusters, the Patroni API can be used. For many tasks, a command line interface for
this API would be a useful addition. This commit adds patroncli (the name is still under debate).
The command line interface needs access to the DCS for every operation. Some tasks also require
access to the Patroni API.
A small summary of the additions to get the cli/ctl started:
* Updated the Docker image to use 'true' as the archive_command, so that the disk does not fill up during failover
testing.
* The cli currently can list members, failover a master and remove a given cluster from DCS.
* The cli can be configured with a command, so that repeated invocations access the same DCS
* Added some simple tests for the cli; code coverage is still very low
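The command surface listed above could be sketched with argparse. This is an illustration only: the subcommand names follow the list, while the --dcs flag, the configure argument and build_parser are hypothetical, and the real tool may use a different CLI library:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the cli features listed above."""
    parser = argparse.ArgumentParser(prog="patroncli")
    parser.add_argument("--dcs", help="DCS address, e.g. host:port (hypothetical flag)")
    sub = parser.add_subparsers(dest="command", required=True)

    for name, text in (
        ("list", "list the members of a cluster"),
        ("failover", "fail over the master of a cluster"),
        ("remove", "remove a cluster from the DCS"),
    ):
        sub.add_parser(name, help=text).add_argument("cluster")

    # Store the DCS address once so later invocations can reuse it.
    configure = sub.add_parser("configure", help="remember a DCS address")
    configure.add_argument("dcs_address")
    return parser
```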
Instead of checking in one of the _failover functions that a
nofailover node is not marked as the healthiest, simply
report it as unhealthy in is_healthiest_node.
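A simplified sketch of the idea, assuming members are dicts with a WAL position and optional tags. The field names and the comparison rule are assumptions, not Patroni's actual implementation:

```python
def is_healthiest_node(me, others):
    """Decide whether this node may take over leadership (simplified sketch)."""
    # A nofailover node is never a promotion candidate, so it reports itself
    # as unhealthy here instead of being special-cased in the failover paths.
    if me.get("tags", {}).get("nofailover", False):
        return False
    # Otherwise, no other member may be further ahead in the WAL.
    return all(me["wal_position"] >= other["wal_position"] for other in others)
```

Putting the check in one place means every caller that asks "am I the healthiest?" gets the same answer for a nofailover node.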