In the initial implementation we passed only one option to initdb,
--encoding=UTF8. For pg_rewind to work with PostgreSQL 9.3, data
checksums have to be enabled. The naive approach was to enable them
globally, but because of the performance cost it is better not to do
that and to make it configurable instead.
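A minimal sketch of an opt-in checksum setting, assuming a hypothetical configuration key "data_checksums" and a hypothetical helper initdb_options(); neither is taken from Patroni itself. Only the initdb flags --encoding and --data-checksums are real.

```python
def initdb_options(config):
    """Build the initdb argument list from a configuration dict.

    Hypothetical sketch: the "data_checksums" key is an assumption.
    """
    # the single option used by the initial implementation
    options = ['--encoding=UTF8']
    # checksums are opt-in because they cost some performance
    if config.get('data_checksums', False):
        options.append('--data-checksums')
    return options
```

For example, initdb_options({'data_checksums': True}) yields ['--encoding=UTF8', '--data-checksums'], while an empty config keeps the old behaviour.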
In addition, fix all problems with setting the password of the default
postgres user: execute CREATE ROLE or ALTER ROLE depending on the contents
of pg_authid.
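The CREATE ROLE vs ALTER ROLE choice can be sketched as a pure function, assuming the caller has already run the existence query against pg_authid (readable only by superusers). The helper names and the exact role attributes are illustrative assumptions; real code should prefer the database driver's own quoting.

```python
# Query the caller would run first (parameter style as in psycopg2).
ROLE_EXISTS_SQL = 'SELECT 1 FROM pg_catalog.pg_authid WHERE rolname = %s'


def quote_ident(name):
    # double embedded double quotes inside an identifier
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    # double embedded single quotes; assumes standard_conforming_strings=on
    return "'" + value.replace("'", "''") + "'"


def password_statement(role_exists, name, password):
    """Return ALTER ROLE if the role is already in pg_authid, else CREATE ROLE."""
    verb = 'ALTER' if role_exists else 'CREATE'
    return '{0} ROLE {1} WITH SUPERUSER LOGIN PASSWORD {2}'.format(
        verb, quote_ident(name), quote_literal(password))
```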
One can use the pgpass configuration parameter in the postgres
subsection of the Patroni configuration. By default pgpass is written to ~/.
Mock actual writes to pgpass in the tests.
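A sketch of writing a pgpass file, assuming a hypothetical signature write_pgpass(path, host, port, user, password, database='*'). The line format (hostname:port:database:username:password), the backslash escaping of ':' and '\', and the 0600 permission requirement come from the libpq password-file rules.

```python
import os


def escape_pgpass(field):
    # libpq: ':' and '\' inside a field must be escaped with '\'
    return field.replace('\\', '\\\\').replace(':', '\\:')


def write_pgpass(path, host, port, user, password, database='*'):
    """Hypothetical sketch: (re)write a single-entry pgpass file."""
    fields = (host, port, database, user, password)
    line = ':'.join(escape_pgpass(str(f)) for f in fields)
    # O_TRUNC rather than append: the file is written anew before each use
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as f:
        f.write(line + '\n')
    # libpq ignores the file if group/other can read it; the open() mode
    # applies only on creation, so enforce it for pre-existing files too
    os.chmod(path, 0o600)
    return line
```

In tests, the write itself can be replaced with unittest.mock.patch so that nothing touches the real home directory.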
I accidentally removed the call when moving the backup functions
to the external script. It is intended to save the configuration,
so that in the restore phase one can just copy the backup files.
Its primary purpose was to save the configuration files in the WAL-E
case (WAL-E simply omits everything ending in .conf), but it is also
useful in the pg_basebackup case: pg_basebackup omits all symlinks, which
leaves a cluster whose .conf files are symlinks in a broken state.
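A sketch of saving configuration files so that symlinked ones survive a backup. The function name save_configuration_files() and its arguments are hypothetical; the key point is that shutil.copy follows symlinks, so the backup holds the real file contents rather than links.

```python
import glob
import os
import shutil


def save_configuration_files(data_dir, backup_dir):
    """Copy *.conf files from data_dir into backup_dir, dereferencing symlinks.

    Hypothetical sketch; returns the sorted list of saved file names.
    """
    os.makedirs(backup_dir, exist_ok=True)
    saved = []
    for path in sorted(glob.glob(os.path.join(data_dir, '*.conf'))):
        # skip dangling symlinks: there is no content to save
        if not os.path.exists(path):
            continue
        name = os.path.basename(path)
        # shutil.copy follows symlinks, so a regular file is written
        shutil.copy(path, os.path.join(backup_dir, name))
        saved.append(name)
    return saved
```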
- Check for a symlink before checking for a regular file when deciding
to remove a path, as isfile follows symlinks and may therefore return
True on them.
- Remove append mode from write_pgpass, as the file is always written
anew before it is used.
- Make pg_controldata return an empty hash in case of an error, and
check for the empty value returned by this function before using it.
Some other minor fixes and test updates.
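The first and third points can be sketched as follows. The helper names remove_path(), parse_controldata() and pg_controldata() are illustrative, not Patroni's code. remove_path() shows why the link check must come first: isdir also follows symlinks, and shutil.rmtree refuses to operate on a symlink, while a dangling link is neither a file nor a directory.

```python
import os
import shutil
import subprocess


def remove_path(path):
    """Remove a file, directory or symlink; never touch a link's target."""
    if os.path.islink(path):      # must come first: isfile/isdir follow links
        os.unlink(path)           # removes the link itself, even if dangling
    elif os.path.isfile(path):
        os.unlink(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)


def parse_controldata(output):
    """Turn 'Key:   value' lines into a dict (split on the first ':')."""
    result = {}
    for line in output.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            result[key.strip()] = value.strip()
    return result


def pg_controldata(data_dir, binary='pg_controldata'):
    """Return pg_controldata output as a dict, or {} on any error."""
    try:
        output = subprocess.check_output([binary, data_dir],
                                         stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        return {}
    return parse_controldata(output.decode('utf-8', 'replace'))
```

Callers can then treat an empty dict as "state unknown" instead of catching exceptions everywhere.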
If Patroni detects that the former master was killed, it first starts
it in single-user mode and then shuts it down normally, to make sure
that pg_rewind sees a clean shutdown status in pg_controldata.
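A sketch of the decision and the command, assuming the pg_controldata output has already been parsed into a dict. The helper names are hypothetical; "postgres --single -D <datadir> <dbname>" is the standard single-user invocation, and when its stdin is at EOF the backend finishes crash recovery and exits with a shutdown checkpoint.

```python
import subprocess

# pg_controldata states that pg_rewind accepts as a clean shutdown
CLEAN_SHUTDOWN_STATES = ('shut down', 'shut down in recovery')


def needs_single_user_pass(controldata):
    """True if the cluster was not shut down cleanly (e.g. 'in production')."""
    state = controldata.get('Database cluster state')
    # an empty dict means pg_controldata failed: do not guess
    return state is not None and state not in CLEAN_SHUTDOWN_STATES


def single_user_command(data_dir, database='postgres'):
    return ['postgres', '--single', '-D', data_dir, database]


def run_single_user(data_dir):
    # assumes the postgres binary is on PATH; stdin=DEVNULL makes the
    # single-user backend see EOF and exit after recovery
    return subprocess.call(single_user_command(data_dir),
                           stdin=subprocess.DEVNULL)
```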
Add a flag need_rewind, since the point where it is detected
that a rewind might be necessary has moved out of the code that
runs the rewind.
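The flag decouples detection from execution. A minimal sketch, assuming a hypothetical RewindState holder; where exactly mark() is called is left to the detection code and is not taken from Patroni.

```python
class RewindState(object):
    """Hypothetical holder for the need_rewind flag."""

    def __init__(self):
        self.need_rewind = False

    def mark(self):
        # called at the point where the need for pg_rewind is detected
        self.need_rewind = True

    def run_if_needed(self, do_rewind):
        # called later, from the code path that actually runs pg_rewind
        if not self.need_rewind:
            return False
        do_rewind()
        # only reached if do_rewind() did not raise, so a failed
        # rewind leaves the flag set and is retried next time
        self.need_rewind = False
        return True
```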
1. Run touch_member from the main loop.
2. Move the code that takes care of long-running tasks into a separate class.
3. Change the format of data stored in the DCS: use JSON instead of a URL.
4. Change the Member class: it now deserializes everything into the data property.
5. Rework the API: it now takes into account the state of the current node in the DCS.
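Items 3 and 4 together can be sketched as a Member that keeps the decoded JSON in data and exposes convenience properties. The keys conn_url and api_url and the from_node() signature are assumptions for illustration.

```python
import json


class Member(object):
    """Hypothetical sketch: DCS value is JSON, decoded into self.data."""

    def __init__(self, index, name, session, data):
        self.index = index
        self.name = name
        self.session = session
        self.data = data

    @staticmethod
    def from_node(index, name, session, value):
        try:
            data = json.loads(value) if value else {}
        except ValueError:          # json.JSONDecodeError subclasses ValueError
            data = {}
        if not isinstance(data, dict):
            data = {}
        return Member(index, name, session, data)

    @property
    def conn_url(self):
        return self.data.get('conn_url')

    @property
    def api_url(self):
        return self.data.get('api_url')
```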
The implementation is built on top of the feature/is-healthiest-via-api and
feature/api branches.
In order to trigger a manual failover one has to create a 'failover' key in
the configuration store with a value in the following format:
'leader_name:member_name'
leader_name can be empty or must match the name of the current leader.
member_name can be empty or must match the name of one of the cluster
nodes.
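Parsing the value can be sketched as below. The names Failover, parse_failover() and failover_applies() are hypothetical. Splitting on the first ':' assumes leader names contain no colon; a value without any colon is treated as invalid.

```python
from collections import namedtuple

# None means "not specified" for either part
Failover = namedtuple('Failover', 'leader member')


def parse_failover(value):
    """Parse 'leader_name:member_name'; return None if malformed."""
    leader, sep, member = value.partition(':')
    if not sep:
        return None
    return Failover(leader.strip() or None, member.strip() or None)


def failover_applies(failover, current_leader):
    # an empty leader_name matches any leader
    return failover.leader is None or failover.leader == current_leader
```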
The leader always checks that either the desired member (if specified) or
one of the members is accessible and healthy before demoting itself.
After the leader has demoted itself, the other nodes check that the
desired node is healthy. If it is not, they participate in the leader
race. In some cases (when by accident there are no healthy nodes) the
former leader can also participate in the leader race.
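Both sides of this check can be sketched as pure functions, with is_healthy standing in for the real API health check (an assumption). The special case of the former leader rejoining the race is deliberately left out.

```python
def leader_should_demote(members, is_healthy, failover_member=None):
    """Leader side: members excludes the leader itself."""
    if failover_member:
        # demote only if the desired member exists and is healthy
        return failover_member in members and is_healthy(failover_member)
    # otherwise at least one other member must be healthy
    return any(is_healthy(m) for m in members)


def joins_leader_race(my_name, is_healthy, failover_member=None):
    """Replica side, after the leader has demoted itself."""
    if failover_member and is_healthy(failover_member):
        # the desired member is healthy: only it should take the lock
        return my_name == failover_member
    # no target, or the target is unhealthy: everyone races
    return True
```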
The current implementation does not provide a REST API endpoint for manual
failover.
POST /restart -- will restart postgres.
If you are restarting the leader node, the leader lock is maintained during
the restart.
POST /reinitialize -- will reinitialize the node from the leader.
It is not possible to reinitialize the current leader.
The command will fail when the leader is unknown.
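A client-side sketch using only the standard library, assuming the REST API listens on http://127.0.0.1:8008 (an assumption; adjust to the node's actual API address). The request is only built here; sending it is a separate urlopen() call.

```python
from urllib.request import Request


def api_request(base_url, endpoint):
    """Build a POST request for /restart or /reinitialize."""
    if endpoint not in ('restart', 'reinitialize'):
        raise ValueError('unsupported endpoint: ' + endpoint)
    url = base_url.rstrip('/') + '/' + endpoint
    # empty body; the method is set explicitly to POST
    return Request(url, data=b'', method='POST')
```

Usage: urllib.request.urlopen(api_request('http://127.0.0.1:8008', 'restart'), timeout=10). Keep in mind that /reinitialize is rejected on the current leader and fails while the leader is unknown.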
The query method in api.py also needs retries in some cases (for example,
when we are running the is_healthiest_node check).
In all cases we should retry only when the connection is closed or broken.
BUT the connection status must be checked via cursor.connection (the old
implementation used the general connection object for that). That is not
appropriate for multi-threaded applications, because some other thread
might have restored the connection in the meantime.
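A sketch of the retry rule, assuming a hypothetical get_cursor() callable that returns a cursor on a (possibly reconnected) connection. With psycopg2, connection.closed is 0 while open and nonzero once closed or broken. Checking it on the failing cursor's own connection avoids being misled by another thread that already reopened a shared connection. The broad except is a simplification; real code would catch the driver's error classes.

```python
import time


def query_with_retry(get_cursor, sql, params=None, retries=3, delay=0.0):
    """Run sql; retry only if the failing cursor's connection is closed/broken."""
    for attempt in range(retries + 1):
        cursor = get_cursor()
        try:
            cursor.execute(sql, params)
            return cursor.fetchall()
        except Exception:
            # connection still open: the error is not a connection problem,
            # so retrying would not help; also give up after the last attempt
            if cursor.connection.closed == 0 or attempt == retries:
                raise
            time.sleep(delay)
```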
In addition to that I've changed most of the unit tests to use `Mock` and
`patch` where it is possible.
`Cluster.leader` no longer references a `Member`; it now references a `Leader`.
The `Leader` class contains the field `index` (update index). This field is very
useful for watching for events that change the leader key. `Leader` also
contains a `member` field, which should reference the actual member.
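A sketch of the new shape, assuming a field layout (index, session, member) chosen for illustration rather than copied from Patroni. The update index makes it cheap to tell whether the leader key changed between two reads of the cluster state.

```python
from collections import namedtuple

Member = namedtuple('Member', 'index name data')
Cluster = namedtuple('Cluster', 'leader members')


class Leader(namedtuple('Leader', 'index session member')):
    """index: update index of the leader key; member: the Member holding it."""
    __slots__ = ()

    @property
    def name(self):
        return self.member.name


def leader_key_changed(old_leader, new_leader):
    # appearance or disappearance of the key counts as a change
    if old_leader is None or new_leader is None:
        return old_leader is not new_leader
    return old_leader.index != new_leader.index
```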