We make a number of attempts when trying to initialize a replica using
different methods. Any of these attempts may create files in the data
directory, which causes the next attempts to fail.
In addition, improve logging when creating a replica.
* Avoid retries when syncing replication slots.
Do not retry postgres queries that fetch, create and drop slots at the end of
the HA cycle. The complete run_cycle routine executes with the async_executor
lock. This lock is also used with scheduling operations like reinit or restart
in different threads. It looks like the CPython threading lock has fairness
issues when multiple threads try to acquire the same lock and one of them
executes long-running actions while holding it: the others have little chance
of acquiring the lock in order. To get around this issue, the long action (i.e.
retrying the query) is removed.
Investigation by Ants Aasma and Alexander Kukushkin.
PostgreSQL replication slot names only allow names consisting of [a-z0-9_].
Invalid characters cause replication slot creation and standby startup to fail.
This change substitutes the invalid characters with underscores or unicode
codepoints. In case multiple member names map to identical replication slot
names, the master log will contain a corresponding error message.
Motivated by wanting to use hostnames as member names. Hostnames often
contain periods and dashes.
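
A minimal sketch of the substitution described above, not the actual
implementation. The function names, the lowercasing step and the exact
choice of which characters become underscores (here `-` and `.`) versus
codepoints are assumptions made for illustration.

```python
import re


def slot_name_from_member_name(member_name):
    """Map a member name to a string consisting only of [a-z0-9_]."""
    def replace_char(match):
        char = match.group(0)
        # Assumption: dashes and periods become underscores, any other
        # invalid character is replaced by 'u' plus its hex codepoint.
        return '_' if char in '-.' else 'u{:04x}'.format(ord(char))

    # Assumption: names are lowercased first, since slots allow only a-z.
    return re.sub(r'[^a-z0-9_]', replace_char, member_name.lower())


def find_slot_collisions(member_names):
    """Return {slot_name: [member names]} for slots claimed more than once."""
    slots = {}
    for name in member_names:
        slots.setdefault(slot_name_from_member_name(name), []).append(name)
    return {slot: names for slot, names in slots.items() if len(names) > 1}
```

A non-empty result from `find_slot_collisions` corresponds to the case
where an error message would be logged on the master.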
In the original code we were parsing and deparsing url-style connection
strings back and forth. That was not really resource-greedy, but it was
annoying. Also, it was not obvious how to switch all local connections
to unix-sockets (which would be preferable).
This commit isolates the different use-cases of working with connection
strings and minimizes the amount of code parsing and deparsing them. It
also introduces one new helper method in the `Member` object: `conn_kwargs`.
This method can accept a dict object with credentials (username and
password) as a parameter. It returns a dict object which can be used by
`psycopg2.connect` or for building connection urls for pg_rewind,
pg_basebackup or some other replica creation methods.
Params for the local connection are built in the `_local_connect_kwargs`
method and can easily be changed to unix-socket later.
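
The idea behind `conn_kwargs` can be sketched roughly as follows. This is
a standalone function rather than the real `Member` method; the url format,
the default port and database, and the shape of the credentials dict
(`username` / `password` keys) are assumptions for illustration.

```python
from urllib.parse import urlparse


def conn_kwargs(conn_url, auth=None):
    """Build keyword arguments suitable for psycopg2.connect(**kwargs)."""
    parsed = urlparse(conn_url)
    kwargs = {
        'host': parsed.hostname,
        # Assumption: fall back to the default port and database name.
        'port': parsed.port or 5432,
        'dbname': parsed.path.lstrip('/') or 'postgres',
    }
    if auth:
        # Assumption: credentials look like {'username': ..., 'password': ...}
        if 'username' in auth:
            kwargs['user'] = auth['username']
        if 'password' in auth:
            kwargs['password'] = auth['password']
    return kwargs
```

The url is parsed once, and callers work with the resulting dict instead
of re-parsing the string each time they need a host or port.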
Such a situation should not happen in reality (the follow method is not
supposed to be called when the node is holding the leader lock and
postgres is running), but to be on the safe side it is better to
implement as many checks as possible, because this method could
potentially remove the data directory.
rewind is not possible when:
1) trying to rewind from itself
2) leader is not reachable
3) leader is_in_recovery
All these cases were leading to removal of the data directory...
In all cases except 1) it should "retry" when the leader becomes
available and is not in recovery.
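
The three cases above can be sketched as a small decision function. The
function name, its parameters and the returned labels are hypothetical and
only illustrate the intended behaviour; none of the outcomes removes the
data directory.

```python
def rewind_decision(member_name, leader_name, leader_reachable,
                    leader_in_recovery):
    """Decide what to do before running pg_rewind (hypothetical helper)."""
    if leader_name == member_name:
        # Case 1: rewinding from itself makes no sense, so do nothing.
        return 'skip'
    if not leader_reachable or leader_in_recovery:
        # Cases 2 and 3: try again once the leader is available
        # and no longer in recovery.
        return 'retry'
    return 'rewind'
```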
In addition to that, make the pg_ctl --timeout option configurable.
If the stop or start doesn't succeed within the given timeout when demoting
the master, the role will be forcibly changed to 'unknown' and all needed
callbacks executed.
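
As a rough illustration of a configurable timeout, the pg_ctl command
line could be assembled like this. The helper name and its arguments are
assumptions; only the `-D`, `-w` and `--timeout` options of pg_ctl itself
are taken as given.

```python
def pg_ctl_command(action, data_dir, timeout=None):
    """Build a pg_ctl command line that waits for the action to finish."""
    cmd = ['pg_ctl', action, '-D', data_dir, '-w']
    if timeout is not None:
        # Without this option pg_ctl falls back to its own default timeout.
        cmd.append('--timeout={0}'.format(int(timeout)))
    return cmd
```

The resulting list can be handed to `subprocess.call`; a non-zero exit
code after the timeout would be the point at which the role is set to
'unknown'.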
These parameters usually must be the same across all cluster nodes and
therefore must be set only via global configuration and always passed as
a list of postgres arguments (via pg_ctl) to make it impossible to
accidentally change them with 'ALTER SYSTEM'.
Originally we were passing postgresql options as arguments of `pg_ctl
start`. It was nice and convenient because it didn't require touching
configuration files, but this method has one significant drawback: it
wasn't possible to change the values of options passed as arguments
without a restart (even when the option requires only a reload).
Instead of passing options as arguments we will:
1) rename the original postgresql.conf to postgresql-base.conf
2) write options into postgresql.conf, which has `include
'postgresql-base.conf'` on the third line, after a comment that this
file is generated by Patroni and should not be changed manually
3) still pass listen_addresses and port as arguments to pg_ctl
(just to be foolproof against ALTER SYSTEM set port to 'random')
In addition to that, this commit makes some attributes of the `Postgresql`
class private (prefixes them with _).
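
The layout of the generated postgresql.conf from step 2) might look like
the output of the sketch below. The function name, the wording of the
comment lines and the quoting of values are assumptions; the only details
taken from the description are the `include` on the third line and the
exclusion of listen_addresses and port.

```python
def render_postgresql_conf(parameters):
    """Render the contents of a generated postgresql.conf."""
    lines = [
        # Assumption: the exact wording of the warning comment.
        '# Do not edit this file manually!',
        '# It will be overwritten by Patroni!',
        "include 'postgresql-base.conf'",
        '',
    ]
    for name, value in sorted(parameters.items()):
        if name in ('listen_addresses', 'port'):
            # These two are still passed as pg_ctl arguments.
            continue
        # Single quotes inside a value are escaped by doubling them.
        escaped = str(value).replace("'", "''")
        lines.append("{0} = '{1}'".format(name, escaped))
    return '\n'.join(lines) + '\n'
```

Because the `include` comes before the generated options, those options
override anything set in postgresql-base.conf, and changing a
reload-only option needs just a rewrite of this file and a reload.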
Previously we explicitly injected a replication record into pg_hba.conf.
This doesn't allow users to explicitly write their configurations.
This change will just write the lines specified by the user.
Previously, pg_rewind was called only if a crashed master tried
to rejoin the cluster. It didn't cover the important case of a
master shut down cleanly, but with a combination of a smart
shutdown and subsequently a fast shutdown. Since our pg_rewind
code does not depend on the "uncleanness" of the master's shutdown,
we can call it unconditionally in all cases where the former master
tries to rejoin as a replica.
This resolves #167.
Previously, "without_leader" suffix was used in the name of methods
and functions that initialize a replica without an active replication
connection, and leader was part of the name for parameters and messages
that require an active replication connection. Since we support init
from members other than the leader, those conventions have to be
changed.