It can take up to 10 seconds to create a replication slot.
In addition, when a replica fails to connect to the master via
streaming replication, it doesn't retry immediately but only after a
timeout (5 seconds). 10 + 5 == 15, which causes the replication check
scenarios to fail.
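The timing above suggests that any replication check has to wait longer
than the 15-second worst case. A minimal sketch of such a polling wait
follows; the helper name wait_until, the constants and the 20-second
default are illustrative assumptions, not code from the test suite.

```python
import time

# Figures taken from the description above.
SLOT_CREATION_MAX = 10   # seconds until the replication slot may exist
RETRY_DELAY = 5          # seconds before the replica retries streaming
WORST_CASE = SLOT_CREATION_MAX + RETRY_DELAY


def wait_until(check, timeout=WORST_CASE + 5, interval=1.0):
    """Poll check() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: `check` is any zero-argument callable, for
    example one that queries the replica for a replicated row.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The default timeout is deliberately larger than WORST_CASE, so that a
slot created at the last moment plus one retry delay still fits.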
Without sudo, Travis executes build tasks in Docker, and the waiting
time in this case is really small, usually no longer than 10 seconds.
postgresql-9.5 is installed via addons.apt.packages (without sudo),
but ports 5432 and 5433 are busy, so I had to adjust environment.py to
assign ports from a higher range.
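One way to keep test instances off the busy ports is to derive each
port from a base well above 5432. The sketch below only illustrates
that idea; PORT_BASE and port_for_node are hypothetical names and
values, not the ones used in environment.py.

```python
# Hypothetical base, chosen only to stay clear of 5432 and 5433.
PORT_BASE = 15432


def port_for_node(index, base=PORT_BASE):
    """Return the PostgreSQL port for the test node with this index."""
    if index < 0:
        raise ValueError("node index must be non-negative")
    port = base + index
    if port > 65535:
        raise ValueError("port %d is out of range" % port)
    return port
```

With these assumptions, node 0 listens on 15432 and node 1 on 15433.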
And a few words about the build tasks:
The first task runs unit tests for all the different Python versions.
The second one runs acceptance tests against etcd.
The third one runs acceptance tests against zookeeper.
Acceptance tests are executed with python2.7 and python3.5.
In addition, I've introduced caching of the Python virtual environment.
It really helps to reduce the time needed to install Python modules.
According to https://github.com/zalando/patroni/issues/48, the 'clonefrom'
tag should be boolean and should be used to mark a node as suitable
for creating a new replica from. If there is more than one such node
in the cluster (with tag clonefrom=true), one of them will be chosen
randomly.
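The selection rule can be sketched as follows. The member
representation (dicts with "name" and "tags" keys) and the function
name pick_clone_source are assumptions made for illustration; they do
not reflect Patroni's internal data structures.

```python
import random


def pick_clone_source(members, rng=random):
    """Pick a random member tagged clonefrom=true, or None if there is none.

    `members` is assumed to be a list of dicts such as
    {"name": "node1", "tags": {"clonefrom": True}}.
    """
    # The tag is expected to be a real boolean, so strings like "true"
    # are intentionally not accepted here.
    candidates = [m for m in members
                  if m.get("tags", {}).get("clonefrom") is True]
    return rng.choice(candidates) if candidates else None
```

Returning None when no node is tagged leaves the fallback choice (for
example, cloning from the leader) to the caller.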
It was shut down correctly and I expected such a 'join' to work, but it
did not, because the new leader didn't have enough time to catch up with
the master before the promote.
The amount of time allowed to establish the master and the replication
after the scheduled failover seems sufficient (15 seconds with the
failover in 10 seconds), but it occasionally leads to test failures.
This is unlikely to be a test issue and should be investigated inside
Patroni.
Add the ability to specify the origin and the destination for
the 'replication works' clause. Use it in the API
promotion test to ensure replication from the former
replica to the former master.
Toggle the etcd debug logging and write the log to the test dir.
Make sure etcd and patroni are terminated when the tests finish
by sending SIGKILL in case SIGTERM does not work.
Make sure the before.all code does the proper cleanup when an exception
is thrown.
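The SIGTERM-then-SIGKILL shutdown described above can be sketched with
the standard subprocess module. The function name stop_process and the
5-second grace period are illustrative assumptions, not the values used
by the controllers in the test suite.

```python
import subprocess


def stop_process(proc, grace=5.0):
    """Stop a subprocess.Popen child: SIGTERM first, SIGKILL as a fallback.

    Returns the exit code of the child.
    """
    if proc.poll() is not None:
        return proc.returncode      # already exited
    proc.terminate()                # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()                 # sends SIGKILL on POSIX
        return proc.wait()
```

Calling this from a cleanup handler that runs even when setup fails
keeps stray etcd or patroni processes from outliving the test run.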
Move etcd code into a separate class.
Reduce the number of global interdependencies.
Clearly define private members of PatroniController and EtcdController.
This will not make QuantifiedCode entirely happy: lettuce passes the
step argument to each step definition, where it is not used by the
client code but only internally (via the @steps decorator on the
steps class). That is an issue with the tool used.