142 Commits

Author SHA1 Message Date
Alexander Kukushkin
fd4f12aac8 Do not assume that connection user is postgres, but take it from config.yml 2016-04-21 13:56:09 +02:00
Alexander Kukushkin
7006a4ee14 Sometimes replica can't attach to the master after pg_rewind
The reason for that is: it takes up to 10 seconds to create replication
slot + up to 5 seconds to start streaming and recover.
2016-04-13 14:28:00 +02:00
Alexander Kukushkin
d57310bbc0 Fix one more corner-case
It could take up to 10 seconds to create replication slot.
In addition to that when replica fails to connect to the master via
streaming replication it doesn't retry immediately, but with some
timeout (5 seconds). 10 + 5 == 15, which causes replication check
scenarios to fail.
2016-04-13 14:09:45 +02:00
Alexander Kukushkin
01da5266a0 Give time for running health-checks when promoting replica 2016-04-13 13:32:39 +02:00
Alexander Kukushkin
b4e86f0809 Make it possible to schedule failover in less than 10 seconds
But only when API request was posted to the leader
2016-04-13 13:32:39 +02:00
Alexander Kukushkin
15d30a2d35 Try to stabilize acceptance tests 2016-04-13 13:32:39 +02:00
Alexander Kukushkin
f8bf1bb0ab Disable sudo, reshuffle travis tasks and introduce caching
Without sudo travis is executing build tasks using docker and waiting
time in this case is really small, usually not longer than 10 seconds.

postgresql-9.5 is installed via addons.apt.packages (without sudo)
But ports 5432 and 5433 are busy. So I had to adjust environment.py to
assign ports from a higher range.

And a few words about build tasks:
First task is used for executing unit tests for all different python versions
The second one is used for executing acceptance tests against etcd
The third one is used for executing acceptance tests against zookeeper
acceptance tests are executed with python2.7 and python3.5

In addition to that, I've introduced caching of the python virtual environment.
It really helps to reduce time needed to install python modules.
2016-04-13 13:32:39 +02:00
Alexander Kukushkin
24a2ea6cef Refactor acceptance tests to make them work against ZooKeeper
and make it easier to implement controllers for new DCS, i.e. consul
2016-04-10 10:37:43 +02:00
Alexander Kukushkin
c6cc731bf0 Merge pull request #166 from zalando/feature/clonefrom
Correct implementation of 'clonefrom' feature
2016-04-10 10:33:18 +02:00
Alexander Kukushkin
ada50e418c Update scenario description 2016-03-31 17:13:29 +02:00
Alexander Kukushkin
db5999a639 Correct implementation of 'clonefrom' feature
According to https://github.com/zalando/patroni/issues/48 'clonefrom'
tag should be boolean and it should be used to mark a node as suitable
for creating a new replica from. If there is more than one such node
in the cluster (with tag clonefrom=true), one of them will be chosen
randomly.
2016-03-30 11:30:05 +02:00
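The selection rule described in the commit message above can be illustrated with a minimal sketch. This is not Patroni's actual code: the function name `pick_clone_source` and the member layout (a list of dicts with `name` and `tags` keys) are assumptions made only for illustration.

```python
import random


def pick_clone_source(members):
    """Return the name of a randomly chosen member tagged clonefrom=true.

    `members` is a hypothetical list of dicts such as
    {"name": "node1", "tags": {"clonefrom": True}}; this layout is an
    assumption, not Patroni's real data model.
    """
    # Only a boolean True counts, matching "the tag should be boolean".
    candidates = [
        m["name"]
        for m in members
        if m.get("tags", {}).get("clonefrom") is True
    ]
    # With several candidates one is picked at random; with none, return None.
    return random.choice(candidates) if candidates else None
```

Under these assumptions, a caller would fall back to some other source (for example the leader) when the function returns `None`.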
Alexander Kukushkin
e6af18f0bb Former leader was not able to reattach to cluster without pg_rewind
It was shut down correctly and I expected such a 'join' to work, but it
did not, because the new leader didn't have enough time to catch up with
the master before promotion.
2016-03-24 14:45:21 +01:00
Alexander Kukushkin
54055c1ff8 Rename ambiguous Failover.member to candidate
But! 'member' is still accepted by REST API and also name 'member' is
used to store/read this value to/from DCS (for backward compatibility)
2016-03-18 15:59:47 +01:00
Alexander Kukushkin
79f4d9a13b Attempt to export acceptance tests coverage results to coveralls 2016-03-13 09:42:02 +01:00
Alexander Kukushkin
62f11ab747 Attempt to export acceptance tests coverage results to coveralls 2016-03-13 09:09:31 +01:00
Oleksii Kliukin
6985df3aca Restore the test for the clone from the replica. 2016-03-11 16:59:35 +01:00
Alexander Kukushkin
ba444adb67 make codacy and quantifiedcode happier 2016-03-11 15:32:16 +01:00
Alexander Kukushkin
5f6beae22f Enforce data-type checks for step matcher
and increase default timeout for patroni start
2016-03-11 14:46:14 +01:00
Alexander Kukushkin
8b81d270bc BUGFIX: Assertion Failed: Steps must be unicode 2016-03-11 13:47:57 +01:00
Alexander Kukushkin
30d3982d25 Acceptance tests with behave 2016-03-11 12:56:29 +01:00
Alexander Kukushkin
c2d1eea7d0 disable clonefrom test 2016-03-10 17:19:43 +01:00
Alexander Kukushkin
42d798a3de acceptance tests on travis 2016-03-10 17:19:10 +01:00
Oleksii Kliukin
998f0da3d8 Add cascading replication (backup from the replica) tests. 2016-03-10 16:05:06 +01:00
Oleksii Kliukin
3f1c34f557 Add tests for the scheduled failover.
The actual amount of time to establish the master and the replication
after the scheduled failover seems sufficient (15 seconds with the
failover in 10 seconds), but occasionally leads to test failures.
This is unlikely to be a test issue and should be investigated inside
patroni.
2016-03-02 19:39:12 +01:00
Oleksii Kliukin
069440be15 Improve the "replication work" sentence definition.
Add an ability to specify the origin and the destination for
the replication works clause. Use this ability in the API
promotion test to ensure the replication from the former
replica to the former master.
2016-03-02 15:43:44 +01:00
Oleksii Kliukin
24ebcc72f6 Add more tests for the restart and promotion. 2016-03-01 22:07:18 +01:00
Oleksii Kliukin
ed15f7cd73 Improve tests start/stop, add etcd logging.
Toggle the etcd debug logging and write the log to the test dir.

Make sure etcd and patroni are terminated when the tests finish
by sending SIGKILL in case SIGTERM does not work.

Make sure before.all code does the proper cleanup when the exception
is thrown.
2016-03-01 22:03:38 +01:00
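The "SIGTERM first, SIGKILL if that does not work" shutdown described in the commit message above could look roughly like the sketch below. It is not the test suite's real code: the helper name `stop_process` and the 5-second grace period are assumptions, and it relies on Python 3's `Popen.wait(timeout=...)`.

```python
import subprocess


def stop_process(proc, timeout=5.0):
    """Stop a subprocess.Popen child: polite request first, then force.

    `timeout` is a hypothetical grace period in seconds; the value used
    by the actual tests is not stated in the commit message.
    """
    if proc.poll() is not None:
        # The process has already exited; nothing to do.
        return proc.returncode
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()
```

Calling such a helper from a `finally` block or a cleanup hook is one way to make sure child processes are not left behind when a test run raises an exception.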
Oleksii Kliukin
fa1a7687e5 Correct the step definition, randomize the table.
Make sure the step definition does not include "command" words.
Use the table name that includes current timestamp in the tests.
2016-03-01 22:00:30 +01:00
Oleksii Kliukin
0d44e3eb7c Add simple API tests for 2 nodes, to be extended. 2016-02-26 18:00:11 +01:00
Oleksii Kliukin
4e9ebf48a8 Add API tests for a stand-alone node. Bugfixes.
Add tests for patroni API.
Fix test failures when an already running etcd is used.
2016-02-26 17:37:37 +01:00
Oleksii Kliukin
83b7c34b00 Do not try to close an already closed file. 2016-02-25 15:35:15 +01:00
Oleksii Kliukin
a84a3fc5e1 Changeset missing in the previous commit. 2016-02-25 15:16:35 +01:00
Oleksii Kliukin
67f55b4606 Stylistic issues: clearly mark unused variables. 2016-02-25 15:11:48 +01:00
Oleksii Kliukin
4a8edf44e6 Convert normal methods to static methods when possible. 2016-02-25 14:55:29 +01:00
Oleksii Kliukin
481a80a3ce Fix another couple of warnings from the QuantifiedCode and Co. 2016-02-25 14:39:22 +01:00
Oleksii Kliukin
4986db5c6a Code refactoring, no functional changes.
Move etcd code into a separate class.
Reduce the number of global interdependencies.
Clearly define private members of PatroniController and EtcdController.

It would not make the QuantifiedCode entirely happy, since lettuce
passes the step argument to the step definition, that is not used
in the client code, but internally (via the @steps decorator on
the steps class), but that's the issue of the tool used.
2016-02-25 12:52:45 +01:00
Oleksii Kliukin
53b5dfe39e Remove an unused function. 2016-02-24 19:24:46 +01:00
Oleksii Kliukin
c9b8c2d3a9 Bugfixes, add a function to kill patroni daemon, make the feature description more concise. 2016-02-24 19:22:42 +01:00
Oleksii Kliukin
6f03953268 Merge basic failover and basic replication scenarios in one feature. 2016-02-24 17:12:45 +01:00
Oleksii Kliukin
6ec3523748 Collect test output, add basic failover test. 2016-02-24 16:30:52 +01:00
Oleksii Kliukin
f781d0b9fe Address the code review by Alex Shulgin. 2016-02-16 16:50:50 +01:00
Oleksii Kliukin
38bd037d99 Add the 1st lettuce test for the basic replication.
Basically check that the table inserted on the primary
will get its way to the secondary.
2016-02-05 13:30:42 +01:00