patroni

mirror of https://github.com/outbackdingo/patroni.git synced 2026-01-28 02:20:04 +00:00

Author	SHA1	Message	Date
Alexander Kukushkin	680444ae13	Reduce lock time taken by dcs.get_cluster() (#989) `dcs.cluster` and `dcs.get_cluster()` use the same lock, so when the get_cluster call was slow because of a slow DCS it also slowed down the `dcs.cluster` call, which in turn made health-check requests slow.	2019-03-12 22:37:11 +01:00
Alexander Kukushkin	9bf074acfb	Compatibility with python3 (#883) Changing `loop_wait` was causing Patroni to disconnect from zookeeper and never reconnect. The error was happening only with python3 due to a difference in the implementation of the `select.select` function.	2018-11-30 11:40:34 +01:00
Alexander Kukushkin	fb01aaebc5	Compatibility with kazoo-2.6.0 (#872) The recently released 2.6.0 changes how the create_connection method is called. Previously it was called with two arguments; in the new version all argument names are specified explicitly.	2018-11-19 14:26:20 +01:00
Alexander Kukushkin	4ca8a6e506	Make retries of calls to DCS consistent across implementations (#805) In addition, do a small refactoring of zookeeper and consul and try to improve the stability of AT	2018-09-06 08:37:26 +02:00
Alexander Kukushkin	03c2a85d23	Expose current timeline in DCS and via API (#591) It is very easy to get the current timeline on the master by executing `SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int`. Unfortunately the same method doesn't work when postgres is_in_recovery. Therefore we will use a replication connection for that on the replicas. In order to avoid opening and closing a replication connection on every HA loop we will cache the result if its value matches the timeline of the master. Also this PR introduces a new key in DCS: `/history`. It will contain a json serialized object with timeline history in a format similar to the usual history files. The differences are: * Second column is the absolute wal position in bytes, instead of LSN * Optionally there might be a fourth column - timestamp, (mtime of history file)	2018-01-05 15:25:56 +01:00
Alexander Kukushkin	4328c15010	Make Patroni Kubernetes native (#500) * Use ConfigMaps or Endpoints for leader elections and to keep cluster state * Label pods with a postgres role * Change behavior of pip install. From now on it will not install all dependencies; you have to explicitly specify the DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`	2017-12-08 16:55:00 +01:00
Alexander Kukushkin	038b5aed72	Improve leader watch functionality (#356) Previously replicas were always watching the leader key (even if postgres was not running there). It was not a big issue, but it was not possible to interrupt such a watch when postgres started up or stopped successfully. It was also delaying the update_member call, so we had stale information in DCS for up to `loop_wait` seconds. This commit changes that behavior. If the async_executor is busy starting, stopping, or restarting postgres, we will not watch the leader key but will wait for an event from async_executor for up to `loop_wait` seconds. The async executor fires such an event only if the function it called returned something that evaluates to boolean True. This functionality is needed to change how we decide whether pg_rewind is necessary. That decision will require a local postgres to be running, so it is important for us to get such a notification as soon as possible.	2016-11-22 16:22:30 +01:00
Ants Aasma	7e53a604d4	Add synchronous replication support. (#314) Adds a new configuration variable synchronous_mode. When enabled Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. With synchronous mode enabled Patroni will automatically fail over only to a standby that was synchronously replicating at the time of master failure. This effectively means zero lost user visible transactions. To enforce the synchronous failover guarantee Patroni stores current synchronous replication state in the DCS, using strict ordering, first enable synchronous replication, then publish the information. Standby can use this to verify that it was indeed a synchronous standby before master failed and is allowed to fail over. We can't enable multiple standbys as synchronous, allowing PostgreSQL to pick one because we can't know which one was actually set to be synchronous on the master when it failed. This means that on standby failure commits will be blocked on the master until next run_cycle iteration. TODO: figure out a way to poke Patroni to run sooner or allow for PostgreSQL to pick one without the possibility of lost transactions. On graceful shutdown standbys will disable themselves by setting a nosync tag for themselves and waiting for the master to notice and pick another standby. This adds a new mechanism for Ha to publish dynamic tags to the DCS. When the synchronous standby goes away or disconnects a new one is picked and Patroni switches master over to the new one. If no synchronous standby exists Patroni disables synchronous replication (synchronous_standby_names=''), but not synchronous_mode. In this case, only the node that was previously master is allowed to acquire the leader lock. Added acceptance tests and documentation. Implementation by @ants with extensive review by @CyberDem0n.	2016-10-19 16:12:51 +02:00
Alexander Kukushkin	5fe74bec3b	Make different kazoo timeouts depend on loop_wait (#243) * Make different kazoo timeouts dependent on loop_wait: ping timeout ~ 1/2 * loop_wait, connect_timeout ~ 1/2 * loop_wait. Originally these values were calculated from the negotiated session timeout and didn't work very well, because it took significant time (up to session timeout) to figure out that the connection was dead and reconnect, leaving us no time to retry. * Address the code review	2016-08-10 10:15:09 +02:00
Alexander Kukushkin	f7c6bd4eab	Implement different connect strategy for zookeeper Originally it was trying to connect during session_timeout time. Such a strategy doesn't work well during short network hiccups...	2016-07-01 12:31:29 +02:00
Alexander Kukushkin	5f4e582660	Merge branch 'master' of github.com:zalando/patroni into feature/dynamic-configuration	2016-06-09 11:04:28 +02:00
Alexander Kukushkin	50d118c3aa	Split ZooKeeper and Exhibitor Originally Exhibitor was supported in the ZooKeeper class and configuration for Exhibitor was also taken from the `zookeeper` section in the yaml config file. In fact, Exhibitor just extends ZooKeeper; this is now reflected in the code, and Exhibitor also got its own section in the config.yaml file. It will make it easier to configure Exhibitor hosts and port via environment variables once PR#211 is merged.	2016-06-08 19:21:18 +02:00
Alexander Kukushkin	b3ada161cf	Implement possibility to configure `retry_timeout` globally Previously it was hardcoded all over the place.	2016-05-31 10:30:53 +02:00
Alexander Kukushkin	7827951c8c	Dynamic configuration	2016-05-25 14:17:05 +02:00
Alexander Kukushkin	6104d688d9	Merge branch 'master' of github.com:zalando/patroni into feature/sighup	2016-05-19 14:27:04 +02:00
Alexander Kukushkin	0c2aad98a3	Move dcs implementations into dcs package	2016-05-19 10:57:18 +02:00
Alexander Kukushkin	1741fa7e0f	Minimize number of references to dcs implementations from tests where it is not necessary (test_ha, test_ctl, etc...) It will simplify further refactoring and make it possible to install implementations of AbstractDCS independent of each other.	2016-05-19 10:00:32 +02:00
Alexander Kukushkin	d422e16aad	Implement reload of config.yaml on SIGHUP If some changes require restart of postgres patroni will expose `restart_pending` flag in DCS and via REST API	2016-05-13 13:31:21 +02:00
Alexander Kukushkin	0d3dca56ff	In some cases Ha.cluster can be None after calling `get_cluster` Such a situation causes patroni to crash. Usually it was happening during manual failover, after the former master had demoted and the `reset_cluster` method had been called. In this case `fetch_cluster` was `False` and the `_load_cluster` method was returning the value from `self._cluster`, which was `None`.	2016-03-24 12:06:39 +01:00
Alexander Kukushkin	3a7d2c3874	Remove unused code from unit tests	2016-03-21 20:48:17 +01:00
Alexander Kukushkin	54055c1ff8	Rename ambiguous `Failover.member` to candidate But! 'member' is still accepted by REST API and also name 'member' is used to store/read this value to/from DCS (for backward compatibility)	2016-03-18 15:59:47 +01:00
Alexander Kukushkin	0e0c8ed8d7	Implement `delete_cluster` interface for all available dcs In addition to that rename confusing `Etcd.client` and `ZooKeeper.client` into `_client`. This attribute is available from AbstractDCS and people had the wrong impression that it provides the same interface for different DCS implementations, which is obviously not the case. For Etcd it has type etcd.Client and for ZooKeeper - KazooClient.	2016-03-15 16:25:48 +01:00
Alexander Kukushkin	df9b8fed2e	Improve quality of code by resolving issues found by quantifiedcode and codacy	2016-02-12 12:23:49 +01:00
Alexander Kukushkin	a6603e8b48	bugfix in zookeeper module: when master node was being attached to patroni/zookeeper (no cluster in zookeeper yet) patroni never tried to "refetch" cluster from DCS. It was leading to demote...	2015-10-08 13:07:38 +02:00
Alexander Kukushkin	8a844285ff	Set fetch_cluster flag to False when _inner_load_cluster called Set the same flag to True if the cluster does not yet exist in ZooKeeper	2015-10-07 16:48:39 +02:00
Alexander Kukushkin	d8f4b09478	use Event.wait instead of sleep; it makes it possible to break the "sleep", for example from the API. Plus small bugfix: catch ValueError exception from json.loads	2015-10-02 10:26:48 +02:00
Alexander Kukushkin	d09875a056	refactoring: 1. run touch_member from the main loop 2. move code which takes care about long tasks into separate class 3. change format of data stored in a DCS: use json instead of url 4. change Member class: from now it deserialize everything into data property 5. rework API: from now it takes into account state of the current node in a dcs	2015-10-01 17:06:42 +02:00
Alexander Kukushkin	c218054d05	Implement manual failover Implementation is done on top of feature/is-healthiest-via-api and feature/api branches. In order to trigger manual failover one has to create a 'failover' key in the configuration store with a value in the following format: 'leader_name:member_name'. leader_name can be empty or should match the name of the current leader; member_name can be empty or should match the name of one of the cluster nodes. The leader always checks that either the desired member (if specified) or one of the members is accessible and healthy before demoting. After the leader has demoted itself, the other nodes check that the desired node is healthy. If it is not, they participate in a leader race. In some cases (when accidentally there are no healthy nodes) the former leader can also participate in a leader race. Current implementation does not provide a REST API endpoint for manual failover.	2015-09-28 17:00:42 +02:00
Alexander Kukushkin	6e9cb60fd5	Restart and reinitialize via api POST /restart -- will restart postgres. If you are restarting the leader node, the lock will be maintained during restart. POST /reinitialize -- will reinitialize node from the leader. It's not possible to reinitialize current leader. Command will fail when the leader is unknown.	2015-09-24 14:52:03 +02:00
Alexander Kukushkin	d8982e1e5a	Refactor Postgresql.query method to use common retry mechanism query method in an api.py also needs retry in some cases (for example when we are running is_healthiest_node check). In all cases we should retry only when connection is closed or broken. BUT, the connection status must be checked via cursor.connection (old implementation was using general connection object for that). For multi-threaded applications this is not appropriate, because some other thread might restore connection. In addition to that I've changed most of the unit tests to use `Mock` and `patch` where it is possible.	2015-09-20 13:54:30 +02:00
Alexander Kukushkin	7f8e95b334	Next run of ha cycle is rescheduled depending on return value of `watch` Current etcd implementation does not yet support timeout option when `wait=true`: https://github.com/coreos/etcd/issues/2468 Originally I implemented the `watch` method for the `Etcd` class in the following manner: if the leader key was updated just because the master needed to update ttl and the watch timeout had not yet expired, I was recalculating the timeout and starting the `watch` call once again. Usually after "restart" we were getting urllib3.exceptions.TimeoutError. The only possible way to recover after such an exception is to close the socket and establish a new connection. With pure http it's relatively cheap, but with https and some kind of authorization on the etcd side it would become rather expensive and should be avoided.	2015-09-16 10:38:34 +02:00
Alexander Kukushkin	90cfcf0c14	Make zookeeper module compatible with python3	2015-09-14 17:14:39 +02:00
Alexander Kukushkin	209c985420	get_node and get_children should catch only NoNodeError exception. All other exceptions are needed to have retry functionality working correctly.	2015-09-14 11:45:00 +02:00
Alexander Kukushkin	f494d2ce64	Build Cluster object for ZooKeeper the same way as for Etcd Previous implementation was always setting Cluster.initialize to True. Also it was throwing ZooKeeperError when there were no members in a cluster. Plus BUGFIX of a bug introduced with https://github.com/zalando/patroni/pull/34 in a `load_members` method. - data = self.get_node(self.member_path) + data = self.get_node(self.members_path + member) It was always fetching the same node for all cluster members. Fortunately Etcd doesn't have such problem because we are fetching the whole cluster directory with one recursive API call.	2015-09-14 11:19:46 +02:00
Oleksii Kliukin	2377c417e4	Fix etcd and zookeeper interactions with initialize key. Fix unittests as well.	2015-09-10 16:05:10 +02:00
Oleksii Kliukin	938b946e55	Merge branch 'master' into feature/cleanup_on_failed_initialization	2015-09-10 15:43:31 +02:00
Alexander Kukushkin	36cbd34ffc	Fix zookeeper test coverage	2015-09-09 15:59:02 +02:00
Oleksii Kliukin	ff499604f0	Act on removal of initialization flag. If initializer node suddenly dies before the initialization is complete, other nodes should try to take over. Fix some unittests for etcd and zookeeper and add couple of new ones.	2015-09-08 16:04:54 +02:00
Oleksii Kliukin	92647b7aad	Merge branch 'master' of https://github.com/zalando/patroni into feature/cleanup_on_failed_initialization	2015-09-08 14:54:52 +02:00
Oleksii Kliukin	b842ed478b	Make sure initialize flag is reset on failure. Cleanup the initialize flag if the initializing node fails to bootstrap its PostgreSQL database. Rename dcs.race to initialize, since we only call it for the initialize flag. Factored out PostgreSQL bootstrapping code into a separate function.	2015-09-08 12:03:34 +02:00
Alexander Kukushkin	1774d6e31a	Merge branches watch-leader-key and package-refactoring	2015-09-05 16:09:28 +02:00
Alexander Kukushkin	650e244904	Refactor directory structure in preparation for building pypi-package	2015-09-04 16:06:44 +02:00
Alexander Kukushkin	10c95a23e4	Rename sleep to watch in AbstractDCS This method is supposed to watch for changes of the leader key if the current node is not the leader, and it could also watch for changes in the members list if the current node is the leader.	2015-09-01 09:59:37 +02:00
Alexander Kukushkin	3b1efff53e	Refactor `Cluster` object `Cluster.leader` is no longer a reference to `Member`, but to `Leader`. The `Leader` class contains the field `index` (update index). This field is very useful for watching for events that change the leader key. `Leader` also contains a `member` field, which should reference the real member.	2015-08-27 10:53:22 +02:00
Alexander Kukushkin	befd33555d	Refactor helpers/etcd.py Work with etcd cluster via high-level python-etcd module. Plus change all unit tests accordingly.	2015-08-24 16:58:08 +02:00
Alexander Kukushkin	dcad7a3229	Add exhibitor support List of ZooKeeper nodes could be periodically updated from Exhibitor. Since we know that each Exhibitor accompanies one ZooKeeper node, the list of Exhibitor nodes is also maintained. Exhibitor assumes that all ZooKeeper nodes are using the same client port, 2181. The same assumption is valid for Exhibitor, it should always listen on the same port on all nodes. The original list of Exhibitor nodes is cached and used as a fallback when it fails to query information using the maintained list.	2015-07-10 10:46:33 +02:00
Alexander Kukushkin	c49580d6a7	Rename governor into patroni	2015-07-08 10:37:35 +02:00
Alexander Kukushkin	43b12af3a7	Implement possibility to work against ZooKeeper This implementation is using the same interface (AbstractDCS) as Etcd class. It means that there should be no problem to implement another plugin to work against Consul for example.	2015-07-07 12:45:14 +02:00
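
Note on commit 03c2a85d23: the SQL expression in that message reads the timeline from the first 8 hexadecimal digits of the WAL file name. The same arithmetic can be sketched in Python as below. The function name `timeline_from_walfile_name` and the sample file names are hypothetical and chosen only for illustration; this is not Patroni code.

```python
def timeline_from_walfile_name(walfile_name: str) -> int:
    """Return the timeline ID encoded in a WAL file name.

    A WAL segment file name starts with 8 hexadecimal digits that hold
    the timeline ID, which is what SUBSTR(..., 1, 8) selects in the SQL.
    """
    prefix = walfile_name[:8]
    if len(prefix) < 8:
        raise ValueError("WAL file name is too short: %r" % walfile_name)
    # Same effect as ('x' || prefix)::bit(32)::int in the SQL expression:
    # interpret the 8 characters as a base-16 number.
    return int(prefix, 16)


if __name__ == "__main__":
    # Hypothetical WAL file names, used only to exercise the function.
    print(timeline_from_walfile_name("000000030000000000000010"))
    print(timeline_from_walfile_name("0000000A00000001000000FF"))
```

As the commit message says, this approach only helps where the current WAL file name is available, which is why the replicas use a replication connection instead.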

48 Commits