When Patroni was "joining" already running postgres it was not calling
callbacks, what in some cases causing issues (callback could be used to
change routing/load-balancer or assign/remove floating (service) ip.
In addition to that we should `start` postgres instead of `restart`-ing
it when doing recovery, because in this case 'on_start' callback should
be called, instead of 'on_restart'
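A minimal sketch of what firing the callback on join could look like (the script path, role and cluster name below are made up; Patroni passes the action, the current role and the cluster name to the configured callback script):

    import subprocess

    def call_callback(script, action, role, scope):
        # callbacks receive the action, the current role and the cluster name
        subprocess.Popen([script, action, role, scope])

    # after joining an already running postgres, fire 'on_start', not 'on_restart'
    call_callback('/etc/patroni/on_change.sh', 'on_start', 'replica', 'batman')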
In the original code we were parsing and deparsing url-style connection
strings back and forth. That was not particularly resource hungry, but it was
annoying, and it was also not obvious how to switch all local connections to
unix sockets (which are preferable).
This commit isolates the different use-cases of working with connection
strings and minimizes the amount of code parsing and deparsing them. It also
introduces one new helper method in the `Member` object - `conn_kwargs`.
This method accepts a dict with credentials (username and password) as a
parameter and returns a dict which can be used by `psycopg2.connect` or for
building connection urls for pg_rewind, pg_basebackup or other replica
creation methods.
Params for the local connection are built in the `_local_connect_kwargs`
method and can easily be switched to a unix socket later.
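A rough sketch of how the returned dict could be used (the contents of the dict are made up for illustration):

    import psycopg2

    # something like what Member.conn_kwargs({'user': ..., 'password': ...}) could return
    conn_kwargs = {'host': '10.0.0.2', 'port': '5432', 'database': 'postgres',
                   'user': 'replicator', 'password': 'secret'}

    # usable directly with psycopg2 ...
    conn = psycopg2.connect(**conn_kwargs)

    # ... or turned into a libpq connection string for pg_rewind/pg_basebackup
    conn_str = ' '.join('{0}={1}'.format(k, v) for k, v in sorted(conn_kwargs.items()))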
There is no point in trying to update the topology until the original request
has been performed. It is also more important for us to execute the original
request than to keep the topology of the etcd cluster in sync.
In addition to that, implement the same retry-timeout logic in the
`machines` property that is already used in the `api_execute` method.
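A simplified sketch of the deadline-based retry over the known etcd machines (function and variable names here are illustrative, not the actual Client code):

    import random
    import time
    import requests

    def get_machines(base_url):
        # etcd v2 exposes the list of cluster members at /v2/machines
        response = requests.get(base_url + '/v2/machines', timeout=2)
        response.raise_for_status()
        return [m.strip() for m in response.text.split(',')]

    def machines_with_retry(machines, retry_timeout=10):
        # try every known node in random order and keep retrying until the deadline
        deadline = time.time() + retry_timeout
        while True:
            for base_url in random.sample(machines, len(machines)):
                try:
                    return get_machines(base_url)
                except requests.RequestException:
                    continue
            if time.time() > deadline:
                raise RuntimeError('could not refresh the list of etcd machines')
            time.sleep(1)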
* Make different kazoo timeouts dependent on loop_wait
  ping timeout ~ 1/2 * loop_wait
  connect_timeout ~ 1/2 * loop_wait
Originally these values were calculated from the negotiated session timeout
and didn't work very well: it took a significant amount of time to figure out
that the connection was dead and to reconnect (up to the session timeout),
leaving us no time to retry.
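A minimal sketch of the intended split, assuming loop_wait comes from the configuration (applying the ping timeout requires hooking into kazoo's connection handling, which is omitted here):

    from kazoo.client import KazooClient

    loop_wait = 10  # seconds, from the configuration

    # leave roughly half of loop_wait for detecting a dead connection and half
    # for reconnecting, so there is still time in the cycle to retry
    ping_timeout = loop_wait / 2.0
    connect_timeout = loop_wait / 2.0

    client = KazooClient(hosts='127.0.0.1:2181')
    client.start(timeout=connect_timeout)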
* Address the code review
- We don't want to export the RestApi object, since it initializes the
socket and listens on it.
- Change get_dcs, so that an explicit scope passed to it takes priority
over the one in the configuration file.
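A minimal illustration of the intended precedence (not the actual get_dcs code):

    def resolve_scope(explicit_scope, config):
        # an explicitly passed scope wins over the one from the configuration file
        return explicit_scope if explicit_scope else config.get('scope')

    assert resolve_scope('batman', {'scope': 'from-config'}) == 'batman'
    assert resolve_scope(None, {'scope': 'from-config'}) == 'from-config'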
In particular, replace the fixed dates for the future actions
in the unit tests with dates that depend on the current date,
avoiding the "timebomb" effect.
Although such a situation should not happen in reality (the follow method is
not supposed to be called when the node is holding the leader lock and
postgres is running), it is better to be on the safe side and implement as
many checks as possible, because this method could potentially remove the
data directory.
The Client class takes care of retrying when the connection to an etcd node
fails. It calculates the number of retries and the timeout depending on the
etcd cluster size.
The Etcd class should not retry when the EtcdConnectionFailed exception is
raised (this case is already handled in the Client).
Besides that, adjust the retry timeouts in the Client class.
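The idea, roughly (names are illustrative): spread the overall retry budget across the nodes of the etcd cluster so that every machine gets at least one attempt before giving up:

    def per_node_timeout(retry_timeout, machines_count):
        # split the overall budget between the known etcd machines
        return max(1.0, float(retry_timeout) / max(machines_count, 1))

    def max_retries(machines_count):
        # one attempt per known etcd machine
        return max(machines_count, 1)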
Rewind is not possible when:
1) trying to rewind from itself
2) the leader is not reachable
3) the leader is_in_recovery
All these cases were leading to removal of the data directory...
In all cases except 1) it should "retry" once the leader becomes available
and is no longer is_in_recovery.
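A simplified sketch of these checks (the Leader structure and attribute names are illustrative, not Patroni's exact API):

    import collections

    Leader = collections.namedtuple('Leader', 'name conn_url in_recovery')

    def can_rewind_from(leader, my_name):
        if leader is None or not leader.conn_url:  # leader not known/reachable
            return False
        if leader.name == my_name:                 # never rewind from ourselves
            return False
        if leader.in_recovery:                     # leader must not be in recovery;
            return False                           # retry later instead of wiping data
        return True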
Fix return value in the should_run_scheduled_action and the comments.
Correct the json composition in the scheduled_restart test.
Fix the delete in case there is no scheduled restart.
Fix the usage of format in the logger output.
Fix the indentation in the evaluate_scheduled_restart.
Fix the condition related to the body_is_optional in the do_POST_restart.
Fix a few typos in the error messages.
Fix the _read_json_content.
Make the scheduled restart unit-tests a bit less ugly.
In addition to that, make the pg_ctl --timeout option configurable.
If the stop or start does not succeed within the given timeout when demoting
the master, the role will be forcibly changed to 'unknown' and all the
necessary callbacks will be executed.
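A minimal sketch of passing a configurable timeout to pg_ctl (the configuration option name and the surrounding code are not shown):

    import subprocess

    def pg_ctl_stop(data_dir, pg_ctl_timeout=60):
        # -t limits how long pg_ctl waits for the shutdown to complete
        return subprocess.call(['pg_ctl', 'stop', '-D', data_dir, '-m', 'fast',
                                '-t', str(pg_ctl_timeout), '-w'])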
Make sure the scheduled restart flag is cleared when the
postmaster_start_time changes after the time the restart was scheduled.
Additionally, separate the logic of checking the restart conditions
into its own function in order to support conditions for normal
restarts as well.
The scheduled restart data structures are currently independent of those
used by the normal restarts. This will be fixed in subsequent
commits.
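A sketch of the invalidation rule (dictionary keys and timestamps are illustrative): if postgres was started or restarted after the restart was scheduled, the pending scheduled restart is dropped:

    def scheduled_restart_is_stale(scheduled, current_postmaster_start_time):
        expected = scheduled.get('postmaster_start_time')
        return expected is not None and expected != current_postmaster_start_time

    pending = {'schedule': '2016-08-01T12:00:00+00:00',
               'postmaster_start_time': '2016-07-31 10:00:00.000 UTC'}
    if scheduled_restart_is_stale(pending, '2016-07-31 11:30:00.000 UTC'):
        pending = {}  # the restart was scheduled for a previous postmaster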
Add behave tests that cover POST /restart (but not DELETE).
The scheduled restart API extends the already existing restart
endpoint by processing the parameters in the request body.
Only one scheduled restart at a time is supported. The DELETE method
on the /restart endpoint is used to remove an existing scheduled restart.
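A hedged example of driving the endpoint (the request body format may differ from what Patroni actually accepts):

    import requests

    # schedule a restart for a point in the future
    requests.post('http://127.0.0.1:8008/restart',
                  json={'schedule': '2016-08-01T12:00:00+00:00'})

    # remove the pending scheduled restart
    requests.delete('http://127.0.0.1:8008/restart')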
It was causing patroni to fail to stop after receiving SIGTERM.
The acceptance tests were killing it with SIGKILL, which caused further tests to fail because postgres was still running:
2016-06-16 14:36:24,444 INFO: no action. i am the leader with the lock
2016-06-16 14:36:25,448 INFO: Lock owner: postgres0; I am postgres0
2016-06-16 14:36:25,452 ERROR: Failed to update /service/batman/optime/leader
Traceback (most recent call last):
File "/home/akukushkin/git/patroni/patroni/dcs/zookeeper.py", line 208, in write_leader_optime
self._client.retry(self._client.set, path, last_operation)
File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/client.py", line 273, in _retry
return self._retry.copy()(*args, **kwargs)
File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/retry.py", line 123, in __call__
return func(*args, **kwargs)
File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/client.py", line 1219, in set
return self.set_async(path, value, version).get()
File "/home/akukushkin/git/patroni/py2/local/lib/python2.7/site-packages/kazoo/handlers/utils.py", line 74, in get
self._condition.wait(timeout)
File "/usr/lib/python2.7/threading.py", line 340, in wait
waiter.acquire()
File "/home/akukushkin/git/patroni/patroni/utils.py", line 219, in sigterm_handler
sys.exit()
SystemExit
2016-06-16 14:36:25,453 INFO: no action. i am the leader with the lock
2016-06-16 14:36:26,443 INFO: Lock owner: postgres0; I am postgres0
2016-06-16 14:36:26,444 INFO: no action. i am the leader with the lock
These parameters usually must be the same across all cluster nodes and
therefore must be set only via the global configuration and always passed as
a list of postgres arguments (via pg_ctl), making it impossible to
accidentally change them with 'ALTER SYSTEM'.
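A sketch of what passing such parameters on the postmaster command line could look like (parameter names and values here are just examples); options set this way take precedence over postgresql.auto.conf, so ALTER SYSTEM cannot override them:

    import subprocess

    critical_params = {'max_connections': 100, 'wal_level': 'hot_standby',
                       'max_wal_senders': 5}

    options = ' '.join('--{0}={1}'.format(k, v)
                       for k, v in sorted(critical_params.items()))
    subprocess.call(['pg_ctl', 'start', '-D', '/var/lib/postgresql/data',
                     '-o', options, '-w'])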