We will try to import only the modules that have a corresponding configuration section.
I.e. if there is only a zookeeper section in the config, Patroni will try to import only `patroni.dcs.zookeeper` and skip `etcd`, `consul`, and `kubernetes`.
This approach has two benefits:
1. When the corresponding dependencies were not installed, Patroni was showing INFO messages like `Failed to import smth`, which looked scary.
2. It reduces memory usage, because some of these dependencies are heavy.
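A minimal sketch of how such on-demand importing could look (the helper function is hypothetical; only the module names mirror `patroni.dcs.*`):

```python
import importlib
import logging

logger = logging.getLogger(__name__)

DCS_MODULES = ('consul', 'etcd', 'kubernetes', 'zookeeper')


def load_dcs_modules(config):
    """Import only the patroni.dcs submodules that have a configuration section."""
    loaded = {}
    for name in DCS_MODULES:
        if name not in config:
            continue  # no config section -- don't even try to import it
        try:
            loaded[name] = importlib.import_module('patroni.dcs.' + name)
        except ImportError as exc:
            # The message is now only shown for modules the user actually configured.
            logger.info('Failed to import patroni.dcs.%s: %r', name, exc)
    return loaded
```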
* Implement proper tests for `multiprocessing.set_start_method()`
* Exclude some watchdog code from coverage (it is used only for behave tests)
* Properly use `os.path.join()` for Windows compatibility
* Import DCS modules in `features/environment.py` on demand. This allows running behave tests against the chosen DCS without installing all dependencies.
* Remove some unused behave code
* Fix some minor issues in the `dcs.kubernetes` module
There is an opinion that LIST requests with a labelSelector are expensive for the K8s API, and Patroni was doing two such requests per HA loop (LIST pods and LIST endpoints/configmaps).
To efficiently detect object changes we will switch to the LIST+WATCH approach.
The initial LIST request populates the ObjectCache and events from the WATCH request update it.
In addition to that, the ObjectCache is also updated after performing UPDATE operations on the K8s objects. To avoid race conditions, every write into the ObjectCache compares the resource_version of the old and the new object and is rejected if the new resource_version value is smaller than the old one.
The disadvantage of such an approach is that it requires keeping three connections to the K8s API open from each Patroni pod (previously it was two).
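A rough sketch of this staleness check (class and attribute names here are illustrative, not the actual Patroni implementation):

```python
import threading


class ObjectCache:
    """Cache of K8s objects populated by LIST and kept up to date by WATCH and UPDATE results."""

    def __init__(self):
        self._lock = threading.Lock()
        self._objects = {}

    def set(self, name, new_obj):
        """Store new_obj unless the cache already holds a newer resource_version."""
        with self._lock:
            old_obj = self._objects.get(name)
            # Reject stale updates: a smaller resource_version means an older state of
            # the same object, e.g. a WATCH event that arrived after our own UPDATE.
            if old_obj and int(new_obj.metadata.resource_version) < int(old_obj.metadata.resource_version):
                return False
            self._objects[name] = new_obj
            return True
```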
Yesterday I deployed this feature branch on our biggest K8s cluster, with ~300 Patroni pods.
The CPU utilization on the K8s master nodes immediately dropped from ~20% to ~10% (i.e. halved), and the incoming traffic on the master nodes dropped ~7-8 times!
Last but not least, we see more or less the same impact on the etcd cluster behind the K8s master nodes: CPU utilization dropped nearly by half and outgoing traffic ~7-8 times.
Starting from PostgreSQL 12 the following recovery parameters can be changed without a restart, but Patroni didn't support that yet:
* archive_cleanup_command
* promote_trigger_file
* recovery_end_command
* recovery_min_apply_delay
In future postgres releases this list will be extended and Patroni will support it automatically.
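One way to keep this future-proof is to ask postgres itself whether a parameter is reloadable: the `context` column of `pg_settings` says whether a change needs a restart. A hedged sketch (the helper name is made up, and `psycopg2` is an assumption):

```python
import psycopg2


def requires_restart(conn, name):
    """Return True if changing the given parameter requires a postmaster restart."""
    with conn.cursor() as cur:
        cur.execute("SELECT context FROM pg_settings WHERE name = %s", (name,))
        row = cur.fetchone()
        # Only 'postmaster' parameters need a restart; 'sighup', 'superuser',
        # 'user', etc. can be applied with a reload.
        return row is not None and row[0] == 'postmaster'


# On PostgreSQL 12+ this should return False for the parameters listed above,
# e.g. recovery_min_apply_delay (on older versions they are not GUCs at all).
conn = psycopg2.connect('dbname=postgres')
print(requires_restart(conn, 'recovery_min_apply_delay'))
```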
The start of postgres happens in two stages:
1. First, Patroni waits for the postgres port to be open
2. After that, it waits for postgres to start accepting connections
There is a default timeout of 60 seconds for both stages (in total).
When the port isn't open, pg_isready exits with code=2.
If postgres is rejecting connections due to recovery, exit code=1.
In most cases postgres quickly opens the port and pg_isready starts returning 1, but in rare cases the whole timeout could be spent in stage 1.
After that, the HA loop keeps waiting for postgres to start, but it executes only the check from stage 2. Since the pg_isready exit code is still 2, Patroni was falsely assuming that the start failed, without taking into account that the postmaster process is up and running.
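For reference, the mapping of pg_isready exit codes to the two stages could look roughly like this (a simplified sketch, not the actual Patroni check; 0 = accepting connections, 1 = rejecting connections, e.g. still in recovery, 2 = no response):

```python
import subprocess
import time


def wait_for_postgres(host, port, timeout=60):
    """Wait for postgres to accept connections, distinguishing the two start stages."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        rc = subprocess.call(['pg_isready', '-h', host, '-p', str(port)])
        if rc == 0:  # stage 2 finished: postgres accepts connections
            return True
        # rc == 1: the port is open but connections are rejected (stage 1 finished).
        # rc == 2: the port is not open yet -- on its own this must not be treated
        # as "start failed", because the postmaster process may still be starting.
        time.sleep(1)
    return False
```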
Fixes https://github.com/zalando/patroni/issues/1160
The previous documentation was wrong and produced the following error when used:
Exception when parsing list {[{"name": "postgresql", "port": 5432}]}
After removing the surrounding braces, the error goes away and the endpoint is updated with the correct port name.
Previously the check_recovery_conf() function was only checking whether primary_conninfo had changed, never taking other recovery parameters into account.
Fixes https://github.com/zalando/patroni/issues/1201
Not doing so makes it hard to implement callbacks in bash and can eventually lead to a situation where two callbacks are running at the same time. If we fail to kill the child process, we still wait for it to finish.
The same problem could happen with a custom bootstrap; therefore, if we have to kill the custom bootstrap process, we also kill all of its child subprocesses.
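The underlying idea can be sketched with process groups (a POSIX-only sketch, not the actual Patroni code): start the script in its own session, so that killing the group also takes down any children a bash callback may have spawned.

```python
import os
import signal
import subprocess


def run_callback(cmd, timeout):
    """Run a callback in its own process group and kill the whole group on timeout."""
    # start_new_session=True puts the child into a new session/process group,
    # so a bash script and all of its children can be terminated together.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except OSError:
            pass  # even if the kill failed, we still wait for the child to finish
        return proc.wait()
```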
Closes https://github.com/zalando/patroni/issues/1238
When the system is under IO stress, `os.listdir()` could take a few seconds (or even minutes) to execute, which badly affects the HA loop of Patroni and could even cause the leader key to disappear from DCS due to the lack of updates.
There is a better and less expensive way to check that PGDATA is not empty: instead of doing the `os.listdir()` we simply check for the presence of the `global/pg_control` file in it.
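Something along these lines (a sketch, not the exact Patroni code):

```python
import os


def data_directory_empty(data_dir):
    """Treat PGDATA as empty unless it contains the global/pg_control file."""
    # A single stat() of a well-known file instead of listing a potentially
    # huge directory on a system under IO stress.
    return not os.path.isfile(os.path.join(data_dir, 'global', 'pg_control'))
```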
Recently it has happened twice that people tried to deploy a new cluster while the postgres data directory was neither empty nor valid. In this case Patroni was still creating the initialize key in DCS and trying to start postgres.
Now it will complain about a non-empty, invalid postgres data directory and exit.
Close https://github.com/zalando/patroni/issues/1216
The /history endpoint shows the content of the `history` key in DCS.
The /cluster endpoint shows all cluster members and some service info, like pending and scheduled restarts or switchovers.
In addition to that, implement `patronictl history`.
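The new endpoints can be queried like any other Patroni REST API endpoint, for example (host and port are illustrative):

```python
import json
import urllib.request

# Hypothetical Patroni REST API address
base_url = 'http://localhost:8008'

for endpoint in ('/cluster', '/history'):
    with urllib.request.urlopen(base_url + endpoint) as resp:
        print(endpoint, json.dumps(json.load(resp), indent=2))
```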
Close #586, #675, #1133
In addition to that, try to protect against the case when some recovery parameters are set in one of the included config files, by explicitly setting their values to an empty string on postgres 12.
Simplifies https://github.com/zalando/patroni/pull/1208
Specifically, there was a chance that `patronictl reinit --force` was overridden by recover, and we could end up in a situation where Patroni was trying to start postgres while the basebackup was still running.
* make it possible to use client certificates with REST API
* define a separate PatroniRequest class which handles all communication
* refactor patronictl to use the new class
* make Ha use the new class instead of calling requests.get. The old call wasn't taking certificates and basic-auth into account
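A rough sketch of the idea behind such a class, built directly on urllib3 (the config keys and class internals shown here are assumptions, not the actual Patroni API):

```python
import urllib3


class PatroniRequest:
    """Single place for REST API communication, so certificates and basic-auth are always applied."""

    def __init__(self, config):
        rest = config.get('restapi', {})
        self._pool = urllib3.PoolManager(
            cert_file=rest.get('certfile'),  # client certificate
            key_file=rest.get('keyfile'),
            ca_certs=rest.get('cafile'),
            cert_reqs='CERT_REQUIRED' if rest.get('cafile') else 'CERT_NONE',
        )
        auth = rest.get('authentication') or {}
        self._headers = urllib3.make_headers(
            basic_auth='{0}:{1}'.format(auth['username'], auth['password'])) if auth else {}

    def get(self, url):
        return self._pool.request('GET', url, headers=self._headers)
```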
Close#898
It is possible that some config files are not controlled by Patroni, and when somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that postgres will also be reloaded. But this didn't happen when there were no changes in the postgresql section of the Patroni config.
For example, one might replace ssl_cert_file and ssl_key_file on the filesystem; starting from PostgreSQL 10 this just requires a reload, but Patroni wasn't doing it.
In addition to that, fix the issue with handling of `wal_buffers`: the default value depends on `shared_buffers` and `wal_segment_size`, and therefore Patroni was exposing pending_restart even when the new value in the config was explicitly set to -1 (the default).
Close https://github.com/zalando/patroni/issues/1198
Watch requests to the K8s API either stream data or close the connection on a timeout. In any case it requires a second connection to be open, but opening a new connection every 10 seconds is more expensive for both Patroni and the K8s API.
Switching to the streaming model also brings other benefits: we can watch not only the leader object, but also the config, and wake up the Patroni main thread if the config has changed.
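Conceptually, the streaming watch looks like this (a sketch using `requests` against the raw K8s API, here via a local `kubectl proxy`; the URL and label selector are illustrative):

```python
import json
import requests

# One long-lived watch connection instead of a new LIST request every ~10 seconds.
# `watch=true` makes the API server stream newline-delimited JSON events until it
# decides to close the connection on its own timeout.
url = 'http://localhost:8001/api/v1/namespaces/default/endpoints'
params = {'labelSelector': 'application=patroni', 'watch': 'true'}
with requests.get(url, params=params, stream=True, timeout=(5, 65)) as resp:
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        # event['type'] is ADDED/MODIFIED/DELETED, event['object'] is the new state
        print(event['type'], event['object']['metadata']['name'])
```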
The PatroniLogger object is instantiated in the Patroni constructor, and down the road there might be a fatal error causing the Patroni process to exit, but a live thread prevents the normal shutdown.
In order to mitigate the issue and not lose the ability to use the logging infrastructure, we will switch to the QueueLogger only when the thread was explicitly started from the Patroni.run() method.
Continuation of https://github.com/zalando/patroni/pull/1178
Since it is based on a Thread with daemon set to True, the shutdown of the logger was very likely to happen too early, which was causing some lines not to appear at the destination.
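The standard-library building blocks behind such a two-phase setup look roughly like this (a simplified sketch, not the PatroniLogger implementation):

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.Queue()
root = logging.getLogger()

# Until the background thread is explicitly started, records go straight to the
# stream handler, so nothing is lost if the process exits before Patroni.run().
stream_handler = logging.StreamHandler()
root.addHandler(stream_handler)


def start_queue_logging():
    """Switch to queue-based logging only once the worker thread is really running."""
    listener = QueueListener(log_queue, stream_handler)
    listener.start()
    root.handlers[:] = [QueueHandler(log_queue)]
    return listener  # listener.stop() flushes the queue during a normal shutdown
```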
Close https://github.com/zalando/patroni/issues/1173