Do not exit when the cluster system ID is empty or does not pass the validation check. In that case the cluster most likely needs a reinit; mention it in the result message.
Avoid terminating Patroni, because otherwise a reinit cannot happen.
We already have a lot of logic in place to prevent failover in such a case and to restore all keys, but an accidental removal of the `/config` key was effectively switching off pause mode for one cycle of the HA loop.
Upon start, the postmaster process performs various safety checks if there is a postmaster.pid file in the data directory. Even though Patroni has already detected that the process referenced by postmaster.pid is not a postmaster, the new postmaster might fail to start, because it thinks that postmaster.pid is still locked.
Important!!! Unlinking postmaster.pid isn't an option in this case, because it opens up a lot of nasty race conditions.
Luckily there is a workaround for this problem: we can pass the pid from postmaster.pid in the `PG_GRANDPARENT_PID` environment variable, and the postmaster will ignore it.
You are more likely to hit this problem if you run Patroni and postgres in a docker container.
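A minimal sketch of that workaround, assuming a helper that starts postgres directly (the function name and invocation are illustrative, not Patroni's actual code):

```python
import os
import subprocess


def start_postgres_with_stale_pidfile_workaround(data_dir):
    """Pass the pid found in postmaster.pid via PG_GRANDPARENT_PID so the new
    postmaster does not treat that pid as a live lock holder."""
    env = os.environ.copy()
    pidfile = os.path.join(data_dir, 'postmaster.pid')
    try:
        with open(pidfile) as f:
            env['PG_GRANDPARENT_PID'] = f.readline().strip()
    except OSError:
        pass  # no pidfile or unreadable -- nothing to work around
    return subprocess.Popen(['postgres', '-D', data_dir], env=env)
```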
On Kubernetes 1.10.0 I experienced an issue where calls to `patch_or_create` were failing when bootstrapping a cluster. The calls were failing because `self._leader_observed_subsets` was `None` instead of `[]`.
It didn't directly affect failover or switchover, but in some rare cases success was reported too early, when the former leader released the lock: `Failed over to "None" instead of "desired-node"`
In addition to that, this commit improves logs and status messages by differentiating between failover and switchover.
Patroni can attach itself to an already running PostgreSQL instance. If that is the first instance "seen" in the given cluster, Patroni for that instance will create the initialize key, grab the leader key and, if the instance is running as a replica, promote it.
Because of this behavior, when a cluster with a master and one or more replicas gets Patroni for each node, it is imperative to start running Patroni on the master node before moving on to the replicas.
This commit changes that surprising behavior: Patroni will now abort its start if there is no initialize key in DCS and postgres is running as a replica.
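A rough sketch of the new check; the names and structure below are assumptions for illustration, not the actual Patroni code:

```python
def abort_if_replica_without_initialize_key(cluster, postgresql):
    # Refuse to start when the DCS has no initialize key but the locally
    # running postgres is a replica: attaching in that state would wrongly
    # create the initialize key and could promote the replica.
    if not cluster.initialize and postgresql.is_running() and postgresql.is_replica():
        raise SystemExit('Refusing to start: no initialize key in DCS and '
                         'postgres is running as a replica')
```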
Closes https://github.com/zalando/patroni/issues/655
Because they are indeed case insensitive.
Most of the parameters have a snake_case name, but there are three exceptions to this rule: DateStyle, IntervalStyle and TimeZone.
In fact, if you specify timezone = 'some/tzn' it still works, but Patroni wasn't able to find 'timezone' in pg_settings and was stripping this parameter out.
We will use CaseInsensitiveDict to keep postgresql.parameters. This change affects only the "final" configuration. That means if you put "duplicates" (work_mem vs WORK_MEM) into the patroni yaml or into the cluster config, they are resolved only at the last stage, and you will, for example, still see both values if you use `patronictl edit-config`.
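A self-contained sketch of such a case-insensitive mapping; this is only an illustration of the behavior, not Patroni's actual class:

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Lookups ignore key case; the originally written case is preserved."""

    def __init__(self, data=None):
        self._store = {}
        for key, value in (data or {}).items():
            self[key] = value

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


params = CaseInsensitiveDict({'TimeZone': 'Europe/Berlin'})
print(params['timezone'])  # 'Europe/Berlin'; DateStyle and IntervalStyle behave the same
```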
Fixes https://github.com/zalando/patroni/issues/649
pgBackRest's restore command generates the appropriate recovery.conf based
on the parameters you provide to pgBackRest. When pgBackRest's restore command is called
via Patroni's custom bootstrap, Patroni deletes that recovery.conf. Specifying the recovery.conf
information in the patroni.yml is less than ideal. It prevents leveraging pgBackRest's
work to ensure recovery.conf files are properly generated. It also can lead to transient
config data in the patroni.yml under certain restore cases, such as a PITR restore
of Cluster B to Cluster A, where the restore_command in A needs to reference B.
The parameter is optional. The default behavior is to delete the recovery.conf.
Fixes https://github.com/zalando/patroni/issues/637
* `PATRONI_LOGLEVEL` - sets the general logging level
* `PATRONI_REQUESTS_LOGLEVEL` - sets the logging level for all HTTP requests e.g. Kubernetes API calls
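A minimal sketch of how these two variables could be wired into Python's logging module; the logger name being silenced here (`requests`) is an assumption, and Patroni's real setup may differ:

```python
import logging
import os

# General log level for Patroni itself.
logging.basicConfig(level=os.environ.get('PATRONI_LOGLEVEL', 'INFO'))

# Separate, usually less verbose, level for the HTTP request machinery
# (e.g. the client libraries used for Kubernetes API calls).
logging.getLogger('requests').setLevel(
    os.environ.get('PATRONI_REQUESTS_LOGLEVEL', 'WARNING'))
```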
Adds 3 resources that will properly set up the RBAC:
1. a service account, which is also assigned to the pods of the cluster, so that they use those particular permissions
2. a role, which holds only the necessary permissions that patroni members need to interact with the k8s cluster
3. a rolebinding, which binds the two former resources together
The role and rolebinding were created using this tool https://github.com/liggitt/audit2rbac which looks at [audit logs](https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#advanced-audit) provided by the api server.
Output the exception traceback to the logs when the http status code == 403: something is wrong with permissions.
When the http status code == 409, such an error can be ignored, because the object was probably created or updated by another process.
For all other http status codes it will also produce stack traces.
I hope it will help to debug issues similar to https://github.com/zalando/patroni/issues/606
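For illustration, a sketch of that logging policy; the function and logger names are made up and this is not the actual Patroni code:

```python
import logging

logger = logging.getLogger(__name__)


def log_k8s_api_failure(status_code):
    """Intended to be called from inside an `except` block, so that
    logger.exception() can attach the current traceback."""
    if status_code == 409:
        # Conflict: the object was most likely created or updated concurrently
        # by another process, so this error can be safely ignored.
        logger.debug('Ignoring HTTP 409 from the Kubernetes API')
    elif status_code == 403:
        # Something is wrong with permissions (RBAC); include the traceback.
        logger.exception('Permission denied by the Kubernetes API')
    else:
        logger.exception('Unexpected error from the Kubernetes API (HTTP %s)', status_code)
```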
It is very easy to get the current timeline on the master by executing
```sql
SELECT ('x' || SUBSTR(pg_walfile_name(pg_current_wal_lsn()), 1, 8))::bit(32)::int
```
Unfortunately the same method doesn't work when postgres is in recovery. Therefore we will use a replication connection for that on the replicas. In order to avoid opening and closing a replication connection on every HA loop, we will cache the result if its value matches the timeline of the master.
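As an illustration of the replica side, a timeline can be obtained over a replication connection with the `IDENTIFY_SYSTEM` command. The sketch below assumes psycopg2 and placeholder connection parameters; it is not the code used by Patroni:

```python
import psycopg2
from psycopg2.extras import PhysicalReplicationConnection

# The connecting user needs the REPLICATION privilege.
conn = psycopg2.connect('host=127.0.0.1 port=5432 user=replicator dbname=postgres',
                        connection_factory=PhysicalReplicationConnection)
cur = conn.cursor()
cur.execute('IDENTIFY_SYSTEM')
system_id, timeline, flush_lsn, _ = cur.fetchone()
print(timeline)
```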
Also this PR introduces a new key in DCS: `/history`. It will contain a json-serialized object with the timeline history, in a format similar to the usual history files. The differences are:
* The second column is the absolute wal position in bytes, instead of an LSN
* Optionally there might be a fourth column, a timestamp (the mtime of the history file)
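To make the format concrete, a purely illustrative example follows; every value in it is invented:

```python
import json

# [timeline, absolute wal position in bytes, reason, optional timestamp]
history = [
    [1, 25623960, 'no recovery target specified', '2018-06-01T09:30:00+00:00'],
    [2, 26395432, 'no recovery target specified', '2018-06-02T11:12:34+00:00'],
]
print(json.dumps(history))  # roughly what could be stored under /history
```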
On every iteration of the HA loop Patroni needs to call pg_is_in_recovery() and calculate the absolute wal position. It was doing two separate SELECT statements for that. On the master it was doing even three queries (the wal position twice).
We will issue one SELECT for every HA loop and cache the results.
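A minimal sketch of this idea; both the query and the wrapper below are illustrative, not Patroni's actual implementation:

```python
# Hypothetical single statement combining the recovery check with the
# absolute wal position in bytes.
ONE_SHOT_QUERY = """
SELECT pg_catalog.pg_is_in_recovery(),
       pg_catalog.pg_wal_lsn_diff(
           CASE WHEN pg_catalog.pg_is_in_recovery()
                THEN pg_catalog.pg_last_wal_replay_lsn()
                ELSE pg_catalog.pg_current_wal_lsn()
           END, '0/0')::bigint
"""


def refresh_node_state(connection):
    """Run the combined query once and return (is_in_recovery, wal_position),
    so callers can reuse the cached tuple for the rest of the HA loop."""
    with connection.cursor() as cur:
        cur.execute(ONE_SHOT_QUERY)
        return cur.fetchone()
```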
* Use scope from config file when listing members
* Add version command to patronictl
* Only delete the leader key on shutdown when we hold the lock, to avoid exceptions when the leader key does not exist
* Add a timestamp option to list command.
* YAML format for patronictl output
* Fix API request to get version
Make it possible to cancel a running task if you want to reinitialize a replica.
There are two possible ways to trigger it:
1. patronictl will ask whether you want to cancel the already running task if an attempt to trigger reinitialize has failed
2. if you are using `--force` argument with `patronictl reinit`
This list will be used for initial discovery of etcd cluster members.
If for some reason this list of hosts gets exhausted at runtime, Patroni will fall back to the initial list.
In addition to that, improve ipv6 compatibility by using a special function for splitting host and port.
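A minimal sketch of such a host/port splitting helper that also handles bracketed IPv6 literals; this is illustrative, not Patroni's actual function:

```python
def split_host_port(value, default_port):
    """Split 'host:port' into (host, port), handling '[ipv6]:port' notation.
    Unbracketed IPv6 literals are not handled in this simplified sketch."""
    if value.startswith('['):
        host, _, rest = value[1:].partition(']')
        port = rest.lstrip(':') or default_port
    else:
        host, _, port = value.partition(':')
        port = port or default_port
    return host, int(port)


print(split_host_port('[2001:db8::1]:2379', 2379))  # ('2001:db8::1', 2379)
print(split_host_port('127.0.0.1:2379', 2379))      # ('127.0.0.1', 2379)
print(split_host_port('etcd-host', 2379))            # ('etcd-host', 2379)
```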
Fixes https://github.com/zalando/patroni/issues/523