This adds the `selector` field to the Patroni Kubernetes StatefulSet spec.
Without it, one gets errors like:
```
error: error validating "patroni_k8s.yaml": error validating data: ValidationError(StatefulSet.spec): missing required field "selector" in io.k8s.api.apps.v1.StatefulSetSpec; if you choose to ignore these errors, turn validation off with --validate=false
```
(as mentioned in #1867)
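For illustration, a minimal sketch of where `selector` sits in an `apps/v1` StatefulSet; the `patronidemo` name and `application: patroni` label are placeholders and must match the pod template labels:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: patronidemo
spec:
  serviceName: patronidemo
  replicas: 3
  # Without this selector, apps/v1 validation fails as shown above.
  selector:
    matchLabels:
      application: patroni
  template:
    metadata:
      labels:
        application: patroni  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: patroni
        image: patroni  # illustrative image name
```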
When running on K8s, Patroni communicates with the API via the `kubernetes` service, whose address is exposed via the
`KUBERNETES_SERVICE_HOST` environment variable. Like any other service, the `kubernetes` service is handled by `kube-proxy`, which, depending on its configuration, relies on either a userspace program or `iptables` for traffic routing.
During a K8s upgrade, when master nodes are replaced, it is possible that `kube-proxy` doesn't update the service configuration in time; as a result, Patroni fails to update the leader lock and demotes Postgres.
In order to improve the user experience and gain more control over the problem, we make it possible to bypass the `kubernetes` service and connect directly to the API nodes.
The strategy is very simple:
1. Resolve the list of API node IPs from the `kubernetes` endpoint on every iteration of the HA loop.
2. Stick to one of these IPs for API requests.
3. Switch to a different IP if the currently used IP is no longer in the list.
4. If a request fails, switch to another IP and retry.
Such a strategy is already used for Etcd and has proven to work quite well.
To enable the feature, either set `kubernetes.bypass_api_service` to `true` in the Patroni configuration file or set the `PATRONI_KUBERNETES_BYPASS_API_SERVICE` environment variable.
If for some reason `GET /default/endpoints/kubernetes` isn't allowed, Patroni will disable the feature.
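A minimal sketch of the configuration excerpt (YAML), assuming nothing beyond the option named above:
```yaml
# patroni.yml (excerpt): connect directly to the API nodes,
# bypassing the `kubernetes` service.
kubernetes:
  bypass_api_service: true
```
The environment variable form (`PATRONI_KUBERNETES_BYPASS_API_SERVICE=true`) is equivalent.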
Readiness probes could be useful to eliminate "unhealthy" pods from the subset addresses when K8s services with label selectors are used.
A real-life example: the node where the primary was running has failed and is being shut down, and Patroni can't update (remove) the role label.
Therefore, on OpenShift the leader service will have two pods assigned, one of which is the failed primary.
With the readiness probe defined, the failed primary pod will be excluded from the list.
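A minimal sketch of such a probe, assuming the Patroni REST API listens on port 8008 and exposes a `/readiness` endpoint (adjust to your REST API configuration):
```yaml
# Pod template excerpt: when the probe fails, the kubelet marks the pod
# NotReady and the endpoints controller drops it from the service subsets.
readinessProbe:
  httpGet:
    path: /readiness
    port: 8008
  initialDelaySeconds: 3
  periodSeconds: 10
```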
OpenShift enforces `securityContext.fsGroup` for block devices and sets group sticky bits on volumeMounts.
This leads to Patroni pods failing to start after the first restart:
```
2020-01-13 14:46:13.695 UTC [143] FATAL: data directory "/home/postgres/pgdata/pgroot/data" has invalid permissions
2020-01-13 14:46:13.695 UTC [143] DETAIL: Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
```
An initContainer which undoes the OpenShift tampering solves the issue. I stole the solution from the stable postgres helm chart:
https://github.com/helm/charts/pull/14540/files
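A condensed sketch of that approach; the image, volume name, and mount path are placeholders and must match the main container:
```yaml
# Pod template excerpt: restore the permissions PostgreSQL requires
# before the main container starts.
initContainers:
- name: fix-permissions
  image: busybox  # any image providing chmod works
  command:
  - sh
  - -c
  - |
    mkdir -p /home/postgres/pgdata/pgroot/data
    chmod 700 /home/postgres/pgdata/pgroot/data
  volumeMounts:
  - name: pgdata  # must match the main container's volumeMount
    mountPath: /home/postgres/pgdata
```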
Tested on OpenShift v3.11
Note: This error does not occur when using shared filesystems (like NFS).
If there is no service defined, K8s assumes that the endpoint is orphaned and removes it.
Patroni tries to create the service only when `use_endpoints` is enabled, in the following cases:
1. Upon start
2. When it tries to (re-)create the config endpoint
If for some reason creation of the service has failed, Patroni will retry it on every cycle of the HA loop. Usually it fails due to a lack of permissions; if you don't want to grant such permissions to the service account used by Patroni, you can create the service explicitly in the deployment manifest.
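A minimal sketch of such an explicitly created service, assuming a cluster named `patronidemo`; note that no `selector` is set, because with `use_endpoints` Patroni maintains the endpoint subsets itself:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: patronidemo  # must match the leader endpoint name
  labels:
    application: patroni  # illustrative labels
    cluster-name: patronidemo
spec:
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
```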
- Update the postgres docker image to the latest 11.x version.
- Remove empty lines inside the `RUN` command to make the Dockerfile compatible with future docker versions.
- Set the `PATRONI_KUBERNETES_POD_IP` environment variable, which is required when _use_endpoints_ is enabled; otherwise a `KeyError` is raised [here](https://github.com/zalando/patroni/blob/master/patroni/dcs/kubernetes.py#L95). See the snippet after this list.
- Set the `EDITOR` environment variable to allow configuration changes via `patronictl edit-config`.
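The pod IP is typically injected via the downward API; a minimal sketch of the container `env` entry:
```yaml
# Pod template excerpt: expose the pod's own IP to Patroni.
env:
- name: PATRONI_KUBERNETES_POD_IP
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
```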
- It modifies the Dockerfile and entrypoint slightly to allow for OpenShift SCCs to operate correctly
- It adds 2 template examples that can be easily modified by changing parameters
Fixes #572
`patronictl remove` deletes the cluster configuration (stored either in ConfigMaps or Endpoints) and cannot be run from the postgres pod without the `delete` verb on those objects being granted to the pod's service account.
Adds 3 resources that will properly set up the RBAC:
1. a ServiceAccount, which is also assigned to the pods of the cluster, so that they use those particular permissions
2. a Role, which holds only the necessary permissions that Patroni members need to interact with the K8s cluster
3. a RoleBinding, which ties the two former resources together
The Role and RoleBinding were created using the tool https://github.com/liggitt/audit2rbac, which looks at [audit logs](https://kubernetes.io/docs/tasks/debug-application-cluster/audit/#advanced-audit) provided by the API server.
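A condensed sketch of the three resources; the exact resources and verbs depend on your configuration (e.g. ConfigMaps vs. Endpoints), so treat the rules below as illustrative rather than exhaustive:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: patroni
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: patroni
rules:
# Illustrative rule set; derive the real one (e.g. with audit2rbac)
# from your own cluster's audit logs.
- apiGroups: [""]
  resources: [configmaps, endpoints, pods, services]
  verbs: [get, list, watch, create, update, patch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: patroni
subjects:
- kind: ServiceAccount
  name: patroni
roleRef:
  kind: Role
  name: patroni
  apiGroup: rbac.authorization.k8s.io
```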
* Use ConfigMaps or Endpoints for leader elections and to keep cluster state
* Label pods with a postgres role
* Change the behavior of `pip install`: from now on it will not install all dependencies; you have to explicitly specify the DCS you want to use Patroni with: `pip install patroni[etcd,zookeeper,kubernetes]`