move admin related docs into docs/admin
docs/admin/README.md (new file, 92 lines)
@@ -0,0 +1,92 @@

<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>

<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Kubernetes Cluster Admin Guide

The cluster admin guide is for anyone creating or administering a Kubernetes cluster.
It assumes some familiarity with concepts in the [User Guide](user-guide.md).

## Planning a cluster

There are many different examples of how to set up a Kubernetes cluster. Many of them are listed in this
[matrix](getting-started-guides/README.md). We call each of the combinations in this matrix a *distro*.

Before choosing a particular guide, here are some things to consider:
- Are you just looking to try out Kubernetes on your laptop, or build a high-availability many-node cluster? Both
  models are supported, but some distros are better for one case or the other.
- Will you be using a hosted Kubernetes cluster, such as [GKE](https://cloud.google.com/container-engine), or setting
  one up yourself?
- Will your cluster be on-premises, or in the cloud (IaaS)? Kubernetes does not directly support hybrid clusters. We
  recommend setting up multiple clusters rather than spanning distant locations.
- Will you be running Kubernetes on "bare metal" or virtual machines? Kubernetes supports both, via different distros.
- Do you just want to run a cluster, or do you expect to do active development of kubernetes project code? If the
  latter, it is better to pick a distro actively used by other developers. Some distros only use binary releases, but
  others offer a greater variety of choices.
- Not all distros are maintained as actively. Prefer ones which are listed as tested on a more recent version of
  Kubernetes.
- If you are configuring kubernetes on-premises, you will need to consider what [networking
  model](networking.md) fits best.
- If you are designing for very [high-availability](availability.md), you may want multiple clusters in multiple zones.

## Setting up a cluster

Pick one of the Getting Started Guides from the [matrix](getting-started-guides/README.md) and follow it.
If none of the Getting Started Guides fits, you may want to pull ideas from several of the guides.

One option for custom networking is *OpenVSwitch GRE/VxLAN networking* ([ovs-networking.md](ovs-networking.md)), which
uses OpenVSwitch to set up networking between pods across
Kubernetes nodes.

If you are modifying an existing guide which uses Salt, this document explains [how Salt is used in the Kubernetes
project](salt.md).

## Upgrading a cluster

[Upgrading a cluster](cluster-management.md).

## Managing nodes

[Managing nodes](node.md).

## Optional Cluster Services

* **DNS Integration with SkyDNS** ([dns.md](dns.md)):
  Resolving a DNS name directly to a Kubernetes service.

* **Logging** with [Kibana](logging.md)

## Multi-tenant support

* **Namespaces** ([namespaces.md](namespaces.md)): Namespaces help different
  projects, teams, or customers to share a kubernetes cluster.

* **Resource Quota** ([resource_quota_admin.md](resource_quota_admin.md))

## Security

* **Kubernetes Container Environment** ([container-environment.md](container-environment.md)):
  Describes the environment for Kubelet managed containers on a Kubernetes
  node.

* **Securing access to the API Server** [accessing the api](accessing-the-api.md)

* **Authentication** [authentication](authentication.md)

* **Authorization** [authorization](authorization.md)

* **Admission Controllers** [admission controllers](admission-controllers.md)

docs/admin/accessing-the-api.md (new file, 94 lines)
@@ -0,0 +1,94 @@

# Configuring APIserver ports

This document describes what ports the kubernetes apiserver
may serve on and how to reach them. The audience is
cluster administrators who want to customize their cluster
or understand the details.

Most questions about accessing the cluster are covered
in [Accessing the cluster](accessing-the-cluster.md).

## Ports and IPs Served On

The Kubernetes API is served by the Kubernetes APIServer process. Typically,
there is one of these running on a single kubernetes-master node.

By default the Kubernetes APIserver serves HTTP on 2 ports:
1. Localhost Port
   - serves HTTP
   - default is port 8080, change with `--insecure-port` flag.
   - default IP is localhost, change with `--insecure-bind-address` flag.
   - no authentication or authorization checks in HTTP
   - protected by the need to have host access
2. Secure Port
   - default is port 6443, change with `--secure-port` flag.
   - default IP is first non-localhost network interface, change with `--bind-address` flag.
   - serves HTTPS. Set cert with `--tls-cert-file` and key with `--tls-private-key-file` flag.
   - uses token-file or client-certificate based [authentication](authentication.md).
   - uses policy-based [authorization](authorization.md).
3. Removed: ReadOnly Port
   - For security reasons, this had to be removed. Use the service account feature instead.
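
As a concrete sketch (not taken from any particular distro; the certificate paths and addresses are illustrative assumptions), an apiserver wired up this way might be started with flags like:

```
kube-apiserver \
  --insecure-bind-address=127.0.0.1 --insecure-port=8080 \
  --bind-address=0.0.0.0 --secure-port=6443 \
  --tls-cert-file=/srv/kubernetes/server.cert \
  --tls-private-key-file=/srv/kubernetes/server.key
```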

## Proxies and Firewall rules

Additionally, in some configurations there is a proxy (nginx) running
on the same machine as the apiserver process. The proxy serves HTTPS protected
by Basic Auth on port 443, and proxies to the apiserver on localhost:8080. In
these configurations the secure port is typically set to 6443.

A firewall rule is typically configured to allow external HTTPS access to port 443.

The above are defaults and reflect how Kubernetes is deployed to Google Compute Engine using
kube-up.sh. Other cloud providers may vary.
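
To illustrate that path (the credentials, master address, and API path below are placeholders, not values prescribed by this document), an external client would hit the proxy on port 443 with basic auth:

```
# -k skips server certificate verification; use --cacert with the cluster CA in practice
curl -k -u admin:PASSWORD https://MASTER_IP/api/v1/nodes
```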

## Use Cases vs IP:Ports

There are three differently configured serving ports because there are a
variety of use cases:
1. Clients outside of a Kubernetes cluster, such as a human running `kubectl`
   on a desktop machine. Currently, these access the Localhost Port via a proxy (nginx)
   running on the `kubernetes-master` machine. The proxy uses bearer token authentication.
2. Processes running in Containers on Kubernetes that need to read from
   the apiserver. Currently, these can use a service account.
3. Scheduler and Controller-manager processes, which need to do read-write
   API operations. Currently, these have to run on the same host as the
   apiserver and use the Localhost Port. In the future, these will be
   switched to using service accounts to avoid the need to be co-located.
4. Kubelets, which need to do read-write API operations and are necessarily
   on different machines than the apiserver. Kubelets use the Secure Port
   to get their pods, to find the services that a pod can see, and to
   write events. Credentials are distributed to kubelets at cluster
   setup time.

## Expected changes
- Policy will limit the actions kubelets can do via the authed port.
- Kubelets will change from token-based authentication to cert-based auth.
- Scheduler and Controller-manager will use the Secure Port too. They
  will then be able to run on different machines than the apiserver.
- A general mechanism will be provided for [giving credentials to
  pods](https://github.com/GoogleCloudPlatform/kubernetes/issues/1907).
- Clients, like kubectl, will all support token-based auth, and the
  Localhost Port will no longer be needed, and will not be the default.
  However, the localhost port may continue to be an option for
  installations that want to do their own auth proxy.

docs/admin/admission-controllers.md (new file, 146 lines)
@@ -0,0 +1,146 @@

# Admission Controllers

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Admission Controllers](#admission-controllers)
  - [What are they?](#what-are-they?)
  - [Why do I need them?](#why-do-i-need-them?)
  - [How do I turn on an admission control plug-in?](#how-do-i-turn-on-an-admission-control-plug-in?)
  - [What does each plug-in do?](#what-does-each-plug-in-do?)
    - [AlwaysAdmit](#alwaysadmit)
    - [AlwaysDeny](#alwaysdeny)
    - [DenyExecOnPrivileged](#denyexeconprivileged)
    - [ServiceAccount](#serviceaccount)
    - [SecurityContextDeny](#securitycontextdeny)
    - [ResourceQuota](#resourcequota)
    - [LimitRanger](#limitranger)
    - [NamespaceExists](#namespaceexists)
    - [NamespaceAutoProvision (deprecated)](#namespaceautoprovision-(deprecated))
    - [NamespaceLifecycle](#namespacelifecycle)
  - [Is there a recommended set of plug-ins to use?](#is-there-a-recommended-set-of-plug-ins-to-use?)

<!-- END MUNGE: GENERATED_TOC -->

## What are they?

An admission control plug-in is a piece of code that intercepts requests to the Kubernetes
API server prior to persistence of the object, but after the request is authenticated
and authorized. The plug-in code is in the API server process
and must be compiled into the binary in order to be used at this time.

Each admission control plug-in is run in sequence before a request is accepted into the cluster. If
any of the plug-ins in the sequence rejects the request, the entire request is rejected immediately
and an error is returned to the end-user.

Admission control plug-ins may mutate the incoming object in some cases to apply system configured
defaults. In addition, admission control plug-ins may mutate related resources as part of request
processing to do things like increment quota usage.

## Why do I need them?

Many advanced features in Kubernetes require an admission control plug-in to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission control plug-ins is an incomplete server and will not
support all the features you expect.

## How do I turn on an admission control plug-in?

The Kubernetes API server supports a flag, ```admission_control```, that takes a comma-delimited,
ordered list of admission control choices to invoke prior to modifying objects in the cluster.

## What does each plug-in do?

### AlwaysAdmit

Use this plug-in by itself to pass through all requests.

### AlwaysDeny

Rejects all requests. Used for testing.

### DenyExecOnPrivileged

This plug-in will intercept all requests to exec a command in a pod if that pod has a privileged container.

If your cluster supports privileged containers, and you want to restrict the ability of end-users to exec
commands in those containers, we strongly encourage enabling this plug-in.

### ServiceAccount

This plug-in implements automation for [serviceAccounts](service_accounts.md).
We strongly recommend using this plug-in if you intend to make use of Kubernetes ```ServiceAccount``` objects.

### SecurityContextDeny

This plug-in will deny any pod with a [SecurityContext](security_context.md) that defines options that were not available on the ```Container```.

### ResourceQuota

This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the ```ResourceQuota``` object in a ```Namespace```. If you are using ```ResourceQuota```
objects in your Kubernetes deployment, you MUST use this plug-in to enforce quota constraints.

See the [resourceQuota design doc](design/admission_control_resource_quota.md).

It is strongly encouraged that this plug-in is configured last in the sequence of admission control plug-ins. This is
so that quota is not prematurely incremented only for the request to be rejected later in admission control.

### LimitRanger

This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the ```LimitRange``` object in a ```Namespace```. If you are using ```LimitRange``` objects in
your Kubernetes deployment, you MUST use this plug-in to enforce those constraints.

See the [limitRange design doc](design/admission_control_limit_range.md).

### NamespaceExists

This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes ```Namespace```
and reject the request if the ```Namespace``` was not previously created. We strongly recommend running
this plug-in to ensure integrity of your data.

### NamespaceAutoProvision (deprecated)

This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes ```Namespace```
and create a new ```Namespace``` if one did not already exist previously.

We strongly recommend ```NamespaceExists``` over ```NamespaceAutoProvision```.

### NamespaceLifecycle

This plug-in enforces that a ```Namespace``` that is undergoing termination cannot have new content created in it.

A ```Namespace``` deletion kicks off a sequence of operations that remove all content (pods, services, etc.) in that
namespace. In order to enforce integrity of that process, we strongly recommend running this plug-in.

Once ```NamespaceAutoProvision``` is deprecated, we anticipate ```NamespaceLifecycle``` and ```NamespaceExists``` will
be merged into a single plug-in that enforces the life-cycle of a ```Namespace``` in Kubernetes.

## Is there a recommended set of plug-ins to use?

Yes.

For Kubernetes 1.0, we strongly recommend running the following set of admission control plug-ins (order matters):

```shell
--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
```

docs/admin/authentication.md (new file, 59 lines)
@@ -0,0 +1,59 @@

# Authentication Plugins

Kubernetes uses client certificates, tokens, or http basic auth to authenticate users for API calls.

Client certificate authentication is enabled by passing the `--client_ca_file=SOMEFILE`
option to apiserver. The referenced file must contain one or more certificate authorities
to use to validate client certificates presented to the apiserver. If a client certificate
is presented and verified, the common name of the subject is used as the user name for the
request.
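
For example, assuming a client certificate and key signed by one of those certificate authorities (the file names, master address, and URL path below are illustrative only), a raw request against the secure port could look like:

```
curl --cacert ca.crt --cert client.crt --key client.key \
  https://MASTER_IP:6443/api/v1/namespaces
```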

Token authentication is enabled by passing the `--token_auth_file=SOMEFILE` option
to apiserver. Currently, tokens last indefinitely, and the token list cannot
be changed without restarting apiserver. We plan in the future for tokens to
be short-lived, and to be generated as needed rather than stored in a file.

The token file format is implemented in `plugin/pkg/auth/authenticator/token/tokenfile/...`
and is a csv file with 3 columns: token, user name, user uid.

When using token authentication from an http client, the apiserver expects an `Authorization`
header with a value of `Bearer SOMETOKEN`.
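
A minimal sketch of a token file entry and a matching request (the token, user, and URL are made-up values):

```
# tokens.csv, columns: token,user name,user uid
31ada4fd-adec-460c-809a-9e56ceb75269,alice,1

# matching request; the URL path is shown for illustration only
curl -H "Authorization: Bearer 31ada4fd-adec-460c-809a-9e56ceb75269" https://MASTER_IP:6443/api/v1/pods
```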

Basic authentication is enabled by passing the `--basic_auth_file=SOMEFILE`
option to apiserver. Currently, the basic auth credentials last indefinitely,
and the password cannot be changed without restarting apiserver. Note that basic
authentication is currently supported for convenience while we finish making the
more secure modes described above easier to use.

The basic auth file format is implemented in `plugin/pkg/auth/authenticator/password/passwordfile/...`
and is a csv file with 3 columns: password, user name, user id.

When using basic authentication from an http client, the apiserver expects an `Authorization` header
with a value of `Basic BASE64ENCODEDUSER:PASSWORD`.
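
A comparable sketch for basic auth (the credentials and URL are placeholders); `curl -u` builds the base64-encoded `user:password` header for you:

```
# basic_auth.csv, columns: password,user name,user id
secretpassword,alice,1

# equivalent to sending "Authorization: Basic $(echo -n alice:secretpassword | base64)"
curl -u alice:secretpassword https://MASTER_IP:6443/api/v1/pods
```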

## Plugin Development

We plan for the Kubernetes API server to issue tokens
after the user has been (re)authenticated by a *bedrock* authentication
provider external to Kubernetes. We plan to make it easy to develop modules
that interface between kubernetes and a bedrock authentication provider (e.g.
github.com, google.com, enterprise directory, kerberos, etc.)

docs/admin/authorization.md (new file, 122 lines)
@@ -0,0 +1,122 @@

# Authorization Plugins

In Kubernetes, authorization happens as a separate step from authentication.
See the [authentication documentation](authentication.md) for an
overview of authentication.

Authorization applies to all HTTP accesses on the main apiserver port. (The
readonly port is not currently subject to authorization, but is planned to be
removed soon.)

The authorization check for any request compares attributes of the context of
the request (such as user, resource, and namespace) with access
policies. An API call must be allowed by some policy in order to proceed.

The following implementations are available, and are selected by flag:
- `--authorization_mode=AlwaysDeny`
- `--authorization_mode=AlwaysAllow`
- `--authorization_mode=ABAC`

`AlwaysDeny` blocks all requests (used in tests).
`AlwaysAllow` allows all requests; use if you don't need authorization.
`ABAC` allows for user-configured authorization policy. ABAC stands for Attribute-Based Access Control.

## ABAC Mode

### Request Attributes

A request has 4 attributes that can be considered for authorization:
- user (the user-string which a user was authenticated as).
- whether the request is readonly (GETs are readonly)
- what resource is being accessed
  - applies only to the API endpoints, such as
    `/api/v1/namespaces/default/pods`. For miscellaneous endpoints, like `/version`, the
    resource is the empty string.
- the namespace of the object being accessed, or the empty string if the
  endpoint does not support namespaced objects.

We anticipate adding more attributes to allow finer grained access control and
to assist in policy management.

### Policy File Format

For mode `ABAC`, also specify `--authorization_policy_file=SOME_FILENAME`.

The file format is [one JSON object per line](http://jsonlines.org/). There should be no enclosing list or map, just
one map per line.

Each line is a "policy object". A policy object is a map with the following properties:
- `user`, type string; the user-string from `--token_auth_file`
- `readonly`, type boolean, when true, means that the policy only applies to GET
  operations.
- `resource`, type string; a resource from an URL, such as `pods`.
- `namespace`, type string; a namespace string.

An unset property is the same as a property set to the zero value for its type (e.g. empty string, 0, false).
However, unset should be preferred for readability.

In the future, policies may be expressed in a JSON format, and managed via a REST
interface.

### Authorization Algorithm

A request has attributes which correspond to the properties of a policy object.

When a request is received, the attributes are determined. Unknown attributes
are set to the zero value of their type (e.g. empty string, 0, false).

An unset property will match any value of the corresponding
attribute. An unset attribute will match any value of the corresponding property.

The tuple of attributes is checked for a match against every policy in the policy file.
If at least one line matches the request attributes, then the request is authorized (but may fail later validation).

To permit any user to do something, write a policy with the user property unset.
To permit an action in any namespace, write a policy with the namespace property unset.

### Examples
1. Alice can do anything: `{"user":"alice"}`
2. Kubelet can read any pods: `{"user":"kubelet", "resource": "pods", "readonly": true}`
3. Kubelet can read and write events: `{"user":"kubelet", "resource": "events"}`
4. Bob can just read pods in namespace "projectCaribou": `{"user":"bob", "resource": "pods", "readonly": true, "ns": "projectCaribou"}`

[Complete file example](../pkg/auth/authorizer/abac/example_policy_file.jsonl)
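
Putting the flags from this document together, enabling ABAC on the apiserver might look roughly like this (the policy file path is an assumption; other apiserver flags are omitted):

```
kube-apiserver ... --authorization_mode=ABAC \
  --authorization_policy_file=/srv/kubernetes/abac_policy.jsonl
```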

## Plugin Development

Other implementations can be developed fairly easily.
The APIserver calls the Authorizer interface:
```go
type Authorizer interface {
	Authorize(a Attributes) error
}
```
to determine whether or not to allow each API action.

An authorization plugin is a module that implements this interface.
Authorization plugin code goes in `pkg/auth/authorization/$MODULENAME`.

An authorization module can be completely implemented in Go, or can call out
to a remote authorization service. Authorization modules can implement
their own caching to reduce the cost of repeated authorization calls with the
same or similar arguments. Developers should then consider the interaction between
caching and revocation of permissions.

docs/admin/availability.md (new file, 149 lines)
@@ -0,0 +1,149 @@

# Availability

This document collects advice on reasoning about and provisioning for high-availability when using Kubernetes clusters.

## Failure modes

This is an incomplete list of things that could go wrong, and how to deal with them.

Root causes:
- VM(s) shutdown
- network partition within cluster, or between cluster and users.
- crashes in Kubernetes software
- data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume).
- operator error misconfigures kubernetes software or application software.

Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
  - Results
    - unable to stop, update, or start new pods, services, replication controllers
    - existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
  - Results
    - apiserver should fail to come up.
    - kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying.
    - manual recovery or recreation of apiserver state necessary before apiserver is restarted.
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
  - currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
  - in future, these will be replicated as well and may not be co-located
  - they do not have their own persistent state
- Node (thing that runs kubelet and kube-proxy and pods) shutdown
  - Results
    - pods on that Node stop running
- Kubelet software fault
  - Results
    - crashing kubelet cannot start new pods on the node
    - kubelet might delete the pods or not
    - node marked unhealthy
    - replication controllers start new pods elsewhere
- Cluster operator error
  - Results:
    - loss of pods, services, etc
    - loss of apiserver backing store
    - users unable to read API
    - etc

Mitigations:
- Action: Use IaaS provider's automatic VM restarting feature for IaaS VMs.
  - Mitigates: Apiserver VM shutdown or apiserver crashing
  - Mitigates: Supporting services VM shutdown or crashes

- Action: Use IaaS provider's reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd.
  - Mitigates: Apiserver backing storage lost

- Action: Use Replicated APIserver feature (when complete: feature is planned but not implemented)
  - Mitigates: Apiserver VM shutdown or apiserver crashing
    - Will tolerate one or more simultaneous apiserver failures.
  - Mitigates: Apiserver backing storage lost
    - Each apiserver has independent storage. Etcd will recover from loss of one member. Risk of total data loss greatly reduced.

- Action: Snapshot apiserver PDs/EBS-volumes periodically
  - Mitigates: Apiserver backing storage lost
  - Mitigates: Some cases of operator error
  - Mitigates: Some cases of kubernetes software fault

- Action: use replication controller and services in front of pods
  - Mitigates: Node shutdown
  - Mitigates: Kubelet software fault

- Action: applications (containers) designed to tolerate unexpected restarts
  - Mitigates: Node shutdown
  - Mitigates: Kubelet software fault

- Action: Multiple independent clusters (and avoid making risky changes to all clusters at once)
  - Mitigates: Everything listed above.

## Choosing Multiple Kubernetes Clusters

You may want to set up multiple kubernetes clusters, both to
have clusters in different regions to be nearer to your users, and to tolerate failures and/or invasive maintenance.

### Scope of a single cluster

On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
[zone](https://cloud.google.com/compute/docs/zones) or [availability
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:
- compared to having a single global Kubernetes cluster, there are fewer single-points of failure
- compared to a cluster that spans availability zones, it is easier to reason about the availability properties of a
  single-zone cluster.
- when the Kubernetes developers are designing the system (e.g. making assumptions about latency, bandwidth, or
  correlated failures) they are assuming all the machines are in a single data center, or otherwise closely connected.

It is okay to have multiple clusters per availability zone, though on balance we think fewer is better.
Reasons to prefer fewer clusters are:
- improved bin packing of Pods in some cases with more nodes in one cluster.
- reduced operational overhead (though the advantage is diminished as ops tooling and processes mature).
- reduced costs for per-cluster fixed resource costs, e.g. apiserver VMs (but small as a percentage
  of overall cluster cost for medium to large clusters).

Reasons to have multiple clusters include:
- strict security policies requiring isolation of one class of work from another (but, see Partitioning Clusters
  below).
- test clusters to canary new Kubernetes releases or other cluster software.

### Selecting the right number of clusters

The selection of the number of kubernetes clusters may be a relatively static choice, only revisited occasionally.
By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to
load and growth.

To pick the number of clusters, first, decide which regions you need to be in to have adequate latency to all your end users, for services that will run
on Kubernetes (if you use a Content Distribution Network, the latency requirements for the CDN-hosted content need not
be considered). Legal issues might influence this as well. For example, a company with a global customer base might decide to have clusters in US, EU, AP, and SA regions.
Call the number of regions to be in `R`.

Second, decide how many clusters should be able to be unavailable at the same time, while still being available. Call
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.

If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
you need `R + U` clusters. If it is not (e.g. you want to ensure low latency for all users in the event of a
cluster failure), then you need to have `R * U` clusters (`U` in each of `R` regions). In any case, try to put each cluster in a different zone.
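
As a worked example of these formulas: with `R = 3` regions and a tolerance of `U = 2` unavailable clusters, you would need `3 + 2 = 5` clusters if traffic can be redirected to any surviving region, or `3 * 2 = 6` clusters (2 per region) if each region must keep serving its own users.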

Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
you may need even more clusters. Our [roadmap](roadmap.md)
calls for maximum 100 node clusters at v1.0 and maximum 1000 node clusters in the middle of 2015.

## Working with multiple clusters

When you have multiple clusters, you would typically create services with the same config in each cluster and put each of those
service instances behind a load balancer (AWS Elastic Load Balancer, GCE Forwarding Rule or HTTP Load Balancer), so that
failures of a single cluster are not visible to end users.

docs/admin/cluster-management.md (new file, 78 lines)
@@ -0,0 +1,78 @@

# Cluster Management

This doc is in progress.

## Upgrading a cluster

The `cluster/kube-push.sh` script will do a rudimentary update; it is a 1.0 roadmap item to have a robust live cluster update system.

## Upgrading to a different API version

There is a sequence of steps to upgrade to a new API version.

1. Turn on the new api version
2. Upgrade the cluster's storage to use the new version.
3. Upgrade all config files. Identify users of the old api version endpoints.
4. Update existing objects in the storage to the new version by running `cluster/update-storage-objects.sh`
5. Turn off the old version.

### Turn on or off an API version for your cluster

Specific API versions can be turned on or off by passing the `--runtime-config=api/<version>` flag while bringing up the server. For example: to turn off v1 API, pass `--runtime-config=api/v1=false`.
runtime-config also supports 2 special keys: `api/all` and `api/legacy` to control all and legacy APIs respectively. For example, for turning off all api versions except v1, pass `--runtime-config=api/all=false,api/v1=true`.
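
For example (the flag values are taken directly from the text above; the rest of the apiserver invocation is elided):

```
# turn off the v1 API
kube-apiserver ... --runtime-config=api/v1=false

# turn off every API version except v1
kube-apiserver ... --runtime-config=api/all=false,api/v1=true
```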

### Switching your cluster's storage API version

The `KUBE_API_VERSIONS` environment variable controls the API versions that are supported in the cluster. The first version in the list is used as the cluster's storage version. Hence, to set a specific version as the storage version, bring it to the front of the list of versions in the value of `KUBE_API_VERSIONS`.
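
As a sketch, making `v1` the storage version while still serving `v1beta3` might look like this (the exact value format is an assumption based on the description above):

```
export KUBE_API_VERSIONS="v1,v1beta3"
```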

### Switching your config files to a new API version

You can use the `kube-version-change` utility to convert config files between different API versions.

```
$ hack/build-go.sh cmd/kube-version-change
$ _output/local/go/bin/kube-version-change -i myPod.v1beta3.yaml -o myPod.v1.yaml
```

### Maintenance on a Node

If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is
brief, then when the Kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer,
then the node controller will terminate the pods that are bound to the unavailable node. If there is a corresponding
replication controller, then a new copy of the pod will be started on a different node. So, in the case where all
pods are replicated, upgrades can be done without special coordination.

If you want more control over the upgrading process, you may use the following workflow:
1. Mark the node to be rebooted as unschedulable:
   `kubectl replace nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": true}}'`.
   This keeps new pods from landing on the node while you are trying to get them off.
1. Get the pods off the machine, via any of the following strategies:
   1. wait for finite-duration pods to complete
   1. delete pods with `kubectl delete pods $PODNAME`
   1. for pods with a replication controller, the pod will eventually be replaced by a new pod which will be scheduled to a new node. Additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.
   1. for pods with no replication controller, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
1. Work on the node
1. Make the node schedulable again:
   `kubectl replace nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": false}}'`.
   If you deleted the node's VM instance and created a new one, then a new schedulable node resource will
   be created automatically when you create a new VM instance (if you're using a cloud provider that supports
   node discovery; currently this is only Google Compute Engine, not including CoreOS on Google Compute Engine using kube-register). See [Node](node.md).

docs/admin/dns.md (new file, 57 lines)
@@ -0,0 +1,57 @@

# DNS Integration with Kubernetes

As of kubernetes 0.8, DNS is offered as a [cluster add-on](../cluster/addons/README.md).
If enabled, a DNS Pod and Service will be scheduled on the cluster, and the kubelets will be
configured to tell individual containers to use the DNS Service's IP.

Every Service defined in the cluster (including the DNS server itself) will be
assigned a DNS name. By default, a client Pod's DNS search list will
include the Pod's own namespace and the cluster's default domain. This is best
illustrated by example:

Assume a Service named `foo` in the kubernetes namespace `bar`. A Pod running
in namespace `bar` can look up this service by simply doing a DNS query for
`foo`. A Pod running in namespace `quux` can look up this service by doing a
DNS query for `foo.bar`.
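
To make that concrete, these are the two lookups as run from inside containers (assuming the image ships standard DNS tooling such as `nslookup`):

```
# from a Pod in namespace "bar": the short name resolves via the search list
nslookup foo

# from a Pod in any namespace, e.g. "quux": qualify the name with the namespace
nslookup foo.bar
```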

The cluster DNS server ([SkyDNS](https://github.com/skynetservices/skydns))
supports forward lookups (A records) and service lookups (SRV records).

## How it Works

The running DNS pod holds 3 containers - skydns, etcd (which skydns uses),
and a kubernetes-to-skydns bridge called kube2sky. The kube2sky process
watches the kubernetes master for changes in Services, and then writes the
information to etcd, which skydns reads. This etcd instance is not linked to
any other etcd clusters that might exist, including the kubernetes master.

## Issues

The skydns service is reachable directly from kubernetes nodes (outside
of any container) and DNS resolution works if the skydns service is targeted
explicitly. However, nodes are not configured to use the cluster DNS service or
to search the cluster's DNS domain by default. This may be resolved at a later
time.

## For more information

See [the docs for the DNS cluster addon](../cluster/addons/dns/README.md).

docs/admin/namespaces.md (new file, 15 lines)
@@ -0,0 +1,15 @@

# Namespaces

Namespaces help different projects, teams, or customers to share a kubernetes cluster. First, they provide a scope for [Names](../identifiers.md). Second, as our access control code develops, it is expected that it will be convenient to attach authorization and other policy to namespaces.

Use of multiple namespaces is optional. For small teams, they may not be needed.

This is a placeholder document about namespace administration.

TODO: document namespace creation, ownership assignment, visibility rules,
policy creation, interaction with network.

Namespaces are still under development. For now, the best documentation is the [Namespaces Design Document](../design/namespaces.md). The user documentation can be found at [Namespaces](../../docs/namespaces.md).

docs/admin/networking.md (new file, 212 lines)
@@ -0,0 +1,212 @@

# Networking in Kubernetes

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Networking in Kubernetes](#networking-in-kubernetes)
  - [Summary](#summary)
  - [Docker model](#docker-model)
  - [Kubernetes model](#kubernetes-model)
  - [How to achieve this](#how-to-achieve-this)
    - [Google Compute Engine (GCE)](#google-compute-engine-(gce))
    - [L2 networks and linux bridging](#l2-networks-and-linux-bridging)
    - [Flannel](#flannel)
    - [OpenVSwitch](#openvswitch)
    - [Weave](#weave)
    - [Calico](#calico)
  - [Other reading](#other-reading)

<!-- END MUNGE: GENERATED_TOC -->

Kubernetes approaches networking somewhat differently than Docker does by
default. There are 4 distinct networking problems to solve:
1. Highly-coupled container-to-container communications: this is solved by
   [pods](pods.md) and `localhost` communications.
2. Pod-to-Pod communications: this is the primary focus of this document.
3. Pod-to-Service communications: this is covered by [services](services.md).
4. External-to-Service communications: this is covered by [services](services.md).

## Summary

Kubernetes assumes that pods can communicate with other pods, regardless of
which host they land on. We give every pod its own IP address so you do not
need to explicitly create links between pods and you almost never need to deal
with mapping container ports to host ports. This creates a clean,
backwards-compatible model where pods can be treated much like VMs or physical
hosts from the perspectives of port allocation, naming, service discovery, load
balancing, application configuration, and migration.

To achieve this we must impose some requirements on how you set up your cluster
networking.

## Docker model

Before discussing the Kubernetes approach to networking, it is worthwhile to
review the "normal" way that networking works with Docker. By default, Docker
uses host-private networking. It creates a virtual bridge, called `docker0` by
default, and allocates a subnet from one of the private address blocks defined
in [RFC1918](https://tools.ietf.org/html/rfc1918) for that bridge. For each
container that Docker creates, it allocates a virtual ethernet device (called
`veth`) which is attached to the bridge. The veth is mapped to appear as eth0
in the container, using Linux namespaces. The in-container eth0 interface is
given an IP address from the bridge's address range.

The result is that Docker containers can talk to other containers only if they
are on the same machine (and thus the same virtual bridge). Containers on
different machines can not reach each other - in fact they may end up with the
exact same network ranges and IP addresses.

In order for Docker containers to communicate across nodes, they must be
allocated ports on the machine's own IP address, which are then forwarded or
proxied to the containers. This obviously means that containers must either
coordinate which ports they use very carefully or else be allocated ports
dynamically.

## Kubernetes model

Coordinating ports across multiple developers is very difficult to do at
scale and exposes users to cluster-level issues outside of their control.
Dynamic port allocation brings a lot of complications to the system - every
application has to take ports as flags, the API servers have to know how to
insert dynamic port numbers into configuration blocks, services have to know
how to find each other, etc. Rather than deal with this, Kubernetes takes a
different approach.

Kubernetes imposes the following fundamental requirements on any networking
implementation (barring any intentional network segmentation policies):
* all containers can communicate with all other containers without NAT
* all nodes can communicate with all containers (and vice-versa) without NAT
* the IP that a container sees itself as is the same IP that others see it as

What this means in practice is that you can not just take two computers
running Docker and expect Kubernetes to work. You must ensure that the
fundamental requirements are met.

This model is not only less complex overall, but it is principally compatible
with the desire for Kubernetes to enable low-friction porting of apps from VMs
to containers. If your job previously ran in a VM, your VM had an IP and could
talk to other VMs in your project. This is the same basic model.

Until now this document has talked about containers. In reality, Kubernetes
applies IP addresses at the `Pod` scope - containers within a `Pod` share their
network namespaces - including their IP address. This means that containers
within a `Pod` can all reach each other’s ports on `localhost`. This does imply
that containers within a `Pod` must coordinate port usage, but this is no
different than processes in a VM. We call this the "IP-per-pod" model. This
is implemented in Docker as a "pod container" which holds the network namespace
open while "app containers" (the things the user specified) join that namespace
with Docker's `--net=container:<id>` function.

As with Docker, it is possible to request host ports, but this is reduced to a
very niche operation. In this case a port will be allocated on the host `Node`
and traffic will be forwarded to the `Pod`. The `Pod` itself is blind to the
existence or non-existence of host ports.

## How to achieve this

There are a number of ways that this network model can be implemented. This
document is not an exhaustive study of the various methods, but hopefully serves
as an introduction to various technologies and serves as a jumping-off point.
If some techniques become vastly preferable to others, we might detail them more
here.

### Google Compute Engine (GCE)

For the Google Compute Engine cluster configuration scripts, we use [advanced
routing](https://developers.google.com/compute/docs/networking#routing) to
assign each VM a subnet (default is /24 - 254 IPs). Any traffic bound for that
subnet will be routed directly to the VM by the GCE network fabric. This is in
addition to the "main" IP address assigned to the VM, which is NAT'ed for
outbound internet access. A linux bridge (called `cbr0`) is configured to exist
on that subnet, and is passed to docker's `--bridge` flag.

We start Docker with:
```
DOCKER_OPTS="--bridge=cbr0 --iptables=false --ip-masq=false"
```

This bridge is created by Kubelet (controlled by the `--configure-cbr0=true`
flag) according to the `Node`'s `spec.podCIDR`.

Docker will now allocate IPs from the `cbr-cidr` block. Containers can reach
each other and `Nodes` over the `cbr0` bridge. Those IPs are all routable
within the GCE project network.

GCE itself does not know anything about these IPs, though, so it will not NAT
them for outbound internet traffic. To achieve that we use an iptables rule to
masquerade (aka SNAT - to make it seem as if packets came from the `Node`
itself) traffic that is bound for IPs outside the GCE project network
(10.0.0.0/8).

```
iptables -t nat -A POSTROUTING ! -d 10.0.0.0/8 -o eth0 -j MASQUERADE
```

Lastly we enable IP forwarding in the kernel (so the kernel will process
packets for bridged containers):

```
sysctl net.ipv4.ip_forward=1
```

The result of all this is that all `Pods` can reach each other and can egress
traffic to the internet.

### L2 networks and linux bridging

If you have a "dumb" L2 network, such as a simple switch in a "bare-metal"
environment, you should be able to do something similar to the above GCE setup.
Note that these instructions have only been tried very casually - it seems to
work, but has not been thoroughly tested. If you use this technique and
perfect the process, please let us know.

Follow the "With Linux Bridge devices" section of [this very nice
tutorial](http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/) from
Lars Kellogg-Stedman.
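
A minimal per-node sketch of that approach, assuming you have picked a pod subnet such as 10.244.1.0/24 for this node and that your switch/router knows how to route each node's subnet to it (the names, addresses, and subnets here are illustrative assumptions):

```
# create the bridge and give it the node's pod-subnet gateway address
brctl addbr cbr0
ip addr add 10.244.1.1/24 dev cbr0
ip link set dev cbr0 up

# point Docker at the bridge, as in the GCE setup above
DOCKER_OPTS="--bridge=cbr0 --iptables=false --ip-masq=false"

# let the kernel forward packets for bridged containers
sysctl net.ipv4.ip_forward=1
```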

### Flannel

[Flannel](https://github.com/coreos/flannel#flannel) is a very simple overlay
network that satisfies the Kubernetes requirements. It installs in minutes and
should get you up and running if the above techniques are not working. Many
people have reported success with Flannel and Kubernetes.

### OpenVSwitch

[OpenVSwitch](ovs-networking.md) is a somewhat more mature but also
complicated way to build an overlay network. This is endorsed by several of the
"Big Shops" for networking.

### Weave

[Weave](https://github.com/zettio/weave) is yet another way to build an overlay
network, primarily aiming at Docker integration.

### Calico

[Calico](https://github.com/Metaswitch/calico) uses BGP to enable real container
IPs.

## Other reading

The early design of the networking model and its rationale, and some future
plans are described in more detail in the [networking design
document](design/networking.md).

docs/admin/node.md (new file, 220 lines)
@@ -0,0 +1,220 @@

# Node

**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Node](#node)
  - [What is a node?](#what-is-a-node?)
  - [Node Status](#node-status)
    - [Node Addresses](#node-addresses)
    - [Node Phase](#node-phase)
    - [Node Condition](#node-condition)
    - [Node Capacity](#node-capacity)
    - [Node Info](#node-info)
  - [Node Management](#node-management)
    - [Node Controller](#node-controller)
    - [Self-Registration of nodes](#self-registration-of-nodes)
    - [Manual Node Administration](#manual-node-administration)
    - [Node capacity](#node-capacity)

<!-- END MUNGE: GENERATED_TOC -->
|
||||

## What is a node?

A `Node` is a worker machine in Kubernetes, previously known as a `Minion`. A node
may be a VM or a physical machine, depending on the cluster. Each node has
the services necessary to run [Pods](pods.md) and be managed from the master
systems. The services include docker, the kubelet and the network proxy. See
[The Kubernetes Node](design/architecture.md#the-kubernetes-node) section in the design
doc for more details.

## Node Status

Node status describes the current status of a node. For now, it consists of the following
pieces of information:

### Node Addresses

<!--- TODO: this section is outdated. There is no HostIP field in the API,
but there are addresses of type InternalIP and ExternalIP -->
The host IP address is queried from the cloud provider and stored as part of the node
status. If Kubernetes runs without a cloud provider, the node's ID is used instead.
An IP address can change, and there are different kinds of IPs, e.g. public
IP, private IP, dynamic IP, IPv6, etc., so it makes more sense to store it as
status rather than spec.

### Node Phase

Node phase is the current lifecycle phase of a node: one of `Pending`,
`Running` and `Terminated`. Node phase management is still under development;
here is a brief overview: in Kubernetes, a node is created in the `Pending`
phase and stays there until it is discovered and checked in by Kubernetes, at which point
Kubernetes marks it as `Running`. The end of a node's lifecycle is
`Terminated`. A terminated node does not receive any scheduling requests,
and any running pods are removed from the node.

A node in the `Running` phase is a necessary but not sufficient condition for
scheduling pods onto it. For a node to be considered a scheduling candidate, it
must also have the appropriate conditions, see below.
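
One way to see what the cluster currently reports for a node is with `kubectl`; the node name below is only a placeholder:

```
# List nodes and the status reported by the API server.
$ kubectl get nodes

# Show the detailed status (addresses, phase, conditions, capacity) of one node.
$ kubectl describe nodes 10.240.79.157
```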

### Node Condition

Node conditions describe the conditions of `Running` nodes (although they
can also be present when the node status is different, e.g. `Unknown`).
The only valid condition currently is `Ready`; in the future, we plan to add more.
`Ready` means the kubelet is healthy and ready to accept pods. Different
conditions provide different levels of insight into node health.
Node conditions are represented as a JSON object. For example,
the following conditions mean the node is in a sane state:
```json
"conditions": [
  {
    "kind": "Ready",
    "status": "True"
  }
]
```

### Node Capacity

Describes the resources available on the node: CPUs, memory and the maximum
number of pods that can be scheduled on this node.
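
To see the capacity a node actually reported, one option is to dump the node object and look at its status (the node name is a placeholder):

```
# Print the full node object; the reported capacity appears in its status.
$ kubectl get nodes 10.240.79.157 -o yaml
```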

### Node Info

General information about the node, for instance the kernel version, Kubernetes version
(kubelet version, kube-proxy version), docker version (if used), and OS name.
The information is gathered by Kubernetes from the node.

## Node Management

Unlike [Pods](pods.md) and [Services](services.md), a node is not inherently
created by Kubernetes: it is either created by cloud providers like Google Compute Engine,
or it comes from your pool of physical or virtual machines. What this means is that when
Kubernetes creates a node, it only creates a representation of the node.
After creation, Kubernetes will check whether the node is valid or not.
For example, if you try to create a node from the following content:
```json
{
  "kind": "Node",
  "apiVersion": "v1",
  "metadata": {
    "name": "10.240.79.157",
    "labels": {
      "name": "my-first-k8s-node"
    }
  }
}
```

Kubernetes will create a Node object internally (the representation), and
validate the node by health checking based on the `metadata.name` field: we
assume `metadata.name` can be resolved. If the node is valid, i.e. all necessary
services are running, it is eligible to run a pod; otherwise, it will be
ignored for any cluster activity until it becomes valid. Note that Kubernetes
will keep the invalid node unless it is explicitly deleted by the client, and it will keep
checking to see if it becomes valid.
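
Assuming the JSON above was saved to a file named `node.json` (the filename is illustrative), it can be submitted like this:

```
# Register the manually defined Node object with the API server.
$ kubectl create -f node.json
```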

Currently, there are two agents that interact with the Kubernetes node interface:
the Node Controller and the Kube Admin.

### Node Controller

The node controller is a component in the Kubernetes master which manages Node
objects. It performs two major functions: cluster-wide node synchronization
and single-node life-cycle management.

The node controller has a sync loop that creates/deletes Nodes in Kubernetes
based on all matching VM instances listed by the cloud provider. The sync period
can be controlled via the `--node-sync-period` flag. If a new instance
gets created, the node controller creates a representation for it. If an existing
instance gets deleted, the node controller deletes the representation. Note, however,
that the node controller is unable to provision the node for you, i.e. it won't install
any binaries; therefore, to join a node to a Kubernetes cluster, you as an admin need to
make sure the proper services are running on the node. In the future, we plan to
automatically provision some node services.
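
As an illustration only, the sync period might be set on whichever component runs the node controller (typically the controller manager); the value and flag placement here are assumptions, so check your distro's startup scripts:

```
# Illustrative: shorten the node sync loop to 10 seconds.
kube-controller-manager --node-sync-period=10s ...
```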

### Self-Registration of nodes

When the kubelet flag `--register-node` is true (the default), the kubelet will attempt to
register itself with the API server. This is the preferred pattern, used by most distros.

For self-registration, the kubelet is started with the following options (a sketch of a combined invocation follows the list):
- `--api-servers=` tells the kubelet the location of the apiserver.
- `--kubeconfig` tells the kubelet where to find the credentials to authenticate itself to the apiserver.
- `--cloud-provider=` tells the kubelet how to talk to a cloud provider to read metadata about itself.
- `--register-node` tells the kubelet to create its own node resource.
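
Put together, a self-registering kubelet might be launched roughly as follows; the addresses and paths are placeholders, and most distros wrap this in their own init scripts:

```
# Illustrative self-registering kubelet invocation; all values are placeholders.
kubelet \
  --api-servers=https://10.240.0.2:6443 \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --cloud-provider=gce \
  --register-node=true
```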

Currently, any kubelet is authorized to create/modify any node resource, but in practice it only creates/modifies
its own. (In the future, we plan to limit authorization so that a kubelet may only modify its own Node resource.)

#### Manual Node Administration

A cluster administrator can create and modify Node objects.

If the administrator wishes to create node objects manually, set the kubelet flag
`--register-node=false`.

The administrator can modify Node resources (regardless of the setting of `--register-node`).
Modifications include setting labels on the node and marking it unschedulable.

Labels on nodes can be used in conjunction with node selectors on pods to control scheduling.
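
For example, a label can be attached to a node with `kubectl`; a pod whose `spec.nodeSelector` requires that label will then only be scheduled onto matching nodes. The node name and label below are made up for illustration:

```
# Attach an arbitrary label to a node.
$ kubectl label nodes 10.240.79.157 disktype=ssd

# A pod with spec.nodeSelector set to {"disktype": "ssd"} will now only be
# scheduled onto nodes carrying this label.
```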

Making a node unschedulable will prevent new pods from being scheduled onto that
node, but will not affect any existing pods on the node. This is useful as a
preparatory step before a node reboot, etc. For example, to mark a node
unschedulable, run this command:
```
kubectl replace nodes 10.1.2.3 --patch='{"apiVersion": "v1", "unschedulable": true}'
```

### Node capacity

The capacity of the node (the number of CPUs and the amount of memory) is part of the node resource.
Normally, nodes register themselves and report their capacity when creating the node resource. If
you are doing [manual node administration](#manual-node-administration), then you need to set node
capacity when adding a node.

The Kubernetes scheduler ensures that there are enough resources for all the pods on a node. It
checks that the sum of the limits of the containers on the node does not exceed the node capacity. This
includes all containers started by the kubelet, but not containers started directly by docker, nor
processes not running in containers.

If you want to explicitly reserve resources for non-pod processes, you can create a placeholder
pod. Use the following template:
```
apiVersion: v1
kind: Pod
metadata:
  name: resource-reserver
spec:
  containers:
  - name: sleep-forever
    image: gcr.io/google_containers/pause:0.8.0
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
```
Set the `cpu` and `memory` values to the amount of resources you want to reserve.
Place the file in the manifest directory (the `--config=DIR` flag of the kubelet). Do this
on each kubelet where you want to reserve resources.
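
As a sketch of that last step, assuming the template above was saved as `resource-reserver.yaml` and the kubelet on this machine reads manifests from `/etc/kubernetes/manifests` (both names are assumptions; use your distro's actual manifest directory):

```
# Copy the placeholder pod into the kubelet's manifest directory.
$ sudo cp resource-reserver.yaml /etc/kubernetes/manifests/
```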


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
33
docs/admin/ovs-networking.md
Normal file
@@ -0,0 +1,33 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>

<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
# Kubernetes OpenVSwitch GRE/VxLAN networking

This document describes how OpenVSwitch is used to set up networking between pods across nodes.
The tunnel type can be either GRE or VxLAN. VxLAN is preferable when large-scale isolation needs to be performed within the network.

![OVS Networking](ovs-networking.png)

The Vagrant setup in Kubernetes does the following:

The docker bridge is replaced with a brctl-generated Linux bridge (kbr0) with a 256-address subnet. Basically, each node gets a 10.244.x.0/24 subnet and docker is configured to use that bridge instead of the default docker0 bridge.

Also, an OVS bridge (obr0) is created and added as a port to the kbr0 bridge. All OVS bridges across all nodes are linked with GRE tunnels, so each node has an outgoing GRE tunnel to every other node. It does not really need to be a complete mesh, but the meshier the better. STP (spanning tree) mode is enabled on the bridges to prevent loops.

Routing rules enable any 10.244.0.0/16 target to become reachable via the OVS bridge connected with the tunnels.
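
As a rough sketch of what this wiring amounts to on a single node (the bridge names and pod CIDR come from the description above; the peer IP is a placeholder, and the real Vagrant/Salt provisioning handles many more details):

```
# Illustrative sketch only; the Vagrant provisioning scripts do the real work.
brctl addbr kbr0                      # Linux bridge that replaces docker0
ovs-vsctl add-br obr0                 # OVS bridge for the overlay
brctl addif kbr0 obr0                 # attach the OVS bridge to kbr0
ovs-vsctl add-port obr0 gre0 -- \
  set interface gre0 type=gre options:remote_ip=<other-node-ip>  # one tunnel per peer
ip route add 10.244.0.0/16 dev kbr0   # reach other nodes' pod subnets via the bridge
```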


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
BIN
docs/admin/ovs-networking.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 70 KiB
120
docs/admin/resource-quota.md
Normal file
@@ -0,0 +1,120 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>

<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
# Administering Resource Quotas

Kubernetes can limit both the number of objects created in a namespace and the
total amount of resources requested by pods in a namespace. This facilitates
sharing of a single Kubernetes cluster by several teams or tenants, each in
its own namespace.

## Enabling Resource Quota

Resource Quota support is enabled by default for many Kubernetes distributions. It is
enabled when the apiserver `--admission_control=` flag has `ResourceQuota` as
one of its arguments.
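
For example, the flag might look roughly like this on the apiserver command line; the exact list of admission controllers varies by distro and release, so treat it as illustrative:

```
# Illustrative: enable the ResourceQuota admission controller on the apiserver.
kube-apiserver ... --admission_control=NamespaceLifecycle,LimitRanger,ResourceQuota
```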

Resource Quota is enforced in a particular namespace when there is a
`ResourceQuota` object in that namespace. There should be at most one
`ResourceQuota` object per namespace.

## Object Count Quota
The number of objects of a given type can be restricted. The following types
are supported:

| ResourceName | Description |
| ------------ | ----------- |
| pods | Total number of pods |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
| secrets | Total number of secrets |
| persistentvolumeclaims | Total number of persistent volume claims |

For example, the `pods` quota counts and enforces a maximum on the number of pods
created in a single namespace.

## Compute Resource Quota
The total amount of compute resources requested by the containers in a namespace can
also be limited. The following resource names are supported:

| ResourceName | Description |
| ------------ | ----------- |
| cpu | Total cpu limits of containers |
| memory | Total memory limits of containers |
| `example.com/customresource` | Total of `resources.limits."example.com/customresource"` of containers |

For example, the `cpu` quota sums up the `resources.limits.cpu` fields of every
container of every pod in the namespace, and enforces a maximum on that sum.
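
To make the accounting concrete, each container contributes its limits to the namespace totals that are checked against the quota. The pod below is a made-up example that adds 100m of cpu and 512Mi of memory to those sums:

```
$ cat <<EOF > pod.json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": { "name": "quota-example" },
  "spec": {
    "containers": [
      {
        "name": "main",
        "image": "gcr.io/google_containers/pause:0.8.0",
        "resources": { "limits": { "cpu": "100m", "memory": "512Mi" } }
      }
    ]
  }
}
EOF
$ kubectl create -f pod.json
```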

Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.

This means the resource must have a fully-qualified name (i.e. mycompany.org/shinynewresource).

## Viewing and Setting Quotas
Kubectl supports creating, updating, and viewing quotas:
```
$ kubectl namespace myspace
$ cat <<EOF > quota.json
{
  "apiVersion": "v1",
  "kind": "ResourceQuota",
  "metadata": {
    "name": "quota"
  },
  "spec": {
    "hard": {
      "memory": "1Gi",
      "cpu": "20",
      "pods": "10",
      "services": "5",
      "replicationcontrollers": "20",
      "resourcequotas": "1"
    }
  }
}
EOF
$ kubectl create -f quota.json
$ kubectl get quota
NAME
quota
$ kubectl describe quota quota
Name: quota
Resource                Used    Hard
--------                ----    ----
cpu                     0m      20
memory                  0       1Gi
pods                    5       10
replicationcontrollers  5       20
resourcequotas          1       1
services                3       5
```

## Quota and Cluster Capacity
Resource Quota objects are independent of the cluster capacity. They are
expressed in absolute units.

Sometimes more complex policies may be desired, such as:
- proportionally divide total cluster resources among several teams.
- allow each tenant to grow resource usage as needed, but have a generous
  limit to prevent accidental resource exhaustion.

Such policies could be implemented using ResourceQuota as a building block, by
writing a 'controller' which watches the quota usage and adjusts the quota
hard limits of each namespace.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
117
docs/admin/salt.md
Normal file
@@ -0,0 +1,117 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>

<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
# Using Salt to configure Kubernetes

The Kubernetes cluster can be configured using Salt.

The Salt scripts are shared across multiple hosting providers, and depending on where you host your Kubernetes cluster you may be using different operating systems and different networking configurations. As a result, it's important to understand some background before making a Salt change, in order to avoid introducing failures for the other hosting providers.

## Salt cluster setup

The **salt-master** service runs on the kubernetes-master node [(except on the default GCE setup)](#standalone-salt-configuration-on-gce).

The **salt-minion** service runs on the kubernetes-master node and each kubernetes-minion node in the cluster.

Each salt-minion service is configured to interact with the **salt-master** service hosted on the kubernetes-master via the **master.conf** file [(except on GCE)](#standalone-salt-configuration-on-gce).

```
[root@kubernetes-master] $ cat /etc/salt/minion.d/master.conf
master: kubernetes-master
```
The salt-master is contacted by each salt-minion and, depending upon the machine information presented, the salt-master will provision the machine as either a kubernetes-master or a kubernetes-minion with all the capabilities required to run Kubernetes.

If you are running the Vagrant based environment, the **salt-api** service is running on the kubernetes-master. It is configured to enable the vagrant user to introspect the salt cluster in order to find out about machines in the Vagrant environment via a REST API.

## Standalone Salt Configuration on GCE

On GCE, the master and nodes are all configured as [standalone minions](http://docs.saltstack.com/en/latest/topics/tutorials/standalone_minion.html). The configuration for each VM is derived from the VM's [instance metadata](https://cloud.google.com/compute/docs/metadata) and then stored in Salt grains (`/etc/salt/minion.d/grains.conf`) and pillars (`/srv/salt-overlay/pillar/cluster-params.sls`) that local Salt uses to enforce state.

All remaining sections that refer to master/minion setups should be ignored for GCE. One fallout of the GCE setup is that the Salt mine doesn't exist - there is no sharing of configuration amongst nodes.

## Salt security

*(Not applicable on default GCE setup.)*

Security is not enabled on the salt-master, and the salt-master is configured to auto-accept incoming requests from minions. It is not recommended to use this security configuration in production environments without deeper study. (In some environments this isn't as bad as it might sound if the salt master port isn't externally accessible and you trust everyone on your network.)

```
[root@kubernetes-master] $ cat /etc/salt/master.d/auto-accept.conf
open_mode: True
auto_accept: True
```

## Salt minion configuration

Each minion in the salt cluster has an associated configuration that instructs the salt-master how to provision the required resources on the machine.

An example file is presented below using the Vagrant based environment.

```
[root@kubernetes-master] $ cat /etc/salt/minion.d/grains.conf
grains:
  etcd_servers: $MASTER_IP
  cloud_provider: vagrant
  roles:
    - kubernetes-master
```

Each hosting environment has a slightly different grains.conf file that is used to build conditional logic where required in the Salt files.

The following enumerates the set of defined key/value pairs that are supported today. If you add new ones, please make sure to update this list.

Key | Value
------------- | -------------
`api_servers` | (Optional) The IP address / host name where a kubelet can get read-only access to the kube-apiserver
`cbr-cidr` | (Optional) The minion IP address range used for the docker container bridge.
`cloud` | (Optional) Which IaaS platform is used to host Kubernetes: *gce*, *azure*, *aws*, *vagrant*
`etcd_servers` | (Optional) Comma-delimited list of IP addresses the kube-apiserver and kubelet use to reach etcd. Uses the IP of the first machine in the kubernetes_master role, or 127.0.0.1 on GCE.
`hostnamef` | (Optional) The full host name of the machine, i.e. uname -n
`node_ip` | (Optional) The IP address to use to address this node
`hostname_override` | (Optional) Mapped to the kubelet hostname_override
`network_mode` | (Optional) Networking model to use among nodes: *openvswitch*
`networkInterfaceName` | (Optional) Networking interface to use to bind addresses, default value *eth0*
`publicAddressOverride` | (Optional) The IP address the kube-apiserver should use to bind against for external read-only access
`roles` | (Required) 1. `kubernetes-master` means this machine is the master in the kubernetes cluster. 2. `kubernetes-pool` means this machine is a kubernetes-minion. Depending on the role, the Salt scripts will provision different resources on the machine.

These keys may be leveraged by the Salt sls files to branch behavior.

In addition, a cluster may be running a Debian-based or Red Hat-based operating system (CentOS, Fedora, RHEL, etc.). As a result, it's sometimes important to distinguish behavior based on operating system using if branches like the following.

```
{% if grains['os_family'] == 'RedHat' %}
// something specific to a RedHat environment (CentOS, Fedora, RHEL) where you may use yum, systemd, etc.
{% else %}
// something specific to a Debian environment (apt-get, initd)
{% endif %}
```

## Best Practices

1. When configuring default arguments for processes, it's best to avoid the use of EnvironmentFiles (systemd in Red Hat environments) or init.d files (Debian distributions) to hold default values that should be common across operating system environments. This helps keep our Salt template files easy to understand for editors who may not be familiar with the particulars of each distribution.

## Future enhancements (Networking)

Per-pod IP configuration is provider specific, so when making networking changes, it's important to sandbox these, as not all providers may use the same mechanisms (iptables, openvswitch, etc.).

We should define a grains.conf key that captures more specifically what network configuration environment is being used, to avoid future confusion across providers.

## Further reading

The [cluster/saltbase](../cluster/saltbase/) tree has more details on the current SaltStack configuration.


<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
<!-- END MUNGE: GENERATED_ANALYTICS -->