move admin related docs into docs/admin

This commit is contained in:
Daniel Smith
2015-07-09 13:33:48 -07:00
parent bdbcbe2e2f
commit 2c333e4bc2
32 changed files with 45 additions and 30 deletions

92
docs/admin/README.md Normal file

@@ -0,0 +1,92 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Kubernetes Cluster Admin Guide
The cluster admin guide is for anyone creating or administering a Kubernetes cluster.
It assumes some familiarity with concepts in the [User Guide](user-guide.md).
## Planning a cluster
There are many different examples of how to set up a Kubernetes cluster. Many of them are listed in this
[matrix](getting-started-guides/README.md). We call each of the combinations in this matrix a *distro*.
Before choosing a particular guide, here are some things to consider:
- Are you just looking to try out Kubernetes on your laptop, or build a high-availability many-node cluster? Both
models are supported, but some distros are better for one case or the other.
- Will you be using a hosted Kubernetes cluster, such as [GKE](https://cloud.google.com/container-engine), or setting
one up yourself?
- Will your cluster be on-premises, or in the cloud (IaaS)? Kubernetes does not directly support hybrid clusters. We
recommend setting up multiple clusters rather than spanning distant locations.
- Will you be running Kubernetes on "bare metal" or virtual machines? Kubernetes supports both, via different distros.
- Do you just want to run a cluster, or do you expect to do active development of kubernetes project code? If the
latter, it is better to pick a distro actively used by other developers. Some distros only use binary releases, but
offer a greater variety of choices.
- Not all distros are maintained as actively. Prefer ones which are listed as tested on a more recent version of
Kubernetes.
- If you are configuring kubernetes on-premises, you will need to consider what [networking
model](networking.md) fits best.
- If you are designing for very [high-availability](availability.md), you may want multiple clusters in multiple zones.
## Setting up a cluster
Pick one of the Getting Started Guides from the [matrix](getting-started-guides/README.md) and follow it.
If none of the Getting Started Guides fits, you may want to pull ideas from several of the guides.
One option for custom networking is *OpenVSwitch GRE/VxLAN networking* ([ovs-networking.md](ovs-networking.md)), which
uses OpenVSwitch to set up networking between pods across
Kubernetes nodes.
If you are modifying an existing guide which uses Salt, this document explains [how Salt is used in the Kubernetes
project](salt.md).
## Upgrading a cluster
[Upgrading a cluster](cluster_management.md).
## Managing nodes
[Managing nodes](node.md).
## Optional Cluster Services
* **DNS Integration with SkyDNS** ([dns.md](dns.md)):
Resolving a DNS name directly to a Kubernetes service.
* **Logging** with [Kibana](logging.md)
## Multi-tenant support
* **Namespaces** ([namespaces.md](namespaces.md)): Namespaces help different
projects, teams, or customers to share a kubernetes cluster.
* **Resource Quota** ([resource_quota_admin.md](resource_quota_admin.md))
## Security
* **Kubernetes Container Environment** ([container-environment.md](container-environment.md)):
Describes the environment for Kubelet managed containers on a Kubernetes
node.
* **Securing access to the API Server** [accessing the api](accessing_the_api.md)
* **Authentication** [authentication](authentication.md)
* **Authorization** [authorization](authorization.md)
* **Admission Controllers** [admission_controllers](admission_controllers.md)
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/cluster-admin-guide.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

94
docs/admin/accessing_the_api.md Normal file

@@ -0,0 +1,94 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Configuring APIserver ports
This document describes what ports the kubernetes apiserver
may serve on and how to reach them. The audience is
cluster administrators who want to customize their cluster
or understand the details.
Most questions about accessing the cluster are covered
in [Accessing the cluster](accessing-the-cluster.md).
## Ports and IPs Served On
The Kubernetes API is served by the Kubernetes APIServer process. Typically,
there is one of these running on a single kubernetes-master node.
By default the Kubernetes APIserver serves HTTP on 2 ports (an example invocation combining these flags follows the list):
1. Localhost Port
- serves HTTP
- default is port 8080, change with `--insecure-port` flag.
- default IP is localhost, change with `--insecure-bind-address` flag.
- no authentication or authorization checks in HTTP
- protected by need to have host access
2. Secure Port
- default is port 6443, change with `--secure-port` flag.
- default IP is first non-localhost network interface, change with `--bind-address` flag.
- serves HTTPS. Set cert with `--tls-cert-file` and key with `--tls-private-key-file` flag.
- uses token-file or client-certificate based [authentication](authentication.md).
- uses policy-based [authorization](authorization.md).
3. Removed: ReadOnly Port
- For security reasons, this had to be removed. Use the service account feature instead.
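As a rough illustration (not part of the original text), the defaults above map to apiserver flags roughly as follows; the certificate paths and bind address are placeholders, and required flags such as the etcd location are omitted:
```shell
# Sketch only: flag names come from the list above, values are illustrative.
kube-apiserver \
  --insecure-bind-address=127.0.0.1 --insecure-port=8080 \
  --bind-address=0.0.0.0 --secure-port=6443 \
  --tls-cert-file=/srv/kubernetes/server.cert \
  --tls-private-key-file=/srv/kubernetes/server.key
```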
## Proxies and Firewall rules
Additionally, in some configurations there is a proxy (nginx) running
on the same machine as the apiserver process. The proxy serves HTTPS protected
by Basic Auth on port 443, and proxies to the apiserver on localhost:8080. In
these configurations the secure port is typically set to 6443.
A firewall rule is typically configured to allow external HTTPS access to port 443.
The above are defaults and reflect how Kubernetes is deployed to Google Compute Engine using
kube-up.sh. Other cloud providers may vary.
## Use Cases vs IP:Ports
There are three differently configured serving ports because there are a
variety of use cases:
1. Clients outside of a Kubernetes cluster, such as a human running `kubectl`
on a desktop machine. Currently, these access the Localhost Port via a proxy (nginx)
running on the `kubernetes-master` machine. The proxy uses bearer token authentication.
2. Processes running in Containers on Kubernetes that need to read from
the apiserver. Currently, these can use a service account.
3. Scheduler and Controller-manager processes, which need to do read-write
API operations. Currently, these have to run on the same host as the
apiserver and use the Localhost Port. In the future, these will be
switched to using service accounts to avoid the need to be co-located.
4. Kubelets, which need to do read-write API operations and are necessarily
on different machines than the apiserver. Kubelets use the Secure Port
to get their pods, to find the services that a pod can see, and to
write events. Credentials are distributed to kubelets at cluster
setup time.
## Expected changes
- Policy will limit the actions kubelets can do via the authed port.
- Kubelets will change from token-based authentication to cert-based-auth.
- Scheduler and Controller-manager will use the Secure Port too. They
will then be able to run on different machines than the apiserver.
- A general mechanism will be provided for [giving credentials to
pods](
https://github.com/GoogleCloudPlatform/kubernetes/issues/1907).
- Clients, like kubectl, will all support token-based auth, and the
Localhost Port will no longer be needed, and will not be the default.
However, the localhost port may continue to be an option for
installations that want to do their own auth proxy.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/accessing_the_api.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

146
docs/admin/admission_controllers.md Normal file

@@ -0,0 +1,146 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Admission Controllers
**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Admission Controllers](#admission-controllers)
- [What are they?](#what-are-they?)
- [Why do I need them?](#why-do-i-need-them?)
- [How do I turn on an admission control plug-in?](#how-do-i-turn-on-an-admission-control-plug-in?)
- [What does each plug-in do?](#what-does-each-plug-in-do?)
- [AlwaysAdmit](#alwaysadmit)
- [AlwaysDeny](#alwaysdeny)
- [DenyExecOnPrivileged](#denyexeconprivileged)
- [ServiceAccount](#serviceaccount)
- [SecurityContextDeny](#securitycontextdeny)
- [ResourceQuota](#resourcequota)
- [LimitRanger](#limitranger)
- [NamespaceExists](#namespaceexists)
- [NamespaceAutoProvision (deprecated)](#namespaceautoprovision-(deprecated))
- [NamespaceLifecycle](#namespacelifecycle)
- [Is there a recommended set of plug-ins to use?](#is-there-a-recommended-set-of-plug-ins-to-use?)
<!-- END MUNGE: GENERATED_TOC -->
## What are they?
An admission control plug-in is a piece of code that intercepts requests to the Kubernetes
API server prior to persistence of the object, but after the request is authenticated
and authorized. The plug-in code is in the API server process
and must be compiled into the binary in order to be used at this time.
Each admission control plug-in is run in sequence before a request is accepted into the cluster. If
any of the plug-ins in the sequence reject the request, the entire request is rejected immediately
and an error is returned to the end-user.
Admission control plug-ins may mutate the incoming object in some cases to apply system configured
defaults. In addition, admission control plug-ins may mutate related resources as part of request
processing to do things like increment quota usage.
## Why do I need them?
Many advanced features in Kubernetes require an admission control plug-in to be enabled in order
to properly support the feature. As a result, a Kubernetes API server that is not properly
configured with the right set of admission control plug-ins is an incomplete server and will not
support all the features you expect.
## How do I turn on an admission control plug-in?
The Kubernetes API server supports a flag, ```admission_control```, that takes a comma-delimited,
ordered list of admission control choices to invoke prior to modifying objects in the cluster.
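As a sketch of the syntax only (the concrete set recommended for 1.0 appears at the end of this document), the flag takes the plug-in names in the order they should run; all other apiserver flags are elided here:
```shell
# Illustrative only: run NamespaceLifecycle first, then LimitRanger.
kube-apiserver --admission_control=NamespaceLifecycle,LimitRanger ...
```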
## What does each plug-in do?
### AlwaysAdmit
Use this plugin by itself to pass-through all requests.
### AlwaysDeny
Rejects all requests. Used for testing.
### DenyExecOnPrivileged
This plug-in will intercept all requests to exec a command in a pod if that pod has a privileged container.
If your cluster supports privileged containers, and you want to restrict the ability of end-users to exec
commands in those containers, we strongly encourage enabling this plug-in.
### ServiceAccount
This plug-in implements automation for [serviceAccounts](service_accounts.md).
We strongly recommend using this plug-in if you intend to make use of Kubernetes ```ServiceAccount``` objects.
### SecurityContextDeny
This plug-in will deny any pod with a [SecurityContext](security_context.md) that defines options that were not available on the ```Container```.
### ResourceQuota
This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the ```ResourceQuota``` object in a ```Namespace```. If you are using ```ResourceQuota```
objects in your Kubernetes deployment, you MUST use this plug-in to enforce quota constraints.
See the [resourceQuota design doc](design/admission_control_resource_quota.md).
It is strongly encouraged that this plug-in is configured last in the sequence of admission control plug-ins. This is
so that quota is not prematurely incremented only for the request to be rejected later in admission control.
### LimitRanger
This plug-in will observe the incoming request and ensure that it does not violate any of the constraints
enumerated in the ```LimitRange``` object in a ```Namespace```. If you are using ```LimitRange``` objects in
your Kubernetes deployment, you MUST use this plug-in to enforce those constraints.
See the [limitRange design doc](design/admission_control_limit_range.md).
### NamespaceExists
This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes ```Namespace```
and reject the request if the ```Namespace``` was not previously created. We strongly recommend running
this plug-in to ensure integrity of your data.
### NamespaceAutoProvision (deprecated)
This plug-in will observe all incoming requests that attempt to create a resource in a Kubernetes ```Namespace```
and create a new ```Namespace``` if one did not already exist previously.
We strongly recommend ```NamespaceExists``` over ```NamespaceAutoProvision```.
### NamespaceLifecycle
This plug-in enforces that a ```Namespace``` that is undergoing termination cannot have new content created in it.
A ```Namespace``` deletion kicks off a sequence of operations that remove all content (pods, services, etc.) in that
namespace. In order to enforce integrity of that process, we strongly recommend running this plug-in.
Once ```NamespaceAutoProvision``` is deprecated, we anticipate ```NamespaceLifecycle``` and ```NamespaceExists``` will
be merged into a single plug-in that enforces the life-cycle of a ```Namespace``` in Kubernetes.
## Is there a recommended set of plug-ins to use?
Yes.
For Kubernetes 1.0, we strongly recommend running the following set of admission control plug-ins (order matters):
```shell
--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ServiceAccount,ResourceQuota
```
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admission_controllers.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

59
docs/admin/authentication.md Normal file

@@ -0,0 +1,59 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Authentication Plugins
Kubernetes uses client certificates, tokens, or http basic auth to authenticate users for API calls.
Client certificate authentication is enabled by passing the `--client_ca_file=SOMEFILE`
option to apiserver. The referenced file must contain one or more certificate authorities
to use to validate client certificates presented to the apiserver. If a client certificate
is presented and verified, the common name of the subject is used as the user name for the
request.
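As an illustration (not from the original text), a client holding a certificate and key signed by one of those authorities could call the secure port roughly like this; the host, port, and file paths are placeholders:
```shell
# Hypothetical paths and address; the CN of client.crt becomes the user name for the request.
curl --cacert ca.crt --cert client.crt --key client.key \
  https://kubernetes-master:6443/api/v1/namespaces
```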
Token authentication is enabled by passing the `--token_auth_file=SOMEFILE` option
to apiserver. Currently, tokens last indefinitely, and the token list cannot
be changed without restarting apiserver. We plan in the future for tokens to
be short-lived, and to be generated as needed rather than stored in a file.
The token file format is implemented in `plugin/pkg/auth/authenticator/token/tokenfile/...`
and is a csv file with 3 columns: token, user name, user uid.
When using token authentication from an http client the apiserver expects an `Authorization`
header with a value of `Bearer SOMETOKEN`.
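For illustration only, a token file line and a matching request might look like the following; the token, user, and address are made up:
```shell
# known_tokens.csv line format: token,user name,user uid
# 31ada4fd-adec-460c-809a-9e56ceb75269,alice,1
curl --cacert ca.crt \
  -H "Authorization: Bearer 31ada4fd-adec-460c-809a-9e56ceb75269" \
  https://kubernetes-master:6443/api/v1/namespaces
```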
Basic authentication is enabled by passing the `--basic_auth_file=SOMEFILE`
option to apiserver. Currently, the basic auth credentials last indefinitely,
and the password cannot be changed without restarting apiserver. Note that basic
authentication is currently supported for convenience while we finish making the
more secure modes described above easier to use.
The basic auth file format is implemented in `plugin/pkg/auth/authenticator/password/passwordfile/...`
and is a csv file with 3 columns: password, user name, user id.
When using basic authentication from an http client the apiserver expects an `Authorization` header
with a value of `Basic BASE64ENCODEDUSER:PASSWORD`.
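For illustration only (the credentials and address are made up), curl constructs the base64-encoded `Basic` header itself when given `--user`:
```shell
curl --cacert ca.crt --user admin:s3cret https://kubernetes-master:6443/api/v1/namespaces
```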
## Plugin Development
We plan for the Kubernetes API server to issue tokens
after the user has been (re)authenticated by a *bedrock* authentication
provider external to Kubernetes. We plan to make it easy to develop modules
that interface between kubernetes and a bedrock authentication provider (e.g.
github.com, google.com, enterprise directory, kerberos, etc.)
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/authentication.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

122
docs/admin/authorization.md Normal file

@@ -0,0 +1,122 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Authorization Plugins
In Kubernetes, authorization happens as a separate step from authentication.
See the [authentication documentation](authentication.md) for an
overview of authentication.
Authorization applies to all HTTP accesses on the main apiserver port. (The
readonly port is not currently subject to authorization, but is planned to be
removed soon.)
The authorization check for any request compares attributes of the context of
the request (such as user, resource, and namespace) with access
policies. An API call must be allowed by some policy in order to proceed.
The following implementations are available, and are selected by flag:
- `--authorization_mode=AlwaysDeny`
- `--authorization_mode=AlwaysAllow`
- `--authorization_mode=ABAC`
`AlwaysDeny` blocks all requests (used in tests).
`AlwaysAllow` allows all requests; use if you don't need authorization.
`ABAC` allows for user-configured authorization policy. ABAC stands for Attribute-Based Access Control.
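As an illustration (not part of the original text), the mode and its policy file might be wired together on the apiserver command line roughly as follows; the file path is a placeholder and other required flags are omitted:
```shell
# Sketch only: select ABAC and point it at a policy file in the format described below.
kube-apiserver --authorization_mode=ABAC \
  --authorization_policy_file=/srv/kubernetes/abac_policy.jsonl ...
```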
## ABAC Mode
### Request Attributes
A request has 4 attributes that can be considered for authorization:
- user (the user-string which a user was authenticated as).
- whether the request is readonly (GETs are readonly)
- what resource is being accessed
- applies only to the API endpoints, such as
`/api/v1/namespaces/default/pods`. For miscellaneous endpoints, like `/version`, the
resource is the empty string.
- the namespace of the object being accessed, or the empty string if the
endpoint does not support namespaced objects.
We anticipate adding more attributes to allow finer grained access control and
to assist in policy management.
### Policy File Format
For mode `ABAC`, also specify `--authorization_policy_file=SOME_FILENAME`.
The file format is [one JSON object per line](http://jsonlines.org/). There should be no enclosing list or map, just
one map per line.
Each line is a "policy object". A policy object is a map with the following properties:
- `user`, type string; the user-string from `--token_auth_file`
- `readonly`, type boolean, when true, means that the policy only applies to GET
operations.
- `resource`, type string; a resource from an URL, such as `pods`.
- `namespace`, type string; a namespace string.
An unset property is the same as a property set to the zero value for its type (e.g. empty string, 0, false).
However, unset should be preferred for readability.
In the future, policies may be expressed in a JSON format, and managed via a REST
interface.
### Authorization Algorithm
A request has attributes which correspond to the properties of a policy object.
When a request is received, the attributes are determined. Unknown attributes
are set to the zero value of its type (e.g. empty string, 0, false).
An unset property will match any value of the corresponding
attribute. An unset attribute will match any value of the corresponding property.
The tuple of attributes is checked for a match against every policy in the policy file.
If at least one line matches the request attributes, then the request is authorized (but may fail later validation).
To permit any user to do something, write a policy with the user property unset.
Similarly, a policy with an unset namespace applies regardless of namespace.
### Examples
1. Alice can do anything: `{"user":"alice"}`
2. Kubelet can read any pods: `{"user":"kubelet", "resource": "pods", "readonly": true}`
3. Kubelet can read and write events: `{"user":"kubelet", "resource": "events"}`
4. Bob can just read pods in namespace "projectCaribou": `{"user":"bob", "resource": "pods", "readonly": true, "namespace": "projectCaribou"}`
[Complete file example](../pkg/auth/authorizer/abac/example_policy_file.jsonl)
## Plugin Development
Other implementations can be developed fairly easily.
The APIserver calls the Authorizer interface:
```go
type Authorizer interface {
    Authorize(a Attributes) error
}
```
to determine whether or not to allow each API action.
An authorization plugin is a module that implements this interface.
Authorization plugin code goes in `pkg/auth/authorization/$MODULENAME`.
An authorization module can be completely implemented in go, or can call out
to a remote authorization service. Authorization modules can implement
their own caching to reduce the cost of repeated authorization calls with the
same or similar arguments. Developers should then consider the interaction between
caching and revocation of permissions.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/authorization.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

149
docs/admin/availability.md Normal file

@@ -0,0 +1,149 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Availability
This document collects advice on reasoning about and provisioning for high-availability when using Kubernetes clusters.
## Failure modes
This is an incomplete list of things that could go wrong, and how to deal with them.
Root causes:
- VM(s) shutdown
- network partition within cluster, or between cluster and users.
- crashes in Kubernetes software
- data loss or unavailability of persistent storage (e.g. GCE PD or AWS EBS volume).
- operator error misconfigures kubernetes software or application software.
Specific scenarios:
- Apiserver VM shutdown or apiserver crashing
- Results
- unable to stop, update, or start new pods, services, replication controller
- existing pods and services should continue to work normally, unless they depend on the Kubernetes API
- Apiserver backing storage lost
- Results
- apiserver should fail to come up.
- kubelets will not be able to reach it but will continue to run the same pods and provide the same service proxying.
- manual recovery or recreation of apiserver state necessary before apiserver is restarted.
- Supporting services (node controller, replication controller manager, scheduler, etc) VM shutdown or crashes
- currently those are colocated with the apiserver, and their unavailability has similar consequences as apiserver
- in future, these will be replicated as well and may not be co-located
- they do not have their own persistent state
- Node (thing that runs kubelet and kube-proxy and pods) shutdown
- Results
- pods on that Node stop running
- Kubelet software fault
- Results
- crashing kubelet cannot start new pods on the node
- kubelet might delete the pods or not
- node marked unhealthy
- replication controllers start new pods elsewhere
- Cluster operator error
- Results:
- loss of pods, services, etc
- loss of apiserver backing store
- users unable to read API
- etc
Mitigations:
- Action: Use the IaaS provider's automatic VM restarting feature for IaaS VMs.
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Mitigates: Supporting services VM shutdown or crashes
- Action: use the IaaS provider's reliable storage (e.g. GCE PD or AWS EBS volume) for VMs with apiserver+etcd.
- Mitigates: Apiserver backing storage lost
- Action: Use Replicated APIserver feature (when complete: feature is planned but not implemented)
- Mitigates: Apiserver VM shutdown or apiserver crashing
- Will tolerate one or more simultaneous apiserver failures.
- Mitigates: Apiserver backing storage lost
- Each apiserver has independent storage. Etcd will recover from loss of one member. Risk of total data loss greatly reduced.
- Action: Snapshot apiserver PDs/EBS-volumes periodically
- Mitigates: Apiserver backing storage lost
- Mitigates: Some cases of operator error
- Mitigates: Some cases of kubernetes software fault
- Action: use replication controller and services in front of pods
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
- Action: applications (containers) designed to tolerate unexpected restarts
- Mitigates: Node shutdown
- Mitigates: Kubelet software fault
- Action: Multiple independent clusters (and avoid making risky changes to all clusters at once)
- Mitigates: Everything listed above.
## Choosing Multiple Kubernetes Clusters
You may want to set up multiple kubernetes clusters, both to
have clusters in different regions to be nearer to your users; and to tolerate failures and/or invasive maintenance.
### Scope of a single cluster
On IaaS providers such as Google Compute Engine or Amazon Web Services, a VM exists in a
[zone](https://cloud.google.com/compute/docs/zones) or [availability
zone](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html).
We suggest that all the VMs in a Kubernetes cluster should be in the same availability zone, because:
- compared to having a single global Kubernetes cluster, there are fewer single-points of failure
- compared to a cluster that spans availability zones, it is easier to reason about the availability properties of a
single-zone cluster.
- when the Kubernetes developers are designing the system (e.g. making assumptions about latency, bandwidth, or
correlated failures) they are assuming all the machines are in a single data center, or otherwise closely connected.
It is okay to have multiple clusters per availability zone, though on balance we think fewer is better.
Reasons to prefer fewer clusters are:
- improved bin packing of Pods in some cases with more nodes in one cluster.
- reduced operational overhead (though the advantage is diminished as ops tooling and processes mature).
- reduced costs for per-cluster fixed resource costs, e.g. apiserver VMs (but small as a percentage
of overall cluster cost for medium to large clusters).
Reasons to have multiple clusters include:
- strict security policies requiring isolation of one class of work from another (but, see Partitioning Clusters
below).
- test clusters to canary new Kubernetes releases or other cluster software.
### Selecting the right number of clusters
The selection of the number of kubernetes clusters may be a relatively static choice, only revisited occasionally.
By contrast, the number of nodes in a cluster and the number of pods in a service may change frequently according to
load and growth.
To pick the number of clusters, first, decide which regions you need to be in to have adequate latency to all your end users, for services that will run
on Kubernetes (if you use a Content Distribution Network, the latency requirements for the CDN-hosted content need not
be considered). Legal issues might influence this as well. For example, a company with a global customer base might decide to have clusters in US, EU, AP, and SA regions.
Call the number of regions to be in `R`.
Second, decide how many clusters should be able to be unavailable at the same time, while still being available. Call
the number that can be unavailable `U`. If you are not sure, then 1 is a fine choice.
If it is allowable for load-balancing to direct traffic to any region in the event of a cluster failure, then
you need `R + U` clusters. If it is not (e.g. you want to ensure low latency for all users in the event of a
cluster failure), then you need `R * (U + 1)` clusters (`U + 1` in each of `R` regions), so that every region still has a working cluster after `U` failures. For example, with `R = 4` and `U = 1`, the first case needs 5 clusters and the second needs 8. In any case, try to put each cluster in a different zone.
Finally, if any of your clusters would need more than the maximum recommended number of nodes for a Kubernetes cluster, then
you may need even more clusters. Our [roadmap](roadmap.md)
calls for maximum 100 node clusters at v1.0 and maximum 1000 node clusters in the middle of 2015.
## Working with multiple clusters
When you have multiple clusters, you would typically create services with the same config in each cluster and put each of those
service instances behind a load balancer (AWS Elastic Load Balancer, GCE Forwarding Rule or HTTP Load Balancer), so that
failures of a single cluster are not visible to end users.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/availability.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

78
docs/admin/cluster_management.md Normal file

@@ -0,0 +1,78 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Cluster Management
This doc is in progress.
## Upgrading a cluster
The `cluster/kube-push.sh` script will do a rudimentary update; it is a 1.0 roadmap item to have a robust live cluster update system.
## Upgrading to a different API version
There is a sequence of steps to upgrade to a new API version.
1. Turn on the new api version
2. Upgrade the cluster's storage to use the new version.
3. Upgrade all config files. Identify users of the old api version endpoints.
4. Update existing objects in the storage to the new version by running `cluster/update-storage-objects.sh`.
5. Turn off the old version.
### Turn on or off an API version for your cluster
Specific API versions can be turned on or off by passing the `--runtime-config=api/<version>` flag while bringing up the apiserver. For example, to turn off the v1 API, pass `--runtime-config=api/v1=false`.
runtime-config also supports 2 special keys: `api/all` and `api/legacy`, to control all and legacy APIs respectively. For example, to turn off all API versions except v1, pass `--runtime-config=api/all=false,api/v1=true`.
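For illustration, the flag settings described above might appear on the apiserver command line as follows (other required flags are omitted):
```shell
kube-apiserver --runtime-config=api/v1=false ...                # turn off the v1 API
kube-apiserver --runtime-config=api/all=false,api/v1=true ...   # serve only the v1 API
```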
### Switching your cluster's storage API version
The `KUBE_API_VERSIONS` environment variable controls the API versions that are supported in the cluster. The first version in the list is used as the cluster's storage version. Hence, to set a specific version as the storage version, bring it to the front of the list of versions in the value of `KUBE_API_VERSIONS`.
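A hypothetical example, assuming the environment variable is set where the apiserver is launched; the version list is illustrative only:
```shell
# v1 comes first, so v1 becomes the storage version.
export KUBE_API_VERSIONS="v1,v1beta3"
```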
### Switching your config files to a new API version
You can use the kube-version-change utility to convert config files between different API versions.
```
$ hack/build-go.sh cmd/kube-version-change
$ _output/local/go/bin/kube-version-change -i myPod.v1beta3.yaml -o myPod.v1.yaml
```
### Maintenance on a Node
If you need to reboot a node (such as for a kernel upgrade, libc upgrade, hardware repair, etc.), and the downtime is
brief, then when the Kubelet restarts, it will attempt to restart the pods scheduled to it. If the reboot takes longer,
then the node controller will terminate the pods that are bound to the unavailable node. If there is a corresponding
replication controller, then a new copy of the pod will be started on a different node. So, in the case where all
pods are replicated, upgrades can be done without special coordination.
If you want more control over the upgrading process, you may use the following workflow:
1. Mark the node to be rebooted as unschedulable:
`kubectl replace nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": true}}'`.
This keeps new pods from landing on the node while you are trying to get them off.
1. Get the pods off the machine, via any of the following strategies:
1. wait for finite-duration pods to complete
1. delete pods with `kubectl delete pods $PODNAME`
1. for pods with a replication controller, the pod will eventually be replaced by a new pod which will be scheduled to a new node. additionally, if the pod is part of a service, then clients will automatically be redirected to the new pod.
1. for pods with no replication controller, you need to bring up a new copy of the pod, and assuming it is not part of a service, redirect clients to it.
1. Work on the node
1. Make the node schedulable again:
`kubectl replace nodes $NODENAME --patch='{"apiVersion": "v1", "spec": {"unschedulable": false}}'`.
If you deleted the node's VM instance and created a new one, then a new schedulable node resource will
be created automatically when you create a new VM instance (if you're using a cloud provider that supports
node discovery; currently this is only Google Compute Engine, not including CoreOS on Google Compute Engine using kube-register). See [Node](node.md).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/cluster_management.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

57
docs/admin/dns.md Normal file

@@ -0,0 +1,57 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# DNS Integration with Kubernetes
As of kubernetes 0.8, DNS is offered as a [cluster add-on](../cluster/addons/README.md).
If enabled, a DNS Pod and Service will be scheduled on the cluster, and the kubelets will be
configured to tell individual containers to use the DNS Service's IP.
Every Service defined in the cluster (including the DNS server itself) will be
assigned a DNS name. By default, a client Pod's DNS search list will
include the Pod's own namespace and the cluster's default domain. This is best
illustrated by example:
Assume a Service named `foo` in the kubernetes namespace `bar`. A Pod running
in namespace `bar` can look up this service by simply doing a DNS query for
`foo`. A Pod running in namespace `quux` can look up this service by doing a
DNS query for `foo.bar`.
The cluster DNS server ([SkyDNS](https://github.com/skynetservices/skydns))
supports forward lookups (A records) and service lookups (SRV records).
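For illustration only, using the names from the example above and assuming the image provides a DNS client such as `nslookup`:
```shell
# Run from inside a container in namespace "quux":
nslookup foo.bar   # resolves the "foo" Service in namespace "bar"
nslookup foo       # would only resolve from a Pod in namespace "bar" itself
```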
## How it Works
The running DNS pod holds 3 containers - skydns, etcd (which skydns uses),
and a kubernetes-to-skydns bridge called kube2sky. The kube2sky process
watches the kubernetes master for changes in Services, and then writes the
information to etcd, which skydns reads. This etcd instance is not linked to
any other etcd clusters that might exist, including the kubernetes master.
## Issues
The skydns service is reachable directly from kubernetes nodes (outside
of any container) and DNS resolution works if the skydns service is targeted
explicitly. However, nodes are not configured to use the cluster DNS service or
to search the cluster's DNS domain by default. This may be resolved at a later
time.
## For more information
See [the docs for the DNS cluster addon](../cluster/addons/dns/README.md).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/dns.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

15
docs/admin/namespaces.md Normal file

@@ -0,0 +1,15 @@
# Namespaces
Namespaces help different projects, teams, or customers to share a kubernetes cluster. First, they provide a scope for [Names](../identifiers.md). Second, as our access control code develops, it is expected that it will be convenient to attach authorization and other policy to namespaces.
Use of multiple namespaces is optional. For small teams, they may not be needed.
This is a placeholder document about namespace administration.
TODO: document namespace creation, ownership assignment, visibility rules,
policy creation, interaction with network.
Namespaces are still under development. For now, the best documentation is the [Namespaces Design Document](../design/namespaces.md). The user documentation can be found at [Namespaces](../../docs/namespaces.md)
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/admin/namespaces.md?pixel)]()

212
docs/admin/networking.md Normal file

@@ -0,0 +1,212 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Networking in Kubernetes
**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Networking in Kubernetes](#networking-in-kubernetes)
- [Summary](#summary)
- [Docker model](#docker-model)
- [Kubernetes model](#kubernetes-model)
- [How to achieve this](#how-to-achieve-this)
- [Google Compute Engine (GCE)](#google-compute-engine-(gce))
- [L2 networks and linux bridging](#l2-networks-and-linux-bridging)
- [Flannel](#flannel)
- [OpenVSwitch](#openvswitch)
- [Weave](#weave)
- [Calico](#calico)
- [Other reading](#other-reading)
<!-- END MUNGE: GENERATED_TOC -->
Kubernetes approaches networking somewhat differently than Docker does by
default. There are 4 distinct networking problems to solve:
1. Highly-coupled container-to-container communications: this is solved by
[pods](pods.md) and `localhost` communications.
2. Pod-to-Pod communications: this is the primary focus of this document.
3. Pod-to-Service communications: this is covered by [services](services.md).
4. External-to-Service communications: this is covered by [services](services.md).
## Summary
Kubernetes assumes that pods can communicate with other pods, regardless of
which host they land on. We give every pod its own IP address so you do not
need to explicitly create links between pods and you almost never need to deal
with mapping container ports to host ports. This creates a clean,
backwards-compatible model where pods can be treated much like VMs or physical
hosts from the perspectives of port allocation, naming, service discovery, load
balancing, application configuration, and migration.
To achieve this we must impose some requirements on how you set up your cluster
networking.
## Docker model
Before discussing the Kubernetes approach to networking, it is worthwhile to
review the "normal" way that networking works with Docker. By default, Docker
uses host-private networking. It creates a virtual bridge, called `docker0` by
default, and allocates a subnet from one of the private address blocks defined
in [RFC1918](https://tools.ietf.org/html/rfc1918) for that bridge. For each
container that Docker creates, it allocates a virtual ethernet device (called
`veth`) which is attached to the bridge. The veth is mapped to appear as eth0
in the container, using Linux namespaces. The in-container eth0 interface is
given an IP address from the bridge's address range.
The result is that Docker containers can talk to other containers only if they
are on the same machine (and thus the same virtual bridge). Containers on
different machines can not reach each other - in fact they may end up with the
exact same network ranges and IP addresses.
In order for Docker containers to communicate across nodes, they must be
allocated ports on the machine's own IP address, which are then forwarded or
proxied to the containers. This obviously means that containers must either
coordinate which ports they use very carefully or else be allocated ports
dynamically.
## Kubernetes model
Coordinating ports across multiple developers is very difficult to do at
scale and exposes users to cluster-level issues outside of their control.
Dynamic port allocation brings a lot of complications to the system - every
application has to take ports as flags, the API servers have to know how to
insert dynamic port numbers into configuration blocks, services have to know
how to find each other, etc. Rather than deal with this, Kubernetes takes a
different approach.
Kubernetes imposes the following fundamental requirements on any networking
implementation (barring any intentional network segmentation policies):
* all containers can communicate with all other containers without NAT
* all nodes can communicate with all containers (and vice-versa) without NAT
* the IP that a container sees itself as is the same IP that others see it as
What this means in practice is that you can not just take two computers
running Docker and expect Kubernetes to work. You must ensure that the
fundamental requirements are met.
This model is not only less complex overall, but it is principally compatible
with the desire for Kubernetes to enable low-friction porting of apps from VMs
to containers. If your job previously ran in a VM, your VM had an IP and could
talk to other VMs in your project. This is the same basic model.
Until now this document has talked about containers. In reality, Kubernetes
applies IP addresses at the `Pod` scope - containers within a `Pod` share their
network namespaces - including their IP address. This means that containers
within a `Pod` can all reach each other's ports on `localhost`. This does imply
that containers within a `Pod` must coordinate port usage, but this is no
different than processes in a VM. We call this the "IP-per-pod" model. This
is implemented in Docker as a "pod container" which holds the network namespace
open while "app containers" (the things the user specified) join that namespace
with Docker's `--net=container:<id>` function.
As with Docker, it is possible to request host ports, but this is reduced to a
very niche operation. In this case a port will be allocated on the host `Node`
and traffic will be forwarded to the `Pod`. The `Pod` itself is blind to the
existence or non-existence of host ports.
## How to achieve this
There are a number of ways that this network model can be implemented. This
document is not an exhaustive study of the various methods, but hopefully serves
as an introduction to various technologies and serves as a jumping-off point.
If some techniques become vastly preferable to others, we might detail them more
here.
### Google Compute Engine (GCE)
For the Google Compute Engine cluster configuration scripts, we use [advanced
routing](https://developers.google.com/compute/docs/networking#routing) to
assign each VM a subnet (default is /24 - 254 IPs). Any traffic bound for that
subnet will be routed directly to the VM by the GCE network fabric. This is in
addition to the "main" IP address assigned to the VM, which is NAT'ed for
outbound internet access. A linux bridge (called `cbr0`) is configured to exist
on that subnet, and is passed to docker's `--bridge` flag.
We start Docker with:
```
DOCKER_OPTS="--bridge=cbr0 --iptables=false --ip-masq=false"
```
This bridge is created by Kubelet (controlled by the `--configure-cbr0=true`
flag) according to the `Node`'s `spec.podCIDR`.
Docker will now allocate IPs from the `cbr-cidr` block. Containers can reach
each other and `Nodes` over the `cbr0` bridge. Those IPs are all routable
within the GCE project network.
GCE itself does not know anything about these IPs, though, so it will not NAT
them for outbound internet traffic. To achieve that we use an iptables rule to
masquerade (aka SNAT - to make it seem as if packets came from the `Node`
itself) traffic that is bound for IPs outside the GCE project network
(10.0.0.0/8).
```
iptables -t nat -A POSTROUTING ! -d 10.0.0.0/8 -o eth0 -j MASQUERADE
```
Lastly we enable IP forwarding in the kernel (so the kernel will process
packets for bridged containers):
```
sysctl net.ipv4.ip_forward=1
```
The result of all this is that all `Pods` can reach each other and can egress
traffic to the internet.
### L2 networks and linux bridging
If you have a "dumb" L2 network, such as a simple switch in a "bare-metal"
environment, you should be able to do something similar to the above GCE setup.
Note that these instructions have only been tried very casually - it seems to
work, but has not been thoroughly tested. If you use this technique and
perfect the process, please let us know.
Follow the "With Linux Bridge devices" section of [this very nice
tutorial](http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/) from
Lars Kellogg-Stedman.
### Flannel
[Flannel](https://github.com/coreos/flannel#flannel) is a very simple overlay
network that satisfies the Kubernetes requirements. It installs in minutes and
should get you up and running if the above techniques are not working. Many
people have reported success with Flannel and Kubernetes.
### OpenVSwitch
[OpenVSwitch](ovs-networking.md) is a somewhat more mature but also
complicated way to build an overlay network. This is endorsed by several of the
"Big Shops" for networking.
### Weave
[Weave](https://github.com/zettio/weave) is yet another way to build an overlay
network, primarily aiming at Docker integration.
### Calico
[Calico](https://github.com/Metaswitch/calico) uses BGP to enable real container
IPs.
## Other reading
The early design of the networking model and its rationale, and some future
plans are described in more detail in the [networking design
document](design/networking.md).
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/networking.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

220
docs/admin/node.md Normal file

@@ -0,0 +1,220 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Node
**Table of Contents**
<!-- BEGIN MUNGE: GENERATED_TOC -->
- [Node](#node)
- [What is a node?](#what-is-a-node?)
- [Node Status](#node-status)
- [Node Addresses](#node-addresses)
- [Node Phase](#node-phase)
- [Node Condition](#node-condition)
- [Node Capacity](#node-capacity)
- [Node Info](#node-info)
- [Node Management](#node-management)
- [Node Controller](#node-controller)
- [Self-Registration of nodes](#self-registration-of-nodes)
- [Manual Node Administration](#manual-node-administration)
- [Node capacity](#node-capacity)
<!-- END MUNGE: GENERATED_TOC -->
## What is a node?
`Node` is a worker machine in Kubernetes, previously known as `Minion`. Node
may be a VM or physical machine, depending on the cluster. Each node has
the services necessary to run [Pods](pods.md) and be managed from the master
systems. The services include docker, kubelet and network proxy. See
[The Kubernetes Node](design/architecture.md#the-kubernetes-node) section in design
doc for more details.
## Node Status
Node status describes current status of a node. For now, there are the following
pieces of information:
### Node Addresses
<!--- TODO: this section is outdated. There is no HostIP field in the API,
but there are addresses of type InternalIP and ExternalIP -->
Host IP address is queried from cloudprovider and stored as part of node
status. If kubernetes runs without cloudprovider, node's ID will be used.
IP address can change, and there are different kind of IPs, e.g. public
IP, private IP, dynamic IP, ipv6, etc. It makes more sense to save it as
a status rather than spec.
### Node Phase
Node Phase is the current lifecycle phase of node, one of `Pending`,
`Running` and `Terminated`. Node Phase management is under development,
here is a brief overview: In kubernetes, node will be created in `Pending`
phase, until it is discovered and checked in by kubernetes, at which time,
kubernetes will mark it as `Running`. The end of a node's lifecycle is
`Terminated`. A terminated node will not receive any scheduling request,
and any running pods will be removed from the node.
A node in the `Running` phase is a necessary but not sufficient requirement for
scheduling Pods. For a node to be considered a scheduling candidate, it
must have the appropriate conditions; see below.
### Node Condition
Node Condition describes the conditions of `Running` nodes. (However,
it can be present also when node status is different, e.g. `Unknown`)
The only valid condition at present is `Ready`. In the future, we plan to add more.
`Ready` means the kubelet is healthy and ready to accept pods. Different
conditions provide different levels of understanding of node health.
Node condition is represented as a JSON object. For example,
the following conditions mean the node is in a sane state:
```json
"conditions": [
{
"kind": "Ready",
"status": "True",
},
]
```
### Node Capacity
Describes the resources available on the node: CPUs, memory and the maximum
number of pods that can be scheduled on this node.
### Node Info
General information about the node, for instance kernel version, kubernetes version
(kubelet version, kube-proxy version), docker version (if used), OS name.
The information is gathered by Kubernetes from the node.
## Node Management
Unlike [Pods](pods.md) and [Services](services.md), a Node is not inherently
created by Kubernetes: it is either created from cloud providers like Google Compute Engine,
or from your physical or virtual machines. What this means is that when
Kubernetes creates a node, it only creates a representation for the node.
After creation, Kubernetes will check whether the node is valid or not.
For example, if you try to create a node from the following content:
```json
{
  "kind": "Node",
  "apiVersion": "v1",
  "metadata": {
    "name": "10.240.79.157",
    "labels": {
      "name": "my-first-k8s-node"
    }
  }
}
```
Kubernetes will create a Node object internally (the representation), and
validate the node by health checking based on the `metadata.name` field: we
assume `metadata.name` can be resolved. If the node is valid, i.e. all necessary
services are running, it is eligible to run a Pod; otherwise, it will be
ignored for any cluster activity until it becomes valid. Note that Kubernetes
will keep the invalid node unless it is explicitly deleted by the client, and it will keep
checking to see if it becomes valid.
Currently, there are two agents that interact with the Kubernetes node interface:
the Node Controller and Kube Admin.
### Node Controller
Node controller is a component in Kubernetes master which manages Node
objects. It performs two major functions: cluster-wide node synchronization
and single node life-cycle management.
Node controller has a sync loop that creates/deletes Nodes from Kubernetes
based on all matching VM instances listed from cloud provider. The sync period
can be controlled via flag `--node-sync-period`. If a new instance
gets created, Node Controller creates a representation for it. If an existing
instance gets deleted, Node Controller deletes the representation. Note, however, that
the Node Controller is unable to provision the node for you, i.e. it won't install
any binaries; therefore, to join the Kubernetes cluster, you as an admin need to make
sure proper services are running on the node. In the future, we plan to automatically
provision some node services.
### Self-Registration of nodes
When kubelet flag `--register-node` is true (the default), then the kubelet will attempt to
register itself with the API server. This is the preferred pattern, used by most distros.
For self-registration, the kubelet is started with the following options (a combined example invocation follows the list):
- `--api-servers=` tells the kubelet the location of the apiserver.
- `--kubeconfig` tells kubelet where to find credentials to authenticate itself to the apiserver.
- `--cloud-provider=` tells the kubelet how to talk to a cloud provider to read metadata about itself.
- `--register-node` tells the kubelet to create its own node resource.
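A hypothetical invocation combining these flags; the apiserver address, kubeconfig path, and cloud provider are placeholders:
```shell
kubelet \
  --api-servers=https://1.2.3.4:6443 \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --cloud-provider=gce \
  --register-node=true
```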
Currently, any kubelet is authorized to create/modify any node resource, but in practice it only creates/modifies
its own. (In the future, we plan to limit authorization to only allow a kubelet to modify its own Node resource.)
#### Manual Node Administration
A cluster administrator can create and modify Node objects.
If the administrator wishes to create node objects manually, set kubelet flag
`--register-node=false`.
The administrator can modify Node resources (regardless of the setting of `--register-node`).
Modifications include setting labels on the Node, and marking it unschedulable.
Labels on nodes can be used in conjunction with node selectors on pods to control scheduling.
Making a node unschedulable will prevent new pods from being scheduled to that
node, but will not affect any existing pods on the node. This is useful as a
preparatory step before a node reboot, etc. For example, to mark a node
unschedulable, run this command:
```
kubectl replace nodes 10.1.2.3 --patch='{"apiVersion": "v1", "spec": {"unschedulable": true}}'
```
### Node capacity
The capacity of the node (number of cpus and amount of memory) is part of the node resource.
Normally, nodes register themselves and report their capacity when creating the node resource. If
you are doing [manual node administration](#manual-node-administration), then you need to set node
capacity when adding a node.
The kubernetes scheduler ensures that there are enough resources for all the pods on a node. It
checks that the sum of the limits of containers on the node is less than the node capacity. It
includes all containers started by kubelet, but not containers started directly by docker, nor
processes not in containers.
If you want to explicitly reserve resources for non-Pod processes, you can create a placeholder
pod. Use the following template:
```
apiVersion: v1
kind: Pod
metadata:
  name: resource-reserver
spec:
  containers:
  - name: sleep-forever
    image: gcr.io/google_containers/pause:0.8.0
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
```
Set the `cpu` and `memory` values to the amount of resources you want to reserve.
Place the file in the manifest directory (`--config=DIR` flag of kubelet). Do this
on each kubelet where you want to reserve resources.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/node.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

33
docs/admin/ovs-networking.md Normal file

@@ -0,0 +1,33 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Kubernetes OpenVSwitch GRE/VxLAN networking
This document describes how OpenVSwitch is used to set up networking between pods across nodes.
The tunnel type can be either GRE or VxLAN; VxLAN is preferable when large-scale isolation needs to be performed within the network.
![ovs-networking](ovs-networking.png "OVS Networking")
The vagrant setup in Kubernetes does the following:
The Docker bridge is replaced with a Linux bridge (kbr0), created with brctl, that has a 256-address subnet. Each node gets a 10.244.x.0/24 subnet, and Docker is configured to use that bridge instead of the default docker0 bridge.
An OVS bridge (obr0) is also created and added as a port to the kbr0 bridge. The OVS bridges on all nodes are linked with GRE tunnels, so each node has an outgoing GRE tunnel to every other node. It does not strictly need to be a complete mesh, but the more connected the mesh, the better. STP (spanning tree) mode is enabled on the bridges to prevent loops.
Routing rules enable any 10.244.0.0/16 target to become reachable via the OVS bridge connected with the tunnels.
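The commands below are a minimal hand-run sketch of the same setup on a single node, not the actual Vagrant provisioning scripts; the peer IP address is an assumption for illustration.
```
# Attach an OVS bridge (obr0) to the Linux bridge (kbr0) and add a GRE tunnel
# to one peer node; repeat the add-port step for each additional node.
ovs-vsctl add-br obr0
brctl addif kbr0 obr0
ovs-vsctl add-port obr0 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.0.12
# Route the cluster-wide pod range toward the bridge that carries the tunnels.
ip route add 10.244.0.0/16 dev kbr0
```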
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/ovs-networking.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

Binary file not shown: docs/admin/ovs-networking.png (70 KiB)
docs/admin/resource_quota_admin.md Normal file
@@ -0,0 +1,120 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Administering Resource Quotas
Kubernetes can limit both the number of objects created in a namespace, and the
total amount of resources requested by pods in a namespace. This facilitates
sharing of a single Kubernetes cluster by several teams or tenants, each in
a namespace.
## Enabling Resource Quota
Resource Quota support is enabled by default in many Kubernetes distributions. It is
enabled when the apiserver `--admission_control=` flag has `ResourceQuota` as
one of its arguments.
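For example, an apiserver invocation with quota enforcement enabled might include the flag shown below; the other plugins listed are an illustrative assumption and vary by distribution.
```
# Sketch only: enable the ResourceQuota admission controller (other apiserver
# flags omitted; the exact plugin list here is an assumption).
kube-apiserver --admission_control=NamespaceLifecycle,LimitRanger,ResourceQuota ...
```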
Resource Quota is enforced in a particular namespace when there is a
`ResourceQuota` object in that namespace. There should be at most one
`ResourceQuota` object in a namespace.
## Object Count Quota
The number of objects of a given type can be restricted. The following types
are supported:
| ResourceName | Description |
| ------------ | ----------- |
| pods | Total number of pods |
| services | Total number of services |
| replicationcontrollers | Total number of replication controllers |
| resourcequotas | Total number of resource quotas |
| secrets | Total number of secrets |
| persistentvolumeclaims | Total number of persistent volume claims |
For example, `pods` quota counts and enforces a maximum on the number of `pods`
created in a single namespace.
## Compute Resource Quota
The total amount of compute resources requested by the pods in a namespace can be restricted. The following resource types
are supported:
| ResourceName | Description |
| ------------ | ----------- |
| cpu | Total cpu limits of containers |
| memory | Total memory usage limits of containers |
| `example.com/customresource` | Total of `resources.limits."example.com/customresource"` of containers |
For example, `cpu` quota sums up the `resources.limits.cpu` fields of every
container of every pod in the namespace, and enforces a maximum on that sum.
Any resource that is not part of core Kubernetes must follow the resource naming convention prescribed by Kubernetes.
This means the resource must have a fully-qualified name (e.g. `mycompany.org/shinynewresource`).
## Viewing and Setting Quotas
Kubectl supports creating, updating, and viewing quotas:
```
$ kubectl namespace myspace
$ cat <<EOF > quota.json
{
  "apiVersion": "v1",
  "kind": "ResourceQuota",
  "metadata": {
    "name": "quota"
  },
  "spec": {
    "hard": {
      "memory": "1Gi",
      "cpu": "20",
      "pods": "10",
      "services": "5",
      "replicationcontrollers": "20",
      "resourcequotas": "1"
    }
  }
}
EOF
$ kubectl create -f quota.json
$ kubectl get quota
NAME
quota
$ kubectl describe quota quota
Name: quota
Resource Used Hard
-------- ---- ----
cpu 0m 20
memory 0 1Gi
pods 5 10
replicationcontrollers 5 20
resourcequotas 1 1
services 3 5
```
## Quota and Cluster Capacity
Resource Quota objects are independent of the Cluster Capacity. They are
expressed in absolute units.
Sometimes more complex policies may be desired, such as:
- proportionally divide total cluster resources among several teams.
- allow each tenant to grow resource usage as needed, but have a generous
limit to prevent accidental resource exhaustion.
Such policies could be implemented using ResourceQuota as a building-block, by
writing a 'controller' which watches the quota usage and adjusts the quota
hard limits of each namespace.
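As a rough illustration of that building-block idea, an external script could poll quota usage and push revised hard limits; the namespace, quota name, file name, and interval below are assumptions, not shipped tooling.
```
# Hypothetical sketch: periodically inspect quota usage in a namespace and,
# based on whatever policy you implement, apply an updated ResourceQuota.
while true; do
  kubectl describe quota quota --namespace=myspace   # compare Used vs Hard
  # ...evaluate your policy here, then push a revised definition, e.g.:
  # kubectl replace -f updated-quota.json --namespace=myspace
  sleep 60
done
```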
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/resource_quota_admin.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->

docs/admin/salt.md Normal file
@@ -0,0 +1,117 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
<!-- BEGIN STRIP_FOR_RELEASE -->
<h1>*** PLEASE NOTE: This document applies to the HEAD of the source
tree only. If you are using a released version of Kubernetes, you almost
certainly want the docs that go with that version.</h1>
<strong>Documentation for specific releases can be found at
[releases.k8s.io](http://releases.k8s.io).</strong>
<!-- END STRIP_FOR_RELEASE -->
<!-- END MUNGE: UNVERSIONED_WARNING -->
# Using Salt to configure Kubernetes
The Kubernetes cluster can be configured using Salt.
The Salt scripts are shared across multiple hosting providers, which may use different operating systems and different networking configurations. Before making a modification, it's important to understand this background so that your changes do not break Kubernetes hosting for other environments and providers.
## Salt cluster setup
The **salt-master** service runs on the kubernetes-master node [(except on the default GCE setup)](#standalone-salt-configuration-on-gce).
The **salt-minion** service runs on the kubernetes-master node and each kubernetes-minion node in the cluster.
Each salt-minion service is configured to interact with the **salt-master** service hosted on the kubernetes-master via the **master.conf** file [(except on GCE)](#standalone-salt-configuration-on-gce).
```
[root@kubernetes-master] $ cat /etc/salt/minion.d/master.conf
master: kubernetes-master
```
The salt-master is contacted by each salt-minion and, depending upon the machine information presented, the salt-master provisions the machine as either a kubernetes-master or a kubernetes-minion with all the capabilities required to run Kubernetes.
If you are running the Vagrant based environment, the **salt-api** service is running on the kubernetes-master. It is configured to enable the vagrant user to introspect the salt cluster in order to find out about machines in the Vagrant environment via a REST API.
## Standalone Salt Configuration on GCE
On GCE, the master and nodes are all configured as [standalone minions](http://docs.saltstack.com/en/latest/topics/tutorials/standalone_minion.html). The configuration for each VM is derived from the VM's [instance metadata](https://cloud.google.com/compute/docs/metadata) and then stored in Salt grains (`/etc/salt/minion.d/grains.conf`) and pillars (`/srv/salt-overlay/pillar/cluster-params.sls`) that local Salt uses to enforce state.
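For reference, on a GCE node the rendered Salt inputs can be inspected directly at the paths mentioned above:
```
# View the locally stored grains and pillar values that Salt uses on GCE nodes.
cat /etc/salt/minion.d/grains.conf
cat /srv/salt-overlay/pillar/cluster-params.sls
```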
All remaining sections that refer to master/minion setups should be ignored for GCE. One consequence of the GCE setup is that the Salt mine doesn't exist: there is no sharing of configuration amongst nodes.
## Salt security
*(Not applicable on default GCE setup.)*
Security is not enabled on the salt-master, and the salt-master is configured to auto-accept incoming requests from minions. It is not recommended to use this security configuration in production environments without deeper study. (In some environments this isn't as bad as it might sound if the salt master port isn't externally accessible and you trust everyone on your network.)
```
[root@kubernetes-master] $ cat /etc/salt/master.d/auto-accept.conf
open_mode: True
auto_accept: True
```
## Salt minion configuration
Each minion in the salt cluster has an associated configuration that instructs the salt-master how to provision the required resources on the machine.
An example file is presented below using the Vagrant based environment.
```
[root@kubernetes-master] $ cat /etc/salt/minion.d/grains.conf
grains:
  etcd_servers: $MASTER_IP
  cloud_provider: vagrant
  roles:
    - kubernetes-master
```
Each hosting environment has a slightly different grains.conf file that is used to build conditional logic where required in the Salt files.
The following enumerates the set of defined key/value pairs that are supported today. If you add new ones, please make sure to update this list.
Key | Value
------------- | -------------
`api_servers` | (Optional) The IP address / host name where a kubelet can get read-only access to kube-apiserver
`cbr-cidr` | (Optional) The minion IP address range used for the docker container bridge.
`cloud` | (Optional) Which IaaS platform is used to host kubernetes, *gce*, *azure*, *aws*, *vagrant*
`etcd_servers` | (Optional) Comma-delimited list of IP addresses the kube-apiserver and kubelet use to reach etcd. Uses the IP of the first machine in the kubernetes_master role, or 127.0.0.1 on GCE.
`hostnamef` | (Optional) The full host name of the machine, i.e. uname -n
`node_ip` | (Optional) The IP address to use to address this node
`hostname_override` | (Optional) Mapped to the kubelet hostname_override
`network_mode` | (Optional) Networking model to use among nodes: *openvswitch*
`networkInterfaceName` | (Optional) Networking interface to use to bind addresses, default value *eth0*
`publicAddressOverride` | (Optional) The IP address the kube-apiserver should use to bind against for external read-only access
`roles` | (Required) 1. `kubernetes-master` means this machine is the master in the kubernetes cluster. 2. `kubernetes-pool` means this machine is a kubernetes-minion. Depending on the role, the Salt scripts will provision different resources on the machine.
These keys may be leveraged by the Salt sls files to branch behavior.
In addition, a cluster may be running a Debian-based operating system or a Red Hat-based operating system (CentOS, Fedora, RHEL, etc.). As a result, it's sometimes important to distinguish behavior based on operating system using if branches like the following.
```
{% if grains['os_family'] == 'RedHat' %}
  # something specific to a Red Hat environment (CentOS, Fedora, RHEL) where you may use yum, systemd, etc.
{% else %}
  # something specific to a Debian environment (apt-get, init.d)
{% endif %}
```
## Best Practices
1. When configuring default arguments for processes, it's best to avoid the use of EnvironmentFiles (systemd in Red Hat environments) or init.d files (Debian distributions) to hold default values that should be common across operating system environments. This helps keep our Salt template files easy to understand for editors who may not be familiar with the particulars of each distribution.
## Future enhancements (Networking)
Per-pod IP configuration is provider-specific, so when making networking changes, it's important to sandbox these changes, as not all providers use the same mechanisms (iptables, openvswitch, etc.)
We should define a grains.conf key that captures more specifically what network configuration environment is being used to avoid future confusion across providers.
## Further reading
The [cluster/saltbase](../../cluster/saltbase/) tree has more details on the current SaltStack configuration.
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/salt.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->