	Update docs about new Services work

docs/services.md | 250

@@ -16,14 +16,15 @@ Enter `Services`.

A Kubernetes `Service` is an abstraction which defines a logical set of `Pods`
and a policy by which to access them - sometimes called a micro-service.  The
set of `Pods` targeted by a `Service` is determined by a [`Label
Selector`](labels.md).
set of `Pods` targeted by a `Service` is (usually) determined by a [`Label
Selector`](labels.md) (see below for why you might want a `Service` without a
selector).

As an example, consider an image-processing backend which is running with 3
replicas.  Those replicas are fungible - frontends do not care which backend
they use.  While the actual `Pods` that compose the backend set may change, the
frontend clients should not need to manage that themselves.  The `Service`
abstraction enables this decoupling.
frontend clients should not need to be aware of that or keep track of the list
of backends themselves.  The `Service` abstraction enables this decoupling.

For Kubernetes-native applications, Kubernetes offers a simple `Endpoints` API
that is updated whenever the set of `Pods` in a `Service` changes.  For
@@ -37,16 +38,12 @@ REST objects, a `Service` definition can be POSTed to the apiserver to create a
new instance.  For example, suppose you have a set of `Pods` that each expose
port 9376 and carry a label "app=MyApp".


```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service",
        "labels": {
            "environment": "testing"
        }
    },
    "spec": {
        "selector": {
@@ -64,22 +61,34 @@ port 9376 and carry a label "app=MyApp".
```

This specification will create a new `Service` object named "my-service" which
targets TCP port 9376 on any `Pod` with the "app=MyApp" label.  Every `Service`
is also assigned a virtual IP address (called the "portal IP"), which is used by
the service proxies (see below).  The `Service`'s selector will be evaluated
continuously and the results will be posted in an `Endpoints` object also named
"my-service".
targets TCP port 9376 on any `Pod` with the "app=MyApp" label.  This `Service`
will also be assigned an IP address (sometimes called the "portal IP"), which
is used by the service proxies (see below).  The `Service`'s selector will be
evaluated continuously and the results will be posted in an `Endpoints` object
also named "my-service".

Note that a `Service` can map an incoming port to any `targetPort`.  By default
the `targetPort` is the same as the `port` field.  Perhaps more interesting is
that `targetPort` can be a string, referring to the name of a port in the
backend `Pod`s.  The actual port number assigned to that name can be different
in each backend `Pod`. This offers a lot of flexibility for deploying and
evolving your `Service`s.  For example, you can change the port number that
pods expose in the next version of your backend software, without breaking
clients.
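
For illustration, here is a minimal sketch of such a `Service` (assuming the
backend `Pod`s declare a container port named "http-web", a made-up name used
only for this example):

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": "http-web"
            }
        ]
    }
}
```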

Kubernetes `Service`s support `TCP` and `UDP` for protocols.  The default
is `TCP`.

### Services without selectors

Services, in addition to providing abstractions to access `Pods`, can also
abstract any kind of backend.  For example:
Services generally abstract access to Kubernetes `Pods`, but they can also
abstract other kinds of backends.  For example:
  - you want to have an external database cluster in production, but in test
    you use your own databases.
    you use your own databases
  - you want to point your service to a service in another
    [`Namespace`](namespaces.md) or on another cluster.
    [`Namespace`](namespaces.md) or on another cluster
  - you are migrating your workload to Kubernetes and some of your backends run
    outside of Kubernetes.
    outside of Kubernetes

In any of these scenarios you can define a service without a selector:

@@ -102,7 +111,8 @@ In any of these scenarios you can define a service without a selector:
}
```

Then you can manually map the service to a specific endpoint(s):
Because this has no selector, the corresponding `Endpoints` object will not be
created. You can manually map the service to your own specific endpoints:

```json
{
@@ -135,8 +145,8 @@ watches the Kubernetes master for the addition and removal of `Service`
and `Endpoints` objects. For each `Service` it opens a port (random) on the
local node.  Any connections made to that port will be proxied to one of the
corresponding backend `Pods`.  Which backend to use is decided based on the
AffinityPolicy of the `Service`.  Lastly, it installs iptables rules which
capture traffic to the `Service`'s `Port` on the `Service`'s portal IP (which
`SessionAffinity` of the `Service`.  Lastly, it installs iptables rules which
capture traffic to the `Service`'s `Port` on the `Service`'s cluster IP (which
is entirely virtual) and redirects that traffic to the previously described
port.

@@ -146,12 +156,59 @@ appropriate backend without the clients knowing anything about Kubernetes or

By default, the choice of backend is random.  Client-IP-based session affinity
can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"`.
By default, the choice of backend is random.  Client-IP based session affinity
can be selected by setting `service.spec.sessionAffinity` to `"ClientIP"` (the
default is `"None"`).
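
As a minimal sketch, the relevant fragment of a `Service` with client-IP
affinity turned on might look like this (only the `spec` portion is shown):

```json
{
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "sessionAffinity": "ClientIP"
    }
}
```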

As of Kubernetes 1.0, `Service`s are a "layer 3" (TCP/UDP over IP) construct.  We do not
yet have a concept of "layer 7" (HTTP) services.

## Multi-Port Services

Many `Service`s need to expose more than one port.  For this case, Kubernetes
supports multiple port definitions on a `Service` object.  When using multiple
ports you must give all of your ports names, so that endpoints can be
disambiguated.  For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            },
            {
                "name": "https",
                "protocol": "TCP",
                "port": 443,
                "targetPort": 9377
            }
        ]
    }
}
```

## Choosing your own PortalIP address

A user can specify their own `PortalIP` address as part of a `Service` creation
request.  For example, if they already have an existing DNS entry that they
wish to replace, or legacy systems that are configured for a specific IP
address and difficult to re-configure.  The `PortalIP` address that a user
chooses must be a valid IP address and within the portal_net CIDR range that is
specified by flag to the API server.  If the PortalIP value is invalid, the
apiserver returns a 422 HTTP status code to indicate that the value is invalid.
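
For illustration, a minimal sketch of such a request (assuming the address
below falls inside your cluster's portal_net range) might look like:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "portalIP": "10.0.171.239",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```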

### Why not use round-robin DNS?

A question that pops up every now and then is why we do all this stuff with
@@ -208,66 +265,104 @@ DNS records for each.  If DNS has been enabled throughout the cluster then all
For example, if you have a `Service` called "my-service" in Kubernetes
`Namespace` "my-ns" a DNS record for "my-service.my-ns" is created.  `Pods`
which exist in the "my-ns" `Namespace` should be able to find it by simply doing
a name lookup for "my-service".  `Pods` which exist in other `Namespaces` must
a name lookup for "my-service".  `Pods` which exist in other `Namespace`s must
qualify the name as "my-service.my-ns".  The result of these name lookups is the
virtual portal IP.
cluster IP.

We will soon add DNS support for multi-port `Service`s in the form of SRV
records.

## Headless Services

Sometimes you don't need or want a single virtual IP.  In this case, you can
create "headless" services by specifying "None" for the PortalIP.  For such
services, a virtual IP is not allocated, DNS is not configured (this will be
fixed), and service-specific environment variables for pods are not created.
Additionally, the kube proxy does not handle these services and there is no
load balancing or proxying done by the platform for them. The endpoints
controller will still create endpoint records in the API for such services.
These services also take advantage of any UI, readiness probes, etc. that are
applicable for services in general.
Sometimes you don't need or want a single service IP.  In this case, you can
create "headless" services by specifying `"None"` for the `PortalIP`.  For such
`Service`s, a cluster IP is not allocated and service-specific environment
variables for `Pod`s are not created.  DNS is configured to return multiple A
records (addresses) for the `Service` name, which point directly to the `Pod`s
backing the `Service`.  Additionally, the kube proxy does not handle these
services and there is no load balancing or proxying done by the platform for
them.  The endpoints controller will still create `Endpoints` records in the
API.
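
As a minimal sketch, a headless `Service` simply sets the `PortalIP` to
`"None"`:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "portalIP": "None",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376
            }
        ]
    }
}
```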

The tradeoff for a developer would be whether to couple to the Kubernetes API
or to a particular discovery system. Applications can still use a
self-registration pattern and adapters for other discovery systems could be
built upon this API, as well.
This option allows developers to reduce coupling to the Kubernetes system, if
they desire, but leaves them freedom to do discovery in their own way.
Applications can still use a self-registration pattern and adapters for other
discovery systems could easily be built upon this API.

## External Services

For some parts of your application (e.g. frontends) you may want to expose a
Service onto an external (outside of your cluster, maybe public internet) IP
address.
address.  Kubernetes supports two ways of doing this: `NodePort`s and
`LoadBalancer`s.

On cloud providers which support external load balancers, this should be as
simple as setting the `createExternalLoadBalancer` flag of the `Service` spec
to `true`.  This sets up a cloud-specific load balancer and populates the
`publicIPs` field of the spec (see below).  Traffic from the external load
balancer will be directed at the backend `Pods`, though exactly how that works
depends on the cloud provider.
Every `Service` has a `Type` field which defines how the `Service` can be
accessed.  Valid values for this field are:
   - ClusterIP: use a cluster-internal IP (portal) only - this is the default
   - NodePort: use a cluster IP, but also expose the service on a port on each
     node of the cluster (the same port on each)
   - LoadBalancer: use a ClusterIP and a NodePort, but also ask the cloud
     provider for a load balancer which forwards to the `Service`

For cloud providers which do not support external load balancers, there is
another approach that is a bit more "do-it-yourself" - the `publicIPs` field.
Any address you put into the `publicIPs` array will be handled the same as the
portal IP - the kube-proxy will install iptables rules which proxy traffic
through to the backends.  You are then responsible for ensuring that traffic to
those IPs gets sent to one or more Kubernetes `Nodes`.  As long as the traffic
arrives at a Node, it will be subject to the iptables rules.
Note that while `NodePort`s can be TCP or UDP, `LoadBalancer`s only support TCP
as of Kubernetes 1.0.

A common situation is when a `Node` has both internal and external network
interfaces.  If you put that `Node`'s external IP in `publicIPs`, you can
then aim traffic at the `Service` port on that `Node` and it will be proxied to
the backends.  If you set all `Node`s' external IPs as `publicIPs` you can then
reach a `Service` through any `Node`, which means you can build your own
load-balancer or even just use DNS round-robin.  The downside to this approach
is that all such `Service`s share a port space - only one of them can have port
80, for example.
### Type = NodePort

## Choosing your own PortalIP address
If you set the `type` field to `"NodePort"`, the Kubernetes master will
allocate you a port (from a flag-configured range) on each node for each port
exposed by your `Service`.  That port will be reported in your `Service`'s
`spec.ports[*].nodePort` field.  If you specify a value in that field, the
system will allocate you that port or else will fail the API transaction.
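
For illustration, a minimal sketch that requests a specific node port (reusing
the example port numbers from elsewhere in this doc) might look like:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "type": "NodePort",
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ]
    }
}
```

Omit `nodePort` entirely if you are happy to let the master pick a port from
its configured range.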

A user can specify their own `PortalIP` address as part of a service creation
request.  For example, if they already have an existing DNS entry that they
wish to replace, or legacy systems that are configured for a specific IP
address and difficult to re-configure.  The `PortalIP` address that a user
chooses must be a valid IP address and within the portal net CIDR range that is
specified by flag to the API server.  If the PortalIP value is invalid, the
apiserver returns a 422 HTTP status code to indicate that the value is invalid.
This gives developers the freedom to set up their own load balancers, to
configure cloud environments that are not fully supported by Kubernetes, or
even to just expose one or more nodes' IPs directly.

### Type = LoadBalancer

On cloud providers which support external load balancers, setting the `type`
field to `"LoadBalancer"` will provision a load balancer for your `Service`.
The actual creation of the load balancer happens asynchronously, and
information about the provisioned balancer will be published in the `Service`'s
`status.loadBalancer` field.  For example:

```json
{
    "kind": "Service",
    "apiVersion": "v1beta3",
    "metadata": {
        "name": "my-service"
    },
    "spec": {
        "selector": {
            "app": "MyApp"
        },
        "ports": [
            {
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 30061
            }
        ],
        "portalIP": "10.0.171.239",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "146.148.47.155"
                }
            ]
        }
    }
}
```

Traffic from the external load balancer will be directed at the backend `Pods`,
though exactly how that works depends on the cloud provider.

## Shortcomings

@@ -280,6 +375,13 @@ details.
Using the kube-proxy obscures the source-IP of a packet accessing a `Service`.
This makes some kinds of firewalling impossible.

LoadBalancers only support TCP, not UDP.

The `Type` field is designed as nested functionality - each level adds to the
previous.  This is not strictly required on all cloud providers (e.g. GCE does
not need to allocate a `NodePort` to make `LoadBalancer` work, but AWS does)
but the current API requires it.

## Future work

In the future we envision that the proxy policy can become more nuanced than
@@ -293,11 +395,11 @@ eliminate userspace proxying in favor of doing it all in iptables.  This should
perform better and fix the source-IP obfuscation, though it is less flexible than
arbitrary userspace code.

We hope to make the situation around external load balancers and public IPs
simpler and easier to comprehend.

We intend to have first-class support for L7 (HTTP) `Service`s.

We intend to have more flexible ingress modes for `Service`s which encompass
the current `ClusterIP`, `NodePort`, and `LoadBalancer` modes and more.

## The gory details of portals

The previous information should be sufficient for many people who just want to
@@ -348,9 +450,9 @@ When a client connects to the portal the iptables rule kicks in, and redirects
the packets to the `Service proxy`'s own port.  The `Service proxy` chooses a
backend, and starts proxying traffic from the client to the backend.

This means that `Service` owners can choose any `Service` port they want without
risk of collision.  Clients can simply connect to an IP and port, without
being aware of which `Pods` they are actually accessing.
This means that `Service` owners can choose any port they want without risk of
collision.  Clients can simply connect to an IP and port, without being aware
of which `Pod`s they are actually accessing.