mirror of
				https://github.com/optim-enterprises-bv/kubernetes.git
				synced 2025-11-03 19:58:17 +00:00 
			
		
		
		
	Proposal for introducing Protobuf serialization
This commit is contained in:
		
							
								
								
									
										509
									
								
								docs/proposals/protobuf.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										509
									
								
								docs/proposals/protobuf.md
									
									
									
									
									
										Normal file
									
								
							@@ -0,0 +1,509 @@
 | 
				
			|||||||
 | 
					<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<!-- BEGIN STRIP_FOR_RELEASE -->
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
 | 
				
			||||||
 | 
					     width="25" height="25">
 | 
				
			||||||
 | 
					<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
 | 
				
			||||||
 | 
					     width="25" height="25">
 | 
				
			||||||
 | 
					<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
 | 
				
			||||||
 | 
					     width="25" height="25">
 | 
				
			||||||
 | 
					<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
 | 
				
			||||||
 | 
					     width="25" height="25">
 | 
				
			||||||
 | 
					<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
 | 
				
			||||||
 | 
					     width="25" height="25">
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					If you are using a released version of Kubernetes, you should
 | 
				
			||||||
 | 
					refer to the docs that go with that version.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Documentation for other releases can be found at
 | 
				
			||||||
 | 
					[releases.k8s.io](http://releases.k8s.io).
 | 
				
			||||||
 | 
					</strong>
 | 
				
			||||||
 | 
					--
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<!-- END STRIP_FOR_RELEASE -->
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<!-- END MUNGE: UNVERSIONED_WARNING -->
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					# Protobuf serialization and internal storage
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					@smarterclayton
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					March 2016
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Proposal and Motivation
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The Kubernetes API server is a "dumb server" which offers storage, versioning,
 | 
				
			||||||
 | 
					validation, update, and watch semantics on API resources. In a large cluster
 | 
				
			||||||
 | 
					the API server must efficiently retrieve, store, and deliver large numbers
 | 
				
			||||||
 | 
					of coarse-grained objects to many clients. In addition, Kubernetes traffic is
 | 
				
			||||||
 | 
					heavily biased towards intra-cluster traffic - as much as 90% of the requests
 | 
				
			||||||
 | 
					served by the APIs are for internal cluster components like nodes, controllers,
 | 
				
			||||||
 | 
					and proxies. The primary format for intercluster API communication is JSON
 | 
				
			||||||
 | 
					today for ease of client construction.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					At the current time, the latency of reaction to change in the cluster is
 | 
				
			||||||
 | 
					dominated by the time required to load objects from persistent store (etcd),
 | 
				
			||||||
 | 
					convert them to an output version, serialize them JSON over the network, and
 | 
				
			||||||
 | 
					then perform the reverse operation in clients. The cost of
 | 
				
			||||||
 | 
					serialization/deserialization and the size of the bytes on the wire, as well
 | 
				
			||||||
 | 
					as the memory garbage created during those operations, dominate the CPU and
 | 
				
			||||||
 | 
					network usage of the API servers.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In order to reach clusters of 10k nodes, we need roughly an order of magnitude
 | 
				
			||||||
 | 
					efficiency improvement in a number of areas of the cluster, starting with the
 | 
				
			||||||
 | 
					masters but also including API clients like controllers, kubelets, and node
 | 
				
			||||||
 | 
					proxies.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We propose to introduce a Protobuf serialization for all common API objects
 | 
				
			||||||
 | 
					that can optionally be used by intra-cluster components. Experiments have
 | 
				
			||||||
 | 
					demonstrated a 10x reduction in CPU use during serialization and deserialization,
 | 
				
			||||||
 | 
					a 2x reduction in size in bytes on the wire, and a 6-9x reduction in the amount
 | 
				
			||||||
 | 
					of objects created on the heap during serialization. The Protobuf schema
 | 
				
			||||||
 | 
					for each object will be automatically generated from the external API Go structs
 | 
				
			||||||
 | 
					we use to serialize to JSON.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Benchmarking showed that the time spent on the server in a typical GET
 | 
				
			||||||
 | 
					resembles:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					          etcd -> decode -> defaulting -> convert to internal ->
 | 
				
			||||||
 | 
					    JSON          50us      5us           15us
 | 
				
			||||||
 | 
					    Proto         5us
 | 
				
			||||||
 | 
					    JSON          150allocs               80allocs
 | 
				
			||||||
 | 
					    Proto         100allocs
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					          process -> convert to external -> encode -> client
 | 
				
			||||||
 | 
					    JSON             15us                   40us
 | 
				
			||||||
 | 
					    Proto                                   5us
 | 
				
			||||||
 | 
					    JSON             80allocs               100allocs
 | 
				
			||||||
 | 
					    Proto                                   4allocs
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					 Protobuf has a huge benefit on encoding because it does not need to allocate
 | 
				
			||||||
 | 
					 temporary objects, just one large buffer. Changing to protobuf moves our
 | 
				
			||||||
 | 
					 hotspot back to conversion, not serialization.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Design Points
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Generate Protobuf schema from Go structs (like we do for JSON) to avoid
 | 
				
			||||||
 | 
					  manual schema update and drift
 | 
				
			||||||
 | 
					* Generate Protobuf schema that is field equivalent to the JSON fields (no
 | 
				
			||||||
 | 
					  special types or enumerations), reducing drift for clients across formats.
 | 
				
			||||||
 | 
					* Follow our existing API versioning rules (backwards compatible in major
 | 
				
			||||||
 | 
					  API versions, breaking changes across major versions) by creating one
 | 
				
			||||||
 | 
					  Protobuf schema per API type.
 | 
				
			||||||
 | 
					* Continue to use the existing REST API patterns but offer an alternative
 | 
				
			||||||
 | 
					  serialization, which means existing client and server tooling can remain
 | 
				
			||||||
 | 
					  the same while benefiting from faster decoding.
 | 
				
			||||||
 | 
					* Protobuf objects on disk or in etcd will need to be self identifying at
 | 
				
			||||||
 | 
					  rest, like JSON, in order for backwards compatibility in storage to work,
 | 
				
			||||||
 | 
					  so we must add an envelope with apiVersion and kind to wrap the nested
 | 
				
			||||||
 | 
					  object, and make the data format recognizable to clients.
 | 
				
			||||||
 | 
					* Use the [gogo-protobuf](https://github.com/gogo/protobuf) Golang library to generate marshal/unmarshal
 | 
				
			||||||
 | 
					  operations, allowing us to bypass the expensive reflection used by the
 | 
				
			||||||
 | 
					  golang JSOn operation
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Alternatives
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* We considered JSON compression to reduce size on wire, but that does not
 | 
				
			||||||
 | 
					  reduce the amount of memory garbage created during serialization and
 | 
				
			||||||
 | 
					  deserialization.
 | 
				
			||||||
 | 
					* More efficient formats like Msgpack were considered, but they only offer
 | 
				
			||||||
 | 
					  2x speed up vs the 10x observed for Protobuf
 | 
				
			||||||
 | 
					* gRPC was considered, but is a larger change that requires more core
 | 
				
			||||||
 | 
					  refactoring. This approach does not eliminate the possibility of switching
 | 
				
			||||||
 | 
					  to gRPC in the future.
 | 
				
			||||||
 | 
					* We considered attempting to improve JSON serialization, but the cost of
 | 
				
			||||||
 | 
					  implementing a more efficient serializer library than ugorji is
 | 
				
			||||||
 | 
					  significantly higher than creating a protobuf schema from our Go structs.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Schema
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The Protobuf schema for each API group and version will be generated from
 | 
				
			||||||
 | 
					the objects in that API group and version. The schema will be named using
 | 
				
			||||||
 | 
					the package identifier of the Go package, i.e.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    k8s.io/kubernetes/pkg/api/v1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Each top level object will be generated as a Protobuf message, i.e.:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    type Pod struct { ... }
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    message Pod {}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Since the Go structs are designed to be serialized to JSON (with only the
 | 
				
			||||||
 | 
					int, string, bool, map, and array primitive types), we will use the
 | 
				
			||||||
 | 
					canonical JSON serialization as the protobuf field type wherever possible,
 | 
				
			||||||
 | 
					i.e.:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    JSON      Protobuf
 | 
				
			||||||
 | 
					    string -> string
 | 
				
			||||||
 | 
					    int    -> varint
 | 
				
			||||||
 | 
					    bool   -> bool
 | 
				
			||||||
 | 
					    array  -> repeating message|primitive
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We disallow the use of the Go `int` type in external fields because it is
 | 
				
			||||||
 | 
					ambiguous depending on compiler platform, and instead always use `int32` or
 | 
				
			||||||
 | 
					`int64`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We will use maps (a protobuf 3 extension that can serialize to protobuf 2)
 | 
				
			||||||
 | 
					to represent JSON maps:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    JSON      Protobuf            Wire (proto2)
 | 
				
			||||||
 | 
					    map    -> map<string, ...> -> repeated Message { key string; value bytes }
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We will not convert known string constants to enumerations, since that
 | 
				
			||||||
 | 
					would require extra logic we do not already have in JSOn.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To begin with, we will use Protobuf 3 to generate a Protobuf 2 schema, and
 | 
				
			||||||
 | 
					in the future investigate a Protobuf 3 serialization. We will introduce
 | 
				
			||||||
 | 
					abstractions that let us have more than a single protobuf serialization if
 | 
				
			||||||
 | 
					necessary. Protobuf 3 would require us to support message types for
 | 
				
			||||||
 | 
					pointer primitive (nullable) fields, which is more complex than Protobuf 2's
 | 
				
			||||||
 | 
					support for pointers.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Example of generated proto IDL
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Without gogo extensions:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					syntax = 'proto2';
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					package k8s.io.kubernetes.pkg.api.v1;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/api/resource/generated.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/api/unversioned/generated.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/runtime/generated.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/util/intstr/generated.proto";
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					// Package-wide variables from generator "generated".
 | 
				
			||||||
 | 
					option go_package = "v1";
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					// Represents a Persistent Disk resource in AWS.
 | 
				
			||||||
 | 
					//
 | 
				
			||||||
 | 
					// An AWS EBS disk must exist before mounting to a container. The disk
 | 
				
			||||||
 | 
					// must also be in the same AWS zone as the kubelet. An AWS EBS disk
 | 
				
			||||||
 | 
					// can only be mounted as read/write once. AWS EBS volumes support
 | 
				
			||||||
 | 
					// ownership management and SELinux relabeling.
 | 
				
			||||||
 | 
					message AWSElasticBlockStoreVolumeSource {
 | 
				
			||||||
 | 
					  // Unique ID of the persistent disk resource in AWS (Amazon EBS volume).
 | 
				
			||||||
 | 
					  // More info: http://releases.k8s.io/HEAD/docs/user-guide/volumes.md#awselasticblockstore
 | 
				
			||||||
 | 
					  optional string volumeID = 1;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  // Filesystem type of the volume that you want to mount.
 | 
				
			||||||
 | 
					  // Tip: Ensure that the filesystem type is supported by the host operating system.
 | 
				
			||||||
 | 
					  // Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
 | 
				
			||||||
 | 
					  // More info: http://releases.k8s.io/HEAD/docs/user-guide/volumes.md#awselasticblockstore
 | 
				
			||||||
 | 
					  // TODO: how do we prevent errors in the filesystem from compromising the machine
 | 
				
			||||||
 | 
					  optional string fsType = 2;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  // The partition in the volume that you want to mount.
 | 
				
			||||||
 | 
					  // If omitted, the default is to mount by volume name.
 | 
				
			||||||
 | 
					  // Examples: For volume /dev/sda1, you specify the partition as "1".
 | 
				
			||||||
 | 
					  // Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty).
 | 
				
			||||||
 | 
					  optional int32 partition = 3;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  // Specify "true" to force and set the ReadOnly property in VolumeMounts to "true".
 | 
				
			||||||
 | 
					  // If omitted, the default is "false".
 | 
				
			||||||
 | 
					  // More info: http://releases.k8s.io/HEAD/docs/user-guide/volumes.md#awselasticblockstore
 | 
				
			||||||
 | 
					  optional bool readOnly = 4;
 | 
				
			||||||
 | 
					}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					// Affinity is a group of affinity scheduling rules, currently
 | 
				
			||||||
 | 
					// only node affinity, but in the future also inter-pod affinity.
 | 
				
			||||||
 | 
					message Affinity {
 | 
				
			||||||
 | 
					  // Describes node affinity scheduling rules for the pod.
 | 
				
			||||||
 | 
					  optional NodeAffinity nodeAffinity = 1;
 | 
				
			||||||
 | 
					}
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					With extensions:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					syntax = 'proto2';
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					package k8s.io.kubernetes.pkg.api.v1;
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					import "github.com/gogo/protobuf/gogoproto/gogo.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/api/resource/generated.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/api/unversioned/generated.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/runtime/generated.proto";
 | 
				
			||||||
 | 
					import "k8s.io/kubernetes/pkg/util/intstr/generated.proto";
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					// Package-wide variables from generator "generated".
 | 
				
			||||||
 | 
					option (gogoproto.marshaler_all) = true;
 | 
				
			||||||
 | 
					option (gogoproto.sizer_all) = true;
 | 
				
			||||||
 | 
					option (gogoproto.unmarshaler_all) = true;
 | 
				
			||||||
 | 
					option (gogoproto.goproto_unrecognized_all) = false;
 | 
				
			||||||
 | 
					option (gogoproto.goproto_enum_prefix_all) = false;
 | 
				
			||||||
 | 
					option (gogoproto.goproto_getters_all) = false;
 | 
				
			||||||
 | 
					option go_package = "v1";
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					// Represents a Persistent Disk resource in AWS.
 | 
				
			||||||
 | 
					//
 | 
				
			||||||
 | 
					// An AWS EBS disk must exist before mounting to a container. The disk
 | 
				
			||||||
 | 
					// must also be in the same AWS zone as the kubelet. An AWS EBS disk
 | 
				
			||||||
 | 
					// can only be mounted as read/write once. AWS EBS volumes support
 | 
				
			||||||
 | 
					// ownership management and SELinux relabeling.
 | 
				
			||||||
 | 
					message AWSElasticBlockStoreVolumeSource {
 | 
				
			||||||
 | 
					  // Unique ID of the persistent disk resource in AWS (Amazon EBS volume).
 | 
				
			||||||
 | 
					  // More info: http://releases.k8s.io/HEAD/docs/user-guide/volumes.md#awselasticblockstore
 | 
				
			||||||
 | 
					  optional string volumeID = 1 [(gogoproto.customname) = "VolumeID", (gogoproto.nullable) = false];
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  // Filesystem type of the volume that you want to mount.
 | 
				
			||||||
 | 
					  // Tip: Ensure that the filesystem type is supported by the host operating system.
 | 
				
			||||||
 | 
					  // Examples: "ext4", "xfs", "ntfs". Implicitly inferred to be "ext4" if unspecified.
 | 
				
			||||||
 | 
					  // More info: http://releases.k8s.io/HEAD/docs/user-guide/volumes.md#awselasticblockstore
 | 
				
			||||||
 | 
					  // TODO: how do we prevent errors in the filesystem from compromising the machine
 | 
				
			||||||
 | 
					  optional string fsType = 2 [(gogoproto.customname) = "FSType", (gogoproto.nullable) = false];
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  // The partition in the volume that you want to mount.
 | 
				
			||||||
 | 
					  // If omitted, the default is to mount by volume name.
 | 
				
			||||||
 | 
					  // Examples: For volume /dev/sda1, you specify the partition as "1".
 | 
				
			||||||
 | 
					  // Similarly, the volume partition for /dev/sda is "0" (or you can leave the property empty).
 | 
				
			||||||
 | 
					  optional int32 partition = 3 [(gogoproto.customname) = "Partition", (gogoproto.nullable) = false];
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					  // Specify "true" to force and set the ReadOnly property in VolumeMounts to "true".
 | 
				
			||||||
 | 
					  // If omitted, the default is "false".
 | 
				
			||||||
 | 
					  // More info: http://releases.k8s.io/HEAD/docs/user-guide/volumes.md#awselasticblockstore
 | 
				
			||||||
 | 
					  optional bool readOnly = 4 [(gogoproto.customname) = "ReadOnly", (gogoproto.nullable) = false];
 | 
				
			||||||
 | 
					}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					// Affinity is a group of affinity scheduling rules, currently
 | 
				
			||||||
 | 
					// only node affinity, but in the future also inter-pod affinity.
 | 
				
			||||||
 | 
					message Affinity {
 | 
				
			||||||
 | 
					  // Describes node affinity scheduling rules for the pod.
 | 
				
			||||||
 | 
					  optional NodeAffinity nodeAffinity = 1 [(gogoproto.customname) = "NodeAffinity"];
 | 
				
			||||||
 | 
					}
 | 
				
			||||||
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Wire format
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In order to make Protobuf serialized objects recognizable in a binary form,
 | 
				
			||||||
 | 
					the encoded object must be prefixed by a magic number, and then wrap the
 | 
				
			||||||
 | 
					non-self-describing Protobuf object in a Protobuf object that contains
 | 
				
			||||||
 | 
					schema information.  The protobuf object is referred to as the `raw` object
 | 
				
			||||||
 | 
					and the encapsulation is referred to as `wrapper` object.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The simplest serialization is the raw Protobuf object with no identifying
 | 
				
			||||||
 | 
					information. In some use cases, we may wish to have the server identify the
 | 
				
			||||||
 | 
					raw object type on the wire using a protocol dependent format (gRPC uses
 | 
				
			||||||
 | 
					a type HTTP header). This works when all objects are of the same type, but
 | 
				
			||||||
 | 
					we occasionally have reasons to encode different object types in the same
 | 
				
			||||||
 | 
					context (watches, lists of objects on disk, and API calls that may return
 | 
				
			||||||
 | 
					errors).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To identify the type of a wrapped Protobuf object, we wrap it in a message
 | 
				
			||||||
 | 
					in package `k8s.io/kubernetes/pkg/runtime` with message name `Unknown`
 | 
				
			||||||
 | 
					having the following schema:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    message Unknown {
 | 
				
			||||||
 | 
					      optional TypeMeta typeMeta = 1;
 | 
				
			||||||
 | 
					      optional bytes value = 2;
 | 
				
			||||||
 | 
					      optional string contentEncoding = 3;
 | 
				
			||||||
 | 
					      optional string contentType = 4;
 | 
				
			||||||
 | 
					    }
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    message TypeMeta {
 | 
				
			||||||
 | 
					      optional string apiVersion = 1;
 | 
				
			||||||
 | 
					      optional string kind = 2;
 | 
				
			||||||
 | 
					    }
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The `value` field is an encoded protobuf object that matches the schema
 | 
				
			||||||
 | 
					defined in `typeMeta` and has optional `contentType` and `contentEncoding`
 | 
				
			||||||
 | 
					fields.  `contentType` and `contentEncoding` have the same meaning as in
 | 
				
			||||||
 | 
					HTTP, if unspecified `contentType` means "raw protobuf object", and
 | 
				
			||||||
 | 
					`contentEncoding` defaults to no encoding. If `contentEncoding` is
 | 
				
			||||||
 | 
					specified, the defined transformation should be applied to `value` before
 | 
				
			||||||
 | 
					attempting to decode the value.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The `contentType` field is required to support objects without a defined
 | 
				
			||||||
 | 
					protobuf schema, like the ThirdPartyResource or templates. Those objects
 | 
				
			||||||
 | 
					would have to be encoded as JSON or another structure compatible form
 | 
				
			||||||
 | 
					when used with Protobuf. Generic clients must deal with the possibility
 | 
				
			||||||
 | 
					that the returned value is not in the known type.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					We add the `contentEncoding` field here to preserve room for future
 | 
				
			||||||
 | 
					optimizations like encryption-at-rest or compression of the nested content.
 | 
				
			||||||
 | 
					Clients should error when receiving an encoding they do not support.
 | 
				
			||||||
 | 
					Negotioting encoding is not defined here, but introducing new encodings
 | 
				
			||||||
 | 
					is similar to introducing a schema change or new API version.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					A client should use the `kind` and `apiVersion` fields to identify the
 | 
				
			||||||
 | 
					correct protobuf IDL for that message and version, and then decode the
 | 
				
			||||||
 | 
					`bytes` field into that Protobuf message.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Any Unknown value written to stable storage will be given a 4 byte prefix
 | 
				
			||||||
 | 
					`0x6b, 0x38, 0x73, 0x00`, which correspond to `k8s` followed by a zero byte.
 | 
				
			||||||
 | 
					The content-type `application/vnd.kubernetes.protobuf` is defined as
 | 
				
			||||||
 | 
					representing the following schema:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    MESSAGE = '0x6b 0x38 0x73 0x00' UNKNOWN
 | 
				
			||||||
 | 
					    UNKNOWN = <protobuf serialization of k8s.io/kubernetes/pkg/runtime#Unknown>
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					A client should check for the first four bytes, then perform a protobuf
 | 
				
			||||||
 | 
					deserialization of the remaining bytes into the `runtime.Unknown` type.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Streaming wire format
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					While the majority of Kubernetes APIs return single objects that can vary
 | 
				
			||||||
 | 
					in type (Pod vs Status, PodList vs Status), the watch APIs return a stream
 | 
				
			||||||
 | 
					of identical objects (Events). At the time of this writing, this is the only
 | 
				
			||||||
 | 
					current or anticipated streaming RESTful protocol (logging, port-forwarding,
 | 
				
			||||||
 | 
					and exec protocols use a binary protocol over Websockets or SPDY).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					In JSON, this API is implemented as a stream of JSON objects that are
 | 
				
			||||||
 | 
					separated by their syntax (the closing `}` brace is followed by whitespace
 | 
				
			||||||
 | 
					and the opening `{` brace starts the next object). There is no formal
 | 
				
			||||||
 | 
					specification covering this pattern, nor a unique content-type. Each object
 | 
				
			||||||
 | 
					is expected to be of type `watch.Event`, and is currently not self describing.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					For expediency and consistency, we define a format for Protobuf watch Events
 | 
				
			||||||
 | 
					that is similar. Since protobuf messages are not self describing, we must
 | 
				
			||||||
 | 
					identify the boundaries between Events (a `frame`). We do that by prefixing
 | 
				
			||||||
 | 
					each frame of N bytes with a 4-byte, big-endian, unsigned integer with the
 | 
				
			||||||
 | 
					value N.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    frame  = length body
 | 
				
			||||||
 | 
					    length = 32-bit unsigned integer in big-endian order, denoting length of
 | 
				
			||||||
 | 
					             bytes of body
 | 
				
			||||||
 | 
					    body = <bytes>
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # frame containing a single byte 0a
 | 
				
			||||||
 | 
					    frame = 01 00 00 00 0a
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    # equivalent JSON
 | 
				
			||||||
 | 
					    frame = {"type": "added", ...}
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The body of each frame is a serialized Protobuf message `Event` in package
 | 
				
			||||||
 | 
					`k8s.io/kubernetes/pkg/watch/versioned`. The content type used for this
 | 
				
			||||||
 | 
					format is `application/vnd.kubernetes.protobuf;type=watch`.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Negotiation
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To allow clients to request protobuf serialization optionally, the `Accept`
 | 
				
			||||||
 | 
					HTTP header is used by callers to indicate which serialization they wish
 | 
				
			||||||
 | 
					returned in the response, and the `Content-Type` header is used to tell the
 | 
				
			||||||
 | 
					server how to decode the bytes sent in the request (for DELETE/POST/PUT/PATCH
 | 
				
			||||||
 | 
					requests). The server will return 406 if the `Accept` header is not
 | 
				
			||||||
 | 
					recognized or 415 if the `Content-Type` is not recognized (as defined in
 | 
				
			||||||
 | 
					RFC2616).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To be backwards compatible, clients must consider that the server does not
 | 
				
			||||||
 | 
					support protobuf serialization. A number of options are possible:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Preconfigured
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Clients can have a configuration setting that instructs them which version
 | 
				
			||||||
 | 
					to use. This is the simplest option, but requires intervention when the
 | 
				
			||||||
 | 
					component upgrades to protobuf.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Include serialization information in api-discovery
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Servers can define the list of content types they accept and return in
 | 
				
			||||||
 | 
					their API discovery docs, and clients can use protobuf if they support it.
 | 
				
			||||||
 | 
					Allows dynamic configuration during upgrade if the client is already using
 | 
				
			||||||
 | 
					API-discovery.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					### Optimistically attempt to send and receive requests using protobuf
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Using multiple `Accept` values:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					    Accept: application/vnd.kubernetes.protobuf, application/json
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					clients can indicate their preferences and handle the returned
 | 
				
			||||||
 | 
					`Content-Type` using whatever the server responds. On update operations,
 | 
				
			||||||
 | 
					clients can try protobuf and if they receive a 415 error, record that and
 | 
				
			||||||
 | 
					fall back to JSON. Allows the client to be backwards compatible with
 | 
				
			||||||
 | 
					any server, but comes at the cost of some implementation complexity.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Generation process
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Generation proceeds in five phases:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					1. Generate a gogo-protobuf annotated IDL from the source Go struct.
 | 
				
			||||||
 | 
					2. Generate temporary Go structs from the IDL using gogo-protobuf.
 | 
				
			||||||
 | 
					3. Generate marshaller/unmarshallers based on the IDL using gogo-protobuf.
 | 
				
			||||||
 | 
					4. Take all tag numbers generated for the IDL and apply them as struct tags
 | 
				
			||||||
 | 
					   to the original Go types.
 | 
				
			||||||
 | 
					5. Generate a final IDL without gogo-protobuf annotations as the canonical IDL.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The output is a `generated.proto` file in each package containing a standard
 | 
				
			||||||
 | 
					proto2 IDL, and a `generated.pb.go` file in each package that contains the
 | 
				
			||||||
 | 
					generated marshal/unmarshallers.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The Go struct generated by gogo-protobuf from the first IDL must be identical
 | 
				
			||||||
 | 
					to the origin struct - a number of changes have been made to gogo-protobuf
 | 
				
			||||||
 | 
					to ensure exact 1-1 conversion. A small number of additions may be necessary
 | 
				
			||||||
 | 
					in the future if we introduce more exotic field types (Go type aliases, maps
 | 
				
			||||||
 | 
					with aliased Go types, and embedded fields were fixed). If they are identical,
 | 
				
			||||||
 | 
					the output marshallers/unmarshallers can then work on the origin struct.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Whenever a new field is added, generation will assign that field a unique tag
 | 
				
			||||||
 | 
					and the 4th phase will write that tag back to the origin Go struct as a `protobuf`
 | 
				
			||||||
 | 
					struct tag. This ensures subsequent generation passes are stable, even in the
 | 
				
			||||||
 | 
					face of internal refactors. The first time a field is added, the author will
 | 
				
			||||||
 | 
					need to check in both the new IDL AND the protobuf struct tag changes.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The second IDL is generated without gogo-protobuf annotations to allow clients
 | 
				
			||||||
 | 
					in other languages to generate easily.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Any errors in the generation process are considered fatal and must be resolved
 | 
				
			||||||
 | 
					early (being unable to identify a field type for conversion, duplicate fields,
 | 
				
			||||||
 | 
					duplicate tags, protoc errors, etc). The conversion fuzzer is used to ensure
 | 
				
			||||||
 | 
					that a Go struct can be round-tripped to protobuf and back, as we do for JSON
 | 
				
			||||||
 | 
					and conversion testing.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Changes to development process
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					All existing API change rules would still apply. New fields added would be
 | 
				
			||||||
 | 
					automatically assigned a tag by the generation process. New API versions will
 | 
				
			||||||
 | 
					have a new proto IDL, and field name and changes across API versions would be
 | 
				
			||||||
 | 
					handled using our existing API change rules. Tags cannot change within an
 | 
				
			||||||
 | 
					API version.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Generation would be done by developers and then checked into source control,
 | 
				
			||||||
 | 
					like conversions and ugorji JSON codecs.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Because protoc is not packaged well across all platforms, we will add it to
 | 
				
			||||||
 | 
					the `kube-cross` Docker image and developers can use that to generate
 | 
				
			||||||
 | 
					updated protobufs. Protobuf 3 beta is required.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The generated protobuf will be checked with a verify script before merging.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Implications
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* The generated marshal code is large and will increase build times and binary
 | 
				
			||||||
 | 
					  size. We may be able to remove ugorji after protobuf is added, since the
 | 
				
			||||||
 | 
					  bulk of our decoding would switch to protobuf.
 | 
				
			||||||
 | 
					* The protobuf schema is naive, which means it may not be as a minimal as
 | 
				
			||||||
 | 
					  possible.
 | 
				
			||||||
 | 
					* Debugging of protobuf related errors is harder due to the binary nature of
 | 
				
			||||||
 | 
					  the format.
 | 
				
			||||||
 | 
					* Migrating API object storage from JSON to protobuf will require that all
 | 
				
			||||||
 | 
					  API servers are upgraded before beginning to write protobuf to disk, since
 | 
				
			||||||
 | 
					  old servers won't recognize protobuf.
 | 
				
			||||||
 | 
					* Transport of protobuf between etcd and the api server will be less efficient
 | 
				
			||||||
 | 
					  in etcd2 than etcd3 (since etcd2 must encode binary values returned as JSON).
 | 
				
			||||||
 | 
					  Should still be smaller than current JSON request.
 | 
				
			||||||
 | 
					* Third-party API objects must be stored as JSON inside of a protobuf wrapper
 | 
				
			||||||
 | 
					  in etcd, and the API endpoints will not benefit from clients that speak
 | 
				
			||||||
 | 
					  protobuf. Clients will have to deal with some API objects not supporting
 | 
				
			||||||
 | 
					  protobuf.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Open Questions
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					* Is supporting stored protobuf files on disk in the kubectl client worth it?
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
 | 
				
			||||||
 | 
					[]()
 | 
				
			||||||
 | 
					<!-- END MUNGE: GENERATED_ANALYTICS -->
 | 
				
			||||||
		Reference in New Issue
	
	Block a user