Automatic merge from submit-queue (batch tested with PRs 66098, 66389, 66400, 66413, 66378). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
GCE: Return correct error type and HTTP status code for operation errors
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #66399
**Special notes for your reviewer**:
/assign bowei, zihongz, rramkumar
/cc bowei
**Release note**:
```release-note
GCE: Fixes load balancer creation and deletion issues that appeared in 1.10.5.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Add initial availability zones support for Azure nodes
**What this PR does / why we need it**:
The first part of [Azure Availability Zone feature](https://github.com/kubernetes/features/issues/586).
This PR adds initial availability zone (AZ) support for Azure nodes. With this PR, Azure nodes in an AZ will have the label `failure-domain.beta.kubernetes.io/zone=<region>-<zoneID>`, e.g. `southeastasia-1`.
It also updates the instance metadata API version to 2017-12-01, which is required for AZ support.
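For illustration, a minimal Go sketch (the helper name is hypothetical, not from this PR) of how the label value is composed:

```go
package main

import (
	"fmt"
	"strings"
)

// makeZoneLabel composes the zone label value from the lowercased Azure
// region and the instance's zone ID.
func makeZoneLabel(region, zoneID string) string {
	return fmt.Sprintf("%s-%s", strings.ToLower(region), zoneID)
}

func main() {
	// The result is stored under failure-domain.beta.kubernetes.io/zone.
	fmt.Println(makeZoneLabel("SoutheastAsia", "1")) // southeastasia-1
}
```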
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
VirtualMachineScaleSetVM doesn't expose AZ info yet. It will be supported later, after a new Azure Go SDK release.
**Release note**:
```release-note
Azure nodes in an availability zone will now have the label `failure-domain.beta.kubernetes.io/zone=<region>-<zoneID>`.
```
/kind feature
/sig azure
/assign @brendandburns @khenidak @andyzhangx
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Add error check and ignore unused variable (SA4006)
**What this PR does / why we need it**:
Fixes some bugs in the vSphere cloud provider; the issues are listed in #66303:
```
pkg/cloudprovider/providers/vsphere/nodemanager.go:176:5: defers in this range loop won't run unless the channel gets closed (SA9001)
pkg/cloudprovider/providers/vsphere/vclib/diskmanagers/vmdm.go:129:8: this value of err is never used (SA4006)
pkg/cloudprovider/providers/vsphere/vsphere.go:596:34: argument ctx is overwritten before first use (SA4009)
pkg/cloudprovider/providers/vsphere/vsphere_test.go:360:2: this value of instanceID is never used (SA4006)
pkg/cloudprovider/providers/vsphere/vsphere_util.go:301:3: defers in this infinite loop will never run (SA5003)
```
**Special notes for your reviewer**:
I fixed the `SA4006` report from that issue, but there is still other code that needs to be discussed and fixed.
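For context, here is a self-contained illustration of the SA4006 pattern and its fix; the helpers are hypothetical stand-ins, not the upstream code:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the vSphere disk-manager calls.
func createDisk() (string, error)  { return "", errors.New("create failed") }
func attachDisk(disk string) error { return nil }

func main() {
	// SA4006: err from createDisk is overwritten before it is read,
	// so the create failure is silently ignored.
	disk, err := createDisk()
	err = attachDisk(disk)
	_ = err

	// Fix: check each error before the variable is reused.
	disk, err = createDisk()
	if err != nil {
		fmt.Println("create:", err)
		return
	}
	if err := attachDisk(disk); err != nil {
		fmt.Println("attach:", err)
	}
}
```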
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Fix locating resourcepool-path specified in the vsphere.conf file
**What this PR does / why we need it**:
When a volume is provisioned using a vSphere storage policy, the `resourcepool-path` specified in the `vsphere.conf` file is used for creating a shadow (dummy) VM. The dummy VM is created temporarily and then deleted once the volume is created on a compatible datastore.
At present, if the user specifies `resourcepool-path` in the `vsphere.conf` file, the volume provisioner is not able to locate the compute resource for the given path. This is because the lookup is made using `finder.DefaultComputeResource(ctx)` and `finder.ComputeResource(ctx, computePath)`, which is not correct. If the user specifies the name of the cluster or the cluster path, then provisioning works.
This is resolved by using the correct govmomi method: `func (f *Finder) ResourcePoolOrDefault(ctx context.Context, path string) (*object.ResourcePool, error)`.
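For reference, a minimal sketch of how the lookup can be wired up with govmomi's finder (the wrapper function is illustrative):

```go
package vclib

import (
	"context"

	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/object"
)

// resolveResourcePool resolves the resourcepool-path from vsphere.conf.
// ResourcePoolOrDefault accepts an explicit pool path such as
// "ClusterFolder-1/cluster-vsan-1/Resources/ShadowVMPool" as well as a
// cluster's default pool path ending in "/Resources"; with an empty
// path it falls back to the default pool, which fails when no unique
// default exists (matching test 4 below).
func resolveResourcePool(ctx context.Context, finder *find.Finder, path string) (*object.ResourcePool, error) {
	return finder.ResourcePoolOrDefault(ctx, path)
}
```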
**Which issue(s) this PR fixes**
Fixes https://github.com/vmware/kubernetes/issues/493
**Special notes for your reviewer**:
The following testing was performed for this change.
1) Specified a resource pool path in the `vsphere.conf` file and verified the VM is created under the specified resource pool.
```
resourcepool-path="ClusterFolder-1/cluster-vsan-1/Resources/ShadowVMPool"
```
2) When a dedicated resource pool is not available, specified the cluster's default resource pool path in the `vsphere.conf` file and verified volume provisioning works. In this case, the VM is created directly under the cluster.
```
resourcepool-path="ClusterFolder-1/cluster-vsan-1/Resources"
```
3) Verified the above with multiple clusters of the same name in one datacenter.
4) Verified with an empty resource pool path in the `vsphere.conf` file.
```
resourcepool-path=""
```
As expected, provisioning fails with `Failed to provision volume with StorageClass "vsan-gold-policy": no default resource pool found`.
Refer to this datacenter inventory for the path specified in the `resourcepool-path` configuration.

The current documentation describes the `resourcepool-path` configuration as optional; this needs to be corrected once the PR is merged, since for policy-based provisioning it is not an optional parameter.
Documentation link: https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html
**Release note**:
```release-note
Fix for resourcepool-path configuration in the vsphere.conf file.
```
cc: @kubernetes/vmware
Automatic merge from submit-queue (batch tested with PRs 66121, 66140, 66045). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Support setting azure LB idle timeout
**What this PR does / why we need it**:
Adds a new annotation to allow users to configure the idle timeout of
the Azure LB.
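As a sketch of the shape of the change (the annotation key and helper are assumptions for illustration, not copied from this PR):

```go
package azure

import (
	"fmt"
	"strconv"

	v1 "k8s.io/api/core/v1"
)

// getIdleTimeout reads the idle-timeout annotation from a Service and
// validates it against Azure's supported range of 4-30 minutes.
func getIdleTimeout(svc *v1.Service) (*int32, error) {
	// Assumed annotation key, for illustration only.
	const key = "service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout"
	val, ok := svc.Annotations[key]
	if !ok {
		return nil, nil // annotation absent: keep Azure's default timeout
	}
	minutes, err := strconv.ParseInt(val, 10, 32)
	if err != nil || minutes < 4 || minutes > 30 {
		return nil, fmt.Errorf("idle timeout %q must be an integer between 4 and 30 (minutes)", val)
	}
	timeout := int32(minutes)
	return &timeout, nil
}
```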
**Release note**:
```release-note
Support configuring the Azure load balancer idle connection timeout for services
```
Automatic merge from submit-queue (batch tested with PRs 66122, 66007). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Return vmUUID when renewing nodeinfo in VCP
**What this PR does / why we need it**:
This PR fixes an issue where the VM UUID was removed when renewing node information in the vSphere cloud provider.
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
/sig vmware
Automatic merge from submit-queue (batch tested with PRs 66076, 65792, 65649). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
kubernetes: fix printf format errors
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
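A representative example of the class of mismatch vet now catches (illustrative, not taken from this PR):

```go
package main

import "fmt"

func main() {
	name, count := "node-1", 3
	// go vet (Go 1.11+) flags this: %s is used with an int argument.
	fmt.Printf("reattached %s %s times\n", name, count)
	// Fixed: %d matches the int argument.
	fmt.Printf("reattached %s %d times\n", name, count)
}
```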
```release-note
NONE
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Add region label to dynamic provisioned cinder PVs
**What this PR does / why we need it**:
This PR adds a region label to dynamically provisioned Cinder PVs at the time of PV creation.
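A minimal sketch of the idea, using the beta failure-domain label keys (the surrounding helper is illustrative):

```go
package openstack

// volumeLabels builds the labels stamped onto a freshly provisioned
// Cinder PV so schedulers can respect its topology.
func volumeLabels(region, volumeAZ string) map[string]string {
	return map[string]string{
		"failure-domain.beta.kubernetes.io/region": region,
		"failure-domain.beta.kubernetes.io/zone":   volumeAZ,
	}
}
```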
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65977
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.
Lubomir I. Ivanov <neolit123@gmail.com>
applied amend for:
pkg/cloudprovider/providers/vsphere/nodemanager.go
Automatic merge from submit-queue (batch tested with PRs 65902, 65781). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
If the LoadBalancer section is not defined in the cloud config, do not init load balancer support
**What this PR does / why we need it**: if the LoadBalancer section is not defined in the cloud config, we should not initialize load balancer support for the OpenStack cloud provider.
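A minimal sketch of the intended behavior, assuming the provider keeps the parsed `[LoadBalancer]` options around (field and helper names are illustrative, not the exact upstream diff):

```go
package openstack

import (
	"reflect"

	"github.com/golang/glog"
	"k8s.io/kubernetes/pkg/cloudprovider"
)

// LoadBalancer returns false when the [LoadBalancer] section was left
// empty in the cloud config, so no load balancer support is set up.
func (os *OpenStack) LoadBalancer() (cloudprovider.LoadBalancer, bool) {
	if reflect.DeepEqual(os.lbOpts, LoadBalancerOpts{}) {
		glog.V(4).Info("LoadBalancer section is empty/not defined in cloud config, skipping load balancer support")
		return nil, false
	}
	// Hypothetical helper standing in for the pre-existing init path.
	return os.initLBaaSV2()
}
```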
**Which issue(s) this PR fixes**:
Fixes #65775
**Special notes for your reviewer**:
**Release note**:
```release-note
If the LoadBalancer section is not defined in the cloud config, the load balancer is no longer initialized in the OpenStack cloud provider. All setups must have some setting under that section.
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Store the latest cloud provider node addresses
**What this PR does / why we need it**:
Buffer the most recently retrieved node addresses so they can be used as soon as the next node status update runs.
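A minimal sketch of the buffering (type and method names are illustrative):

```go
package cloudresource

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// addressCache buffers the most recently fetched cloud node addresses
// so the next node status update can consume them immediately.
type addressCache struct {
	mu        sync.RWMutex
	addresses []v1.NodeAddress
}

// set is called by the background sync loop after each cloud query.
func (c *addressCache) set(addrs []v1.NodeAddress) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.addresses = addrs
}

// get is called from the node status update path.
func (c *addressCache) get() []v1.NodeAddress {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.addresses
}
```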
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65814
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Setting the client UserAgent makes it easier to identify vCenter sessions
used by the vSphere Cloud Provider. This is useful to remove sessions that
have leaked, such as when a VCP process goes away without calling Logout(),
and to test that VCP properly re-authenticates when a session is removed.
Example use:
govc session.ls | grep kubernetes-cloudprovider | awk '{print $1}' | xargs -n1 govc session.rm
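A minimal sketch with govmomi, assuming its soap client exposes a `UserAgent` field (the version string is illustrative):

```go
package vsphere

import (
	"context"
	"net/url"

	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/soap"
)

// newVCenterClient tags the connection before any round trips so the
// resulting session is identifiable in `govc session.ls` output.
func newVCenterClient(ctx context.Context, u *url.URL, insecure bool) (*vim25.Client, error) {
	sc := soap.NewClient(u, insecure)
	sc.UserAgent = "kubernetes-cloudprovider/v1.12.0"
	return vim25.NewClient(ctx, sc)
}
```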
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
skip NICs that are in a failed state
**What this PR does / why we need it**: this partially fixes #65025. Currently, when getting the primary NIC for VMSS, the provisioning state isn't returned.
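A minimal sketch of the check (Azure SDK field names assumed; the helper is illustrative):

```go
package azure

import "github.com/golang/glog"

const nicFailedState = "Failed"

// shouldOmitNode reports whether a node should be skipped because its
// primary NIC is stuck in the "Failed" provisioningState.
func shouldOmitNode(nodeName string, provisioningState *string) bool {
	if provisioningState != nil && *provisioningState == nicFailedState {
		glog.V(3).Infof("omitting node %q: primary NIC is in provisioningState %q", nodeName, nicFailedState)
		return true
	}
	return false
}
```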
**Which issue(s) this PR fixes**: Partially fixes (for VMAS) #65025
**Special notes for your reviewer**:
/assign @feiskyer
**Release note**:
```release-note
skip nodes that have a primary NIC in a 'Failed' provisioningState
```
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Setup TLS with CA Cert for vsphere cloud provider
- Extend the config to take a path to a CA certificate
- Use the CA cert when establishing a connection with the SOAP client (see the sketch below)
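A minimal sketch of the wiring, assuming govmomi's `soap.Client.SetRootCAs`; the config parameter name is illustrative:

```go
package vclib

import (
	"context"
	"net/url"

	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/soap"
)

// newClientWithCA trusts the configured CA file for the vCenter
// connection instead of relying only on the system roots.
func newClientWithCA(ctx context.Context, u *url.URL, insecure bool, caPath string) (*vim25.Client, error) {
	sc := soap.NewClient(u, insecure)
	if caPath != "" {
		if err := sc.SetRootCAs(caPath); err != nil {
			return nil, err
		}
	}
	return vim25.NewClient(ctx, sc)
}
```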
Testing
We provide certs and keys for tests as fixtures, `vclib/fixtures`.
Those were created (and can be regenerated) using `vclib/fixtures/createCerts.sh`.
At the moment it's possible to configure a CA path and at the same time allow insecure
communication between the vSphere cloud provider and vCenter. This may
change in the future; we might opt to override insecure communication
when a CA is configured, or to log and transparently pass the arguments
through to vCenter, or something else. To be discussed.
At the moment the CA is a global-level configuration. In other
words, all vCenter servers need to use certificates signed by the same
CA. There might be use cases for a different CA per vCenter server; to be
discussed.
**What this PR does / why we need it**:
This PR adds the option of configuring a trusted CA for the communication between the vSphere cloud provider and the vCenter control plane.
**Which issue(s) this PR fixes**:
Fixes #64222
**Special notes for your reviewer**:
**Release note**:
```release-note
- The vSphere cloud provider can now be configured with a trusted root CA
```
Automatic merge from submit-queue (batch tested with PRs 64575, 65120, 65463, 65434, 65522). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Revendor GCE Go Client
Revendor the GCE API Go client and switch to the beta NEG API in the GCE cloud provider.
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65492, 65516, 65447). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
fix azure disk creation issue when specifying external resource group
**What this PR does / why we need it**:
Fixes an Azure disk creation issue when specifying an external resource group: after the disk creation succeeded, getting the disk state failed because the code was still using the original resource group.
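A minimal sketch of the underlying idea: derive the resource group from the disk URI instead of assuming the cluster's default (the helper name is illustrative):

```go
package azure

import (
	"fmt"
	"strings"
)

// resourceGroupFromDiskURI parses the resource group out of a managed
// disk URI of the form:
// /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<name>
func resourceGroupFromDiskURI(diskURI string) (string, error) {
	fields := strings.Split(diskURI, "/")
	if len(fields) != 9 || strings.ToLower(fields[3]) != "resourcegroups" {
		return "", fmt.Errorf("invalid disk URI %q", diskURI)
	}
	return fields[4], nil
}
```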
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #65515
**Special notes for your reviewer**:
Together with https://github.com/kubernetes/kubernetes/pull/65443, this feature is now complete; I will cherry-pick it to prior versions later.
In the end, we have two ways to dynamically provision an Azure disk under an external resource group:
- specify `resourcegroup` parameter in azure disk storage class
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: hdd
provisioner: kubernetes.io/azure-disk
parameters:
  skuname: Standard_LRS
  kind: managed
  cachingmode: None
  resourcegroup: USER-SPECIFIED-RG
```
- specify `volume.beta.kubernetes.io/resource-group` in PVC annotations
```
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-azuredisk
  annotations:
    volume.beta.kubernetes.io/resource-group: "USER-SPECIFIED-RG"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: hdd
```
**Release note**:
```release-note
fix azure disk issue when specifying external resource group
```
/kind bug
/sig azure
@jsafrane @rootfs
Just FYI @khenidak @brendandburns @feiskyer
Automatic merge from submit-queue (batch tested with PRs 65507, 65508, 65486). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
Fix typo in vsphere cloud provider comment
**What this PR does / why we need it**:
Fixes a typo in a comment in the vSphere cloud provider code.
As far as I know, it was not acquired by Google, right?
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 65449, 65373, 49410). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).
OpenStack LBaaS fix: must use ID, not name, of the node security group
This is a bugfix for the OpenStack LBaaS cloud provider security group management.
A bit of context: When creating a load balancer for a given `type: LoadBalancer` service, the provider will try to:
(see `pkg/cloudprovider/providers/openstack/openstack_loadbalancer.go`/`EnsureLoadBalancer`)
1. create a load balancer (LB) in OpenStack with listeners corresponding to the service's ports
2. attach a floating IP to the LB's network port
If `manage-security-groups` is enabled in controller-manager's cloud.conf:
3. create a security group with ingress rules corresponding to the LB's listeners, and attach it to the LB's network port
4. for all nodes of the cluster, pick an existing security group for the nodes ("node security group") and add ingress rules to it exposing the service's NodePorts to the security group created in step 3.
In the current upstream master, steps 1 through 3 work fine, but step 4 fails, leaving a service that is not accessible via the LB without further manual intervention.
The bug is in the "pick an existing security group" operation (func `getNodeSecurityGroupIDForLB`), which, contrary to its name, returns the security group's name rather than its ID (actually it returns a list of names rather than IDs, apparently to cover corner cases where there is more than one node security group). That name is then used when trying to add the ingress rules to the group, which the OpenStack API rejects with a 404 (at least on our fairly standard OpenStack Ocata installation) because it expects an ID where we give it a name.
The PR adds a "get ID given a name" lookup to the `getNodeSecurityGroupIDForLB` function, so it actually returns IDs. That's it. I'm not sure whether the upstream code wasn't really tested, or other people use OpenStacks with more lenient APIs; the bug and the fix are always reproducible on our installation.
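A minimal sketch of the added lookup using gophercloud (the helper name is illustrative; upstream it lives inside `getNodeSecurityGroupIDForLB`):

```go
package openstack

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/security/groups"
)

// getSecurityGroupID resolves a security group name to its ID, which is
// what the Neutron rules API expects.
func getSecurityGroupID(client *gophercloud.ServiceClient, name string) (string, error) {
	pages, err := groups.List(client, groups.ListOpts{Name: name}).AllPages()
	if err != nil {
		return "", err
	}
	all, err := groups.ExtractGroups(pages)
	if err != nil {
		return "", err
	}
	if len(all) != 1 {
		return "", fmt.Errorf("expected one security group named %q, found %d", name, len(all))
	}
	return all[0].ID, nil
}
```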
**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58145
**Special notes for your reviewer:**
Should we turn `getNodeSecurityGroupIDForLB` into a method with the lbaas as its receiver because it now requires two of the lbaas's attributes? I'm not sure what the conventions are here, if any.
**Release note**:
```release-note
NONE
```