mirror of
https://github.com/lingble/talos.git
synced 2025-11-02 05:28:09 +00:00
fix: random failures in cluster health checks
The problem was that some of the health checks sort the list of the nodes in place (via `sort.Strings()`). If cluster info provider returns original slice, it might be mutated in such a way that it gets corrupted. We never noticed it before CAPI clusters, as in our tests IPs are assigned sequentially, and sort operation is a no-op. Specifically, the problem was with the `Nodes()` function, it returns `append(controlPlaneNodes, workerNodes...)` slice, which by definition might share memory with `controlPlaneNodes` slice. For example, if control plane nodes were `4, 5, 6` and worker nodes were `3`, the returned slice will be `4, 5, 6, 3`, and it shares memory with `controlPlaneNodes` slice (firs three items). If we apply `sort` to the returned slice, it re-orders it as `3, 4, 5, 6`, but as it is done in-place, the `controlPlaneNodes` slice is now `3, 4, 5`, which is obviously wrong. Fix that by always returning a copy of the slice from the functions implementing `ClusterInfo` interface. Signed-off-by: Andrey Smirnov <smirnov.andrey@gmail.com>
This commit is contained in:
committed by
talos-bot
parent
371cbfa7ae
commit
dc6ea74c35
@@ -14,7 +14,7 @@ type infoWrapper struct {
|
||||
}
|
||||
|
||||
func (wrapper *infoWrapper) Nodes() []string {
|
||||
return append(wrapper.masterNodes, wrapper.workerNodes...)
|
||||
return append([]string(nil), append(wrapper.masterNodes, wrapper.workerNodes...)...)
|
||||
}
|
||||
|
||||
func (wrapper *infoWrapper) NodesByType(t machine.Type) []string {
|
||||
@@ -22,9 +22,9 @@ func (wrapper *infoWrapper) NodesByType(t machine.Type) []string {
|
||||
case machine.TypeInit:
|
||||
return nil
|
||||
case machine.TypeControlPlane:
|
||||
return wrapper.masterNodes
|
||||
return append([]string(nil), wrapper.masterNodes...)
|
||||
case machine.TypeJoin:
|
||||
return wrapper.workerNodes
|
||||
return append([]string(nil), wrapper.workerNodes...)
|
||||
default:
|
||||
panic("unreachable")
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user