Compare commits

56 Commits

Author SHA1 Message Date
kklinch0
39b31ca9e5 fix updateStatus field 2025-04-10 14:28:29 +03:00
kklinch0
db7c591957 fix image tag for victorialogs 2025-04-10 14:04:45 +03:00
kklinch0
5baa48022e fix 2025-04-10 11:58:50 +03:00
Andrei Kvapil
1234872bda Upd: Kube-OVN to v1.13.6 2025-04-10 11:58:50 +03:00
Andrei Kvapil
6afb1aad03 Upd: Cilium to v1.17.2
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 11:58:50 +03:00
Andrei Kvapil
ad8e09bb35 Upd: Kamaji to v0.9.2
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 11:58:50 +03:00
Andrei Kvapil
e8faf193eb Upd: Keycloak-operator to v1.25.0
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 11:58:50 +03:00
Andrei Kvapil
2393e3427c Update Cluster-API operator to v0.18.1
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 11:58:50 +03:00
Andrei Kvapil
ddb237718b Upd: victoria-metrics operator to v0.55.0
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 11:58:50 +03:00
Andrei Kvapil
ae619953fb [tests] Fix e2e tests (dependencies and timeouts)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 11:58:50 +03:00
Andrei Kvapil
434c5d1b9c [ci] Add talos-kernel and talos-initramfs to assets (#784)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 10:55:23 +02:00
Andrei Kvapil
cc9abfe03f [ci] Add talos-kernel and talos-initramfs to assets
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 10:54:54 +02:00
Andrei Kvapil
e02fd14a3c Fix: versions_map, use awk instead of grep (#780)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>


## Summary by CodeRabbit

- **Chores**
  - Enhanced the internal version verification process to ensure improved
    precision and reliability in version validation.
2025-04-10 10:42:52 +02:00
Andrei Kvapil
559eb8dea9 Fix: versions_map, use awk instead of grep
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 10:42:31 +02:00
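
The commit does not state why `awk` replaced `grep` here. One difference that can be demonstrated is that `grep -q "^version: $version$"` interpolates the version into a regular expression, so every `.` matches any character, while the `awk '$1 == "version:" {print $2}'` form followed by a shell string comparison is exact. The following is a minimal stdlib-only sketch of that difference; `versionFields` and `grepLikeMatch` are hypothetical helper names, not part of the repository.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// versionFields returns the second whitespace-separated field of every line
// whose first field is exactly "version:", approximating
// awk '$1 == "version:" {print $2}'.
func versionFields(chartYAML string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(chartYAML))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) > 0 && f[0] == "version:" {
			v := ""
			if len(f) > 1 {
				v = f[1]
			}
			out = append(out, v)
		}
	}
	return out
}

// grepLikeMatch mimics grep -q "^version: $version$" on a single line: the
// version is pasted into a regular expression, so "." matches any character.
func grepLikeMatch(line, version string) bool {
	return regexp.MustCompile("^version: " + version + "$").MatchString(line)
}

func main() {
	chart := "apiVersion: v2\nname: kafka\nversion: 0.5.2\n"
	fmt.Println(versionFields(chart))

	// The regex form accepts a different version because of the unescaped dots.
	fmt.Println(grepLikeMatch("version: 0.512", "0.5.2"))

	// The field-based form compares strings exactly.
	fmt.Println(versionFields("version: 0.512")[0] == "0.5.2")
}
```

Note that the field-based match ignores leading whitespace, so it also picks up indented `version:` keys (for example under `dependencies:`) if a `Chart.yaml` contains them.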
Andrei Kvapil
9e6478b9c9 [linstor] Add plunger check for disconnected DRBD peers. (#707)
Sometimes DRBD devices get stuck in "Connecting" state, probably due to some
race conditions. This scriptlet provides a workaround for such situations.
2025-04-10 10:38:29 +02:00
Andrei Kvapil
3a295c4474 Add guard against empty cloudInit in vm-instance app (#646)
Prevent the VM resource from referencing a non-existent secret when
`sshKeys` are set and `cloudInit` is set to empty.

## Summary by CodeRabbit

- **New Features**
  - Improved cloud-init configuration handling with conditional logic and
    clearer error messaging when expected configuration values are missing.

- **Documentation**
  - Refined virtual machine configuration guides by reformatting parameter
    tables and correcting typographical errors in parameter descriptions.
2025-04-10 10:37:50 +02:00
Andrei Kvapil
f8dfc43cae [e2e] Add mirror.gcr.io as default mirror for docker.io (#782)
related issues:
- https://github.com/cozystack/talm/pull/48
- https://github.com/cozystack/website/pull/154
- https://github.com/cozystack/cozystack/pull/782


## Summary by CodeRabbit

- **New Features**
  - Introduced additional configuration options that enable using Docker
    image mirrors. This enhancement can improve image retrieval performance
    and provide redundancy while maintaining the existing functionality.
2025-04-10 10:36:23 +02:00
Andrei Kvapil
3e19bc74d4 [tests] Add mirror.gcr.io as default mirror for docker.io
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-10 10:36:02 +02:00
klinch0
2966922c0b feat(vpa): separate-crds (#781)
## Summary by CodeRabbit

- **New Features**
  - Improved autoscaling deployment by integrating an additional component
    for managing custom resource definitions.
  - Enhanced dependency management now ensures critical prerequisites are
    deployed in the correct order.
  - Introduced an automated update mechanism to keep resource definitions
    current.
  - Added a new configuration option, giving users the flexibility to
    enable or disable custom resource definitions as needed.
  - Introduced two new Custom Resource Definitions:
    `VerticalPodAutoscalerCheckpoint` and `VerticalPodAutoscaler`.
2025-04-10 11:35:36 +03:00
Denis Seleznev
991c7e1943 Handle empty cloudInit.
Add a no-op user-data when sshKeys are specified.

Signed-off-by: Denis Seleznev <kto.3decb@gmail.com>
2025-04-10 10:07:43 +02:00
kklinch0
c31a7710ad feat(vpa): separate-crds
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-10 10:57:50 +03:00
Andrei Kvapil
f4cace093c Add a setting to VMs that allows users to trigger cloud-init full reconfiguration. (#767)
This will trigger cloud-init reinitialization, including ssh keys update
and static network config refresh.
2025-04-10 09:20:55 +02:00
Denis Seleznev
01e417d436 Add Linstor plunger scriptlet to fix DRBD devices that are stuck disconnected.
Sometimes DRBD devices get stuck in "Connecting" state, probably due to some
race conditions. This scriptlet provides a workaround for such situations.

Signed-off-by: Denis Seleznev <kto.3decb@gmail.com>
2025-04-10 03:49:23 +02:00
Denis Seleznev
261ce4278f Add a setting to VMs that allows users to trigger cloud-init full reconfiguration.
Changing `cloudInitSeed`  will trigger cloud-init reinitialization, including ssh keys update and static network config refresh.

Signed-off-by: Denis Seleznev <kto.3decb@gmail.com>
2025-04-09 20:48:18 +02:00
Timofei Larkin
785898b507 Delete a Workload if the related object is absent (#779)
Workload object counts were previously getting out of control as the recreation of a related Pod would spawn a new workload, while the old one would never get deleted (except for StatefulSets, where the names of Pods are stable). Workloads without a matching object are now deleted.
2025-04-09 21:02:22 +04:00
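
The cleanup rule described in this commit can be illustrated without a cluster. The sketch below is stdlib-only and is not the controller code: `targetOf` mirrors the name-prefix convention used by the new Workload controller in this change (`pvc-` for PersistentVolumeClaims, `svc-` for Services, anything else a Pod of the same name), while `ref`, `orphans` and the sample names are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ref identifies a monitored object by kind and name (hypothetical helper type).
type ref struct {
	Kind string
	Name string
}

// targetOf derives the monitored object from a Workload name, following the
// prefix convention in this change: "pvc-" -> PersistentVolumeClaim,
// "svc-" -> Service, anything else -> Pod with the same name.
func targetOf(workloadName string) ref {
	switch {
	case strings.HasPrefix(workloadName, "pvc-"):
		return ref{"PersistentVolumeClaim", strings.TrimPrefix(workloadName, "pvc-")}
	case strings.HasPrefix(workloadName, "svc-"):
		return ref{"Service", strings.TrimPrefix(workloadName, "svc-")}
	default:
		return ref{"Pod", workloadName}
	}
}

// orphans returns the Workload names whose monitored object is absent from
// the set of existing objects; these are the ones that would be deleted.
func orphans(workloads []string, existing map[ref]bool) []string {
	var out []string
	for _, w := range workloads {
		if !existing[targetOf(w)] {
			out = append(out, w)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	existing := map[ref]bool{
		{"Pod", "web-7d9f"}:                 true,
		{"PersistentVolumeClaim", "data-0"}: true,
	}
	// "web-5c2a" stands for a Pod that was recreated under a new name,
	// "svc-lb" for a Service that has been removed.
	workloads := []string{"web-7d9f", "web-5c2a", "pvc-data-0", "svc-lb"}
	fmt.Println(orphans(workloads, existing))
}
```

One consequence of a prefix-based convention is that a Pod whose own name starts with `pvc-` or `svc-` would be looked up as a PVC or a Service.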
Timofei Larkin
47a2cf7cd5 Track public IP usage (#769)
Like the existing behavior for Pods and the recently merged behavior for PVCs, the WorkloadMonitor controller now creates Workload objects for Services with Type==LoadBalancer to keep track of public IP reservations.
2025-04-09 21:00:35 +04:00
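
As a rough illustration of the accounting described in this commit, the sketch below counts how many load balancer ingress entries carry an IP address and records the result under a `public-ips` key, as the diff does. It is a simplified stdlib-only model: the `ingress` type is a hypothetical stand-in for the Kubernetes load balancer ingress entry, and plain integers replace `resource.Quantity`.

```go
package main

import "fmt"

// ingress is a simplified stand-in for a load balancer ingress entry
// (hypothetical type; the real one lives in k8s.io/api/core/v1).
type ingress struct {
	IP       string
	Hostname string
}

// countPublicIPs counts ingress entries that carry an IP address.
// Hostname-only entries are skipped, matching the `ing.IP != ""` check
// in the change.
func countPublicIPs(entries []ingress) int {
	n := 0
	for _, e := range entries {
		if e.IP != "" {
			n++
		}
	}
	return n
}

// isReady reports whether at least one ingress entry has been assigned,
// which is how the change decides that the Service is operational.
func isReady(entries []ingress) bool {
	return len(entries) > 0
}

func main() {
	entries := []ingress{{IP: "203.0.113.10"}, {Hostname: "lb.example.org"}}
	resources := map[string]int{"public-ips": countPublicIPs(entries)}
	fmt.Println(resources, isReady(entries))
}
```

In this model a Service with only a hostname-based ingress counts as operational but contributes zero public IPs.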
Timofei Larkin
1f19793613 Merge branch 'main' into 176-track-ips 2025-04-09 20:26:29 +04:00
Timofei Larkin
a0df2989af Track public IP usage
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-04-09 19:24:36 +03:00
Timofei Larkin
bdb538ab42 Track PVCs with WorkloadMonitor (#768)
The WorkloadMonitor controller now also watches PVCs, just like it has been watching Pods and creates Workloads per PVC according to the `spec.selector` field to track the used storage space.
2025-04-09 20:01:13 +04:00
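
The selection by `spec.selector` is an equality match on labels: an object belongs to a monitor when every key/value pair of the selector appears in the object's labels. A minimal stdlib-only sketch of that rule follows; `matches` and the sample PVC names are hypothetical, and the real controller delegates listing to the Kubernetes client instead of filtering in memory.

```go
package main

import "fmt"

// matches reports whether every key/value pair in selector is present in
// labels. In this sketch an empty selector matches every object.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"app": "postgres"}
	pvcs := map[string]map[string]string{
		"data-postgres-0": {"app": "postgres", "role": "primary"},
		"data-redis-0":    {"app": "redis"},
	}
	for name, labels := range pvcs {
		if matches(selector, labels) {
			// The change names the resulting Workload "pvc-<PVC name>".
			fmt.Println("pvc-" + name)
		}
	}
}
```

Extra labels on the object do not prevent a match; only the keys named in the selector are compared.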
klinch0
c844a4fb2b Fix: versions_map, include only versions from tags (#777)
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>


## Summary by CodeRabbit

- **Bug Fixes**
  - Improved error handling so that missing chart versions no longer halt
    processing, ensuring smoother operations.

- **Chores**
  - Simplified the version tag lookup to rely solely on remote repository
    tags for increased consistency and reliability.
  - Updated the Kafka application version from `0.5.0` to `0.5.2`.
  - Adjusted versioning information for the Kafka package to reflect fixed
    commit references.
  - Streamlined the pre-commit workflow by removing unnecessary steps and
    logging.
2025-04-09 18:46:58 +03:00
Timofei Larkin
fea142774a Delete a Workload if the related object is absent
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-04-09 18:36:11 +03:00
Andrei Kvapil
4b575299bc Log verbose state for DRBD devices that are not healthy. (#771)
This will help troubleshoot issues that occurred in the past but have
already been resolved.
2025-04-09 14:16:53 +02:00
Andrei Kvapil
4eec016f7d Merge pull request #757 from jokeOps/main
kubevirt for able to run CX or RT type of instances.
2025-04-09 14:12:31 +02:00
Andrei Kvapil
4078b21ac6 Merge branch 'main' into main 2025-04-09 14:09:59 +02:00
Andrei Kvapil
1721d397a7 Fix: versions_map, include only versions from tags
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-09 14:07:07 +02:00
Andrei Kvapil
558a0572f5 Merge pull request #776 from cozystack/bugfix/fix_version_map
fix version map
2025-04-09 14:06:31 +02:00
kklinch0
d60b81c8a0 fix version map
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-09 14:44:42 +03:00
Timofei Larkin
cc14c1fbab Track PVCs with WorkloadMonitor
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-04-09 14:09:36 +03:00
Nick Volynkin
80aee1354b Merge pull request #774 from cozystack/update-readme
*   [docs] Update links after restructuring docs
    
    Follow-up to cozystack/website#138
*   [docs] Proofread the readme and contributing
    
    Fix a few errors here and there.

*   [ci] Run pre-commit checks once on PRs
    
    Pre-commit checks used to trigger twice on PRs: for `push` and `pull_request`
    triggers. Now they will only run on `push` to the main branch and on regular
    updates to pull requests, except for those that only change the documentation.
    
    Note that pushes to feature branches will not trigger this check until
    a PR was opened.
2025-04-09 11:47:43 +03:00
Andrei Kvapil
332d69259b Merge pull request #766 from cozystack/vm-gpu
[virtual-machine] Add GPU support
2025-04-09 10:40:52 +02:00
Andrei Kvapil
9ad6b0d726 [virtual-machine] Add GPU support
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-09 10:39:49 +02:00
Andrei Kvapil
ea9df9e371 Merge pull request #765 from cozystack/gpu-operator
[gpu-operator] Introduce GPU-operator
2025-04-09 10:37:52 +02:00
Nick Volynkin
d69a9c4862 [docs] Proofread the readme and contributing
Fix a few errors here and there.

Signed-off-by: Nick Volynkin <nick.volynkin@gmail.com>
2025-04-09 11:09:19 +03:00
Nick Volynkin
6270a11bb1 [docs] Update links after restructuring docs
Follow-up to cozystack/website#138

Signed-off-by: Nick Volynkin <nick.volynkin@gmail.com>
2025-04-09 11:09:18 +03:00
Nick Volynkin
18726483a6 [ci] Run pre-commit checks once on PRs
Pre-commit checks used to trigger twice on PRs: for `push` and `pull_request`
triggers. Now they will only run on `push` to the main branch and on regular
updates to pull requests, except for those that only change the documentation.

Note that pushes to feature branches will not trigger this check until
a PR was opened.

Signed-off-by: Nick Volynkin <nick.volynkin@gmail.com>
2025-04-09 11:07:38 +03:00
Denis Seleznev
aed184f6ef Log verbose state for DRBD devices that are not healthy.
This will help troubleshoot issues that occurred in the past but have already been resolved.

Signed-off-by: Denis Seleznev <kto.3decb@gmail.com>
2025-04-09 03:46:37 +02:00
Andrei Kvapil
f688a57132 Merge pull request #773 from cozystack/upload-vmlinuz-and-initramfs
Upload kernel and initramfs to release assets
2025-04-08 23:31:34 +02:00
Andrei Kvapil
e954ab7f8b Upload kernel and initramfs to release assets
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-08 23:27:37 +02:00
Timofei Larkin
c9c8235c64 Merge pull request #772 from klinch0/monitoring-add-vpa-for-vmagent
[monitoring] add vpa for vmagent
2025-04-08 17:48:38 +04:00
kklinch0
8e2e77da56 [monitoring] add vpa for vmagent
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-08 16:40:39 +03:00
Andrei Kvapil
1e27dedde5 [gpu-operator] Introduce GPU-operator
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-08 14:03:52 +02:00
Timofei Larkin
e947805c15 Track PVCs with WorkloadMonitor
Signed-off-by: Timofei Larkin <lllamnyp@gmail.com>
2025-04-08 11:44:36 +03:00
Pavlo Gaponuk
7a1c3b6209 Need for CX or RX type of instances
Signed-off-by: Pavlo Gaponuk <pashagaponuk@gmail.com>
2025-04-07 19:03:04 +02:00
Andrei Kvapil
49b5b510ee Merge pull request #758 from klinch0/k8s-change-CP-default-resourcesPreset
[k8s] change CP default resourcesPreset
2025-04-05 21:35:11 +02:00
kklinch0
3cf850c2c4 [k8s] change CP default resourcesPreset
Signed-off-by: kklinch0 <kklinch0@gmail.com>
2025-04-05 21:31:17 +03:00
Andrei Kvapil
1fbbfcd063 [ci] Rename workflows
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
2025-04-03 17:05:19 +02:00
187 changed files with 18362 additions and 9149 deletions

View File

@@ -1,7 +1,12 @@
name: Pre-Commit Checks
on: [push, pull_request]
on:
push:
branches:
- main
pull_request:
paths-ignore:
- '**.md'
jobs:
pre-commit:
runs-on: ubuntu-22.04

View File

@@ -1,4 +1,4 @@
name: Verify and Finalize Release PR
name: Releasing PR
on:
pull_request:

View File

@@ -1,4 +1,4 @@
name: Build and Test
name: Pull Request
on:
pull_request:
@@ -6,7 +6,7 @@ on:
jobs:
e2e:
name: Build and Test for Pull Requests
name: Build and Test
runs-on: [self-hosted]
permissions:
contents: read

View File

@@ -1,4 +1,4 @@
name: Prepare Release
name: Versioned Tag
on:
push:

View File

@@ -6,13 +6,13 @@ As you get started, you are in the best position to give us feedbacks on areas o
* Problems found while setting up the development environment
* Gaps in our documentation
* Bugs in our Github actions
* Bugs in our GitHub actions
First, though, it is important that you read the [code of conduct](CODE_OF_CONDUCT.md).
First, though, it is important that you read the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
The guidelines below are a starting point. We don't want to limit your
creativity, passion, and initiative. If you think there's a better way, please
feel free to bring it up in a Github discussion, or open a pull request. We're
feel free to bring it up in a GitHub discussion, or open a pull request. We're
certain there are always better ways to do things, we just need to start some
constructive dialogue!
@@ -23,9 +23,9 @@ We welcome many types of contributions including:
* New features
* Builds, CI/CD
* Bug fixes
* [Documentation](https://github.com/cozystack/cozystack-website/tree/main)
* [Documentation](https://GitHub.com/cozystack/cozystack-website/tree/main)
* Issue Triage
* Answering questions on Slack or Github Discussions
* Answering questions on Slack or GitHub Discussions
* Web design
* Communications / Social Media / Blog Posts
* Events participation
@@ -34,7 +34,7 @@ We welcome many types of contributions including:
## Ask for Help
The best way to reach us with a question when contributing is to drop a line in
our [Telegram channel](https://t.me/cozystack), or start a new Github discussion.
our [Telegram channel](https://t.me/cozystack), or start a new GitHub discussion.
## Raising Issues

View File

@@ -12,20 +12,21 @@
**Cozystack** is a free PaaS platform and framework for building clouds.
With Cozystack, you can transform your bunch of servers into an intelligent system with a simple REST API for spawning Kubernetes clusters, Database-as-a-Service, virtual machines, load balancers, HTTP caching services, and other services with ease.
With Cozystack, you can transform a bunch of servers into an intelligent system with a simple REST API for spawning Kubernetes clusters,
Database-as-a-Service, virtual machines, load balancers, HTTP caching services, and other services with ease.
You can use Cozystack to build your own cloud or to provide a cost-effective development environments.
Use Cozystack to build your own cloud or provide a cost-effective development environment.
## Use-Cases
* [**Using Cozystack to build public cloud**](https://cozystack.io/docs/use-cases/public-cloud/)
You can use Cozystack as backend for a public cloud
* [**Using Cozystack to build a public cloud**](https://cozystack.io/docs/guides/use-cases/public-cloud/)
You can use Cozystack as a backend for a public cloud
* [**Using Cozystack to build private cloud**](https://cozystack.io/docs/use-cases/private-cloud/)
You can use Cozystack as platform to build a private cloud powered by Infrastructure-as-Code approach
* [**Using Cozystack to build a private cloud**](https://cozystack.io/docs/guides/use-cases/private-cloud/)
You can use Cozystack as a platform to build a private cloud powered by Infrastructure-as-Code approach
* [**Using Cozystack as Kubernetes distribution**](https://cozystack.io/docs/use-cases/kubernetes-distribution/)
You can use Cozystack as Kubernetes distribution for Bare Metal
* [**Using Cozystack as a Kubernetes distribution**](https://cozystack.io/docs/guides/use-cases/kubernetes-distribution/)
You can use Cozystack as a Kubernetes distribution for Bare Metal
## Screenshot
@@ -33,11 +34,11 @@ You can use Cozystack as Kubernetes distribution for Bare Metal
## Documentation
The documentation is located on official [cozystack.io](https://cozystack.io) website.
The documentation is located on the [cozystack.io](https://cozystack.io) website.
Read [Get Started](https://cozystack.io/docs/get-started/) section for a quick start.
Read the [Getting Started](https://cozystack.io/docs/getting-started/) section for a quick start.
If you encounter any difficulties, start with the [troubleshooting guide](https://cozystack.io/docs/troubleshooting/), and work your way through the process that we've outlined.
If you encounter any difficulties, start with the [troubleshooting guide](https://cozystack.io/docs/operations/troubleshooting/) and work your way through the process that we've outlined.
## Versioning
@@ -50,15 +51,15 @@ A full list of the available releases is available in the GitHub repository's [R
Contributions are highly appreciated and very welcomed!
In case of bugs, please, check if the issue has been already opened by checking the [GitHub Issues](https://github.com/cozystack/cozystack/issues) section.
In case it isn't, you can open a new one: a detailed report will help us to replicate it, assess it, and work on a fix.
In case of bugs, please check if the issue has already been opened by checking the [GitHub Issues](https://github.com/cozystack/cozystack/issues) section.
If it isn't, you can open a new one. A detailed report will help us replicate it, assess it, and work on a fix.
You can express your intention in working on the fix on your own.
You can express your intention to work on the fix on your own.
Commits are used to generate the changelog, and their author will be referenced in it.
In case of **Feature Requests** please use the [Discussion's Feature Request section](https://github.com/cozystack/cozystack/discussions/categories/feature-requests).
If you have **Feature Requests** please use the [Discussion's Feature Request section](https://github.com/cozystack/cozystack/discussions/categories/feature-requests).
You can join our weekly community meetings (just add this events to your [Google Calendar](https://calendar.google.com/calendar?cid=ZTQzZDIxZTVjOWI0NWE5NWYyOGM1ZDY0OWMyY2IxZTFmNDMzZTJlNjUzYjU2ZGJiZGE3NGNhMzA2ZjBkMGY2OEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t) or [iCal](https://calendar.google.com/calendar/ical/e43d21e5c9b45a95f28c5d649c2cb1e1f433e2e653b56dbbda74ca306f0d0f68%40group.calendar.google.com/public/basic.ics)) or [Telegram group](https://t.me/cozystack).
You are welcome to join our weekly community meetings (just add this events to your [Google Calendar](https://calendar.google.com/calendar?cid=ZTQzZDIxZTVjOWI0NWE5NWYyOGM1ZDY0OWMyY2IxZTFmNDMzZTJlNjUzYjU2ZGJiZGE3NGNhMzA2ZjBkMGY2OEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t) or [iCal](https://calendar.google.com/calendar/ical/e43d21e5c9b45a95f28c5d649c2cb1e1f433e2e653b56dbbda74ca306f0d0f68%40group.calendar.google.com/public/basic.ics)) or [Telegram group](https://t.me/cozystack).
## License

View File

@@ -178,6 +178,15 @@ func main() {
setupLog.Error(err, "unable to create controller", "controller", "WorkloadMonitor")
os.Exit(1)
}
if err = (&controller.WorkloadReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "Workload")
os.Exit(1)
}
// +kubebuilder:scaffold:builder
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil {

View File

@@ -113,6 +113,11 @@ machine:
- usermode_helper=disabled
- name: zfs
- name: spl
registries:
mirrors:
docker.io:
endpoints:
- https://mirror.gcr.io
files:
- content: |
[plugins]
@@ -313,7 +318,12 @@ kubectl patch -n tenant-root tenants.apps.cozystack.io root --type=merge -p '{"s
timeout 60 sh -c 'until kubectl get hr -n tenant-root etcd ingress monitoring tenant-root; do sleep 1; done'
# Wait for HelmReleases be installed
kubectl wait --timeout=2m --for=condition=ready -n tenant-root hr etcd ingress monitoring tenant-root
kubectl wait --timeout=2m --for=condition=ready -n tenant-root hr etcd ingress tenant-root
if ! kubectl wait --timeout=2m --for=condition=ready -n tenant-root hr monitoring; then
flux reconcile hr monitoring -n tenant-root --force
kubectl wait --timeout=2m --for=condition=ready -n tenant-root hr monitoring
fi
kubectl patch -n tenant-root ingresses.apps.cozystack.io ingress --type=merge -p '{"spec":{
"dashboard": true
@@ -328,7 +338,7 @@ kubectl wait --timeout=5m --for=jsonpath=.status.readyReplicas=3 -n tenant-root
# Wait for Victoria metrics
kubectl wait --timeout=5m --for=jsonpath=.status.updateStatus=operational -n tenant-root vmalert/vmalert-shortterm vmalertmanager/alertmanager
kubectl wait --timeout=5m --for=jsonpath=.status.status=operational -n tenant-root vlogs/generic
kubectl wait --timeout=5m --for=jsonpath=.status.updateStatus=operational -n tenant-root vlogs/generic
kubectl wait --timeout=5m --for=jsonpath=.status.clusterStatus=operational -n tenant-root vmcluster/shortterm vmcluster/longterm
# Wait for grafana
@@ -347,5 +357,5 @@ kubectl patch -n cozy-system cm/cozystack --type=merge -p '{"data":{
"oidc-enabled": "true"
}}'
timeout 60 sh -c 'until kubectl get hr -n cozy-keycloak keycloak keycloak-configure keycloak-operator; do sleep 1; done'
timeout 120 sh -c 'until kubectl get hr -n cozy-keycloak keycloak keycloak-configure keycloak-operator; do sleep 1; done'
kubectl wait --timeout=10m --for=condition=ready -n cozy-keycloak hr keycloak keycloak-configure keycloak-operator

View File

@@ -19,21 +19,19 @@ fi
miss_map=$(echo "$new_map" | awk 'NR==FNR { nm[$1 " " $2] = $3; next } { if (!($1 " " $2 in nm)) print $1, $2, $3}' - "$file")
# search across all tags sorted by version
search_commits=$(git ls-remote --tags origin | grep 'refs/tags/v' | sort -k2,2 -rV | awk '{print $1}')
# add latest main commit to search
search_commits="${search_commits} $(git rev-parse "origin/main")"
search_commits=$(git ls-remote --tags origin | awk -F/ '$3 ~ /v[0-9]+.[0-9]+.[0-9]+/ {print}' | sort -k2,2 -rV | awk '{print $1}')
resolved_miss_map=$(
echo "$miss_map" | while read -r chart version commit; do
# if version is found in HEAD, it's HEAD
if grep -q "^version: $version$" ./${chart}/Chart.yaml; then
if [ $(awk '$1 == "version:" {print $2}' ./${chart}/Chart.yaml) = "${version}" ]; then
echo "$chart $version HEAD"
continue
fi
# if commit is not HEAD, check if it's valid
if [ $commit != "HEAD" ]; then
if ! git show "${commit}:./${chart}/Chart.yaml" 2>/dev/null | grep -q "^version: $version$"; then
if [ $(git show "${commit}:./${chart}/Chart.yaml" 2>/dev/null | awk '$1 == "version:" {print $2}') != "${version}" ]; then
echo "Commit $commit for $chart $version is not valid" >&2
exit 1
fi
@@ -46,15 +44,15 @@ resolved_miss_map=$(
# if commit is HEAD, but version is not found in HEAD, check all tags
found_tag=""
for tag in $search_commits; do
if git show "${tag}:./${chart}/Chart.yaml" 2>/dev/null | grep -q "^version: $version$"; then
if [ $(git show "${tag}:./${chart}/Chart.yaml" 2>/dev/null | awk '$1 == "version:" {print $2}') = "${version}" ]; then
found_tag=$(git rev-parse --short "${tag}")
break
fi
done
if [ -z "$found_tag" ]; then
echo "Can't find $chart $version in any version tag or in the latest main commit" >&2
exit 1
echo "Can't find $chart $version in any version tag, removing it" >&2
continue
fi
echo "$chart $version $found_tag"

View File

@@ -7,3 +7,5 @@ gh release upload --clobber $version _out/assets/cozystack-installer.yaml
gh release upload --clobber $version _out/assets/metal-amd64.iso
gh release upload --clobber $version _out/assets/metal-amd64.raw.xz
gh release upload --clobber $version _out/assets/nocloud-amd64.raw.xz
gh release upload --clobber $version _out/assets/kernel-amd64
gh release upload --clobber $version _out/assets/initramfs-metal-amd64.xz

View File

@@ -0,0 +1,87 @@
package controller
import (
"context"
"strings"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
cozyv1alpha1 "github.com/cozystack/cozystack/api/v1alpha1"
)
// WorkloadReconciler reconciles Workload objects and deletes those whose monitored object no longer exists
type WorkloadReconciler struct {
client.Client
Scheme *runtime.Scheme
}
func (r *WorkloadReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
w := &cozyv1alpha1.Workload{}
err := r.Get(ctx, req.NamespacedName, w)
if err != nil {
if apierrors.IsNotFound(err) {
return ctrl.Result{}, nil
}
logger.Error(err, "Unable to fetch Workload")
return ctrl.Result{}, err
}
// it's being deleted, nothing to handle
if w.DeletionTimestamp != nil {
return ctrl.Result{}, nil
}
t := getMonitoredObject(w)
err = r.Get(ctx, types.NamespacedName{Name: t.GetName(), Namespace: t.GetNamespace()}, t)
// found object, nothing to do
if err == nil {
return ctrl.Result{}, nil
}
// error getting object but not 404 -- requeue
if !apierrors.IsNotFound(err) {
logger.Error(err, "failed to get dependent object", "kind", t.GetObjectKind(), "dependent-object-name", t.GetName())
return ctrl.Result{}, err
}
err = r.Delete(ctx, w)
if err != nil {
logger.Error(err, "failed to delete workload")
}
return ctrl.Result{}, err
}
// SetupWithManager registers our controller with the Manager and sets up watches.
func (r *WorkloadReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
// Watch Workload objects
For(&cozyv1alpha1.Workload{}).
Complete(r)
}
func getMonitoredObject(w *cozyv1alpha1.Workload) client.Object {
if strings.HasPrefix(w.Name, "pvc-") {
obj := &corev1.PersistentVolumeClaim{}
obj.Name = strings.TrimPrefix(w.Name, "pvc-")
obj.Namespace = w.Namespace
return obj
}
if strings.HasPrefix(w.Name, "svc-") {
obj := &corev1.Service{}
obj.Name = strings.TrimPrefix(w.Name, "svc-")
obj.Namespace = w.Namespace
return obj
}
obj := &corev1.Pod{}
obj.Name = w.Name
obj.Namespace = w.Namespace
return obj
}

View File

@@ -3,6 +3,7 @@ package controller
import (
"context"
"encoding/json"
"fmt"
"sort"
apierrors "k8s.io/apimachinery/pkg/api/errors"
@@ -33,6 +34,17 @@ type WorkloadMonitorReconciler struct {
// +kubebuilder:rbac:groups=cozystack.io,resources=workloads,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=cozystack.io,resources=workloads/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
// +kubebuilder:rbac:groups=core,resources=persistentvolumeclaims,verbs=get;list;watch
// isServiceReady checks if the service has an external IP bound
func (r *WorkloadMonitorReconciler) isServiceReady(svc *corev1.Service) bool {
return len(svc.Status.LoadBalancer.Ingress) > 0
}
// isPVCReady checks if the PVC is bound
func (r *WorkloadMonitorReconciler) isPVCReady(pvc *corev1.PersistentVolumeClaim) bool {
return pvc.Status.Phase == corev1.ClaimBound
}
// isPodReady checks if the Pod is in the Ready condition.
func (r *WorkloadMonitorReconciler) isPodReady(pod *corev1.Pod) bool {
@@ -88,6 +100,96 @@ func updateOwnerReferences(obj metav1.Object, monitor client.Object) {
obj.SetOwnerReferences(owners)
}
// reconcileServiceForMonitor creates or updates a Workload object for the given Service and WorkloadMonitor.
func (r *WorkloadMonitorReconciler) reconcileServiceForMonitor(
ctx context.Context,
monitor *cozyv1alpha1.WorkloadMonitor,
svc corev1.Service,
) error {
logger := log.FromContext(ctx)
workload := &cozyv1alpha1.Workload{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("svc-%s", svc.Name),
Namespace: svc.Namespace,
},
}
resources := make(map[string]resource.Quantity)
q := resource.MustParse("0")
for _, ing := range svc.Status.LoadBalancer.Ingress {
if ing.IP != "" {
q.Add(resource.MustParse("1"))
}
}
resources["public-ips"] = q
_, err := ctrl.CreateOrUpdate(ctx, r.Client, workload, func() error {
// Update owner references with the new monitor
updateOwnerReferences(workload.GetObjectMeta(), monitor)
workload.Labels = svc.Labels
// Fill Workload status fields:
workload.Status.Kind = monitor.Spec.Kind
workload.Status.Type = monitor.Spec.Type
workload.Status.Resources = resources
workload.Status.Operational = r.isServiceReady(&svc)
return nil
})
if err != nil {
logger.Error(err, "Failed to CreateOrUpdate Workload", "workload", workload.Name)
return err
}
return nil
}
// reconcilePVCForMonitor creates or updates a Workload object for the given PVC and WorkloadMonitor.
func (r *WorkloadMonitorReconciler) reconcilePVCForMonitor(
ctx context.Context,
monitor *cozyv1alpha1.WorkloadMonitor,
pvc corev1.PersistentVolumeClaim,
) error {
logger := log.FromContext(ctx)
workload := &cozyv1alpha1.Workload{
ObjectMeta: metav1.ObjectMeta{
Name: fmt.Sprintf("pvc-%s", pvc.Name),
Namespace: pvc.Namespace,
},
}
resources := make(map[string]resource.Quantity)
for resourceName, resourceQuantity := range pvc.Status.Capacity {
resources[resourceName.String()] = resourceQuantity
}
_, err := ctrl.CreateOrUpdate(ctx, r.Client, workload, func() error {
// Update owner references with the new monitor
updateOwnerReferences(workload.GetObjectMeta(), monitor)
workload.Labels = pvc.Labels
// Fill Workload status fields:
workload.Status.Kind = monitor.Spec.Kind
workload.Status.Type = monitor.Spec.Type
workload.Status.Resources = resources
workload.Status.Operational = r.isPVCReady(&pvc)
return nil
})
if err != nil {
logger.Error(err, "Failed to CreateOrUpdate Workload", "workload", workload.Name)
return err
}
return nil
}
// reconcilePodForMonitor creates or updates a Workload object for the given Pod and WorkloadMonitor.
func (r *WorkloadMonitorReconciler) reconcilePodForMonitor(
ctx context.Context,
@@ -205,6 +307,45 @@ func (r *WorkloadMonitorReconciler) Reconcile(ctx context.Context, req ctrl.Requ
}
}
pvcList := &corev1.PersistentVolumeClaimList{}
if err := r.List(
ctx,
pvcList,
client.InNamespace(monitor.Namespace),
client.MatchingLabels(monitor.Spec.Selector),
); err != nil {
logger.Error(err, "Unable to list PVCs for WorkloadMonitor", "monitor", monitor.Name)
return ctrl.Result{}, err
}
for _, pvc := range pvcList.Items {
if err := r.reconcilePVCForMonitor(ctx, monitor, pvc); err != nil {
logger.Error(err, "Failed to reconcile Workload for PVC", "PVC", pvc.Name)
continue
}
}
svcList := &corev1.ServiceList{}
if err := r.List(
ctx,
svcList,
client.InNamespace(monitor.Namespace),
client.MatchingLabels(monitor.Spec.Selector),
); err != nil {
logger.Error(err, "Unable to list Services for WorkloadMonitor", "monitor", monitor.Name)
return ctrl.Result{}, err
}
for _, svc := range svcList.Items {
if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
continue
}
if err := r.reconcileServiceForMonitor(ctx, monitor, svc); err != nil {
logger.Error(err, "Failed to reconcile Workload for Service", "Service", svc.Name)
continue
}
}
// Update WorkloadMonitor status based on observed pods
monitor.Status.ObservedReplicas = observedReplicas
monitor.Status.AvailableReplicas = availableReplicas
@@ -233,41 +374,51 @@ func (r *WorkloadMonitorReconciler) SetupWithManager(mgr ctrl.Manager) error {
// Also watch Pod objects and map them back to WorkloadMonitor if labels match
Watches(
&corev1.Pod{},
handler.EnqueueRequestsFromMapFunc(func(ctx context.Context, obj client.Object) []reconcile.Request {
pod, ok := obj.(*corev1.Pod)
if !ok {
return nil
}
var monitorList cozyv1alpha1.WorkloadMonitorList
// List all WorkloadMonitors in the same namespace
if err := r.List(ctx, &monitorList, client.InNamespace(pod.Namespace)); err != nil {
return nil
}
// Match each monitor's selector with the Pod's labels
var requests []reconcile.Request
for _, m := range monitorList.Items {
matches := true
for k, v := range m.Spec.Selector {
if podVal, exists := pod.Labels[k]; !exists || podVal != v {
matches = false
break
}
}
if matches {
requests = append(requests, reconcile.Request{
NamespacedName: types.NamespacedName{
Namespace: m.Namespace,
Name: m.Name,
},
})
}
}
return requests
}),
handler.EnqueueRequestsFromMapFunc(mapObjectToMonitor(&corev1.Pod{}, r.Client)),
).
// Watch PVCs as well
Watches(
&corev1.PersistentVolumeClaim{},
handler.EnqueueRequestsFromMapFunc(mapObjectToMonitor(&corev1.PersistentVolumeClaim{}, r.Client)),
).
// Watch for changes to Workload objects we create (owned by WorkloadMonitor)
Owns(&cozyv1alpha1.Workload{}).
Complete(r)
}
func mapObjectToMonitor[T client.Object](_ T, c client.Client) func(ctx context.Context, obj client.Object) []reconcile.Request {
return func(ctx context.Context, obj client.Object) []reconcile.Request {
concrete, ok := obj.(T)
if !ok {
return nil
}
var monitorList cozyv1alpha1.WorkloadMonitorList
// List all WorkloadMonitors in the same namespace
if err := c.List(ctx, &monitorList, client.InNamespace(concrete.GetNamespace())); err != nil {
return nil
}
labels := concrete.GetLabels()
// Match each monitor's selector against the object's labels
var requests []reconcile.Request
for _, m := range monitorList.Items {
matches := true
for k, v := range m.Spec.Selector {
if labelVal, exists := labels[k]; !exists || labelVal != v {
matches = false
break
}
}
if matches {
requests = append(requests, reconcile.Request{
NamespacedName: types.NamespacedName{
Namespace: m.Namespace,
Name: m.Name,
},
})
}
}
return requests
}
}
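The label-matching loop inside `mapObjectToMonitor` can be read in isolation. The following is a minimal sketch, assuming plain `map[string]string` values for both the selector and the object labels; the helper name `selectorMatches` is hypothetical and does not appear in the change itself.

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair in selector is
// present, with the same value, in labels. This mirrors the inner loop of
// mapObjectToMonitor; the function name is an illustrative assumption.
func selectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	labels := map[string]string{"app": "db", "tier": "backend"}

	fmt.Println(selectorMatches(map[string]string{"app": "db"}, labels))
	fmt.Println(selectorMatches(map[string]string{"app": "web"}, labels))
	// An empty selector has no pairs to violate, so it matches any object.
	fmt.Println(selectorMatches(map[string]string{}, labels))
}
```

One consequence of this loop is that a monitor with an empty selector matches every object in its namespace, so each watched Pod or PVC event there would enqueue that monitor.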

View File

@@ -16,7 +16,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.17.0
version: 0.17.1
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to

View File

@@ -85,7 +85,7 @@ kamajiControlPlane:
# memory: 512Mi
## @param kamajiControlPlane.apiServer.resourcesPreset Set container resources according to one common preset (allowed values: none, nano, micro, small, medium, large, xlarge, 2xlarge). This is ignored if resources is set (resources is recommended for production).
resourcesPreset: "micro"
resourcesPreset: "small"
controllerManager:
## @param kamajiControlPlane.controllerManager.resources Resources

View File

@@ -4,4 +4,4 @@ description: Separated tenant namespace
icon: /logos/tenant.svg
type: application
version: 1.9.1
version: 1.9.2

View File

@@ -46,4 +46,8 @@ spec:
resources: {}
oncall:
enabled: false
{{- if .Values.ingress }}
dependsOn:
- name: ingress
{{- end }}
{{- end }}

View File

@@ -56,7 +56,8 @@ kubernetes 0.15.0 4e68e65c
kubernetes 0.15.1 160e4e2a
kubernetes 0.15.2 8267072d
kubernetes 0.16.0 077045b0
kubernetes 0.17.0 HEAD
kubernetes 0.17.0 1fbbfcd0
kubernetes 0.17.1 HEAD
mysql 0.1.0 263e47be
mysql 0.2.0 c24a103f
mysql 0.3.0 53f2365e
@@ -127,7 +128,8 @@ tenant 1.6.8 bc95159a
tenant 1.7.0 24fa7222
tenant 1.8.0 160e4e2a
tenant 1.9.0 728743db
tenant 1.9.1 HEAD
tenant 1.9.1 de19450f
tenant 1.9.2 HEAD
virtual-machine 0.1.4 f2015d65
virtual-machine 0.1.5 263e47be
virtual-machine 0.2.0 c0685f43
@@ -139,7 +141,8 @@ virtual-machine 0.7.0 e23286a3
virtual-machine 0.7.1 0ab39f20
virtual-machine 0.8.0 3fa4dd3a
virtual-machine 0.8.1 93c46161
virtual-machine 0.8.2 HEAD
virtual-machine 0.8.2 de19450f
virtual-machine 0.9.0 HEAD
vm-disk 0.1.0 d971f2ff
vm-disk 0.1.1 HEAD
vm-instance 0.1.0 1ec10165
@@ -148,7 +151,8 @@ vm-instance 0.3.0 4e68e65c
vm-instance 0.4.0 e23286a3
vm-instance 0.4.1 0ab39f20
vm-instance 0.5.0 3fa4dd3a
vm-instance 0.5.1 HEAD
vm-instance 0.5.1 de19450f
vm-instance 0.6.0 HEAD
vpn 0.1.0 263e47be
vpn 0.2.0 53f2365e
vpn 0.3.0 6c5cf5bf
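The entries above appear to follow a three-column layout: chart name, chart version, and a commit reference, with `HEAD` marking the version that has not yet been pinned to a commit. A minimal sketch of reading that layout, assuming whitespace-separated columns; the type and function names are hypothetical and not part of the repository's tooling.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// entry is one row of the versions map (assumed layout: chart, version, commit).
type entry struct {
	Chart, Version, Commit string
}

// parseVersionsMap splits each non-empty line into exactly three fields.
func parseVersionsMap(text string) ([]entry, error) {
	var entries []entry
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed line: %q", sc.Text())
		}
		entries = append(entries, entry{fields[0], fields[1], fields[2]})
	}
	return entries, sc.Err()
}

// unreleased returns the entries whose commit column is still HEAD.
func unreleased(entries []entry) []entry {
	var out []entry
	for _, e := range entries {
		if e.Commit == "HEAD" {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	text := "tenant 1.9.1 de19450f\ntenant 1.9.2 HEAD\n"
	entries, err := parseVersionsMap(text)
	if err != nil {
		panic(err)
	}
	for _, e := range unreleased(entries) {
		fmt.Println(e.Chart, e.Version)
	}
}
```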

View File

@@ -17,10 +17,10 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.8.2
version: 0.9.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "0.8.2"
appVersion: 0.9.0

View File

@@ -2,6 +2,7 @@ include ../../../scripts/package.mk
generate:
readme-generator -v values.yaml -s values.schema.json -r README.md
yq -o json -i '.properties.gpus.items.type = "object" | .properties.gpus.default = []' values.schema.json
INSTANCE_TYPES=$$(yq e '.metadata.name' -o=json -r ../../system/kubevirt-instancetypes/templates/instancetypes.yaml | yq 'split(" ") | . + [""]' -o json) \
&& yq -i -o json ".properties.instanceType.optional=true | .properties.instanceType.enum = $${INSTANCE_TYPES}" values.schema.json
PREFERENCES=$$(yq e '.metadata.name' -o=json -r ../../system/kubevirt-instancetypes/templates/preferences.yaml | yq 'split(" ") | . + [""]' -o json) \

View File

@@ -36,22 +36,23 @@ virtctl ssh <user>@<vm>
### Common parameters
| Name | Description | Value |
| ------------------------- | ---------------------------------------------------------------------------------------------------------- | ---------------- |
| `external` | Enable external access from outside the cluster | `false` |
| `externalMethod` | specify method to passthrough the traffic to the virtual machine. Allowed values: `WholeIP` and `PortList` | `WholeIP` |
| `externalPorts` | Specify ports to forward from outside the cluster | `[]` |
| `running` | Determines if the virtual machine should be running | `true` |
| `instanceType` | Virtual Machine instance type | `u1.medium` |
| `instanceProfile` | Virtual Machine prefferences profile | `ubuntu` |
| `systemDisk.image` | The base image for the virtual machine. Allowed values: `ubuntu`, `cirros`, `alpine`, `fedora` and `talos` | `ubuntu` |
| `systemDisk.storage` | The size of the disk allocated for the virtual machine | `5Gi` |
| `systemDisk.storageClass` | StorageClass used to store the data | `replicated` |
| `resources.cpu` | The number of CPU cores allocated to the virtual machine | `""` |
| `resources.memory` | The amount of memory allocated to the virtual machine | `""` |
| `sshKeys` | List of SSH public keys for authentication. Can be a single key or a list of keys. | `[]` |
| `cloudInit` | cloud-init user data config. See cloud-init documentation for more details. | `#cloud-config
` |
| Name | Description | Value |
| ------------------------- | ---------------------------------------------------------------------------------------------------------- | ------------ |
| `external` | Enable external access from outside the cluster | `false` |
| `externalMethod` | specify method to passthrough the traffic to the virtual machine. Allowed values: `WholeIP` and `PortList` | `WholeIP` |
| `externalPorts` | Specify ports to forward from outside the cluster | `[]` |
| `running` | Determines if the virtual machine should be running | `true` |
| `instanceType` | Virtual Machine instance type | `u1.medium` |
| `instanceProfile` | Virtual Machine preferences profile | `ubuntu` |
| `systemDisk.image` | The base image for the virtual machine. Allowed values: `ubuntu`, `cirros`, `alpine`, `fedora` and `talos` | `ubuntu` |
| `systemDisk.storage` | The size of the disk allocated for the virtual machine | `5Gi` |
| `systemDisk.storageClass` | StorageClass used to store the data | `replicated` |
| `gpus` | List of GPUs to attach | `[]` |
| `resources.cpu` | The number of CPU cores allocated to the virtual machine | `""` |
| `resources.memory` | The amount of memory allocated to the virtual machine | `""` |
| `sshKeys` | List of SSH public keys for authentication. Can be a single key or a list of keys. | `[]` |
| `cloudInit` | cloud-init user data config. See cloud-init documentation for more details. | `""` |
| `cloudInitSeed` | A seed string to generate an SMBIOS UUID for the VM. | `""` |
## U Series

View File

@@ -49,3 +49,23 @@ Selector labels
app.kubernetes.io/name: {{ include "virtual-machine.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Generate a stable UUID for cloud-init re-initialization upon upgrade.
*/}}
{{- define "virtual-machine.stableUuid" -}}
{{- $source := printf "%s-%s-%s" .Release.Namespace (include "virtual-machine.fullname" .) .Values.cloudInitSeed }}
{{- $hash := sha256sum $source }}
{{- $uuid := printf "%s-%s-4%s-9%s-%s" (substr 0 8 $hash) (substr 8 12 $hash) (substr 13 16 $hash) (substr 17 20 $hash) (substr 20 32 $hash) }}
{{- if eq .Values.cloudInitSeed "" }}
{{- /* Keep the previous UUID so that removing the seed does not trigger a full cloud-init run again. */}}
{{- $vmResource := lookup "kubevirt.io/v1" "VirtualMachine" .Release.Namespace (include "virtual-machine.fullname" .) -}}
{{- if $vmResource }}
{{- $existingUuid := $vmResource | dig "spec" "template" "spec" "domain" "firmware" "uuid" "" }}
{{- if $existingUuid }}
{{- $uuid = $existingUuid }}
{{- end }}
{{- end }}
{{- end }}
{{- $uuid }}
{{- end }}
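The hash-to-UUID step of the `virtual-machine.stableUuid` helper can be mirrored outside Helm. This is a minimal sketch, assuming that Sprig's `sha256sum` yields the lowercase hex digest and that `substr start end` behaves like a Go slice `s[start:end]`; the function name `stableUUID` is hypothetical, and the `lookup` fallback that reuses an existing UUID when the seed is empty is deliberately not modeled.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// stableUUID derives a UUID-shaped string from namespace, release full name,
// and seed, using the same slices of the SHA-256 hex digest as the template.
func stableUUID(namespace, fullname, seed string) string {
	source := fmt.Sprintf("%s-%s-%s", namespace, fullname, seed)
	sum := sha256.Sum256([]byte(source))
	h := hex.EncodeToString(sum[:])
	// The literal "4" and "9" are fixed characters inserted by the format
	// string; hex digits 12 and 16 of the digest are skipped.
	return fmt.Sprintf("%s-%s-4%s-9%s-%s",
		h[0:8], h[8:12], h[13:16], h[17:20], h[20:32])
}

func main() {
	a := stableUUID("tenant-demo", "virtual-machine-example", "")
	b := stableUUID("tenant-demo", "virtual-machine-example", "upd1")
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println(a != b) // a different seed yields a different UUID
}
```

Because the input is only the namespace, name, and seed, the result stays the same across upgrades until the seed changes, which is what allows a new seed to force cloud-init to treat the VM as a new instance.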

View File

@@ -68,7 +68,15 @@ spec:
requests:
memory: {{ .Values.resources.memory | quote }}
{{- end }}
firmware:
uuid: {{ include "virtual-machine.stableUuid" . }}
devices:
{{- if .Values.gpus }}
gpus:
{{- range $i, $gpu := .Values.gpus }}
- deviceName: {{ $gpu.name }}
{{- end }}
{{- end }}
disks:
- disk:
bus: scsi
@@ -90,6 +98,7 @@ spec:
secret:
secretName: {{ include "virtual-machine.fullname" $ }}-ssh-keys
propagationMethod:
# keys will be injected into metadata part of cloud-init disk
noCloud: {}
{{- end }}
terminationGracePeriodSeconds: 30
@@ -100,8 +109,14 @@ spec:
{{- if or .Values.sshKeys .Values.cloudInit }}
- name: cloudinitdisk
cloudInitNoCloud:
{{- if .Values.cloudInit }}
secretRef:
name: {{ include "virtual-machine.fullname" . }}-cloud-init
{{- else }}
userData: |
#cloud-config
final_message: Cloud-init user-data was left blank intentionally.
{{- end }}
{{- end }}
networks:
- name: default

View File

@@ -88,7 +88,7 @@
},
"instanceProfile": {
"type": "string",
"description": "Virtual Machine prefferences profile",
"description": "Virtual Machine preferences profile",
"default": "ubuntu",
"optional": true,
"enum": [
@@ -164,6 +164,14 @@
}
}
},
"gpus": {
"type": "array",
"description": "List of GPUs to attach",
"default": [],
"items": {
"type": "object"
}
},
"resources": {
"type": "object",
"properties": {
@@ -190,7 +198,12 @@
"cloudInit": {
"type": "string",
"description": "cloud-init user data config. See cloud-init documentation for more details.",
"default": "#cloud-config\n"
"default": ""
},
"cloudInitSeed": {
"type": "string",
"description": "A seed string to generate an SMBIOS UUID for the VM.",
"default": ""
}
}
}

View File

@@ -12,7 +12,7 @@ externalPorts:
running: true
## @param instanceType Virtual Machine instance type
## @param instanceProfile Virtual Machine prefferences profile
## @param instanceProfile Virtual Machine preferences profile
##
instanceType: "u1.medium"
instanceProfile: ubuntu
@@ -26,6 +26,12 @@ systemDisk:
storage: 5Gi
storageClass: replicated
## @param gpus [array] List of GPUs to attach
## Example:
## gpus:
## - name: nvidia.com/GA102GL_A10
gpus: []
## @param resources.cpu The number of CPU cores allocated to the virtual machine
## @param resources.memory The amount of memory allocated to the virtual machine
resources:
@@ -49,5 +55,13 @@ sshKeys: []
## password: ubuntu
## chpasswd: { expire: False }
##
cloudInit: |
#cloud-config
cloudInit: ""
## @param cloudInitSeed A seed string to generate an SMBIOS UUID for the VM.
cloudInitSeed: ""
## Change it to any new value to force a full cloud-init reconfiguration. Use this to apply settings that are
## normally written only once, such as new SSH keys or a new network configuration, to an existing VM.
## An empty value does nothing (and the existing UUID is not reverted). Note that changing this value
## does not trigger a VM restart; you must restart the VM separately.
## Example:
## cloudInitSeed: "upd1"

View File

@@ -17,10 +17,10 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.5.1
version: 0.6.0
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
# It is recommended to use it with quotes.
appVersion: "0.5.1"
appVersion: 0.6.0

View File

@@ -3,6 +3,7 @@ include ../../../scripts/package.mk
generate:
readme-generator -v values.yaml -s values.schema.json -r README.md
yq -o json -i '.properties.disks.items.type = "object" | .properties.disks.default = []' values.schema.json
yq -o json -i '.properties.gpus.items.type = "object" | .properties.gpus.default = []' values.schema.json
INSTANCE_TYPES=$$(yq e '.metadata.name' -o=json -r ../../system/kubevirt-instancetypes/templates/instancetypes.yaml | yq 'split(" ") | . + [""]' -o json) \
&& yq -i -o json ".properties.instanceType.optional=true | .properties.instanceType.enum = $${INSTANCE_TYPES}" values.schema.json
PREFERENCES=$$(yq e '.metadata.name' -o=json -r ../../system/kubevirt-instancetypes/templates/preferences.yaml | yq 'split(" ") | . + [""]' -o json) \

View File

@@ -36,20 +36,21 @@ virtctl ssh <user>@<vm>
### Common parameters
| Name | Description | Value |
| ------------------ | ---------------------------------------------------------------------------------------------------------- | ---------------- |
| `external` | Enable external access from outside the cluster | `false` |
| `externalMethod` | specify method to passthrough the traffic to the virtual machine. Allowed values: `WholeIP` and `PortList` | `WholeIP` |
| `externalPorts` | Specify ports to forward from outside the cluster | `[]` |
| `running` | Determines if the virtual machine should be running | `true` |
| `instanceType` | Virtual Machine instance type | `u1.medium` |
| `instanceProfile` | Virtual Machine prefferences profile | `ubuntu` |
| `disks` | List of disks to attach | `[]` |
| `resources.cpu` | The number of CPU cores allocated to the virtual machine | `""` |
| `resources.memory` | The amount of memory allocated to the virtual machine | `""` |
| `sshKeys` | List of SSH public keys for authentication. Can be a single key or a list of keys. | `[]` |
| `cloudInit` | cloud-init user data config. See cloud-init documentation for more details. | `#cloud-config
` |
| Name | Description | Value |
| ------------------ | ---------------------------------------------------------------------------------------------------------- | ----------- |
| `external` | Enable external access from outside the cluster | `false` |
| `externalMethod` | specify method to passthrough the traffic to the virtual machine. Allowed values: `WholeIP` and `PortList` | `WholeIP` |
| `externalPorts` | Specify ports to forward from outside the cluster | `[]` |
| `running` | Determines if the virtual machine should be running | `true` |
| `instanceType` | Virtual Machine instance type | `u1.medium` |
| `instanceProfile` | Virtual Machine preferences profile | `ubuntu` |
| `disks` | List of disks to attach | `[]` |
| `gpus` | List of GPUs to attach | `[]` |
| `resources.cpu` | The number of CPU cores allocated to the virtual machine | `""` |
| `resources.memory` | The amount of memory allocated to the virtual machine | `""` |
| `sshKeys` | List of SSH public keys for authentication. Can be a single key or a list of keys. | `[]` |
| `cloudInit` | cloud-init user data config. See cloud-init documentation for more details. | `""` |
| `cloudInitSeed` | A seed string to generate an SMBIOS UUID for the VM. | `""` |
## U Series

View File

@@ -49,3 +49,23 @@ Selector labels
app.kubernetes.io/name: {{ include "virtual-machine.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Generate a stable UUID for cloud-init re-initialization upon upgrade.
*/}}
{{- define "virtual-machine.stableUuid" -}}
{{- $source := printf "%s-%s-%s" .Release.Namespace (include "virtual-machine.fullname" .) .Values.cloudInitSeed }}
{{- $hash := sha256sum $source }}
{{- $uuid := printf "%s-%s-4%s-9%s-%s" (substr 0 8 $hash) (substr 8 12 $hash) (substr 13 16 $hash) (substr 17 20 $hash) (substr 20 32 $hash) }}
{{- if eq .Values.cloudInitSeed "" }}
{{- /* Keep the previous UUID so that removing the seed does not trigger a full cloud-init run again. */}}
{{- $vmResource := lookup "kubevirt.io/v1" "VirtualMachine" .Release.Namespace (include "virtual-machine.fullname" .) -}}
{{- if $vmResource }}
{{- $existingUuid := $vmResource | dig "spec" "template" "spec" "domain" "firmware" "uuid" "" }}
{{- if $existingUuid }}
{{- $uuid = $existingUuid }}
{{- end }}
{{- end }}
{{- end }}
{{- $uuid }}
{{- end }}

View File

@@ -22,5 +22,5 @@ spec:
kind: virtual-machine
type: virtual-machine
selector:
vm.kubevirt.io/name: {{ $.Release.Name }}
{{- include "virtual-machine.selectorLabels" . | nindent 4 }}
version: {{ $.Chart.Version }}

View File

@@ -1,8 +1,8 @@
{{- if and .Values.instanceType (not (lookup "instancetype.kubevirt.io/v1beta1" "VirtualMachineClusterInstancetype" "" .Values.instanceType)) }}
{{- fail (printf "Specified instancetype not exists in cluster: %s" .Values.instanceType) }}
{{- fail (printf "Specified instanceType does not exist in the cluster: %s" .Values.instanceType) }}
{{- end }}
{{- if and .Values.instanceProfile (not (lookup "instancetype.kubevirt.io/v1beta1" "VirtualMachineClusterPreference" "" .Values.instanceProfile)) }}
{{- fail (printf "Specified profile not exists in cluster: %s" .Values.instanceProfile) }}
{{- fail (printf "Specified instanceProfile does not exist in the cluster: %s" .Values.instanceProfile) }}
{{- end }}
apiVersion: kubevirt.io/v1
@@ -40,11 +40,19 @@ spec:
requests:
memory: {{ .Values.resources.memory | quote }}
{{- end }}
firmware:
uuid: {{ include "virtual-machine.stableUuid" . }}
devices:
{{- if .Values.gpus }}
gpus:
{{- range $i, $gpu := .Values.gpus }}
- deviceName: {{ $gpu.name }}
{{- end }}
{{- end }}
disks:
{{- range $i, $disk := .Values.disks }}
- name: disk-{{ .name }}
{{- $disk := lookup "cdi.kubevirt.io/v1beta1" "DataVolume" $.Release.Namespace (printf "vm-disk-%s" .name) }}
- name: disk-{{ $disk.name }}
{{- $disk := lookup "cdi.kubevirt.io/v1beta1" "DataVolume" $.Release.Namespace (printf "vm-disk-%s" $disk.name) }}
{{- if $disk }}
{{- if and (hasKey $disk.metadata.annotations "vm-disk.cozystack.io/optical") (eq (index $disk.metadata.annotations "vm-disk.cozystack.io/optical") "true") }}
cdrom: {}
@@ -75,6 +83,7 @@ spec:
secret:
secretName: {{ include "virtual-machine.fullname" $ }}-ssh-keys
propagationMethod:
# keys will be injected into metadata part of cloud-init disk
noCloud: {}
{{- end }}
terminationGracePeriodSeconds: 30
@@ -87,8 +96,14 @@ spec:
{{- if or .Values.sshKeys .Values.cloudInit }}
- name: cloudinitdisk
cloudInitNoCloud:
{{- if .Values.cloudInit }}
secretRef:
name: {{ include "virtual-machine.fullname" . }}-cloud-init
{{- else }}
userData: |
#cloud-config
final_message: Cloud-init user-data was left blank intentionally.
{{- end }}
{{- end }}
networks:
- name: default

View File

@@ -88,7 +88,7 @@
},
"instanceProfile": {
"type": "string",
"description": "Virtual Machine prefferences profile",
"description": "Virtual Machine preferences profile",
"default": "ubuntu",
"optional": true,
"enum": [
@@ -145,6 +145,14 @@
"type": "object"
}
},
"gpus": {
"type": "array",
"description": "List of GPUs to attach",
"default": [],
"items": {
"type": "object"
}
},
"resources": {
"type": "object",
"properties": {
@@ -171,7 +179,12 @@
"cloudInit": {
"type": "string",
"description": "cloud-init user data config. See cloud-init documentation for more details.",
"default": "#cloud-config\n"
"default": ""
},
"cloudInitSeed": {
"type": "string",
"description": "A seed string to generate an SMBIOS UUID for the VM.",
"default": ""
}
}
}

View File

@@ -12,7 +12,7 @@ externalPorts:
running: true
## @param instanceType Virtual Machine instance type
## @param instanceProfile Virtual Machine prefferences profile
## @param instanceProfile Virtual Machine preferences profile
##
instanceType: "u1.medium"
instanceProfile: ubuntu
@@ -24,6 +24,12 @@ instanceProfile: ubuntu
## - name: example-data
disks: []
## @param gpus [array] List of GPUs to attach
## Example:
## gpus:
## - name: nvidia.com/GA102GL_A10
gpus: []
## @param resources.cpu The number of CPU cores allocated to the virtual machine
## @param resources.memory The amount of memory allocated to the virtual machine
resources:
@@ -47,5 +53,13 @@ sshKeys: []
## password: ubuntu
## chpasswd: { expire: False }
##
cloudInit: |
#cloud-config
cloudInit: ""
## @param cloudInitSeed A seed string to generate an SMBIOS UUID for the VM.
cloudInitSeed: ""
## Change it to any new value to force a full cloud-init reconfiguration. Use this to apply settings that are
## normally written only once, such as new SSH keys or a new network configuration, to an existing VM.
## An empty value does nothing (and the existing UUID is not reverted). Note that changing this value
## does not trigger a VM restart; you must restart the VM separately.
## Example:
## cloudInitSeed: "upd1"

View File

@@ -59,7 +59,7 @@ image-matchbox:
> ../../extra/bootbox/images/matchbox.tag
rm -f images/matchbox.json
assets: talos-iso talos-nocloud talos-metal
assets: talos-iso talos-nocloud talos-metal talos-kernel talos-initramfs
talos-initramfs talos-kernel talos-installer talos-iso talos-nocloud talos-metal:
mkdir -p ../../../_out/assets

View File

@@ -116,7 +116,7 @@ releases:
chart: cozy-monitoring-agents
namespace: cozy-monitoring
privileged: true
dependsOn: [cilium,kubeovn,victoria-metrics-operator]
dependsOn: [victoria-metrics-operator, vertical-pod-autoscaler-crds]
values:
scrapeRules:
etcd:
@@ -153,6 +153,17 @@ releases:
namespace: cozy-kubevirt-cdi
dependsOn: [cilium,kubeovn,kubevirt-cdi-operator]
- name: gpu-operator
releaseName: gpu-operator
chart: cozy-gpu-operator
namespace: cozy-gpu-operator
privileged: true
optional: true
dependsOn: [cilium,kubeovn]
valuesFiles:
- values.yaml
- values-talos.yaml
- name: metallb
releaseName: metallb
chart: cozy-metallb
@@ -388,6 +399,13 @@ releases:
privileged: true
dependsOn: [monitoring-agents]
- name: vertical-pod-autoscaler-crds
releaseName: vertical-pod-autoscaler-crds
chart: cozy-vertical-pod-autoscaler-crds
namespace: cozy-vertical-pod-autoscaler
privileged: true
dependsOn: [cilium, kubeovn]
- name: reloader
releaseName: reloader
chart: cozy-reloader

View File

@@ -69,7 +69,7 @@ releases:
chart: cozy-monitoring-agents
namespace: cozy-monitoring
privileged: true
dependsOn: [victoria-metrics-operator]
dependsOn: [victoria-metrics-operator, vertical-pod-autoscaler-crds]
values:
scrapeRules:
etcd:
@@ -254,3 +254,10 @@ releases:
namespace: cozy-vertical-pod-autoscaler
privileged: true
dependsOn: [monitoring-agents]
- name: vertical-pod-autoscaler-crds
releaseName: vertical-pod-autoscaler-crds
chart: cozy-vertical-pod-autoscaler-crds
namespace: cozy-vertical-pod-autoscaler
privileged: true
dependsOn: [cilium, kubeovn]

View File

@@ -14,3 +14,4 @@ RUN curl -LO "https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kube
&& mv kubectl /usr/local/bin/kubectl
RUN curl -sSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash -s - --version "v${HELM_VERSION}"
RUN wget https://github.com/mikefarah/yq/releases/download/v4.44.3/yq_linux_amd64 -O /usr/local/bin/yq && chmod +x /usr/local/bin/yq
RUN curl -s https://fluxcd.io/install.sh | bash

View File

@@ -4,6 +4,8 @@ kind: VLogs
metadata:
name: {{ .name }}
spec:
image:
tag: v1.17.0-victorialogs
storage:
resources:
requests:

View File

@@ -1,6 +1,6 @@
apiVersion: v2
appVersion: 0.17.0
appVersion: 0.18.1
description: Cluster API Operator
name: cluster-api-operator
type: application
version: 0.17.0
version: 0.18.1

View File

@@ -26,8 +26,10 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: {{ $addonNamespace }}
---
@@ -37,8 +39,10 @@ metadata:
name: {{ $addonName }}
namespace: {{ $addonNamespace }}
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- if or $addonVersion $.Values.secretName }}
spec:

View File

@@ -26,8 +26,11 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: {{ $bootstrapNamespace }}
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
@@ -36,8 +39,11 @@ metadata:
name: {{ $bootstrapName }}
namespace: {{ $bootstrapNamespace }}
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- if or $bootstrapVersion $.Values.configSecret.name }}
spec:
{{- end}}

View File

@@ -26,8 +26,11 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: {{ $controlPlaneNamespace }}
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
@@ -36,8 +39,11 @@ metadata:
name: {{ $controlPlaneName }}
namespace: {{ $controlPlaneNamespace }}
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- if or $controlPlaneVersion $.Values.configSecret.name $.Values.manager }}
spec:
{{- end}}

View File

@@ -1,4 +1,4 @@
{{- if or .Values.addon .Values.bootstrap .Values.controlPlane .Values.infrastructure }}
{{- if or .Values.addon .Values.bootstrap .Values.controlPlane .Values.infrastructure .Values.ipam }}
# Deploy core components if not specified
{{- if not .Values.core }}
---
@@ -6,8 +6,11 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: capi-system
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
@@ -16,8 +19,11 @@ metadata:
name: cluster-api
namespace: capi-system
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- with .Values.configSecret }}
spec:
configSecret:
@@ -28,4 +34,3 @@ spec:
{{- end }}
{{- end }}
{{- end }}

View File

@@ -25,8 +25,11 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: {{ $coreNamespace }}
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
@@ -35,8 +38,10 @@ metadata:
name: {{ $coreName }}
namespace: {{ $coreNamespace }}
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- if or $coreVersion $.Values.configSecret.name $.Values.manager }}
spec:
@@ -45,8 +50,8 @@ spec:
version: {{ $coreVersion }}
{{- end }}
{{- if $.Values.manager }}
manager:
{{- if and $.Values.manager.featureGates $.Values.manager.featureGates.core }}
manager:
featureGates:
{{- range $key, $value := $.Values.manager.featureGates.core }}
{{ $key }}: {{ $value }}

View File

@@ -7,8 +7,10 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: capi-kubeadm-bootstrap-system
---
@@ -18,8 +20,10 @@ metadata:
name: kubeadm
namespace: capi-kubeadm-bootstrap-system
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- with .Values.configSecret }}
spec:
@@ -37,8 +41,10 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: capi-kubeadm-control-plane-system
---
@@ -48,14 +54,16 @@ metadata:
name: kubeadm
namespace: capi-kubeadm-control-plane-system
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- with .Values.configSecret }}
spec:
{{- if $.Values.manager }}
manager:
{{- if and $.Values.manager.featureGates $.Values.manager.featureGates.kubeadm }}
manager:
featureGates:
{{- range $key, $value := $.Values.manager.featureGates.kubeadm }}
{{ $key }}: {{ $value }}

View File

@@ -26,8 +26,10 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: {{ $infrastructureNamespace }}
---
@@ -37,8 +39,10 @@ metadata:
name: {{ $infrastructureName }}
namespace: {{ $infrastructureNamespace }}
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- if or $infrastructureVersion $.Values.configSecret.name $.Values.manager $.Values.additionalDeployments }}
spec:
@@ -47,8 +51,8 @@ spec:
version: {{ $infrastructureVersion }}
{{- end }}
{{- if $.Values.manager }}
manager:
{{- if and (kindIs "map" $.Values.manager.featureGates) (hasKey $.Values.manager.featureGates $infrastructureName) }}
manager:
{{- range $key, $value := $.Values.manager.featureGates }}
{{- if eq $key $infrastructureName }}
featureGates:

View File

@@ -26,8 +26,10 @@ apiVersion: v1
kind: Namespace
metadata:
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "1"
{{- end }}
"argocd.argoproj.io/sync-wave": "1"
name: {{ $ipamNamespace }}
---
@@ -37,8 +39,10 @@ metadata:
name: {{ $ipamName }}
namespace: {{ $ipamNamespace }}
annotations:
{{- if $.Values.enableHelmHook }}
"helm.sh/hook": "post-install,post-upgrade"
"helm.sh/hook-weight": "2"
{{- end }}
"argocd.argoproj.io/sync-wave": "2"
{{- if or $ipamVersion $.Values.configSecret.name $.Values.manager $.Values.additionalDeployments }}
spec:
@@ -47,8 +51,8 @@ spec:
version: {{ $ipamVersion }}
{{- end }}
{{- if $.Values.manager }}
manager:
{{- if and (kindIs "map" $.Values.manager.featureGates) (hasKey $.Values.manager.featureGates $ipamName) }}
manager:
{{- range $key, $value := $.Values.manager.featureGates }}
{{- if eq $key $ipamName }}
featureGates:

@@ -21,7 +21,7 @@ leaderElection:
image:
manager:
repository: registry.k8s.io/capi-operator/cluster-api-operator
tag: v0.17.0
tag: v0.18.1
pullPolicy: IfNotPresent
env:
manager: []
@@ -69,3 +69,4 @@ volumeMounts:
- mountPath: /tmp/k8s-webhook-server/serving-certs
name: cert
readOnly: true
enableHelmHook: true
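
The `enableHelmHook` value added here gates the `helm.sh/hook` annotations in the provider templates, while the `argocd.argoproj.io/sync-wave` annotations sit outside the conditional in the hunks shown. A minimal values override, assuming ordering is left to a GitOps tool that reads the sync-wave annotations, might look like this:

```yaml
# Values override (sketch): do not render the helm.sh/hook and
# helm.sh/hook-weight annotations; the sync-wave annotations are
# still rendered because they are outside the enableHelmHook block.
enableHelmHook: false
```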

@@ -79,7 +79,7 @@ annotations:
Pod IP Pool\n description: |\n CiliumPodIPPool defines an IP pool that can
be used for pooled IPAM (i.e. the multi-pool IPAM mode).\n"
apiVersion: v2
appVersion: 1.17.1
appVersion: 1.17.2
description: eBPF-based Networking, Security, and Observability
home: https://cilium.io/
icon: https://cdn.jsdelivr.net/gh/cilium/cilium@main/Documentation/images/logo-solo.svg
@@ -95,4 +95,4 @@ kubeVersion: '>= 1.21.0-0'
name: cilium
sources:
- https://github.com/cilium/cilium
version: 1.17.1
version: 1.17.2

@@ -1,6 +1,6 @@
# cilium
![Version: 1.17.1](https://img.shields.io/badge/Version-1.17.1-informational?style=flat-square) ![AppVersion: 1.17.1](https://img.shields.io/badge/AppVersion-1.17.1-informational?style=flat-square)
![Version: 1.17.2](https://img.shields.io/badge/Version-1.17.2-informational?style=flat-square) ![AppVersion: 1.17.2](https://img.shields.io/badge/AppVersion-1.17.2-informational?style=flat-square)
Cilium is open source software for providing and transparently securing
network connectivity and loadbalancing between application workloads such as
@@ -85,7 +85,7 @@ contributors across the globe, there is almost always someone available to help.
| authentication.mutual.spire.install.agent.tolerations | list | `[{"effect":"NoSchedule","key":"node.kubernetes.io/not-ready"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane"},{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"},{"key":"CriticalAddonsOnly","operator":"Exists"}]` | SPIRE agent tolerations configuration By default it follows the same tolerations as the agent itself to allow the Cilium agent on this node to connect to SPIRE. ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/ |
| authentication.mutual.spire.install.enabled | bool | `true` | Enable SPIRE installation. This will only take effect only if authentication.mutual.spire.enabled is true |
| authentication.mutual.spire.install.existingNamespace | bool | `false` | SPIRE namespace already exists. Set to true if Helm should not create, manage, and import the SPIRE namespace. |
| authentication.mutual.spire.install.initImage | object | `{"digest":"sha256:a5d0ce49aa801d475da48f8cb163c354ab95cab073cd3c138bd458fc8257fbf1","override":null,"pullPolicy":"IfNotPresent","repository":"docker.io/library/busybox","tag":"1.37.0","useDigest":true}` | init container image of SPIRE agent and server |
| authentication.mutual.spire.install.initImage | object | `{"digest":"sha256:498a000f370d8c37927118ed80afe8adc38d1edcbfc071627d17b25c88efcab0","override":null,"pullPolicy":"IfNotPresent","repository":"docker.io/library/busybox","tag":"1.37.0","useDigest":true}` | init container image of SPIRE agent and server |
| authentication.mutual.spire.install.namespace | string | `"cilium-spire"` | SPIRE namespace to install into |
| authentication.mutual.spire.install.server.affinity | object | `{}` | SPIRE server affinity configuration |
| authentication.mutual.spire.install.server.annotations | object | `{}` | SPIRE server annotations |
@@ -131,6 +131,8 @@ contributors across the globe, there is almost always someone available to help.
| bpf.ctTcpMax | int | `524288` | Configure the maximum number of entries in the TCP connection tracking table. |
| bpf.datapathMode | string | `veth` | Mode for Pod devices for the core datapath (veth, netkit, netkit-l2, lb-only) |
| bpf.disableExternalIPMitigation | bool | `false` | Disable ExternalIP mitigation (CVE-2020-8554) |
| bpf.distributedLRU | object | `{"enabled":false}` | Control to use a distributed per-CPU backend memory for the core BPF LRU maps which Cilium uses. This improves performance significantly, but it is also recommended to increase BPF map sizing along with that. |
| bpf.distributedLRU.enabled | bool | `false` | Enable distributed LRU backend memory. For compatibility with existing installations it is off by default. |
| bpf.enableTCX | bool | `true` | Attach endpoint programs using tcx instead of legacy tc hooks on supported kernels. |
| bpf.events | object | `{"default":{"burstLimit":null,"rateLimit":null},"drop":{"enabled":true},"policyVerdict":{"enabled":true},"trace":{"enabled":true}}` | Control events generated by the Cilium datapath exposed to Cilium monitor and Hubble. Helm configuration for BPF events map rate limiting is experimental and might change in upcoming releases. |
| bpf.events.default | object | `{"burstLimit":null,"rateLimit":null}` | Default settings for all types of events except dbg and pcap. |
@@ -195,7 +197,7 @@ contributors across the globe, there is almost always someone available to help.
| clustermesh.apiserver.extraVolumeMounts | list | `[]` | Additional clustermesh-apiserver volumeMounts. |
| clustermesh.apiserver.extraVolumes | list | `[]` | Additional clustermesh-apiserver volumes. |
| clustermesh.apiserver.healthPort | int | `9880` | TCP port for the clustermesh-apiserver health API. |
| clustermesh.apiserver.image | object | `{"digest":"sha256:1de22f46bfdd638de72c2224d5223ddc3bbeacda1803cb75799beca3d4bf7a4c","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/clustermesh-apiserver","tag":"v1.17.1","useDigest":true}` | Clustermesh API server image. |
| clustermesh.apiserver.image | object | `{"digest":"sha256:981250ebdc6e66e190992eaf75cfca169113a8f08d5c3793fe15822176980398","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/clustermesh-apiserver","tag":"v1.17.2","useDigest":true}` | Clustermesh API server image. |
| clustermesh.apiserver.kvstoremesh.enabled | bool | `true` | Enable KVStoreMesh. KVStoreMesh caches the information retrieved from the remote clusters in the local etcd instance. |
| clustermesh.apiserver.kvstoremesh.extraArgs | list | `[]` | Additional KVStoreMesh arguments. |
| clustermesh.apiserver.kvstoremesh.extraEnv | list | `[]` | Additional KVStoreMesh environment variables. |
@@ -375,7 +377,7 @@ contributors across the globe, there is almost always someone available to help.
| envoy.healthPort | int | `9878` | TCP port for the health API. |
| envoy.httpRetryCount | int | `3` | Maximum number of retries for each HTTP request |
| envoy.idleTimeoutDurationSeconds | int | `60` | Set Envoy upstream HTTP idle connection timeout seconds. Does not apply to connections with pending requests. Default 60s |
| envoy.image | object | `{"digest":"sha256:fc708bd36973d306412b2e50c924cd8333de67e0167802c9b48506f9d772f521","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/cilium-envoy","tag":"v1.31.5-1739264036-958bef243c6c66fcfd73ca319f2eb49fff1eb2ae","useDigest":true}` | Envoy container image. |
| envoy.image | object | `{"digest":"sha256:377c78c13d2731f3720f931721ee309159e782d882251709cb0fac3b42c03f4b","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/cilium-envoy","tag":"v1.31.5-1741765102-efed3defcc70ab5b263a0fc44c93d316b846a211","useDigest":true}` | Envoy container image. |
| envoy.initialFetchTimeoutSeconds | int | `30` | Time in seconds after which the initial fetch on an xDS stream is considered timed out |
| envoy.livenessProbe.failureThreshold | int | `10` | failure threshold of liveness probe |
| envoy.livenessProbe.periodSeconds | int | `30` | interval between checks of the liveness probe |
@@ -392,6 +394,7 @@ contributors across the globe, there is almost always someone available to help.
| envoy.podLabels | object | `{}` | Labels to be added to envoy pods |
| envoy.podSecurityContext | object | `{"appArmorProfile":{"type":"Unconfined"}}` | Security Context for cilium-envoy pods. |
| envoy.podSecurityContext.appArmorProfile | object | `{"type":"Unconfined"}` | AppArmorProfile options for the `cilium-agent` and init containers |
| envoy.policyRestoreTimeoutDuration | string | `nil` | Max duration to wait for endpoint policies to be restored on restart. Default "3m". |
| envoy.priorityClassName | string | `nil` | The priority class to use for cilium-envoy. |
| envoy.prometheus | object | `{"enabled":true,"port":"9964","serviceMonitor":{"annotations":{},"enabled":false,"interval":"10s","labels":{},"metricRelabelings":null,"relabelings":[{"replacement":"${1}","sourceLabels":["__meta_kubernetes_pod_node_name"],"targetLabel":"node"}]}}` | Configure Cilium Envoy Prometheus options. Note that some of these apply to either cilium-agent or cilium-envoy. |
| envoy.prometheus.enabled | bool | `true` | Enable prometheus metrics for cilium-envoy |
@@ -515,7 +518,7 @@ contributors across the globe, there is almost always someone available to help.
| hubble.relay.extraVolumes | list | `[]` | Additional hubble-relay volumes. |
| hubble.relay.gops.enabled | bool | `true` | Enable gops for hubble-relay |
| hubble.relay.gops.port | int | `9893` | Configure gops listen port for hubble-relay |
| hubble.relay.image | object | `{"digest":"sha256:397e8fbb188157f744390a7b272a1dec31234e605bcbe22d8919a166d202a3dc","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/hubble-relay","tag":"v1.17.1","useDigest":true}` | Hubble-relay container image. |
| hubble.relay.image | object | `{"digest":"sha256:42a8db5c256c516cacb5b8937c321b2373ad7a6b0a1e5a5120d5028433d586cc","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/hubble-relay","tag":"v1.17.2","useDigest":true}` | Hubble-relay container image. |
| hubble.relay.listenHost | string | `""` | Host to listen to. Specify an empty string to bind to all the interfaces. |
| hubble.relay.listenPort | string | `"4245"` | Port to listen to. |
| hubble.relay.nodeSelector | object | `{"kubernetes.io/os":"linux"}` | Node labels for pod assignment ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector |
@@ -582,7 +585,7 @@ contributors across the globe, there is almost always someone available to help.
| hubble.ui.backend.extraEnv | list | `[]` | Additional hubble-ui backend environment variables. |
| hubble.ui.backend.extraVolumeMounts | list | `[]` | Additional hubble-ui backend volumeMounts. |
| hubble.ui.backend.extraVolumes | list | `[]` | Additional hubble-ui backend volumes. |
| hubble.ui.backend.image | object | `{"digest":"sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/hubble-ui-backend","tag":"v0.13.1","useDigest":true}` | Hubble-ui backend image. |
| hubble.ui.backend.image | object | `{"digest":"sha256:a034b7e98e6ea796ed26df8f4e71f83fc16465a19d166eff67a03b822c0bfa15","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/hubble-ui-backend","tag":"v0.13.2","useDigest":true}` | Hubble-ui backend image. |
| hubble.ui.backend.livenessProbe.enabled | bool | `false` | Enable liveness probe for Hubble-ui backend (requires Hubble-ui 0.12+) |
| hubble.ui.backend.readinessProbe.enabled | bool | `false` | Enable readiness probe for Hubble-ui backend (requires Hubble-ui 0.12+) |
| hubble.ui.backend.resources | object | `{}` | Resource requests and limits for the 'backend' container of the 'hubble-ui' deployment. |
@@ -592,7 +595,7 @@ contributors across the globe, there is almost always someone available to help.
| hubble.ui.frontend.extraEnv | list | `[]` | Additional hubble-ui frontend environment variables. |
| hubble.ui.frontend.extraVolumeMounts | list | `[]` | Additional hubble-ui frontend volumeMounts. |
| hubble.ui.frontend.extraVolumes | list | `[]` | Additional hubble-ui frontend volumes. |
| hubble.ui.frontend.image | object | `{"digest":"sha256:e2e9313eb7caf64b0061d9da0efbdad59c6c461f6ca1752768942bfeda0796c6","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/hubble-ui","tag":"v0.13.1","useDigest":true}` | Hubble-ui frontend image. |
| hubble.ui.frontend.image | object | `{"digest":"sha256:9e37c1296b802830834cc87342a9182ccbb71ffebb711971e849221bd9d59392","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/hubble-ui","tag":"v0.13.2","useDigest":true}` | Hubble-ui frontend image. |
| hubble.ui.frontend.resources | object | `{}` | Resource requests and limits for the 'frontend' container of the 'hubble-ui' deployment. |
| hubble.ui.frontend.securityContext | object | `{}` | Hubble-ui frontend security context. |
| hubble.ui.frontend.server.ipv6 | object | `{"enabled":true}` | Controls server listener for ipv6 |
@@ -622,7 +625,7 @@ contributors across the globe, there is almost always someone available to help.
| hubble.ui.updateStrategy | object | `{"rollingUpdate":{"maxUnavailable":1},"type":"RollingUpdate"}` | hubble-ui update strategy. |
| identityAllocationMode | string | `"crd"` | Method to use for identity allocation (`crd`, `kvstore` or `doublewrite-readkvstore` / `doublewrite-readcrd` for migrating between identity backends). |
| identityChangeGracePeriod | string | `"5s"` | Time to wait before using new identity on endpoint identity change. |
| image | object | `{"digest":"sha256:8969bfd9c87cbea91e40665f8ebe327268c99d844ca26d7d12165de07f702866","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/cilium","tag":"v1.17.1","useDigest":true}` | Agent container image. |
| image | object | `{"digest":"sha256:3c4c9932b5d8368619cb922a497ff2ebc8def5f41c18e410bcc84025fcd385b1","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/cilium","tag":"v1.17.2","useDigest":true}` | Agent container image. |
| imagePullSecrets | list | `[]` | Configure image pull secrets for pulling container images |
| ingressController.default | bool | `false` | Set cilium ingress controller to be the default ingress controller This will let cilium ingress controller route entries without ingress class set |
| ingressController.defaultSecretName | string | `nil` | Default secret name for ingresses without .spec.tls[].secretName set. |
@@ -759,7 +762,7 @@ contributors across the globe, there is almost always someone available to help.
| operator.hostNetwork | bool | `true` | HostNetwork setting |
| operator.identityGCInterval | string | `"15m0s"` | Interval for identity garbage collection. |
| operator.identityHeartbeatTimeout | string | `"30m0s"` | Timeout for identity heartbeats. |
| operator.image | object | `{"alibabacloudDigest":"sha256:034b479fba340f9d98510e509c7ce1c36e8889a109d5f1c2240fcb0942bc772c","awsDigest":"sha256:da74748057c836471bfdc0e65bb29ba0edb82916ec4b99f6a4f002b2fcc849d6","azureDigest":"sha256:b9e3e3994f5fcf1832e1f344f3b3b544832851b1990f124b2c2c68e3ffe04a9b","genericDigest":"sha256:628becaeb3e4742a1c36c4897721092375891b58bae2bfcae48bbf4420aaee97","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/operator","suffix":"","tag":"v1.17.1","useDigest":true}` | cilium-operator image. |
| operator.image | object | `{"alibabacloudDigest":"sha256:7cb8c23417f65348bb810fe92fb05b41d926f019d77442f3fa1058d17fea7ffe","awsDigest":"sha256:955096183e22a203bbb198ca66e3266ce4dbc2b63f1a2fbd03f9373dcd97893c","azureDigest":"sha256:455fb88b558b1b8ba09d63302ccce76b4930581be89def027184ab04335c20e0","genericDigest":"sha256:81f2d7198366e8dec2903a3a8361e4c68d47d19c68a0d42f0b7b6e3f0523f249","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/operator","suffix":"","tag":"v1.17.2","useDigest":true}` | cilium-operator image. |
| operator.nodeGCInterval | string | `"5m0s"` | Interval for cilium node garbage collection. |
| operator.nodeSelector | object | `{"kubernetes.io/os":"linux"}` | Node labels for cilium-operator pod assignment ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector |
| operator.podAnnotations | object | `{}` | Annotations to be added to cilium-operator pods |
@@ -809,7 +812,7 @@ contributors across the globe, there is almost always someone available to help.
| preflight.extraEnv | list | `[]` | Additional preflight environment variables. |
| preflight.extraVolumeMounts | list | `[]` | Additional preflight volumeMounts. |
| preflight.extraVolumes | list | `[]` | Additional preflight volumes. |
| preflight.image | object | `{"digest":"sha256:8969bfd9c87cbea91e40665f8ebe327268c99d844ca26d7d12165de07f702866","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/cilium","tag":"v1.17.1","useDigest":true}` | Cilium pre-flight image. |
| preflight.image | object | `{"digest":"sha256:3c4c9932b5d8368619cb922a497ff2ebc8def5f41c18e410bcc84025fcd385b1","override":null,"pullPolicy":"IfNotPresent","repository":"quay.io/cilium/cilium","tag":"v1.17.2","useDigest":true}` | Cilium pre-flight image. |
| preflight.nodeSelector | object | `{"kubernetes.io/os":"linux"}` | Node labels for preflight pod assignment ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector |
| preflight.podAnnotations | object | `{}` | Annotations to be added to preflight pods |
| preflight.podDisruptionBudget.enabled | bool | `false` | enable PodDisruptionBudget ref: https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ |
@@ -883,7 +886,7 @@ contributors across the globe, there is almost always someone available to help.
| tls.caBundle.useSecret | bool | `false` | Use a Secret instead of a ConfigMap. |
| tls.readSecretsOnlyFromSecretsNamespace | string | `nil` | Configure if the Cilium Agent will only look in `tls.secretsNamespace` for CiliumNetworkPolicy relevant Secrets. If false, the Cilium Agent will be granted READ (GET/LIST/WATCH) access to _all_ secrets in the entire cluster. This is not recommended and is included for backwards compatibility. This value obsoletes `tls.secretsBackend`, with `true` == `local` in the old setting, and `false` == `k8s`. |
| tls.secretSync | object | `{"enabled":null}` | Configures settings for synchronization of TLS Interception Secrets |
| tls.secretSync.enabled | string | `nil` | Enable synchronization of Secrets for TLS Interception. If disabled and tls.secretsBackend is set to 'k8s', then secrets will be read directly by the agent. |
| tls.secretSync.enabled | string | `nil` | Enable synchronization of Secrets for TLS Interception. If disabled and tls.readSecretsOnlyFromSecretsNamespace is set to 'false', then secrets will be read directly by the agent. |
| tls.secretsBackend | string | `nil` | This configures how the Cilium agent loads the secrets used TLS-aware CiliumNetworkPolicies (namely the secrets referenced by terminatingTLS and originatingTLS). This value is DEPRECATED and will be removed in a future version. Use `tls.readSecretsOnlyFromSecretsNamespace` instead. Possible values: - local - k8s |
| tls.secretsNamespace | object | `{"create":true,"name":"cilium-secrets"}` | Configures where secrets used in CiliumNetworkPolicies will be looked for |
| tls.secretsNamespace.create | bool | `true` | Create secrets namespace for TLS Interception secrets. |
@@ -891,6 +894,7 @@ contributors across the globe, there is almost always someone available to help.
| tolerations | list | `[{"operator":"Exists"}]` | Node tolerations for agent scheduling to nodes with taints ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/ |
| tunnelPort | int | Port 8472 for VXLAN, Port 6081 for Geneve | Configure VXLAN and Geneve tunnel port. |
| tunnelProtocol | string | `"vxlan"` | Tunneling protocol to use in tunneling mode and for ad-hoc tunnels. Possible values: - "" - vxlan - geneve |
| tunnelSourcePortRange | string | 0-0 to let the kernel driver decide the range | Configure VXLAN and Geneve tunnel source port range hint. |
| updateStrategy | object | `{"rollingUpdate":{"maxUnavailable":2},"type":"RollingUpdate"}` | Cilium agent update strategy |
| upgradeCompatibility | string | `nil` | upgradeCompatibility helps users upgrading to ensure that the configMap for Cilium will not change critical values to ensure continued operation This flag is not required for new installations. For example: '1.7', '1.8', '1.9' |
| vtep.cidr | string | `""` | A space separated list of VTEP device CIDRs, for example "1.1.1.0/24 1.1.2.0/24" |
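
The README hunks above introduce three values in 1.17.2: `bpf.distributedLRU.enabled`, `envoy.policyRestoreTimeoutDuration`, and `tunnelSourcePortRange`. As a rough sketch, a values override that sets all three could look like the following; the chosen settings are illustrative, taken from the defaults and descriptions quoted in the table rather than from any recommendation in this change:

```yaml
bpf:
  distributedLRU:
    # Off by default for compatibility with existing installations.
    # The README suggests increasing BPF map sizing when enabling it.
    enabled: true

envoy:
  # The README gives "3m" as the default when this is left unset.
  policyRestoreTimeoutDuration: "3m"

# "0-0" lets the kernel driver decide the range (per the README).
tunnelSourcePortRange: "0-0"
```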

@@ -7,8 +7,15 @@ staticResources:
- name: "envoy-prometheus-metrics-listener"
address:
socketAddress:
address: "0.0.0.0"
address: {{ .Values.ipv4.enabled | ternary "0.0.0.0" "::" | quote }}
portValue: {{ .Values.envoy.prometheus.port }}
{{- if and .Values.ipv4.enabled .Values.ipv6.enabled }}
additionalAddresses:
- address:
socketAddress:
address: "::"
portValue: {{ .Values.envoy.prometheus.port }}
{{- end }}
filterChains:
- filters:
- name: "envoy.filters.network.http_connection_manager"
@@ -289,7 +296,7 @@ overloadManager:
applicationLogConfig:
logFormat:
{{- if .Values.envoy.log.format_json }}
jsonFormat: "{{ .Values.envoy.log.format_json | toJson }}"
jsonFormat: {{ .Values.envoy.log.format_json | toJson }}
{{- else }}
textFormat: "{{ .Values.envoy.log.format }}"
{{- end }}
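
To make the listener change concrete: with both `ipv4.enabled` and `ipv6.enabled` set to true, the `ternary` picks `"0.0.0.0"` as the primary address and the new block adds `"::"` as an additional address. The fragment below is a sketch of the expected rendering, assuming the README default port of `9964`; the indentation is a guess, since the diff view does not preserve it:

```yaml
# Assumed rendering for ipv4.enabled=true and ipv6.enabled=true
address:
  socketAddress:
    address: "0.0.0.0"
    portValue: 9964
additionalAddresses:
- address:
    socketAddress:
      address: "::"
      portValue: 9964
```

With `ipv4.enabled` false, the same `ternary` would yield `"::"` as the only address and the `additionalAddresses` block would be skipped.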

@@ -232,7 +232,7 @@ spec:
resources:
{{- toYaml . | trim | nindent 10 }}
{{- end }}
{{- if or .Values.prometheus.enabled .Values.hubble.metrics.enabled }}
{{- if or .Values.prometheus.enabled (or .Values.hubble.metrics.enabled .Values.hubble.metrics.dynamic.enabled) }}
ports:
- name: peer-service
containerPort: {{ .Values.hubble.peerService.targetPort }}
@@ -364,7 +364,7 @@ spec:
mountPath: {{ .Values.kubeConfigPath }}
readOnly: true
{{- end }}
{{- if and .Values.hubble.enabled .Values.hubble.metrics.enabled .Values.hubble.metrics.tls.enabled }}
{{- if and .Values.hubble.enabled (or .Values.hubble.metrics.enabled .Values.hubble.metrics.dynamic.enabled) .Values.hubble.metrics.tls.enabled }}
- name: hubble-metrics-tls
mountPath: /var/lib/cilium/tls/hubble-metrics
readOnly: true
@@ -999,7 +999,7 @@ spec:
path: client-ca.crt
{{- end }}
{{- end }}
{{- if and .Values.hubble.enabled .Values.hubble.metrics.enabled .Values.hubble.metrics.tls.enabled }}
{{- if and .Values.hubble.enabled (or .Values.hubble.metrics.enabled .Values.hubble.metrics.dynamic.enabled) .Values.hubble.metrics.tls.enabled }}
- name: hubble-metrics-tls
projected:
# note: the leading zero means this number is in octal representation: do not remove it

@@ -39,6 +39,9 @@ metadata:
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
@@ -62,6 +65,9 @@ metadata:
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
@@ -85,6 +91,9 @@ metadata:
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
@@ -104,6 +113,9 @@ metadata:
namespace: {{ .Values.bgpControlPlane.secretsNamespace.name | quote }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
@@ -123,6 +135,9 @@ metadata:
namespace: {{ .Values.tls.secretsNamespace.name | quote }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
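
Each hunk in this file appends `.Values.commonLabels` under the existing `labels:` key via `toYaml . | nindent 4`. As an illustration, assuming a hypothetical label `team: platform`, the values and the resulting metadata would look roughly like this:

```yaml
# values.yaml (the label key and value are hypothetical)
commonLabels:
  team: platform
---
# Expected labels on each RoleBinding touched above
metadata:
  labels:
    app.kubernetes.io/part-of: cilium
    team: platform
```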

@@ -46,6 +46,9 @@ metadata:
k8s-app: cilium
app.kubernetes.io/name: cilium-agent
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
clusterIP: None
type: ClusterIP

@@ -403,7 +403,7 @@ data:
{{- if .Values.bpf.authMapMax }}
# bpf-auth-map-max specifies the maximum number of entries in the auth map
bpf-auth-map-max: {{ .Values.bpf.authMapMax | quote }}
bpf-auth-map-max: "{{ .Values.bpf.authMapMax | int }}"
{{- end }}
{{- if or $bpfCtTcpMax $bpfCtAnyMax }}
# bpf-ct-global-*-max specifies the maximum number of connections
@@ -419,34 +419,34 @@ data:
# For users upgrading from Cilium 1.2 or earlier, to minimize disruption
# during the upgrade process, set bpf-ct-global-tcp-max to 1000000.
{{- if $bpfCtTcpMax }}
bpf-ct-global-tcp-max: {{ $bpfCtTcpMax | quote }}
bpf-ct-global-tcp-max: "{{ $bpfCtTcpMax | int }}"
{{- end }}
{{- if $bpfCtAnyMax }}
bpf-ct-global-any-max: {{ $bpfCtAnyMax | quote }}
bpf-ct-global-any-max: "{{ $bpfCtAnyMax | int }}"
{{- end }}
{{- end }}
{{- if .Values.bpf.ctAccounting }}
bpf-conntrack-accounting: "{{ .Values.bpf.ctAccounting }}"
bpf-conntrack-accounting: "{{ .Values.bpf.ctAccounting | int }}"
{{- end }}
{{- if .Values.bpf.natMax }}
# bpf-nat-global-max specified the maximum number of entries in the
# BPF NAT table.
bpf-nat-global-max: "{{ .Values.bpf.natMax }}"
bpf-nat-global-max: "{{ .Values.bpf.natMax | int }}"
{{- end }}
{{- if .Values.bpf.neighMax }}
# bpf-neigh-global-max specified the maximum number of entries in the
# BPF neighbor table.
bpf-neigh-global-max: "{{ .Values.bpf.neighMax }}"
bpf-neigh-global-max: "{{ .Values.bpf.neighMax | int }}"
{{- end }}
{{- if hasKey .Values.bpf "policyMapMax" }}
# bpf-policy-map-max specifies the maximum number of entries in endpoint
# policy map (per endpoint)
bpf-policy-map-max: "{{ .Values.bpf.policyMapMax }}"
bpf-policy-map-max: "{{ .Values.bpf.policyMapMax | int }}"
{{- end }}
{{- if hasKey .Values.bpf "lbMapMax" }}
# bpf-lb-map-max specifies the maximum number of entries in bpf lb service,
# backend and affinity maps.
bpf-lb-map-max: "{{ .Values.bpf.lbMapMax }}"
bpf-lb-map-max: "{{ .Values.bpf.lbMapMax | int }}"
{{- end }}
{{- if hasKey .Values.bpf "lbExternalClusterIP" }}
bpf-lb-external-clusterip: {{ .Values.bpf.lbExternalClusterIP | quote }}
@@ -461,6 +461,7 @@ data:
bpf-lb-mode-annotation: {{ .Values.bpf.lbModeAnnotation | quote }}
{{- end }}
bpf-distributed-lru: {{ .Values.bpf.distributedLRU.enabled | quote }}
bpf-events-drop-enabled: {{ .Values.bpf.events.drop.enabled | quote }}
bpf-events-policy-verdict-enabled: {{ .Values.bpf.events.policyVerdict.enabled | quote }}
bpf-events-trace-enabled: {{ .Values.bpf.events.trace.enabled | quote }}
@@ -513,6 +514,9 @@ data:
{{- if .Values.tunnelPort }}
tunnel-port: {{ .Values.tunnelPort | quote }}
{{- end }}
{{- if .Values.tunnelSourcePortRange }}
tunnel-source-port-range: {{ .Values.tunnelSourcePortRange | quote }}
{{- end }}
{{- if .Values.serviceNoBackendResponse }}
service-no-backend-response: "{{ .Values.serviceNoBackendResponse }}"
@@ -927,9 +931,8 @@ data:
operator-api-serve-addr: {{ $defaultOperatorApiServeAddr | quote }}
{{- end }}
{{- if .Values.hubble.enabled }}
# Enable Hubble gRPC service.
enable-hubble: {{ .Values.hubble.enabled | quote }}
{{- if .Values.hubble.enabled }}
# UNIX domain socket for Hubble server to listen to.
hubble-socket-path: {{ .Values.hubble.socketPath | quote }}
{{- if hasKey .Values.hubble "eventQueueSize" }}
@@ -941,7 +944,7 @@ data:
# Capacity of the buffer to store recent events.
hubble-event-buffer-capacity: {{ .Values.hubble.eventBufferCapacity | quote }}
{{- end }}
{{- if .Values.hubble.metrics.enabled }}
{{- if or .Values.hubble.metrics.enabled .Values.hubble.metrics.dynamic.enabled}}
# Address to expose Hubble metrics (e.g. ":7070"). Metrics server will be disabled if this
# field is not set.
hubble-metrics-server: ":{{ .Values.hubble.metrics.port }}"
@@ -953,14 +956,20 @@ data:
hubble-metrics-server-tls-client-ca-files: /var/lib/cilium/tls/hubble-metrics/client-ca.crt
{{- end }}
{{- end }}
{{- end }}
{{- if .Values.hubble.metrics.enabled }}
# A space separated list of metrics to enable. See [0] for available metrics.
#
# https://github.com/cilium/hubble/blob/master/Documentation/metrics.md
hubble-metrics: {{- range .Values.hubble.metrics.enabled }}
{{.}}
{{- end}}
{{- if .Values.hubble.metrics.dynamic.enabled }}
hubble-dynamic-metrics-config-path: /dynamic-metrics-config/dynamic-metrics.yaml
{{- end }}
enable-hubble-open-metrics: {{ .Values.hubble.metrics.enableOpenMetrics | quote }}
{{- end }}
{{- if .Values.hubble.redact }}
{{- if eq .Values.hubble.redact.enabled true }}
# Enables hubble redact capabilities
@@ -1004,10 +1013,6 @@ data:
hubble-flowlogs-config-path: /flowlog-config/flowlogs.yaml
{{- end }}
{{- end }}
{{- if .Values.hubble.metrics.dynamic.enabled }}
hubble-dynamic-metrics-config-path: /dynamic-metrics-config/dynamic-metrics.yaml
hubble-metrics-server: ":{{ .Values.hubble.metrics.port }}"
{{- end }}
{{- if hasKey .Values.hubble "listenAddress" }}
# An additional address for Hubble server to listen to (e.g. ":4244").
hubble-listen-address: {{ .Values.hubble.listenAddress | quote }}
@@ -1041,8 +1046,8 @@ data:
{{- else }}
ipam: {{ $ipam | quote }}
{{- end }}
{{- if hasKey .Values.ipam "multiPoolPreAllocation" }}
ipam-multi-pool-pre-allocation: {{ .Values.ipam.multiPoolPreAllocation }}
{{- if .Values.ipam.multiPoolPreAllocation }}
ipam-multi-pool-pre-allocation: {{ .Values.ipam.multiPoolPreAllocation | quote }}
{{- end }}
{{- if .Values.ipam.ciliumNodeUpdateRate }}
@@ -1335,6 +1340,10 @@ data:
external-envoy-proxy: {{ include "envoyDaemonSetEnabled" . | quote }}
envoy-base-id: {{ .Values.envoy.baseID | quote }}
{{- if .Values.envoy.policyRestoreTimeoutDuration }}
envoy-policy-restore-timeout: {{ .Values.envoy.policyRestoreTimeoutDuration | quote }}
{{- end }}
{{- if .Values.envoy.log.path }}
envoy-log: {{ .Values.envoy.log.path | quote }}
{{- end }}
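
One effect of the Hubble hunks in this file is that `hubble-metrics-server` is now emitted when either the static metrics list or dynamic metrics are enabled, instead of being duplicated in a separate `dynamic` block. A minimal values sketch that exercises only the dynamic path, assuming no static metrics are listed, might be:

```yaml
hubble:
  enabled: true
  metrics:
    # Static metric list left unset; only dynamic metrics are on.
    dynamic:
      enabled: true
# With these values the ConfigMap template above is expected to emit
# hubble-metrics-server and hubble-dynamic-metrics-config-path,
# but not the hubble-metrics list.
```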

@@ -41,6 +41,9 @@ metadata:
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
- apiGroups:
- ""
@@ -66,6 +69,9 @@ metadata:
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
rules:
- apiGroups:
- ""

@@ -7,24 +7,23 @@ kind: RoleBinding
metadata:
name: cilium-operator-ingress-secrets
namespace: {{ .Values.ingressController.secretsNamespace.name | quote }}
{{- with .Values.commonLabels }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- with .Values.operator.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: cilium-operator-ingress-secrets
subjects:
- kind: ServiceAccount
name: {{ .Values.serviceAccounts.operator.name | quote }}
namespace: {{ include "cilium.namespace" . }}
- kind: ServiceAccount
name: {{ .Values.serviceAccounts.operator.name | quote }}
namespace: {{ include "cilium.namespace" . }}
{{- end }}
{{- if and .Values.operator.enabled .Values.serviceAccounts.operator.create .Values.gatewayAPI.enabled .Values.gatewayAPI.secretsNamespace.sync .Values.gatewayAPI.secretsNamespace.name }}
@@ -34,12 +33,15 @@ kind: RoleBinding
metadata:
name: cilium-operator-gateway-secrets
namespace: {{ .Values.gatewayAPI.secretsNamespace.name | quote }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- with .Values.operator.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
@@ -57,12 +59,15 @@ kind: RoleBinding
metadata:
name: cilium-operator-tlsinterception-secrets
namespace: {{ .Values.tls.secretsNamespace.name | quote }}
labels:
app.kubernetes.io/part-of: cilium
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- with .Values.operator.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
labels:
app.kubernetes.io/part-of: cilium
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
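
Review note: the RoleBinding hunks above appear to move `labels` ahead of `annotations` and merge `.Values.commonLabels` into the same map. A minimal sketch of the resulting shape, assuming a hypothetical `commonLabels` entry (`team: platform` is an illustration, not a chart default):

```yaml
# Values override. The label key and value are hypothetical.
commonLabels:
  team: platform
---
# Approximate shape of the rendered metadata for one of the RoleBindings.
# Only the labels are shown; namespace and annotations depend on other values.
metadata:
  name: cilium-operator-tlsinterception-secrets
  labels:
    app.kubernetes.io/part-of: cilium
    team: platform
```

With `commonLabels` unset, the `with` block renders nothing and only the `app.kubernetes.io/part-of` label remains.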

@@ -1,4 +1,4 @@
{{- if and .Values.hubble.enabled .Values.hubble.metrics.enabled .Values.hubble.metrics.serviceMonitor.enabled }}
{{- if and .Values.hubble.enabled (or .Values.hubble.metrics.enabled .Values.hubble.metrics.dynamic.enabled) .Values.hubble.metrics.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
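
Review note: with the new condition, the ServiceMonitor is rendered when either `hubble.metrics.enabled` or `hubble.metrics.dynamic.enabled` is truthy, provided Hubble and the ServiceMonitor are both enabled. A sketch of a values override that relies only on the dynamic path, using only the keys named in the template condition:

```yaml
# Values fragment: the ServiceMonitor is rendered through the new `or` branch.
hubble:
  enabled: true
  metrics:
    # hubble.metrics.enabled is left unset here on purpose.
    dynamic:
      enabled: true
    serviceMonitor:
      enabled: true
```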

@@ -4,10 +4,13 @@ kind: Service
metadata:
name: spire-server
namespace: {{ .Values.authentication.mutual.spire.install.namespace }}
{{- with .Values.commonLabels }}
labels:
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- with .Values.authentication.mutual.spire.install.server.service.labels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- if or .Values.authentication.mutual.spire.install.server.service.annotations .Values.authentication.mutual.spire.annotations }}
annotations:
{{- with .Values.authentication.mutual.spire.annotations }}
@@ -17,10 +20,6 @@ metadata:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- with .Values.authentication.mutual.spire.install.server.service.labels }}
labels:
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
type: {{ .Values.authentication.mutual.spire.install.server.service.type }}
ports:

@@ -4,10 +4,6 @@ kind: StatefulSet
metadata:
name: spire-server
namespace: {{ .Values.authentication.mutual.spire.install.namespace }}
{{- with .Values.commonLabels }}
labels:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- if or .Values.authentication.mutual.spire.install.server.annotations .Values.authentication.mutual.spire.annotations }}
annotations:
{{- with .Values.authentication.mutual.spire.annotations }}
@@ -19,9 +15,12 @@ metadata:
{{- end }}
labels:
app: spire-server
{{- with .Values.authentication.mutual.spire.install.server.labels }}
{{- with .Values.commonLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- with .Values.authentication.mutual.spire.install.server.labels }}
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
replicas: 1
selector:

@@ -519,6 +519,14 @@
"disableExternalIPMitigation": {
"type": "boolean"
},
"distributedLRU": {
"properties": {
"enabled": {
"type": "boolean"
}
},
"type": "object"
},
"enableTCX": {
"type": "boolean"
},
@@ -2110,6 +2118,12 @@
},
"type": "object"
},
"policyRestoreTimeoutDuration": {
"type": [
"null",
"string"
]
},
"priorityClassName": {
"type": [
"null",
@@ -5462,6 +5476,9 @@
"tunnelProtocol": {
"type": "string"
},
"tunnelSourcePortRange": {
"type": "string"
},
"updateStrategy": {
"properties": {
"rollingUpdate": {

@@ -191,10 +191,10 @@ image:
# @schema
override: ~
repository: "quay.io/cilium/cilium"
tag: "v1.17.1"
tag: "v1.17.2"
pullPolicy: "IfNotPresent"
# cilium-digest
digest: "sha256:8969bfd9c87cbea91e40665f8ebe327268c99d844ca26d7d12165de07f702866"
digest: "sha256:3c4c9932b5d8368619cb922a497ff2ebc8def5f41c18e410bcc84025fcd385b1"
useDigest: true
# -- Scheduling configurations for cilium pods
scheduling:
@@ -495,6 +495,13 @@ bpf:
# tracking table.
# @default -- `262144`
ctAnyMax: ~
# -- Control to use a distributed per-CPU backend memory for the core BPF LRU maps
# which Cilium uses. This improves performance significantly, but it is also
# recommended to increase BPF map sizing along with that.
distributedLRU:
# -- Enable distributed LRU backend memory. For compatibility with existing
# installations it is off by default.
enabled: false
# -- Control events generated by the Cilium datapath exposed to Cilium monitor and Hubble.
# Helm configuration for BPF events map rate limiting is experimental and might change
# in upcoming releases.
@@ -1433,9 +1440,9 @@ hubble:
# @schema
override: ~
repository: "quay.io/cilium/hubble-relay"
tag: "v1.17.1"
tag: "v1.17.2"
# hubble-relay-digest
digest: "sha256:397e8fbb188157f744390a7b272a1dec31234e605bcbe22d8919a166d202a3dc"
digest: "sha256:42a8db5c256c516cacb5b8937c321b2373ad7a6b0a1e5a5120d5028433d586cc"
useDigest: true
pullPolicy: "IfNotPresent"
# -- Specifies the resources for the hubble-relay pods
@@ -1684,8 +1691,8 @@ hubble:
# @schema
override: ~
repository: "quay.io/cilium/hubble-ui-backend"
tag: "v0.13.1"
digest: "sha256:0e0eed917653441fded4e7cdb096b7be6a3bddded5a2dd10812a27b1fc6ed95b"
tag: "v0.13.2"
digest: "sha256:a034b7e98e6ea796ed26df8f4e71f83fc16465a19d166eff67a03b822c0bfa15"
useDigest: true
pullPolicy: "IfNotPresent"
# -- Hubble-ui backend security context.
@@ -1718,8 +1725,8 @@ hubble:
# @schema
override: ~
repository: "quay.io/cilium/hubble-ui"
tag: "v0.13.1"
digest: "sha256:e2e9313eb7caf64b0061d9da0efbdad59c6c461f6ca1752768942bfeda0796c6"
tag: "v0.13.2"
digest: "sha256:9e37c1296b802830834cc87342a9182ccbb71ffebb711971e849221bd9d59392"
useDigest: true
pullPolicy: "IfNotPresent"
# -- Hubble-ui frontend security context.
@@ -2332,6 +2339,11 @@ envoy:
xffNumTrustedHopsL7PolicyIngress: 0
# -- Number of trusted hops regarding the x-forwarded-for and related HTTP headers for the egress L7 policy enforcement Envoy listeners.
xffNumTrustedHopsL7PolicyEgress: 0
# @schema
# type: [null, string]
# @schema
# -- Max duration to wait for endpoint policies to be restored on restart. Default "3m".
policyRestoreTimeoutDuration: null
# -- Envoy container image.
image:
# @schema
@@ -2339,9 +2351,9 @@ envoy:
# @schema
override: ~
repository: "quay.io/cilium/cilium-envoy"
tag: "v1.31.5-1739264036-958bef243c6c66fcfd73ca319f2eb49fff1eb2ae"
tag: "v1.31.5-1741765102-efed3defcc70ab5b263a0fc44c93d316b846a211"
pullPolicy: "IfNotPresent"
digest: "sha256:fc708bd36973d306412b2e50c924cd8333de67e0167802c9b48506f9d772f521"
digest: "sha256:377c78c13d2731f3720f931721ee309159e782d882251709cb0fac3b42c03f4b"
useDigest: true
# -- Additional containers added to the cilium Envoy DaemonSet.
extraContainers: []
@@ -2605,7 +2617,7 @@ tls:
# type: [null, boolean]
# @schema
# -- Enable synchronization of Secrets for TLS Interception. If disabled and
# tls.secretsBackend is set to 'k8s', then secrets will be read directly by the agent.
# tls.readSecretsOnlyFromSecretsNamespace is set to 'false', then secrets will be read directly by the agent.
enabled: ~
# -- Base64 encoded PEM values for the CA certificate and private key.
# This can be used as common CA to generate certificates used by hubble and clustermesh components.
@@ -2658,6 +2670,9 @@ routingMode: ""
# -- Configure VXLAN and Geneve tunnel port.
# @default -- Port 8472 for VXLAN, Port 6081 for Geneve
tunnelPort: 0
# -- Configure VXLAN and Geneve tunnel source port range hint.
# @default -- 0-0 to let the kernel driver decide the range
tunnelSourcePortRange: 0-0
# -- Configure what the response should be to traffic for a service without backends.
# Possible values:
# - reject (default)
@@ -2693,15 +2708,15 @@ operator:
# @schema
override: ~
repository: "quay.io/cilium/operator"
tag: "v1.17.1"
tag: "v1.17.2"
# operator-generic-digest
genericDigest: "sha256:628becaeb3e4742a1c36c4897721092375891b58bae2bfcae48bbf4420aaee97"
genericDigest: "sha256:81f2d7198366e8dec2903a3a8361e4c68d47d19c68a0d42f0b7b6e3f0523f249"
# operator-azure-digest
azureDigest: "sha256:b9e3e3994f5fcf1832e1f344f3b3b544832851b1990f124b2c2c68e3ffe04a9b"
azureDigest: "sha256:455fb88b558b1b8ba09d63302ccce76b4930581be89def027184ab04335c20e0"
# operator-aws-digest
awsDigest: "sha256:da74748057c836471bfdc0e65bb29ba0edb82916ec4b99f6a4f002b2fcc849d6"
awsDigest: "sha256:955096183e22a203bbb198ca66e3266ce4dbc2b63f1a2fbd03f9373dcd97893c"
# operator-alibabacloud-digest
alibabacloudDigest: "sha256:034b479fba340f9d98510e509c7ce1c36e8889a109d5f1c2240fcb0942bc772c"
alibabacloudDigest: "sha256:7cb8c23417f65348bb810fe92fb05b41d926f019d77442f3fa1058d17fea7ffe"
useDigest: true
pullPolicy: "IfNotPresent"
suffix: ""
@@ -2976,9 +2991,9 @@ preflight:
# @schema
override: ~
repository: "quay.io/cilium/cilium"
tag: "v1.17.1"
tag: "v1.17.2"
# cilium-digest
digest: "sha256:8969bfd9c87cbea91e40665f8ebe327268c99d844ca26d7d12165de07f702866"
digest: "sha256:3c4c9932b5d8368619cb922a497ff2ebc8def5f41c18e410bcc84025fcd385b1"
useDigest: true
pullPolicy: "IfNotPresent"
# -- The priority class to use for the preflight pod.
@@ -3125,9 +3140,9 @@ clustermesh:
# @schema
override: ~
repository: "quay.io/cilium/clustermesh-apiserver"
tag: "v1.17.1"
tag: "v1.17.2"
# clustermesh-apiserver-digest
digest: "sha256:1de22f46bfdd638de72c2224d5223ddc3bbeacda1803cb75799beca3d4bf7a4c"
digest: "sha256:981250ebdc6e66e190992eaf75cfca169113a8f08d5c3793fe15822176980398"
useDigest: true
pullPolicy: "IfNotPresent"
# -- TCP port for the clustermesh-apiserver health API.
@@ -3634,7 +3649,7 @@ authentication:
override: ~
repository: "docker.io/library/busybox"
tag: "1.37.0"
digest: "sha256:a5d0ce49aa801d475da48f8cb163c354ab95cab073cd3c138bd458fc8257fbf1"
digest: "sha256:498a000f370d8c37927118ed80afe8adc38d1edcbfc071627d17b25c88efcab0"
useDigest: true
pullPolicy: "IfNotPresent"
# SPIRE agent configuration

@@ -500,6 +500,13 @@ bpf:
# tracking table.
# @default -- `262144`
ctAnyMax: ~
# -- Control to use a distributed per-CPU backend memory for the core BPF LRU maps
# which Cilium uses. This improves performance significantly, but it is also
# recommended to increase BPF map sizing along with that.
distributedLRU:
# -- Enable distributed LRU backend memory. For compatibility with existing
# installations it is off by default.
enabled: false
# -- Control events generated by the Cilium datapath exposed to Cilium monitor and Hubble.
# Helm configuration for BPF events map rate limiting is experimental and might change
# in upcoming releases.
@@ -2351,6 +2358,11 @@ envoy:
xffNumTrustedHopsL7PolicyIngress: 0
# -- Number of trusted hops regarding the x-forwarded-for and related HTTP headers for the egress L7 policy enforcement Envoy listeners.
xffNumTrustedHopsL7PolicyEgress: 0
# @schema
# type: [null, string]
# @schema
# -- Max duration to wait for endpoint policies to be restored on restart. Default "3m".
policyRestoreTimeoutDuration: null
# -- Envoy container image.
image:
# @schema
@@ -2626,7 +2638,7 @@ tls:
# type: [null, boolean]
# @schema
# -- Enable synchronization of Secrets for TLS Interception. If disabled and
# tls.secretsBackend is set to 'k8s', then secrets will be read directly by the agent.
# tls.readSecretsOnlyFromSecretsNamespace is set to 'false', then secrets will be read directly by the agent.
enabled: ~
# -- Base64 encoded PEM values for the CA certificate and private key.
# This can be used as common CA to generate certificates used by hubble and clustermesh components.
@@ -2679,6 +2691,9 @@ routingMode: ""
# -- Configure VXLAN and Geneve tunnel port.
# @default -- Port 8472 for VXLAN, Port 6081 for Geneve
tunnelPort: 0
# -- Configure VXLAN and Geneve tunnel source port range hint.
# @default -- 0-0 to let the kernel driver decide the range
tunnelSourcePortRange: 0-0
# -- Configure what the response should be to traffic for a service without backends.
# Possible values:
# - reject (default)
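
Review note: the values added in this update (`bpf.distributedLRU.enabled`, `envoy.policyRestoreTimeoutDuration`, `tunnelSourcePortRange`) can be set from an override file. A minimal sketch follows. The chosen settings are illustrative, not recommendations, and the accepted formats are inferred from the comments and schema in this diff.

```yaml
# Values override for the options introduced with Cilium v1.17.2.
bpf:
  distributedLRU:
    # Off by default for compatibility. The chart comment recommends
    # increasing BPF map sizing when this is turned on.
    enabled: true
envoy:
  # Schema type is null or string. The comment documents "3m" as the default.
  policyRestoreTimeoutDuration: "3m"
# Schema type is string. "0-0" lets the kernel driver choose the range.
tunnelSourcePortRange: "0-0"
```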

@@ -1,2 +1,2 @@
ARG VERSION=v1.17.1
ARG VERSION=v1.17.2
FROM quay.io/cilium/cilium:${VERSION}

@@ -4,7 +4,7 @@ metadata:
name: cozystack-controller
rules:
- apiGroups: [""]
resources: ["configmaps", "pods", "namespaces", "nodes", "services", "persistentvolumes"]
resources: ["configmaps", "pods", "namespaces", "nodes", "services", "persistentvolumes", "persistentvolumeclaims"]
verbs: ["get", "watch", "list"]
- apiGroups: ['cozystack.io']
resources: ['*']

@@ -0,0 +1,3 @@
apiVersion: v2
name: cozy-gpu-operator
version: 0.0.0 # Placeholder, the actual version will be automatically set during the build process

@@ -0,0 +1,11 @@
export NAME=gpu-operator
export NAMESPACE=cozy-$(NAME)
include ../../../scripts/common-envs.mk
include ../../../scripts/package.mk
update:
rm -rf charts
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update nvidia
helm pull nvidia/gpu-operator --untar --untardir charts

@@ -0,0 +1,22 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/

@@ -0,0 +1,6 @@
dependencies:
- name: node-feature-discovery
repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
version: 0.17.2
digest: sha256:4c55d30d958027ef8997a2976449326de3c90049025c3ebb9bee017cad32cc3f
generated: "2025-02-25T09:08:49.128088-08:00"

@@ -0,0 +1,23 @@
apiVersion: v2
appVersion: v25.3.0
dependencies:
- condition: nfd.enabled
name: node-feature-discovery
repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
version: v0.17.2
description: NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
home: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/overview.html
icon: https://assets.nvidiagrid.net/ngc/logos/GPUoperator.png
keywords:
- gpu
- cuda
- compute
- operator
- deep learning
- monitoring
- tesla
kubeVersion: '>= 1.16.0-0'
name: gpu-operator
sources:
- https://github.com/NVIDIA/gpu-operator
version: v25.3.0

@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/

@@ -0,0 +1,14 @@
apiVersion: v2
appVersion: v0.17.2
description: 'Detects hardware features available on each node in a Kubernetes cluster,
and advertises those features using node labels. '
home: https://github.com/kubernetes-sigs/node-feature-discovery
keywords:
- feature-discovery
- feature-detection
- node-labels
name: node-feature-discovery
sources:
- https://github.com/kubernetes-sigs/node-feature-discovery
type: application
version: 0.17.2

@@ -0,0 +1,10 @@
# Node Feature Discovery
Node Feature Discovery (NFD) is a Kubernetes add-on for detecting hardware
features and system configuration. Detected features are advertised as node
labels. NFD provides flexible configuration and extension points for a wide
range of vendor and application specific node labeling needs.
See
[NFD documentation](https://kubernetes-sigs.github.io/node-feature-discovery/v0.17/deployment/helm.html)
for deployment instructions.

@@ -0,0 +1,711 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.3
name: nodefeatures.nfd.k8s-sigs.io
spec:
group: nfd.k8s-sigs.io
names:
kind: NodeFeature
listKind: NodeFeatureList
plural: nodefeatures
singular: nodefeature
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: |-
NodeFeature resource holds the features discovered for one node in the
cluster.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: Specification of the NodeFeature, containing features discovered
for a node.
properties:
features:
description: Features is the full "raw" features data that has been
discovered.
properties:
attributes:
additionalProperties:
description: AttributeFeatureSet is a set of features having
string value.
properties:
elements:
additionalProperties:
type: string
description: Individual features of the feature set.
type: object
required:
- elements
type: object
description: Attributes contains all the attribute-type features
of the node.
type: object
flags:
additionalProperties:
description: FlagFeatureSet is a set of simple features only
containing names without values.
properties:
elements:
additionalProperties:
description: |-
Nil is a dummy empty struct for protobuf compatibility.
NOTE: protobuf definitions have been removed but this is kept for API compatibility.
type: object
description: Individual features of the feature set.
type: object
required:
- elements
type: object
description: Flags contains all the flag-type features of the
node.
type: object
instances:
additionalProperties:
description: InstanceFeatureSet is a set of features each of
which is an instance having multiple attributes.
properties:
elements:
description: Individual features of the feature set.
items:
description: InstanceFeature represents one instance of
a complex feature, e.g. a device.
properties:
attributes:
additionalProperties:
type: string
description: Attributes of the instance feature.
type: object
required:
- attributes
type: object
type: array
required:
- elements
type: object
description: Instances contains all the instance-type features
of the node.
type: object
type: object
labels:
additionalProperties:
type: string
description: Labels is the set of node labels that are requested to
be created.
type: object
type: object
required:
- spec
type: object
served: true
storage: true
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.3
name: nodefeaturegroups.nfd.k8s-sigs.io
spec:
group: nfd.k8s-sigs.io
names:
kind: NodeFeatureGroup
listKind: NodeFeatureGroupList
plural: nodefeaturegroups
shortNames:
- nfg
singular: nodefeaturegroup
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: NodeFeatureGroup resource holds Node pools by featureGroup
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: Spec defines the rules to be evaluated.
properties:
featureGroupRules:
description: List of rules to evaluate to determine nodes that belong
in this group.
items:
description: GroupRule defines a rule for nodegroup filtering.
properties:
matchAny:
description: MatchAny specifies a list of matchers one of which
must match.
items:
description: MatchAnyElem specifies one sub-matcher of MatchAny.
properties:
matchFeatures:
description: MatchFeatures specifies a set of matcher
terms all of which must match.
items:
description: |-
FeatureMatcherTerm defines requirements against one feature set. All
requirements (specified as MatchExpressions) are evaluated against each
element in the feature set.
properties:
feature:
description: Feature is the name of the feature
set to match against.
type: string
matchExpressions:
additionalProperties:
description: |-
MatchExpression specifies an expression to evaluate against a set of input
values. It contains an operator that is applied when matching the input and
an array of values that the operator evaluates the input against.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
description: |-
MatchExpressions is the set of per-element expressions evaluated. These
match against the value of the specified elements.
type: object
matchName:
description: |-
MatchName is an expression that is matched against the name of each
element in the feature set.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
required:
- feature
type: object
type: array
required:
- matchFeatures
type: object
type: array
matchFeatures:
description: MatchFeatures specifies a set of matcher terms
all of which must match.
items:
description: |-
FeatureMatcherTerm defines requirements against one feature set. All
requirements (specified as MatchExpressions) are evaluated against each
element in the feature set.
properties:
feature:
description: Feature is the name of the feature set to
match against.
type: string
matchExpressions:
additionalProperties:
description: |-
MatchExpression specifies an expression to evaluate against a set of input
values. It contains an operator that is applied when matching the input and
an array of values that the operator evaluates the input against.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
description: |-
MatchExpressions is the set of per-element expressions evaluated. These
match against the value of the specified elements.
type: object
matchName:
description: |-
MatchName is an expression that is matched against the name of each
element in the feature set.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
required:
- feature
type: object
type: array
name:
description: Name of the rule.
type: string
required:
- name
type: object
type: array
required:
- featureGroupRules
type: object
status:
description: |-
Status of the NodeFeatureGroup after the most recent evaluation of the
specification.
properties:
nodes:
description: Nodes is a list of FeatureGroupNode in the cluster that
match the featureGroupRules
items:
properties:
name:
description: Name of the node.
type: string
required:
- name
type: object
type: array
x-kubernetes-list-map-keys:
- name
x-kubernetes-list-type: map
type: object
required:
- spec
type: object
served: true
storage: true
subresources:
status: {}
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.16.3
name: nodefeaturerules.nfd.k8s-sigs.io
spec:
group: nfd.k8s-sigs.io
names:
kind: NodeFeatureRule
listKind: NodeFeatureRuleList
plural: nodefeaturerules
shortNames:
- nfr
singular: nodefeaturerule
scope: Cluster
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: |-
NodeFeatureRule resource specifies a configuration for feature-based
customization of node objects, such as node labeling.
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: Spec defines the rules to be evaluated.
properties:
rules:
description: Rules is a list of node customization rules.
items:
description: Rule defines a rule for node customization such as
labeling.
properties:
annotations:
additionalProperties:
type: string
description: Annotations to create if the rule matches.
type: object
extendedResources:
additionalProperties:
type: string
description: ExtendedResources to create if the rule matches.
type: object
labels:
additionalProperties:
type: string
description: Labels to create if the rule matches.
type: object
labelsTemplate:
description: |-
LabelsTemplate specifies a template to expand for dynamically generating
multiple labels. Data (after template expansion) must be keys with an
optional value (<key>[=<value>]) separated by newlines.
type: string
matchAny:
description: MatchAny specifies a list of matchers one of which
must match.
items:
description: MatchAnyElem specifies one sub-matcher of MatchAny.
properties:
matchFeatures:
description: MatchFeatures specifies a set of matcher
terms all of which must match.
items:
description: |-
FeatureMatcherTerm defines requirements against one feature set. All
requirements (specified as MatchExpressions) are evaluated against each
element in the feature set.
properties:
feature:
description: Feature is the name of the feature
set to match against.
type: string
matchExpressions:
additionalProperties:
description: |-
MatchExpression specifies an expression to evaluate against a set of input
values. It contains an operator that is applied when matching the input and
an array of values that the operator evaluates the input against.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
description: |-
MatchExpressions is the set of per-element expressions evaluated. These
match against the value of the specified elements.
type: object
matchName:
description: |-
MatchName is an expression that is matched against the name of each
element in the feature set.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
required:
- feature
type: object
type: array
required:
- matchFeatures
type: object
type: array
matchFeatures:
description: MatchFeatures specifies a set of matcher terms
all of which must match.
items:
description: |-
FeatureMatcherTerm defines requirements against one feature set. All
requirements (specified as MatchExpressions) are evaluated against each
element in the feature set.
properties:
feature:
description: Feature is the name of the feature set to
match against.
type: string
matchExpressions:
additionalProperties:
description: |-
MatchExpression specifies an expression to evaluate against a set of input
values. It contains an operator that is applied when matching the input and
an array of values that the operator evaluates the input against.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
description: |-
MatchExpressions is the set of per-element expressions evaluated. These
match against the value of the specified elements.
type: object
matchName:
description: |-
MatchName is an expression that is matched against the name of each
element in the feature set.
properties:
op:
description: Op is the operator to be applied.
enum:
- In
- NotIn
- InRegexp
- Exists
- DoesNotExist
- Gt
- Lt
- GtLt
- IsTrue
- IsFalse
type: string
value:
description: |-
Value is the list of values that the operand evaluates the input
against. Value should be empty if the operator is Exists, DoesNotExist,
IsTrue or IsFalse. Value should contain exactly one element if the
operator is Gt or Lt and exactly two elements if the operator is GtLt.
In other cases Value should contain at least one element.
items:
type: string
type: array
required:
- op
type: object
required:
- feature
type: object
type: array
name:
description: Name of the rule.
type: string
taints:
description: Taints to create if the rule matches.
items:
description: |-
The node this Taint is attached to has the "effect" on
any pod that does not tolerate the Taint.
properties:
effect:
description: |-
Required. The effect of the taint on pods
that do not tolerate the taint.
Valid effects are NoSchedule, PreferNoSchedule and NoExecute.
type: string
key:
description: Required. The taint key to be applied to
a node.
type: string
timeAdded:
description: |-
TimeAdded represents the time at which the taint was added.
It is only written for NoExecute taints.
format: date-time
type: string
value:
description: The taint value corresponding to the taint
key.
type: string
required:
- effect
- key
type: object
type: array
vars:
additionalProperties:
type: string
description: |-
Vars is the variables to store if the rule matches. Variables do not
directly inflict any changes in the node object. However, they can be
referenced from other rules enabling more complex rule hierarchies,
without exposing intermediary output values as labels.
type: object
varsTemplate:
description: |-
VarsTemplate specifies a template to expand for dynamically generating
multiple variables. Data (after template expansion) must be keys with an
optional value (<key>[=<value>]) separated by newlines.
type: string
required:
- name
type: object
type: array
required:
- rules
type: object
required:
- spec
type: object
served: true
storage: true
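# Illustrative only (not part of the upstream CRD): a minimal NodeFeatureRule
# sketch exercising the fields described above, assuming this schema is the
# nfd.k8s-sigs.io/v1alpha1 NodeFeatureRule CRD. The object name, rule name,
# taint key and variable name are hypothetical; cpu.cpuid and kernel.version
# are assumed to be reported by the nfd-worker in use. Taints are applied
# only when nfd-master runs with -enable-taints.
#
# apiVersion: nfd.k8s-sigs.io/v1alpha1
# kind: NodeFeatureRule
# metadata:
#   name: example-avx512
# spec:
#   rules:
#     - name: "example avx512 rule"
#       matchFeatures:
#         - feature: cpu.cpuid
#           matchExpressions:
#             AVX512F: {op: Exists}           # Exists takes no value
#         - feature: kernel.version
#           matchExpressions:
#             major: {op: Gt, value: ["5"]}   # Gt and Lt take exactly one value
#       taints:
#         - key: example.com/avx512
#           value: "true"
#           effect: NoSchedule
#       vars:
#         has_avx512: "true"                  # usable by later rules, not exposed as a label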

@@ -0,0 +1,107 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "node-feature-discovery.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "node-feature-discovery.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{/*
Allow the release namespace to be overridden for multi-namespace deployments in combined charts
*/}}
{{- define "node-feature-discovery.namespace" -}}
{{- if .Values.namespaceOverride -}}
{{- .Values.namespaceOverride -}}
{{- else -}}
{{- .Release.Namespace -}}
{{- end -}}
{{- end -}}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "node-feature-discovery.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Common labels
*/}}
{{- define "node-feature-discovery.labels" -}}
helm.sh/chart: {{ include "node-feature-discovery.chart" . }}
{{ include "node-feature-discovery.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}
{{/*
Selector labels
*/}}
{{- define "node-feature-discovery.selectorLabels" -}}
app.kubernetes.io/name: {{ include "node-feature-discovery.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
{{/*
Create the name of the service account which the nfd master will use
*/}}
{{- define "node-feature-discovery.master.serviceAccountName" -}}
{{- if .Values.master.serviceAccount.create -}}
{{ default (include "node-feature-discovery.fullname" .) .Values.master.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.master.serviceAccount.name }}
{{- end -}}
{{- end -}}
{{/*
Create the name of the service account which the nfd worker will use
*/}}
{{- define "node-feature-discovery.worker.serviceAccountName" -}}
{{- if .Values.worker.serviceAccount.create -}}
{{ default (printf "%s-worker" (include "node-feature-discovery.fullname" .)) .Values.worker.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.worker.serviceAccount.name }}
{{- end -}}
{{- end -}}
{{/*
Create the name of the service account which topologyUpdater will use
*/}}
{{- define "node-feature-discovery.topologyUpdater.serviceAccountName" -}}
{{- if .Values.topologyUpdater.serviceAccount.create -}}
{{ default (printf "%s-topology-updater" (include "node-feature-discovery.fullname" .)) .Values.topologyUpdater.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.topologyUpdater.serviceAccount.name }}
{{- end -}}
{{- end -}}
{{/*
Create the name of the service account which nfd-gc will use
*/}}
{{- define "node-feature-discovery.gc.serviceAccountName" -}}
{{- if .Values.gc.serviceAccount.create -}}
{{ default (printf "%s-gc" (include "node-feature-discovery.fullname" .)) .Values.gc.serviceAccount.name }}
{{- else -}}
{{ default "default" .Values.gc.serviceAccount.name }}
{{- end -}}
{{- end -}}
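{{/*
Illustrative note (a sketch, not part of the upstream chart): how the helpers
above would resolve, assuming the chart is named "node-feature-discovery",
the release is named "nfd", and no overrides are set.

  node-feature-discovery.name                       -> node-feature-discovery
  node-feature-discovery.fullname                   -> nfd-node-feature-discovery
  node-feature-discovery.worker.serviceAccountName  -> nfd-node-feature-discovery-worker
      (worker.serviceAccount.create: true, name unset)
  node-feature-discovery.worker.serviceAccountName  -> default
      (worker.serviceAccount.create: false, name unset)

If the release were itself named "node-feature-discovery", fullname would be
just the release name, because the release name already contains the chart name.

Hypothetical values that shorten the generated names and pin the namespace:

  fullnameOverride: nfd
  namespaceOverride: nfd-system
*/}}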

@@ -0,0 +1,140 @@
{{- if and .Values.master.enable .Values.master.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "node-feature-discovery.fullname" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
rules:
- apiGroups:
- ""
resources:
- namespaces
verbs:
- watch
- list
- apiGroups:
- ""
resources:
- nodes
- nodes/status
verbs:
- get
- patch
- update
- list
- apiGroups:
- nfd.k8s-sigs.io
resources:
- nodefeatures
- nodefeaturerules
- nodefeaturegroups
verbs:
- get
- list
- watch
- apiGroups:
- nfd.k8s-sigs.io
resources:
- nodefeaturegroups/status
verbs:
- patch
- update
- apiGroups:
- coordination.k8s.io
resources:
- leases
verbs:
- create
- apiGroups:
- coordination.k8s.io
resources:
- leases
resourceNames:
- "nfd-master.nfd.kubernetes.io"
verbs:
- get
- update
{{- end }}
{{- if and .Values.topologyUpdater.enable .Values.topologyUpdater.rbac.create }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-topology-updater
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- apiGroups:
- ""
resources:
- namespaces
verbs:
- get
- apiGroups:
- ""
resources:
- nodes/proxy
verbs:
- get
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- topology.node.k8s.io
resources:
- noderesourcetopologies
verbs:
- create
- get
- update
{{- end }}
{{- if and .Values.gc.enable .Values.gc.rbac.create }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-gc
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
rules:
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/proxy
verbs:
- get
- apiGroups:
- topology.node.k8s.io
resources:
- noderesourcetopologies
verbs:
- delete
- list
- apiGroups:
- nfd.k8s-sigs.io
resources:
- nodefeatures
verbs:
- delete
- list
{{- end }}

@@ -0,0 +1,52 @@
{{- if and .Values.master.enable .Values.master.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "node-feature-discovery.fullname" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ include "node-feature-discovery.fullname" . }}
subjects:
- kind: ServiceAccount
name: {{ include "node-feature-discovery.master.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
{{- end }}
{{- if and .Values.topologyUpdater.enable .Values.topologyUpdater.rbac.create }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-topology-updater
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ include "node-feature-discovery.fullname" . }}-topology-updater
subjects:
- kind: ServiceAccount
name: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
{{- end }}
{{- if and .Values.gc.enable .Values.gc.rbac.create }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-gc
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ include "node-feature-discovery.fullname" . }}-gc
subjects:
- kind: ServiceAccount
name: {{ include "node-feature-discovery.gc.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
{{- end }}

@@ -0,0 +1,170 @@
{{- if .Values.master.enable }}
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-master
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
role: master
{{- with .Values.master.deploymentAnnotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
replicas: {{ .Values.master.replicaCount }}
revisionHistoryLimit: {{ .Values.master.revisionHistoryLimit }}
selector:
matchLabels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 6 }}
role: master
template:
metadata:
labels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 8 }}
role: master
annotations:
checksum/config: {{ include (print $.Template.BasePath "/nfd-master-conf.yaml") . | sha256sum }}
{{- with .Values.master.annotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
{{- with .Values.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "node-feature-discovery.master.serviceAccountName" . }}
enableServiceLinks: false
securityContext:
{{- toYaml .Values.master.podSecurityContext | nindent 8 }}
hostNetwork: {{ .Values.master.hostNetwork }}
containers:
- name: master
securityContext:
{{- toYaml .Values.master.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
startupProbe:
grpc:
port: {{ .Values.master.healthPort | default "8082" }}
{{- with .Values.master.startupProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.master.startupProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.master.startupProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.master.startupProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
livenessProbe:
grpc:
port: {{ .Values.master.healthPort | default "8082" }}
{{- with .Values.master.livenessProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.master.livenessProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.master.livenessProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.master.livenessProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
readinessProbe:
grpc:
port: {{ .Values.master.healthPort | default "8082" }}
{{- with .Values.master.readinessProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.master.readinessProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.master.readinessProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.master.readinessProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
{{- with .Values.master.readinessProbe.successThreshold }}
successThreshold: {{ . }}
{{- end }}
ports:
- containerPort: {{ .Values.master.metricsPort | default "8081" }}
name: metrics
- containerPort: {{ .Values.master.healthPort | default "8082" }}
name: health
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- with .Values.master.extraEnvs }}
{{- toYaml . | nindent 8 }}
{{- end }}
command:
- "nfd-master"
resources:
{{- toYaml .Values.master.resources | nindent 12 }}
args:
{{- if .Values.master.instance | empty | not }}
- "-instance={{ .Values.master.instance }}"
{{- end }}
- "-enable-leader-election"
{{- if .Values.master.extraLabelNs | empty | not }}
- "-extra-label-ns={{- join "," .Values.master.extraLabelNs }}"
{{- end }}
{{- if .Values.master.denyLabelNs | empty | not }}
- "-deny-label-ns={{- join "," .Values.master.denyLabelNs }}"
{{- end }}
{{- if .Values.master.enableTaints }}
- "-enable-taints"
{{- end }}
{{- if .Values.master.featureRulesController | kindIs "invalid" | not }}
- "-featurerules-controller={{ .Values.master.featureRulesController }}"
{{- end }}
{{- if .Values.master.resyncPeriod }}
- "-resync-period={{ .Values.master.resyncPeriod }}"
{{- end }}
{{- if .Values.master.nfdApiParallelism | empty | not }}
- "-nfd-api-parallelism={{ .Values.master.nfdApiParallelism }}"
{{- end }}
# Go over featureGates and add the feature-gate flag
{{- range $key, $value := .Values.featureGates }}
- "-feature-gates={{ $key }}={{ $value }}"
{{- end }}
- "-metrics={{ .Values.master.metricsPort | default "8081" }}"
- "-grpc-health={{ .Values.master.healthPort | default "8082" }}"
{{- with .Values.master.extraArgs }}
{{- toYaml . | nindent 12 }}
{{- end }}
volumeMounts:
- name: nfd-master-conf
mountPath: "/etc/kubernetes/node-feature-discovery"
readOnly: true
volumes:
- name: nfd-master-conf
configMap:
name: {{ include "node-feature-discovery.fullname" . }}-master-conf
items:
- key: nfd-master.conf
path: nfd-master.conf
{{- with .Values.master.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.master.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.master.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- end }}
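{{/*
Illustrative values (a sketch; the concrete values are hypothetical) and the
nfd-master arguments the template above would render for them:

  master:
    extraLabelNs: [example.com, vendor.example.com]  -> -extra-label-ns=example.com,vendor.example.com
    enableTaints: true                               -> -enable-taints
    featureRulesController: false                    -> -featurerules-controller=false
    resyncPeriod: 2h                                 -> -resync-period=2h
    metricsPort: 9081                                -> -metrics=9081

Notes on the conditions used above:
  - -enable-leader-election is always passed.
  - featureRulesController is tested with kindIs "invalid", so the flag is
    omitted only when the value is null; an explicit false is still passed.
  - metricsPort and healthPort fall back to 8081 and 8082 when unset.
  - each key in the top-level featureGates map becomes one
    -feature-gates=KEY=VALUE argument.
*/}}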

@@ -0,0 +1,88 @@
{{- if .Values.gc.enable -}}
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-gc
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
role: gc
{{- with .Values.gc.deploymentAnnotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
replicas: {{ .Values.gc.replicaCount | default 1 }}
revisionHistoryLimit: {{ .Values.gc.revisionHistoryLimit }}
selector:
matchLabels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 6 }}
role: gc
template:
metadata:
labels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 8 }}
role: gc
{{- with .Values.gc.annotations }}
annotations:
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ include "node-feature-discovery.gc.serviceAccountName" . }}
dnsPolicy: ClusterFirstWithHostNet
{{- with .Values.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
securityContext:
{{- toYaml .Values.gc.podSecurityContext | nindent 8 }}
hostNetwork: {{ .Values.gc.hostNetwork }}
containers:
- name: gc
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: "{{ .Values.image.pullPolicy }}"
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- with .Values.gc.extraEnvs }}
{{- toYaml . | nindent 8 }}
{{- end }}
command:
- "nfd-gc"
args:
{{- if .Values.gc.interval | empty | not }}
- "-gc-interval={{ .Values.gc.interval }}"
{{- end }}
{{- with .Values.gc.extraArgs }}
{{- toYaml . | nindent 10 }}
{{- end }}
resources:
{{- toYaml .Values.gc.resources | nindent 12 }}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: [ "ALL" ]
readOnlyRootFilesystem: true
runAsNonRoot: true
ports:
- name: metrics
containerPort: {{ .Values.gc.metricsPort | default "8081" }}
{{- with .Values.gc.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.gc.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.gc.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- end }}

@@ -0,0 +1,12 @@
{{- if .Values.master.enable }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-master-conf
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
data:
nfd-master.conf: |-
{{- .Values.master.config | toYaml | nindent 4 }}
{{- end }}

@@ -0,0 +1,12 @@
{{- if .Values.topologyUpdater.enable -}}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-topology-updater-conf
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
data:
nfd-topology-updater.conf: |-
{{- .Values.topologyUpdater.config | toYaml | nindent 4 }}
{{- end }}

@@ -0,0 +1,12 @@
{{- if .Values.worker.enable }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-worker-conf
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
data:
nfd-worker.conf: |-
{{- .Values.worker.config | toYaml | nindent 4 }}
{{- end }}

@@ -0,0 +1,94 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-prune
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-prune
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
rules:
- apiGroups:
- ""
resources:
- nodes
- nodes/status
verbs:
- get
- patch
- update
- list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-prune
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: {{ include "node-feature-discovery.fullname" . }}-prune
subjects:
- kind: ServiceAccount
name: {{ include "node-feature-discovery.fullname" . }}-prune
namespace: {{ include "node-feature-discovery.namespace" . }}
---
apiVersion: batch/v1
kind: Job
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-prune
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
annotations:
"helm.sh/hook": post-delete
"helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
template:
metadata:
labels:
{{- include "node-feature-discovery.labels" . | nindent 8 }}
role: prune
spec:
serviceAccountName: {{ include "node-feature-discovery.fullname" . }}-prune
containers:
- name: nfd-master
securityContext:
{{- toYaml .Values.master.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
command:
- "nfd-master"
args:
- "-prune"
{{- if .Values.master.instance | empty | not }}
- "-instance={{ .Values.master.instance }}"
{{- end }}
restartPolicy: Never
{{- with .Values.master.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.master.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.master.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}

@@ -0,0 +1,26 @@
{{- if .Values.prometheus.enable }}
# Prometheus PodMonitor (metrics)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: {{ include "node-feature-discovery.fullname" . }}
labels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 4 }}
{{- with .Values.prometheus.labels }}
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
podMetricsEndpoints:
- honorLabels: true
interval: {{ .Values.prometheus.scrapeInterval }}
path: /metrics
port: metrics
scheme: http
namespaceSelector:
matchNames:
- {{ include "node-feature-discovery.namespace" . }}
selector:
matchExpressions:
- {key: app.kubernetes.io/instance, operator: In, values: ["{{ .Release.Name }}"]}
- {key: app.kubernetes.io/name, operator: In, values: ["{{ include "node-feature-discovery.name" . }}"]}
{{- end }}
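{{/*
Illustrative values (a sketch; the interval and label are hypothetical) that
would enable the PodMonitor above:

  prometheus:
    enable: true
    scrapeInterval: 30s
    labels:
      release: my-prometheus

The PodMonitor selects pods carrying this release's app.kubernetes.io/instance
and app.kubernetes.io/name labels and scrapes the container port named
"metrics" over plain HTTP at /metrics, for example the master and gc pods
defined in this chart. It assumes the monitoring.coreos.com/v1 CRDs
(Prometheus Operator) are already installed in the cluster, and that the
extra labels match whatever podMonitorSelector the Prometheus instance uses.
*/}}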

@@ -0,0 +1,25 @@
{{- if and .Values.worker.enable .Values.worker.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-worker
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
rules:
- apiGroups:
- nfd.k8s-sigs.io
resources:
- nodefeatures
verbs:
- create
- get
- update
- delete
- apiGroups:
- ""
resources:
- pods
verbs:
- get
{{- end }}

@@ -0,0 +1,18 @@
{{- if and .Values.worker.enable .Values.worker.rbac.create }}
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-worker
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: {{ include "node-feature-discovery.fullname" . }}-worker
subjects:
- kind: ServiceAccount
name: {{ include "node-feature-discovery.worker.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
{{- end }}

@@ -0,0 +1,58 @@
{{- if and .Values.master.enable .Values.master.serviceAccount.create }}
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "node-feature-discovery.master.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
{{- with .Values.master.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- if and .Values.topologyUpdater.enable .Values.topologyUpdater.serviceAccount.create }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
{{- with .Values.topologyUpdater.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- if and .Values.gc.enable .Values.gc.serviceAccount.create }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "node-feature-discovery.gc.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
{{- with .Values.gc.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}
{{- if and .Values.worker.enable .Values.worker.serviceAccount.create }}
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: {{ include "node-feature-discovery.worker.serviceAccountName" . }}
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
{{- with .Values.worker.serviceAccount.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
{{- end }}

@@ -0,0 +1,278 @@
{{- if and .Values.topologyUpdater.enable .Values.topologyUpdater.createCRDs -}}
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
api-approved.kubernetes.io: https://github.com/kubernetes/enhancements/pull/1870
controller-gen.kubebuilder.io/version: v0.11.2
creationTimestamp: null
name: noderesourcetopologies.topology.node.k8s.io
spec:
group: topology.node.k8s.io
names:
kind: NodeResourceTopology
listKind: NodeResourceTopologyList
plural: noderesourcetopologies
shortNames:
- node-res-topo
singular: noderesourcetopology
scope: Cluster
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: NodeResourceTopology describes node resources and their topology.
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
topologyPolicies:
items:
type: string
type: array
zones:
description: ZoneList contains an array of Zone objects.
items:
description: Zone represents a resource topology zone, e.g. socket,
node, die or core.
properties:
attributes:
description: AttributeList contains an array of AttributeInfo objects.
items:
description: AttributeInfo contains one attribute of a Zone.
properties:
name:
type: string
value:
type: string
required:
- name
- value
type: object
type: array
costs:
description: CostList contains an array of CostInfo objects.
items:
description: CostInfo describes the cost (or distance) between
two Zones.
properties:
name:
type: string
value:
format: int64
type: integer
required:
- name
- value
type: object
type: array
name:
type: string
parent:
type: string
resources:
description: ResourceInfoList contains an array of ResourceInfo
objects.
items:
description: ResourceInfo contains information about one resource
type.
properties:
allocatable:
anyOf:
- type: integer
- type: string
description: Allocatable quantity of the resource, corresponding
to allocatable in node status, i.e. total amount of this
resource available to be used by pods.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
available:
anyOf:
- type: integer
- type: string
description: Available is the amount of this resource currently
available for new (to be scheduled) pods, i.e. Allocatable
minus the resources reserved by currently running pods.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
capacity:
anyOf:
- type: integer
- type: string
description: Capacity of the resource, corresponding to capacity
in node status, i.e. total amount of this resource that
the node has.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
name:
description: Name of the resource.
type: string
required:
- allocatable
- available
- capacity
- name
type: object
type: array
type:
type: string
required:
- name
- type
type: object
type: array
required:
- topologyPolicies
- zones
type: object
served: true
storage: false
- name: v1alpha2
schema:
openAPIV3Schema:
description: NodeResourceTopology describes node resources and their topology.
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
attributes:
description: AttributeList contains an array of AttributeInfo objects.
items:
description: AttributeInfo contains one attribute of a Zone.
properties:
name:
type: string
value:
type: string
required:
- name
- value
type: object
type: array
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
topologyPolicies:
description: 'DEPRECATED (to be removed in v1beta1): use top level attributes
if needed'
items:
type: string
type: array
zones:
description: ZoneList contains an array of Zone objects.
items:
description: Zone represents a resource topology zone, e.g. socket,
node, die or core.
properties:
attributes:
description: AttributeList contains an array of AttributeInfo objects.
items:
description: AttributeInfo contains one attribute of a Zone.
properties:
name:
type: string
value:
type: string
required:
- name
- value
type: object
type: array
costs:
description: CostList contains an array of CostInfo objects.
items:
description: CostInfo describes the cost (or distance) between
two Zones.
properties:
name:
type: string
value:
format: int64
type: integer
required:
- name
- value
type: object
type: array
name:
type: string
parent:
type: string
resources:
description: ResourceInfoList contains an array of ResourceInfo
objects.
items:
description: ResourceInfo contains information about one resource
type.
properties:
allocatable:
anyOf:
- type: integer
- type: string
description: Allocatable quantity of the resource, corresponding
to allocatable in node status, i.e. total amount of this
resource available to be used by pods.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
available:
anyOf:
- type: integer
- type: string
description: Available is the amount of this resource currently
available for new (to be scheduled) pods, i.e. Allocatable
minus the resources reserved by currently running pods.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
capacity:
anyOf:
- type: integer
- type: string
description: Capacity of the resource, corresponding to capacity
in node status, i.e. total amount of this resource that
the node has.
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
name:
description: Name of the resource.
type: string
required:
- allocatable
- available
- capacity
- name
type: object
type: array
type:
type: string
required:
- name
- type
type: object
type: array
required:
- zones
type: object
served: true
storage: true
status:
acceptedNames:
kind: ""
plural: ""
conditions: []
storedVersions: []
{{- end }}
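{{/*
Illustrative only (not part of the upstream template): a minimal
NodeResourceTopology object that would satisfy the v1alpha2 schema above.
The node name, zone name and quantities are hypothetical. The resource is
cluster-scoped, so metadata carries no namespace, and in normal operation
nfd-topology-updater creates these objects rather than a user.

  apiVersion: topology.node.k8s.io/v1alpha2
  kind: NodeResourceTopology
  metadata:
    name: worker-1
  zones:
    - name: node-0
      type: Node
      resources:
        - name: cpu
          capacity: "16"
          allocatable: "14"
          available: "10"

Only zones is required at the top level in v1alpha2; each zone needs name and
type, and each resource needs name, capacity, allocatable and available.
topologyPolicies is required in v1alpha1 but deprecated in v1alpha2.
*/}}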

@@ -0,0 +1,188 @@
{{- if .Values.topologyUpdater.enable -}}
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-topology-updater
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
role: topology-updater
{{- with .Values.topologyUpdater.daemonsetAnnotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
revisionHistoryLimit: {{ .Values.topologyUpdater.revisionHistoryLimit }}
selector:
matchLabels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 6 }}
role: topology-updater
template:
metadata:
labels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 8 }}
role: topology-updater
annotations:
checksum/config: {{ include (print $.Template.BasePath "/nfd-topologyupdater-conf.yaml") . | sha256sum }}
{{- with .Values.topologyUpdater.annotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
serviceAccountName: {{ include "node-feature-discovery.topologyUpdater.serviceAccountName" . }}
dnsPolicy: ClusterFirstWithHostNet
{{- with .Values.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
securityContext:
{{- toYaml .Values.topologyUpdater.podSecurityContext | nindent 8 }}
hostNetwork: {{ .Values.topologyUpdater.hostNetwork }}
containers:
- name: topology-updater
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: "{{ .Values.image.pullPolicy }}"
livenessProbe:
grpc:
port: {{ .Values.topologyUpdater.healthPort | default "8082" }}
{{- with .Values.topologyUpdater.livenessProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.livenessProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.livenessProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.livenessProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
readinessProbe:
grpc:
port: {{ .Values.topologyUpdater.healthPort | default "8082" }}
{{- with .Values.topologyUpdater.readinessProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.readinessProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.readinessProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.readinessProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
{{- with .Values.topologyUpdater.readinessProbe.successThreshold }}
successThreshold: {{ . }}
{{- end }}
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: NODE_ADDRESS
valueFrom:
fieldRef:
fieldPath: status.hostIP
{{- with .Values.topologyUpdater.extraEnvs }}
{{- toYaml . | nindent 8 }}
{{- end}}
command:
- "nfd-topology-updater"
args:
- "-podresources-socket=/host-var/lib/kubelet-podresources/kubelet.sock"
{{- if .Values.topologyUpdater.updateInterval | empty | not }}
- "-sleep-interval={{ .Values.topologyUpdater.updateInterval }}"
{{- else }}
- "-sleep-interval=3s"
{{- end }}
{{- if .Values.topologyUpdater.watchNamespace | empty | not }}
- "-watch-namespace={{ .Values.topologyUpdater.watchNamespace }}"
{{- else }}
- "-watch-namespace=*"
{{- end }}
{{- if not .Values.topologyUpdater.podSetFingerprint }}
- "-pods-fingerprint=false"
{{- end }}
{{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }}
- "-kubelet-config-uri=file:///host-var/kubelet-config"
{{- end }}
{{- if .Values.topologyUpdater.kubeletStateDir | empty }}
# Disable kubelet state tracking by giving an empty path
- "-kubelet-state-dir="
{{- end }}
- "-metrics={{ .Values.topologyUpdater.metricsPort | default "8081"}}"
- "-grpc-health={{ .Values.topologyUpdater.healthPort | default "8082" }}"
{{- with .Values.topologyUpdater.extraArgs }}
{{- toYaml . | nindent 10 }}
{{- end }}
ports:
- containerPort: {{ .Values.topologyUpdater.metricsPort | default "8081"}}
name: metrics
- containerPort: {{ .Values.topologyUpdater.healthPort | default "8082" }}
name: health
volumeMounts:
{{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }}
- name: kubelet-config
mountPath: /host-var/kubelet-config
{{- end }}
- name: kubelet-podresources-sock
mountPath: /host-var/lib/kubelet-podresources/kubelet.sock
- name: host-sys
mountPath: /host-sys
{{- if .Values.topologyUpdater.kubeletStateDir | empty | not }}
- name: kubelet-state-files
mountPath: /host-var/lib/kubelet
readOnly: true
{{- end }}
- name: nfd-topology-updater-conf
mountPath: "/etc/kubernetes/node-feature-discovery"
readOnly: true
resources:
{{- toYaml .Values.topologyUpdater.resources | nindent 12 }}
securityContext:
{{- toYaml .Values.topologyUpdater.securityContext | nindent 12 }}
volumes:
- name: host-sys
hostPath:
path: "/sys"
{{- if .Values.topologyUpdater.kubeletConfigPath | empty | not }}
- name: kubelet-config
hostPath:
path: {{ .Values.topologyUpdater.kubeletConfigPath }}
{{- end }}
- name: kubelet-podresources-sock
hostPath:
{{- if .Values.topologyUpdater.kubeletPodResourcesSockPath | empty | not }}
path: {{ .Values.topologyUpdater.kubeletPodResourcesSockPath }}
{{- else }}
path: /var/lib/kubelet/pod-resources/kubelet.sock
{{- end }}
{{- if .Values.topologyUpdater.kubeletStateDir | empty | not }}
- name: kubelet-state-files
hostPath:
path: {{ .Values.topologyUpdater.kubeletStateDir }}
{{- end }}
- name: nfd-topology-updater-conf
configMap:
name: {{ include "node-feature-discovery.fullname" . }}-topology-updater-conf
items:
- key: nfd-topology-updater.conf
path: nfd-topology-updater.conf
{{- with .Values.topologyUpdater.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.topologyUpdater.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.topologyUpdater.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- end }}
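
A minimal values override that would render the topology-updater DaemonSet above, offered as a sketch rather than a recommended configuration. Every key appears in this chart's values.yaml; the specific values (the `30s` interval in particular) are illustrative assumptions.

```yaml
# Hypothetical override file, e.g. passed to Helm with -f.
topologyUpdater:
  enable: true                       # the whole template is skipped when this is false
  updateInterval: 30s                # rendered as -sleep-interval=30s; an empty value falls back to 3s
  watchNamespace: "*"                # rendered as -watch-namespace=*
  kubeletStateDir: /var/lib/kubelet  # an empty value renders -kubelet-state-dir= and drops the state mount
  podSetFingerprint: true            # false adds -pods-fingerprint=false
  metricsPort: 8081                  # rendered as -metrics=8081
  healthPort: 8082                   # rendered as -grpc-health=8082 and used by both gRPC probes
```

Leaving `kubeletConfigPath` unset, as here, means the `kubelet-config` volume, its mount, and the `-kubelet-config-uri` argument are all omitted.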


@@ -0,0 +1,195 @@
{{- if .Values.worker.enable }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: {{ include "node-feature-discovery.fullname" . }}-worker
namespace: {{ include "node-feature-discovery.namespace" . }}
labels:
{{- include "node-feature-discovery.labels" . | nindent 4 }}
role: worker
{{- with .Values.worker.daemonsetAnnotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
revisionHistoryLimit: {{ .Values.worker.revisionHistoryLimit }}
selector:
matchLabels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 6 }}
role: worker
template:
metadata:
labels:
{{- include "node-feature-discovery.selectorLabels" . | nindent 8 }}
role: worker
annotations:
checksum/config: {{ include (print $.Template.BasePath "/nfd-worker-conf.yaml") . | sha256sum }}
{{- with .Values.worker.annotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
spec:
dnsPolicy: ClusterFirstWithHostNet
{{- with .Values.priorityClassName }}
priorityClassName: {{ . }}
{{- end }}
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "node-feature-discovery.worker.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.worker.podSecurityContext | nindent 8 }}
hostNetwork: {{ .Values.worker.hostNetwork }}
containers:
- name: worker
securityContext:
{{- toYaml .Values.worker.securityContext | nindent 12 }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
livenessProbe:
grpc:
port: {{ .Values.worker.healthPort | default "8082" }}
{{- with .Values.worker.livenessProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.worker.livenessProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.worker.livenessProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.worker.livenessProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
readinessProbe:
grpc:
port: {{ .Values.worker.healthPort | default "8082" }}
{{- with .Values.worker.readinessProbe.initialDelaySeconds }}
initialDelaySeconds: {{ . }}
{{- end }}
{{- with .Values.worker.readinessProbe.failureThreshold }}
failureThreshold: {{ . }}
{{- end }}
{{- with .Values.worker.readinessProbe.periodSeconds }}
periodSeconds: {{ . }}
{{- end }}
{{- with .Values.worker.readinessProbe.timeoutSeconds }}
timeoutSeconds: {{ . }}
{{- end }}
{{- with .Values.worker.readinessProbe.successThreshold }}
successThreshold: {{ . }}
{{- end }}
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_UID
valueFrom:
fieldRef:
fieldPath: metadata.uid
{{- with .Values.worker.extraEnvs }}
{{- toYaml . | nindent 8 }}
{{- end}}
resources:
{{- toYaml .Values.worker.resources | nindent 12 }}
command:
- "nfd-worker"
args:
# Go over featureGate and add the feature-gate flag
{{- range $key, $value := .Values.featureGates }}
- "-feature-gates={{ $key }}={{ $value }}"
{{- end }}
- "-metrics={{ .Values.worker.metricsPort | default "8081"}}"
- "-grpc-health={{ .Values.worker.healthPort | default "8082" }}"
{{- with .Values.worker.extraArgs }}
{{- toYaml . | nindent 8 }}
{{- end }}
ports:
- containerPort: {{ .Values.worker.metricsPort | default "8081"}}
name: metrics
- containerPort: {{ .Values.worker.healthPort | default "8082" }}
name: health
volumeMounts:
- name: host-boot
mountPath: "/host-boot"
readOnly: true
- name: host-os-release
mountPath: "/host-etc/os-release"
readOnly: true
- name: host-sys
mountPath: "/host-sys"
readOnly: true
- name: host-usr-lib
mountPath: "/host-usr/lib"
readOnly: true
- name: host-lib
mountPath: "/host-lib"
readOnly: true
- name: host-proc-swaps
mountPath: "/host-proc/swaps"
readOnly: true
{{- if .Values.worker.mountUsrSrc }}
- name: host-usr-src
mountPath: "/host-usr/src"
readOnly: true
{{- end }}
- name: features-d
mountPath: "/etc/kubernetes/node-feature-discovery/features.d/"
readOnly: true
- name: nfd-worker-conf
mountPath: "/etc/kubernetes/node-feature-discovery"
readOnly: true
volumes:
- name: host-boot
hostPath:
path: "/boot"
- name: host-os-release
hostPath:
path: "/etc/os-release"
- name: host-sys
hostPath:
path: "/sys"
- name: host-usr-lib
hostPath:
path: "/usr/lib"
- name: host-lib
hostPath:
path: "/lib"
- name: host-proc-swaps
hostPath:
path: "/proc/swaps"
{{- if .Values.worker.mountUsrSrc }}
- name: host-usr-src
hostPath:
path: "/usr/src"
{{- end }}
- name: features-d
hostPath:
path: "/etc/kubernetes/node-feature-discovery/features.d/"
- name: nfd-worker-conf
configMap:
name: {{ include "node-feature-discovery.fullname" . }}-worker-conf
items:
- key: nfd-worker.conf
path: nfd-worker.conf
{{- with .Values.worker.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.worker.affinity }}
affinity:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.worker.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.worker.priorityClassName }}
priorityClassName: {{ . | quote }}
{{- end }}
{{- end }}
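
To make the argument handling in the worker template concrete, here is a small sketch of input values and the container args they would produce. The key names come from the chart; the values themselves are assumptions chosen for illustration.

```yaml
# Illustrative values only.
featureGates:
  NodeFeatureGroupAPI: true
worker:
  enable: true
  metricsPort: 8081
  healthPort: 8082
  mountUsrSrc: false   # true adds the /usr/src hostPath volume and its read-only mount

# Container args the template would then emit, in order:
#   - "-feature-gates=NodeFeatureGroupAPI=true"
#   - "-metrics=8081"
#   - "-grpc-health=8082"
```

Each entry in `featureGates` becomes its own `-feature-gates=<key>=<value>` argument, and anything listed in `worker.extraArgs` is appended after the two port flags.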


@@ -0,0 +1,599 @@
image:
repository: registry.k8s.io/nfd/node-feature-discovery
# This should be set to 'IfNotPresent' for released versions
pullPolicy: IfNotPresent
# tag, if defined will use the given image tag, else Chart.AppVersion will be used
# tag
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
namespaceOverride: ""
featureGates:
NodeFeatureGroupAPI: false
priorityClassName: ""
master:
enable: true
extraArgs: []
extraEnvs: []
hostNetwork: false
config: ### <NFD-MASTER-CONF-START-DO-NOT-REMOVE>
# noPublish: false
# autoDefaultNs: true
# extraLabelNs: ["added.ns.io","added.kubernetes.io"]
# denyLabelNs: ["denied.ns.io","denied.kubernetes.io"]
# enableTaints: false
# labelWhiteList: "foo"
# resyncPeriod: "2h"
# restrictions:
# disableLabels: true
# disableTaints: true
# disableExtendedResources: true
# disableAnnotations: true
# allowOverwrite: false
# denyNodeFeatureLabels: true
# nodeFeatureNamespaceSelector:
# matchLabels:
# kubernetes.io/metadata.name: "node-feature-discovery"
# matchExpressions:
# - key: "kubernetes.io/metadata.name"
# operator: "In"
# values:
# - "node-feature-discovery"
# klog:
# addDirHeader: false
# alsologtostderr: false
# logBacktraceAt:
# logtostderr: true
# skipHeaders: false
# stderrthreshold: 2
# v: 0
# vmodule:
## NOTE: the following options are not dynamically run-time configurable
## and require a nfd-master restart to take effect after being changed
# logDir:
# logFile:
# logFileMaxSize: 1800
# skipLogHeaders: false
# leaderElection:
# leaseDuration: 15s
# # this value has to be lower than leaseDuration and greater than retryPeriod*1.2
# renewDeadline: 10s
# # this value has to be greater than 0
# retryPeriod: 2s
# nfdApiParallelism: 10
### <NFD-MASTER-CONF-END-DO-NOT-REMOVE>
metricsPort: 8081
healthPort: 8082
instance:
featureApi:
resyncPeriod:
denyLabelNs: []
extraLabelNs: []
enableTaints: false
featureRulesController: null
nfdApiParallelism: null
deploymentAnnotations: {}
replicaCount: 1
podSecurityContext: {}
# fsGroup: 2000
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: [ "ALL" ]
readOnlyRootFilesystem: true
runAsNonRoot: true
# runAsUser: 1000
serviceAccount:
# Specifies whether a service account should be created
create: true
# Annotations to add to the service account
annotations: {}
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name:
# specify how many old ReplicaSets for the Deployment to retain.
revisionHistoryLimit:
rbac:
create: true
resources:
limits:
memory: 4Gi
requests:
cpu: 100m
# You may want to use the same value for `requests.memory` and `limits.memory`. The “requests” value affects scheduling to accommodate pods on nodes.
# If there is a large difference between “requests” and “limits” and nodes experience memory pressure, the kernel may invoke
# the OOM Killer, even if the memory does not exceed the “limits” threshold. This can cause unexpected pod evictions. Memory
# cannot be compressed and once allocated to a pod, it can only be reclaimed by killing the pod.
# Natan Yellin 22/09/2022 https://home.robusta.dev/blog/kubernetes-memory-limit
memory: 128Mi
nodeSelector: {}
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Equal"
value: ""
effect: "NoSchedule"
- key: "node-role.kubernetes.io/control-plane"
operator: "Equal"
value: ""
effect: "NoSchedule"
annotations: {}
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: "node-role.kubernetes.io/master"
operator: In
values: [""]
- weight: 1
preference:
matchExpressions:
- key: "node-role.kubernetes.io/control-plane"
operator: In
values: [""]
startupProbe:
grpc:
port: 8082
failureThreshold: 30
# periodSeconds: 10
livenessProbe:
grpc:
port: 8082
# failureThreshold: 3
# initialDelaySeconds: 0
# periodSeconds: 10
# timeoutSeconds: 1
readinessProbe:
grpc:
port: 8082
failureThreshold: 10
# initialDelaySeconds: 0
# periodSeconds: 10
# timeoutSeconds: 1
# successThreshold: 1
worker:
enable: true
extraArgs: []
extraEnvs: []
hostNetwork: false
config: ### <NFD-WORKER-CONF-START-DO-NOT-REMOVE>
#core:
# labelWhiteList:
# noPublish: false
# noOwnerRefs: false
# sleepInterval: 60s
# featureSources: [all]
# labelSources: [all]
# klog:
# addDirHeader: false
# alsologtostderr: false
# logBacktraceAt:
# logtostderr: true
# skipHeaders: false
# stderrthreshold: 2
# v: 0
# vmodule:
## NOTE: the following options are not dynamically run-time configurable
## and require a nfd-worker restart to take effect after being changed
# logDir:
# logFile:
# logFileMaxSize: 1800
# skipLogHeaders: false
#sources:
# cpu:
# cpuid:
## NOTE: whitelist has priority over blacklist
# attributeBlacklist:
# - "AVX10"
# - "BMI1"
# - "BMI2"
# - "CLMUL"
# - "CMOV"
# - "CX16"
# - "ERMS"
# - "F16C"
# - "HTT"
# - "LZCNT"
# - "MMX"
# - "MMXEXT"
# - "NX"
# - "POPCNT"
# - "RDRAND"
# - "RDSEED"
# - "RDTSCP"
# - "SGX"
# - "SSE"
# - "SSE2"
# - "SSE3"
# - "SSE4"
# - "SSE42"
# - "SSSE3"
# - "TDX_GUEST"
# attributeWhitelist:
# kernel:
# kconfigFile: "/path/to/kconfig"
# configOpts:
# - "NO_HZ"
# - "X86"
# - "DMI"
# pci:
# deviceClassWhitelist:
# - "0200"
# - "03"
# - "12"
# deviceLabelFields:
# - "class"
# - "vendor"
# - "device"
# - "subsystem_vendor"
# - "subsystem_device"
# usb:
# deviceClassWhitelist:
# - "0e"
# - "ef"
# - "fe"
# - "ff"
# deviceLabelFields:
# - "class"
# - "vendor"
# - "device"
# custom:
# # The following feature demonstrates the capabilities of the matchFeatures
# - name: "my custom rule"
# labels:
# "vendor.io/my-ng-feature": "true"
# # matchFeatures implements a logical AND over all matcher terms in the
# # list (i.e. all of the terms, or per-feature matchers, must match)
# matchFeatures:
# - feature: cpu.cpuid
# matchExpressions:
# AVX512F: {op: Exists}
# - feature: cpu.cstate
# matchExpressions:
# enabled: {op: IsTrue}
# - feature: cpu.pstate
# matchExpressions:
# no_turbo: {op: IsFalse}
# scaling_governor: {op: In, value: ["performance"]}
# - feature: cpu.rdt
# matchExpressions:
# RDTL3CA: {op: Exists}
# - feature: cpu.sst
# matchExpressions:
# bf.enabled: {op: IsTrue}
# - feature: cpu.topology
# matchExpressions:
# hardware_multithreading: {op: IsFalse}
#
# - feature: kernel.config
# matchExpressions:
# X86: {op: Exists}
# LSM: {op: InRegexp, value: ["apparmor"]}
# - feature: kernel.loadedmodule
# matchExpressions:
# e1000e: {op: Exists}
# - feature: kernel.selinux
# matchExpressions:
# enabled: {op: IsFalse}
# - feature: kernel.version
# matchExpressions:
# major: {op: In, value: ["5"]}
# minor: {op: Gt, value: ["10"]}
#
# - feature: storage.block
# matchExpressions:
# rotational: {op: In, value: ["0"]}
# dax: {op: In, value: ["0"]}
#
# - feature: network.device
# matchExpressions:
# operstate: {op: In, value: ["up"]}
# speed: {op: Gt, value: ["100"]}
#
# - feature: memory.numa
# matchExpressions:
# node_count: {op: Gt, value: ["2"]}
# - feature: memory.nv
# matchExpressions:
# devtype: {op: In, value: ["nd_dax"]}
# mode: {op: In, value: ["memory"]}
#
# - feature: system.osrelease
# matchExpressions:
# ID: {op: In, value: ["fedora", "centos"]}
# - feature: system.name
# matchExpressions:
# nodename: {op: InRegexp, value: ["^worker-X"]}
#
# - feature: local.label
# matchExpressions:
# custom-feature-knob: {op: Gt, value: ["100"]}
#
# # The following feature demonstrates the capabilities of the matchAny
# - name: "my matchAny rule"
# labels:
# "vendor.io/my-ng-feature-2": "my-value"
# # matchAny implements a logical OR over all elements (sub-matchers) in
# # the list (i.e. at least one feature matcher must match)
# matchAny:
# - matchFeatures:
# - feature: kernel.loadedmodule
# matchExpressions:
# driver-module-X: {op: Exists}
# - feature: pci.device
# matchExpressions:
# vendor: {op: In, value: ["8086"]}
# class: {op: In, value: ["0200"]}
# - matchFeatures:
# - feature: kernel.loadedmodule
# matchExpressions:
# driver-module-Y: {op: Exists}
# - feature: usb.device
# matchExpressions:
# vendor: {op: In, value: ["8086"]}
# class: {op: In, value: ["02"]}
#
# - name: "avx wildcard rule"
# labels:
# "my-avx-feature": "true"
# matchFeatures:
# - feature: cpu.cpuid
# matchName: {op: InRegexp, value: ["^AVX512"]}
#
# # The following features demonstrate label templating capabilities
# - name: "my template rule"
# labelsTemplate: |
# {{ range .system.osrelease }}vendor.io/my-system-feature.{{ .Name }}={{ .Value }}
# {{ end }}
# matchFeatures:
# - feature: system.osrelease
# matchExpressions:
# ID: {op: InRegexp, value: ["^open.*"]}
# VERSION_ID.major: {op: In, value: ["13", "15"]}
#
# - name: "my template rule 2"
# labelsTemplate: |
# {{ range .pci.device }}vendor.io/my-pci-device.{{ .class }}-{{ .device }}=with-cpuid
# {{ end }}
# matchFeatures:
# - feature: pci.device
# matchExpressions:
# class: {op: InRegexp, value: ["^06"]}
# vendor: ["8086"]
# - feature: cpu.cpuid
# matchExpressions:
# AVX: {op: Exists}
#
# # The following examples demonstrate vars field and back-referencing
# # previous labels and vars
# - name: "my dummy kernel rule"
# labels:
# "vendor.io/my.kernel.feature": "true"
# matchFeatures:
# - feature: kernel.version
# matchExpressions:
# major: {op: Gt, value: ["2"]}
#
# - name: "my dummy rule with no labels"
# vars:
# "my.dummy.var": "1"
# matchFeatures:
# - feature: cpu.cpuid
# matchExpressions: {}
#
# - name: "my rule using backrefs"
# labels:
# "vendor.io/my.backref.feature": "true"
# matchFeatures:
# - feature: rule.matched
# matchExpressions:
# vendor.io/my.kernel.feature: {op: IsTrue}
# my.dummy.var: {op: Gt, value: ["0"]}
#
# - name: "kconfig template rule"
# labelsTemplate: |
# {{ range .kernel.config }}kconfig-{{ .Name }}={{ .Value }}
# {{ end }}
# matchFeatures:
# - feature: kernel.config
# matchName: {op: In, value: ["SWAP", "X86", "ARM"]}
### <NFD-WORKER-CONF-END-DO-NOT-REMOVE>
metricsPort: 8081
healthPort: 8082
daemonsetAnnotations: {}
podSecurityContext: {}
# fsGroup: 2000
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: [ "ALL" ]
readOnlyRootFilesystem: true
runAsNonRoot: true
# runAsUser: 1000
livenessProbe:
grpc:
port: 8082
initialDelaySeconds: 10
# failureThreshold: 3
# periodSeconds: 10
# timeoutSeconds: 1
readinessProbe:
grpc:
port: 8082
initialDelaySeconds: 5
failureThreshold: 10
# periodSeconds: 10
# timeoutSeconds: 1
# successThreshold: 1
serviceAccount:
# Specifies whether a service account should be created.
# We create this by default to make it easier for downstream users to apply PodSecurityPolicies.
create: true
# Annotations to add to the service account
annotations: {}
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name:
# specify how many old ControllerRevisions for the DaemonSet to retain.
revisionHistoryLimit:
rbac:
create: true
# Allow users to mount the hostPath /usr/src, useful for RHCOS on s390x
# Does not work on systems without /usr/src AND a read-only /usr, such as Talos
mountUsrSrc: false
resources:
limits:
memory: 512Mi
requests:
cpu: 5m
memory: 64Mi
nodeSelector: {}
tolerations: []
annotations: {}
affinity: {}
priorityClassName: ""
topologyUpdater:
config: ### <NFD-TOPOLOGY-UPDATER-CONF-START-DO-NOT-REMOVE>
## key = node name, value = list of resources to be excluded.
## use * to exclude from all nodes.
## an example of how the exclude list should look
#excludeList:
# node1: [cpu]
# node2: [memory, example/deviceA]
# *: [hugepages-2Mi]
### <NFD-TOPOLOGY-UPDATER-CONF-END-DO-NOT-REMOVE>
enable: false
createCRDs: false
extraArgs: []
extraEnvs: []
hostNetwork: false
serviceAccount:
create: true
annotations: {}
name:
# specify how many old ControllerRevisions for the DaemonSet to retain.
revisionHistoryLimit:
rbac:
create: true
metricsPort: 8081
healthPort: 8082
kubeletConfigPath:
kubeletPodResourcesSockPath:
updateInterval: 60s
watchNamespace: "*"
kubeletStateDir: /var/lib/kubelet
podSecurityContext: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: [ "ALL" ]
readOnlyRootFilesystem: true
runAsUser: 0
livenessProbe:
grpc:
port: 8082
initialDelaySeconds: 10
# failureThreshold: 3
# periodSeconds: 10
# timeoutSeconds: 1
readinessProbe:
grpc:
port: 8082
initialDelaySeconds: 5
failureThreshold: 10
# periodSeconds: 10
# timeoutSeconds: 1
# successThreshold: 1
resources:
limits:
memory: 60Mi
requests:
cpu: 50m
memory: 40Mi
nodeSelector: {}
tolerations: []
annotations: {}
daemonsetAnnotations: {}
affinity: {}
podSetFingerprint: true
gc:
enable: true
extraArgs: []
extraEnvs: []
hostNetwork: false
replicaCount: 1
serviceAccount:
create: true
annotations: {}
name:
rbac:
create: true
interval: 1h
podSecurityContext: {}
resources:
limits:
memory: 1Gi
requests:
cpu: 10m
memory: 128Mi
metricsPort: 8081
nodeSelector: {}
tolerations: []
annotations: {}
deploymentAnnotations: {}
affinity: {}
# specify how many old ReplicaSets for the Deployment to retain.
revisionHistoryLimit:
prometheus:
enable: false
scrapeInterval: 10s
labels: {}
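
The commented `custom` rules in the worker config above can be enabled by supplying them under `worker.config`. The sketch below reuses the "avx wildcard rule" from those comments unchanged; it assumes the commented template keeps `custom` nested under `sources`, which the flattened listing suggests but does not show directly.

```yaml
# Hypothetical override file; the rule body is copied from the commented example.
worker:
  config:
    sources:
      custom:
        - name: "avx wildcard rule"
          labels:
            "my-avx-feature": "true"
          # matchFeatures is a logical AND over its terms; here there is a single term
          matchFeatures:
            - feature: cpu.cpuid
              matchName: {op: InRegexp, value: ["^AVX512"]}
```

Because the worker DaemonSet carries a `checksum/config` annotation computed from the rendered config template, a change like this should roll the worker pods on the next upgrade.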


@@ -0,0 +1,809 @@
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.17.2
name: nvidiadrivers.nvidia.com
spec:
group: nvidia.com
names:
kind: NVIDIADriver
listKind: NVIDIADriverList
plural: nvidiadrivers
shortNames:
- nvd
- nvdriver
- nvdrivers
singular: nvidiadriver
scope: Cluster
versions:
- additionalPrinterColumns:
- jsonPath: .status.state
name: Status
type: string
- jsonPath: .metadata.creationTimestamp
name: Age
type: string
name: v1alpha1
schema:
openAPIV3Schema:
description: NVIDIADriver is the Schema for the nvidiadrivers API
properties:
apiVersion:
description: |-
APIVersion defines the versioned schema of this representation of an object.
Servers should convert recognized schemas to the latest internal value, and
may reject unrecognized values.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
type: string
kind:
description: |-
Kind is a string value representing the REST resource this object represents.
Servers may infer this from the endpoint the client submits requests to.
Cannot be updated.
In CamelCase.
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
type: string
metadata:
type: object
spec:
description: NVIDIADriverSpec defines the desired state of NVIDIADriver
properties:
annotations:
additionalProperties:
type: string
description: |-
Optional: Annotations is an unstructured key value map stored with a resource that may be
set by external tools to store and retrieve arbitrary metadata. They are not
queryable and should be preserved when modifying objects.
type: object
args:
description: 'Optional: List of arguments'
items:
type: string
type: array
certConfig:
description: 'Optional: Custom certificates configuration for NVIDIA
Driver container'
properties:
name:
type: string
type: object
driverType:
default: gpu
description: DriverType defines NVIDIA driver type
enum:
- gpu
- vgpu
- vgpu-host-manager
type: string
x-kubernetes-validations:
- message: driverType is an immutable field. Please create a new NvidiaDriver
resource instead when you want to change this setting.
rule: self == oldSelf
env:
description: 'Optional: List of environment variables'
items:
description: EnvVar represents an environment variable present in
a Container.
properties:
name:
description: Name of the environment variable.
type: string
value:
description: Value of the environment variable.
type: string
required:
- name
type: object
type: array
gdrcopy:
description: GDRCopy defines the spec for GDRCopy driver
properties:
args:
description: 'Optional: List of arguments'
items:
type: string
type: array
enabled:
description: Enabled indicates if GDRCopy is enabled through GPU
operator
type: boolean
env:
description: 'Optional: List of environment variables'
items:
description: EnvVar represents an environment variable present
in a Container.
properties:
name:
description: Name of the environment variable.
type: string
value:
description: Value of the environment variable.
type: string
required:
- name
type: object
type: array
image:
description: GDRCopy driver image name
pattern: '[a-zA-Z0-9\-]+'
type: string
imagePullPolicy:
description: Image pull policy
type: string
imagePullSecrets:
description: Image pull secrets
items:
type: string
type: array
repository:
description: GDRCopy driver image repository
type: string
version:
description: GDRCopy driver image tag
type: string
type: object
gds:
description: GPUDirectStorage defines the spec for GDS driver
properties:
args:
description: 'Optional: List of arguments'
items:
type: string
type: array
enabled:
description: Enabled indicates if GPUDirect Storage is enabled
through GPU operator
type: boolean
env:
description: 'Optional: List of environment variables'
items:
description: EnvVar represents an environment variable present
in a Container.
properties:
name:
description: Name of the environment variable.
type: string
value:
description: Value of the environment variable.
type: string
required:
- name
type: object
type: array
image:
description: NVIDIA GPUDirect Storage Driver image name
pattern: '[a-zA-Z0-9\-]+'
type: string
imagePullPolicy:
description: Image pull policy
type: string
imagePullSecrets:
description: Image pull secrets
items:
type: string
type: array
repository:
description: NVIDIA GPUDirect Storage Driver image repository
type: string
version:
description: NVIDIA GPUDirect Storage Driver image tag
type: string
type: object
image:
default: nvcr.io/nvidia/driver
description: NVIDIA Driver container image name
type: string
imagePullPolicy:
description: Image pull policy
type: string
imagePullSecrets:
description: Image pull secrets
items:
type: string
type: array
kernelModuleConfig:
description: 'Optional: Kernel module configuration parameters for
the NVIDIA Driver'
properties:
name:
type: string
type: object
kernelModuleType:
default: auto
description: |-
KernelModuleType represents the type of driver kernel modules to be used when installing the GPU driver.
Accepted values are auto, proprietary and open. NOTE: If auto is chosen, it means that the recommended kernel module
type is chosen based on the GPU devices on the host and the driver branch used
enum:
- auto
- open
- proprietary
type: string
labels:
additionalProperties:
type: string
description: |-
Optional: Map of string keys and values that can be used to organize and categorize
(scope and select) objects. May match selectors of replication controllers
and services.
type: object
licensingConfig:
description: 'Optional: Licensing configuration for NVIDIA vGPU licensing'
properties:
name:
type: string
nlsEnabled:
description: NLSEnabled indicates if NVIDIA Licensing System is
used for licensing.
type: boolean
type: object
livenessProbe:
description: NVIDIA Driver container liveness probe settings
properties:
failureThreshold:
description: |-
Minimum consecutive failures for the probe to be considered failed after having succeeded.
Defaults to 3. Minimum value is 1.
format: int32
minimum: 1
type: integer
initialDelaySeconds:
description: |-
Number of seconds after the container has started before liveness probes are initiated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
format: int32
type: integer
periodSeconds:
description: |-
How often (in seconds) to perform the probe.
Defaults to 10 seconds. Minimum value is 1.
format: int32
minimum: 1
type: integer
successThreshold:
description: |-
Minimum consecutive successes for the probe to be considered successful after having failed.
Defaults to 1. Must be 1 for liveness and startup. Minimum value is 1.
format: int32
minimum: 1
type: integer
timeoutSeconds:
description: |-
Number of seconds after which the probe times out.
Defaults to 1 second. Minimum value is 1.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
format: int32
minimum: 1
type: integer
type: object
manager:
description: Manager represents configuration for NVIDIA Driver Manager
initContainer
properties:
env:
description: 'Optional: List of environment variables'
items:
description: EnvVar represents an environment variable present
in a Container.
properties:
name:
description: Name of the environment variable.
type: string
value:
description: Value of the environment variable.
type: string
required:
- name
type: object
type: array
image:
description: Image represents NVIDIA Driver Manager image name
pattern: '[a-zA-Z0-9\-]+'
type: string
imagePullPolicy:
description: Image pull policy
type: string
imagePullSecrets:
description: Image pull secrets
items:
type: string
type: array
repository:
description: Repository represents Driver Manager repository path
type: string
version:
description: Version represents NVIDIA Driver Manager image tag (version)
type: string
type: object
nodeAffinity:
description: Affinity specifies node affinity rules for driver pods
properties:
preferredDuringSchedulingIgnoredDuringExecution:
description: |-
The scheduler will prefer to schedule pods to nodes that satisfy
the affinity expressions specified by this field, but it may choose
a node that violates one or more of the expressions. The node that is
most preferred is the one with the greatest sum of weights, i.e.
for each node that meets all of the scheduling requirements (resource
request, requiredDuringScheduling affinity expressions, etc.),
compute a sum by iterating through the elements of this field and adding
"weight" to the sum if the node matches the corresponding matchExpressions; the
node(s) with the highest sum are the most preferred.
items:
description: |-
An empty preferred scheduling term matches all objects with implicit weight 0
(i.e. it's a no-op). A null preferred scheduling term matches no objects (i.e. is also a no-op).
properties:
preference:
description: A node selector term, associated with the corresponding
weight.
properties:
matchExpressions:
description: A list of node selector requirements by
node's labels.
items:
description: |-
A node selector requirement is a selector that contains values, a key, and an operator
that relates the key and values.
properties:
key:
description: The label key that the selector applies
to.
type: string
operator:
description: |-
Represents a key's relationship to a set of values.
Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
type: string
values:
description: |-
An array of string values. If the operator is In or NotIn,
the values array must be non-empty. If the operator is Exists or DoesNotExist,
the values array must be empty. If the operator is Gt or Lt, the values
array must have a single element, which will be interpreted as an integer.
This array is replaced during a strategic merge patch.
items:
type: string
type: array
x-kubernetes-list-type: atomic
required:
- key
- operator
type: object
type: array
x-kubernetes-list-type: atomic
matchFields:
description: A list of node selector requirements by
node's fields.
items:
description: |-
A node selector requirement is a selector that contains values, a key, and an operator
that relates the key and values.
properties:
key:
description: The label key that the selector applies
to.
type: string
operator:
description: |-
Represents a key's relationship to a set of values.
Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
type: string
values:
description: |-
An array of string values. If the operator is In or NotIn,
the values array must be non-empty. If the operator is Exists or DoesNotExist,
the values array must be empty. If the operator is Gt or Lt, the values
array must have a single element, which will be interpreted as an integer.
This array is replaced during a strategic merge patch.
items:
type: string
type: array
x-kubernetes-list-type: atomic
required:
- key
- operator
type: object
type: array
x-kubernetes-list-type: atomic
type: object
x-kubernetes-map-type: atomic
weight:
description: Weight associated with matching the corresponding
nodeSelectorTerm, in the range 1-100.
format: int32
type: integer
required:
- preference
- weight
type: object
type: array
x-kubernetes-list-type: atomic
requiredDuringSchedulingIgnoredDuringExecution:
description: |-
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node.
If the affinity requirements specified by this field cease to be met
at some point during pod execution (e.g. due to an update), the system
may or may not try to eventually evict the pod from its node.
properties:
nodeSelectorTerms:
description: Required. A list of node selector terms. The
terms are ORed.
items:
description: |-
A null or empty node selector term matches no objects. The requirements of
them are ANDed.
The TopologySelectorTerm type implements a subset of the NodeSelectorTerm.
properties:
matchExpressions:
description: A list of node selector requirements by
node's labels.
items:
description: |-
A node selector requirement is a selector that contains values, a key, and an operator
that relates the key and values.
properties:
key:
description: The label key that the selector applies
to.
type: string
operator:
description: |-
Represents a key's relationship to a set of values.
Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
type: string
values:
description: |-
An array of string values. If the operator is In or NotIn,
the values array must be non-empty. If the operator is Exists or DoesNotExist,
the values array must be empty. If the operator is Gt or Lt, the values
array must have a single element, which will be interpreted as an integer.
This array is replaced during a strategic merge patch.
items:
type: string
type: array
x-kubernetes-list-type: atomic
required:
- key
- operator
type: object
type: array
x-kubernetes-list-type: atomic
matchFields:
description: A list of node selector requirements by
node's fields.
items:
description: |-
A node selector requirement is a selector that contains values, a key, and an operator
that relates the key and values.
properties:
key:
description: The label key that the selector applies
to.
type: string
operator:
description: |-
Represents a key's relationship to a set of values.
Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
type: string
values:
description: |-
An array of string values. If the operator is In or NotIn,
the values array must be non-empty. If the operator is Exists or DoesNotExist,
the values array must be empty. If the operator is Gt or Lt, the values
array must have a single element, which will be interpreted as an integer.
This array is replaced during a strategic merge patch.
items:
type: string
type: array
x-kubernetes-list-type: atomic
required:
- key
- operator
type: object
type: array
x-kubernetes-list-type: atomic
type: object
x-kubernetes-map-type: atomic
type: array
x-kubernetes-list-type: atomic
required:
- nodeSelectorTerms
type: object
x-kubernetes-map-type: atomic
type: object
nodeSelector:
additionalProperties:
type: string
description: NodeSelector specifies a selector for installation of
NVIDIA driver
type: object
priorityClassName:
description: 'Optional: Set priorityClassName'
type: string
rdma:
description: GPUDirectRDMA defines the spec for NVIDIA Peer Memory
driver
properties:
enabled:
description: Enabled indicates if GPUDirect RDMA is enabled through
GPU operator
type: boolean
useHostMofed:
description: UseHostMOFED indicates to use MOFED drivers directly
installed on the host to enable GPUDirect RDMA
type: boolean
type: object
readinessProbe:
description: NVIDIA Driver container readiness probe settings
properties:
failureThreshold:
description: |-
Minimum consecutive failures for the probe to be considered failed after having succeeded.
Defaults to 3. Minimum value is 1.
format: int32
minimum: 1
type: integer
initialDelaySeconds:
description: |-
Number of seconds after the container has started before liveness probes are initiated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
format: int32
type: integer
periodSeconds:
description: |-
How often (in seconds) to perform the probe.
Default to 10 seconds. Minimum value is 1.
format: int32
minimum: 1
type: integer
successThreshold:
description: |-
Minimum consecutive successes for the probe to be considered successful after having failed.
Defaults to 1. Must be 1 for liveness and startup. Minimum value is 1.
format: int32
minimum: 1
type: integer
timeoutSeconds:
description: |-
Number of seconds after which the probe times out.
Defaults to 1 second. Minimum value is 1.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
format: int32
minimum: 1
type: integer
type: object
repoConfig:
description: 'Optional: Custom repo configuration for NVIDIA Driver
container'
properties:
name:
type: string
type: object
repository:
description: NVIDIA Driver repository
type: string
resources:
description: 'Optional: Define resources requests and limits for each
pod'
properties:
limits:
additionalProperties:
anyOf:
- type: integer
- type: string
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
description: |-
Limits describes the maximum amount of compute resources allowed.
More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
type: object
requests:
additionalProperties:
anyOf:
- type: integer
- type: string
pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
x-kubernetes-int-or-string: true
description: |-
Requests describes the minimum amount of compute resources required.
If Requests is omitted for a container, it defaults to Limits if that is explicitly specified,
otherwise to an implementation-defined value. Requests cannot exceed Limits.
More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
type: object
type: object
startupProbe:
description: NVIDIA Driver container startup probe settings
properties:
failureThreshold:
description: |-
Minimum consecutive failures for the probe to be considered failed after having succeeded.
Defaults to 3. Minimum value is 1.
format: int32
minimum: 1
type: integer
initialDelaySeconds:
description: |-
Number of seconds after the container has started before liveness probes are initiated.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
format: int32
type: integer
periodSeconds:
description: |-
How often (in seconds) to perform the probe.
Default to 10 seconds. Minimum value is 1.
format: int32
minimum: 1
type: integer
successThreshold:
description: |-
Minimum consecutive successes for the probe to be considered successful after having failed.
Defaults to 1. Must be 1 for liveness and startup. Minimum value is 1.
format: int32
minimum: 1
type: integer
timeoutSeconds:
description: |-
Number of seconds after which the probe times out.
Defaults to 1 second. Minimum value is 1.
More info: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
format: int32
minimum: 1
type: integer
type: object
tolerations:
description: 'Optional: Set tolerations'
items:
description: |-
The pod this Toleration is attached to tolerates any taint that matches
the triple <key,value,effect> using the matching operator <operator>.
properties:
effect:
description: |-
Effect indicates the taint effect to match. Empty means match all taint effects.
When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute.
type: string
key:
description: |-
Key is the taint key that the toleration applies to. Empty means match all taint keys.
If the key is empty, operator must be Exists; this combination means to match all values and all keys.
type: string
operator:
description: |-
Operator represents a key's relationship to the value.
Valid operators are Exists and Equal. Defaults to Equal.
Exists is equivalent to wildcard for value, so that a pod can
tolerate all taints of a particular category.
type: string
tolerationSeconds:
description: |-
TolerationSeconds represents the period of time the toleration (which must be
of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
it is not set, which means tolerate the taint forever (do not evict). Zero and
negative values will be treated as 0 (evict immediately) by the system.
format: int64
type: integer
value:
description: |-
Value is the taint value the toleration matches to.
If the operator is Exists, the value should be empty, otherwise just a regular string.
type: string
type: object
type: array
useOpenKernelModules:
description: |-
Deprecated: This field is no longer honored by the gpu-operator. Please use KernelModuleType instead.
UseOpenKernelModules indicates if the open GPU kernel modules should be used
type: boolean
usePrecompiled:
description: UsePrecompiled indicates if deployment of NVIDIA Driver
using pre-compiled modules is enabled
type: boolean
x-kubernetes-validations:
- message: usePrecompiled is an immutable field. Please create a new
NvidiaDriver resource instead when you want to change this setting.
rule: self == oldSelf
version:
description: NVIDIA Driver version (or just branch for precompiled
drivers)
type: string
virtualTopologyConfig:
description: 'Optional: Virtual Topology Daemon configuration for
NVIDIA vGPU drivers'
properties:
name:
description: 'Optional: Config name representing virtual topology
daemon configuration file nvidia-topologyd.conf'
type: string
type: object
required:
- driverType
- image
type: object
status:
description: NVIDIADriverStatus defines the observed state of NVIDIADriver
properties:
conditions:
description: Conditions is a list of conditions representing the NVIDIADriver's
current state.
items:
description: Condition contains details for one aspect of the current
state of this API Resource.
properties:
lastTransitionTime:
description: |-
lastTransitionTime is the last time the condition transitioned from one status to another.
This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable.
format: date-time
type: string
message:
description: |-
message is a human readable message indicating details about the transition.
This may be an empty string.
maxLength: 32768
type: string
observedGeneration:
description: |-
observedGeneration represents the .metadata.generation that the condition was set based upon.
For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date
with respect to the current state of the instance.
format: int64
minimum: 0
type: integer
reason:
description: |-
reason contains a programmatic identifier indicating the reason for the condition's last transition.
Producers of specific condition types may define expected values and meanings for this field,
and whether the values are considered a guaranteed API.
The value should be a CamelCase string.
This field may not be empty.
maxLength: 1024
minLength: 1
pattern: ^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$
type: string
status:
description: status of the condition, one of True, False, Unknown.
enum:
- "True"
- "False"
- Unknown
type: string
type:
description: type of condition in CamelCase or in foo.example.com/CamelCase.
maxLength: 316
pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
type: string
required:
- lastTransitionTime
- message
- reason
- status
- type
type: object
type: array
namespace:
description: Namespace indicates a namespace in which the operator
and driver are installed
type: string
state:
description: |-
State indicates status of NVIDIADriver instance
enum:
- ignored
- ready
- notReady
type: string
required:
- state
type: object
type: object
served: true
storage: true
subresources:
status: {}

@@ -0,0 +1,80 @@
{{/* vim: set filetype=mustache: */}}
{{/*
Expand the name of the chart.
*/}}
{{- define "gpu-operator.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "gpu-operator.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- $name := default .Chart.Name .Values.nameOverride -}}
{{- if contains $name .Release.Name -}}
{{- .Release.Name | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
{{- end -}}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "gpu-operator.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{/*
Common labels
*/}}
{{- define "gpu-operator.labels" -}}
app.kubernetes.io/name: {{ include "gpu-operator.name" . }}
helm.sh/chart: {{ include "gpu-operator.chart" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- if .Values.operator.labels }}
{{ toYaml .Values.operator.labels }}
{{- end }}
{{- end -}}
{{- define "gpu-operator.operand-labels" -}}
helm.sh/chart: {{ include "gpu-operator.chart" . }}
app.kubernetes.io/managed-by: {{ include "gpu-operator.name" . }}
{{- if .Values.daemonsets.labels }}
{{ toYaml .Values.daemonsets.labels }}
{{- end }}
{{- end -}}
{{- define "gpu-operator.matchLabels" -}}
app.kubernetes.io/name: {{ include "gpu-operator.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}
{{/*
Full image name with tag
*/}}
{{- define "gpu-operator.fullimage" -}}
{{- .Values.operator.repository -}}/{{- .Values.operator.image -}}:{{- .Values.operator.version | default .Chart.AppVersion -}}
{{- end }}
{{/*
Full driver-manager image name with tag
*/}}
{{- define "driver-manager.fullimage" -}}
{{- .Values.driver.manager.repository -}}/{{- .Values.driver.manager.image -}}:{{- .Values.driver.manager.version -}}
{{- end }}

@@ -0,0 +1,50 @@
{{- if .Values.operator.cleanupCRD }}
apiVersion: batch/v1
kind: Job
metadata:
name: gpu-operator-cleanup-crd
namespace: {{ .Release.Namespace }}
annotations:
"helm.sh/hook": pre-delete
"helm.sh/hook-weight": "1"
"helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
labels:
{{- include "gpu-operator.labels" . | nindent 4 }}
app.kubernetes.io/component: "gpu-operator"
spec:
template:
metadata:
name: gpu-operator-cleanup-crd
labels:
{{- include "gpu-operator.labels" . | nindent 8 }}
app.kubernetes.io/component: "gpu-operator"
spec:
serviceAccountName: gpu-operator
{{- if .Values.operator.imagePullSecrets }}
imagePullSecrets:
{{- range .Values.operator.imagePullSecrets }}
- name: {{ . }}
{{- end }}
{{- end }}
{{- with .Values.operator.tolerations }}
tolerations:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: cleanup-crd
image: {{ include "gpu-operator.fullimage" . }}
imagePullPolicy: {{ .Values.operator.imagePullPolicy }}
command:
- /bin/sh
- -c
- >
kubectl delete clusterpolicy cluster-policy;
kubectl delete crd clusterpolicies.nvidia.com;
kubectl delete crd nvidiadrivers.nvidia.com --ignore-not-found=true;
{{- if .Values.nfd.enabled -}}
kubectl delete crd nodefeatures.nfd.k8s-sigs.io --ignore-not-found=true;
kubectl delete crd nodefeaturegroups.nfd.k8s-sigs.io --ignore-not-found=true;
kubectl delete crd nodefeaturerules.nfd.k8s-sigs.io --ignore-not-found=true;
{{- end }}
restartPolicy: OnFailure
{{- end }}

@@ -0,0 +1,680 @@
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
name: cluster-policy
labels:
{{- include "gpu-operator.labels" . | nindent 4 }}
app.kubernetes.io/component: "gpu-operator"
{{- if .Values.operator.cleanupCRD }}
# CR cleanup is handled by the pre-delete hook
# Add the annotation below so that Helm doesn't attempt to clean up the CR twice
annotations:
"helm.sh/resource-policy": keep
{{- end }}
spec:
hostPaths:
rootFS: {{ .Values.hostPaths.rootFS }}
driverInstallDir: {{ .Values.hostPaths.driverInstallDir }}
operator:
{{- if .Values.operator.runtimeClass }}
runtimeClass: {{ .Values.operator.runtimeClass }}
{{- end }}
{{- if .Values.operator.defaultGPUMode }}
defaultGPUMode: {{ .Values.operator.defaultGPUMode }}
{{- end }}
{{- if .Values.operator.initContainer }}
initContainer:
{{- if .Values.operator.initContainer.repository }}
repository: {{ .Values.operator.initContainer.repository }}
{{- end }}
{{- if .Values.operator.initContainer.image }}
image: {{ .Values.operator.initContainer.image }}
{{- end }}
{{- if .Values.operator.initContainer.version }}
version: {{ .Values.operator.initContainer.version | quote }}
{{- end }}
{{- if .Values.operator.initContainer.imagePullPolicy }}
imagePullPolicy: {{ .Values.operator.initContainer.imagePullPolicy }}
{{- end }}
{{- if .Values.operator.initContainer.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.operator.initContainer.imagePullSecrets | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.operator.use_ocp_driver_toolkit }}
use_ocp_driver_toolkit: {{ .Values.operator.use_ocp_driver_toolkit }}
{{- end }}
daemonsets:
labels:
{{- include "gpu-operator.operand-labels" . | nindent 6 }}
{{- if .Values.daemonsets.annotations }}
annotations: {{ toYaml .Values.daemonsets.annotations | nindent 6 }}
{{- end }}
{{- if .Values.daemonsets.tolerations }}
tolerations: {{ toYaml .Values.daemonsets.tolerations | nindent 6 }}
{{- end }}
{{- if .Values.daemonsets.priorityClassName }}
priorityClassName: {{ .Values.daemonsets.priorityClassName }}
{{- end }}
{{- if .Values.daemonsets.updateStrategy }}
updateStrategy: {{ .Values.daemonsets.updateStrategy }}
{{- end }}
{{- if .Values.daemonsets.rollingUpdate }}
rollingUpdate:
maxUnavailable: {{ .Values.daemonsets.rollingUpdate.maxUnavailable | quote }}
{{- end }}
validator:
{{- if .Values.validator.repository }}
repository: {{ .Values.validator.repository }}
{{- end }}
{{- if .Values.validator.image }}
image: {{ .Values.validator.image }}
{{- end }}
version: {{ .Values.validator.version | default .Chart.AppVersion | quote }}
{{- if .Values.validator.imagePullPolicy }}
imagePullPolicy: {{ .Values.validator.imagePullPolicy }}
{{- end }}
{{- if .Values.validator.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.validator.imagePullSecrets | nindent 8 }}
{{- end }}
{{- if .Values.validator.resources }}
resources: {{ toYaml .Values.validator.resources | nindent 6 }}
{{- end }}
{{- if .Values.validator.env }}
env: {{ toYaml .Values.validator.env | nindent 6 }}
{{- end }}
{{- if .Values.validator.args }}
args: {{ toYaml .Values.validator.args | nindent 6 }}
{{- end }}
{{- if .Values.validator.plugin }}
plugin:
{{- if .Values.validator.plugin.env }}
env: {{ toYaml .Values.validator.plugin.env | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.validator.cuda }}
cuda:
{{- if .Values.validator.cuda.env }}
env: {{ toYaml .Values.validator.cuda.env | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.validator.driver }}
driver:
{{- if .Values.validator.driver.env }}
env: {{ toYaml .Values.validator.driver.env | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.validator.toolkit }}
toolkit:
{{- if .Values.validator.toolkit.env }}
env: {{ toYaml .Values.validator.toolkit.env | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.validator.vfioPCI }}
vfioPCI:
{{- if .Values.validator.vfioPCI.env }}
env: {{ toYaml .Values.validator.vfioPCI.env | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.validator.vgpuManager }}
vgpuManager:
{{- if .Values.validator.vgpuManager.env }}
env: {{ toYaml .Values.validator.vgpuManager.env | nindent 8 }}
{{- end }}
{{- end }}
{{- if .Values.validator.vgpuDevices }}
vgpuDevices:
{{- if .Values.validator.vgpuDevices.env }}
env: {{ toYaml .Values.validator.vgpuDevices.env | nindent 8 }}
{{- end }}
{{- end }}
mig:
{{- if .Values.mig.strategy }}
strategy: {{ .Values.mig.strategy }}
{{- end }}
psa:
enabled: {{ .Values.psa.enabled }}
cdi:
enabled: {{ .Values.cdi.enabled }}
default: {{ .Values.cdi.default }}
driver:
enabled: {{ .Values.driver.enabled }}
useNvidiaDriverCRD: {{ .Values.driver.nvidiaDriverCRD.enabled }}
kernelModuleType: {{ .Values.driver.kernelModuleType }}
usePrecompiled: {{ .Values.driver.usePrecompiled }}
{{- if .Values.driver.repository }}
repository: {{ .Values.driver.repository }}
{{- end }}
{{- if .Values.driver.image }}
image: {{ .Values.driver.image }}
{{- end }}
{{- if .Values.driver.version }}
version: {{ .Values.driver.version | quote }}
{{- end }}
{{- if .Values.driver.imagePullPolicy }}
imagePullPolicy: {{ .Values.driver.imagePullPolicy }}
{{- end }}
{{- if .Values.driver.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.driver.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.driver.startupProbe }}
startupProbe: {{ toYaml .Values.driver.startupProbe | nindent 6 }}
{{- end }}
{{- if .Values.driver.livenessProbe }}
livenessProbe: {{ toYaml .Values.driver.livenessProbe | nindent 6 }}
{{- end }}
{{- if .Values.driver.readinessProbe }}
readinessProbe: {{ toYaml .Values.driver.readinessProbe | nindent 6 }}
{{- end }}
rdma:
enabled: {{ .Values.driver.rdma.enabled }}
useHostMofed: {{ .Values.driver.rdma.useHostMofed }}
manager:
{{- if .Values.driver.manager.repository }}
repository: {{ .Values.driver.manager.repository }}
{{- end }}
{{- if .Values.driver.manager.image }}
image: {{ .Values.driver.manager.image }}
{{- end }}
{{- if .Values.driver.manager.version }}
version: {{ .Values.driver.manager.version | quote }}
{{- end }}
{{- if .Values.driver.manager.imagePullPolicy }}
imagePullPolicy: {{ .Values.driver.manager.imagePullPolicy }}
{{- end }}
{{- if .Values.driver.manager.env }}
env: {{ toYaml .Values.driver.manager.env | nindent 8 }}
{{- end }}
{{- if .Values.driver.repoConfig }}
repoConfig: {{ toYaml .Values.driver.repoConfig | nindent 6 }}
{{- end }}
{{- if .Values.driver.certConfig }}
certConfig: {{ toYaml .Values.driver.certConfig | nindent 6 }}
{{- end }}
{{- if .Values.driver.licensingConfig }}
licensingConfig: {{ toYaml .Values.driver.licensingConfig | nindent 6 }}
{{- end }}
{{- if .Values.driver.virtualTopology }}
virtualTopology: {{ toYaml .Values.driver.virtualTopology | nindent 6 }}
{{- end }}
{{- if .Values.driver.kernelModuleConfig }}
kernelModuleConfig: {{ toYaml .Values.driver.kernelModuleConfig | nindent 6 }}
{{- end }}
{{- if .Values.driver.resources }}
resources: {{ toYaml .Values.driver.resources | nindent 6 }}
{{- end }}
{{- if .Values.driver.env }}
env: {{ toYaml .Values.driver.env | nindent 6 }}
{{- end }}
{{- if .Values.driver.args }}
args: {{ toYaml .Values.driver.args | nindent 6 }}
{{- end }}
{{- if .Values.driver.upgradePolicy }}
upgradePolicy:
autoUpgrade: {{ .Values.driver.upgradePolicy.autoUpgrade | default false }}
maxParallelUpgrades: {{ .Values.driver.upgradePolicy.maxParallelUpgrades | default 0 }}
maxUnavailable: {{ .Values.driver.upgradePolicy.maxUnavailable | default "25%" }}
waitForCompletion:
timeoutSeconds: {{ .Values.driver.upgradePolicy.waitForCompletion.timeoutSeconds }}
{{- if .Values.driver.upgradePolicy.waitForCompletion.podSelector }}
podSelector: {{ .Values.driver.upgradePolicy.waitForCompletion.podSelector }}
{{- end }}
podDeletion:
force: {{ .Values.driver.upgradePolicy.gpuPodDeletion.force | default false }}
timeoutSeconds: {{ .Values.driver.upgradePolicy.gpuPodDeletion.timeoutSeconds }}
deleteEmptyDir: {{ .Values.driver.upgradePolicy.gpuPodDeletion.deleteEmptyDir | default false }}
drain:
enable: {{ .Values.driver.upgradePolicy.drain.enable | default false }}
force: {{ .Values.driver.upgradePolicy.drain.force | default false }}
{{- if .Values.driver.upgradePolicy.drain.podSelector }}
podSelector: {{ .Values.driver.upgradePolicy.drain.podSelector }}
{{- end }}
timeoutSeconds: {{ .Values.driver.upgradePolicy.drain.timeoutSeconds }}
deleteEmptyDir: {{ .Values.driver.upgradePolicy.drain.deleteEmptyDir | default false }}
{{- end }}
vgpuManager:
enabled: {{ .Values.vgpuManager.enabled }}
{{- if .Values.vgpuManager.repository }}
repository: {{ .Values.vgpuManager.repository }}
{{- end }}
{{- if .Values.vgpuManager.image }}
image: {{ .Values.vgpuManager.image }}
{{- end }}
{{- if .Values.vgpuManager.version }}
version: {{ .Values.vgpuManager.version | quote }}
{{- end }}
{{- if .Values.vgpuManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.vgpuManager.imagePullPolicy }}
{{- end }}
{{- if .Values.vgpuManager.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.vgpuManager.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.vgpuManager.resources }}
resources: {{ toYaml .Values.vgpuManager.resources | nindent 6 }}
{{- end }}
{{- if .Values.vgpuManager.env }}
env: {{ toYaml .Values.vgpuManager.env | nindent 6 }}
{{- end }}
{{- if .Values.vgpuManager.args }}
args: {{ toYaml .Values.vgpuManager.args | nindent 6 }}
{{- end }}
driverManager:
{{- if .Values.vgpuManager.driverManager.repository }}
repository: {{ .Values.vgpuManager.driverManager.repository }}
{{- end }}
{{- if .Values.vgpuManager.driverManager.image }}
image: {{ .Values.vgpuManager.driverManager.image }}
{{- end }}
{{- if .Values.vgpuManager.driverManager.version }}
version: {{ .Values.vgpuManager.driverManager.version | quote }}
{{- end }}
{{- if .Values.vgpuManager.driverManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.vgpuManager.driverManager.imagePullPolicy }}
{{- end }}
{{- if .Values.vgpuManager.driverManager.env }}
env: {{ toYaml .Values.vgpuManager.driverManager.env | nindent 8 }}
{{- end }}
kataManager:
enabled: {{ .Values.kataManager.enabled }}
config: {{ toYaml .Values.kataManager.config | nindent 6 }}
{{- if .Values.kataManager.repository }}
repository: {{ .Values.kataManager.repository }}
{{- end }}
{{- if .Values.kataManager.image }}
image: {{ .Values.kataManager.image }}
{{- end }}
{{- if .Values.kataManager.version }}
version: {{ .Values.kataManager.version | quote }}
{{- end }}
{{- if .Values.kataManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.kataManager.imagePullPolicy }}
{{- end }}
{{- if .Values.kataManager.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.kataManager.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.kataManager.resources }}
resources: {{ toYaml .Values.kataManager.resources | nindent 6 }}
{{- end }}
{{- if .Values.kataManager.env }}
env: {{ toYaml .Values.kataManager.env | nindent 6 }}
{{- end }}
{{- if .Values.kataManager.args }}
args: {{ toYaml .Values.kataManager.args | nindent 6 }}
{{- end }}
vfioManager:
enabled: {{ .Values.vfioManager.enabled }}
{{- if .Values.vfioManager.repository }}
repository: {{ .Values.vfioManager.repository }}
{{- end }}
{{- if .Values.vfioManager.image }}
image: {{ .Values.vfioManager.image }}
{{- end }}
{{- if .Values.vfioManager.version }}
version: {{ .Values.vfioManager.version | quote }}
{{- end }}
{{- if .Values.vfioManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.vfioManager.imagePullPolicy }}
{{- end }}
{{- if .Values.vfioManager.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.vfioManager.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.vfioManager.resources }}
resources: {{ toYaml .Values.vfioManager.resources | nindent 6 }}
{{- end }}
{{- if .Values.vfioManager.env }}
env: {{ toYaml .Values.vfioManager.env | nindent 6 }}
{{- end }}
{{- if .Values.vfioManager.args }}
args: {{ toYaml .Values.vfioManager.args | nindent 6 }}
{{- end }}
driverManager:
{{- if .Values.vfioManager.driverManager.repository }}
repository: {{ .Values.vfioManager.driverManager.repository }}
{{- end }}
{{- if .Values.vfioManager.driverManager.image }}
image: {{ .Values.vfioManager.driverManager.image }}
{{- end }}
{{- if .Values.vfioManager.driverManager.version }}
version: {{ .Values.vfioManager.driverManager.version | quote }}
{{- end }}
{{- if .Values.vfioManager.driverManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.vfioManager.driverManager.imagePullPolicy }}
{{- end }}
{{- if .Values.vfioManager.driverManager.env }}
env: {{ toYaml .Values.vfioManager.driverManager.env | nindent 8 }}
{{- end }}
vgpuDeviceManager:
enabled: {{ .Values.vgpuDeviceManager.enabled }}
{{- if .Values.vgpuDeviceManager.repository }}
repository: {{ .Values.vgpuDeviceManager.repository }}
{{- end }}
{{- if .Values.vgpuDeviceManager.image }}
image: {{ .Values.vgpuDeviceManager.image }}
{{- end }}
{{- if .Values.vgpuDeviceManager.version }}
version: {{ .Values.vgpuDeviceManager.version | quote }}
{{- end }}
{{- if .Values.vgpuDeviceManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.vgpuDeviceManager.imagePullPolicy }}
{{- end }}
{{- if .Values.vgpuDeviceManager.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.vgpuDeviceManager.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.vgpuDeviceManager.resources }}
resources: {{ toYaml .Values.vgpuDeviceManager.resources | nindent 6 }}
{{- end }}
{{- if .Values.vgpuDeviceManager.env }}
env: {{ toYaml .Values.vgpuDeviceManager.env | nindent 6 }}
{{- end }}
{{- if .Values.vgpuDeviceManager.args }}
args: {{ toYaml .Values.vgpuDeviceManager.args | nindent 6 }}
{{- end }}
{{- if .Values.vgpuDeviceManager.config }}
config: {{ toYaml .Values.vgpuDeviceManager.config | nindent 6 }}
{{- end }}
ccManager:
enabled: {{ .Values.ccManager.enabled }}
defaultMode: {{ .Values.ccManager.defaultMode | quote }}
{{- if .Values.ccManager.repository }}
repository: {{ .Values.ccManager.repository }}
{{- end }}
{{- if .Values.ccManager.image }}
image: {{ .Values.ccManager.image }}
{{- end }}
{{- if .Values.ccManager.version }}
version: {{ .Values.ccManager.version | quote }}
{{- end }}
{{- if .Values.ccManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.ccManager.imagePullPolicy }}
{{- end }}
{{- if .Values.ccManager.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.ccManager.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.ccManager.resources }}
resources: {{ toYaml .Values.ccManager.resources | nindent 6 }}
{{- end }}
{{- if .Values.ccManager.env }}
env: {{ toYaml .Values.ccManager.env | nindent 6 }}
{{- end }}
{{- if .Values.ccManager.args }}
args: {{ toYaml .Values.ccManager.args | nindent 6 }}
{{- end }}
toolkit:
enabled: {{ .Values.toolkit.enabled }}
{{- if .Values.toolkit.repository }}
repository: {{ .Values.toolkit.repository }}
{{- end }}
{{- if .Values.toolkit.image }}
image: {{ .Values.toolkit.image }}
{{- end }}
{{- if .Values.toolkit.version }}
version: {{ .Values.toolkit.version | quote }}
{{- end }}
{{- if .Values.toolkit.imagePullPolicy }}
imagePullPolicy: {{ .Values.toolkit.imagePullPolicy }}
{{- end }}
{{- if .Values.toolkit.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.toolkit.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.toolkit.resources }}
resources: {{ toYaml .Values.toolkit.resources | nindent 6 }}
{{- end }}
{{- if .Values.toolkit.env }}
env: {{ toYaml .Values.toolkit.env | nindent 6 }}
{{- end }}
{{- if .Values.toolkit.installDir }}
installDir: {{ .Values.toolkit.installDir }}
{{- end }}
devicePlugin:
enabled: {{ .Values.devicePlugin.enabled }}
{{- if .Values.devicePlugin.repository }}
repository: {{ .Values.devicePlugin.repository }}
{{- end }}
{{- if .Values.devicePlugin.image }}
image: {{ .Values.devicePlugin.image }}
{{- end }}
{{- if .Values.devicePlugin.version }}
version: {{ .Values.devicePlugin.version | quote }}
{{- end }}
{{- if .Values.devicePlugin.imagePullPolicy }}
imagePullPolicy: {{ .Values.devicePlugin.imagePullPolicy }}
{{- end }}
{{- if .Values.devicePlugin.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.devicePlugin.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.devicePlugin.resources }}
resources: {{ toYaml .Values.devicePlugin.resources | nindent 6 }}
{{- end }}
{{- if .Values.devicePlugin.env }}
env: {{ toYaml .Values.devicePlugin.env | nindent 6 }}
{{- end }}
{{- if .Values.devicePlugin.args }}
args: {{ toYaml .Values.devicePlugin.args | nindent 6 }}
{{- end }}
{{- if .Values.devicePlugin.config.name }}
config:
name: {{ .Values.devicePlugin.config.name }}
default: {{ .Values.devicePlugin.config.default }}
{{- end }}
dcgm:
enabled: {{ .Values.dcgm.enabled }}
{{- if .Values.dcgm.repository }}
repository: {{ .Values.dcgm.repository }}
{{- end }}
{{- if .Values.dcgm.image }}
image: {{ .Values.dcgm.image }}
{{- end }}
{{- if .Values.dcgm.version }}
version: {{ .Values.dcgm.version | quote }}
{{- end }}
{{- if .Values.dcgm.imagePullPolicy }}
imagePullPolicy: {{ .Values.dcgm.imagePullPolicy }}
{{- end }}
{{- if .Values.dcgm.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.dcgm.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.dcgm.resources }}
resources: {{ toYaml .Values.dcgm.resources | nindent 6 }}
{{- end }}
{{- if .Values.dcgm.env }}
env: {{ toYaml .Values.dcgm.env | nindent 6 }}
{{- end }}
{{- if .Values.dcgm.args }}
args: {{ toYaml .Values.dcgm.args | nindent 6 }}
{{- end }}
dcgmExporter:
enabled: {{ .Values.dcgmExporter.enabled }}
{{- if .Values.dcgmExporter.repository }}
repository: {{ .Values.dcgmExporter.repository }}
{{- end }}
{{- if .Values.dcgmExporter.image }}
image: {{ .Values.dcgmExporter.image }}
{{- end }}
{{- if .Values.dcgmExporter.version }}
version: {{ .Values.dcgmExporter.version | quote }}
{{- end }}
{{- if .Values.dcgmExporter.imagePullPolicy }}
imagePullPolicy: {{ .Values.dcgmExporter.imagePullPolicy }}
{{- end }}
{{- if .Values.dcgmExporter.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.dcgmExporter.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.dcgmExporter.resources }}
resources: {{ toYaml .Values.dcgmExporter.resources | nindent 6 }}
{{- end }}
{{- if .Values.dcgmExporter.env }}
env: {{ toYaml .Values.dcgmExporter.env | nindent 6 }}
{{- end }}
{{- if .Values.dcgmExporter.args }}
args: {{ toYaml .Values.dcgmExporter.args | nindent 6 }}
{{- end }}
{{- if and (.Values.dcgmExporter.config) (.Values.dcgmExporter.config.name) }}
config:
name: {{ .Values.dcgmExporter.config.name }}
{{- end }}
{{- if .Values.dcgmExporter.serviceMonitor }}
serviceMonitor: {{ toYaml .Values.dcgmExporter.serviceMonitor | nindent 6 }}
{{- end }}
gfd:
enabled: {{ .Values.gfd.enabled }}
{{- if .Values.gfd.repository }}
repository: {{ .Values.gfd.repository }}
{{- end }}
{{- if .Values.gfd.image }}
image: {{ .Values.gfd.image }}
{{- end }}
{{- if .Values.gfd.version }}
version: {{ .Values.gfd.version | quote }}
{{- end }}
{{- if .Values.gfd.imagePullPolicy }}
imagePullPolicy: {{ .Values.gfd.imagePullPolicy }}
{{- end }}
{{- if .Values.gfd.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.gfd.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.gfd.resources }}
resources: {{ toYaml .Values.gfd.resources | nindent 6 }}
{{- end }}
{{- if .Values.gfd.env }}
env: {{ toYaml .Values.gfd.env | nindent 6 }}
{{- end }}
{{- if .Values.gfd.args }}
args: {{ toYaml .Values.gfd.args | nindent 6 }}
{{- end }}
migManager:
enabled: {{ .Values.migManager.enabled }}
{{- if .Values.migManager.repository }}
repository: {{ .Values.migManager.repository }}
{{- end }}
{{- if .Values.migManager.image }}
image: {{ .Values.migManager.image }}
{{- end }}
{{- if .Values.migManager.version }}
version: {{ .Values.migManager.version | quote }}
{{- end }}
{{- if .Values.migManager.imagePullPolicy }}
imagePullPolicy: {{ .Values.migManager.imagePullPolicy }}
{{- end }}
{{- if .Values.migManager.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.migManager.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.migManager.resources }}
resources: {{ toYaml .Values.migManager.resources | nindent 6 }}
{{- end }}
{{- if .Values.migManager.env }}
env: {{ toYaml .Values.migManager.env | nindent 6 }}
{{- end }}
{{- if .Values.migManager.args }}
args: {{ toYaml .Values.migManager.args | nindent 6 }}
{{- end }}
{{- if .Values.migManager.config }}
config:
name: {{ .Values.migManager.config.name }}
default: {{ .Values.migManager.config.default }}
{{- end }}
{{- if .Values.migManager.gpuClientsConfig }}
gpuClientsConfig: {{ toYaml .Values.migManager.gpuClientsConfig | nindent 6 }}
{{- end }}
nodeStatusExporter:
enabled: {{ .Values.nodeStatusExporter.enabled }}
{{- if .Values.nodeStatusExporter.repository }}
repository: {{ .Values.nodeStatusExporter.repository }}
{{- end }}
{{- if .Values.nodeStatusExporter.image }}
image: {{ .Values.nodeStatusExporter.image }}
{{- end }}
version: {{ .Values.nodeStatusExporter.version | default .Chart.AppVersion | quote }}
{{- if .Values.nodeStatusExporter.imagePullPolicy }}
imagePullPolicy: {{ .Values.nodeStatusExporter.imagePullPolicy }}
{{- end }}
{{- if .Values.nodeStatusExporter.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.nodeStatusExporter.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.nodeStatusExporter.resources }}
resources: {{ toYaml .Values.nodeStatusExporter.resources | nindent 6 }}
{{- end }}
{{- if .Values.nodeStatusExporter.env }}
env: {{ toYaml .Values.nodeStatusExporter.env | nindent 6 }}
{{- end }}
{{- if .Values.nodeStatusExporter.args }}
args: {{ toYaml .Values.nodeStatusExporter.args | nindent 6 }}
{{- end }}
{{- if .Values.gds.enabled }}
gds:
enabled: {{ .Values.gds.enabled }}
{{- if .Values.gds.repository }}
repository: {{ .Values.gds.repository }}
{{- end }}
{{- if .Values.gds.image }}
image: {{ .Values.gds.image }}
{{- end }}
version: {{ .Values.gds.version | quote }}
{{- if .Values.gds.imagePullPolicy }}
imagePullPolicy: {{ .Values.gds.imagePullPolicy }}
{{- end }}
{{- if .Values.gds.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.gds.imagePullSecrets | nindent 8 }}
{{- end }}
{{- if .Values.gds.env }}
env: {{ toYaml .Values.gds.env | nindent 6 }}
{{- end }}
{{- if .Values.gds.args }}
args: {{ toYaml .Values.gds.args | nindent 6 }}
{{- end }}
{{- end }}
{{- if .Values.gdrcopy }}
gdrcopy:
enabled: {{ .Values.gdrcopy.enabled | default false }}
{{- if .Values.gdrcopy.repository }}
repository: {{ .Values.gdrcopy.repository }}
{{- end }}
{{- if .Values.gdrcopy.image }}
image: {{ .Values.gdrcopy.image }}
{{- end }}
version: {{ .Values.gdrcopy.version | quote }}
{{- if .Values.gdrcopy.imagePullPolicy }}
imagePullPolicy: {{ .Values.gdrcopy.imagePullPolicy }}
{{- end }}
{{- if .Values.gdrcopy.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.gdrcopy.imagePullSecrets | nindent 8 }}
{{- end }}
{{- if .Values.gdrcopy.env }}
env: {{ toYaml .Values.gdrcopy.env | nindent 6 }}
{{- end }}
{{- if .Values.gdrcopy.args }}
args: {{ toYaml .Values.gdrcopy.args | nindent 6 }}
{{- end }}
{{- end }}
sandboxWorkloads:
enabled: {{ .Values.sandboxWorkloads.enabled }}
{{- if .Values.sandboxWorkloads.defaultWorkload }}
defaultWorkload: {{ .Values.sandboxWorkloads.defaultWorkload }}
{{- end }}
sandboxDevicePlugin:
{{- if .Values.sandboxDevicePlugin.enabled }}
enabled: {{ .Values.sandboxDevicePlugin.enabled }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.repository }}
repository: {{ .Values.sandboxDevicePlugin.repository }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.image }}
image: {{ .Values.sandboxDevicePlugin.image }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.version }}
version: {{ .Values.sandboxDevicePlugin.version | quote }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.imagePullPolicy }}
imagePullPolicy: {{ .Values.sandboxDevicePlugin.imagePullPolicy }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.imagePullSecrets }}
imagePullSecrets: {{ toYaml .Values.sandboxDevicePlugin.imagePullSecrets | nindent 6 }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.resources }}
resources: {{ toYaml .Values.sandboxDevicePlugin.resources | nindent 6 }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.env }}
env: {{ toYaml .Values.sandboxDevicePlugin.env | nindent 6 }}
{{- end }}
{{- if .Values.sandboxDevicePlugin.args }}
args: {{ toYaml .Values.sandboxDevicePlugin.args | nindent 6 }}
{{- end }}

@@ -0,0 +1,155 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: gpu-operator
labels:
{{- include "gpu-operator.labels" . | nindent 4 }}
app.kubernetes.io/component: "gpu-operator"
rules:
- apiGroups:
- config.openshift.io
resources:
- clusterversions
- proxies
verbs:
- get
- list
- watch
- apiGroups:
- image.openshift.io
resources:
- imagestreams
verbs:
- get
- list
- watch
- apiGroups:
- security.openshift.io
resources:
- securitycontextconstraints
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
- use
- apiGroups:
- rbac.authorization.k8s.io
resources:
- clusterroles
- clusterrolebindings
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
- apiGroups:
- ""
resources:
- nodes
verbs:
- get
- list
- watch
- update
- patch
- apiGroups:
- ""
resources:
- namespaces
verbs:
- get
- list
- watch
- update
- patch
- apiGroups:
- ""
resources:
- events
verbs:
- create
- get
- list
- watch
- delete
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
- pods/eviction
verbs:
- create
- apiGroups:
- apps
resources:
- daemonsets
verbs:
- get
- list
- watch
- apiGroups:
- nvidia.com
resources:
- clusterpolicies
- clusterpolicies/finalizers
- clusterpolicies/status
- nvidiadrivers
- nvidiadrivers/finalizers
- nvidiadrivers/status
verbs:
- create
- get
- list
- watch
- update
- patch
- delete
- deletecollection
- apiGroups:
- scheduling.k8s.io
resources:
- priorityclasses
verbs:
- get
- list
- watch
- create
- apiGroups:
- node.k8s.io
resources:
- runtimeclasses
verbs:
- get
- list
- create
- update
- watch
- delete
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- get
- list
- watch
- update
- patch
- create
{{- if .Values.operator.cleanupCRD }}
- delete
{{- end }}

View File

@@ -0,0 +1,15 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: gpu-operator
labels:
{{- include "gpu-operator.labels" . | nindent 4 }}
app.kubernetes.io/component: "gpu-operator"
subjects:
- kind: ServiceAccount
name: gpu-operator
namespace: {{ $.Release.Namespace }}
roleRef:
kind: ClusterRole
name: gpu-operator
apiGroup: rbac.authorization.k8s.io

View File

@@ -0,0 +1,14 @@
{{- if .Values.dcgmExporter.config }}
{{- if and (.Values.dcgmExporter.config.create) (not (empty .Values.dcgmExporter.config.data)) }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Values.dcgmExporter.config.name }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "gpu-operator.labels" . | nindent 4 }}
data:
dcgm-metrics.csv: |
{{- .Values.dcgmExporter.config.data | nindent 4 }}
{{- end }}
{{- end }}
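The ConfigMap template above renders only when `dcgmExporter.config.create` is true and `dcgmExporter.config.data` is non-empty. As a rough sketch, a values fragment along the following lines should satisfy both conditions. The ConfigMap name is hypothetical, and the single metric line is only a sample in the CSV layout that dcgm-exporter reads (field, Prometheus type, help text).

```yaml
# Illustrative values.yaml fragment; the name below is an assumption.
dcgmExporter:
  config:
    name: custom-dcgm-metrics   # hypothetical ConfigMap name
    create: true                # gate checked by the template
    # Rendered as a string under the dcgm-metrics.csv key.
    data: |-
      DCGM_FI_DEV_GPU_TEMP, gauge, GPU temperature (in C).
```

Because the template pipes `data` straight into `nindent 4`, the value is expected to be a plain string rather than a map.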

@@ -0,0 +1,10 @@
{{- if and (.Values.migManager.config.create) (not (empty .Values.migManager.config.data)) }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Values.migManager.config.name }}
namespace: {{ .Release.Namespace }}
labels:
{{- include "gpu-operator.labels" . | nindent 4 }}
data: {{ toYaml .Values.migManager.config.data | nindent 2 }}
{{- end }}
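Unlike the dcgm-exporter template, this one passes `migManager.config.data` through `toYaml`, so the value is expected to be a map of file names to contents. A minimal sketch under that assumption follows. The ConfigMap name, the `config.yaml` key, and the `all-disabled` profile are illustrative choices, not values confirmed by this diff.

```yaml
# Illustrative values.yaml fragment; names and profile are assumptions.
migManager:
  config:
    name: custom-mig-parted-config   # hypothetical ConfigMap name
    create: true                     # gate checked by the template
    data:
      # Each key becomes a key in the ConfigMap's data section.
      config.yaml: |-
        version: v1
        mig-configs:
          all-disabled:
            - devices: all
              mig-enabled: false
```

Note that the template reads `.Values.migManager.config.create` without first checking that `config` is set, so leaving `migManager.config` out entirely may cause a render error.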

@@ -0,0 +1,107 @@
{{- if .Values.nfd.nodefeaturerules }}
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
name: nvidia-nfd-nodefeaturerules
spec:
rules:
- name: "TDX rule"
labels:
tdx.enabled: "true"
matchFeatures:
- feature: cpu.security
matchExpressions:
tdx.enabled: {op: IsTrue}
- name: "TDX total keys rule"
extendedResources:
tdx.total_keys: "@cpu.security.tdx.total_keys"
matchFeatures:
- feature: cpu.security
matchExpressions:
tdx.enabled: {op: IsTrue}
- name: "SEV-SNP rule"
labels:
sev.snp.enabled: "true"
matchFeatures:
- feature: cpu.security
matchExpressions:
sev.snp.enabled:
op: IsTrue
- name: "SEV-ES rule"
labels:
sev.es.enabled: "true"
matchFeatures:
- feature: cpu.security
matchExpressions:
sev.es.enabled:
op: IsTrue
- name: SEV system capacities
extendedResources:
sev_asids: '@cpu.security.sev.asids'
sev_es: '@cpu.security.sev.encrypted_state_ids'
matchFeatures:
- feature: cpu.security
matchExpressions:
sev.enabled:
op: Exists
- name: "NVIDIA H100"
labels:
"nvidia.com/gpu.H100": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2339"]}
- name: "NVIDIA H100 PCIe"
labels:
"nvidia.com/gpu.H100.pcie": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2331"]}
- name: "NVIDIA H100 80GB HBM3"
labels:
"nvidia.com/gpu.H100.HBM3": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2330"]}
- name: "NVIDIA H800"
labels:
"nvidia.com/gpu.H800": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2324"]}
- name: "NVIDIA H800 PCIE"
labels:
"nvidia.com/gpu.H800.pcie": "true"
"nvidia.com/gpu.family": "hopper"
matchFeatures:
- feature: pci.device
matchExpressions:
vendor: {op: In, value: ["10de"]}
device: {op: In, value: ["2322"]}
- name: "NVIDIA CC Enabled"
labels:
"nvidia.com/cc.capable": "true"
matchAny: # TDX/SEV + Hopper GPU
- matchFeatures:
- feature: rule.matched
matchExpressions:
nvidia.com/gpu.family: {op: In, value: ["hopper"]}
sev.snp.enabled: {op: IsTrue}
- matchFeatures:
- feature: rule.matched
matchExpressions:
nvidia.com/gpu.family: {op: In, value: ["hopper"]}
tdx.enabled: {op: IsTrue}
{{- end }}
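The whole NodeFeatureRule above is wrapped in a single `.Values.nfd.nodefeaturerules` check, so it is only installed when that value is truthy. A minimal values fragment that turns it on might look like this; no other keys under `nfd` are assumed here.

```yaml
# Illustrative values.yaml fragment enabling the NodeFeatureRule above.
nfd:
  nodefeaturerules: true
```

With the rule installed, a node would get `nvidia.com/cc.capable: "true"` only if one of the Hopper device rules matched and either the SEV-SNP or the TDX rule matched as well, since the final rule uses `matchAny` over those two combinations.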

Some files were not shown because too many files have changed in this diff