Mirror of https://github.com/optim-enterprises-bv/kubernetes.git, synced 2025-11-03 19:58:17 +00:00

# Job Controller

## Abstract

A proposal for implementing a new controller, the Job controller, which will be responsible
for managing pod(s) that must run once to completion, even if the machine
the pod is running on fails, in contrast to what ReplicationController currently offers.

Several issues and PRs have already been created on this subject:
* Job Controller [#1624](https://github.com/kubernetes/kubernetes/issues/1624)
* New Job resource [#7380](https://github.com/kubernetes/kubernetes/pull/7380)

## Use Cases

1. Be able to start one or several pods tracked as a single entity.
1. Be able to run batch-oriented workloads on Kubernetes.
1. Be able to get the job status.
1. Be able to specify the number of instances performing a job at any one time.
1. Be able to specify the number of successfully finished instances required to finish a job.

## Motivation

Jobs are needed for executing multi-pod computations to completion; a good example
here would be the ability to implement any type of batch-oriented task.

## Implementation

The Job controller is similar to the replication controller in that both manage pods.
This implies it will follow the same controller framework that replication
controllers already defined.  The biggest difference between a `Job` and a
`ReplicationController` object is the purpose: `ReplicationController`
ensures that a specified number of Pods are running at any one time, whereas
`Job` is responsible for keeping the desired number of Pods running until a
task completes.  This difference will be represented by the `RestartPolicy`, which is
required to always take the value `RestartPolicyNever` or `RestartPolicyOnFailure`.

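The restart-policy constraint described above can be sketched as a small validation check. This is only an illustration: the `RestartPolicy` type and its constants below are local stand-ins modelled on the pod API, and `validateJobRestartPolicy` is a hypothetical helper, not part of this proposal.

```go
package main

import "fmt"

// RestartPolicy is a local stand-in for the pod-level restart policy type.
type RestartPolicy string

const (
	RestartPolicyAlways    RestartPolicy = "Always"
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	RestartPolicyNever     RestartPolicy = "Never"
)

// validateJobRestartPolicy (hypothetical helper) accepts only the policies
// that let a pod terminate, which is what a run-to-completion Job needs.
func validateJobRestartPolicy(p RestartPolicy) error {
	switch p {
	case RestartPolicyNever, RestartPolicyOnFailure:
		return nil
	default:
		return fmt.Errorf("unsupported restart policy %q for a Job", p)
	}
}

func main() {
	policies := []RestartPolicy{RestartPolicyAlways, RestartPolicyOnFailure, RestartPolicyNever}
	for _, p := range policies {
		fmt.Printf("%s: %v\n", p, validateJobRestartPolicy(p))
	}
}
```

A policy of `Always` is rejected in this sketch because a pod that is always restarted never reaches a completed state.
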
The new `Job` object will have the following content:

```go
// Job represents the configuration of a single job.
type Job struct {
    TypeMeta
    ObjectMeta

    // Spec is a structure defining the expected behavior of a job.
    Spec JobSpec

    // Status is a structure describing current status of a job.
    Status JobStatus
}

// JobList is a collection of jobs.
type JobList struct {
    TypeMeta
    ListMeta

    Items []Job
}
```

The `JobSpec` structure contains all the information describing how the actual job
execution will look.

```go
// JobSpec describes how the job execution will look.
type JobSpec struct {
    // Parallelism specifies the maximum desired number of pods the job should
    // run at any given time. The actual number of pods running in steady state will
    // be less than this number when ((.spec.completions - .status.successful) < .spec.parallelism),
    // i.e. when the work left to do is less than max parallelism.
    Parallelism *int

    // Completions specifies the desired number of successfully finished pods the
    // job should be run with. Defaults to 1.
    Completions *int

    // Selector is a label query over pods running a job.
    Selector map[string]string

    // Template is the object that describes the pod that will be created when
    // executing a job.
    Template *PodTemplateSpec
}
```

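The steady-state rule in the `Parallelism` comment can be sketched as follows. This is a minimal illustration, not the controller's actual code: `desiredActivePods` is a hypothetical helper, and it takes plain integers rather than the pointer fields of `JobSpec`.

```go
package main

import "fmt"

// desiredActivePods (hypothetical helper) returns how many pods should be
// running in steady state: the work left to do, capped at parallelism.
func desiredActivePods(parallelism, completions, successful int) int {
	remaining := completions - successful
	if remaining < 0 {
		remaining = 0 // more successes than required; nothing left to run
	}
	if remaining < parallelism {
		return remaining
	}
	return parallelism
}

func main() {
	fmt.Println(desiredActivePods(3, 10, 2)) // 8 completions left, capped at 3
	fmt.Println(desiredActivePods(3, 10, 9)) // 1 completion left, so 1
}
```

With `parallelism` 3 and `completions` 10, the sketch keeps three pods running until fewer than three completions remain, then scales down with the remaining work.
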
The `JobStatus` structure holds information about the pods executing the
specified job.

```go
// JobStatus represents the current state of a Job.
type JobStatus struct {
    // Conditions lists the observed conditions of the job.
    Conditions []JobCondition

    // CreationTime represents time when the job was created
    CreationTime unversioned.Time

    // StartTime represents time when the job was started
    StartTime unversioned.Time

    // CompletionTime represents time when the job was completed
    CompletionTime unversioned.Time

    // Active is the number of actively running pods.
    Active int

    // Successful is the number of pods that successfully completed their job.
    Successful int

    // Unsuccessful is the number of pod failures; this applies only to jobs
    // created with RestartPolicyNever, otherwise this value will always be 0.
    Unsuccessful int
}

// JobConditionType names a condition a job can be in.
type JobConditionType string

// These are valid conditions of a job.
const (
    // JobComplete means the job has completed its execution.
    JobComplete JobConditionType = "Complete"
)

// JobCondition describes current state of a job.
type JobCondition struct {
    Type               JobConditionType
    Status             ConditionStatus
    LastHeartbeatTime  unversioned.Time
    LastTransitionTime unversioned.Time
    Reason             string
    Message            string
}
```

## Events

The Job controller will emit the following events:
* JobStart
* JobFinish

## Future evolution

Below are possible future extensions to the Job controller:
* Be able to limit the execution time for a job, similarly to ActiveDeadlineSeconds for Pods. *now implemented*
* Be able to create a chain of jobs that depend on one another. *will be implemented in a separate type called Workflow*
* Be able to specify the work each of the workers should execute (see type 1 from
  [this comment](https://github.com/kubernetes/kubernetes/issues/1624#issuecomment-97622142))
* Be able to inspect Pods running a Job, especially after a Job has finished, e.g.
  by providing pointers to Pods in the JobStatus ([see comment](https://github.com/kubernetes/kubernetes/pull/11746/files#r37142628)).
* Help users avoid non-unique label selectors ([see this proposal](../../docs/design/selector-generation.md))
