How to Adjust Pod Resources for Suspended Kubernetes Jobs (v1.36+)

Introduction

In Kubernetes v1.36, a new beta feature allows you to modify CPU, memory, GPU, and extended resource requests and limits on a suspended Job. This is a game-changer for batch and machine learning workloads where resource requirements often depend on real-time cluster capacity and queue priorities. Previously, you'd have to delete and recreate a Job to change its resource spec, losing metadata and history. Now, you can adjust resources while the Job is paused and then resume it — without starting from scratch.

This step-by-step guide walks you through adjusting resources manually; the tips at the end cover pairing the feature with a queue controller like Kueue.

What You Need

  • A Kubernetes cluster running version v1.36 or later (the feature gate is enabled by default)
  • kubectl installed and configured to access your cluster
  • A suspended Job (or create one following Step 1)
  • Basic familiarity with Kubernetes Jobs and resource management

Step-by-Step Instructions

Step 1: Create or Identify a Suspended Job

If you don’t already have a suspended Job, create one that requests specific resources. The key is to set spec.suspend: true in the Job manifest. Below is an example of a machine learning training Job asking for 4 GPUs, 8 CPUs, and 32 GiB of memory:

apiVersion: batch/v1
kind: Job
metadata:
  name: training-job-example-abcd123
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: trainer
        image: example-registry.example.com/training:2026-04-23T150405.678
        resources:
          requests:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
          limits:
            cpu: "8"
            memory: "32Gi"
            example-hardware-vendor.com/gpu: "4"
      restartPolicy: Never

Apply this manifest with kubectl apply -f job.yaml.
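
If you want the API server to validate the manifest before anything is created, a server-side dry run works ahead of the real apply (a standard kubectl option, not specific to this feature):

kubectl apply -f job.yaml --dry-run=server
kubectl apply -f job.yaml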

Step 2: Confirm the Job Is Suspended

Run the following command to verify that the Job is in a suspended state:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.suspend}'

It should output true. You can also list all Jobs with kubectl get jobs; a suspended Job shows 0/1 in the COMPLETIONS column because no Pods have been started yet.
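
Another way to confirm suspension is to check the Job's status conditions; the Job controller records a condition of type Suspended once suspension takes effect:

kubectl get job training-job-example-abcd123 -o jsonpath='{.status.conditions[?(@.type=="Suspended")].status}'

This should also print True.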

Step 3: Modify the Resource Requests and Limits

While the Job is suspended, you can change its pod template resource fields. For example, if the cluster only has 2 GPUs available, adjust the requests and limits accordingly. Use kubectl patch, kubectl edit, or a direct update through the API. Here's how to patch the resource fields using kubectl's default strategic merge patch:

kubectl patch job training-job-example-abcd123 -p='{"spec":{"template":{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"},"limits":{"cpu":"4","memory":"16Gi","example-hardware-vendor.com/gpu":"2"}}}]}}}}'

This updates the Job’s pod template. The strategic merge patch matches the container entry by name, so only the resources fields change and the rest of the container spec (such as the image) is left untouched. Because the Job is suspended, this modification is allowed (the usual immutability constraint on the pod template is relaxed).
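
If you prefer to change individual fields explicitly, a JSON patch is an alternative way to express the same change; note that the slash in the extended resource name must be escaped as ~1 in JSON Pointer paths (this is generic kubectl patching, not required by the feature):

kubectl patch job training-job-example-abcd123 --type='json' -p='[
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/example-hardware-vendor.com~1gpu", "value": "2"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16Gi"},
  {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/example-hardware-vendor.com~1gpu", "value": "2"}
]'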

Step 4: Verify the Changes

Check that the resources have been updated correctly:

kubectl get job training-job-example-abcd123 -o yaml

Look under spec.template.spec.containers[0].resources — they should now show the adjusted values. No new Pods are created yet because the Job is still suspended.
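
To print only the fields that changed instead of scanning the full object, a narrower JSONPath query works as well:

kubectl get job training-job-example-abcd123 -o jsonpath='{.spec.template.spec.containers[0].resources}'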

Step 5: Resume the Job

Once you’re satisfied with the resource settings, unsuspend the Job by setting spec.suspend to false:

kubectl patch job training-job-example-abcd123 --type='merge' -p='{"spec":{"suspend":false}}'

Kubernetes will now launch the Pods using the updated resource specifications. You can monitor progress with:

kubectl get pods -l job-name=training-job-example-abcd123
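
If you would rather block until the Job finishes than poll, kubectl wait can watch for the Complete condition (adjust the timeout to your workload):

kubectl wait --for=condition=complete job/training-job-example-abcd123 --timeout=1h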

Step 6: Confirm Pod Resources

After the Job resumes, inspect one of the running Pods to ensure the new resources are applied (replace <pod-name> with one of the Pod names from the previous step):

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

The output should match the new values you set in Step 3. If everything looks good, you’ve successfully adjusted resources for a suspended Job.
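
If you don't want to copy a Pod name by hand, you can pick the first Pod via the Job's label selector and query it in one step (a small shell convenience, not part of the feature itself):

POD=$(kubectl get pods -l job-name=training-job-example-abcd123 -o jsonpath='{.items[0].metadata.name}')
kubectl get pod "$POD" -o jsonpath='{.spec.containers[0].resources}'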

Tips and Best Practices

  • Use a Queue Controller: For automatic resource tuning based on cluster load, integrate this feature with controllers like Kueue. They can dynamically adjust resources without requiring manual intervention.
  • Pay attention to limits vs requests: When reducing resources, adjust both requests and limits to match. A request higher than its limit is rejected by API validation, and limits are what is actually enforced at runtime, so mismatched values can cause unexpected behavior.
  • Extended Resources: This feature works with any resource type, including extended resources (e.g., nvidia.com/gpu). Just update the corresponding field in the patch.
  • Version Compatibility: The feature is beta in v1.36 and enabled by default. If you’re on an earlier alpha release (v1.35), you may need to enable the MutablePodResourcesForSuspendedJobs feature gate manually.
  • Avoid Frequent Changes: While you can update resources multiple times while the Job is suspended, keep changes minimal to reduce the risk of misconfiguration.
  • Backup Original Spec: Before patching, save a copy of the original Job manifest (kubectl get job ... -o yaml > original.yaml) so you can revert if needed; the sketch after this list shows this backup step combined with the patch-and-resume flow.
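
Putting the pieces together, here is a minimal end-to-end sketch of the manual flow from this guide (backup, patch while suspended, resume, then watch the Pods); it assumes the example Job name used above and adjusts only CPU and memory for brevity:

kubectl get job training-job-example-abcd123 -o yaml > original.yaml
kubectl patch job training-job-example-abcd123 -p='{"spec":{"template":{"spec":{"containers":[{"name":"trainer","resources":{"requests":{"cpu":"4","memory":"16Gi"},"limits":{"cpu":"4","memory":"16Gi"}}}]}}}}'
kubectl patch job training-job-example-abcd123 --type='merge' -p='{"spec":{"suspend":false}}'
kubectl get pods -l job-name=training-job-example-abcd123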

By following these steps, you can flexibly adjust resource allocations for batch and ML Jobs without losing metadata or history and without deleting and recreating Jobs. This feature streamlines workload scheduling in dynamic cluster environments.
