Batch job workloads
IBM Cloud® Code Engine is a fully managed, serverless platform that runs your containerized workloads, including batch job or application workloads. Learn about running batch jobs in IBM Cloud® Code Engine.
What are batch job workloads?
A job in Code Engine runs one or more instances of your executable code. Unlike applications, which handle HTTP requests, jobs are designed to run one time and exit, such that the resources to run the job workload are freed up.
You can scale batch jobs by defining multiple instances. Workloads can be split into parallel tasks to reduce compute time. You can set the job to run with automatic retry to address failing workloads within a job instance. You can trigger batch jobs manually, programmatically, and by events, such as Object Storage events.
When you create a job, you can specify the workload configuration information that is used each time that the job is run.
Typical batch job workloads include:
- machine learning model training
- analyzing files, such as voice analysis or image recognition
- compressing or decompressing files
- archiving information
What is the lifecycle of a batch job?
When you submit a batch job, it runs to completion. Typically, batch jobs retrieve input data, do computational work, and store the results in persistent data stores. When the batch job is completed, resources that are used to run the job are removed and no cost is incurred for any stand-by resources.
How do jobs compare to apps and functions?
Characteristic | App | Job | Function |
---|---|---|---|
Execution time (duration) | Long-running (10 minutes per request) | Long-running (up to 24 hours) | Short-running (2 minutes or less) |
Startup latency | Medium | Scheduled start | Low |
Termination | Run-continuously | Run-to-completion | Run-to-completion |
Invocation | On request or permanently running | Scheduled | On request, instant |
Programming Model | Container-based build and execution | Container-based build and execution | Language-specific source code files and dependency metadata |
Parallelism | Parallel execution, flexible | Low to medium parallel execution | High parallel execution |
Scale-out | Based on number of requests | Based on job workload definition | Based on events or direct invocations |
Optimized for | Long running, highly complex workload and on-demand scale-out | Scheduled or planned workloads with high resource demands | Startup time and rapid scale-out |
For more information, see Planning for Code Engine.
What are the key features of working with Code Engine batch jobs?
Review the following topics to learn more about working with batch jobs in Code Engine.
- Isolation
- Logging
- Queuing
- Retries
- Running batch jobs
- Scaling
- Status
- Submitting similar batch jobs
- Triggering batch jobs with events
Isolation
Code Engine is a multi-tenant, regional service. Tenants share the same network and compute infrastructure, and some management components are common to all tenants. However, tenants and their workloads are isolated from each other by using Code Engine projects. Code Engine prevents communication between projects, which isolates your workloads inside the multi-tenant environment. In addition, access controls at the resource level allow only authorized users to perform certain operations on project resources, such as applications or other Code Engine workloads.
For more information about workload isolation, see Code Engine workload isolation.
Logging
When you work with Code Engine jobs in the console with logging enabled, logs are forwarded to an IBM Cloud Logs service instance that is associated with your Code Engine project. In the IBM Cloud Logs service instance, the logs are indexed, enabling full-text search through all generated messages and convenient querying based on specific fields. See Getting logs for jobs.
System event information can also be helpful to troubleshoot problems when you run jobs. You can view system event information with the CLI. See Getting system event information for jobs.
For more information, see Viewing logs.
Queuing
Submitted batch jobs are automatically queued and are in a pending state until dispatched. Batch jobs are run based on the available compute resources as defined by your Code Engine project. Batch jobs that are in a pending status can be removed from the queue. You can monitor batch jobs that are pending, running, or completed by using the Code Engine console or command-line interface.
Retries
Each job instance runs to completion. However, the workload code might encounter an error during the job run. When a job instance completes with a nonzero return code, Code Engine restarts the job instance. You can set a retry limit to cap how many times a failed job instance is restarted.
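Because retries are driven by the return code of the job instance, the workload code needs to exit with a nonzero code when it fails. The following Python snippet is a minimal sketch of that pattern, not an official Code Engine sample. The names `run_instance` and `example_work` are hypothetical and stand in for your own entry point and workload.

```python
import sys
from typing import Callable


def run_instance(work: Callable[[], object]) -> int:
    """Run the workload and translate the outcome into a process return code."""
    try:
        work()
    except Exception as err:
        # Write the error to stderr so that it is captured with the job output.
        print(f"job instance failed: {err}", file=sys.stderr)
        return 1  # nonzero: the job instance is reported as failed
    return 0  # zero: the job instance completed successfully


def example_work() -> None:
    """Hypothetical placeholder for the real workload."""
    print("processing records")
```

To use the sketch, end the container entry point with `sys.exit(run_instance(example_work))` so that the return code reaches Code Engine.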
Running batch jobs
Whether your code exists as source in a local file or in a Git repository, or your code is a container image that exists in a public or private registry, Code Engine provides a streamlined way for you to run your code as a job.
You can create and run batch jobs in Code Engine in the following ways:
- Run an existing container image. Create a job and provide a reference to your image to use when you submit the job. For an example, see Create and run a job.
- Start with source code. If you are starting with source code that is located in a Git repository or on your local workstation, you can point to the location of your source and Code Engine takes care of building the image for you. See Create a job from repository source code and Create a job from local source code.
For more information, see Working with jobs.
Scaling
A job in Code Engine (batch job) consists of one or more job instances. While job instances run independently of each other, they run the same code. Suppose you have a database with 100 records to analyze. You can run your job such that each job instance analyzes 10 records. For example, the first job instance analyzes records 0 - 9, the second job instance analyzes records 10 - 19, and so on. In this example, each job instance receives the system-provided job input parameter JOB_INDEX as an environment variable that you can use to calculate the range of database records for that job instance to analyze. The number of job instances is specified when the batch job is submitted.
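The record-range calculation in this example can be sketched in Python as follows. This is an illustrative sketch only: the function name `record_range` and its default values (10 records per instance, 100 records in total) are assumptions taken from the example, and the fallback to index 0 is there only so that the script can be run locally without Code Engine.

```python
import os


def record_range(job_index: int,
                 records_per_instance: int = 10,
                 total_records: int = 100) -> range:
    """Return the record numbers that one job instance should analyze."""
    start = job_index * records_per_instance
    # Clamp the end so that the last instance never reads past the final record.
    end = min(start + records_per_instance, total_records)
    return range(start, end)


if __name__ == "__main__":
    # JOB_INDEX is provided to each job instance as an environment variable.
    # The default of "0" is an assumption for local test runs only.
    job_index = int(os.environ.get("JOB_INDEX", "0"))
    records = record_range(job_index)
    if records:
        print(f"Instance {job_index} analyzes records {records.start} - {records.stop - 1}")
    else:
        print(f"Instance {job_index} has no records to analyze")
```

With JOB_INDEX set to 3, the sketch reports records 30 - 39. An index beyond the last block of records yields an empty range, so surplus instances exit without doing any work.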
When you run a batch job, during the startup time, job instances might linger in a pending state as the instances are being started. Pending times for job instances can vary due to the system and network infrastructure, as the system adjusts to fulfill the resource demands related to this run of the job. If the system infrastructure is responding to a high usage demand, the result might be a multi-minute delay of the startup for jobs because job instances are provisioned just-in-time.
Status
A batch job is in the Succeeded status when all job run instances are completed.
A batch job is in the Failed status after one or more job run instances reach the retry limit. Also, if a job takes too long to complete, the job is in the Failed status when the maximum execution time is reached. For more information, see job status.
Submitting similar batch jobs
When you create a job, you can specify workload configuration information that is used each time that the job is run. To run a variation of the job with the same common configuration, you can specify additional parameters with the job submission. These parameters override the configured values for that specific job submission only.
Triggering batch jobs with events
Batch jobs can be submitted automatically based on events, such as periodic timers, Object Storage events, or Kafka topics.
Suppose you want to trigger your batch job automatically based on a subscription to an IBM Cloud Object Storage bucket that generates events whenever a file changes or is added to the Object Storage bucket. Review the Code Engine cos-event-job sample for information about how to create Object Storage events and use the events to trigger batch jobs.
For more information about using eventing with Code Engine jobs, see the following topics: Working with the Periodic timer (cron) event producer, Working with the IBM Cloud Object Storage event producer, and Working with the Kafka event producer.
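As a rough illustration of how job code might consume an event, the following Python sketch reads event attributes from environment variables. It assumes that Code Engine passes the event to the job run in variables named `CE_DATA`, `CE_SUBJECT`, and `CE_TYPE`. These names are not stated on this page, so treat them as assumptions and confirm them in the event producer topics before you rely on them. The function name `read_event` is hypothetical.

```python
import json
import os
from typing import Any, Dict, Mapping

# Assumed variable names; verify them against the event producer documentation.
EVENT_VARS = ("CE_DATA", "CE_SUBJECT", "CE_TYPE")


def read_event(env: Mapping[str, str]) -> Dict[str, Any]:
    """Collect the assumed event variables from the given environment mapping."""
    event: Dict[str, Any] = {name: env[name] for name in EVENT_VARS if name in env}
    raw = event.get("CE_DATA")
    if raw is not None:
        try:
            # Decode the payload when it is JSON.
            event["CE_DATA"] = json.loads(raw)
        except json.JSONDecodeError:
            pass  # Keep the payload as plain text when it is not JSON.
    return event


if __name__ == "__main__":
    event = read_event(os.environ)
    print(f"Received event attributes: {sorted(event)}")
```

Passing the environment in as a parameter, instead of reading `os.environ` inside the function, keeps the parsing logic easy to test outside of Code Engine.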
Looking for more code examples? Check out the Samples for IBM Cloud Code Engine GitHub repo.
How can I get started with batch jobs?
To create and run a simple Code Engine batch job with the icr.io/codeengine/firstjob sample image, see Running your first Code Engine job.
You can also try a batch job tutorial. See Running and updating jobs.
To dive deeper into working with batch jobs, see Working with jobs and job runs.