Batch Jobs
You can run “batch jobs” (essentially ad-hoc jobs running on managed compute) using the `ark-batch-tool` command-line tool. It takes a configuration file that defines your job. For example:
```yaml
---
config:
  default:
    name: "Generate Pose Amendment"
    type: "Batch Job Result"
    container_image: "095412845506.dkr.ecr.us-east-1.amazonaws.com/djt-mapping-pipeline-ecr:latest"
    cpu_count: 2
    memory_size_mb: 8192
    commands:
      - command: ["https://ark-logs-rhq.s3.amazonaws.com/manifests/b0ab3324-b40e-486e-9620-3de6eb3f729f"]
      - command: ["https://ark-logs-rhq.s3.amazonaws.com/manifests/d6437eac-ce13-44d9-b95a-536ca4bcbce2"]
```
This configuration contains a few fields:

- `name` - The name of the batch job.
- `type` - The artifact type, which must be registered on your instance of the catalog.
- `container_image` - A reference to the container image that you wish to execute.
- `cpu_count` - The number of CPU cores to give your job.
- `memory_size_mb` - The maximum amount of memory (in MB) to give your job.
- `commands` - The list of commands that you wish to pass to your container.
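To make the shape of the configuration concrete, here is a minimal validation sketch. The field names come straight from the example above, but `validate_job_config` and the specific checks are illustrative assumptions, not part of `ark-batch-tool` itself (which performs its own validation):

```python
# Hypothetical validator for the batch-job config shown above.
# Field names are taken from the example; the constraints are assumptions.
REQUIRED_FIELDS = {"name", "type", "container_image",
                   "cpu_count", "memory_size_mb", "commands"}

def validate_job_config(job: dict) -> list:
    """Return a list of human-readable problems; empty means the config looks sane."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - job.keys())]
    if "cpu_count" in job and (not isinstance(job["cpu_count"], int) or job["cpu_count"] < 1):
        problems.append("cpu_count must be a positive integer")
    if "memory_size_mb" in job and (not isinstance(job["memory_size_mb"], int) or job["memory_size_mb"] < 1):
        problems.append("memory_size_mb must be a positive integer")
    if "commands" in job and not all(
        isinstance(c, dict) and isinstance(c.get("command"), list)
        for c in job["commands"]
    ):
        problems.append("each entry in commands must be a mapping with a 'command' list")
    return problems

# The example config from above, expressed as the equivalent Python dict.
example = {
    "name": "Generate Pose Amendment",
    "type": "Batch Job Result",
    "container_image": "095412845506.dkr.ecr.us-east-1.amazonaws.com/djt-mapping-pipeline-ecr:latest",
    "cpu_count": 2,
    "memory_size_mb": 8192,
    "commands": [
        {"command": ["https://ark-logs-rhq.s3.amazonaws.com/manifests/b0ab3324-b40e-486e-9620-3de6eb3f729f"]},
        {"command": ["https://ark-logs-rhq.s3.amazonaws.com/manifests/d6437eac-ce13-44d9-b95a-536ca4bcbce2"]},
    ],
}

print(validate_job_config(example))  # → [] (the example passes every check)
```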
Each command will be run in a separate instance. In this example, we will start two batch jobs, each getting its own ‘result artifact’, and each running the given container against its command-line arguments.
These result artifacts will be grouped into a collection, which lets you easily see all of your results (and whether they are in progress, failed, or completed successfully).
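The fan-out behavior can be sketched as follows. The `expand_jobs` helper, the per-job `result_artifact_id`, and the `status` values are assumptions made for illustration; they model the behavior described above, not the tool's actual API:

```python
import uuid

def expand_jobs(job_config: dict) -> list:
    """Hypothetical sketch: one job (and one result artifact) per `commands` entry."""
    jobs = []
    for entry in job_config["commands"]:
        jobs.append({
            "name": job_config["name"],
            "container_image": job_config["container_image"],
            "args": entry["command"],                 # passed to the container
            "result_artifact_id": str(uuid.uuid4()),  # each job gets its own result artifact
            "status": "in_progress",                  # later: "failed" or "completed"
        })
    return jobs

config = {
    "name": "Generate Pose Amendment",
    "container_image": "095412845506.dkr.ecr.us-east-1.amazonaws.com/djt-mapping-pipeline-ecr:latest",
    "commands": [
        {"command": ["https://ark-logs-rhq.s3.amazonaws.com/manifests/b0ab3324-b40e-486e-9620-3de6eb3f729f"]},
        {"command": ["https://ark-logs-rhq.s3.amazonaws.com/manifests/d6437eac-ce13-44d9-b95a-536ca4bcbce2"]},
    ],
}

# Two commands in the config, so two jobs, grouped as one collection of results.
collection = expand_jobs(config)
print(len(collection))  # → 2
```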
Otherwise, batch jobs resemble ingest jobs very closely, and you can check on their standard output and status as they run.