Batch Jobs

You can run “batch jobs” (essentially ad-hoc jobs running on managed compute) using the ark-batch-tool command line tool. It takes a configuration file that defines your job. For example:

---
config:
  default:
    name: "Generate Pose Amendment"
    type: "Batch Job Result"
    container_image: "095412845506.dkr.ecr.us-east-1.amazonaws.com/djt-mapping-pipeline-ecr:latest"
    cpu_count: 2
    memory_size_mb: 8192
    commands:
      - command: ["https://ark-logs-rhq.s3.amazonaws.com/manifests/b0ab3324-b40e-486e-9620-3de6eb3f729f"]
      - command: ["https://ark-logs-rhq.s3.amazonaws.com/manifests/d6437eac-ce13-44d9-b95a-536ca4bcbce2"]
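To make the structure concrete, the example above can be loaded and inspected with an ordinary YAML parser. This is just an illustration of the file's shape, not part of ark-batch-tool itself; it assumes PyYAML is installed.

```python
import yaml  # PyYAML; assumed to be available, not bundled with ark-batch-tool

# The example configuration from above, embedded as a string for illustration.
CONFIG = """\
config:
  default:
    name: "Generate Pose Amendment"
    type: "Batch Job Result"
    container_image: "095412845506.dkr.ecr.us-east-1.amazonaws.com/djt-mapping-pipeline-ecr:latest"
    cpu_count: 2
    memory_size_mb: 8192
    commands:
      - command: ["https://ark-logs-rhq.s3.amazonaws.com/manifests/b0ab3324-b40e-486e-9620-3de6eb3f729f"]
      - command: ["https://ark-logs-rhq.s3.amazonaws.com/manifests/d6437eac-ce13-44d9-b95a-536ca4bcbce2"]
"""

job = yaml.safe_load(CONFIG)["config"]["default"]
print(job["name"])           # Generate Pose Amendment
print(len(job["commands"]))  # 2
```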

This configuration contains a few fields:

  • name - The name of the batch job.
  • type - The artifact type, which must be registered on your instance of the catalog.
  • container_image - A reference to the container image you wish to execute.
  • cpu_count - The number of CPU cores to allocate to your job.
  • memory_size_mb - The maximum amount of memory, in megabytes, to give your job.
  • commands - The list of commands to pass to your container.

Each command will be run in a separate instance. In this example, we will start two batch jobs, each getting its own ‘result artifact’, and each running the given container against its command line arguments.
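That fan-out can be sketched roughly as follows. The function and dictionary shapes here are illustrative assumptions, not ark-batch-tool internals: one job (and one result artifact) is created per entry in commands.

```python
import uuid

def launch_batch_jobs(job_config):
    """Sketch: start one job per entry in `commands`, each with its own result artifact."""
    jobs = []
    for entry in job_config["commands"]:
        jobs.append({
            "artifact_id": str(uuid.uuid4()),  # each job gets its own result artifact
            "image": job_config["container_image"],
            "args": entry["command"],          # passed to the container as its arguments
            "status": "in_progress",
        })
    return jobs

# Hypothetical two-command config, mirroring the example above.
example = {
    "container_image": "example-image:latest",
    "commands": [
        {"command": ["manifest-a"]},
        {"command": ["manifest-b"]},
    ],
}
jobs = launch_batch_jobs(example)
print(len(jobs))  # 2
```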

These result artifacts will be grouped into a collection, which allows you to easily see all of your results (and whether each is in progress, failed, or completed successfully).
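Conceptually, the collection view is a roll-up of per-artifact statuses. A minimal sketch of that roll-up, using status names that mirror the states mentioned above (the exact names the catalog uses are an assumption):

```python
from collections import Counter

def summarize_collection(artifacts):
    """Count the artifacts in each state: in progress, failed, or completed."""
    counts = Counter(a["status"] for a in artifacts)
    return {s: counts.get(s, 0) for s in ("in_progress", "failed", "completed")}

print(summarize_collection([
    {"status": "completed"},
    {"status": "in_progress"},
    {"status": "completed"},
]))  # {'in_progress': 1, 'failed': 0, 'completed': 2}
```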

In other respects, batch jobs closely resemble ingest jobs, and you can check their standard output and status as they run.