Job Status and Tracking

Once your job is validated and all prerequisites check out, it launches and appears under the "Previous Jobs" section.

A job can be in one of four states:

  • Launching
  • In Progress
  • Completed
  • Failed

When the job is In Progress:

A job typically takes 4-5 minutes after launch to enter the In Progress state.

During this phase, you can view job logs by selecting View Logs from the Select an Action menu on your Job card.

If you've provided WandB credentials, you can also track the job run using the "View Metrics" option.

The most recent adapter checkpoints can be downloaded while the job is running.

Checkpoints are saved four times per epoch. If the job runs for less than one epoch, checkpoints are saved four times over the entire run.
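To make the checkpoint rule concrete, here is a minimal sketch of how the save points could be computed. It assumes checkpoints are evenly spaced and measured in training steps. The function name checkpoint_steps and its parameters are hypothetical, chosen only for illustration; the platform's actual scheduling may differ.

```python
def checkpoint_steps(steps_per_epoch: int, total_steps: int, saves: int = 4) -> list[int]:
    """Return the training steps at which a checkpoint would be saved.

    Assumption: saves are evenly spaced. This is an illustration of the
    "four times per epoch" rule, not the platform's implementation.
    """
    if steps_per_epoch <= 0 or total_steps <= 0 or saves <= 0:
        raise ValueError("steps_per_epoch, total_steps and saves must be positive")

    if total_steps < steps_per_epoch:
        # Run is shorter than one epoch: spread the saves over the whole run.
        return sorted({max(1, total_steps * k // saves) for k in range(1, saves + 1)})

    steps = set()
    epoch_start = 0
    while epoch_start < total_steps:
        for k in range(1, saves + 1):
            step = epoch_start + max(1, steps_per_epoch * k // saves)
            # Skip save points that fall beyond the end of the run.
            if step <= total_steps:
                steps.add(step)
        epoch_start += steps_per_epoch
    return sorted(steps)


# Two full epochs of 100 steps each: four saves per epoch.
print(checkpoint_steps(steps_per_epoch=100, total_steps=200))
# [25, 50, 75, 100, 125, 150, 175, 200]

# A 40-step run that ends before the first epoch does: four saves over the run.
print(checkpoint_steps(steps_per_epoch=100, total_steps=40))
# [10, 20, 30, 40]
```

Under these assumptions, a short run still yields four checkpoints, so there is always an intermediate adapter to download before the job finishes.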

When the job is Completed:

On job completion, the Download option becomes available under Select an Action.

Selecting it downloads a zip file containing all the fine-tuned model files.
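Once the zip file is on disk, it can be unpacked with Python's standard zipfile module. The sketch below is illustrative only: the archive name finetuned_model.zip and the target directory are assumptions, so substitute whatever name your browser saved the download under.

```python
import zipfile
from pathlib import Path


def extract_model_archive(archive_path: str, target_dir: str) -> list[str]:
    """Unpack a downloaded model archive and return the names of its members."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical file name; use the actual name of your downloaded archive.
    archive = Path("finetuned_model.zip")
    if archive.exists():
        for name in extract_model_archive(str(archive), "finetuned_model"):
            print(name)
    else:
        print(f"{archive} not found; download it from the Job card first.")
```

Listing the extracted members is a quick way to confirm that the download completed and the archive is not truncated.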

If you provided HuggingFace credentials, the model files are also uploaded automatically to your specified HuggingFace repository.