Fine-tune a Large Language Model (LLM)

This guide walks you through fine-tuning an LLM on the MonsterAPI Platform: preparing your data, launching a fine-tuning job, tracking its progress, understanding billing, and deploying your model.

Demo Colab Notebooks with Python Client

  • LLM Finetuning: Open In Colab
  • LLM Finetuning + Quantise Model: Open In Colab
  • LLM Finetuning + Quantise + Deploy Model: Open In Colab
  • LLM Finetuning + Evaluation + Quantise + Deploy Model: Open In Colab

Fine-tuning Process

Supported Models for Fine-Tuning:

We support over 80 open-source LLMs, giving you a wide choice of models to customize. Supported models include:

  • meta-llama/Meta-Llama-3.1-70B-Instruct, meta-llama/Meta-Llama-3.1-70B, google/gemma-2-27b, google/gemma-2-27b-it, monsterapi/Meta-Llama-3-70B-Instruct_4bit_bnb
  • monsterapi/Meta-Llama-3-70B_4bit_bnb, monsterapi/Llama-2-70b-hf, monsterapi/CodeLlama-70b-hf_4bit_bnb, tiiuae/falcon-40b, mistralai/Mixtral-8x7B-Instruct-v0.1
  • mistralai/Mixtral-8x7B-v0.1, codellama/CodeLlama-34b-hf, monsterapi/Mixtral-8x7B-v0.1_4bit_bnb, mistralai/Mistral-7B-Instruct-v0.2, mistralai/Mistral-7B-v0.3
  • mistralai/Mistral-7B, meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-7b-hf, codellama/CodeLlama-13b-hf, meta-llama/Llama-2-13b-chat-hf
  • google/gemma-2-9b, google/gemma-2-9b-it, meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-8B, HuggingFaceTB/SmolLM-1.7B
  • HuggingFaceTB/SmolLM-1.7B-Instruct, HuggingFaceTB/SmolLM-360M, HuggingFaceTB/SmolLM-360M-Instruct, HuggingFaceTB/SmolLM-135M, HuggingFaceTB/SmolLM-135M-Instruct
  • facebook/opt-350m, apple/OpenELM-1_1B, apple/OpenELM-1_1B-Instruct, TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T, apple/OpenELM-450M-Instruct
  • apple/OpenELM-450M, apple/OpenELM-270M, apple/OpenELM-270M-Instruct, google/codegemma-7b
  • sarvamai/OpenHathi-7B-Hi-v0.1-Base, teknium/OpenHermes-7B, facebook/opt-1.3b
  • Qwen/Qwen2-0.5B, Qwen/Qwen2-0.5B-Instruct

To get the most up-to-date list of supported LLMs for fine-tuning, query our fine-tuning service contract API.
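
For example, you might fetch the contract with the requests library and read the supported model list from the response. The endpoint path below is an assumption for illustration only; check the MonsterAPI API reference for the exact URL.

```python
import requests

API_KEY = "YOUR_MONSTERAPI_KEY"

# NOTE: this path is illustrative; see the MonsterAPI API reference for the
# exact service-contract endpoint.
resp = requests.get(
    "https://api.monsterapi.ai/v1/finetune/service-contract",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
print(resp.json())  # inspect the returned contract for the supported model list
```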


Launching a Job

Learn how to initiate an LLM fine-tuning job on MonsterAPI, from creating a new job and selecting a model to preparing datasets and submitting your job. The launch guide covers every step needed for a successful run.


Preparing Your Dataset

Prepare your dataset correctly to ensure optimal model performance. The dataset guide covers the supported formats and provides examples to help you get started smoothly; refer to the custom datasets guide for additional information. A minimal example follows.
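
As a quick illustration, instruction-style datasets are commonly stored as JSON Lines, one record per line. The field names below follow a typical convention and are not the only accepted schema; see the dataset guide for the formats the platform actually accepts.

```python
import json

# A typical instruction-style record; the field names are illustrative, and
# the accepted schemas are listed in the dataset preparation guide.
records = [
    {
        "instruction": "Classify the sentiment of the following review.",
        "input": "The battery life on this laptop is fantastic.",
        "output": "positive",
    },
]

with open("train.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```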

Maximum Supported Cutoff Length for Models:

  • Models with ≤ 30 billion parameters: 4096 tokens
  • Models with > 30 billion parameters: 512 tokens
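
A quick way to sanity-check your data against these limits is to tokenize your longest sample with the target model's tokenizer, as in the sketch below (the sample text is a placeholder).

```python
from transformers import AutoTokenizer

# Count the tokens in a training sample to verify it fits the cutoff length
# (4096 tokens for models with <= 30B parameters).
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-1.7B")
sample = "### Instruction: Summarize the article below. ### Response: ..."  # placeholder
n_tokens = len(tokenizer(sample)["input_ids"])
print(f"{n_tokens} tokens (cutoff for this model: 4096)")
```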

Tracking Your Job

Monitor the progress of your fine-tuning job, view logs, track metrics using Weights & Biases (if enabled), and download your model weights upon completion.
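
If you are scripting around the platform, you can also poll the job status from your own code. The sketch below is illustrative only: the endpoint path and response fields are assumptions, so consult the API reference for the exact contract.

```python
import time
import requests

API_KEY = "YOUR_MONSTERAPI_KEY"
JOB_ID = "your-finetuning-job-id"

# NOTE: the endpoint path and response fields are assumptions for illustration;
# consult the MonsterAPI API reference for the exact contract.
STATUS_URL = f"https://api.monsterapi.ai/v1/finetune/{JOB_ID}/status"

while True:
    resp = requests.get(STATUS_URL, headers={"Authorization": f"Bearer {API_KEY}"})
    resp.raise_for_status()
    status = resp.json().get("status", "unknown")
    print("job status:", status)
    if status in ("completed", "failed"):
        break
    time.sleep(60)  # poll once a minute
```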


Billing

Understand the cost of fine-tuning jobs, including billing details, per-minute pricing, and how credits are applied. Keep an active payment method and subscription to avoid job interruptions.
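
For example, at an illustrative rate of $0.05 per GPU-minute (actual rates vary by GPU configuration), a job that runs for 2 hours would cost 120 minutes × $0.05 = $6.00, deducted from your credits.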

For questions, reach out to MonsterAPI Support.


Deploy your fine-tuned model easily with Monster Deploy.

After fine-tuning is completed, follow these steps to launch an inference API endpoint:

  1. Click on "Deploy model" option from the dropdown on your fine-tuned model's card.

  2. Ensure the configuration is correct.

  3. Click on "Deploy" button.

This is MonsterAPI's one-click deploy option: it launches an inference API endpoint for your fine-tuned model that can be queried using the OpenAI-compatible Chat Completions format.
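
Because the endpoint follows the OpenAI Chat Completions format, any OpenAI-compatible client can query it. In the sketch below, the base URL, API key, and model name are placeholders; use the values shown on your deployment's card once it is live.

```python
from openai import OpenAI

# The base URL, API key, and model name are placeholders; substitute the
# values shown on your deployment's card.
client = OpenAI(
    base_url="https://<your-deployment-id>.monsterapi.ai/v1",
    api_key="YOUR_DEPLOYMENT_AUTH_TOKEN",
)

response = client.chat.completions.create(
    model="<your-finetuned-model>",
    messages=[{"role": "user", "content": "Summarize LoRA fine-tuning in one sentence."}],
)
print(response.choices[0].message.content)
```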


Benefits of Monster Deploy:

  1. Open-Source LLMs: Deploy open-source LLMs as REST API endpoints.
  2. Fine-tuned LLM Deployment: Deploy LoRA adapters, with inference served by the vLLM project for enhanced performance.
  3. Custom Resource Allocation: Define GPU and RAM settings for efficient deployment.
  4. Multi-GPU Support: Allocate resources across up to 4 GPUs for handling large models.