The MonsterAPI Deploy service deploys an open-source or private LLM, optionally with a LoRA adapter, in a single request onto MonsterAPI's compute infrastructure.

The path to the base model must be a valid Hugging Face model ID. Our service is built on vLLM, so any model supported by vLLM is supported by this service.

Base models we have confirmed to work are as follows:

  1. Falcon (tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.)
  2. GPT-2 (gpt2, gpt2-xl, etc.) - Limited to 1xGPU
  3. GPT-J (EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.) - Limited to 1xGPU
  4. GPT-NeoX (EleutherAI/gpt-neox-20b, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.)
  5. LLaMA & LLaMA-2 (meta-llama/Llama-2-70b-hf, lmsys/vicuna-13b-v1.3, young-geng/koala, openlm-research/open_llama_13b, etc.)
  6. Mistral (mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.)
  7. MPT (mosaicml/mpt-7b, mosaicml/mpt-30b, etc.)
  8. OPT (facebook/opt-66b, facebook/opt-iml-max-30b, etc.)
  9. Qwen (Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.)
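To illustrate what a single deploy request might look like, here is a minimal sketch in Python. The field names (`basemodel_path`, `loramodel_path`, `per_gpu_vram`, `gpu_count`) and the endpoint URL are assumptions for illustration, not the official schema; consult the request parameters on this page for the exact fields.

```python
import json

# Hypothetical request payload -- field names are illustrative assumptions,
# not the confirmed API schema.
payload = {
    # Any vLLM-supported Hugging Face model ID from the list above.
    "basemodel_path": "mistralai/Mistral-7B-Instruct-v0.1",
    # Optional LoRA adapter (Hugging Face repo ID); leave empty for none.
    "loramodel_path": "",
    # GPU configuration; note that GPT-2 and GPT-J are limited to 1 GPU.
    "per_gpu_vram": 24,
    "gpu_count": 1,
}

# The actual call would be an authenticated POST, e.g. with `requests`:
#   requests.post("https://api.monsterapi.ai/v1/deploy/llm",  # assumed URL
#                 headers={"Authorization": "Bearer <API_KEY>"},
#                 json=payload)
print(json.dumps(payload, indent=2))
```

The payload above simply pairs a base model with an optional adapter, matching the "one request" deployment flow this service describes.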

Please note that we are working on adding more models to this list. If you have a specific model request, please reach out to us at
[email protected] or join our Discord server.
