MonsterAPI Deploy

Introduction

MonsterAPI's Deploy Service is your gateway to deploying large language models (LLMs) and Docker containers on MonsterAPI's scalable and compliant GPU cloud. Built on the open-source vLLM project, the Deploy service is designed to maximize throughput when serving a wide range of AI models, and it provides fast deployment through a throughput- and cost-optimized pipeline.

Quick Start Resources:

📝 Demo Notebooks:

Discover Colab notebooks with Monster Deploy integration on our Projects page 😊🚀!

Features Overview

Diverse Deployment Options:
- Deploy open-source LLMs as REST API endpoints.
- Deploy Docker containers with your choice of Docker images.
- Deploy finetuned LLMs by simply specifying LoRA adapters.
Increased Throughput: Built on the vLLM project, the service delivers higher throughput while serving requests.
Custom Resource Allocation: Define your custom GPU and RAM configurations.
Multi-GPU Support: Resource allocation for up to 4 GPUs to handle large AI models.

Supported deployment methods:

MonsterAPI Deploy currently supports deploying LLMs as REST API endpoints and any custom Docker image as a hosted Docker container on our low-cost, scalable, and secure GPU infrastructure.

Two groups of services are available through the API:

Deployment Services

/deploy/llm: Deploy an LLM as a REST API service with/without LoRA adapter.
/deploy/custom_image: Deploy a Docker container with any Docker image from your Docker registry.
/deploy/sdxl-dreambooth: Deploy an SDXL Dreambooth Gradio application with a finetuned model.
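
As a rough illustration of how the /deploy/llm route might be called, the sketch below builds a POST request without sending it. Only the route path comes from this page. The base URL, the bearer-token header, and the payload field names (base_model_path, lora_adapter_path, gpu_count) are assumptions made for illustration; check the API Reference for the real schema before using anything like this.

```python
import json
import urllib.request

# Assumption: placeholder host. The real base URL is listed in the API Reference.
BASE_URL = "https://api.example.invalid/v1"


def build_llm_deploy_request(api_key, base_model_path,
                             lora_adapter_path=None, gpu_count=1):
    """Build (but do not send) a POST request for the /deploy/llm route."""
    # This page states that up to 4 GPUs can be allocated.
    if not 1 <= gpu_count <= 4:
        raise ValueError("gpu_count must be between 1 and 4")

    # Hypothetical field names; the actual payload schema may differ.
    payload = {"base_model_path": base_model_path, "gpu_count": gpu_count}
    if lora_adapter_path is not None:
        payload["lora_adapter_path"] = lora_adapter_path

    return urllib.request.Request(
        url=BASE_URL + "/deploy/llm",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: bearer-token authentication.
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_llm_deploy_request("YOUR_API_KEY", "mistralai/Mistral-7B-v0.1")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before any GPU resources are requested. Passing the object to urllib.request.urlopen would submit it once the placeholders are replaced with real values.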

Finetuning Services

/finetune/llm: Finetune an LLM using LoRA/QLoRA.
/finetune/speech2text/whisper: Finetune a Whisper speech-to-text model.
/finetune/text2image/sdxl-dreambooth: Finetune Stable Diffusion models with Dreambooth.

This opens up an array of possibilities such as:

Quickly get an API endpoint that can serve text generation requests using models like Llama2 7B, CodeLlama 34B, or Falcon 40B for your AI projects.

Deploy Docker-based applications, such as Automatic1111 for a Stable Diffusion UI, by simply specifying a Docker image.

Finetune models with MonsterAPI's no-code LLM finetuner, then deploy them with their LoRA adapters to quickly get an API endpoint that serves requests from a domain-specific LLM finetuned on your datasets.

Resource Configurations

You can choose from a range of GPU vRAM configurations such as:

RAM Size (GB)	8	16	24	48	80

Our computing network is designed to keep enough GPU resources available to meet the specific demands of your AI projects, giving you the processing capability to tackle even the most complex tasks.
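
To get a feel for which configuration a model might need, the sketch below estimates the memory taken by the model weights and picks the smallest listed vRAM size that fits, adding GPUs only when a single one is not enough. The sizes (8, 16, 24, 48, 80 GB) and the 4-GPU limit come from this page. The 20% headroom for KV cache and activations, the treatment of each size as per-GPU memory, and the assumption that every size can be combined with every GPU count are illustrative guesses, not platform guarantees.

```python
VRAM_SIZES_GB = (8, 16, 24, 48, 80)  # sizes listed in the table above
MAX_GPUS = 4                         # multi-GPU limit stated on this page


def estimate_weight_gb(params_billions, bytes_per_param=2.0):
    """Weights-only memory: 1e9 parameters * bytes each, expressed in GB."""
    return params_billions * bytes_per_param


def pick_configuration(params_billions, bytes_per_param=2.0, headroom=1.2):
    """Return (gpu_count, vram_gb_per_gpu), or None if nothing fits.

    headroom is an assumed multiplier for KV cache and activations;
    real requirements depend on context length and batch size.
    """
    needed_gb = estimate_weight_gb(params_billions, bytes_per_param) * headroom
    # Prefer fewer GPUs, then the smallest size that still fits.
    for gpu_count in range(1, MAX_GPUS + 1):
        for vram_gb in VRAM_SIZES_GB:
            if vram_gb * gpu_count >= needed_gb:
                return gpu_count, vram_gb
    return None


# 7B parameters in 16-bit precision: 14 GB of weights, 16.8 GB with headroom.
print(pick_configuration(7))    # (1, 24)
# 34B parameters in 16-bit precision: 68 GB of weights, 81.6 GB with headroom.
print(pick_configuration(34))   # (2, 48)
```

Lowering bytes_per_param (for example to 1.0 for an 8-bit quantized model) shows how quantization can move a model onto a smaller configuration.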

Model Compatibility Criteria

This section outlines the foundational requirements and benchmarks that models need to meet to be successfully integrated into the Deploy Service platform. Ensuring compatibility guarantees seamless integration and optimal performance when deploying your models.

Base Model Path: Provide the path to a Hugging Face model. Ensure the path resolves to an existing model on the Hugging Face platform and that access to it is authenticated where required.
Vast Model Support: Leveraging vLLM technology, Deploy Service accepts any model supported by vLLM.

Curated List of Compatible Base Models

The following table lists the model architectures currently supported by vLLM, along with examples of popular models that utilize these architectures.

Architecture	Models	Example HuggingFace Models
AquilaForCausalLM	Aquila	BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.
ArcticForCausalLM	Arctic	Snowflake/snowflake-arctic-base, Snowflake/snowflake-arctic-instruct, etc.
BaiChuanForCausalLM	Baichuan & Baichuan2	baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc.
BloomForCausalLM	BLOOM, BLOOMZ, BLOOMChat	bigscience/bloom, bigscience/bloomz, etc.
ChatGLMModel	ChatGLM	THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc.
CohereForCausalLM	Command-R	CohereForAI/c4ai-command-r-v01, etc.
DbrxForCausalLM	DBRX	databricks/dbrx-base, databricks/dbrx-instruct, etc.
DeciLMForCausalLM	DeciLM	Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc.
FalconForCausalLM	Falcon	tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.
GemmaForCausalLM	Gemma	google/gemma-2b, google/gemma-7b, etc.
GPT2LMHeadModel	GPT-2	gpt2, gpt2-xl, etc.
GPTBigCodeForCausalLM	StarCoder, SantaCoder, WizardCoder	bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc.
GPTJForCausalLM	GPT-J	EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.
GPTNeoXForCausalLM	GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM	EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, etc.
InternLMForCausalLM	InternLM	internlm/internlm-7b, internlm/internlm-chat-7b, etc.
InternLM2ForCausalLM	InternLM2	internlm/internlm2-7b, internlm/internlm2-chat-7b, etc.
JAISLMHeadModel	Jais	core42/jais-13b, core42/jais-13b-chat, core42/jais-30b-v3, etc.
LlamaForCausalLM	LLaMA, Llama 2, Meta Llama 3, Meta Llama 3.1, Vicuna, Alpaca, Yi	meta-llama/Meta-Llama-3-8B-Instruct, meta-llama/Meta-Llama-3-70B-Instruct, etc.
MiniCPMForCausalLM	MiniCPM	openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, etc.
MistralForCausalLM	Mistral, Mistral-Instruct	mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.
MixtralForCausalLM	Mixtral-8x7B, Mixtral-8x7B-Instruct	mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, etc.
MPTForCausalLM	MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter	mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc.
OLMoForCausalLM	OLMo	allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc.
OPTForCausalLM	OPT, OPT-IML	facebook/opt-66b, facebook/opt-iml-max-30b, etc.
OrionForCausalLM	Orion	OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.
PhiForCausalLM	Phi	microsoft/phi-1_5, microsoft/phi-2, etc.
Phi3ForCausalLM	Phi-3	microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, etc.
Phi3SmallForCausalLM	Phi-3-Small	microsoft/Phi-3-small-8k-instruct, microsoft/Phi-3-small-128k-instruct, etc.
Phi3VForCausalLM	Phi-3-Vision	microsoft/Phi-3-vision-128k-instruct, etc.
QwenLMHeadModel	Qwen	Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.
Qwen2ForCausalLM	Qwen2	Qwen/Qwen2-beta-7B, Qwen/Qwen2-beta-7B-Chat, etc.
Qwen2MoeForCausalLM	Qwen2MoE	Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.
StableLmForCausalLM	StableLM	stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.
Starcoder2ForCausalLM	Starcoder2	bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.
XverseForCausalLM	Xverse	xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.

Note: Our base model list is always expanding. Stay tuned for more integrations!
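
One way to check a model against the table above before deploying is to look at the architectures field in the model's Hugging Face config.json and compare it with the supported architecture names. The sketch below does this for an inline sample config. The set contains only a few entries from the table for brevity, and the helper name is hypothetical. Whether the Deploy Service performs a check like this internally is not stated here.

```python
import json

# A small subset of the architecture names from the table above.
SUPPORTED_ARCHITECTURES = {
    "FalconForCausalLM",
    "GemmaForCausalLM",
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "MixtralForCausalLM",
    "Phi3ForCausalLM",
    "Qwen2ForCausalLM",
}


def find_supported_architecture(config):
    """Return the first supported architecture named in a config, else None."""
    # Hugging Face config.json files list model classes under "architectures".
    for name in config.get("architectures") or []:
        if name in SUPPORTED_ARCHITECTURES:
            return name
    return None


# Sample standing in for the contents of a model's config.json.
sample_config = json.loads(
    '{"architectures": ["MistralForCausalLM"], "model_type": "mistral"}'
)
print(find_supported_architecture(sample_config))
```

A result of None means the architecture is not in the local set, which may simply be out of date, so the current vLLM model list remains the authority.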

Beta Phase & Feedback

The Deploy Service is still in beta. We are keen to refine and enhance the platform based on your feedback.

Get Beta Access: Sign up here for beta access and get free credits to try out the Deploy service.

Guides & Tutorials

API Reference: Visit here
API CURL Usage: MonsterAPI Deploy CURL tutorial
Python Client Usage CookBooks: MonsterAPI Deploy Service Python Tutorial
Demo Notebooks: Quick Start Notebooks

Connect & Explore

Your feedback and insights are a cornerstone of our development.

📧 Email: [email protected]
🤖 Discord: Join our community on Discord