POST https://llm.monsterapi.ai/v1/generate
Generate an output from the model.
The supported Large Language Models (LLMs) are:
1. TinyLlama/TinyLlama-1.1B-Chat-v1.0
2. microsoft/phi-2
3. HuggingFaceH4/zephyr-7b-beta
4. mistralai/Mistral-7B-Instruct-v0.2
Learn more about the New Gen LLMs here
The request body should be a JSON object with the following fields:
Parameter | Description | Default Value |
---|---|---|
model | Model to run the input on. The supported models are listed above. | - |
prompt | Formatted prompt to be fed as input to the model. Note: this value is expected to be an already formatted prompt (i.e., with the model's chat template applied). | - |
messages | Optional[List[dict]] of OpenAI-formatted messages. Example: messages = [ {"role": "user", "content": "What is your favourite condiment?"}, {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"}, {"role": "user", "content": "Do you have mayonnaise recipes?"}] When this input format is used, the model's prompt template is applied automatically. Note: this is not supported for microsoft/phi-2. | - |
max_tokens | An integer representing the maximum number of tokens to generate in the output. | - |
n | Number of outputs to generate / number of beams to use. Optional. | - |
best_of | Controls the number of candidate generations produced, from which the best is selected. Optional. | None |
presence_penalty | A float that penalizes new tokens based on their existing presence in the text. Encourages exploration of new topics and ideas. | 0.0 |
frequency_penalty | A float that decreases the likelihood of repetition of previously used words. The higher the penalty, the less likely repetition. | 0.0 |
repetition_penalty | A float that controls the penalty for token repetitions in the output. Values > 1 will penalize and decrease repetition likelihood. | 1.0 |
temperature | A float that controls randomness in the generation. Lower values are more deterministic, higher values encourage diversity. | 1.0 |
top_p | A float in the range [0,1] controlling the nucleus sampling method, which truncates the distribution to the top p%. | 1.0 |
top_k | An integer controlling the number of highest-probability vocabulary tokens to keep for top-k filtering. A value of -1 disables top-k filtering. | -1 |
min_p | A float representing the minimum probability a token must have, relative to the probability of the most likely token, to be considered for generation. Filters out very unlikely tokens. | 0.0 |
use_beam_search | Boolean indicating whether to use beam search for generation, which might provide better quality outputs at the expense of speed. | False |
length_penalty | A float that penalizes or rewards longer sequences. Values < 1 favor shorter sequences, and values > 1 favor longer ones. | 1.0 |
early_stopping | Boolean indicating whether to stop generation early once the end token is predicted. Makes generation faster and prevents overly long outputs. | False |
mock_response | Boolean indicating whether to return a mock response instead of running the model. Currently, only True is supported. | - |
Ensure that your input adheres to these parameters for optimal generation results. The model processes the input and generates text based on the configuration and content provided in 'input_variables'.
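As a sketch of how the parameters above fit together, the snippet below assembles a request body for the /v1/generate endpoint and shows where the HTTP call would go. The Bearer-token auth header and the `build_generate_payload` helper name are assumptions for illustration, not part of the documented API; consult your MonsterAPI dashboard for the actual credential format.

```python
# Sketch: building a /v1/generate request body with the documented defaults.
import json

API_URL = "https://llm.monsterapi.ai/v1/generate"

def build_generate_payload(model: str, prompt: str, **overrides) -> dict:
    """Assemble a JSON body; any keyword argument overrides a default.

    Hypothetical helper for illustration only.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 256,          # no documented default; set explicitly
        "temperature": 1.0,         # documented default
        "top_p": 1.0,               # documented default
        "top_k": -1,                # -1 disables top-k filtering
        "min_p": 0.0,               # documented default
        "repetition_penalty": 1.0,  # documented default
        "presence_penalty": 0.0,    # documented default
        "frequency_penalty": 0.0,   # documented default
    }
    payload.update(overrides)
    return payload

payload = build_generate_payload(
    "mistralai/Mistral-7B-Instruct-v0.2",
    "[INST] What is nucleus sampling? [/INST]",
    temperature=0.7,
)

# To actually send the request (requires a valid API key; auth scheme assumed):
# import requests
# resp = requests.post(
#     API_URL,
#     headers={"Authorization": "Bearer YOUR_API_KEY"},
#     json=payload,
# )
# print(resp.json())

print(json.dumps(payload, indent=2))
```

For chat-style input, replace the "prompt" key with a "messages" list as shown in the table above; the model's prompt template is then applied automatically (except for microsoft/phi-2, which does not support it).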