Generate an Output from the Model

The supported Large Language Models (LLMs) are:

1. TinyLlama/TinyLlama-1.1B-Chat-v1.0
2. microsoft/phi-2
3. HuggingFaceH4/zephyr-7b-beta
4. mistralai/Mistral-7B-Instruct-v0.2

📘 Learn more about the New Gen LLMs here.

The request body should be a JSON object with the following fields:

| Parameter | Description | Default Value |
| --- | --- | --- |
| `model` | Model to run the input on. Supported models are listed above. | - |
| `prompt` | Formatted prompt to be fed as input to the model. Note: the value of this field is expected to be an already-formatted prompt. | - |
| `messages` | `Optional[List[dict]]`. OpenAI-formatted messages (see the example below). When this input format is used, the model's prompt template is applied automatically. Note: this is not supported for `microsoft/phi-2`. | - |
| `max_tokens` | An integer representing the maximum number of tokens to generate in the output. | - |
| `n` | Number of outputs to generate / number of beams to use. Optional. | - |
| `best_of` | Controls the number of candidate generations to produce, from which the best is selected. Optional. | None |
| `presence_penalty` | A float that penalizes new tokens based on their existing presence in the text. Encourages exploration of new topics and ideas. | 0.0 |
| `frequency_penalty` | A float that decreases the likelihood of repeating previously used words. The higher the penalty, the less likely repetition is. | 0.0 |
| `repetition_penalty` | A float that controls the penalty for token repetitions in the output. Values > 1 penalize and decrease the likelihood of repetition. | 1.0 |
| `temperature` | A float that controls randomness in generation. Lower values are more deterministic; higher values encourage diversity. | 1.0 |
| `top_p` | A float in the range [0, 1] controlling nucleus sampling, which truncates the distribution to the smallest set of top tokens whose cumulative probability reaches p. | 1.0 |
| `top_k` | An integer controlling the number of highest-probability vocabulary tokens to keep for top-k filtering. | -1 |
| `min_p` | A float representing the minimum probability for a token to be considered, relative to the probability of the most likely token. | 0.0 |
| `use_beam_search` | Boolean indicating whether to use beam search for generation, which may provide better-quality outputs at the expense of speed. | False |
| `length_penalty` | A float that penalizes or rewards longer sequences. Values < 1 favor shorter sequences; values > 1 favor longer ones. | 1.0 |
| `early_stopping` | Boolean indicating whether to stop generation early once the end token is predicted. Makes generation faster and prevents overly long outputs. | False |
| `mock_response` | Boolean indicating whether a mock response is generated. Currently, only True is supported. | - |

Example of the `messages` format:

```python
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
```
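For a concrete illustration, the sketch below combines several of the parameters above into a prompt-based request body. The parameter names come from the table; the specific prompt template string and the parameter values are illustrative assumptions, not prescribed by this reference.

```python
# Sketch of a prompt-based request body using the parameters documented above.
# The `prompt` field expects a pre-formatted prompt, so the chat template is
# applied by hand here. The exact template shown is illustrative (an assumption);
# use the template appropriate to your chosen model.
payload = {
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "prompt": "<|user|>\nWhat is your favourite condiment?</s>\n<|assistant|>\n",
    "max_tokens": 128,
    "temperature": 0.7,          # lower = more deterministic output
    "top_p": 0.9,                # nucleus sampling threshold
    "repetition_penalty": 1.1,   # > 1 discourages repeated tokens
}
```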

Ensure that your input adheres to these parameters for optimal generation results. The model will process the input and generate text based on the configuration and content provided in the request body.
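To send a request end to end, a minimal sketch using Python's `requests` library is shown below. The endpoint URL is a placeholder (an assumption, not part of this reference); the Bearer authorization header matches this API's auth scheme, and the `messages` format is used so the model's prompt template is applied automatically.

```python
import requests

API_URL = "https://api.example.com/v1/generate"  # placeholder endpoint (assumption)
API_TOKEN = "YOUR_API_TOKEN"

payload = {
    "model": "mistralai/Mistral-7B-Instruct-v0.2",
    # OpenAI-style `messages` auto-applies the model's prompt template
    # (not supported for microsoft/phi-2, per the table above).
    "messages": [
        {"role": "user", "content": "Do you have mayonnaise recipes?"},
    ],
    "max_tokens": 256,
    "temperature": 0.8,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```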
