POST https://llm.monsterapi.ai/v1/chat/completions
This endpoint handles requests for generating chat completions based on a given input sequence using an AI model.
This endpoint behaves the same as the OpenAI `/v1/chat/completions` route, so the OpenAI Python client reference at https://github.com/openai/openai-python/blob/main/api.md can be used for detailed usage.
**Payload Structure**:
The request body must be a JSON object matching the `ChatCompletionRequest` model, which includes the following parameters:
- **messages** (List[ChatCompletionMessageParam]): A list of message objects, each containing role and content fields, defining the context of the chat.
- **model** (str): The model identifier to use for generating the completion.
- **frequency_penalty** (Optional[float]): Penalizes tokens in proportion to how often they have already appeared in the text, reducing repetition.
- **logit_bias** (Optional[Dict[str, float]]): A dictionary to adjust token probabilities.
- **logprobs** (Optional[bool]): Whether to include log probabilities for the tokens.
- **top_logprobs** (Optional[int]): The number of most likely tokens for which to return log probabilities at each position.
- **max_tokens** (Optional[int]): Maximum number of tokens in the completion.
- **n** (Optional[int]): Number of completions to generate for each prompt.
- **presence_penalty** (Optional[float]): Penalizes tokens that have already appeared in the text, increasing the likelihood of new tokens.
- **response_format** (Optional[ResponseFormat]): Format for the response output.
- **seed** (Optional[int]): Random seed for reproducibility.
- **stop** (Optional[Union[str, List[str]]]): A string or list of strings at which generation stops.
- **stream** (Optional[bool]): Whether to stream the response.
- **stream_options** (Optional[StreamOptions]): Options for streaming responses.
- **temperature** (Optional[float]): Sampling temperature, controlling randomness.
- **top_p** (Optional[float]): Nucleus sampling parameter.
- **tools** (Optional[List[ChatCompletionToolsParam]]): Tools to enhance the chat functionality.
- **tool_choice** (Optional[Union[Literal["none"], ChatCompletionNamedToolChoiceParam]]): Choice of tool to use.
- **user** (Optional[str]): Identifier for the user making the request.
- **best_of** (Optional[int]): Number of candidate completions to generate, from which the best are returned.
- **use_beam_search** (Optional[bool]): Whether to use beam search for generating completions.
- **top_k** (Optional[int]): Number of top tokens to consider.
- **min_p** (Optional[float]): Minimum probability threshold for token selection.
- **repetition_penalty** (Optional[float]): Penalty for repeating tokens.
- **length_penalty** (Optional[float]): Penalty for length of the response.
- **early_stopping** (Optional[bool]): Whether to stop generation early.
- **ignore_eos** (Optional[bool]): Whether to ignore the end-of-sequence token.
- **min_tokens** (Optional[int]): Minimum number of tokens in the response.
- **stop_token_ids** (Optional[List[int]]): Token IDs at which to stop generating further tokens.
- **skip_special_tokens** (Optional[bool]): Whether to skip special tokens in the response.
- **spaces_between_special_tokens** (Optional[bool]): Whether to include spaces between special tokens.
- **echo** (Optional[bool]): If true, prepends the new message with the last message if they belong to the same role.
- **add_generation_prompt** (Optional[bool]): If true, adds the generation prompt to the chat template.
- **add_special_tokens** (Optional[bool]): If true, adds special tokens to the prompt.
- **include_stop_str_in_output** (Optional[bool]): Whether to include the stop string in the output.
- **guided_json** (Optional[Union[str, dict, BaseModel]]): If specified, output follows the JSON schema.
- **guided_regex** (Optional[str]): If specified, output follows the regex pattern.
- **guided_choice** (Optional[List[str]]): Choices that the output must match exactly.
- **guided_grammar** (Optional[str]): If specified, output follows the context-free grammar.
- **guided_decoding_backend** (Optional[str]): Overrides the default guided decoding backend for this request.
- **guided_whitespace_pattern** (Optional[str]): Overrides the default whitespace pattern for guided JSON decoding.
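As a rough illustration of how the guided decoding parameters fit into a request body, the sketch below builds a payload that sets `guided_json`. The schema, the helper name `build_guided_payload`, and the chosen sampling values are hypothetical and exist only for this example; the parameter names come from the list above.

```python
import json

# Hypothetical JSON schema, used only for illustration.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "winner": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["winner", "year"],
}


def build_guided_payload(question: str, model: str) -> dict:
    """Build a request body whose output is constrained by guided_json."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": 150,
        "temperature": 0.0,
        # This sketch sets a single guided_* parameter and makes no claim
        # about how several of them interact.
        "guided_json": ANSWER_SCHEMA,
    }


payload = build_guided_payload(
    "Who won the world series in 2020?",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
)
body = json.dumps(payload)  # JSON text to send as the request body
```

The same shape should apply to `guided_regex`, `guided_choice`, or `guided_grammar` by swapping the key and value, assuming the service accepts them as described.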
**Usage Notes**:
- Ensure that all parameters are correctly formatted and within their specified ranges.
- The `user_info` dependency is used to authenticate and authorize requests.
- The endpoint supports both synchronous and asynchronous processing, with streaming options for real-time responses.
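When `stream` is true, the response arrives in chunks. The sketch below assumes the stream follows the OpenAI server-sent events convention (`data: {...}` lines ending with `data: [DONE]`), which this page implies but does not spell out. The function name `iter_stream_text` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text = delta.get("content")
            if text:
                yield text


# Hand-written sample lines, not captured from the real service.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_stream_text(sample))
```

In practice the lines would come from iterating over the decoded HTTP response rather than a list.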
Example request payload:
```json
{
"messages": [{"role": "user", "content": "Hello, who won the world series in 2020?"}],
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"temperature": 0.7,
"max_tokens": 150,
"top_p": 1.0,
"n": 1,
"stop": ["User:", "AI:"]
}
```
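A minimal sketch of wrapping the payload above in an HTTP request using only the Python standard library. The `Authorization: Bearer` header is an assumption about how the API key is passed, and `build_request` and `YOUR_API_KEY` are placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://llm.monsterapi.ai/v1/chat/completions"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Create a POST request carrying the JSON payload."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check your account documentation.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "messages": [
        {"role": "user", "content": "Hello, who won the world series in 2020?"}
    ],
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "temperature": 0.7,
    "max_tokens": 150,
    "top_p": 1.0,
    "n": 1,
    "stop": ["User:", "AI:"],
}
request = build_request(payload, "YOUR_API_KEY")
```

Sending it would then be a matter of passing `request` to `urllib.request.urlopen` with a timeout and reading the response body.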
The endpoint returns a JSON response with the generated chat completion. Ensure you handle authentication and error responses appropriately.
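One way to handle the response is sketched below. It assumes the body follows the OpenAI chat completion shape (`choices[0].message.content`) and that failures are signalled by HTTP status codes of 400 and above; neither is specified in detail on this page. `ChatCompletionError`, `extract_reply`, and the sample body are hypothetical.

```python
import json


class ChatCompletionError(Exception):
    """Raised when the response cannot be used (hypothetical helper type)."""


def extract_reply(status: int, body: str) -> str:
    """Return the first completion's text, or raise on an error response."""
    if status in (401, 403):
        raise ChatCompletionError(f"authentication failed (HTTP {status})")
    if status >= 400:
        raise ChatCompletionError(f"request failed (HTTP {status}): {body}")
    data = json.loads(body)
    choices = data.get("choices") or []
    if not choices:
        raise ChatCompletionError("response contained no choices")
    # Assumed OpenAI-style layout of each choice.
    return choices[0]["message"]["content"]


# Hand-written sample body, not real output from the service.
sample_body = json.dumps({
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Sample reply text."},
            "finish_reason": "stop",
        }
    ]
})
reply = extract_reply(200, sample_body)
```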