Create Completion

This endpoint generates text: submit a structured CompletionRequest and the model returns one or more completions of the supplied prompt.

This endpoint behaves the same as the OpenAI `/v1/completions` route, so the OpenAI API reference can be consulted for detailed usage.

**Payload Structure and Parameters**:
- **model** (str): The identifier of the AI model to be used.
- **prompt** (Union[List[int], List[List[int]], str, List[str]]): Input prompt for the model, which can be text or token IDs.
- **best_of** (Optional[int]): Generates `best_of` completions server-side and returns the `n` best ones; must be greater than or equal to `n`.
- **echo** (Optional[bool]): If set to true, includes the prompt in the response.
- **frequency_penalty** (Optional[float]): Positive values penalize tokens in proportion to how often they have already appeared in the generated text, reducing verbatim repetition.
- **logit_bias** (Optional[Dict[str, float]]): Adjustments to the probabilities of specified tokens.
- **logprobs** (Optional[int]): The number of most-likely tokens to return log probabilities for at each position.
- **max_tokens** (Optional[int]): Maximum number of tokens allowed in the completion.
- **n** (int): Number of completions to generate.
- **presence_penalty** (Optional[float]): Positive values penalize tokens that have already appeared in the generated text, encouraging the model to introduce new topics.
- **seed** (Optional[int]): Seed used for generating deterministic responses.
- **stop** (Optional[Union[str, List[str]]]): One or more sequences at which the model stops generating; the stop sequence itself is not included in the output.
- **stream** (Optional[bool]): If true, the response is streamed as it is generated.
- **stream_options** (Optional[StreamOptions]): Additional options for streaming responses.
- **suffix** (Optional[str]): Text that comes after the completion, for insertion-style generation.
- **temperature** (Optional[float]): Controls the randomness of the completion.
- **top_p** (Optional[float]): Nucleus sampling parameter to control the diversity of the response.
- **user** (Optional[str]): User identifier for tracking or customization purposes.
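As a sketch of how these fields fit together, the helper below assembles a request body from the parameters listed above (the helper name is hypothetical; the endpoint URL and authentication details depend on your deployment and are not shown):

```python
import json

def build_completion_request(model, prompt, **options):
    """Assemble a /v1/completions-style request body.

    `options` may carry any optional field from the list above
    (max_tokens, temperature, n, stop, stream, ...).
    """
    payload = {"model": model, "prompt": prompt}
    # Drop unset options so the server applies its own defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload

body = build_completion_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "Explain the theory of relativity:",
    max_tokens=150,
    temperature=0.7,
    n=1,
    echo=None,  # unset -> omitted from the payload
)
print(json.dumps(body, indent=2))
```

The resulting body would be POSTed as JSON to the completions endpoint with an appropriate authorization header.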

**Advanced Sampling Parameters**:
- **use_beam_search**, **top_k**, **min_p**, **repetition_penalty**, **length_penalty**, **early_stopping**: Various parameters to fine-tune the model's text generation process.

- **stop_token_ids** (Optional[List[int]]): Token IDs that trigger the model to stop generation.
- **ignore_eos** (Optional[bool]): Whether to ignore the end of sequence token.
- **min_tokens** (Optional[int]): Minimum token count for the completion.
- **skip_special_tokens** (Optional[bool]): Whether to remove special tokens (such as EOS) from the output text.
- **spaces_between_special_tokens** (Optional[bool]): Include spaces between special tokens.
- **truncate_prompt_tokens** (Optional[int]): Truncate the prompt tokens to a specified count if needed.
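To make the stop semantics concrete, here is a toy decode loop showing how **stop_token_ids**, **min_tokens**, and **ignore_eos** could interact. This is a sketch under assumed semantics, not the server's actual implementation, and `EOS_ID` is a hypothetical token id:

```python
EOS_ID = 2  # assumed end-of-sequence token id

def generate(step_tokens, stop_token_ids=(), min_tokens=0,
             ignore_eos=False, max_tokens=16):
    """Collect tokens until a stop condition fires.

    Stop conditions (EOS, stop_token_ids) are only honored once at
    least min_tokens have been emitted; ignore_eos disables the EOS
    check entirely.
    """
    out = []
    for tok in step_tokens[:max_tokens]:
        if len(out) >= min_tokens:
            if tok == EOS_ID and not ignore_eos:
                break
            if tok in stop_token_ids:
                break
        out.append(tok)
    return out

# EOS at position 2 is kept while fewer than min_tokens are out;
# the later EOS terminates generation.
print(generate([5, 7, 2, 9, 13, 2], min_tokens=4))  # [5, 7, 2, 9, 13]
```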

**Guided Decoding Options**:
- **include_stop_str_in_output**, **response_format**, **guided_json**, **guided_regex**, **guided_choice**, **guided_grammar**, **guided_decoding_backend**, **guided_whitespace_pattern**: These settings allow for stringent control over the formatting and structure of the generated output, adhering to specified patterns, grammars, or formats.
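As a hedged example, a request constraining output with **guided_regex** might look like the following; since no server call is made here, the completion value is a stand-in, and the point is that a guided completion is guaranteed to match the pattern (which a client can still verify locally):

```python
import re

# Hypothetical request body using the guided_regex option: the server
# constrains sampling so the completion matches the pattern.
pattern = r"(yes|no)"
body = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "prompt": "Is water wet? Answer yes or no:",
    "max_tokens": 5,
    "guided_regex": pattern,
}

completion = "yes"  # stand-in for the text a guided request would return
assert re.fullmatch(pattern, completion)
```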

**Example Usage**:
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "prompt": "Explain the theory of relativity:",
    "max_tokens": 150,
    "temperature": 0.7,
    "n": 1

The endpoint returns a JSON response containing the generated text (or a stream of server-sent events when `stream` is true). All requests must be authenticated and authorized.
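When `stream` is true, the response arrives as OpenAI-style server-sent events: each event line is `data: <json>` and the stream ends with `data: [DONE]`. The sketch below parses canned event lines standing in for a live response body:

```python
import json

# Canned SSE lines; a real client would read these from the HTTP response.
sse_lines = [
    'data: {"choices": [{"text": "Relativity ", "index": 0}]}',
    'data: {"choices": [{"text": "describes spacetime.", "index": 0}]}',
    'data: [DONE]',
]

text = ""
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":  # sentinel marking end of stream
        break
    chunk = json.loads(payload)
    text += chunk["choices"][0]["text"]

print(text)  # Relativity describes spacetime.
```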
Click Try It! to start a request and see the response here!