Dataset Preparation

Once you have chosen an LLM and are on the Dataset Preparation page:

Choose a Task and Dataset:

  1. Click the Select a Task dropdown and choose the type of task you are training for, such as:
  2. Then click the Choose Dataset dropdown. You have three options here:
    1. Choose a pre-curated HuggingFace dataset,
    2. Specify your own choice of HuggingFace dataset by selecting the "other" option, or
    3. Use your own dataset uploaded via the 'Dataset management' portal.

Using HuggingFace Datasets:

  • To use a pre-curated HuggingFace dataset, simply select it from the dropdown list.
  • To use an unlisted HuggingFace dataset, choose the "other" option and enter the name of the HuggingFace dataset to be fetched, as shown below:

Using your own Datasets:

If you have already uploaded a dataset using the 'Dataset management' portal, it will appear as an option under the "My Datasets" section in the dropdown, as shown below:

Prompt Configuration:

Upon selecting a dataset, you'll notice a section called Prompt Configuration.

This section needs to be modified based on your selected dataset.

For pre-curated HuggingFace Datasets:

If you select a pre-curated HuggingFace dataset, no prompt configuration is required because it is pre-filled. You can simply click "Next" to proceed.

For other HuggingFace Datasets or your own Dataset:

Replace the placeholders inside the brackets with the actual column names in your dataset that you want to use for fine-tuning:

For example, if our dataset looks like this:

We will replace these

  • {replace with instruction column name} and
  • {replace with response column name}

with

  • {prompt} and
  • {response}

respectively, i.e., the names of the target columns in our dataset.
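
To make the role of the placeholders concrete, here is a minimal Python sketch of how a prompt template with `{prompt}` and `{response}` could be checked against a dataset's column names and then filled in for one row. The template text, the helper names (`placeholder_names`, `missing_columns`, `render`), and the sample row are all hypothetical and for illustration only; the FineTuner's actual template and internal processing may differ.

```python
from string import Formatter

# Hypothetical template; the real text in the Prompt Configuration box may differ.
TEMPLATE = "### Instruction:\n{prompt}\n\n### Response:\n{response}"


def placeholder_names(template: str) -> list[str]:
    """Return the names used inside {...} in the template, in order."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def missing_columns(template: str, columns: list[str]) -> list[str]:
    """List the placeholders that do not match any dataset column."""
    return [name for name in placeholder_names(template) if name not in columns]


def render(template: str, row: dict[str, str]) -> str:
    """Fill the template with the values from one dataset row."""
    return template.format(**row)


# Illustrative row standing in for a dataset with `prompt` and `response` columns.
rows = [
    {"prompt": "Translate 'hello' to French.", "response": "Bonjour."},
]

# An empty list means every placeholder matches a column name.
print(missing_columns(TEMPLATE, list(rows[0])))
print(render(TEMPLATE, rows[0]))
```

A check like `missing_columns` is a quick way to catch a placeholder that was left unreplaced or misspelled before submitting the job, since a placeholder that matches no column cannot be filled.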

After making these changes, our updated data preparation window looks like this:

And we are done! The FineTuner will take care of the rest.

Simply click "Next" and finalize your fine-tuning job request.