In this guide, we’ll walk you through creating a dataset for training your custom model.

1

Prepare your dataset

Choose a dataset of image, video, or audio files for your custom model to learn from—ideally, one that captures the different states, preferences, or outcomes important to your application.

Each dataset must contain files of a single media type, such as all images, all videos, or all audio files.

Next, organize your files into labeled subfolders.

In this tutorial, we’ll put together a dataset of images with facial expressions classified as negative, neutral, or positive. This dataset can then be used to train a custom model for sentiment analysis. Start by creating a main folder called ‘User Sentiment’ with subfolders labeled ‘Negative,’ ‘Neutral or Ambiguous,’ and ‘Positive.’ Our platform will interpret these as labels for the images they contain.

Example Dataset Files
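If you prefer to create this folder layout from a script, the following is a minimal sketch using Python’s standard library. The function name `create_label_folders` and the parent location are assumptions made for illustration; only the folder and label names come from this tutorial.

```python
from pathlib import Path

# Label names taken from this tutorial's example dataset.
LABELS = ["Negative", "Neutral or Ambiguous", "Positive"]


def create_label_folders(parent: Path, dataset_name: str = "User Sentiment") -> Path:
    """Create the main dataset folder with one subfolder per label.

    Hypothetical helper for illustration; it is not part of the platform.
    """
    root = parent / dataset_name
    for label in LABELS:
        # parents=True also creates the main folder; exist_ok makes reruns safe.
        (root / label).mkdir(parents=True, exist_ok=True)
    return root
```

For example, `create_label_folders(Path.home() / "Desktop")` would create the ‘User Sentiment’ folder on your desktop, ready for you to copy images into each label subfolder.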

The amount of data you’ll need to build an accurate model depends on the complexity of your goal. Generally, it’s good practice to have a similar number of samples for each label you want to predict. You’ll also want to consider other forms of imbalance or bias in your dataset. File length, number of speakers, and language spoken can also affect the model’s predictive accuracy. To learn more, see our FAQ on building datasets.
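One way to check whether your labels have a similar number of samples is to count the files in each subfolder before uploading. The sketch below assumes the folder layout described above; the helper names `count_samples_per_label` and `imbalance_ratio` are hypothetical, and the ratio is only a rough indicator, not a threshold the platform enforces.

```python
from collections import Counter
from pathlib import Path


def count_samples_per_label(root: Path) -> Counter:
    """Count the files inside each label subfolder of the dataset folder."""
    counts = Counter()
    for label_dir in sorted(root.iterdir()):
        if label_dir.is_dir():
            # rglob("*") also reaches nested folders; hidden files are skipped.
            counts[label_dir.name] = sum(
                1
                for p in label_dir.rglob("*")
                if p.is_file() and not p.name.startswith(".")
            )
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Return the largest label count divided by the smallest.

    1.0 means perfectly balanced; an empty label yields infinity.
    """
    if not counts or min(counts.values()) == 0:
        return float("inf")
    return max(counts.values()) / min(counts.values())
```

Calling `count_samples_per_label(Path("User Sentiment"))` returns one count per label, and a ratio far above 1.0 suggests collecting more samples for the smaller labels.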

3

Create your dataset

Provide a title for your dataset. Then, add a column named after the category you are predicting and specify the data type for this column (categorical or numerical).

In our example, we can name the column ‘User Sentiment’ and select ‘Categorical’ as the data type.

Enter a name and category for your dataset

4

Upload the folder containing your dataset

Now, drag and drop the folder containing your dataset.

Remember, the folder should include subfolders for each label containing the corresponding samples.
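Before uploading, you may want to confirm that the folder really has this shape. The following sketch flags files that sit outside any label subfolder and label subfolders that contain no files. The helper name `find_structure_problems` is hypothetical, and how the platform itself handles these cases is not covered in this guide.

```python
from pathlib import Path


def find_structure_problems(root: Path) -> list[str]:
    """Return human-readable problems found in the dataset folder layout."""
    problems = []
    for entry in sorted(root.iterdir()):
        if entry.name.startswith("."):
            continue  # ignore hidden files such as .DS_Store
        if entry.is_file():
            # Every sample should live inside a label subfolder.
            problems.append(f"File outside any label subfolder: {entry.name}")
        elif not any(p.is_file() for p in entry.rglob("*")):
            problems.append(f"Label subfolder has no files: {entry.name}")
    return problems
```

An empty list means each top-level entry is a label subfolder with at least one file in it.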


In the pop-up window, assign a name to the label column, which represents the overall category you are predicting.

In our example, we can assign ‘User Sentiment’ as the name. Then, click the Save Labels and Continue button and approve the upload.

Assign a label name in the pop-up window

5

Verify your uploads

Check the total file count and address any detected issues.

Once you’re ready, click the Save button at the top right of the page.


If you accidentally uploaded a mixed-media dataset, a pop-up window will ask you to select the single file type you would like to keep.
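To catch a mixed-media dataset on your own machine, you could tally files by media type ahead of time. This sketch guesses the type from each file extension with Python’s standard `mimetypes` module, which is an assumption made for illustration; the platform’s own detection may work differently. The helper name `media_type_counts` is hypothetical.

```python
import mimetypes
from collections import Counter
from pathlib import Path

# The three media types this guide says a dataset may contain.
MEDIA_TYPES = {"image", "video", "audio"}


def media_type_counts(root: Path) -> Counter:
    """Tally files under root by media type guessed from the file extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file() or path.name.startswith("."):
            continue  # skip folders and hidden files
        guessed, _ = mimetypes.guess_type(path.name)
        # guessed looks like "image/jpeg"; keep only the part before the slash.
        top_level = guessed.split("/")[0] if guessed else "unknown"
        counts[top_level if top_level in MEDIA_TYPES else "unknown"] += 1
    return counts
```

If the result contains more than one key, the folder mixes media types (or holds files whose type could not be guessed), and it may be worth cleaning up locally first.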

Now, you’re ready to train your custom model!