In this notebook we will classify a public dataset of transactions into a number of predefined categories. The approaches should carry over to any multiclass classification use case where transactional data has to be fitted into predefined categories, and by the end you should have a few options for dealing with both labelled and unlabelled datasets.
The different approaches we'll be taking in this notebook are:
- Zero-shot Classification: First we'll use zero-shot classification to put each transaction into one of five named buckets, using only a prompt for guidance
- Classification with Embeddings: Next we'll create embeddings for a labelled dataset, then train a traditional classification model on them to test how well they identify our categories
- Fine-tuned Classification: Lastly we'll produce a fine-tuned model trained on our labelled dataset to see how it compares with the zero-shot and embedding-based approaches
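
To make the zero-shot idea concrete before any model is called, here is a minimal sketch of the two pieces that approach needs: a prompt that lists the allowed buckets, and a parser that maps the model's free-text reply back to one of them. The category names, the prompt wording, and the helper names `build_zero_shot_prompt` and `parse_category` are all assumptions made for illustration; they are not the five buckets or the prompt used later in this notebook.

```python
from typing import Optional

# Hypothetical bucket names, for illustration only.
CATEGORIES = ["Groceries", "Travel", "Utilities", "Software", "Other"]


def build_zero_shot_prompt(description: str, categories: list[str]) -> str:
    """Build a prompt asking for exactly one of the given categories."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the transaction into exactly one of these categories:\n"
        f"{options}\n\n"
        f"Transaction: {description}\n"
        "Category:"
    )


def parse_category(response: str, categories: list[str]) -> Optional[str]:
    """Map a model reply to a known category, or None if it matches nothing."""
    # Ignore surrounding whitespace, a trailing period, and letter case.
    cleaned = response.strip().strip(".").casefold()
    for category in categories:
        if category.casefold() == cleaned:
            return category
    return None


prompt = build_zero_shot_prompt("Monthly electricity bill", CATEGORIES)
print(prompt)
```

Returning `None` for an unrecognised reply, rather than guessing, keeps off-list answers visible so they can be counted when the approaches are compared.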