Active Learning Methods for Annotating Training Sets

Abstract: Active learning is a machine learning approach that identifies which data points most need human annotation, reducing the cost and time of data collection while maintaining high accuracy. A model is first trained on a small labeled set and then used to predict labels for the unlabeled pool. The samples on which the model is most uncertain are sent for annotation and added to the training set, and the model is retrained. This cycle repeats until the desired performance is reached. Active learning has proven beneficial across many machine learning tasks, including image classification and natural language processing tasks such as text classification and entity recognition, particularly where annotation is resource-intensive. In this study, we investigate the application of active learning to the CIFAR-10, EuroSAT, and Fashion-MNIST datasets, comparing uncertainty-based query strategies: least confidence, margin sampling, and entropy sampling. Our findings show that these strategies improve model performance compared with random sampling, underscoring the effectiveness of active learning for image classification across diverse datasets.
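The query step described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation. It assumes the classifier outputs a softmax probability matrix of shape (n_samples, n_classes) for the unlabeled pool, and the function names (`least_confidence`, `margin`, `entropy`, `select_queries`, `random_queries`) are hypothetical. Each strategy maps the probabilities to an uncertainty score, and the k highest-scoring unlabeled samples are sent for annotation. The random baseline picks k samples uniformly.

```python
import numpy as np

# Hypothetical helpers; assume `probs` is a softmax output of shape (n, n_classes).

def least_confidence(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = negative gap between the top two class probabilities.

    A small gap between the best and runner-up classes gives a high score.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]  # columns: [second best, best]
    return -(top2[:, 1] - top2[:, 0])

def entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy of the predicted class distribution."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_queries(probs, unlabeled_idx, k, strategy):
    """Return the pool indices of the k most uncertain unlabeled samples."""
    scores = strategy(probs[unlabeled_idx])
    order = np.argsort(-scores, kind="stable")[:k]  # highest uncertainty first
    return unlabeled_idx[order]

def random_queries(unlabeled_idx, k, rng):
    """Random-sampling baseline: k distinct unlabeled indices."""
    return rng.choice(unlabeled_idx, size=k, replace=False)

if __name__ == "__main__":
    probs = np.array([
        [0.90, 0.05, 0.05],  # confident
        [0.40, 0.35, 0.25],  # uncertain
        [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
        [0.60, 0.30, 0.10],
    ])
    pool = np.arange(len(probs))
    for fn in (least_confidence, margin, entropy):
        print(fn.__name__, select_queries(probs, pool, 2, fn).tolist())
    print("random", random_queries(pool, 2, np.random.default_rng(0)).tolist())
```

On this toy pool all three strategies query samples 2 and 1, the two with the flattest distributions. In a full loop, the selected samples would be labeled, removed from `unlabeled_idx`, and the model retrained before the next round.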

Keywords: Active Learning, Human Labeling, Least Confidence, Margin Sampling, Entropy Sampling, CIFAR-10, EuroSAT, CNN, Fashion-MNIST.

DOI: 10.17148/IARJSET.2024.11459