Abstract: This paper compares several dimensionality reduction techniques in combination with various classification techniques. The dimensionality reduction techniques considered are Principal Component Analysis (PCA), Independent Component Analysis (ICA), Truncated Singular Value Decomposition (TSVD), Latent Semantic Indexing (LSI) and Random Projection (RP). They are mainly used for feature extraction. Their main goals are to reduce noise and redundancy in the data and the memory or disk space needed to store it; they also help mitigate over-fitting and make high-dimensional data easier to visualize. Artificial Neural Networks (ANN), Support Vector Machines (SVM), Naïve Bayes, k-Nearest Neighbours (k-NN) and Random Forest (RF) are some of the commonly used supervised learning models for classification. Depending on the model, their advantages include fast training, good predictive accuracy on unseen data and low memory usage. This paper applies PCA, ICA and TSVD for dimensionality reduction and ANN, SVM and RF for classification on the DonorsChoose.org dataset, where the task is to predict whether a project is A+ or not. The results show that ICA combined with RF gives the best accuracy.
Keywords: Principal Component Analysis, Independent Component Analysis, Truncated Singular Value Decomposition, Support Vector Machine, Artificial Neural Network, Random Forest.
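The experimental design summarized in the abstract (every dimensionality reduction technique paired with every classifier, then compared by accuracy) can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' code: it assumes synthetic data from make_classification in place of the DonorsChoose.org features, and the helper name compare, n_components=10 and all classifier hyperparameters are hypothetical choices made only for the example.

```python
# Minimal sketch of a "reducer x classifier" comparison grid.
# Assumptions (hypothetical, not from the paper): synthetic data stands in for
# the DonorsChoose.org features; n_components and hyperparameters are illustrative.
from itertools import product

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, FastICA, TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare(n_components=10, random_state=0):
    """Return mean 5-fold CV accuracy for each (reducer, classifier) pair."""
    X, y = make_classification(
        n_samples=400, n_features=50, n_informative=12, random_state=random_state
    )
    # Factories so each pipeline gets fresh, unfitted estimators.
    reducers = {
        "PCA": lambda: PCA(n_components=n_components, random_state=random_state),
        "ICA": lambda: FastICA(n_components=n_components, max_iter=1000,
                               random_state=random_state),
        # TruncatedSVD does not center the data, unlike PCA.
        "TSVD": lambda: TruncatedSVD(n_components=n_components,
                                     random_state=random_state),
    }
    classifiers = {
        "ANN": lambda: MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                                     random_state=random_state),
        "SVM": lambda: SVC(kernel="rbf"),
        "RF": lambda: RandomForestClassifier(n_estimators=100,
                                             random_state=random_state),
    }
    scores = {}
    for (r_name, make_r), (c_name, make_c) in product(reducers.items(),
                                                      classifiers.items()):
        # Reduction and scaling are fitted inside each CV fold by the pipeline,
        # so no information from the test fold leaks into training.
        pipe = make_pipeline(make_r(), StandardScaler(), make_c())
        scores[(r_name, c_name)] = float(cross_val_score(pipe, X, y, cv=5).mean())
    return scores
```

Calling `scores = compare()` and then `max(scores, key=scores.get)` gives the best-scoring pair for this synthetic data only; it says nothing about which combination wins on the DonorsChoose.org data. The scaler is placed after the reducer so that TSVD still operates on uncentered features, which is the property that distinguishes it from PCA.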