Classical Music Genre Predictor

Tools Used

Python, Python Notebooks, Pandas, Scikit-learn, Numpy, Matplotlib

Code

This is my most visual project because it lives in a Python notebook, so I recommend checking it out here.

Setup

For this project I grabbed a dataset from Kaggle containing various stats for 50,000 songs. I started by dropping the columns that were not useful, mainly the song metadata. I then used inner table joins to enumerate the remaining string columns, mapping each distinct string to an integer. While looking through the data, it became apparent that the missing values were missing at random, which is why I replaced them with the median. To finish preprocessing, I scaled each feature to the range 0 to 1.
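The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame, the column names (tempo, loudness, mode, genre), and the helper enumerate_column are all hypothetical stand-ins for the Kaggle data.

```python
import pandas as pd

# Toy stand-in for the Kaggle data; column names and values are hypothetical.
songs = pd.DataFrame({
    "tempo": [120.0, None, 90.0, 140.0],
    "loudness": [-5.0, -12.0, None, -8.0],
    "mode": ["Major", "Minor", "Major", "Minor"],
    "genre": ["Classical", "Rock", "Classical", "Jazz"],
})


def enumerate_column(df, column):
    """Swap a string column for integer codes via an inner join on a lookup table."""
    # Lookup table: one row per distinct string, numbered in sorted order.
    lookup = pd.DataFrame({column: sorted(df[column].unique())})
    lookup[column + "_id"] = range(len(lookup))
    # Inner join attaches the integer code to every matching row.
    merged = df.merge(lookup, on=column, how="inner")
    return merged.drop(columns=column)


encoded = enumerate_column(songs, "mode")

# Separate the label from the numeric features.
features = encoded.drop(columns="genre")

# Fill missing values with each column's median.
features = features.fillna(features.median())

# Min-max scale every feature into the range [0, 1].
scaled = (features - features.min()) / (features.max() - features.min())
print(scaled)
```

An inner join works for enumeration here because every string in the column appears in the lookup table, so no rows are lost in the merge.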

Models Tried

Multi-layer Perceptron, Logistic Regression, K-Nearest Neighbors, and Random Forest Classifier.

Results

The multi-layer perceptron ended up being the most accurate at 96.5%. I tried to optimize the model by dropping the two least predictive features and modifying the two most predictive features. After the changes, the accuracy reached 96.8%.
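One way to rank features and drop the two least predictive ones is sketched below. The write-up does not say how predictiveness was measured, so permutation importance is used here purely as an assumption, again on synthetic data with guessed hyperparameters. How the two most predictive features were modified is not specified, so that step is left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the preprocessed song features.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def make_mlp():
    # Illustrative settings, not the notebook's.
    return MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)


baseline = make_mlp().fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Permutation importance: how much accuracy falls when one feature is shuffled.
result = permutation_importance(
    baseline, X_test, y_test, n_repeats=5, random_state=0
)

# Sort feature indices from least to most important, then drop the bottom two.
order = np.argsort(result.importances_mean)
keep = np.sort(order[2:])

reduced = make_mlp().fit(X_train[:, keep], y_train)
reduced_acc = reduced.score(X_test[:, keep], y_test)

print(f"all features: {baseline_acc:.3f}")
print(f"two dropped:  {reduced_acc:.3f}")
```

Dropping features this way will not always help; comparing the two accuracies on a held-out split is what shows whether the change was worth keeping.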

Takeaways

I found the preprocessing to be the most rewarding part of this project. Working with Pandas to turn a raw dataset into something I could use turned out to be much harder than expected. It forced me to learn new techniques, such as using table joins for enumeration and strategies for handling missing data.

Ethan Harpster
Computer Science and Engineering Student