Classical Music Genre Predictor
Tools Used
Python, Python notebooks, Pandas, Scikit-learn, NumPy, Matplotlib
Code
Because this project is a Python notebook, it is my most visual one, so I recommend checking it out here.
Setup
For this project, I used a Kaggle dataset containing various stats for 50,000 songs. I started by removing the information that was not useful, which was mainly the songs' metadata. I then used inner table joins to enumerate the remaining string columns, replacing each string value with an integer code. While exploring the data, I found that the missing values appeared to be missing at random, which is why I replaced them with the median of each column. To finish preprocessing, I scaled each feature to the range 0 to 1.
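The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal illustration on a hypothetical four-row table, not the notebook's actual code: the column names (`artist_name`, `key`, `tempo`, `energy`) and the exact lookup-table approach to enumeration are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table standing in for the Kaggle song data; the column
# names are illustrative assumptions, not the real dataset schema.
songs = pd.DataFrame({
    "artist_name": ["A", "B", "C", "D"],    # metadata, dropped below
    "key": ["C", "C", "G", "D"],            # string column to enumerate
    "tempo": [120.0, np.nan, 90.0, 150.0],  # numeric column with a gap
    "energy": [0.8, 0.4, np.nan, 0.6],      # numeric column with a gap
})

# 1. Drop metadata columns that carry no predictive signal.
songs = songs.drop(columns=["artist_name"])

# 2. Enumerate the string column: build a lookup table that maps each
#    distinct value to an integer code, then inner-join it back on.
key_lookup = pd.DataFrame({"key": sorted(songs["key"].unique())})
key_lookup["key_code"] = range(len(key_lookup))
songs = songs.merge(key_lookup, on="key", how="inner").drop(columns=["key"])

# 3. Fill missing numeric values with each column's median
#    (a reasonable choice when values are missing at random).
numeric_cols = ["tempo", "energy"]
songs[numeric_cols] = songs[numeric_cols].fillna(songs[numeric_cols].median())

# 4. Min-max scale every column into the range [0, 1].
songs = (songs - songs.min()) / (songs.max() - songs.min())

print(songs)
```

Because the lookup table is built from the column's own unique values, the inner join keeps every row; a lookup table missing some value would silently drop those rows.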
Models Tried
Multi-layer Perceptron, Logistic Regression, K Nearest Neighbor, and Random Forest Classifier.
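A minimal scikit-learn sketch of comparing these four models might look like the following. It uses synthetic data from `make_classification` as a stand-in for the song features, and the hyperparameters (hidden layer sizes, `n_neighbors`, `n_estimators`) are illustrative assumptions rather than the values used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the preprocessed song features (assumption).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale features to [0, 1] using statistics from the training split only.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hyperparameters below are illustrative assumptions.
models = {
    "Multi-layer Perceptron": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbor": KNeighborsClassifier(n_neighbors=5),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data

# Print models from most to least accurate.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Keeping every model behind the same `fit`/`score` interface makes the comparison a single loop, so adding or removing a candidate model is a one-line change.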
Results
The multi-layer perceptron was the most accurate of the four models, at 96.5%. To improve it further, I dropped the two least predictive features and modified the two most predictive ones, which raised the accuracy to 96.8%.
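One way to identify and drop the least predictive features is permutation importance, sketched below. The source does not say how predictiveness was measured, so `permutation_importance` is a swapped-in technique, and the synthetic data, validation split, and MLP settings are assumptions. The step of modifying the most predictive features is not sketched, since its details are not described.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 10 features, 5 of them informative.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
# Hold out a validation split so feature selection never sees the test set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)

scaler = MinMaxScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(a) for a in (X_train, X_val, X_test))


def fit_mlp(features, labels):
    """Train the same (hypothetical) MLP configuration on a feature subset."""
    return MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0).fit(features, labels)


baseline = fit_mlp(X_train, y_train)
base_acc = baseline.score(X_test, y_test)

# Shuffle each feature on the validation set and record the mean accuracy drop.
result = permutation_importance(baseline, X_val, y_val, n_repeats=10, random_state=0)
drop = np.argsort(result.importances_mean)[:2]    # two least predictive features
keep = np.setdiff1d(np.arange(X.shape[1]), drop)  # all remaining feature indices

# Retrain on the reduced feature set and compare test accuracy.
reduced = fit_mlp(X_train[:, keep], y_train)
reduced_acc = reduced.score(X_test[:, keep], y_test)

print("dropped feature indices:", sorted(drop.tolist()))
print(f"test accuracy before: {base_acc:.3f}, after: {reduced_acc:.3f}")
```

Permutation importance is model-agnostic, so it works for an MLP, which has no built-in feature importances the way a random forest does.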
Takeaways
I found preprocessing to be the most rewarding part of this project. Using Pandas to turn a raw dataset into something usable was much harder than I expected, which forced me to learn new techniques, such as using table joins to enumerate string columns and strategies for handling missing data.