Music Popularity: Decision Tree Model


Python, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Graphviz, NLTK

Final Project

Explored how audio features and metadata can predict music popularity on Spotify using tree-based machine learning models. The project involved cleaning a 110k+ song dataset, transforming features (genre, duration, loudness, valence, etc.), and building classifiers to predict whether a song is popular and a regressor to model its continuous popularity score. Key tasks included exploratory analysis, feature engineering, model comparison, and interpretation.

Applied techniques

  • One-hot encoding for categorical features (e.g., genre, mode, explicit flag)

  • Custom genre grouping to create a binary popular_genre indicator

  • IQR-based outlier removal and MinMax scaling for numeric features

  • Decision Tree Classifier and Random Forest Classifier for popularity prediction

  • Decision Tree Regressor for modeling the continuous popularity score

  • Feature importance extraction and decision tree visualization

  • Performance comparison between classification and regression approaches
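
The techniques listed above could be chained together roughly as in the sketch below. It is an illustration only, not the project's actual code: the data is synthetic, and the column names, the set of "popular" genres, the popularity threshold of 50, and the tree depth are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_text

# Synthetic stand-in for the Spotify data; every column and value here is hypothetical.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "genre": rng.choice(["pop", "rock", "jazz", "ambient"], size=n),
    "explicit": rng.integers(0, 2, size=n).astype(bool),
    "duration_ms": rng.normal(210_000, 40_000, size=n),
    "loudness": rng.normal(-8.0, 3.0, size=n),
    "valence": rng.uniform(0.0, 1.0, size=n),
    "popularity": rng.integers(0, 101, size=n),
})
numeric_cols = ["duration_ms", "loudness", "valence"]

# Custom genre grouping: which genres count as "popular" is an assumption.
POPULAR_GENRES = {"pop", "rock"}
df["popular_genre"] = df["genre"].isin(POPULAR_GENRES).astype(int)


def remove_iqr_outliers(frame, cols, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] in every column."""
    q1 = frame[cols].quantile(0.25)
    q3 = frame[cols].quantile(0.75)
    iqr = q3 - q1
    within = ((frame[cols] >= q1 - k * iqr) & (frame[cols] <= q3 + k * iqr)).all(axis=1)
    return frame[within]


clean = remove_iqr_outliers(df, numeric_cols).reset_index(drop=True)

# One-hot encode the categorical columns.
X = pd.get_dummies(
    clean.drop(columns=["popularity"]), columns=["genre", "explicit"], dtype=int
)
y_class = (clean["popularity"] >= 50).astype(int)  # assumed "popular" threshold
y_reg = clean["popularity"]                        # continuous score for regression

X_train, X_test, yc_train, yc_test, yr_train, yr_test = train_test_split(
    X, y_class, y_reg, test_size=0.2, random_state=0
)

# MinMax scaling, fitted on the training split only to avoid leaking test statistics.
X_train, X_test = X_train.copy(), X_test.copy()
scaler = MinMaxScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

# Two classifiers for the binary target, one regressor for the continuous score.
tree_clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, yc_train)
forest_clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, yc_train)
tree_reg = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X_train, yr_train)

print("Decision tree accuracy:", tree_clf.score(X_test, yc_test))
print("Random forest accuracy:", forest_clf.score(X_test, yc_test))
print("Decision tree regressor R^2:", tree_reg.score(X_test, yr_test))

# Feature importances and a text rendering of the top levels of the tree.
importances = pd.Series(tree_clf.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
print(export_text(tree_clf, feature_names=list(X.columns), max_depth=2))
```

Because the synthetic target is random, the printed scores carry no meaning; the sketch only shows how the steps fit together. Tree-based models are insensitive to monotonic feature scaling, so the MinMax step mainly matters if the same features are reused with other model types.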