
Music Popularity: Decision Tree Model
Python, Pandas, NumPy, scikit-learn, Matplotlib, Seaborn, Graphviz, NLTK
Final Project
Explored how audio features and metadata can predict music popularity on Spotify using tree-based machine learning models. The project involved cleaning a dataset of more than 110k songs, transforming features (genre, duration, loudness, valence, etc.), and building classifiers to predict whether a song is popular and a regressor to model the continuous popularity score. Key tasks included exploratory analysis, feature engineering, model comparison, and interpretation.
Applied techniques
One-hot encoding for categorical features (e.g., genre, mode, explicit flag)
Custom genre grouping to create a binary popular_genre indicator
IQR-based outlier removal and MinMax scaling for numeric features
Decision Tree Classifier and Random Forest Classifier for popularity prediction
Decision Tree Regressor for modeling continuous popularity score
Feature importance extraction and decision tree visualization
Performance comparison between classification and regression approaches
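
The preprocessing and classification techniques listed above could be combined roughly as in the sketch below. It is not the project's actual code: the data is synthetic, and the column names, the POPULAR_GENRES set, the 1.5 x IQR fence, and the popularity threshold of 50 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the song dataset (all columns and values are assumed).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "genre": rng.choice(["pop", "rock", "jazz", "ambient"], size=n),
    "explicit": rng.choice([True, False], size=n),
    "duration_ms": rng.normal(210_000, 40_000, size=n),
    "loudness": rng.normal(-8.0, 3.0, size=n),
    "valence": rng.uniform(0.0, 1.0, size=n),
    "popularity": rng.integers(0, 101, size=n),
})
numeric_cols = ["duration_ms", "loudness", "valence"]


def remove_iqr_outliers(frame, cols, k=1.5):
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column."""
    q1 = frame[cols].quantile(0.25)
    q3 = frame[cols].quantile(0.75)
    iqr = q3 - q1
    within = ((frame[cols] >= q1 - k * iqr) & (frame[cols] <= q3 + k * iqr)).all(axis=1)
    return frame[within].copy()


clean = remove_iqr_outliers(df, numeric_cols)

# Hypothetical genre grouping: flag genres assumed to be mainstream.
POPULAR_GENRES = {"pop", "rock"}
clean["popular_genre"] = clean["genre"].isin(POPULAR_GENRES).astype(int)

# Hypothetical binary target: a song counts as popular at a score of 50 or more.
y = (clean["popularity"] >= 50).astype(int)

# One-hot encode the categorical features.
X = pd.get_dummies(
    clean.drop(columns=["popularity"]),
    columns=["genre", "explicit"],
    dtype=int,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
X_train, X_test = X_train.copy(), X_test.copy()

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Rank features by their impurity-based importance in the fitted tree.
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(f"Test accuracy: {accuracy:.3f}")
print(importances.head())
```

Because the synthetic labels are random, the accuracy here carries no meaning; the point is the order of the steps. Fitting the scaler on the training split alone keeps test-set statistics out of preprocessing, and swapping in DecisionTreeRegressor with the raw popularity column as the target would give the regression variant.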