Video Content Performance & Trend Prediction Analysis
Tools: Python, Random Forest, Logistic Regression, KMeans, LDA Topic Modeling
Final Project
In this project, I analyzed over 3 million YouTube trending video records across 113 countries to understand what drives video virality and content performance. I applied machine learning models (Random Forest, Logistic Regression) and clustering methods (KMeans, LDA topic modeling) to identify both the engagement drivers and the content patterns behind high-performing videos. Rather than optimizing purely for prediction accuracy, I prioritized extracting insights that could inform content strategy and digital marketing decisions.
Supervised Learning: Predicting Trending Videos
To evaluate trend prediction, I trained classification models using engagement metrics and video metadata, including views, likes, comments, ranking movement, posting timing, and channel information.
Random Forest achieved the best overall performance (F1 = 0.65), and its results point to engagement velocity and ranking momentum as strong indicators of virality.
Logistic Regression helped interpret which features positively or negatively influence trending probability.
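As a rough illustration of the workflow described above, here is a minimal sketch using scikit-learn. The data is synthetic and the feature names (view_growth, like_growth, rank_change, publish_hour) are hypothetical stand-ins, not the project's actual columns, so the scores it produces say nothing about the reported F1 of 0.65. It only shows how a Random Forest and a Logistic Regression can be compared on F1, and how importances and coefficient signs can be read off afterwards.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 2000
feature_names = ["view_growth", "like_growth", "rank_change", "publish_hour"]
X = np.column_stack([
    rng.normal(size=n),            # view_growth
    rng.normal(size=n),            # like_growth
    rng.normal(size=n),            # rank_change
    rng.integers(0, 24, size=n),   # publish_hour
])

# Assumed label rule for the demo: growth and rank movement drive "trending".
signal = 1.5 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=1.0, size=n)
y = (signal > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random Forest: non-linear model, compared on F1.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
rf_f1 = f1_score(y_test, rf.predict(X_test))

# Logistic Regression on standardized features, so coefficients are comparable.
lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
lr.fit(X_train, y_train)
lr_f1 = f1_score(y_test, lr.predict(X_test))

# Importances rank features; coefficient signs show direction of influence.
importances = dict(zip(feature_names, rf.feature_importances_))
coefficients = dict(
    zip(feature_names, lr.named_steps["logisticregression"].coef_[0])
)

print(f"Random Forest F1: {rf_f1:.2f}, Logistic Regression F1: {lr_f1:.2f}")
print("Importances:", importances)
print("Coefficients:", coefficients)
```

Standardizing before the Logistic Regression is a design choice for this sketch: it puts coefficients on a common scale so their signs and magnitudes can be compared across features.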
Unsupervised Learning: Discovering Content Patterns
To better understand content structure, I applied clustering and topic modeling to group videos based on titles and tags.
I found that trending videos often cluster around themes such as:
Short-form entertainment
Humor and emotional content
Family-friendly or product-demo style content
These patterns were consistent across countries and content categories.
Key Insights
Engagement growth momentum is often a stronger signal of virality than total engagement volume.
Short, emotionally engaging, and easily shareable content has a higher probability of trending globally.
Content that aligns with platform recommendation behavior tends to spread faster across regions.
Virality is influenced by both content theme and early engagement dynamics.
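To make the distinction between growth momentum and total volume concrete, here is a small sketch using pandas. The column names (video_id, snapshot_date, view_count, daily_rank) and the numbers are hypothetical and may not match the original dataset. It computes day-over-day view growth and rank movement per video, then compares them with total views.

```python
import pandas as pd

# Hypothetical daily snapshots for two videos (invented numbers).
snapshots = pd.DataFrame({
    "video_id": ["a", "a", "a", "b", "b", "b"],
    "snapshot_date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03"] * 2
    ),
    "view_count": [1000, 5000, 20000, 900000, 910000, 915000],
    "daily_rank": [40, 20, 5, 10, 12, 15],
})

snapshots = snapshots.sort_values(["video_id", "snapshot_date"])
grouped = snapshots.groupby("video_id")

# Engagement velocity: relative change in views since the previous snapshot.
previous_views = grouped["view_count"].shift(1)
snapshots["view_growth_rate"] = snapshots["view_count"] / previous_views - 1

# Ranking momentum: positive values mean the video moved up the chart.
snapshots["rank_change"] = -grouped["daily_rank"].diff()

# Per-video summary contrasting volume with momentum.
summary = snapshots.groupby("video_id").agg(
    total_views=("view_count", "max"),
    mean_growth_rate=("view_growth_rate", "mean"),
    mean_rank_change=("rank_change", "mean"),
)
print(summary)
```

In this toy data, video "b" has far more total views, but video "a" is growing faster and climbing the ranking, which is the kind of signal the momentum features are meant to capture.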