Video Content Performance & Trend Prediction Analysis

Tools: Python, Random Forest, Logistic Regression, KMeans, LDA Topic Modeling

Final Project

In this project, I analyzed over 3 million YouTube trending video records across 113 countries to understand what drives video virality and content performance. I applied supervised models (Random Forest, Logistic Regression) and unsupervised methods (KMeans clustering, LDA topic modeling) to identify the engagement drivers and content patterns behind high-performing videos. Rather than optimizing purely for prediction accuracy, I focused on extracting insights that could inform content strategy and digital marketing decisions.

Supervised Learning: Predicting Trending Videos

To evaluate trend prediction, I trained classification models using engagement metrics and video metadata, including views, likes, comments, ranking movement, posting timing, and channel information.

Random Forest achieved the best overall performance of the two models (F1 = 0.65), and the results suggest that engagement velocity and ranking momentum are strong indicators of virality.

Logistic Regression served as the interpretable model: the sign of each coefficient shows whether a feature is associated with a higher or lower probability of trending.
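
The comparison described in this section can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual pipeline: the data is synthetic, and the feature names (`log_views`, `rank_change`, and so on), the label rule, and the hyperparameters are all assumptions made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the real features and label are not shown here.
rng = np.random.default_rng(0)
n = 2000
feature_names = ["log_views", "log_likes", "log_comments", "rank_change", "publish_hour"]
X = np.column_stack([
    rng.normal(10, 2, n),    # log of view count
    rng.normal(7, 2, n),     # log of like count
    rng.normal(5, 2, n),     # log of comment count
    rng.normal(0, 5, n),     # day-over-day movement in trending rank
    rng.integers(0, 24, n),  # hour of day the video was published
])
# Hypothetical label rule: rank movement and likes drive the outcome, plus noise.
y = ((0.3 * X[:, 3] + 0.5 * (X[:, 1] - 7) + rng.normal(0, 1.5, n)) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Random Forest: the non-linear model, scored with F1 on the held-out split.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
rf_f1 = f1_score(y_test, rf.predict(X_test))

# Logistic Regression on standardized features, so coefficient sizes are comparable.
logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logit.fit(X_train, y_train)
logit_f1 = f1_score(y_test, logit.predict(X_test))

# Coefficient signs indicate the direction of association with the positive class.
coefs = dict(zip(feature_names, logit.named_steps["logisticregression"].coef_[0]))
importances = dict(zip(feature_names, rf.feature_importances_))

print(f"Random Forest F1: {rf_f1:.2f}, Logistic Regression F1: {logit_f1:.2f}")
for name in feature_names:
    print(f"{name:>14}  coef={coefs[name]:+.2f}  importance={importances[name]:.2f}")
```

Standardizing before the logistic model is a design choice for readability of the coefficients. The scores printed here reflect only the synthetic data and say nothing about the F1 reported above.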

Unsupervised Learning: Discovering Content Patterns

To better understand content structure, I applied clustering and topic modeling to group videos based on titles and tags.

I found that trending videos often cluster around themes such as:

  • Short-form entertainment

  • Humor and emotional content

  • Family-friendly or product-demo style content

These patterns were consistent across countries and content categories.

Key Insights

Engagement growth momentum is often a stronger signal of virality than total engagement volume.
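
To make the distinction between momentum and volume concrete, here is a small sketch. The write-up does not define how momentum was measured, so the `growth_rate` helper below (average day-over-day relative growth in views) is a hypothetical definition, and the view counts are invented.

```python
def growth_rate(views_by_day):
    """Average day-over-day relative growth across consecutive snapshots.

    Hypothetical momentum measure: pairs whose earlier count is zero are
    skipped, and 0.0 is returned when no valid pair exists.
    """
    rates = [
        (later - earlier) / earlier
        for earlier, later in zip(views_by_day, views_by_day[1:])
        if earlier > 0
    ]
    return sum(rates) / len(rates) if rates else 0.0


# Invented daily view counts for two videos.
video_a = [1_000_000, 1_050_000, 1_080_000]  # large total, nearly flat
video_b = [20_000, 60_000, 150_000]          # small total, accelerating

for name, views in [("video_a", video_a), ("video_b", video_b)]:
    print(f"{name}: total={views[-1]:,}  growth={growth_rate(views):.2%}")
```

In this toy case `video_a` has far more total views, but `video_b` has much higher growth, which is the kind of signal the insight above refers to.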

Short, emotionally engaging, and easily shareable content has a higher probability of trending globally.

Content that aligns with platform recommendation behavior tends to spread faster across regions.

Virality is influenced by both content theme and early engagement dynamics.