MSE 446 — Introduction to Machine Learning

MTG Archetype Predictor

Machine learning classifier that predicts Magic: The Gathering commander deck archetypes by integrating Scryfall and EDHRec community data.

Sep 2024 – Dec 2024
ML · Classification · Python · Data Science · Jupyter · Scikit-Learn

Overview

Magic: The Gathering's Commander format has a rich taxonomy of deck archetypes — Aggro, Control, Combo, Stax, Voltron, Tokens, and dozens of sub-archetypes. Manually classifying a commander card into its archetype requires domain knowledge and is time-consuming. This project builds an ML pipeline that automates classification by combining card statistics from Scryfall with archetype labels and community deck data from EDHRec.

Problem Statement

MTG card databases provide rich mechanical attributes (mana cost, card types, keywords, power/toughness), but archetype classification requires higher-level reasoning about how a commander enables a particular play strategy. The challenge is bridging this gap: learning, from noisy multi-label EDHRec data, a mapping from raw card mechanics to community-recognized archetypes.

Approach

Data Integration Pipeline

The first challenge was integrating two heterogeneous data sources:

  • Scryfall API: Provides structured card metadata — mana value, colors, subtypes, keywords, oracle text, power/toughness, set, legality.
  • EDHRec API: Provides community-aggregated archetype tags and deck frequency statistics per commander. Tags represent the dominant play patterns associated with each card (e.g., "Voltron", "Counters", "Aristocrats").
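As an illustration of the Scryfall side, the sketch below flattens one Scryfall card object into a single record. The input field names (`name`, `cmc`, `mana_cost`, `color_identity`, `type_line`, `keywords`, `oracle_text`, `power`, `toughness`, `set`, `legalities`, `card_faces`) come from Scryfall's public card object; the helper `flatten_scryfall_card`, its output keys, and the handling of double-faced cards are hypothetical choices for illustration, not the project's actual code. Fetching the JSON itself is left out.

```python
from typing import Any


def flatten_scryfall_card(card: dict[str, Any]) -> dict[str, Any]:
    """Flatten one Scryfall card object into a feature-ready record (illustrative)."""
    faces = card.get("card_faces", [])
    # Double-faced cards store rules text per face; join the faces for text features.
    oracle = card.get("oracle_text") or "\n".join(f.get("oracle_text", "") for f in faces)
    # Assumption: use the front face's cost when there is no top-level mana_cost.
    cost = card.get("mana_cost") or (faces[0].get("mana_cost", "") if faces else "")
    return {
        "name": card["name"],
        "mana_value": float(card.get("cmc", 0.0)),
        "mana_cost": cost,
        "color_identity": list(card.get("color_identity", [])),
        "type_line": card.get("type_line", ""),
        "keywords": list(card.get("keywords", [])),
        "oracle_text": oracle,
        "power": card.get("power"),
        "toughness": card.get("toughness"),
        "set": card.get("set"),
        # Scryfall reports per-format legality as strings such as "legal".
        "commander_legal": card.get("legalities", {}).get("commander") == "legal",
    }
```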

A unified dataset was built by joining on card name, with data quality tracking for cards that appear in one source but not the other.
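The join and its coverage tracking can be sketched with pandas' `DataFrame.merge` and the `indicator=True` option, which labels every row `left_only`, `right_only`, or `both`. This assumes both tables carry a `name` column; the helper `join_sources` and the case and whitespace normalization of names are illustrative assumptions rather than the project's documented steps.

```python
import pandas as pd


def join_sources(scryfall: pd.DataFrame, edhrec: pd.DataFrame):
    """Outer-join on card name and report how many cards each source covers."""
    # Assumption: normalize names so trivial case/whitespace differences still match.
    left = scryfall.assign(name=scryfall["name"].str.strip().str.casefold())
    right = edhrec.assign(name=edhrec["name"].str.strip().str.casefold())
    # indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
    merged = left.merge(right, on="name", how="outer", indicator=True)
    coverage = merged["_merge"].value_counts().to_dict()
    # Keep only commanders that have both card metadata and archetype labels.
    labeled = merged[merged["_merge"] == "both"].drop(columns="_merge")
    return labeled, coverage
```

Rows counted as `left_only` are the commanders present in Scryfall but missing from EDHRec, which is the gap discussed under Key Findings.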

Feature Engineering

Raw card data was transformed into ML-ready features:

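  • Color identity encoding: Multi-hot encoding (one binary indicator per color) of the five MTG colors (WUBRG) plus colorless, since a commander's color identity can span several colors at once.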
  • Keyword extraction: Binary features for common keywords (Flying, Haste, Lifelink, Trample, etc.).
  • Oracle text TF-IDF: Bag-of-words features from the card's rules text, capturing mechanical patterns like "whenever a creature enters" or "sacrifice a creature".
  • Mana curve features: Mana value (formerly "converted mana cost"), color pip counts, hybrid mana presence.
  • Type line encoding: Card type and subtype indicators.
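A compact sketch of the feature groups above, assuming a DataFrame with the hypothetical columns `color_identity`, `keywords`, `mana_value`, `mana_cost`, `type_line`, and `oracle_text`. The keyword and card-type lists are small illustrative subsets, and the 1-3 n-gram range is an assumption chosen so that multi-word phrases such as "sacrifice a creature" become features; the project's actual vocabulary and settings are not stated.

```python
import re

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

COLORS = ["W", "U", "B", "R", "G"]
# Illustrative subsets; a real pipeline would use longer, data-driven lists.
KEYWORDS = ["Flying", "Haste", "Lifelink", "Trample"]
CARD_TYPES = ["Creature", "Artifact", "Enchantment", "Planeswalker"]


def color_features(identity):
    """Multi-hot WUBRG indicators plus a colorless flag."""
    return [int(c in identity) for c in COLORS] + [int(len(identity) == 0)]


def pip_features(mana_cost):
    """Per-color pip counts and a hybrid flag from a cost like '{1}{G}{G/W}'."""
    symbols = re.findall(r"\{([^}]+)\}", mana_cost or "")
    # A hybrid symbol such as {G/W} counts toward both of its colors.
    pips = [sum(color in s for s in symbols) for color in COLORS]
    # Slash symbols count as hybrid, except Phyrexian ones ending in /P (a simplification).
    hybrid = int(any("/" in s and not s.endswith("/P") for s in symbols))
    return pips + [hybrid]


def build_features(df: pd.DataFrame, vectorizer=None):
    """Stack hand-built columns with TF-IDF oracle-text features."""
    if vectorizer is None:
        # Keep one-letter words like "a" and use 1-3 grams so short phrases survive.
        vectorizer = TfidfVectorizer(ngram_range=(1, 3), token_pattern=r"(?u)\b\w+\b")
        text = vectorizer.fit_transform(df["oracle_text"])
    else:
        text = vectorizer.transform(df["oracle_text"])
    dense = np.array(
        [
            color_features(r.color_identity)
            + [int(k in r.keywords) for k in KEYWORDS]
            + [r.mana_value]
            + pip_features(r.mana_cost)
            + [int(t in r.type_line) for t in CARD_TYPES]
            for r in df.itertuples(index=False)
        ],
        dtype=float,
    )
    return hstack([csr_matrix(dense), text]).tocsr(), vectorizer
```

Passing the fitted vectorizer back in for held-out cards reuses the training vocabulary and IDF weights, so test-set text does not leak into the features.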

Classification Model

Given the multi-label nature of the target (a commander can belong to multiple archetypes), the problem was framed as multi-label classification. Several sklearn estimators were evaluated:

  • Logistic Regression with one-vs-rest
  • Random Forest (multi-output)
  • Gradient Boosted Trees
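One way to set up these three estimator families for a multi-label target, assuming the EDHRec tags arrive as a list of strings per commander. `MultiLabelBinarizer` turns the tag lists into a binary indicator matrix; `RandomForestClassifier` accepts that matrix directly, while logistic regression is wrapped in `OneVsRestClassifier` and scikit-learn's single-output `GradientBoostingClassifier` in `MultiOutputClassifier` (one model per label). The helper names and hyperparameters are placeholders, not the project's tuned settings.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.preprocessing import MultiLabelBinarizer


def make_targets(tag_lists):
    """Turn per-commander tag lists into a binary indicator matrix Y."""
    binarizer = MultiLabelBinarizer()
    Y = binarizer.fit_transform(tag_lists)
    return Y, binarizer


def candidate_models(seed=0):
    """Three estimator families, each able to fit a 2-D label matrix."""
    return {
        # One independent binary logistic regression per archetype.
        "logreg_ovr": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
        # Random forests support multi-output targets natively.
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=seed),
        # GradientBoostingClassifier is single-output, so fit one per label.
        "grad_boost": MultiOutputClassifier(GradientBoostingClassifier(random_state=seed)),
    }
```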

Evaluation used macro-averaged F1 to account for class imbalance — niche archetypes like "Stax" appear far less frequently in the EDHRec data than popular ones like "Tokens" or "Counters".
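A toy example of why macro averaging is the stricter choice here. With made-up labels (not project data), a model that predicts the two common archetypes perfectly but misses the only "Stax" example still scores 8/9 on micro-averaged F1, which pools true and false positives across labels, while macro F1 averages the per-label scores (1, 1, 0) and drops to 2/3.

```python
import numpy as np
from sklearn.metrics import f1_score

# Columns: Tokens, Counters, Stax (toy labels for illustration only).
y_true = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 1]])
y_pred = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]])  # misses the only Stax

macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
print(f"macro={macro:.3f} micro={micro:.3f}")  # macro=0.667 micro=0.889
```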

Key Findings

Oracle text TF-IDF features were the strongest predictors: specific mechanical phrases were highly correlated with particular archetypes, suggesting that a card's rules text encodes most of the archetype signal.

  • Color identity alone was a poor predictor: most colors support multiple archetypes, which suggests that mechanics matter more than color for classification.
  • The integrated dataset revealed a meaningful subset of commanders present in Scryfall but absent from EDHRec — these tend to be newly released cards without enough community data yet, a natural limitation of any community-sourced label set.
  • Random Forest outperformed Logistic Regression on rare archetypes, plausibly because averaging many trees helps the ensemble generalize from sparse training examples.

Reflection

The most interesting challenge was the label noise in EDHRec data: community archetype tags reflect dominant deck strategies, not ground truth, and a single commander is often tagged for 3–5 archetypes simultaneously. A more sophisticated approach would model the label dependencies explicitly (e.g., with a classifier chain or a label-aware embedding model) rather than treating each label independently. This project was a strong introduction to real-world multi-label classification with noisy, multi-source data.
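Since the classifier-chain idea is mentioned above, here is a minimal sketch using scikit-learn's `ClassifierChain`, which feeds each earlier label into the model for the next label so that dependencies such as Tokens co-occurring with Aristocrats can be learned. Because a single chain depends on its label order, the sketch averages several randomly ordered chains, a variant shown in the scikit-learn documentation. The helper names, the logistic-regression base model, and the 0.5 threshold are assumptions, not something this project implemented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain


def fit_chain_ensemble(X, Y, n_chains=5, seed=0):
    """Fit several chains, each with a different random label order."""
    chains = [
        ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=seed + i)
        for i in range(n_chains)
    ]
    for chain in chains:
        chain.fit(X, Y)
    return chains


def predict_chain_ensemble(chains, X, threshold=0.5):
    """Average per-label probabilities across chains, then threshold."""
    # predict_proba returns columns in the original label order for every chain.
    proba = np.mean([chain.predict_proba(X) for chain in chains], axis=0)
    return (proba >= threshold).astype(int)
```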