An ML pricing pipeline for sync licensing.
Sync licensing is the business of placing recorded music in films, ads, and games. Quotes had historically been a manual rate-card exercise. I built the production ML pipeline that prices new requests — and, just as importantly, designed the model around the distribution it actually has to live in.
- Role
- Sole ML engineer · stakeholder discovery → build → roll-out
- Year
- 2025
- Status
- In production
Problem
Sync deals are not normally distributed. A handful of premium placements drive most of the revenue while the long tail churns out hundreds of small, repetitive quotes. A single regression that minimises mean error optimises for the wrong thing — it gets the middle of the distribution close and underprices the tails badly.
The legacy system was a rate card with manual overrides. It was interpretable but brittle: pricing decisions weren't auditable, couldn't learn from new deal data, and didn't reflect the changing popularity of catalogue songs.
What I tried first
I started where the textbooks point you: quantile regression for an interval, conformal prediction for a calibrated coverage guarantee, and a Bayesian quantile model for a posterior over the price. Each looked principled on paper. None of them held up against the data.
The single distribution assumption was the issue. The pricing surface isn't one curve — it's several behaviours stitched together. A model that smooths across them produces the same underpriced tails as the legacy rate card, just with a respectable confidence interval drawn around the miss.
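For reference, that first baseline looked roughly like the sketch below: synthetic stand-in data and LightGBM's built-in quantile objective, not the production feature set or code.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))                  # stand-in for engineered deal features
y = np.exp(rng.normal(7.5, 1.2, size=1_000))     # heavy-tailed stand-in for deal prices

# One booster per quantile: lower bound, median point estimate, upper bound.
quantile_models = {}
for alpha in (0.1, 0.5, 0.9):
    model = lgb.LGBMRegressor(objective="quantile", alpha=alpha,
                              n_estimators=300, learning_rate=0.05)
    model.fit(X, y)
    quantile_models[alpha] = model

# Interval plus point estimate for a new request: principled on paper,
# but one smooth surface stretched across several pricing regimes.
lo, mid, hi = (quantile_models[a].predict(X[:1])[0] for a in (0.1, 0.5, 0.9))
```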
Architecture
I split the problem in two. A classifier first decides which pricing regime a request belongs to. A regression head per regime then produces the quote. The two stages are coupled by soft routing — instead of a hard hand-off, the regimes share probability mass at the boundaries, so requests that sit between regimes get a blended price rather than a discontinuous jump.
- Stage 1 — LightGBM regime classifier. Predicts which of three pricing regimes a request belongs to (split at $2,500 and $7,500, the "lowsplit" configuration that won across a sweep of binary, lowsplit, balanced and percentile splits). Calibrated so the routing weights are usable as probabilities.
- Stage 2 — XGBoost regressors per regime. Each regime has its own gradient-boosted regressor trained only on deals of that type. The final quote is a routing-weighted average of the regressors' predictions. (Tested HistGB, LightGBM, XGBoost, CatBoost — XGBoost wins on the long tail.)
- Soft routing with temperature tuning. A temperature on the classifier softmax controls how sharply the regimes are separated. Tuned against held-out regression error, not classifier accuracy. Soft routing cut MAE on small deals from $2,697 (hard routing) to $1,402, with a barely detectable hit to overall MAE.
Per-bucket MAE makes the routing payoff visible:
- B0 (<$1.2k): $156
- B1 ($1.2k–3.2k): $296
- B2 ($3.2k–11.3k): $1,288
- B3 ($11.3k–55k): $6,685
- B4 (>$55k): $77,975
The tail bucket is intentionally hard — the model gets the regimes that pay off right.
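To make the routing concrete, here is a minimal sketch of the inference path; regime_clf, regressors and temperature are placeholder names, and the real pipeline also handles calibration and feature enrichment.

```python
import numpy as np

def soft_routed_quote(x, regime_clf, regressors, temperature=1.0):
    """Blend per-regime quotes by tempered regime probabilities.

    x           : (1, n_features) enriched feature row
    regime_clf  : calibrated classifier over the three pricing regimes
    regressors  : dict {regime_id: regressor trained only on that regime's deals}
    temperature : >1 softens the routing (more blending), <1 sharpens it
    """
    proba = regime_clf.predict_proba(x)[0]

    # Temperature on the log-probabilities is equivalent to tempering the
    # classifier softmax; renormalise to get the routing weights.
    logits = np.log(np.clip(proba, 1e-12, None)) / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Routing-weighted average: requests near a regime boundary get a blended
    # price instead of the discontinuous jump a hard argmax hand-off produces.
    per_regime = np.array([regressors[r].predict(x)[0] for r in sorted(regressors)])
    return float(weights @ per_regime)
```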
Feature engineering
The interesting features were the ones the rate card couldn't express. Three groups did the work — and the third, generated by Gemini from messy free-text fields, was where most of the accuracy gain came from after data filtering:
- Song popularity signals from the Spotify API — monthly listener trends, decay curves, catalogue-vs-frontline age — turned a static catalogue into a time-aware one.
- Licence-term features — territory, term length, media, exclusivity — encoded as structured fields with schema-validated types so a typo in an upstream system can't silently shift a price.
- Free-text fields → categorical features via Gemini. Two-stage process: Gemini first proposes the schema (analysing ~4,800 unique licensee names to identify patterns), then bulk-classifies records against it. Out came client industry and scale (Super / Major / Medium / Low / Unknown), on-behalf-of flags (e.g. Universal Music Group on behalf of Coca-Cola → Primary_Client = Coca-Cola), legal entity type, content category and network indicators from project names. Same approach on use / media / exclusivity free-text columns.
What made these LLM features actually useful was the domain knowledge feeding into them. I went outside the formal scope of the project, networked directly with negotiators in the licensing team, and asked them what they actually look at when pricing a deal. Their answers — which signals matter, what the categorical edges should be, where the rate card breaks down — became the schema Gemini extracted into. Without that step, the LLM features would have been noise. With it, they were the single biggest accuracy gain after data filtering.
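A sketch of what the schema side of that looks like, assuming Pydantic; the fields mirror the categories above, and the commented Gemini call is an assumption about the google-genai structured-output API rather than the exact production code.

```python
from enum import Enum
from typing import Optional
from pydantic import BaseModel

class ClientScale(str, Enum):
    SUPER = "Super"
    MAJOR = "Major"
    MEDIUM = "Medium"
    LOW = "Low"
    UNKNOWN = "Unknown"

class LicenseeFeatures(BaseModel):
    """Categorical features extracted from one free-text licensee / project record."""
    client_industry: str
    client_scale: ClientScale
    on_behalf_of: bool                       # e.g. a major label acting for a brand
    primary_client: Optional[str] = None     # "Coca-Cola" in the on-behalf-of example
    legal_entity_type: Optional[str] = None
    content_category: Optional[str] = None
    network_indicator: Optional[str] = None  # pulled from project names

# Bulk classification against the schema (assumed google-genai usage):
# from google import genai
# client = genai.Client()
# resp = client.models.generate_content(
#     model="gemini-2.0-flash",
#     contents=f"Classify this licensee record:\n{record_text}",
#     config={"response_mime_type": "application/json",
#             "response_schema": LicenseeFeatures},
# )
# features = LicenseeFeatures.model_validate_json(resp.text)
```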
Shipping it
The model lives behind a typed inference contract. Inputs and outputs are Pydantic schemas; the same schema generates the validation layer at the API boundary and the structured-output parser used in feature enrichment. The whole thing ships as an installable package, so the backend service that serves quotes and the offline retraining job consume the same code.
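The contract itself is roughly this shape; a sketch with illustrative field names, not the production schema.

```python
from typing import Dict
from pydantic import BaseModel, Field

class QuoteRequest(BaseModel):
    """Input contract, validated at the API boundary before any model code runs."""
    song_id: str
    territory: str
    term_months: int = Field(gt=0)
    media: str
    exclusive: bool
    licensee_free_text: str            # enriched into categorical features downstream

class QuoteResponse(BaseModel):
    """Output contract, consumed by the backend service and the MCP tools alike."""
    predicted_price: float
    price_bucket: str
    bucket_probabilities: Dict[str, float]
    shap_contributions: Dict[str, float]   # feature name -> dollar push up or down
```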
Every training run is tracked in Aim: hyperparams, metrics per regime, and per-segment calibration plots. When a retrain regresses on any segment, it shows up before deploy, not after.
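In Aim that looks roughly like the sketch below, with dummy values; the real runs also log the full hyperparameter set and per-segment calibration plots.

```python
from aim import Run

run = Run(experiment="sync-pricing-retrain")
run["hparams"] = {"routing_temperature": 1.3, "regime_split": "lowsplit"}  # illustrative

# Per-regime metrics carry the regime in the context, so a regression on any
# single segment shows up in the Aim UI before the model is promoted.
per_regime_mae = {"low": 310.0, "mid": 1250.0, "high": 6900.0}  # dummy numbers
for regime, mae in per_regime_mae.items():
    run.track(mae, name="mae", context={"regime": regime})
```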
How the licensing team actually uses it. The model is wrapped as a FastMCP server with two functions — get_user_input (collects the song / artist / client / project context, fuzzy-matched against the catalogue) and predict (returns the price bucket, the specific predicted price, the per-bucket probability distribution, and Shapley values explaining which features pushed the price up or down). The server plugs into the licensing team's internal Claude-based AI chat alongside four other MCP tools I built:
- Search licensing history by artist.
- Search licensing history by song.
- Catalogue statistics with filters — pull median / mean / min / max for genres, artists, deal types, time windows ("Black Eyed Peas in ads in the last five years").
- Direct catalogue queries that translate natural language into SQL on the backend.
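A trimmed sketch of how the predict tool is exposed, assuming the fastmcp package; the body is a placeholder standing in for the packaged inference pipeline.

```python
from fastmcp import FastMCP

mcp = FastMCP("sync-pricing")

@mcp.tool()
def predict(song: str, artist: str, client: str, project: str) -> dict:
    """Price a sync request: bucket, point price, bucket probabilities, SHAP drivers."""
    # Placeholder return; the real tool runs the fuzzy-matched context through
    # the two-stage pipeline and the SHAP explainer.
    return {
        "price_bucket": "B2",
        "predicted_price": 4200.0,
        "bucket_probabilities": {"B1": 0.20, "B2": 0.70, "B3": 0.10},
        "shap_drivers": {"artist_popularity": 1200.0, "tv_broadcast_scope": 800.0,
                         "short_term_licence": -300.0},
    }

if __name__ == "__main__":
    mcp.run()
```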
The Shapley values are what make the prediction actually usable. A price without a defensible reason behind it doesn't get adopted — negotiators want to see "$4,200, driven by artist popularity (+$1.2k), TV broadcast scope (+$800), short-term licence (–$300)…", not just a number. The licensing team interacts with the whole suite the same way they interact with the rest of their tooling: by asking the chat questions. The distribution layer turned out to be as important as the model.
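Those dollar drivers are Shapley values; a sketch of computing them, assuming the shap library's TreeExplainer over one regime's XGBoost regressor and a model that predicts price in dollars directly.

```python
import shap

def price_drivers(regressor, x_row, feature_names, top_k=3):
    """Top features pushing one quote up or down, in dollars."""
    explainer = shap.TreeExplainer(regressor)
    contributions = explainer.shap_values(x_row)[0]   # one request -> one vector of pushes
    ranked = sorted(zip(feature_names, contributions),
                    key=lambda pair: abs(pair[1]), reverse=True)
    # e.g. [("artist_popularity", 1200), ("tv_broadcast_scope", 800), ("term_months", -300)]
    return [(name, round(float(value))) for name, value in ranked[:top_k]]
```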
Outcome
- Reduction in MAE vs the legacy rate card (held-out)
- ~78%
- Projected annual revenue uplift
- $7M+
- Engineered features in production
- 249
The $7M projection has two components: roughly $3.2M from responding to ~20% more deals (currently many low-value requests get no response at all because the manual rate card can't price them confidently), plus roughly $3.8M from pitching ~20% more songs (because the pitching agent can now filter the catalogue by predicted price, surfacing better candidates against briefs).
Final MAE comparison vs every reasonable baseline (best model: $2,648):
- Original model: $9,310 (best model 72% better)
- Legacy rate card: $13,345 (best model 80% better)
- Global median: $13,003
- Random guess: $22,706