How it works

Data Methodology

How the data on BillTaylorBass.com is gathered, normalized, validated, and published. The pipeline is built for long-term maintainability, rollback safety, and scalability to The Shared Stage.


Non-negotiable

The Architecture Contract

This is the interface boundary that keeps the public site stable while the canonical data model evolves. It has been the central discipline of the v4+ build, and no exceptions are made to it.

Editor → Canonical Tabs → Deterministic Export Builder → EXPORT_* Tabs → Public Site

The principle: separate "write complexity" from "read stability." The public site can be boring and reliable while the canonical engine evolves aggressively behind the contract.
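To make the contract concrete, here is a minimal Python sketch of a deterministic export builder. The column names, the `EXPORT_EVENTS_COLUMNS` list, and the sample rows are assumptions for illustration only; the actual tab layouts are not specified on this page. The point is that the builder projects canonical rows onto a fixed column list in a fixed order, so extra canonical fields never leak into the public feed.

```python
from typing import Dict, List

# Hypothetical export contract: the public site only sees these columns, in this order.
EXPORT_EVENTS_COLUMNS = ["event_id", "date", "venue_key"]


def build_export(canonical_rows: List[Dict[str, str]], columns: List[str]) -> List[List[str]]:
    """Project canonical rows onto the export columns, sorted for determinism."""
    projected = [[row.get(col, "") for col in columns] for row in canonical_rows]
    # Sorting removes any dependence on the order rows were entered in the editor.
    return [columns] + sorted(projected)


# Sample canonical rows (invented); note the extra fields the export ignores.
canonical = [
    {"event_id": "evt-002", "date": "2001-05-04", "venue_key": "venue-b", "notes_internal": "draft"},
    {"event_id": "evt-001", "date": "2001-03-02", "venue_key": "venue-a", "new_field": "x"},
]
export = build_export(canonical, EXPORT_EVENTS_COLUMNS)
```

Because the output depends only on the canonical rows and the column list, running the builder twice on the same data, in any row order, yields the same export.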

Data model

Canonical Schema

v4+ is built around a normalized, time-aware schema that correctly models how live music actually works — multiple bands on a single bill, musicians in multiple roles, venues with their own lifecycle.

Events

  • The "show" entity
  • Date, venue, metadata
  • Poster + external links
  • Deterministic event_id

Performances

  • Band slots per event
  • Headliner + support roles
  • Stable performance_id
  • Order / billing position

Participants

  • Musician × performance
  • Role tags (member/fill-in/sit-in)
  • Instrument per appearance
  • Enables lineup timelines

Venues

  • Normalized venue identity
  • Geo + city + state
  • Venue key for joins
  • Alias resolution

Bands / Musicians

  • Dimension tables
  • Normalized keys
  • Alias crosswalks
  • Metadata + media

Songs / Song_Plays

  • Song catalog per band
  • Setlist-derived plays
  • Event + performance linkage
  • Aggregate stats tables
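The core of the schema above can be sketched as three linked record types. This is an illustrative Python sketch, not the site's actual table definitions: field names beyond `event_id`, `performance_id`, `musician_id`, `venue_key`, and `band_key` are assumptions, and all sample values are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    event_id: str
    date: str          # ISO date string, e.g. "2001-03-02"
    venue_key: str


@dataclass(frozen=True)
class Performance:
    performance_id: str
    event_id: str
    band_key: str
    billing_position: int   # assumption: 1 means headliner in this sketch


@dataclass(frozen=True)
class Participant:
    performance_id: str
    musician_id: str
    role: str          # "member", "fill-in", or "sit-in"
    instrument: str


def lineup(performance_id: str, participants: List[Participant]) -> List[str]:
    """Return the musician_ids who played a given band slot."""
    return [p.musician_id for p in participants if p.performance_id == performance_id]


# One event, two band slots, and a musician who appears in both slots.
event = Event("evt-001", "2001-03-02", "venue-a")
slots = [
    Performance("evt-001-p1", "evt-001", "band-a", 1),
    Performance("evt-001-p2", "evt-001", "band-b", 2),
]
people = [
    Participant("evt-001-p1", "mus-x", "member", "bass"),
    Participant("evt-001-p2", "mus-x", "sit-in", "bass"),
    Participant("evt-001-p2", "mus-y", "member", "guitar"),
]
```

Keeping role and instrument on the Participant row, rather than on the musician, is what lets the same person be a member of one band and a sit-in with another on the same night.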

The public site may lag behind canonical capability until exports and UI wiring are validated. Shipping partial or misleading data is worse than deferring a feature.

Identity

Deterministic IDs & Keys

Stable identifiers are the foundation of a trustworthy archive. Identity never changes in place: if names change, aliases and crosswalks are added, and existing keys are not rewritten.

  • event_id — stable identifier for every show
  • performance_id — stable per band-slot at that event
  • musician_id — deterministic from normalized name
  • venue_key — normalized venue identity for geo + aggregation
  • band_key — normalized band identity for pages + alias resolution
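As an illustration of "deterministic from normalized name", the Python sketch below derives an ID by normalizing the name and hashing the result. The normalization rules, the `mus-` prefix, and the use of a truncated SHA-1 digest are all assumptions chosen for the example; the page does not state the actual scheme.

```python
import hashlib
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and join words with hyphens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def musician_id(name: str) -> str:
    """Hypothetical scheme: a prefix plus a short hash of the normalized name."""
    digest = hashlib.sha1(normalize_name(name).encode("utf-8")).hexdigest()[:10]
    return f"mus-{digest}"
```

Under this scheme, `musician_id("Example Player")` and `musician_id("  example   PLAYER ")` produce the same value, because both names normalize to `example-player` before hashing.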

Aliases

Alias & Normalization Layer

Real-world data is messy. Band names change. Venues close and reopen. Musicians go by different names in different contexts. The alias system handles this without corrupting stable keys.

  • Aliases map variant names to canonical keys
  • Import pipelines resolve aliases on ingest
  • Historical records stay accurate even when canonical names are updated
  • Crosswalk tables enable joins across data vintages
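Alias resolution on ingest can be sketched as a lookup from a cleaned-up variant name to a canonical key. The alias table contents and the function name below are hypothetical; a real import would load aliases from the canonical tabs.

```python
from typing import Dict, Optional

# Hypothetical alias table: normalized variant name -> canonical band_key.
BAND_ALIASES: Dict[str, str] = {
    "the example band": "example-band",
    "example band": "example-band",
    "xmpl band": "example-band",
}


def resolve_band_key(raw_name: str, aliases: Dict[str, str]) -> Optional[str]:
    """Return the canonical key for a variant name, or None if it is unknown."""
    variant = " ".join(raw_name.lower().split())  # lowercase and collapse whitespace
    return aliases.get(variant)
```

Returning `None` for an unknown name, rather than guessing, leaves the decision to a human: the import can flag the row for review and a new alias can be added without touching any existing key.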

Evidence

Sources & Verification

The archive prioritizes primary evidence. When sources disagree, the record is flagged and corrected via the Admin Editor workflow — not resolved silently.

  • 🎵 Archive.org recordings
  • 📋 Setlist.fm
  • 📅 Venue calendars
  • 📰 Press mentions
  • 🖼️ Posters & photos
  • 👤 Human curation

Archive.org recordings are the highest-confidence source for older shows — if a recording exists, the show is confirmed. Setlist.fm provides structured setlist data when available. Venue calendars, press, and photos serve as corroborating evidence. Human curation and cross-checking resolve gaps and disputes.
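One way to avoid resolving disagreements silently is to collect what each source claims and surface only the fields where the claims differ. The Python sketch below assumes a hypothetical `Claim` record and invented source labels; it does not pick a winner, it only produces the list a curator would review.

```python
from typing import Dict, List, NamedTuple


class Claim(NamedTuple):
    source: str   # e.g. "archive_org", "setlist_fm", "venue_calendar" (labels are assumptions)
    field: str
    value: str


def find_conflicts(claims: List[Claim]) -> Dict[str, List[Claim]]:
    """Group claims by field and return only the fields where sources disagree."""
    by_field: Dict[str, List[Claim]] = {}
    for claim in claims:
        by_field.setdefault(claim.field, []).append(claim)
    return {
        field: group
        for field, group in by_field.items()
        if len({c.value for c in group}) > 1
    }
```

A field that every source agrees on produces no entry, so an empty result means there is nothing to flag for that record.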

Setlist ingestion

Songs Pipeline

Setlist data is ingested via a controlled pipeline that prevents duplicate plays, maintains stable event linkage, and keeps aggregate stats accurate. Each import is deterministic — the same source data produces the same result.

  • Step 1: Probe
  • Step 2: Discover
  • Step 3: Import
  • Step 4: Stats Rebuild
  • Step 5: Export Push
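The duplicate-prevention and stats-rebuild properties can be sketched as follows. This Python example covers only the Import and Stats Rebuild steps, and it assumes a hypothetical composite key of `(performance_id, song_key, position)` and an in-memory dictionary as the store; the real pipeline's keys and storage are not described on this page.

```python
from typing import Dict, List, Tuple

PlayKey = Tuple[str, str, int]   # (performance_id, song_key, position in the set)


def import_plays(store: Dict[PlayKey, dict], rows: List[dict]) -> int:
    """Insert plays under their composite key; return how many were new."""
    added = 0
    for row in rows:
        key = (row["performance_id"], row["song_key"], row["position"])
        if key not in store:      # re-importing the same setlist adds nothing
            store[key] = row
            added += 1
    return added


def rebuild_stats(store: Dict[PlayKey, dict]) -> Dict[str, int]:
    """Recompute play counts per song from scratch instead of incrementing."""
    counts: Dict[str, int] = {}
    for (_, song_key, _) in store:
        counts[song_key] = counts.get(song_key, 0) + 1
    return counts
```

Rebuilding the aggregates from the stored plays, rather than adjusting counters during import, means a repeated or partially failed import cannot leave the stats out of step with the underlying plays.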

Feed layer

Publishing & Feeds

The public site pulls data exclusively from web-published CSV endpoints — never from local files, never directly from canonical tabs.
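A consumer of such a feed only needs an HTTP GET and a CSV parser. The sketch below uses Python's standard library for illustration; the site's actual client code and endpoint URLs are not given here, so `fetch_feed` takes the URL as a parameter and the sample columns are assumptions.

```python
import csv
import io
import urllib.request
from typing import Dict, List


def parse_feed(csv_text: str) -> List[Dict[str, str]]:
    """Parse a published CSV export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_feed(url: str) -> List[Dict[str, str]]:
    """Download a published CSV endpoint and parse it (URL supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_feed(response.read().decode("utf-8"))


# Invented sample standing in for the body of a published export.
sample = "event_id,date,venue_key\nevt-001,2001-03-02,venue-a\n"
rows = parse_feed(sample)
```

Splitting parsing from fetching keeps the parser testable without network access and makes it easy to swap one published endpoint for another.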


Gated by validation

Deferred Until Participants Pipeline Is Validated

Several features are deliberately held back until the Participants ingestion layer is complete and validated. Shipping partial or misleading data is worse than deferring the feature.

Personnel & lineup display — member timelines, fill-in tracking, and "who played this show" require Participants integrity end-to-end.

Shared bill relationship edges — the Shared Stage graph intelligence is built on Participants data. Invalid Participants = invalid graph.

Band-level advanced analytics — lineup-aware stats (rotation analysis, set role by lineup, per-member metrics) require validated Participants joins.

User submissions + moderation — the Shared Stage claims layer builds on top of a trusted canonical foundation. That foundation has to be solid first.
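A validation gate of this kind can be as simple as a referential-integrity check. The Python sketch below is an assumption about what "validated" might include, not the project's actual criteria: it treats lineup features as enabled only when Participants rows exist and every row joins to a known `performance_id`.

```python
from typing import Iterable, List, Set


def orphaned_participants(
    participant_rows: Iterable[dict], known_performance_ids: Set[str]
) -> List[dict]:
    """Return Participants rows whose performance_id has no matching Performance."""
    return [r for r in participant_rows if r["performance_id"] not in known_performance_ids]


def lineup_features_enabled(
    participant_rows: List[dict], known_performance_ids: Set[str]
) -> bool:
    """Gate: features stay off unless there is data and every row joins cleanly."""
    return bool(participant_rows) and not orphaned_participants(
        participant_rows, known_performance_ids
    )
```

A single orphaned row keeps the gate closed, which matches the stance that a deferred feature is preferable to a misleading one.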

The bigger picture

This Architecture Scales to The Shared Stage

Every structural decision in the BTB data model was made with platform scalability in mind. The Events / Performances / Participants schema isn't just the right model for a personal archive — it's the right model for a global music relationship graph.

When other musicians and venues contribute their own documented histories, their data plugs into the same canonical structure. The IDs are deterministic. The aliases handle real-world messiness. The export contract keeps the public layer stable no matter how much the canonical engine grows.
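To show why this schema lends itself to a relationship graph, here is a small Python sketch that derives "shared a bill" edges from event membership. The input shape, a list of `(event_id, entity_key)` pairs where the entity could be a `band_key` or a `musician_id`, is an assumption for illustration, and the sample keys are invented.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shared_bill_edges(appearances: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Given (event_id, entity_key) pairs, return undirected pairs that shared an event."""
    entities_by_event: Dict[str, Set[str]] = {}
    for event_id, entity_key in appearances:
        entities_by_event.setdefault(event_id, set()).add(entity_key)
    edges: Set[Tuple[str, str]] = set()
    for entities in entities_by_event.values():
        # Sorting gives each undirected edge a single canonical orientation.
        edges.update(combinations(sorted(entities), 2))
    return edges
```

For example, if `band-a` and `band-b` both appear on event `e1`, the result contains the single edge `("band-a", "band-b")`. An entity that played an event alone contributes no edges.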

BillTaylorBass.com is proof of concept. The Shared Stage is what it scales to.
