thoughtleaders.ai
AT THE FRONTIER OF AGENTIC WORKFLOWS

We gave our team's agents direct database access. Now you get the same.

Most "AI for business" products bolt a chat box onto a dashboard and call it a day. We're somewhere else on the curve. ThoughtLeaders rebuilt itself around AI in 2026, and along the way we learned something about how AI agents actually create value: it's freedom - give them access to data and let them fly.

01 · We watched the work

Agents are good at SQL. They are bad at clicking.

We instrumented every internal AI workflow for six months. The pattern was overwhelming: the moment a question got non-trivial — a join, a window function, an unusual filter — the agent wanted to write SQL, not navigate an API. Every layer of abstraction between the agent and the data was a tax. Endpoints, pagination, helper commands — they all got in the way of the question the agent was actually trying to ask.

02 · We made the call

Almost-direct read access to the database.

So we gave it to them. The agent writes a SELECT; the server enforces the guardrails (SELECT-only, plan-scoped column access, LIMIT caps, AST depth limits). No structured endpoints to wrap, no SDK to keep in sync with the schema, no "you must request access to field X" friction. The agent reads the schema once at the start of the conversation and from then on it asks the database directly — in PostgreSQL, in Elasticsearch DSL, or in Firebolt SQL, depending on which moat it needs.

03 · We rolled it out internally

Every account manager. Every analyst. Every day.

The internal CLI has been the daily driver for TL's account managers, analysts, and leadership since Q1 2026. Emma's rebooking-opportunity agent — one account manager's homebrew workflow built on top of tl db pg and tl db es — surfaces renewal pitches no dashboard could have surfaced and drafts the outreach email beside them. Multiply that by the team. The bet paid off: questions that used to be "ask data eng next sprint" are now answered in the agent's next turn.

04 · We're opening it

The same three commands. The same data. Now external.

ThoughtLeaders.AI productizes that exact setup. Your agent gets read access to the same three databases TL's own agents use — seven years of matched sponsorship outcomes, 1.3M tracked channels, every transcript and brand mention indexed and joined. The schema below is the whole API surface. There isn't a second one hidden behind it.

THE THREE COMMANDS

All you need is tl db.

Three databases sit behind ThoughtLeaders, each tuned for a different shape of question. The CLI exposes them as three thin commands that take a query, run it, and stream the result. Same auth, same response envelope, same output flags (--json, --csv, --md, --toon) on all three.

Command Engine What it does Why use it
tl db pg PostgreSQL Executes one SELECT against the source-of-truth Postgres. Deals, channels, brands, profiles, organizations. Joins, GROUP BY, window functions, percentiles — all accepted. SELECT-only, LIMIT ≤ 500. Use it for any analytical question that needs structured joins — pipeline math, deal counts by brand, channel cohorts, who-sold-what-to-whom. This is where seven years of matched outcomes live.
tl db es Elasticsearch Executes one Elasticsearch DSL query against the content index. Videos, titles, descriptions, full transcripts, brand mentions (paid + organic), demographic and reach fields on channel docs. Query types and shapes are an allowlist. Use it for content discovery and full-text work — "where was brand X mentioned in the last 6 months", "videos whose transcripts discuss Y", "find every channel running a sponsored read for category Z". This is the brand-extractor moat.
tl db fb Firebolt Executes one single-table SELECT against the historical time-series store. 8.5B rows across article_metrics (per-video daily snapshots) and channel_metrics (per-channel daily snapshots). The leading index column must appear in WHERE. Use it when you need a value at a point in time that's no longer in the current snapshot — view curves, evergreen multipliers, channel growth trajectories, historical CPM context. Anything "what did this look like on date D".
See the commands in 18 worked use-cases → — or keep reading for the full schema reference.
01 · POSTGRESQL

The Postgres tables you'll actually query.

thoughtleaders_adlink

150K rows

The deals table. Every match, proposal, sold ad, and rejection is one row. Internally called "AdLink" — exposed as sponsorships in the CLI.

Column Type Notes
id int Primary key
publish_status int 0 Proposed · 2 Pending · 3 Sold · 4 Brand-rej · 5 Pub-rej · 6 Approved · 7 Matched · 8 Outreach · 9 Agency-rej
price numeric Deal price (USD). Visible to media-buyer accounts.
cost numeric TL cost. Visible to media-seller accounts.
weighted_price numeric price × pipeline weight. Stored, not computed.
ad_spot_id int FK → thoughtleaders_adspot.id (your join hop to channel)
creator_profile_id int FK → thoughtleaders_profile.id (your hop to brand)
owner_advertiser_id int FK → auth_user (brand-side owner)
owner_publisher_id int FK → auth_user (channel-side owner)
owner_sales_id int FK → auth_user (TL rep)
created_at timestamptz When the row was created. Use for "all flow over time."
purchase_date timestamptz When the deal was sold. NULL if never sold. Use for revenue.
send_date timestamptz Scheduled video drop date.
publish_date timestamptz Actual video drop date. NULL until live.
outreach_date timestamptz When outreach went out (sparse).
rejection_reason int 1–24 codes. 1–9 brand-side, 10–17 publisher-side, 18 demo-mismatch, 19/21/22/24 quality, 20 brand-history, 23 cadence.
payment_status int 0 Unpaid · 1 Paid
performance_grade int TL-internal performance rating
article_id varchar <channel_id>:<youtube_id> — joins to ES _id and slices Firebolt
dashboard_campaign_id int FK Campaign grouping
tx_data jsonb Transaction metadata

thoughtleaders_channel

1.3M rows

Every YouTube channel TL has ever observed. The discovery table.

Column Type Notes
id int Primary key
channel_name varchar Display name (NOT `name`)
external_channel_id varchar YouTube channel ID (UCxxxxx)
url varchar Channel URL
reach bigint Subscriber count (NOT `subscribers`)
impression numeric Projected views per upload — used for forward-looking CPM
media_selling_network_join_date date MSN membership = NOT NULL. Scrubbed from the advertiser sandbox view.
is_tl_channel boolean TPP partner — the elite ~170 channels TL manages directly
is_active boolean Always include is_active=TRUE unless you specifically want archived rows
content_category int 1–22 enum. Use the recommender for ranked discovery; this column is best-effort.
country varchar ISO 3166-1 alpha-2. ~10% NULL.
language varchar ISO 639-1 (mostly 2-letter), occasionally 3-letter long-tail. NOT BCP-47.
last_published date Channel's most recent upload date. Use for "is the channel still alive?"
sponsorship_score double TL-internal channel-quality score. Useful tiebreaker.
description text LLM-generated channel description (regex-able for thematic filtering)
evergreenness float Cached evergreen multiplier
demographic_usa_share int 0–100 % of audience in the US
demographic_male_share int 0–100 % male audience
demographic_age jsonb {"18-24": 30, "25-34": 45, ...}
demographic_device jsonb {"mobile": 60, "desktop": 28, "tv": 12}
demographic_geo jsonb {"US": 60, "GB": 15, ...}
demographic_device_primary varchar Materialized top-device convenience field
total_views bigint All-time channel views
demographics_updated_at timestamptz When demographic screenshots last processed

thoughtleaders_brand

61K rows

Every brand TL has ever observed advertising or being mentioned. The brand intelligence table.

Column Type Notes
id int Primary key
name varchar Brand name
description text Brand description
creator_id int FK User who created the row

thoughtleaders_profile

20K rows

User profile. The bridge between auth_user and brands. Each profile attaches one brand in practice.

Column Type Notes
id int Primary key
organization_id int FK → thoughtleaders_organization.id
persona int User persona code (advertiser / publisher / agency / TL)
media_buying_network_join_date date MBN membership = NOT NULL

thoughtleaders_adspot

43K rows

Buyable ad placements. Channel × seller × integration type. List prices live here; actual deal prices live on the adlink.

Column Type Notes
id int Primary key
channel_id int FK → thoughtleaders_channel.id
price numeric List/catalogue price
cost numeric List/catalogue cost
integration int 1 = YouTube Mention (live read). One active mention adspot per channel.
is_active boolean Active flag
publisher_id int FK → auth_user.id (NOT profile)

thoughtleaders_organization

19K rows

Top-level organization. Brands and publishers each belong to one. Permission scoping happens here.

Column Type Notes
id int Primary key
name varchar Org name
plan varchar Plan tier — gates intelligence access

auth_user

20K rows

Standard Django users. Owners of deals (advertiser, publisher, sales) all point here.

Column Type Notes
id int Primary key
first_name varchar First name
last_name varchar Last name
email varchar Email

Junction tables

— rows

Many-to-many bridges between the core entities.

Column Type Notes
thoughtleaders_profile_brands (profile_id, brand_id) Profile ↔ Brand. In practice 1:1.
thoughtleaders_brand_brands (from_brand_id, to_brand_id) Self-referential brand similarity
$ tl db pg "
  SELECT b.name, COUNT(*) AS deals
  FROM thoughtleaders_adlink a
  JOIN thoughtleaders_profile p  ON a.creator_profile_id = p.id
  JOIN thoughtleaders_profile_brands pb ON p.id = pb.profile_id
  JOIN thoughtleaders_brand b    ON pb.brand_id = b.id
  WHERE a.publish_status = 3
    AND a.purchase_date >= NOW() - INTERVAL '12 months'
  GROUP BY b.name
  ORDER BY deals DESC
  LIMIT 20 OFFSET 0"
Top brands by sold-deal count, last 12 months. Three joins. One round-trip.
02 · ELASTICSEARCH

Video content, transcripts, and brand mentions.

Index is fixed server-side. The only time-window knob is a publication_date range filter inside the body. Article docs and channel docs share the index via a doc_type join — most filtering happens against article docs.

Field Type Notes
id keyword Compound <channel_id>:<youtube_id> — matches PG adlink.article_id
title text Video title
description text Video description
transcript text Full transcript text
summary text AI-generated video summary
publication_date date When video was published — your only time-window knob
views / total_views / projected_views long Current view counts
likes long Like count
comments integer Comment count
duration / duration_longform / duration_shorts integer Seconds. <61 = Short.
content_type keyword longform / short / live
content_category keyword Category label
language / country / hashtags / format keyword Categorical filters
face_on_screen boolean Is the host on screen?
sponsored_brand_mentions keyword[] IDs of brands paid to be mentioned in this video
organic_brand_mentions keyword[] IDs of brands mentioned without payment
all_brand_mentions keyword[] Union of the above
brand_mentions nested Full mention objects with timestamp/sentiment
channel.id / channel.country / channel.language / channel.content_category / channel.format keyword Embedded channel SUBSET on article docs (only these fields)
reach / impression / impression_live / impression_shorts long Channel-doc only. NOT on the embedded channel.* on article docs.
is_tl_channel / is_active / has_outreach_email boolean Channel-doc fields
media_selling_network_join_date date Channel-doc field — MSN status
$ tl db es '{"size": 50,
  "track_total_hits": true,
  "query": {"bool": {"must": [
    {"term": {"sponsored_brand_mentions": "5612"}},
    {"range": {"publication_date": {"gte": "now-6M"}}}
  ]}},
  "_source": ["id","title","channel.id","publication_date","views"]}'
Find every video where Surfshark (brand id 5612) was a paid sponsor in the last 6 months.

Not accepted: query_string, regexp, wildcard, fuzzy, more_like_this, parent/child joins, anything containing script, multiple aggregations in one body, deep scroll/pit pagination. Use search_after + sort to walk past 10,000.

03 · FIREBOLT

Historical metrics. 8.5 billion rows. Two tables.

Use Firebolt only when you need a value at a point in time that's no longer in the current ES snapshot — view curves, evergreen multipliers, growth trajectories. For "current views/subs", use ES.

article_metrics

7.4B rows · 159 GiB

Video-level snapshots. Each row = one scrape of one video on one date.

Column Type Notes
id TEXT Bare YouTube video ID — NOT the compound form
channel_id INT TL channel ID. MUST appear in WHERE.
publication_date DATE When the video published
scrape_date DATE When this row was captured
age INT Days since publication = scrape_date − publication_date
view_count INT Cumulative views at time of scrape
like_count INT Cumulative likes
comment_count INT Cumulative comments
duration INT Video duration in seconds

PRIMARY (channel_id, id) — channel_id MUST be equality- or IN-filtered.

channel_metrics

1.1B rows · 6.9 GiB

Channel-level snapshots. Subscriber and total-view trajectory.

Column Type Notes
id INT TL channel ID. MUST appear in WHERE.
scrape_date DATE When this row was captured
reach INT Subscriber count at time of scrape
total_views INT Channel total views at time of scrape

PRIMARY (id) — id MUST be equality- or IN-filtered.

$ tl db fb "
  SELECT age, view_count, like_count
  FROM article_metrics
  WHERE channel_id = 12465 AND id = 'dQw4w9WgXcQ'
  ORDER BY age"
The view curve for one video. Composite key required (channel_id is the leading index).

Joining across the three sources.

You join in jq or duckdb, not in SQL — the three engines are physically separate. The keys you stitch on are stable:

Postgres Elasticsearch Firebolt
channel.idchannel.id (article docs)article_metrics.channel_id · channel_metrics.id
adlink.article_id
<channel_id>:<youtube_id>
_id (same compound form)article_metrics.id (strip the channel_id: prefix)
brand.idsponsored_brand_mentions[] · organic_brand_mentions[]

Server-side guardrails (so your agent can't blow itself up).

  • SELECT only. No DDL, DML, transactions, SET, COPY, MERGE.
  • LIMIT ≤ 500 on every PG query (or the server fills in 50).
  • OFFSET ≥ 10,000 rejected — paginate via the response's next_offset.
  • SQL ≤ 50K chars, AST depth ≤ 64, ≤ 5,000 nodes.
  • RLS-style scoping on the user's plan: media buyers see price not cost; sellers see cost not price.
  • ES allowlist of query types — see the not-accepted list above.
  • Firebolt single-table only, no JOIN/CTE/subquery; leading index column required in WHERE.
  • Aggregation queries bill on min(hits.total, 200), not on len(hits).
  • Stable response envelope: { results, total, usage, _breadcrumbs }.
  • Bearer-token auth via PKCE OAuth or scoped API key. Same on every endpoint.

The schema is the product. The CLI is just one client.