We gave our team's agents direct database access. Now you get the same.
Most "AI for business" products bolt a chat box onto a dashboard and call it a day. We're somewhere else on the curve. ThoughtLeaders rebuilt itself around AI in 2026, and along the way we learned something about how AI agents actually create value: it's freedom - give them access to data and let them fly.
Agents are good at SQL. They are bad at clicking.
We instrumented every internal AI workflow for six months. The pattern was overwhelming: the moment a question got non-trivial — a join, a window function, an unusual filter — the agent wanted to write SQL, not navigate an API. Every layer of abstraction between the agent and the data was a tax. Endpoints, pagination, helper commands — they all got in the way of the question the agent was actually trying to ask.
Almost-direct read access to the database.
So we gave it to them. The agent writes a SELECT; the server enforces the guardrails (SELECT-only, plan-scoped column access, LIMIT caps, AST depth limits). No structured endpoints to wrap, no SDK to keep in sync with the schema, no "you must request access to field X" friction. The agent reads the schema once at the start of the conversation and from then on it asks the database directly — in PostgreSQL, in Elasticsearch DSL, or in Firebolt SQL, depending on which moat it needs.
Every account manager. Every analyst. Every day.
The internal CLI has been the daily driver for TL's account managers, analysts, and leadership since Q1 2026. Emma's rebooking-opportunity agent — one account manager's homebrew workflow built on top of tl db pg and tl db es — surfaces renewal pitches no dashboard could have surfaced and drafts the outreach email beside them. Multiply that by the team. The bet paid off: questions that used to be "ask data eng next sprint" are now answered in the agent's next turn.
The same three commands. The same data. Now external.
ThoughtLeaders.AI productizes that exact setup. Your agent gets read access to the same three databases TL's own agents use — seven years of matched sponsorship outcomes, 1.3M tracked channels, every transcript and brand mention indexed and joined. The schema below is the whole API surface. There isn't a second one hidden behind it.
All you need is tl db.
Three databases sit behind ThoughtLeaders, each tuned for a different shape of question. The CLI exposes them as three thin commands that take a query, run it, and stream the result. Same auth, same response envelope, same output flags (--json, --csv, --md, --toon) on all three.
| Command | Engine | What it does | Why use it |
|---|---|---|---|
| tl db pg | PostgreSQL | Executes one SELECT against the source-of-truth Postgres. Deals, channels, brands, profiles, organizations. Joins, GROUP BY, window functions, percentiles — all accepted. SELECT-only, LIMIT ≤ 500. | Use it for any analytical question that needs structured joins — pipeline math, deal counts by brand, channel cohorts, who-sold-what-to-whom. This is where seven years of matched outcomes live. |
| tl db es | Elasticsearch | Executes one Elasticsearch DSL query against the content index. Videos, titles, descriptions, full transcripts, brand mentions (paid + organic), demographic and reach fields on channel docs. Query types and shapes are an allowlist. | Use it for content discovery and full-text work — "where was brand X mentioned in the last 6 months", "videos whose transcripts discuss Y", "find every channel running a sponsored read for category Z". This is the brand-extractor moat. |
| tl db fb | Firebolt | Executes one single-table SELECT against the historical time-series store. 8.5B rows across article_metrics (per-video daily snapshots) and channel_metrics (per-channel daily snapshots). The leading index column must appear in WHERE. | Use it when you need a value at a point in time that's no longer in the current snapshot — view curves, evergreen multipliers, channel growth trajectories, historical CPM context. Anything "what did this look like on date D". |
The Postgres tables you'll actually query.
thoughtleaders_adlink
150K rows The deals table. Every match, proposal, sold ad, and rejection is one row. Internally called "AdLink" — exposed as sponsorships in the CLI.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| publish_status | int | 0 Proposed · 2 Pending · 3 Sold · 4 Brand-rej · 5 Pub-rej · 6 Approved · 7 Matched · 8 Outreach · 9 Agency-rej |
| price | numeric | Deal price (USD). Visible to media-buyer accounts. |
| cost | numeric | TL cost. Visible to media-seller accounts. |
| weighted_price | numeric | price × pipeline weight. Stored, not computed. |
| ad_spot_id | int FK | → thoughtleaders_adspot.id (your join hop to channel) |
| creator_profile_id | int FK | → thoughtleaders_profile.id (your hop to brand) |
| owner_advertiser_id | int FK | → auth_user (brand-side owner) |
| owner_publisher_id | int FK | → auth_user (channel-side owner) |
| owner_sales_id | int FK | → auth_user (TL rep) |
| created_at | timestamptz | When the row was created. Use for "all flow over time." |
| purchase_date | timestamptz | When the deal was sold. NULL if never sold. Use for revenue. |
| send_date | timestamptz | Scheduled video drop date. |
| publish_date | timestamptz | Actual video drop date. NULL until live. |
| outreach_date | timestamptz | When outreach went out (sparse). |
| rejection_reason | int | 1–24 codes. 1–9 brand-side, 10–17 publisher-side, 18 demo-mismatch, 19/21/22/24 quality, 20 brand-history, 23 cadence. |
| payment_status | int | 0 Unpaid · 1 Paid |
| performance_grade | int | TL-internal performance rating |
| article_id | varchar | <channel_id>:<youtube_id> — joins to ES _id and slices Firebolt |
| dashboard_campaign_id | int FK | Campaign grouping |
| tx_data | jsonb | Transaction metadata |
thoughtleaders_channel
1.3M rows Every YouTube channel TL has ever observed. The discovery table.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| channel_name | varchar | Display name (NOT `name`) |
| external_channel_id | varchar | YouTube channel ID (UCxxxxx) |
| url | varchar | Channel URL |
| reach | bigint | Subscriber count (NOT `subscribers`) |
| impression | numeric | Projected views per upload — used for forward-looking CPM |
| media_selling_network_join_date | date | MSN membership = NOT NULL. Scrubbed from the advertiser sandbox view. |
| is_tl_channel | boolean | TPP partner — the elite ~170 channels TL manages directly |
| is_active | boolean | Always include is_active=TRUE unless you specifically want archived rows |
| content_category | int | 1–22 enum. Use the recommender for ranked discovery; this column is best-effort. |
| country | varchar | ISO 3166-1 alpha-2. ~10% NULL. |
| language | varchar | ISO 639-1 (mostly 2-letter), occasionally 3-letter long-tail. NOT BCP-47. |
| last_published | date | Channel's most recent upload date. Use for "is the channel still alive?" |
| sponsorship_score | double | TL-internal channel-quality score. Useful tiebreaker. |
| description | text | LLM-generated channel description (regex-able for thematic filtering) |
| evergreenness | float | Cached evergreen multiplier |
| demographic_usa_share | int 0–100 | % of audience in the US |
| demographic_male_share | int 0–100 | % male audience |
| demographic_age | jsonb | {"18-24": 30, "25-34": 45, ...} |
| demographic_device | jsonb | {"mobile": 60, "desktop": 28, "tv": 12} |
| demographic_geo | jsonb | {"US": 60, "GB": 15, ...} |
| demographic_device_primary | varchar | Materialized top-device convenience field |
| total_views | bigint | All-time channel views |
| demographics_updated_at | timestamptz | When demographic screenshots last processed |
thoughtleaders_brand
61K rows Every brand TL has ever observed advertising or being mentioned. The brand intelligence table.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| name | varchar | Brand name |
| description | text | Brand description |
| creator_id | int FK | User who created the row |
thoughtleaders_profile
20K rows User profile. The bridge between auth_user and brands. Each profile attaches one brand in practice.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| organization_id | int FK | → thoughtleaders_organization.id |
| persona | int | User persona code (advertiser / publisher / agency / TL) |
| media_buying_network_join_date | date | MBN membership = NOT NULL |
thoughtleaders_adspot
43K rows Buyable ad placements. Channel × seller × integration type. List prices live here; actual deal prices live on the adlink.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| channel_id | int FK | → thoughtleaders_channel.id |
| price | numeric | List/catalogue price |
| cost | numeric | List/catalogue cost |
| integration | int | 1 = YouTube Mention (live read). One active mention adspot per channel. |
| is_active | boolean | Active flag |
| publisher_id | int FK | → auth_user.id (NOT profile) |
thoughtleaders_organization
19K rows Top-level organization. Brands and publishers each belong to one. Permission scoping happens here.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| name | varchar | Org name |
| plan | varchar | Plan tier — gates intelligence access |
auth_user
20K rows Standard Django users. Owners of deals (advertiser, publisher, sales) all point here.
| Column | Type | Notes |
|---|---|---|
| id | int | Primary key |
| first_name | varchar | First name |
| last_name | varchar | Last name |
| varchar |
Junction tables
— rows Many-to-many bridges between the core entities.
| Column | Type | Notes |
|---|---|---|
| thoughtleaders_profile_brands | (profile_id, brand_id) | Profile ↔ Brand. In practice 1:1. |
| thoughtleaders_brand_brands | (from_brand_id, to_brand_id) | Self-referential brand similarity |
$ tl db pg "
SELECT b.name, COUNT(*) AS deals
FROM thoughtleaders_adlink a
JOIN thoughtleaders_profile p ON a.creator_profile_id = p.id
JOIN thoughtleaders_profile_brands pb ON p.id = pb.profile_id
JOIN thoughtleaders_brand b ON pb.brand_id = b.id
WHERE a.publish_status = 3
AND a.purchase_date >= NOW() - INTERVAL '12 months'
GROUP BY b.name
ORDER BY deals DESC
LIMIT 20 OFFSET 0" Video content, transcripts, and brand mentions.
Index is fixed server-side. The only time-window knob is a publication_date range filter inside the body. Article docs and channel docs share the index via a doc_type join — most filtering happens against article docs.
| Field | Type | Notes |
|---|---|---|
| id | keyword | Compound <channel_id>:<youtube_id> — matches PG adlink.article_id |
| title | text | Video title |
| description | text | Video description |
| transcript | text | Full transcript text |
| summary | text | AI-generated video summary |
| publication_date | date | When video was published — your only time-window knob |
| views / total_views / projected_views | long | Current view counts |
| likes | long | Like count |
| comments | integer | Comment count |
| duration / duration_longform / duration_shorts | integer | Seconds. <61 = Short. |
| content_type | keyword | longform / short / live |
| content_category | keyword | Category label |
| language / country / hashtags / format | keyword | Categorical filters |
| face_on_screen | boolean | Is the host on screen? |
| sponsored_brand_mentions | keyword[] | IDs of brands paid to be mentioned in this video |
| organic_brand_mentions | keyword[] | IDs of brands mentioned without payment |
| all_brand_mentions | keyword[] | Union of the above |
| brand_mentions | nested | Full mention objects with timestamp/sentiment |
| channel.id / channel.country / channel.language / channel.content_category / channel.format | keyword | Embedded channel SUBSET on article docs (only these fields) |
| reach / impression / impression_live / impression_shorts | long | Channel-doc only. NOT on the embedded channel.* on article docs. |
| is_tl_channel / is_active / has_outreach_email | boolean | Channel-doc fields |
| media_selling_network_join_date | date | Channel-doc field — MSN status |
$ tl db es '{"size": 50,
"track_total_hits": true,
"query": {"bool": {"must": [
{"term": {"sponsored_brand_mentions": "5612"}},
{"range": {"publication_date": {"gte": "now-6M"}}}
]}},
"_source": ["id","title","channel.id","publication_date","views"]}'
Not accepted: query_string, regexp, wildcard, fuzzy, more_like_this, parent/child joins, anything containing script, multiple aggregations in one body, deep scroll/pit pagination. Use search_after + sort to walk past 10,000.
Historical metrics. 8.5 billion rows. Two tables.
Use Firebolt only when you need a value at a point in time that's no longer in the current ES snapshot — view curves, evergreen multipliers, growth trajectories. For "current views/subs", use ES.
article_metrics
7.4B rows · 159 GiB Video-level snapshots. Each row = one scrape of one video on one date.
| Column | Type | Notes |
|---|---|---|
| id | TEXT | Bare YouTube video ID — NOT the compound form |
| channel_id | INT | TL channel ID. MUST appear in WHERE. |
| publication_date | DATE | When the video published |
| scrape_date | DATE | When this row was captured |
| age | INT | Days since publication = scrape_date − publication_date |
| view_count | INT | Cumulative views at time of scrape |
| like_count | INT | Cumulative likes |
| comment_count | INT | Cumulative comments |
| duration | INT | Video duration in seconds |
PRIMARY (channel_id, id) — channel_id MUST be equality- or IN-filtered.
channel_metrics
1.1B rows · 6.9 GiB Channel-level snapshots. Subscriber and total-view trajectory.
| Column | Type | Notes |
|---|---|---|
| id | INT | TL channel ID. MUST appear in WHERE. |
| scrape_date | DATE | When this row was captured |
| reach | INT | Subscriber count at time of scrape |
| total_views | INT | Channel total views at time of scrape |
PRIMARY (id) — id MUST be equality- or IN-filtered.
$ tl db fb "
SELECT age, view_count, like_count
FROM article_metrics
WHERE channel_id = 12465 AND id = 'dQw4w9WgXcQ'
ORDER BY age" Joining across the three sources.
You join in jq or duckdb, not in SQL — the three engines are physically separate. The keys you stitch on are stable:
| Postgres | Elasticsearch | Firebolt |
|---|---|---|
| channel.id | channel.id (article docs) | article_metrics.channel_id · channel_metrics.id |
| adlink.article_id <channel_id>:<youtube_id> | _id (same compound form) | article_metrics.id (strip the channel_id: prefix) |
| brand.id | sponsored_brand_mentions[] · organic_brand_mentions[] | — |
Server-side guardrails (so your agent can't blow itself up).
- SELECT only. No DDL, DML, transactions, SET, COPY, MERGE.
- LIMIT ≤ 500 on every PG query (or the server fills in 50).
- OFFSET ≥ 10,000 rejected — paginate via the response's
next_offset. - SQL ≤ 50K chars, AST depth ≤ 64, ≤ 5,000 nodes.
- RLS-style scoping on the user's plan: media buyers see
pricenotcost; sellers seecostnotprice.
- ES allowlist of query types — see the not-accepted list above.
- Firebolt single-table only, no JOIN/CTE/subquery; leading index column required in WHERE.
- Aggregation queries bill on
min(hits.total, 200), not onlen(hits). - Stable response envelope:
{ results, total, usage, _breadcrumbs }. - Bearer-token auth via PKCE OAuth or scoped API key. Same on every endpoint.