RSS-powered knowledge bases: how to keep an AI clone current

Drew Harris · CEO and Chief Product and Technology Officer · 2026-04-10 · 7 min read

knowledge-management · product · zero-hallucination · pii

Why a stale knowledge base is worse than a small one

A small knowledge base produces a lot of "I don't know" responses. That's by design: the protégé is architecturally grounded, so it doesn't fabricate when it has no source material to draw on. The expert sees the gap, fills it, moves on.

A stale knowledge base is harder to detect. The protégé answers confidently from content that's three years old. It recommends tactics that stopped working two election cycles ago. It quotes pricing that you changed in January. It repeats a position you've since revised in a newer article. The failures are not obvious to the client and often not obvious to you until someone brings them up.

Most expert-clone platforms treat the knowledge base like a backup tape. You upload once during onboarding. Maybe you re-upload quarterly. In between, the KB goes stale and the protégé drifts from the real expert.

A clone that was you in 2024 isn't you today. It's an archive. Archives have their uses; a protégé isn't supposed to be one.

What "RSS-powered" actually buys you

RSS is an old standard that does one thing well: it gives a machine-readable feed of new items from a source, with timestamps. Nearly every blog platform, most publishing CMSes, every podcast host, and many newsletter tools publish RSS.

"RSS-powered knowledge base" means the platform:

Lets you register an RSS feed URL per protégé.
Polls that feed on a schedule (we poll hourly by default; configurable).
Fetches new items since last poll.
Runs each new item through the ingestion pipeline: redact PII, chunk, embed, dedupe against existing content, store.
Makes the new content retrievable in the protégé's next session.
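
The "fetch new items since last poll" step can be sketched in a few lines. This is a minimal illustration, not the production poller: it assumes an RSS 2.0 feed whose items carry a pubDate, and the names new_items_since and SAMPLE_FEED are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example blog</title>
  <item>
    <guid>post-2</guid>
    <title>Newer post</title>
    <pubDate>Fri, 10 Apr 2026 09:00:00 GMT</pubDate>
  </item>
  <item>
    <guid>post-1</guid>
    <title>Older post</title>
    <pubDate>Mon, 02 Mar 2026 09:00:00 GMT</pubDate>
  </item>
</channel></rss>"""


def new_items_since(feed_xml: str, last_poll: datetime) -> list[dict]:
    """Return RSS 2.0 items published strictly after last_poll."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        if pub is None:
            continue  # undated items would need another strategy, such as GUID tracking
        published = parsedate_to_datetime(pub)  # RFC 822 date -> aware datetime
        if published > last_poll:
            fresh.append({
                "guid": item.findtext("guid"),
                "title": item.findtext("title"),
                "published": published,
            })
    return fresh


last_poll = datetime(2026, 4, 1, tzinfo=timezone.utc)
for entry in new_items_since(SAMPLE_FEED, last_poll):
    print(entry["guid"], entry["title"])
```

Everything downstream (redaction, chunking, embedding, dedupe) would run only on the items this function returns.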

No manual re-upload. No "every Tuesday I copy the latest blog into the KB." The protégé's knowledge base tracks your published work at the cadence of your publishing.

ApexReplicant's RSS ingestion is the mechanism covered by our patent-pending filing, which focuses on the combination of scheduled polling, PII-aware ingestion, semantic deduplication, and protégé-scoped isolation. Details are in the knowledge base feature page.

Where RSS feeds come from (more than you think)

Most experts don't think of themselves as having an RSS feed. In practice, almost every content source an expert produces has one:

Your blog: WordPress, Ghost, Substack, Medium, Squarespace, Webflow all expose RSS by default. Usually at yourdomain.com/feed or yourdomain.com/rss.
Your podcast: every podcast host publishes an RSS feed. That's what Spotify and Apple Podcasts read. Point it at our ingestion and every new episode's transcript becomes a KB source (audio is transcribed on ingest).
Your newsletter: Substack, Beehiiv, ConvertKit, Ghost all publish feeds. New issues flow in.
Your YouTube channel: YouTube exposes channel RSS (youtube.com/feeds/videos.xml?channel_id=…). Videos get transcribed.
Publications you contribute to: many media sites expose per-author feeds. If you're a regular columnist, point at your author page's RSS and your new columns are auto-ingested.
Your book or course platform: some platforms expose change feeds (new chapters, updates). Ingest those.

The recruiting expert Robin Walters (active on ApexReplicant) publishes regularly; RSS ingestion means her protégé's answers reflect her newest thinking without her having to remember to upload. Matt Rossetti (legal intake) runs a similar setup for published legal commentary.

How ingestion handles updates, duplicates, and drift

RSS gives you a lot of raw input. The ingestion pipeline has to handle three issues:

Updates to the same post

Blog posts get edited. Typo fixes are noise; substantive revisions are important. Our pipeline detects the item by its canonical GUID (the RSS <guid> field), re-ingests on change, and replaces the prior embedding set. The old version is archived in the audit trail but no longer retrievable.
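
A rough sketch of GUID-keyed replacement, under stated assumptions: a change is detected by comparing a hash of the item body, the store and archive are plain in-memory structures, and upsert_item is a hypothetical name. The real pipeline replaces an embedding set rather than a text body.

```python
import hashlib


def upsert_item(store: dict, archive: list, guid: str, body: str) -> str:
    """Insert or replace an item keyed by its RSS <guid>.

    Returns "new", "updated", or "unchanged".
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    current = store.get(guid)
    if current is None:
        store[guid] = {"body": body, "digest": digest}
        return "new"
    if current["digest"] == digest:
        return "unchanged"  # same GUID, same content: nothing to re-ingest
    # Keep the prior version for the audit trail, out of the retrievable store.
    archive.append({"guid": guid, **current})
    store[guid] = {"body": body, "digest": digest}
    return "updated"


store, archive = {}, []
print(upsert_item(store, archive, "post-1", "I recommend X."))
print(upsert_item(store, archive, "post-1", "I no longer recommend X."))
```

Only the store is consulted at retrieval time in this sketch; the archive exists purely so the earlier version can be audited.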

Duplicates across sources

A blog post is often cross-posted: your site, LinkedIn, Medium, a newsletter. SHA-256 content hashing, a shipped dedup feature, detects identical content across sources. The canonical version is stored once; other sources link to it in the audit trail. No retrieval bloat from the same content appearing five times.
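
Hash-based dedup across sources might look like the sketch below. The whitespace normalization before hashing is an assumption made for illustration (the shipped feature may hash differently), and content_key and register are hypothetical names.

```python
import hashlib


def content_key(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace (normalization is an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def register(canonical: dict, audit: list, source: str, text: str) -> bool:
    """Store text once; later identical copies only add an audit-trail link.

    Returns True if the text was stored, False if it duplicated an earlier source.
    """
    key = content_key(text)
    if key in canonical:
        audit.append({"source": source, "duplicate_of": canonical[key]["source"]})
        return False
    canonical[key] = {"source": source, "text": text}
    return True


canonical, audit = {}, []
register(canonical, audit, "blog", "Five rules for intake calls.")
register(canonical, audit, "newsletter", "Five rules   for intake calls.\n")
print(len(canonical), audit)
```

An exact hash only catches copies that are identical after normalization. A cross-post with an added intro or changed wording produces a different key and would need a similarity-based check instead.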

Drift between old and new

New content can contradict old content. A 2023 blog post said "I recommend X"; a 2026 blog post says "I no longer recommend X." Both are in the KB. Which wins during retrieval?

Our answer is recency-weighted retrieval when the expert has configured it (off by default, because many experts want their historical content treated as authoritative). When recency weighting is on, the more recent item is preferred at retrieval time, and the protégé's answer leans toward the newer perspective. The older content is still retrievable for questions that specifically reference the older context.
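
One way to picture recency weighting is as a decay applied to each retrieval hit's similarity score. The sketch below is an assumption about the shape of the idea, not the platform's scoring formula: the exponential half-life, the Hit type, and the rank function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    text: str
    similarity: float  # semantic similarity to the query, higher is better
    published: date


def rank(hits, today, recency_weighting=False, half_life_days=365.0):
    """Order hits by similarity, optionally decayed by age.

    With weighting on, a hit loses half its score every half_life_days.
    """
    def score(hit):
        if not recency_weighting:
            return hit.similarity
        age_days = max((today - hit.published).days, 0)
        return hit.similarity * 0.5 ** (age_days / half_life_days)

    return sorted(hits, key=score, reverse=True)


hits = [
    Hit("I recommend X.", 0.90, date(2023, 4, 10)),
    Hit("I no longer recommend X.", 0.85, date(2026, 3, 10)),
]
today = date(2026, 4, 10)
print(rank(hits, today)[0].text)
print(rank(hits, today, recency_weighting=True)[0].text)
```

With weighting off, the older post wins on raw similarity. With it on, the newer post ranks first, while the older one stays in the result list and can still surface for questions that match it much more closely.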

Setup: what the expert actually does

If you're implementing, these are the steps.

1. Identify your feeds. Check the obvious: your blog, your podcast, your newsletter. Most URLs are /feed, /rss, or discoverable via "View Source" on the homepage (look for <link type="application/rss+xml">).
2. Open the KB dashboard for the protégé you want to update. (See the feature page.)
3. Click "Add RSS Feed." Paste the feed URL.
4. Review the backfill preview. The platform fetches recent items (default: last 90 days) and shows you what would be ingested. You confirm or skip.
5. Set the polling cadence. Hourly is the default. For podcasts or low-frequency blogs, daily is fine and uses less compute.
6. Confirm. The scheduler starts polling. New items flow in automatically from that point forward.
7. Watch the first week. Check the AI insight preview on any items flagged as large or ambiguous. Sometimes a long podcast transcript has long tangents that are better trimmed before ingestion. The insight preview lets you review extractions before they are committed.

That's the whole flow. After step 7, the KB maintains itself for that source.
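
The "View Source" check from the first step can be automated. The sketch below scans a page's HTML for link tags with type application/rss+xml using only the Python standard library; FeedLinkFinder and discover_feeds are hypothetical names, and fetching the page itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link type="application/rss+xml"> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        link_type = (attr.get("type") or "").lower()
        if link_type == "application/rss+xml" and attr.get("href"):
            # Resolve relative hrefs such as "/feed" against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html: str, base_url: str) -> list[str]:
    finder = FeedLinkFinder(base_url)
    finder.feed(html)  # HTMLParser.feed parses the markup; unrelated to RSS feeds
    return finder.feeds


page = """<html><head>
<link rel="stylesheet" href="/site.css">
<link rel="alternate" type="application/rss+xml" title="Blog" href="/feed">
</head><body></body></html>"""
print(discover_feeds(page, "https://example.com/"))
```

If this returns an empty list, trying the /feed and /rss paths directly is the next thing to check.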

Common failure modes and how we handle them

RSS is standardized but messy. Common issues and our handling:

Feed returns errors. Retry with exponential backoff; after repeated failure, flag in the expert dashboard for manual review.
Feed changes URL. Detected when the old URL returns a permanent redirect (HTTP 301). We follow the redirect and update the registration automatically.
Feed is rate-limited. Backoff and retry; polling cadence adapts.
Feed includes items not meant for KB (e.g., a podcast feed that carries "trailer" or "bonus" episodes). Expert can filter by item tag or title pattern.
Feed is paywalled. We don't attempt bypass. If content is behind auth, the expert pastes individual items into the KB manually.
Feed includes PII-heavy content (e.g., a podcast transcript with guest names and emails). PII redaction runs on ingest, same as for any other source.
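
The retry behavior in the first item above follows a familiar pattern. Here is a minimal sketch of exponential backoff, assuming a doubling delay and a fixed attempt limit; the parameter values and the fetch_with_backoff name are illustrative, not the platform's actual settings.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Returns (result, None) on success, or (None, last_error) once
    max_attempts is exhausted so the caller can flag the feed for review.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fetch(), None
        except OSError as exc:  # urllib's URLError and socket timeouts subclass OSError
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return None, last_error


def unreachable_feed():
    raise OSError("connection refused")


# Pass a no-op sleep so the demonstration does not actually wait.
result, error = fetch_with_backoff(unreachable_feed, sleep=lambda seconds: None)
if error is not None:
    print("flag for manual review:", error)
```

The sleep parameter is injectable so the schedule can be tested without waiting. A production poller would typically add random jitter so that many feeds do not retry in lockstep.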

FAQ

How often does the RSS poller run? Hourly by default. Configurable per feed.

Can I ingest content that doesn't have an RSS feed? Yes. URL scraping, PDF upload, direct text paste, audio/video upload, and YouTube URL ingestion are all supported for one-off sources. RSS is for recurring publishing.

Does RSS ingestion cost extra? No. It's part of every protégé on every plan.

Can I have multiple RSS feeds per protégé? Yes, no limit. Experts commonly run 3–6 feeds (blog + podcast + newsletter + author pages on other sites).

If I delete an item from my blog, does the KB delete it too? If the item is removed from the feed, we flag it in the KB audit trail as source-removed but do not automatically delete the KB version. Experts can choose to remove items with one click. The default posture is "archive, don't auto-delete" because RSS feeds sometimes truncate (e.g., most feeds only include recent items by default).

Is RSS auto-ingestion a Delphi or Steno feature too? Steno supports podcast RSS ingestion specifically, mainly as an onboarding accelerator. Delphi does not publicly document RSS ingestion as of our latest competitive review. We are the only platform we know of with general-purpose RSS auto-ingestion across all source types, covered by a patent-pending filing.

Can I use this for an internal-only feed (e.g., a private Slack channel's summary feed)? RSS feeds that require authentication can be registered, but credentials must be handled through a secure integration; we don't store raw API keys in feed registrations. Contact support for the auth-required flow.
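
For readers who want to see the "archive, don't auto-delete" posture from the deletion question above in concrete terms, here is a small sketch. It assumes KB entries are keyed by GUID in a plain dict; reconcile and the source_removed flag name are hypothetical.

```python
def reconcile(kb: dict, feed_guids: set) -> list:
    """Flag KB entries whose GUID is missing from the current feed.

    Nothing is deleted: a truncated feed looks the same as a deleted post,
    so removal is left to the expert.
    """
    newly_flagged = []
    for guid, entry in kb.items():
        if guid in feed_guids:
            entry["source_removed"] = False  # item is (still or again) in the feed
        elif not entry.get("source_removed"):
            entry["source_removed"] = True
            newly_flagged.append(guid)
    return newly_flagged


kb = {"post-1": {"title": "Old post"}, "post-2": {"title": "Recent post"}}
print(reconcile(kb, {"post-2"}))
print(sorted(kb))
```

Both entries remain in the KB after reconciliation; only the flag on the missing one changes.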

Drew Harris

CEO and Chief Product and Technology Officer

Co-founder of Expert Scale, Inc. Writes on platform architecture, product decisions, and how Apex Replicant builds expert-driven AI that refuses to guess.