
Everything you need to set up, configure, and get value from the Hi, I'm Kai conversation layer.

Crawling website content

This page explains how the crawl pipeline discovers public pages, extracts clean text from them, and refreshes the content that powers the knowledge base.

What does crawling do?

The crawl pipeline discovers and fetches pages from your site, then converts them into clean text that powers Hi, I'm Kai answers. Crawling is the first step in building a knowledge base from existing website content.
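To illustrate the "convert to clean text" step, here is a minimal sketch that uses Python's standard-library `html.parser`. It assumes that clean text means the visible text of a page, with `script`, `style`, and similar non-content elements dropped and whitespace collapsed. The product's actual extraction rules are not documented here, and `extract_clean_text` is a hypothetical name.

```python
from html.parser import HTMLParser

# Assumption: these elements hold no readable page content and are dropped.
SKIP_TAGS = {"script", "style", "noscript", "template"}


class _TextExtractor(HTMLParser):
    """Collects text nodes that are not inside a SKIP_TAGS element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True (default) decodes entities like &amp;
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        # Join text nodes with spaces, then collapse every whitespace run to one space.
        return " ".join(" ".join(self._chunks).split())


def extract_clean_text(html: str) -> str:
    """Hypothetical helper: return the visible text of one HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

For example, `extract_clean_text("<p>Open <b>9am</b></p>")` returns `"Open 9am"`. Joining text nodes with spaces keeps words in adjacent elements from running together. The trade-off is that it can also add a space between inline elements that had none in the original markup.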

Starting from the root URL, the crawler follows links to public pages and respects exclusion rules such as robots.txt and your crawl rules. It records a status for each URL so you can see which pages were discovered, fetched, skipped, or failed.
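The discovery loop described above can be sketched as a breadth-first crawl. This is a minimal illustration, not the product's crawler. `crawl`, the injected `fetch` callable, `excluded_prefixes`, `max_pages`, and the user-agent string are all hypothetical. The robots.txt checks use Python's standard `urllib.robotparser`.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler"  # hypothetical user-agent string


def crawl(
    root: str,
    fetch: Callable[[str], list[str]],
    robots_lines: Iterable[str],
    excluded_prefixes: tuple[str, ...] = (),
    max_pages: int = 1000,
) -> dict[str, str]:
    """Hypothetical sketch: breadth-first crawl from `root`, one status per URL.

    `fetch(url)` is an assumed interface: it returns the hrefs on the page and
    raises an exception when the fetch fails (timeout, 4xx, 5xx, and so on).
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)
    site = urlparse(root).netloc
    status = {root: "discovered"}
    queue = deque([root])
    fetched = 0

    while queue and fetched < max_pages:
        url = queue.popleft()
        path = urlparse(url).path or "/"
        # Exclusion rules: robots.txt first, then the site's own path prefixes.
        if not robots.can_fetch(USER_AGENT, url) or path.startswith(excluded_prefixes):
            status[url] = "skipped"
            continue
        status[url] = "fetching"
        try:
            links = fetch(url)
        except Exception:
            status[url] = "failed"
            continue
        status[url] = "fetched"
        fetched += 1
        for href in links:
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            # Follow only links on the root's host that have not been seen yet.
            if urlparse(link).netloc == site and link not in status:
                status[link] = "discovered"
                queue.append(link)
    # Anything still queued when the page limit is reached keeps "discovered".
    return status
```

Breadth-first order means that pages closest to the root are fetched first. If the hypothetical `max_pages` limit is reached, the pages left in the discovered state are the ones deepest in the site.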

How do I start a crawl?

  1. Go to the Crawl page.
  2. If your site is not listed yet, click Add site and enter its URL.
  3. Click Start crawl. The system begins discovering pages from the root URL.

After the crawl starts, review the crawl statuses and wait for the crawl to finish before making large knowledge base edits.

What do crawl statuses mean?

The crawl status table has two columns: the status value stored by the crawler and the meaning of that value for the page.

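| Status     | Meaning                                                                   |
|------------|---------------------------------------------------------------------------|
| discovered | URL found but not yet fetched.                                            |
| fetching   | Page is being downloaded.                                                 |
| fetched    | Content successfully retrieved.                                           |
| skipped    | Page excluded by robots.txt or crawl rules.                               |
| failed     | Fetch attempt failed because of a timeout, 4xx response, or 5xx response. |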
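The status table lists the states but not how a URL moves between them. One plausible lifecycle is assumed here rather than documented: discovered → fetching → fetched or failed, with skipped reached directly from discovered. The sketch models that lifecycle with an `Enum`. `CrawlStatus`, `advance`, and `outcome_status` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class CrawlStatus(Enum):
    DISCOVERED = "discovered"  # URL found but not yet fetched
    FETCHING = "fetching"      # page is being downloaded
    FETCHED = "fetched"        # content successfully retrieved
    SKIPPED = "skipped"        # excluded by robots.txt or crawl rules
    FAILED = "failed"          # timeout, 4xx response, or 5xx response


# Assumed transitions within one crawl run; the status table does not define them.
ALLOWED = {
    CrawlStatus.DISCOVERED: {CrawlStatus.FETCHING, CrawlStatus.SKIPPED},
    CrawlStatus.FETCHING: {CrawlStatus.FETCHED, CrawlStatus.FAILED},
    CrawlStatus.FETCHED: set(),
    CrawlStatus.SKIPPED: set(),
    CrawlStatus.FAILED: set(),
}


def advance(current: CrawlStatus, new: CrawlStatus) -> CrawlStatus:
    """Return `new` if the move is allowed, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def outcome_status(http_status: Optional[int]) -> CrawlStatus:
    """Map one fetch attempt to a final status; None stands for a timeout.

    Assumes redirects were already followed, so any non-error code counts as fetched.
    """
    if http_status is None or 400 <= http_status <= 599:
        return CrawlStatus.FAILED
    return CrawlStatus.FETCHED
```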

A healthy crawl ends with most of your important pages in the fetched state. If a page that should be an answer source shows failed across repeated crawls, investigate the cause.
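To make "healthy" measurable, you could compare the statuses from several crawl runs against a list of pages you expect Hi, I'm Kai to answer from. This sketch assumes that each run is available as a URL-to-status mapping and that you maintain the `important` set yourself. The function names and the `min_failures=2` threshold are hypothetical, not product settings.

```python
from collections import Counter


def status_counts(run: dict[str, str]) -> Counter:
    """Count URLs per status in one crawl run (URL -> status)."""
    return Counter(run.values())


def fetched_share(run: dict[str, str], important: set[str]) -> float:
    """Fraction of the important URLs that ended the run as fetched."""
    if not important:
        return 1.0
    return sum(run.get(url) == "fetched" for url in important) / len(important)


def repeated_failures(runs: list[dict[str, str]], important: set[str], min_failures: int = 2) -> list[str]:
    """Important URLs that failed in at least `min_failures` of the given runs."""
    failures = Counter(
        url
        for run in runs
        for url, status in run.items()
        if status == "failed" and url in important
    )
    return sorted(url for url, count in failures.items() if count >= min_failures)
```

Counting failures across runs, rather than flagging every single failure, separates a page that is persistently broken from one that hit a one-off timeout.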

How do I re-crawl changed content?

Re-run a crawl any time your content changes. Stale content is replaced during re-ingestion so outdated information does not persist in the knowledge base.
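One way to picture re-ingestion is a store keyed by URL, where each re-fetched page overwrites its previous version only when the text has changed. This is an assumption about the mechanism, not a description of the product's internals. `Document`, `reingest`, and the SHA-256 change check are illustrative choices.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    url: str
    text: str
    content_hash: str


def content_hash(text: str) -> str:
    """SHA-256 of the page text, used to detect changes between crawls."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reingest(kb: dict[str, Document], fetched_pages: dict[str, str]) -> list[str]:
    """Hypothetical sketch: store each re-fetched page, replacing changed ones.

    `fetched_pages` maps URL -> clean text from the latest crawl.
    Returns the URLs whose stored content was added or replaced.
    """
    changed = []
    for url, text in fetched_pages.items():
        digest = content_hash(text)
        current = kb.get(url)
        if current is not None and current.content_hash == digest:
            continue  # unchanged page: keep the existing entry
        # Overwrite rather than append, so the outdated version is no longer stored.
        kb[url] = Document(url, text, digest)
        changed.append(url)
    return changed
```

Keying the store by URL ensures that a changed page replaces its old text rather than sitting alongside it.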

For high-impact content such as pricing, hours, or policies, re-crawl immediately after publishing the website update.