Summairize trust center

Security, privacy & trust

We design crawling, indexing, and answering workflows that put the website owner in control. Here is how we handle the data you share with us today.

What we index

We only crawl domains that you add to Summairize and verify as belonging to your organization. Everything we ingest has to be publicly accessible without authentication.

Our crawler reads robots.txt files and sitemaps on every run. That means we avoid disallowed paths by default, even when they are technically reachable. Managed customers can explicitly override those defaults (for example, when a sitemap is incomplete) once they confirm ownership.
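
As a rough illustration of that check (not our production crawler), here is a minimal Python sketch built on the standard library's robots.txt parser; the "SummairizeBot" user-agent string is hypothetical:

    from urllib.robotparser import RobotFileParser

    USER_AGENT = "SummairizeBot"  # hypothetical crawler identity

    def allowed_to_fetch(url: str, robots_url: str) -> bool:
        """Return True only if the site's robots.txt permits fetching url."""
        parser = RobotFileParser()
        parser.set_url(robots_url)
        parser.read()  # download and parse the site's robots.txt
        return parser.can_fetch(USER_AGENT, url)

    # A disallowed path is skipped even though it is technically reachable.
    if allowed_to_fetch("https://example.com/private/report.html",
                        "https://example.com/robots.txt"):
        ...  # fetch and index the page

In practice a crawler caches the parsed rules per host rather than re-reading robots.txt for every URL; the sketch omits that for brevity.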

What we don’t do

  • We do not browse or capture content that sits behind logins, paywalls, or VPNs.
  • We do not use your customer data or crawl outputs to train broad, general-purpose AI models.
  • We do not resell or share your indexed content with other customers.

Data handling

Crawl & index storage: We store crawl artifacts and generated embeddings in secure cloud storage tied to your workspace. Access is limited to the employees who operate the service.

API request logs: We log request/response metadata (timestamps, domain, query volume) to monitor uptime, abuse, and quality. Payloads are encrypted in transit and at rest.
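
For illustration only, a record in that log might look like the sketch below. The field names and endpoint are hypothetical; the point is what is absent: no page content, no answer text, no personal data.

    # Hypothetical shape of a single request-log record.
    log_record = {
        "timestamp": "2024-05-01T12:34:56Z",  # when the request arrived
        "domain": "docs.example.com",         # verified domain being queried
        "endpoint": "/v1/answer",             # hypothetical API route
        "status": 200,                        # response code
        "latency_ms": 142,                    # service-quality monitoring
        "queries_last_hour": 37,              # rolling volume for abuse checks
    }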

Retention: Operational logs and crawl data are retained only as long as we need them for debugging and performance tuning. You can request deletion of specific domains or crawl runs at any time, and we will confirm when the removal is complete.

Controls

  • Domain verification ensures only authorized owners can trigger a crawl or publish answers.
  • Allow- and block-list settings at the path level keep sensitive sections out of the index (a sketch follows this list).
  • Indexed content can be removed on request today, and we are shipping an in-product “remove & re-crawl” control for even faster responses.
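
To make the path-level controls concrete, here is a minimal Python sketch of how allow- and block-prefixes could gate the index. The prefixes, the block-wins precedence rule, and the function itself are illustrative, not our exact implementation:

    from urllib.parse import urlparse

    ALLOW_PREFIXES = ("/docs/", "/blog/")   # hypothetical per-domain settings
    BLOCK_PREFIXES = ("/docs/internal/",)

    def in_index_scope(url: str) -> bool:
        """Block rules win over allow rules; unmatched paths stay out."""
        path = urlparse(url).path
        if any(path.startswith(p) for p in BLOCK_PREFIXES):
            return False
        return any(path.startswith(p) for p in ALLOW_PREFIXES)

    assert in_index_scope("https://example.com/docs/setup")
    assert not in_index_scope("https://example.com/docs/internal/keys")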

Links & policies

For more detail on the contractual and policy framework, visit:

Security, Privacy & Trust | Summairize