Crawl Website
Request
Chatbase offers an API that crawls a website and returns the internal links it finds. This works best with server-rendered HTML pages.
To do so, make a GET request to https://www.chatbase.co/api/v1/fetch-links with a sourceURL query parameter. Chatbase finds all internal links that have the sourceURL as a prefix. For example, if sourceURL is https://www.example.com/blog, the API returns ["https://www.example.com/blog", "https://www.example.com/blog/1", "https://www.example.com/blog/2"] but not https://www.example.com, since it does not have the sourceURL as a prefix. In other words, it matches https://www.example.com/blog/*.
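The prefix rule can be illustrated locally with a minimal sketch. The helper name filterByPrefix is hypothetical and not part of the Chatbase API, and a plain string-prefix check is only an assumption about how the server matches links (for instance, it would also accept https://www.example.com/blogs, which the real service may or may not do).

```javascript
// Illustrative only: applies the documented prefix rule to a list of URLs.
// `filterByPrefix` is a hypothetical helper, not a Chatbase function.
function filterByPrefix(sourceURL, candidates) {
  // Keep only the URLs that start with the given sourceURL.
  return candidates.filter((url) => url.startsWith(sourceURL));
}

const found = filterByPrefix("https://www.example.com/blog", [
  "https://www.example.com",
  "https://www.example.com/blog",
  "https://www.example.com/blog/1",
  "https://www.example.com/about",
]);
console.log(found);
// [ 'https://www.example.com/blog', 'https://www.example.com/blog/1' ]
```

The site root and the /about page are dropped because neither begins with the sourceURL.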
Example Requests
cURL
curl 'https://www.chatbase.co/api/v1/fetch-links?sourceURL=https://www.example.com/' \
  --request "GET" \
  -H 'Authorization: Bearer <Your-API-Key>'
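JavaScript
const dataSourceUrl = "https://www.example.com/blog";
const res = await fetch(
  `https://www.chatbase.co/api/v1/fetch-links?sourceURL=${encodeURIComponent(dataSourceUrl)}`,
  { headers: { Authorization: "Bearer <Your-API-Key>" } }
);
const data = await res.json();
console.log(data);
// { fetchedLinks: ["https://www.example.com/blog/", "https://www.example.com/blog/1", "https://www.example.com/blog/2"], stoppingReason: '' }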
Response
{
  "fetchedLinks": [
    "https://www.example.com/blog/",
    "https://www.example.com/blog/1",
    "https://www.example.com/blog/2"
  ],
  "stoppingReason": ""
}
stoppingReason is sometimes non-empty. For example, if the site is too large, it reports that finding all the links is taking more than 60 seconds, and the API returns early with the links it found within those 60 seconds.
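A caller will usually want to tell a complete crawl from a partial one. The following is a minimal sketch, assuming the response has the shape shown above (a fetchedLinks array and a stoppingReason string). The helper name summarizeCrawl is hypothetical, and the exact wording of a non-empty stoppingReason is not specified here, so the code only checks whether it is empty.

```javascript
// Hypothetical helper: interprets a fetch-links response body.
// Assumes the shape { fetchedLinks: string[], stoppingReason: string }.
function summarizeCrawl(data) {
  const links = Array.isArray(data.fetchedLinks) ? data.fetchedLinks : [];
  const reason = data.stoppingReason || "";
  // An empty stoppingReason is treated as "the crawl ran to completion".
  return { links, complete: reason === "", reason };
}

const summary = summarizeCrawl({
  fetchedLinks: ["https://www.example.com/blog/", "https://www.example.com/blog/1"],
  stoppingReason: "",
});

if (!summary.complete) {
  // The link list may be partial; consider crawling a narrower sourceURL.
  console.warn(`Crawl stopped early: ${summary.reason}`);
}
console.log(`${summary.links.length} link(s) found`);
```

When the crawl stops early, one option is to repeat the request with a more specific sourceURL so that each call covers a smaller part of the site.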