Crawler

Overview

The Crawler integration automates data extraction from web sources you specify by connecting TheLoops to external URLs. It fetches, indexes, and manages publicly available or authorized content for the Content Management capability. It is especially useful for aggregating knowledge base articles, documentation, or website content into a single system for easy access and analysis.

Prerequisites

Ensure the following requirements are fulfilled before initiating the integration:

  • Access to the Base URL(s) you want to crawl.

  • Permission to access the target web content (public or authorized sources).

  • Access to TheLoops with the Admin role.

Best practices

  • Provide Valid and Accessible URLs: Ensure all Base URLs are correct, reachable, and authorized for crawling.

  • Use Structured Content Sources: Prefer well-structured websites or documentation portals for better data extraction quality.

  • Avoid Restricted or Dynamic Pages: Make sure the crawler has permission to access the content, and avoid pages that require frequent authentication or render their content dynamically.

  • Validate Integration Post Setup: After setup, always run Test Connection to confirm that the configuration succeeded.

  • Optimize URL List: Add only relevant URLs to avoid unnecessary data processing and improve performance.
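One way to check whether a Base URL is "authorized for crawling" before you add it is to test the site's robots.txt locally. The following is a minimal sketch using Python's standard `urllib.robotparser`. It assumes you have already downloaded the robots.txt text. The wildcard user agent `*` is also an assumption, because the user agent that TheLoops' crawler sends is not documented here. The helper names `robots_url` and `allowed_by_robots` are hypothetical. robots.txt only expresses the site owner's crawling preferences and does not replace any permission you need to use the content.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(base_url: str) -> str:
    """Return the robots.txt location for the host that serves base_url."""
    parts = urlsplit(base_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def allowed_by_robots(url: str, robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper; the "*" user agent is an assumption, since the
    Crawler's real user agent string is not documented here.
    """
    parser = RobotFileParser()
    # parse() loads the rules from text and marks them as read,
    # so can_fetch() evaluates them instead of denying everything.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with rules `User-agent: *` and `Disallow: /private/`, a URL under `/guide/` is allowed and one under `/private/` is not.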

Setup instructions

Initiate Integration

1

Log in to TheLoops using an Admin account.

2

Navigate to Settings → Integrations module.

3

Click on the “+ Add Integration” button.

4

Search for the Crawler integration and open it.

Configuration

1

In the Configuration tab, enter a suitable Integration Name.

2

The Content Management capability is selected by default for this integration.

3

In the Base URL field:

  1. Enter the URL(s) you want to crawl.

  2. Multiple URLs can be added as a list.

4

Click on the Connect button.

5

Upon successful configuration, the integration is added and a success message is displayed.
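If you plan to add several Base URLs, cleaning the list first helps keep typos, non-web schemes, and duplicates out of the crawl, in line with the "Optimize URL List" practice. The sketch below is a hypothetical local helper (`normalize_base_urls` is not part of TheLoops). It assumes only `http` and `https` URLs are wanted and treats URLs that differ only in host case or `#fragment` as duplicates.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_urls(raw_urls):
    """Trim, validate, and de-duplicate candidate Base URLs, preserving order.

    Returns (cleaned, rejected). Hypothetical helper for local use only.
    """
    seen = set()
    cleaned, rejected = [], []
    for raw in raw_urls:
        url = raw.strip()
        if not url:
            continue  # skip blank lines in the pasted list
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            rejected.append(raw)
            continue
        # Lower-case the scheme and netloc, default an empty path to "/",
        # and drop the fragment, which does not change the fetched page.
        normalized = urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
        )
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned, rejected
```

Review the `rejected` list by hand; a missing `https://` prefix is a common cause.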

Verification

1

Navigate back to the Integrations module.

2

Verify that the newly added integration appears at the top of the list.

3

Click the Test Connection icon (next to the delete icon).

4

If the connection test succeeds, a confirmation message is displayed, indicating that the integration is configured correctly.
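If Test Connection fails, a quick local request can show whether the Base URL itself responds. This is a minimal sketch using Python's standard `urllib.request`. The helper name `is_reachable` and its `opener` parameter (which lets you substitute a fake for testing) are hypothetical. A success from your own machine does not guarantee that TheLoops can reach the URL. Some servers also reject `HEAD` requests (for example with status 405); in that case, retry with `GET`.

```python
import urllib.request


def is_reachable(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Send a HEAD request to url and report whether it ends in a 2xx or 3xx status."""
    try:
        # Request() raises ValueError for strings that are not absolute URLs.
        request = urllib.request.Request(
            url, method="HEAD", headers={"User-Agent": "base-url-check/0.1"}
        )
        # The default urlopen follows redirects and raises HTTPError for 4xx/5xx.
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```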

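Note: Webhook configuration is not required for the Crawler integration because it uses a pull-based mechanism. The crawler retrieves content from the configured Base URLs rather than waiting for the source to push updates.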
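In a pull-based model, the crawler requests pages itself, starting from the Base URLs. Crawlers commonly discover further pages by following links in the pages they have fetched. The sketch below shows that link-discovery step in generic form using Python's standard `html.parser`. It does not describe TheLoops' crawler. The helper `same_site_links` is hypothetical, and restricting links to the same host is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class _LinkCollector(HTMLParser):
    """Collect the href values of <a> tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list:
    """Return unique absolute links on the same host as page_url, fragments removed."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page, then strip "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        # mailto: and external links have a different (or empty) netloc.
        if urlsplit(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A full pull-based crawler would fetch each returned link, repeat the step, and track visited URLs to avoid loops.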
