uncloseai.
Our Ethical Web Crawler
About Our Web Crawler
Our team uses an ethical web crawler for research and development on the open web. We believe in responsible data collection that respects website owners and follows industry best practices.
User Agent
Our crawler identifies itself with the following user agent:
uncloseai.com/1.42 (ethical web crawler; +https://uncloseai.com)
This clearly identifies who we are and provides a link back to this page for more information.
What We Do
Our crawler is used by our AI systems to:
- Answer questions - When users ask about websites or specific information, our bot can fetch and analyze web content to provide accurate answers
- Research & development - We study how information is structured on the web to improve our AI models and tools
- Content aggregation - We help users discover and understand information across multiple web pages
Our Ethical Practices
We follow strict ethical guidelines to ensure we're good web citizens:
1. Robots.txt Compliance
We always respect robots.txt files. If your site's robots.txt disallows our crawler, we won't access those pages.
# Example: Block our crawler from specific paths
User-agent: uncloseai.com
Disallow: /private/
Disallow: /admin/
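As an illustration, a robots.txt check like this can be sketched with Python's standard urllib.robotparser module (the helper function below is hypothetical, not our actual implementation; the uncloseai.com token matches the robots.txt example above):

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "uncloseai.com"  # the token sites use in robots.txt rules

def is_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt permits USER_AGENT to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)

robots = """\
User-agent: uncloseai.com
Disallow: /private/
Disallow: /admin/
"""
print(is_allowed(robots, "https://example.com/blog/post"))  # True
print(is_allowed(robots, "https://example.com/private/x"))  # False
```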
2. Crawl Delays
We respect crawl delays specified in robots.txt. Our default is 2 seconds between requests to the same domain, but we'll honor any delay you specify:
# Example: Set custom crawl delay
User-agent: uncloseai.com
Crawl-delay: 5
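Per-domain delay enforcement can be sketched as a simple throttle that remembers when each domain was last hit (an illustrative sketch, not our production code; DomainThrottle is a hypothetical name):

```python
import time
from urllib.parse import urlparse

DEFAULT_DELAY = 2.0  # seconds between requests to the same domain

class DomainThrottle:
    """Sleep as needed so consecutive requests to a domain honor its crawl delay."""

    def __init__(self):
        self.last_request = {}  # domain -> time.monotonic() of last request
        self.delays = {}        # domain -> delay parsed from robots.txt, if any

    def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        delay = self.delays.get(domain, DEFAULT_DELAY)
        last = self.last_request.get(domain)
        if last is not None:
            elapsed = time.monotonic() - last
            if elapsed < delay:
                time.sleep(delay - elapsed)  # wait out the remainder
        self.last_request[domain] = time.monotonic()

throttle = DomainThrottle()
throttle.delays["example.com"] = 5.0  # as if robots.txt said Crawl-delay: 5
```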
3. Intelligent Caching
We cache fetched content for 7 days by default. This means:
- We won't repeatedly hit your server for the same content
- Your infrastructure sees less load
- Our users get faster responses
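A TTL cache with a 7-day expiry can be sketched like this (an in-memory illustration; our real cache is not public, and PageCache is a hypothetical name):

```python
import time

CACHE_TTL = 7 * 24 * 3600  # seconds: keep fetched content for 7 days

class PageCache:
    """In-memory URL cache with time-based expiry."""

    def __init__(self, ttl: float = CACHE_TTL):
        self.ttl = ttl
        self.entries = {}  # url -> (fetched_at, content)

    def get(self, url: str):
        entry = self.entries.get(url)
        if entry is None:
            return None
        fetched_at, content = entry
        if time.monotonic() - fetched_at > self.ttl:
            del self.entries[url]  # stale: evict and refetch later
            return None
        return content

    def put(self, url: str, content: str) -> None:
        self.entries[url] = (time.monotonic(), content)
```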
4. Smart Depth Crawling
We use intelligent depth-based crawling:
- Depth 0 - Just the homepage
- Depth 1 - The homepage plus directly linked pages
- Depth 2 - Two levels of linked pages, when needed
Our crawler automatically decides how deep to go based on whether it found relevant information, so we don't waste resources fetching unnecessary pages.
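The depth logic above can be sketched as a breadth-first crawl that stops descending once a relevant page is found (a minimal sketch with caller-supplied fetch, extract_links, and is_relevant hooks, all hypothetical names):

```python
from collections import deque

def crawl(start_url, fetch, extract_links, is_relevant, max_depth=2):
    """Breadth-first crawl, limited by depth, that stops early on success.

    fetch(url) returns page content, extract_links(url) returns outgoing
    links, is_relevant(page) says whether we found what we need.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); depth 0 = start page
    results = []
    while queue:
        url, depth = queue.popleft()
        page = fetch(url)
        results.append((url, page))
        if is_relevant(page):        # found relevant content: stop crawling
            break
        if depth < max_depth:        # only descend while under the limit
            for link in extract_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return results
```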
5. Query-Aware Relevance
We score pages based on relevance to the user's question, prioritizing:
- Pages that contain keywords from the user's query
- Clean, readable URLs over dynamic query strings
- Pages with substantial, unique content
- Descriptive titles that match the topic
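A scoring heuristic along those lines might look like the following (the weights and the relevance_score function are illustrative assumptions, not our actual scorer):

```python
def relevance_score(url: str, title: str, text: str, query: str) -> float:
    """Heuristic relevance score for a fetched page; higher is better."""
    words = set(query.lower().split())
    body = text.lower()
    score = 0.0
    score += 2.0 * sum(1 for w in words if w in body)           # keywords in content
    score += 3.0 * sum(1 for w in words if w in title.lower())  # keywords in title
    if "?" not in url:      # prefer clean URLs over dynamic query strings
        score += 1.0
    if len(text) > 500:     # prefer substantial content
        score += 1.0
    return score
```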
Transparency
We believe in being open about our crawling practices:
- Clear identification - Our user agent clearly identifies us
- Contact information - This page provides details on how to reach us
- Open source approach - Our crawler follows public best practices
- No stealth crawling - We never hide our identity or bypass restrictions
How to Block Our Crawler
If you don't want our crawler accessing your site, you can block it using robots.txt:
# Block uncloseai.com crawler entirely
User-agent: uncloseai.com
Disallow: /
Or use your server's firewall/WAF to block our user agent string.
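For example, if your server runs nginx, a rule like this inside a server block would reject our crawler by user agent (a sketch to adapt to your own configuration, not an official recommendation):

```nginx
# Return 403 for requests whose User-Agent mentions our crawler
if ($http_user_agent ~* "uncloseai\.com") {
    return 403;
}
```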
Questions or Concerns?
If you have questions about our crawler or want to discuss our access to your site, please reach out:
- Website: uncloseai.com
- Discord: Join our community at discord.agents.ai.unturf.com
- Email: Contact us through our website
Try Our Discord Bot
Our ethical web crawler powers the research capabilities in our Discord bot. The bot can:
- Answer questions by fetching and analyzing web content
- Execute code in 42+ programming languages
- Provide intelligent summaries of web pages
- Respect user privacy and website policies
Want to try it? Check out our Discord bot's source code to see the crawler in action.