uncloseai.
Our Ethical Web Crawler
About Our Web Crawler
Our team uses an ethical web crawler for research and development on the open web. We believe in responsible data collection that respects website owners and follows industry best practices.
User Agent
Our crawler identifies itself with the following user agent:
uncloseai.com/1.42 (ethical web crawler; +https://uncloseai.com)
This clearly identifies who we are and provides a link back to this page for more information.
What We Do
Our crawler is used by our AI systems to:
- Answer questions - When users ask about websites or specific information, our bot can fetch and analyze web content to provide accurate answers
- Research & development - We study how information is structured on the web to improve our AI models and tools
- Content aggregation - We help users discover and understand information across multiple web pages
Our Ethical Practices
We follow strict ethical guidelines to ensure we're good web citizens:
1. Robots.txt Compliance
We always respect robots.txt files. If your site's robots.txt disallows our crawler, we won't access those pages.
# Example: Block our crawler from specific paths
User-agent: uncloseai.com
Disallow: /private/
Disallow: /admin/
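As an illustration, a robots.txt check like this can be sketched with Python's standard urllib.robotparser module (the helper function below is hypothetical, not our actual implementation; the uncloseai.com token matches the robots.txt example above):

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "uncloseai.com"  # the token sites use in robots.txt rules

def is_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt permits USER_AGENT to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(USER_AGENT, url)

robots = """\
User-agent: uncloseai.com
Disallow: /private/
Disallow: /admin/
"""
print(is_allowed(robots, "https://example.com/blog/post"))  # True
print(is_allowed(robots, "https://example.com/private/x"))  # False
```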
2. Crawl Delays
We respect crawl delays specified in robots.txt. Our default is 2 seconds between requests to the same domain, but we'll honor any delay you specify:
# Example: Set custom crawl delay
User-agent: uncloseai.com
Crawl-delay: 5
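Per-domain delay enforcement can be sketched as a simple throttle that remembers when each domain was last hit (an illustrative sketch, not our production code; DomainThrottle is a hypothetical name):

```python
import time
from urllib.parse import urlparse

DEFAULT_DELAY = 2.0  # seconds between requests to the same domain

class DomainThrottle:
    """Sleep as needed so consecutive requests to a domain honor its crawl delay."""

    def __init__(self):
        self.last_request = {}  # domain -> time.monotonic() of last request
        self.delays = {}        # domain -> delay parsed from robots.txt, if any

    def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        delay = self.delays.get(domain, DEFAULT_DELAY)
        last = self.last_request.get(domain)
        if last is not None:
            elapsed = time.monotonic() - last
            if elapsed < delay:
                time.sleep(delay - elapsed)  # wait out the remainder
        self.last_request[domain] = time.monotonic()

throttle = DomainThrottle()
throttle.delays["example.com"] = 5.0  # as if robots.txt said Crawl-delay: 5
```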
3. Intelligent Caching
We cache fetched content for 7 days by default. This means:
- We won't repeatedly hit your server for the same content
- Your infrastructure sees less load
- Our users get faster responses
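A TTL cache with a 7-day expiry can be sketched like this (an in-memory illustration; our real cache is not public, and PageCache is a hypothetical name):

```python
import time

CACHE_TTL = 7 * 24 * 3600  # seconds: keep fetched content for 7 days

class PageCache:
    """In-memory URL cache with time-based expiry."""

    def __init__(self, ttl: float = CACHE_TTL):
        self.ttl = ttl
        self.entries = {}  # url -> (fetched_at, content)

    def get(self, url: str):
        entry = self.entries.get(url)
        if entry is None:
            return None
        fetched_at, content = entry
        if time.monotonic() - fetched_at > self.ttl:
            del self.entries[url]  # stale: evict and refetch later
            return None
        return content

    def put(self, url: str, content: str) -> None:
        self.entries[url] = (time.monotonic(), content)
```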
4. Smart Depth Crawling
We use intelligent depth-based crawling:
- Depth 0 - Just the homepage
- Depth 1 - The homepage plus directly linked pages
- Depth 2 - Two levels of linked pages, when needed
Our crawler automatically decides how deep to go based on whether it found relevant information, so we don't waste resources fetching unnecessary pages.
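The depth logic above can be sketched as a breadth-first crawl that stops descending once a relevant page is found (a minimal sketch with caller-supplied fetch, extract_links, and is_relevant hooks, all hypothetical names):

```python
from collections import deque

def crawl(start_url, fetch, extract_links, is_relevant, max_depth=2):
    """Breadth-first crawl, limited by depth, that stops early on success.

    fetch(url) returns page content, extract_links(url) returns outgoing
    links, is_relevant(page) says whether we found what we need.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth); depth 0 = start page
    results = []
    while queue:
        url, depth = queue.popleft()
        page = fetch(url)
        results.append((url, page))
        if is_relevant(page):        # found relevant content: stop crawling
            break
        if depth < max_depth:        # only descend while under the limit
            for link in extract_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return results
```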
5. Query-Aware Relevance
We score pages based on relevance to the user's question, prioritizing:
- Pages that contain keywords from the user's query
- Clean, readable URLs over dynamic query strings
- Pages with substantial, unique content
- Descriptive titles that match the topic
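A scoring heuristic along those lines might look like the following (the weights and the relevance_score function are illustrative assumptions, not our actual scorer):

```python
def relevance_score(url: str, title: str, text: str, query: str) -> float:
    """Heuristic relevance score for a fetched page; higher is better."""
    words = set(query.lower().split())
    body = text.lower()
    score = 0.0
    score += 2.0 * sum(1 for w in words if w in body)           # keywords in content
    score += 3.0 * sum(1 for w in words if w in title.lower())  # keywords in title
    if "?" not in url:      # prefer clean URLs over dynamic query strings
        score += 1.0
    if len(text) > 500:     # prefer substantial content
        score += 1.0
    return score
```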
Transparency
We believe in being open about our crawling practices:
- Clear identification - Our user agent clearly identifies us
- Contact information - This page provides details on how to reach us
- Open source approach - Our crawler follows public best practices
- No stealth crawling - We never hide our identity or bypass restrictions
How to Block Our Crawler
If you don't want our crawler accessing your site, you can block it using robots.txt:
# Block uncloseai.com crawler entirely
User-agent: uncloseai.com
Disallow: /
Or use your server's firewall/WAF to block our user agent string.
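For example, if your server runs nginx, a rule like this inside a server block would reject our crawler by user agent (a sketch to adapt to your own configuration, not an official recommendation):

```nginx
# Return 403 for requests whose User-Agent mentions our crawler
if ($http_user_agent ~* "uncloseai\.com") {
    return 403;
}
```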
Questions or Concerns?
If you have questions about our crawler or want to discuss our access to your site, please reach out:
- Website: uncloseai.com
- Discord: Join our community at discord.agents.ai.unturf.com
- Email: Contact us through our website
Try Our Discord Bot
Our ethical web crawler powers the research capabilities in our Discord bot. The bot can:
- Answer questions by fetching and analyzing web content
- Execute code in 42+ programming languages
- Provide intelligent summaries of web pages
- Respect user privacy and website policies
Want to try it? Check out our Discord bot's source code to see the crawler in action.