Documentation
Complete documentation for copyright.sh, including the Creator Guide for content licensing and the API Reference for developers building AI applications.
🚀 Quick Start Guide
- For Creators: Read our Creator Guide and add meta tags to your content
- For Developers: Get an API key at copyright.sh/dashboard
- Check for ai-license meta tags before using content
- Verify licensing and log usage through our API
Authentication
All API requests require authentication using Bearer tokens. Include your API key in the Authorization header.
curl -H "Authorization: Bearer sk_live_abc123..." \
  https://api.copyright.sh/v1/verify-license
API Key Types
- sk_test_ - Test environment
- sk_live_ - Production environment
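As a minimal sketch of the header format described above, the helper below builds the Authorization header and infers the environment from the key prefix. The function names `key_environment` and `auth_headers` are illustrative and not part of any published SDK.

```python
def key_environment(api_key: str) -> str:
    """Return 'test' or 'production' based on the documented key prefixes."""
    if api_key.startswith("sk_test_"):
        return "test"
    if api_key.startswith("sk_live_"):
        return "production"
    raise ValueError("API key must start with sk_test_ or sk_live_")


def auth_headers(api_key: str) -> dict:
    """Build the Bearer token header expected on every API request."""
    key_environment(api_key)  # validate the prefix before sending anything
    return {"Authorization": f"Bearer {api_key}"}
```

Rejecting unknown prefixes early is a design choice for this sketch; it keeps a mistyped key from reaching the network.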
Creator Guide
This guide helps content creators understand AI licensing concepts and implement them effectively using copyright.sh.
The Three Pillars of AI Licensing
Every AI license consists of three key components:
- Stage: What phase of AI processing (infer, embed, tune, train)
- Distribution: Whether outputs will be private or public
- Price: What you get paid per 1,000 tokens read (defaults to free but usage is still logged)
🎨 For Creators: Meta Tag Basics
Add this to your page's <head> section:
<meta name="ai-license" content="allow; distribution:private; price:0.12">
This allows AI systems to read your content for private outputs only, paying you $0.12 per 1,000 tokens read.
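For readers consuming these tags, the directive string can be split into its parts with a few lines of Python. This is a sketch only: the function name `parse_license_content` and the returned dictionary shape are assumptions made for illustration, and the grammar is inferred from the examples in this guide rather than from a formal specification.

```python
def parse_license_content(content: str) -> dict:
    """Parse a directive such as 'allow; distribution:private; price:0.12'."""
    parts = [p.strip() for p in content.split(";") if p.strip()]
    if not parts or parts[0] not in ("allow", "deny"):
        raise ValueError(f"unrecognized directive: {content!r}")
    # Price defaults to free when omitted, as described in the guide.
    terms = {"action": parts[0], "distribution": None, "price": 0.0}
    for part in parts[1:]:
        key, _, value = part.partition(":")
        key, value = key.strip(), value.strip()
        if key == "price":
            terms["price"] = float(value)
        elif key == "distribution":
            terms["distribution"] = value
    return terms
```

Keys other than `distribution` and `price` are ignored here rather than rejected, so the sketch tolerates directives it does not know about.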
WordPress Integration
The official Copyright.sh WordPress plugin automatically adds AI license meta tags to your posts and pages based on your settings.
🔌 WordPress Plugin
Get the plugin: https://github.com/tymrtn/ai-license-wp
Quick Setup:
- Upload the
copyright-sh-ai-licensefolder to/wp-content/plugins/ - Activate through 'Plugins' menu in WordPress
- Configure under Settings → AI License
Requirements: WordPress 6.2+, PHP 7.4+
📖 Complete Documentation
For detailed installation instructions, configuration options, and troubleshooting, see the GitHub repository documentation.
Features include: Automatic meta tags, /ai-license.txt endpoint, global settings with per-page overrides, monetization support, and privacy controls.
AI Stages Explained
Different AI processing stages have different implications for your content. Here's what each stage means:
Core Stages
Train
What it means: Your content becomes part of the dataset used to teach AI models new patterns and knowledge.
Legal context: May be considered fair use in the US but not guaranteed in the EU.
Typical pricing: $0.25-$0.50 per 1K tokens (highest value due to permanent inclusion)
<meta name="ai-license-train" content="allow; distribution:private; price:0.35">
Infer
What it means: AI uses your content to generate real-time responses to user queries.
Usage pattern: Temporary, on-demand access for immediate responses
Typical pricing: $0.05-$0.25 per 1K tokens (depending on distribution)
<meta name="ai-license-infer" content="allow; distribution:private; price:0.10">
Embed
What it means: Your content is converted into mathematical vectors for search and retrieval.
Usage pattern: Semantic search, content recommendation, similarity matching
Typical pricing: $0.05 per 1K tokens (lowest impact on your content)
<meta name="ai-license-embed" content="allow; distribution:public; price:0.05">
Tune
What it means: Fine-tuning and adapting existing models for specific tasks using your content.
Usage pattern: Specialized model adaptation and customization
Typical pricing: $0.15-$0.30 per 1K tokens (higher than inference, lower than training)
<meta name="ai-license-tune" content="allow; distribution:private; price:0.20">
💡 Pro Tip: Stage-Specific Pricing
You can set different prices for different stages. Training typically commands higher rates than inference or embedding.
Distribution Guide
Distribution controls whether AI outputs using your content will be kept private or made publicly accessible.
Distribution Options
Private: Internal use only
Best for: Most content where you want to allow AI assistance but control public distribution
AI can do: Use your content to generate responses shown only to individual users or internal teams
AI cannot do: Publish or redistribute AI outputs based on your content publicly
TTL: Cached tokens valid for 24 hours by default
<meta name="ai-license" content="allow; distribution:private; price:0.08">
Public: Public distribution allowed
Best for: Reference materials, public information, promotional content
AI can do: Use your content to generate responses that may be published, shared, or distributed
Higher pricing recommended: Public distribution allows broader reach and potential competition
TTL: No caching by default (0 hours) - fresh payment required each time
<meta name="ai-license" content="allow; distribution:public; price:0.25">
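The default TTLs above can be expressed as a small cache check. This sketch assumes the TTL is measured from the moment the content was fetched and paid for; the name `is_cache_valid` and that interpretation are assumptions, not documented behavior.

```python
from datetime import datetime, timedelta

# Default TTLs in hours, taken from the distribution options above.
DEFAULT_TTL_HOURS = {"private": 24, "public": 0}


def is_cache_valid(distribution: str, fetched_at: datetime, now: datetime) -> bool:
    """Return True while content fetched at `fetched_at` is within its TTL."""
    ttl = timedelta(hours=DEFAULT_TTL_HOURS[distribution])
    return now - fetched_at < ttl
```

With a TTL of zero, the public case never reports a valid cache, which matches the "fresh payment required each time" default.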
Choosing the Right Distribution
| Content Type | Recommended Distribution | Reasoning |
|---|---|---|
| News Articles | Private or tiered pricing | Preserve exclusive reporting while allowing personal AI assistance |
| Educational Content | Private initially, public at higher price | Allow individual learning while charging for broader distribution |
| Creative Writing | Private only | Protect artistic expression from unauthorized public distribution |
| Reference Data | Public | Factual information benefits from broader accessibility |
Combined Stage and Distribution Examples
<!-- Block training, allow private inference -->
<meta name="ai-license-train" content="deny">
<meta name="ai-license-infer" content="allow; distribution:private; price:0.10">
<!-- Free private, paid public -->
<meta name="ai-license" content="allow; distribution:private">
<meta name="ai-license" content="allow; distribution:public; price:0.20">
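A consumer reading pages like these needs to collect every ai-license tag and pick the one that applies. The sketch below uses Python's standard `html.parser` module. The lookup order (stage-specific tag first, then the generic tag) is an assumption for illustration, since the precedence rules are not spelled out here, and `find_directive` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class LicenseTagParser(HTMLParser):
    """Collect (name, content) pairs for every ai-license meta tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = attr.get("name") or ""
        if tag == "meta" and name.startswith("ai-license"):
            self.tags.append((name, attr.get("content") or ""))


def find_directive(html: str, stage: str, distribution: str):
    """Return the content string that applies, or None if no tag matches."""
    parser = LicenseTagParser()
    parser.feed(html)
    # Assumption: a stage-specific tag is checked before the generic one.
    for wanted in (f"ai-license-{stage}", "ai-license"):
        for name, content in parser.tags:
            if name != wanted:
                continue
            if content.startswith("deny") or f"distribution:{distribution}" in content:
                return content
    return None
```

For the first example above, `find_directive(page, "train", "private")` would return `"deny"`, while the same call with `"infer"` would return the private inference terms.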
Pricing Strategy Guide
Setting the right price balances fair compensation with encouraging AI innovation. Our research shows creators are successfully charging these rates:
Industry Benchmark Rates (USD per 1K tokens)
| Content Type | Market Rate | Premium Rate | Real-World Examples |
|---|---|---|---|
| News Articles | $0.02 - $0.05 | $0.05 - $0.10 | News Corp × OpenAI deal benchmark |
| Blog Posts | $0.04 - $0.08 | $0.08 - $0.15 | Professional blogs, niche expertise |
| Technical Docs | $0.05 - $0.12 | $0.12 - $0.25 | API docs, tutorials, how-to guides |
| Research Papers | $0.10 - $0.25 | $0.25 - $0.50 | Academic work, original research |
| Creative Writing | $0.08 - $0.20 | $0.20 - $0.40 | Fiction, poetry, original expression |
Stage-Based Pricing Strategy
| AI Stage | Multiplier | Reasoning |
|---|---|---|
| train | 3-5× base | Permanent inclusion in model weights |
| infer | 1× base | Standard rate for real-time usage |
| embed | 0.5× base | Lower impact, indexing only |
| tune | 2× base | Specialized adaptation of models |
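The multipliers in the table translate into a simple calculation. The sketch below applies them to a base rate; the function name `stage_price_range` is illustrative, and rounding to two decimals is a choice made for readability.

```python
# Multiplier ranges (low, high) from the stage-based pricing table.
STAGE_MULTIPLIERS = {
    "train": (3.0, 5.0),
    "infer": (1.0, 1.0),
    "embed": (0.5, 0.5),
    "tune": (2.0, 2.0),
}


def stage_price_range(base_price: float, stage: str) -> tuple:
    """Return the (low, high) price per 1K tokens for a stage, rounded to cents."""
    low, high = STAGE_MULTIPLIERS[stage]
    return (round(base_price * low, 2), round(base_price * high, 2))
```

With a base rate of $0.10, training falls between $0.30 and $0.50 per 1K tokens, and embedding comes to $0.05.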
Pricing Factors to Consider
- Content quality: Exclusive, well-researched content commands higher rates
- Freshness: Breaking news or cutting-edge research can be priced premium
- Exclusivity: If you're the only source, you have pricing power
- Volume: Consider offering volume discounts for bulk usage
- Competition: Research what similar creators charge
💰 Pricing Strategy Examples
Conservative Approach
<!-- Encourage adoption with competitive rates -->
<meta name="ai-license" content="allow; distribution:private; price:0.10">
Balanced Approach
<!-- Different rates by stage and distribution -->
<meta name="ai-license" content="allow; distribution:private; price:0.10">
<meta name="ai-license-train" content="allow; distribution:private; price:1.00">
<meta name="ai-license-infer" content="allow; distribution:public; price:0.25">
Premium Approach
<!-- High-value, exclusive content -->
<meta name="ai-license-train" content="allow; distribution:private; price:0.50">
<meta name="ai-license-infer" content="deny">
Tiered Distribution Approach
<!-- Free private, paid public -->
<meta name="ai-license" content="allow; distribution:private">
<meta name="ai-license" content="allow; distribution:public; price:0.25">
🎯 Remember: You Set the Terms
These are guidelines, not rules. You own your content and can price it however you see fit. Start conservative and adjust based on demand and your content's unique value.
Verify License
/v1/verify-license
Check if content is licensed and get pricing information.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL of the content to verify |
| tokens | integer | Yes | Number of tokens to be used |
| stage | string | No | infer, embed, tune, or train (defaults to infer) |
| distribution | string | No | private or public (defaults to public) |
Example Request
{
"url": "https://example.com/article",
"tokens": 1500,
"stage": "train",
"distribution": "private"
}
Response
{
"licensed": true,
"rate": 0.05,
"cost": 0.075,
"currency": "EUR",
"license_id": "lic_abc123",
"creator": {
"name": "John Smith",
"verified": true
},
"terms_url": "https://copyright.sh/terms"
}
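The request and the cost arithmetic can be sketched with the Python standard library. The helper below builds the request without sending it. Sending the body as JSON with a `Content-Type: application/json` header is an assumption, and the names `build_verify_request` and `expected_cost` are hypothetical. The cost formula (rate per 1,000 tokens times tokens divided by 1,000) is inferred from the sample response, where 0.05 × 1500 / 1000 = 0.075.

```python
import json
import urllib.request
from decimal import Decimal

API_BASE = "https://api.copyright.sh/v1"


def build_verify_request(api_key, url, tokens, stage="infer", distribution="public"):
    """Build (but do not send) a POST request for /v1/verify-license."""
    body = {"url": url, "tokens": tokens, "stage": stage, "distribution": distribution}
    return urllib.request.Request(
        f"{API_BASE}/verify-license",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, not documented
        },
        method="POST",
    )


def expected_cost(rate: str, tokens: int) -> Decimal:
    """Cost = rate per 1,000 tokens x tokens / 1,000, using exact decimal math."""
    return Decimal(rate) * tokens / 1000
```

Passing the result to `urllib.request.urlopen` would send the request; comparing `expected_cost` with the returned `cost` field is one way to sanity-check a response before logging usage.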
Get Content
Retrieve normalized metadata and the latest licensing directives for a specific URL. This endpoint is ideal for dashboards, preflight checks, and surfacing creator attribution alongside AI usage.
/v1/content
Provide a canonical URL and we return the creator, pricing, and license settings you should respect before generating outputs.
Example Request
fetch('https://api.copyright.sh/v1/content?url=' + encodeURIComponent('https://example.com/article'), {
headers: {
Authorization: 'Bearer sk_live_abc123'
}
}).then(res => res.json());
Example Response
{
"url": "https://example.com/article",
"title": "Generative AI Licensing Guide",
"creator": {
"name": "Jordan Lee",
"verified": true
},
"license": {
"stage": "infer",
"distribution": "private",
"price": 0.08
},
"updated_at": "2025-09-18T14:32:00Z"
}
For pagination, filtering, and expanded schema details see the full API reference.
Log Usage
/v1/log-usage
Record content usage and process payment.
Example Request
{
"license_id": "lic_abc123",
"tokens_used": 1500,
"hmac_signature": "sha256=abc123...",
"timestamp": "2025-01-23T10:00:00Z"
}
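A body like the one above could be assembled as follows. This is a sketch under stated assumptions: the signature is taken over the sorted-key JSON of `license_id`, `tokens_used`, and `timestamp`, computed before `hmac_signature` is attached, which mirrors the Python example in the HMAC Verification section. The helper name `build_log_usage_body` is hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def build_log_usage_body(license_id: str, tokens_used: int, secret_key: str, now=None) -> dict:
    """Return a /v1/log-usage body with an HMAC-SHA256 signature attached."""
    now = now or datetime.now(timezone.utc)
    payload = {
        "license_id": license_id,
        "tokens_used": tokens_used,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Assumption: only the three fields above are signed, serialized with
    # sorted keys, and the signature is added to the body afterwards.
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {**payload, "hmac_signature": f"sha256={digest}"}
```

The optional `now` argument exists so the timestamp can be fixed in tests; in normal use it defaults to the current UTC time.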
Webhooks
Subscribe to webhook events to keep your internal systems synchronized with creator preferences and usage receipts in near real time.
Retry policy: We retry delivery up to three times with exponential backoff. Any 2xx response marks the attempt as successful.
Available Events
- usage.logged: fired after a usage record posts to the ledger.
- license.updated: sent when a creator updates pricing, stages, or distribution settings.
- license.revoked: triggered if a creator withdraws access for a URL you previously verified.
Verify Signatures
Webhook payloads use the same HMAC signature scheme as the REST API. Compare the x-cs-signature header against a locally generated hash.
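import hashlib
import hmac
import json

def verify_webhook(payload: dict, signature: str, secret: str) -> bool:
    # Serialize with sorted keys so the same payload always yields the same bytes.
    message = json.dumps(payload, sort_keys=True)
    expected = hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which guards against timing attacks.
    return hmac.compare_digest(f"sha256={expected}", signature)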
See /docs-api.html#webhooks for payload schemas, testing utilities, and replay attack guidance.
HMAC Verification
All usage logging requires HMAC signatures for security and tamper-proofing.
Python Example
import hashlib
import hmac
import json

def create_hmac_signature(payload, secret_key):
    # Serialize with sorted keys so the same payload always yields the same bytes.
    message = json.dumps(payload, sort_keys=True)
    signature = hmac.new(
        secret_key.encode('utf-8'),
        message.encode('utf-8'),
        hashlib.sha256
    ).hexdigest()
    return f"sha256={signature}"

# Usage
payload = {
    "license_id": "lic_abc123",
    "tokens_used": 1500,
    "timestamp": "2025-01-23T10:00:00Z"
}
signature = create_hmac_signature(payload, "your_secret_key")
Rate Limits
Rate limits protect creators and maintain reliable performance for every integration. Limits are enforced per API key.
| Endpoint | Limit | Window | Notes |
|---|---|---|---|
| POST /v1/verify-license | 60 requests | Per minute | Burstable to 120 for short periods. Contact support if you regularly exceed this. |
| GET /v1/content | 90 requests | Per minute | Responses are cached at the edge; repeated lookups reuse the same quota bucket. |
| POST /v1/log-usage | 60 requests | Per minute | Use batching or idempotency keys to avoid spikes from retries. |
Tip: If you need higher throughput, email support with projected volumes and we can raise the limits on a per-partner basis.
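To stay under these limits on the client side, a rolling-window limiter is one option. The sketch below is not an official client: the class name `SlidingWindowLimiter` is hypothetical, and whether the server counts requests in fixed or rolling windows is not stated here, so a rolling window is used as the more conservative assumption.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.calls = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if the window is full."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True
```

A caller would create one limiter per endpoint and API key, for example `SlidingWindowLimiter(limit=90)` for GET /v1/content, and wait or queue the request whenever `try_acquire()` returns False.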
Python SDK
The Python SDK will provide:
- Simple client initialization with API keys
- License verification and validation
- Usage logging with HMAC signatures
- Automatic retry logic and error handling
- Type hints and comprehensive documentation
Expected release: Q1 2025. Contact us for early access.
JavaScript SDK
The JavaScript SDK will provide:
- Browser and Node.js compatibility
- TypeScript definitions included
- Promise-based async operations
- Automatic token counting utilities
- Built-in caching for license lookups
Expected release: Q1 2025. Contact us for early access.
Go SDK
The Go SDK will provide:
- Idiomatic Go interfaces
- Concurrent request handling
- Context support for cancellation
- Efficient memory usage
- gRPC support for high-performance applications
Expected release: Q2 2025. Contact us for early access.
Error Handling
The API returns standard HTTP status codes and detailed error messages.
| Status Code | Description |
|---|---|
| 200 | Success |
| 400 | Bad Request - Invalid parameters |
| 401 | Unauthorized - Invalid API key |
| 403 | Forbidden - Content not licensed |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
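Client code can turn the table above into a lookup. In this sketch the descriptions come from the table, while the choice of which codes are worth retrying (429 and 500) is an assumption, and `classify_status` is a hypothetical helper name.

```python
# Descriptions from the status code table above.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad Request - Invalid parameters",
    401: "Unauthorized - Invalid API key",
    403: "Forbidden - Content not licensed",
    429: "Rate limit exceeded",
    500: "Internal server error",
}

# Assumption: only rate-limit and server errors are worth retrying.
RETRYABLE = {429, 500}


def classify_status(status: int) -> tuple:
    """Return (ok, retryable, description) for an HTTP status code."""
    description = STATUS_MEANINGS.get(status, "Unexpected status")
    return (status == 200, status in RETRYABLE, description)
```

Treating 403 as non-retryable matters in practice: it signals that the content is not licensed, so repeating the request would not change the outcome.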
Robots.txt for AI Protection
While meta tags provide licensing terms for AI systems, you can also use robots.txt to completely block unwanted AI crawlers while preserving search engine access.
Why Use Robots.txt?
Complete Control: Robots.txt allows you to:
- Block specific AI companies from accessing your content entirely
- Preserve access for search engines (Google, Bing) for SEO
- Prevent unauthorized scraping while allowing licensed access
- Control which parts of your site are accessible to different bots
How to Implement
Step 1: Download our Pre-configured Robots.txt
Our robots.txt template blocks major AI crawlers while allowing search engines:
# Copyright.sh Robots.txt - AI Protection Template
# Block AI training bots while allowing search engines
# Allow search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: DuckDuckBot
Allow: /
# Block OpenAI
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
# Block Anthropic
User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
# Block Common Crawl (used by many AI companies)
User-agent: CCBot
Disallow: /
# Block other AI/ML bots
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
Disallow: /
User-agent: Bytespider
Disallow: /
# Default: Allow all other bots (customize as needed)
User-agent: *
Allow: /
# Sitemap location (optional)
Sitemap: https://yoursite.com/sitemap.xml
Step 2: Upload to Your Website Root
The robots.txt file must be placed at the root of your domain:
# File location
https://yoursite.com/robots.txt
# For WordPress
Upload to your WordPress root directory via FTP or file manager
# For static sites
Place in the root of your public HTML folder
# For Next.js/React
Place in the /public directory
Step 3: Verify It's Working
Test your robots.txt implementation:
# Check if accessible
curl https://yoursite.com/robots.txt
# Test with Google Search Console
Use the robots.txt Tester tool in Google Search Console
# Monitor your server logs
Look for bot user agents respecting your rules
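Rules can also be checked offline before uploading, using Python's standard `urllib.robotparser` module. The sketch below parses a shortened version of the template and asks whether a given bot may fetch a URL. The `RULES` excerpt and the helper name `is_allowed` are illustrative. Note that this parser does simple prefix matching, so it is not a reliable test for wildcard patterns such as `/*.pdf$`.

```python
from urllib.robotparser import RobotFileParser

# A shortened excerpt of the template, for illustration only.
RULES = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Check whether `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed(RULES, "GPTBot", "https://yoursite.com/blog/")` reports that the crawler is blocked, while the same check for `"Googlebot"` reports that it is allowed.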
Combining with AI Licensing
Best Practice: Use both approaches for maximum protection:
- Meta Tags: Define licensing terms for AI companies that respect them
- Robots.txt: Block unauthorized scrapers and bad actors
- Legal Terms: Include AI licensing in your Terms of Service
This multi-layered approach ensures both technical and legal protection for your content.
Advanced Configuration
Selective Path Blocking
Allow AI access to some content while protecting sensitive areas:
# Allow AI to access your about page
User-agent: GPTBot
Allow: /about
Allow: /contact
Disallow: /blog/
Disallow: /premium-content/
# Block specific file types
User-agent: CCBot
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /downloads/
Rate Limiting with Crawl-delay
Slow down aggressive crawlers:
# Limit crawl rate (in seconds between requests)
User-agent: *
Crawl-delay: 10
# Note: Not all bots respect crawl-delay
# Consider using server-side rate limiting for enforcement
Known AI Bot User Agents
| Company | User Agent(s) | Purpose |
|---|---|---|
| OpenAI | GPTBot, ChatGPT-User | Training & Web browsing |
| Anthropic | anthropic-ai, Claude-Web | Training & Web access |
| Google | Google-Extended | Bard/Gemini training |
| Common Crawl | CCBot | Dataset collection |
| Perplexity | PerplexityBot | Search & answers |
| You.com | YouBot | Search & AI chat |
| ByteDance | Bytespider | TikTok AI training |
Important: Robots.txt is a public file. Anyone can view your robots.txt to see which bots you're blocking. Don't include sensitive paths or information in your robots.txt file.
Monitoring & Enforcement
While robots.txt is a widely respected standard, it's not legally binding. For complete protection:
- Monitor your server logs for non-compliant bots
- Implement server-side blocking for persistent violators
- Use Cloudflare or similar services for advanced bot management
- Include legal terms prohibiting unauthorized scraping in your ToS
- Consider implementing rate limiting and CAPTCHAs for suspicious traffic
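Log monitoring can start with a simple scan for known user agent tokens. The sketch below counts matching lines; it assumes each log line contains the raw user agent string, and the helper name `count_ai_bot_hits` is hypothetical. Google-Extended is left out of the list on the understanding that it is a robots.txt control token rather than a string that appears in request logs.

```python
from collections import Counter

# User agent tokens from the "Known AI Bot User Agents" table.
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "anthropic-ai", "Claude-Web",
    "CCBot", "PerplexityBot", "YouBot", "Bytespider",
]


def count_ai_bot_hits(log_lines) -> Counter:
    """Count log lines whose text mentions a known AI bot user agent token."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # count each request once
    return hits
```

Hits from a bot that robots.txt disallows are a signal that the crawler is ignoring the rules and may need server-side blocking.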
Need Help? Download our pre-configured robots.txt from your Copyright.sh dashboard, or contact support for assistance with custom configurations.