Documentation

Complete documentation for copyright.sh - including our Creator Guide for content licensing and API Reference for developers building AI applications.

🚀 Quick Start Guide

For Creators: Read our Creator Guide and add meta tags to your content
For Developers: Get an API key at copyright.sh/dashboard
Check for ai-license meta tags before using content
Verify licensing and log usage through our API

Authentication

All API requests require authentication using Bearer tokens. Include your API key in the Authorization header.

curl -H "Authorization: Bearer sk_live_abc123..." \
     https://api.copyright.sh/v1/verify-license

API Key Types

sk_test_ - Test environment
sk_live_ - Production environment
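
As a rough Python counterpart to the curl call, the sketch below builds (but does not send) an authenticated request with the standard library and reads the environment off the key prefix. The helper names, the JSON Content-Type header, and the choice of a POST with a JSON body are assumptions based on the examples in this document, not confirmed client behavior.

```python
import json
import urllib.request

API_BASE = "https://api.copyright.sh/v1"  # base URL taken from the curl example


def key_environment(api_key: str) -> str:
    """Map a key prefix to its environment, per the API Key Types list."""
    if api_key.startswith("sk_test_"):
        return "test"
    if api_key.startswith("sk_live_"):
        return "production"
    raise ValueError("unrecognized API key prefix")


def build_request(path: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build an authenticated POST request without sending it."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON request bodies
        },
        method="POST",
    )


request = build_request(
    "/verify-license",
    "sk_test_abc123",  # placeholder key, not a real credential
    {"url": "https://example.com/article", "tokens": 1500},
)
print(key_environment("sk_test_abc123"), request.get_method(), request.full_url)
```

Passing the result to urllib.request.urlopen would send it; starting with an sk_test_ key keeps experiments out of production.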

Creator Guide

This guide helps content creators understand AI licensing concepts and implement them effectively using copyright.sh.

The Three Pillars of AI Licensing

Every AI license consists of three key components:

Stage: What phase of AI processing (infer, embed, tune, train)
Distribution: Whether outputs will be private or public
Price: What you get paid per 1,000 tokens read (defaults to free but usage is still logged)

🎨 For Creators: Meta Tag Basics

Add this to your page's <head> section:

<meta name="ai-license" content="allow; distribution:private; price:0.12">

This allows AI systems to read your content for private outputs only, paying you $0.12 per 1,000 tokens read.
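
For developers reading these tags, the content string can be split into its parts with a few lines of Python. This is a minimal sketch based only on the tag format shown above: the function name and the returned dictionary keys are illustrative, and the price falls back to 0.0 because licenses default to free.

```python
def parse_ai_license(content: str) -> dict:
    """Parse a value such as 'allow; distribution:private; price:0.12'."""
    parts = [part.strip() for part in content.split(";") if part.strip()]
    if not parts:
        raise ValueError("empty ai-license content")

    permission = parts[0].lower()
    if permission not in ("allow", "deny"):
        raise ValueError(f"unexpected permission: {permission!r}")

    # Price defaults to free; distribution stays None when the tag omits it.
    terms = {"permission": permission, "distribution": None, "price": 0.0}
    for part in parts[1:]:
        key, separator, value = part.partition(":")
        if not separator:
            continue  # skip fragments that are not key:value pairs
        key, value = key.strip().lower(), value.strip()
        if key == "distribution":
            terms["distribution"] = value.lower()
        elif key == "price":
            terms["price"] = float(value)
    return terms


print(parse_ai_license("allow; distribution:private; price:0.12"))
```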

WordPress Integration

The official Copyright.sh WordPress plugin automatically adds AI license meta tags to your posts and pages based on your settings.

🔌 WordPress Plugin

Get the plugin: https://github.com/tymrtn/ai-license-wp

Quick Setup:

Upload the copyright-sh-ai-license folder to /wp-content/plugins/
Activate through 'Plugins' menu in WordPress
Configure under Settings → AI License

Requirements: WordPress 6.2+, PHP 7.4+

📖 Complete Documentation

For detailed installation instructions, configuration options, and troubleshooting, see the GitHub repository documentation.

Features include: Automatic meta tags, /ai-license.txt endpoint, global settings with per-page overrides, monetization support, and privacy controls.

AI Stages Explained

Different AI processing stages have different implications for your content. Here's what each stage means:

Core Stages

Train

What it means: Your content becomes part of the dataset used to teach AI models new patterns and knowledge.

Legal context: May be considered fair use in the US but not guaranteed in the EU.

Typical pricing: $0.25-$0.50 per 1K tokens (highest value due to permanent inclusion)

<meta name="ai-license-train" content="allow; distribution:private; price:0.35">

Infer

What it means: AI uses your content to generate real-time responses to user queries.

Usage pattern: Temporary, on-demand access for immediate responses

Typical pricing: $0.05-$0.25 per 1K tokens (depending on distribution)

<meta name="ai-license-infer" content="allow; distribution:private; price:0.10">

Embed

What it means: Your content is converted into mathematical vectors for search and retrieval.

Usage pattern: Semantic search, content recommendation, similarity matching

Typical pricing: $0.05 per 1K tokens (lowest impact on your content)

<meta name="ai-license-embed" content="allow; distribution:public; price:0.05">

Tune

What it means: Fine-tuning and adapting existing models for specific tasks using your content.

Usage pattern: Specialized model adaptation and customization

Typical pricing: $0.15-$0.30 per 1K tokens (higher than inference, lower than training)

<meta name="ai-license-tune" content="allow; distribution:private; price:0.20">

💡 Pro Tip: Stage-Specific Pricing

You can set different prices for different stages. Training typically commands higher rates than inference or embedding.
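
On the consuming side, stage-specific tags have to be located in the page before their terms can be applied. The sketch below collects ai-license tags with Python's built-in HTML parser and picks the ones relevant to a stage. It assumes a stage-specific tag (for example ai-license-train) takes precedence over the general ai-license tag; that precedence rule and the helper names are assumptions, not documented behavior.

```python
from html.parser import HTMLParser


class AiLicenseTagCollector(HTMLParser):
    """Collect the content of every ai-license meta tag, grouped by tag name."""

    def __init__(self):
        super().__init__()
        self.tags = {}  # tag name -> list of content strings

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "ai-license" or name.startswith("ai-license-"):
            self.tags.setdefault(name, []).append(attributes.get("content") or "")


def tags_for_stage(html: str, stage: str) -> list:
    """Return stage-specific tag contents, else the general ones (assumed precedence)."""
    collector = AiLicenseTagCollector()
    collector.feed(html)
    return collector.tags.get(f"ai-license-{stage}") or collector.tags.get("ai-license", [])


page = """
<head>
  <meta name="ai-license" content="allow; distribution:private; price:0.10">
  <meta name="ai-license-train" content="deny">
</head>
"""
print(tags_for_stage(page, "train"))
print(tags_for_stage(page, "infer"))
```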

Distribution Guide

Distribution controls whether AI outputs using your content will be kept private or made publicly accessible.

Distribution Options

PRIVATE Internal use only

Best for: Most content where you want to allow AI assistance but control public distribution

AI can do: Use your content to generate responses shown only to individual users or internal teams

AI cannot do: Publish or redistribute AI outputs based on your content publicly

TTL: Cached tokens valid for 24 hours by default

<meta name="ai-license" content="allow; distribution:private; price:0.08">

PUBLIC Public distribution allowed

Best for: Reference materials, public information, promotional content

AI can do: Use your content to generate responses that may be published, shared, or distributed

Higher pricing recommended: Since this allows broader reach and potential competition

TTL: No caching by default (0 hours) - fresh payment required each time

<meta name="ai-license" content="allow; distribution:public; price:0.25">
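
One way a client might apply the default TTLs above is sketched below. It assumes the TTL is measured from the time of the last paid access and that an expired entry means running a fresh verify-and-log cycle. Both points are interpretations of the defaults listed here, and the names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Default TTLs from the distribution options above, in hours.
DEFAULT_TTL_HOURS = {"private": 24, "public": 0}


def cache_is_fresh(distribution: str, paid_at: datetime, now: datetime = None) -> bool:
    """Return True while a previously paid access may still be reused."""
    now = now or datetime.now(timezone.utc)
    ttl = timedelta(hours=DEFAULT_TTL_HOURS[distribution])
    # A zero TTL never passes this check, so public use always pays again.
    return now - paid_at < ttl


paid = datetime(2025, 1, 23, 10, 0, tzinfo=timezone.utc)
print(cache_is_fresh("private", paid, paid + timedelta(hours=23)))
print(cache_is_fresh("public", paid, paid))
```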

Choosing the Right Distribution

Content Type	Recommended Distribution	Reasoning
News Articles	Private or tiered pricing	Preserve exclusive reporting while allowing personal AI assistance
Educational Content	Private initially, public at higher price	Allow individual learning while charging for broader distribution
Creative Writing	Private only	Protect artistic expression from unauthorized public distribution
Reference Data	Public	Factual information benefits from broader accessibility

Combined Stage and Distribution Examples

<!-- Block training, allow private inference -->
<meta name="ai-license-train" content="deny">
<meta name="ai-license-infer" content="allow; distribution:private; price:0.10">

<!-- Free private, paid public -->
<meta name="ai-license" content="allow; distribution:private">
<meta name="ai-license" content="allow; distribution:public; price:0.20">

Pricing Strategy Guide

Setting the right price balances fair compensation with encouraging AI innovation. Our research shows creators are successfully charging these rates:

Industry Benchmark Rates (USD per 1K tokens)

Content Type	Market Rate	Premium Rate	Real-World Examples
News Articles	$0.02 - $0.05	$0.05 - $0.10	News Corp × OpenAI deal benchmark
Blog Posts	$0.04 - $0.08	$0.08 - $0.15	Professional blogs, niche expertise
Technical Docs	$0.05 - $0.12	$0.12 - $0.25	API docs, tutorials, how-to guides
Research Papers	$0.10 - $0.25	$0.25 - $0.50	Academic work, original research
Creative Writing	$0.08 - $0.20	$0.20 - $0.40	Fiction, poetry, original expression

Stage-Based Pricing Strategy

AI Stage	Multiplier	Reasoning
train	3-5× base	Permanent inclusion in model weights
infer	1× base	Standard rate for real-time usage
embed	0.5× base	Lower impact, indexing only
tune	2× base	Specialized adaptation of models
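
The multipliers in this table translate directly into a suggested price range per stage. The following sketch is only a worked example of that arithmetic; the function name is made up, and the multipliers are guidelines rather than rules enforced by the platform.

```python
# Multipliers from the Stage-Based Pricing Strategy table, as (low, high) pairs.
STAGE_MULTIPLIERS = {
    "train": (3.0, 5.0),
    "infer": (1.0, 1.0),
    "embed": (0.5, 0.5),
    "tune": (2.0, 2.0),
}


def suggested_price_range(base_price: float, stage: str) -> tuple:
    """Return the (low, high) suggested price per 1K tokens for a stage."""
    low, high = STAGE_MULTIPLIERS[stage]
    # Round to four decimals to hide floating-point noise in the product.
    return (round(base_price * low, 4), round(base_price * high, 4))


for stage in ("train", "infer", "embed", "tune"):
    print(stage, suggested_price_range(0.10, stage))
```

With a base price of $0.10, the training range works out to $0.30 to $0.50 per 1K tokens, which sits inside the typical training range quoted earlier.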

Pricing Factors to Consider

Content quality: Exclusive, well-researched content commands higher rates
Freshness: Breaking news or cutting-edge research can be priced premium
Exclusivity: If you're the only source, you have pricing power
Volume: Consider offering volume discounts for bulk usage
Competition: Research what similar creators charge

💰 Pricing Strategy Examples

Conservative Approach

<!-- Encourage adoption with competitive rates -->
<meta name="ai-license" content="allow; distribution:private; price:0.10">

Balanced Approach

<!-- Different rates by stage and distribution -->
<meta name="ai-license" content="allow; distribution:private; price:0.10">
<meta name="ai-license-train" content="allow; distribution:private; price:1.00">
<meta name="ai-license-infer" content="allow; distribution:public; price:0.25">

Premium Approach

<!-- High-value, exclusive content -->
<meta name="ai-license-train" content="allow; distribution:private; price:0.50">
<meta name="ai-license-infer" content="deny">

Tiered Distribution Approach

<!-- Free private, paid public -->
<meta name="ai-license" content="allow; distribution:private">
<meta name="ai-license" content="allow; distribution:public; price:0.25">

🎯 Remember: You Set the Terms

These are guidelines, not rules. You own your content and can price it however you see fit. Start conservative and adjust based on demand and your content's unique value.

Verify License

POST /v1/verify-license

Check if content is licensed and get pricing information.

Parameters

Parameter	Type	Required	Description
url	string	Yes	URL of the content to verify
tokens	integer	Yes	Number of tokens to be used
stage	string	No	infer, embed, tune, or train (defaults to infer)
distribution	string	No	private or public (defaults to public)

Example Request

{
  "url": "https://example.com/article",
  "tokens": 1500,
  "stage": "train",
  "distribution": "private"
}

Response

{
  "licensed": true,
  "rate": 0.05,
  "cost": 0.075,
  "currency": "EUR",
  "license_id": "lic_abc123",
  "creator": {
    "name": "John Smith",
    "verified": true
  },
  "terms_url": "https://copyright.sh/terms"
}
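
The sample response is consistent with cost = rate × tokens / 1,000 (0.05 × 1,500 / 1,000 = 0.075). A client could use that relationship as a sanity check before logging usage, as in the sketch below. The formula is inferred from the example rather than stated by the API, and the function names are illustrative.

```python
from decimal import Decimal


def estimate_cost(rate_per_1k: str, tokens: int) -> Decimal:
    """Cost of reading `tokens` tokens at a per-1,000-token rate."""
    return Decimal(rate_per_1k) * Decimal(tokens) / Decimal(1000)


def cost_matches(response: dict, tokens: int) -> bool:
    """Check that a verify-license response agrees with the inferred formula."""
    # Going through str() avoids carrying binary float error into Decimal.
    expected = estimate_cost(str(response["rate"]), tokens)
    return expected == Decimal(str(response["cost"]))


sample_response = {"licensed": True, "rate": 0.05, "cost": 0.075}
print(estimate_cost("0.05", 1500))
print(cost_matches(sample_response, 1500))
```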

Get Content

Retrieve normalized metadata and the latest licensing directives for a specific URL. This endpoint is ideal for dashboards, preflight checks, and surfacing creator attribution alongside AI usage.

GET /v1/content

Provide a canonical URL and we return the creator, pricing, and license settings you should respect before generating outputs.

Example Request

fetch('https://api.copyright.sh/v1/content?url=' + encodeURIComponent('https://example.com/article'), {
  headers: {
    Authorization: 'Bearer sk_live_abc123'
  }
}).then(res => res.json());

Response

{
  "url": "https://example.com/article",
  "title": "Generative AI Licensing Guide",
  "creator": {
    "name": "Jordan Lee",
    "verified": true
  },
  "license": {
    "stage": "infer",
    "distribution": "private",
    "price": 0.08
  },
  "updated_at": "2025-09-18T14:32:00Z"
}

For pagination, filtering, and expanded schema details see the full API reference.

Log Usage

POST /v1/log-usage

Record content usage and process payment.

{
  "license_id": "lic_abc123",
  "tokens_used": 1500,
  "hmac_signature": "sha256=abc123...",
  "timestamp": "2025-01-23T10:00:00Z"
}

Webhooks

Subscribe to webhook events to keep your internal systems synchronized with creator preferences and usage receipts in near real time.

Retry policy: We retry delivery up to three times with exponential backoff. Any 2xx response marks the attempt as successful.

Available Events

usage.logged — fired after a usage record posts to the ledger.
license.updated — sent when a creator updates pricing, stages, or distribution settings.
license.revoked — triggered if a creator withdraws access for a URL you previously verified.
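
A receiver usually routes each event to its own handler. The sketch below shows one way to do that in Python. It assumes the payload carries the event name in an "event" field and the affected page in a "url" field. Both field names are hypothetical, since the payload schemas live in the full API reference, and the handlers only return strings for illustration.

```python
from typing import Callable, Dict

# Registry mapping event names to handler functions.
handlers: Dict[str, Callable[[dict], str]] = {}


def on(event_name: str):
    """Decorator that registers a handler for one webhook event."""
    def register(func):
        handlers[event_name] = func
        return func
    return register


@on("license.revoked")
def handle_revoked(payload: dict) -> str:
    return f"stop using {payload['url']}"


@on("license.updated")
def handle_updated(payload: dict) -> str:
    return f"refresh cached terms for {payload['url']}"


def dispatch(payload: dict) -> str:
    """Route a payload to its handler; unknown events are ignored, not errors."""
    handler = handlers.get(payload.get("event"))
    if handler is None:
        return "ignored"
    return handler(payload)


print(dispatch({"event": "license.revoked", "url": "https://example.com/article"}))
```

Ignoring unknown events instead of failing lets the endpoint keep returning a 2xx response if new event types are introduced later.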

Verify Signatures

Webhook payloads use the same HMAC signature scheme as the REST API. Compare the x-cs-signature header against a locally generated hash.

import hmac
import hashlib
import json

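def verify_webhook(payload: dict, signature: str, secret: str) -> bool:
    # Serialize with sorted keys so the hash input matches the signing side.
    message = json.dumps(payload, sort_keys=True)
    expected = hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(f"sha256={expected}", signature)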

See /docs-api.html#webhooks for payload schemas, testing utilities, and replay attack guidance.

HMAC Verification

All usage logging requires HMAC signatures for security and tamper-proofing.

Python Example

import hmac
import hashlib
import json

def create_hmac_signature(payload, secret_key):
    message = json.dumps(payload, sort_keys=True)
    signature = hmac.new(
        secret_key.encode('utf-8'),
        message.encode('utf-8'),
        hashlib.sha256
    ).hexdigest()
    return f"sha256={signature}"

# Usage
payload = {
    "license_id": "lic_abc123",
    "tokens_used": 1500,
    "timestamp": "2025-01-23T10:00:00Z"
}
signature = create_hmac_signature(payload, "your_secret_key")
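
Combining the signing step with the request body shown under Log Usage, a complete body could be assembled and checked as sketched below. It assumes the signature covers every field except hmac_signature itself, which matches the usage example but is not spelled out in this document. The function names are hypothetical.

```python
import hashlib
import hmac
import json


def sign(payload: dict, secret_key: str) -> str:
    """HMAC-SHA256 over the key-sorted JSON payload, prefixed with 'sha256='."""
    message = json.dumps(payload, sort_keys=True)
    digest = hmac.new(secret_key.encode("utf-8"), message.encode("utf-8"), hashlib.sha256)
    return f"sha256={digest.hexdigest()}"


def build_log_usage_body(license_id: str, tokens_used: int, timestamp: str, secret_key: str) -> dict:
    """Sign the usage fields, then attach the signature to the body."""
    payload = {"license_id": license_id, "tokens_used": tokens_used, "timestamp": timestamp}
    return {**payload, "hmac_signature": sign(payload, secret_key)}


def body_is_valid(body: dict, secret_key: str) -> bool:
    """Recompute the signature over the unsigned fields and compare in constant time."""
    unsigned = {key: value for key, value in body.items() if key != "hmac_signature"}
    return hmac.compare_digest(sign(unsigned, secret_key), body.get("hmac_signature", ""))


body = build_log_usage_body("lic_abc123", 1500, "2025-01-23T10:00:00Z", "your_secret_key")
print(body_is_valid(body, "your_secret_key"))
```

Changing any field after signing, such as tokens_used, makes the check fail, which is the tamper-proofing the signature is meant to provide.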

Rate Limits

Rate limits protect creators and maintain reliable performance for every integration. Limits are enforced per API key.

Endpoint	Limit	Window	Notes
`POST /v1/verify-license`	60 requests	Per minute	Burstable to 120 for short periods. Contact support if you regularly exceed this.
`GET /v1/content`	90 requests	Per minute	Responses are cached at the edge; repeated lookups reuse the same quota bucket.
`POST /v1/log-usage`	60 requests	Per minute	Use batching or idempotency keys to avoid spikes from retries.

Tip: If you need higher throughput, email support with projected volumes and we can raise the limits on a per-partner basis.
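
Staying under these limits is easier with a small client-side guard. The sketch below is a generic sliding-window limiter, not part of any Copyright.sh SDK. The class name is made up, and it assumes the server counts requests over a rolling 60-second window, which the table does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) < self._max:
            self._sent.append(now)
            return True
        return False


verify_limiter = SlidingWindowLimiter(max_requests=60)  # POST /v1/verify-license
if verify_limiter.try_acquire():
    print("ok to send")
```

When try_acquire returns False, the caller can wait briefly and try again rather than sending the request and receiving a 429.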

Python SDK

🚀 Coming Soon! Our Python SDK is currently in development. For now, use our REST API directly or integrate via our MCP server.

The Python SDK will provide:

Simple client initialization with API keys
License verification and validation
Usage logging with HMAC signatures
Automatic retry logic and error handling
Type hints and comprehensive documentation

Expected release: Q1 2025. Contact us for early access.

JavaScript SDK

🚀 Coming Soon! Our JavaScript/TypeScript SDK is currently in development. For now, use our REST API directly.

The JavaScript SDK will provide:

Browser and Node.js compatibility
TypeScript definitions included
Promise-based async operations
Automatic token counting utilities
Built-in caching for license lookups

Expected release: Q1 2025. Contact us for early access.

Go SDK

🚀 Coming Soon! Our Go SDK is currently in development. For now, use our REST API directly.

The Go SDK will provide:

Idiomatic Go interfaces
Concurrent request handling
Context support for cancellation
Efficient memory usage
gRPC support for high-performance applications

Expected release: Q2 2025. Contact us for early access.

Error Handling

The API returns standard HTTP status codes and detailed error messages.

Status Code	Description
200	Success
400	Bad Request - Invalid parameters
401	Unauthorized - Invalid API key
403	Forbidden - Content not licensed
429	Rate limit exceeded
500	Internal server error
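
A client can turn these status codes into a small set of actions. The mapping below is a sketch: the action names are invented, and treating 429 and 500 as retryable is a common client convention rather than documented API guidance.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status code from the table above to a client-side action."""
    if status == 200:
        return "ok"
    if status == 400:
        return "fix_request"    # invalid parameters: retrying unchanged will not help
    if status == 401:
        return "check_api_key"  # invalid API key
    if status == 403:
        return "do_not_use"     # content not licensed
    if status in (429, 500):
        return "retry_later"    # assumption: back off, then retry
    return "unexpected"


for code in (200, 400, 401, 403, 429, 500):
    print(code, classify_status(code))
```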

Robots.txt for AI Protection

While meta tags provide licensing terms for AI systems, you can also use robots.txt to completely block unwanted AI crawlers while preserving search engine access.

Why Use Robots.txt?

Complete Control: Robots.txt allows you to:

Block specific AI companies from accessing your content entirely
Preserve access for search engines (Google, Bing) for SEO
Prevent unauthorized scraping while allowing licensed access
Control which parts of your site are accessible to different bots

How to Implement

Step 1: Download our Pre-configured Robots.txt

Our robots.txt template blocks major AI crawlers while allowing search engines:

# Copyright.sh Robots.txt - AI Protection Template
# Block AI training bots while allowing search engines

# Allow search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

# Block OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Block Anthropic
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

# Block Common Crawl (used by many AI companies)
User-agent: CCBot
Disallow: /

# Block other AI/ML bots
User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Default: Allow all other bots (customize as needed)
User-agent: *
Allow: /

# Sitemap location (optional)
Sitemap: https://yoursite.com/sitemap.xml

Step 2: Upload to Your Website Root

The robots.txt file must be placed at the root of your domain:

# File location
https://yoursite.com/robots.txt

# For WordPress
Upload to your WordPress root directory via FTP or file manager

# For static sites
Place in the root of your public HTML folder

# For Next.js/React
Place in the /public directory

Step 3: Verify It's Working

Test your robots.txt implementation:

# Check if accessible
curl https://yoursite.com/robots.txt

# Test with Google Search Console
Use the robots.txt report in Google Search Console (it replaced the older robots.txt Tester tool)

# Monitor your server logs
Look for bot user agents respecting your rules
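
The rules can also be checked locally before upload. The sketch below feeds a shortened copy of the template to Python's standard urllib.robotparser and asks whether a given bot may fetch a URL. The helper name is illustrative, and the result reflects how Python's parser reads the rules, which may differ in edge cases from how an individual crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the template; paste your full robots.txt here instead.
TEMPLATE_EXCERPT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ("Googlebot", "GPTBot", "CCBot", "SomeOtherBot"):
    print(bot, allowed(TEMPLATE_EXCERPT, bot, "https://yoursite.com/blog/post"))
```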

Combining with AI Licensing

Best Practice: Use both approaches for maximum protection:

Meta Tags: Define licensing terms for AI companies that respect them
Robots.txt: Block unauthorized scrapers and bad actors
Legal Terms: Include AI licensing in your Terms of Service

This multi-layered approach ensures both technical and legal protection for your content.

Advanced Configuration

Selective Path Blocking

Allow AI access to some content while protecting sensitive areas:

# Allow AI to access your about page
User-agent: GPTBot
Allow: /about
Allow: /contact
Disallow: /blog/
Disallow: /premium-content/

# Block specific file types
User-agent: CCBot
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /downloads/

Rate Limiting with Crawl-delay

Slow down aggressive crawlers:

# Limit crawl rate (in seconds between requests)
User-agent: *
Crawl-delay: 10

# Note: Not all bots respect crawl-delay
# Consider using server-side rate limiting for enforcement
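
To confirm that a Crawl-delay line is parsed as intended, the same standard-library parser can report the delay that applies to a user agent. This is a sketch with a made-up helper name. It shows what a compliant parser reads from the file, not what any particular bot will actually honor.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay in seconds for `user_agent`, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for(RULES, "AnyBot"))
```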

Known AI Bot User Agents

Company	User Agent(s)	Purpose
OpenAI	GPTBot, ChatGPT-User	Training & Web browsing
Anthropic	anthropic-ai, Claude-Web	Training & Web access
Google	Google-Extended	Bard/Gemini training
Common Crawl	CCBot	Dataset collection
Perplexity	PerplexityBot	Search & answers
You.com	YouBot	Search & AI chat
ByteDance	Bytespider	TikTok AI training

Important: Robots.txt is a public file. Anyone can view your robots.txt to see which bots you're blocking. Don't include sensitive paths or information in your robots.txt file.

Monitoring & Enforcement

While robots.txt is a widely respected standard, it's not legally binding. For complete protection:

Monitor your server logs for non-compliant bots
Implement server-side blocking for persistent violators
Use Cloudflare or similar services for advanced bot management
Include legal terms prohibiting unauthorized scraping in your ToS
Consider implementing rate limiting and CAPTCHAs for suspicious traffic

Need Help? Download our pre-configured robots.txt from your Copyright.sh dashboard, or contact support for assistance with custom configurations.