Documentation

Complete documentation for copyright.sh, including our Creator Guide for content licensing and our API Reference for developers building AI applications.

🚀 Quick Start Guide

  1. For Creators: Read our Creator Guide and add meta tags to your content
  2. For Developers: Get an API key at copyright.sh/dashboard
  3. Check for ai-license meta tags before using content
  4. Verify licensing and log usage through our API

Authentication

All API requests require authentication using Bearer tokens. Include your API key in the Authorization header.

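curl -X POST https://api.copyright.sh/v1/verify-license \
     -H "Authorization: Bearer sk_live_abc123..." \
     -H "Content-Type: application/json" \
     -d '{"url": "https://example.com/article", "tokens": 1500}'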

API Key Types

  • sk_test_ - Test environment
  • sk_live_ - Production environment

Creator Guide

This guide helps content creators understand AI licensing concepts and implement them effectively using copyright.sh.

The Three Pillars of AI Licensing

Every AI license consists of three key components:

  • Stage: What phase of AI processing (infer, embed, tune, train)
  • Distribution: Whether outputs will be private or public
  • Price: What you get paid per 1,000 tokens read (defaults to free, but usage is still logged)

🎨 For Creators: Meta Tag Basics

Add this to your page's <head> section:

<meta name="ai-license" content="allow; distribution:private; price:0.12">

This allows AI systems to read your content for private outputs only, paying you $0.12 per 1,000 tokens read.
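To make the tag format concrete, here is a minimal Python sketch that turns an ai-license value into a dictionary. It assumes only the syntax shown in this guide (an allow/deny directive followed by semicolon-separated key:value pairs); parse_ai_license is a hypothetical helper, not part of any copyright.sh library.

```python
def parse_ai_license(content: str) -> dict:
    """Parse an ai-license value such as "allow; distribution:private; price:0.12".

    Assumes the format used throughout this guide: an allow/deny directive
    first, then optional semicolon-separated key:value pairs.
    """
    parts = [part.strip() for part in content.split(";") if part.strip()]
    if not parts:
        raise ValueError("empty ai-license value")
    # Anything other than an explicit "allow" is treated as a denial.
    terms = {"allow": parts[0].lower() == "allow"}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if not sep:
            continue  # skip directives this sketch does not understand
        key, value = key.strip().lower(), value.strip()
        # price is USD per 1,000 tokens; other values stay strings
        terms[key] = float(value) if key == "price" else value
    # No price means free (usage is still logged)
    terms.setdefault("price", 0.0)
    return terms


print(parse_ai_license("allow; distribution:private; price:0.12"))
# {'allow': True, 'distribution': 'private', 'price': 0.12}
```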

WordPress Integration

The official Copyright.sh WordPress plugin automatically adds AI license meta tags to your posts and pages based on your settings.

🔌 WordPress Plugin

Get the plugin: https://github.com/tymrtn/ai-license-wp

Quick Setup:

  1. Upload the copyright-sh-ai-license folder to /wp-content/plugins/
  2. Activate the plugin through the Plugins menu in WordPress
  3. Configure under Settings → AI License

Requirements: WordPress 6.2+, PHP 7.4+

📖 Complete Documentation

For detailed installation instructions, configuration options, and troubleshooting, see the GitHub repository documentation.

Features include: Automatic meta tags, /ai-license.txt endpoint, global settings with per-page overrides, monetization support, and privacy controls.

AI Stages Explained

Different AI processing stages have different implications for your content. Here's what each stage means:

Core Stages

Train

What it means: Your content becomes part of the dataset used to teach AI models new patterns and knowledge.

Legal context: Training may be considered fair use in the US, but this is not settled law, and the EU has no equivalent fair-use doctrine.

Typical pricing: $0.25-$0.50 per 1K tokens (highest value due to permanent inclusion)

<meta name="ai-license-train" content="allow; distribution:private; price:0.35">

Infer

What it means: AI uses your content to generate real-time responses to user queries.

Usage pattern: Temporary, on-demand access for immediate responses

Typical pricing: $0.05-$0.25 per 1K tokens (depending on distribution)

<meta name="ai-license-infer" content="allow; distribution:private; price:0.10">

Embed

What it means: Your content is converted into mathematical vectors for search and retrieval.

Usage pattern: Semantic search, content recommendation, similarity matching

Typical pricing: $0.05 per 1K tokens (lowest impact on your content)

<meta name="ai-license-embed" content="allow; distribution:public; price:0.05">

Tune

What it means: Fine-tuning and adapting existing models for specific tasks using your content.

Usage pattern: Specialized model adaptation and customization

Typical pricing: $0.15-$0.30 per 1K tokens (higher than inference, lower than training)

<meta name="ai-license-tune" content="allow; distribution:private; price:0.20">

💡 Pro Tip: Stage-Specific Pricing

You can set different prices for different stages. Training typically commands higher rates than inference or embedding.

Distribution Guide

Distribution controls whether AI outputs using your content will be kept private or made publicly accessible.

Distribution Options

PRIVATE: Internal use only

Best for: Most content where you want to allow AI assistance but control public distribution

AI can do: Use your content to generate responses shown only to individual users or internal teams

AI cannot do: Publish or redistribute AI outputs based on your content publicly

TTL: Cached tokens valid for 24 hours by default

<meta name="ai-license" content="allow; distribution:private; price:0.08">

PUBLIC: Public distribution allowed

Best for: Reference materials, public information, promotional content

AI can do: Use your content to generate responses that may be published, shared, or distributed

Higher pricing recommended: Public outputs reach a wider audience and can compete with your own content

TTL: No caching by default (0 hours) - fresh payment required each time

<meta name="ai-license" content="allow; distribution:public; price:0.25">
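A client that caches paid access could apply these default TTLs roughly as follows. This is a sketch under stated assumptions: you record when you last paid for a URL, the 24-hour and 0-hour values are only the documented defaults, and payment_required is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Documented default cache lifetimes, in hours (creators may change them).
DEFAULT_TTL_HOURS = {"private": 24, "public": 0}


def payment_required(distribution: str, last_paid: Optional[datetime], now: datetime) -> bool:
    """Return True if a fresh payment is needed before reusing the content."""
    if last_paid is None:
        return True  # never paid for this URL
    ttl = timedelta(hours=DEFAULT_TTL_HOURS[distribution])
    # A zero TTL (public) means a cached payment is never reused.
    return now - last_paid >= ttl


now = datetime(2025, 1, 23, 10, 0, tzinfo=timezone.utc)
print(payment_required("private", now - timedelta(hours=2), now))  # False
print(payment_required("public", now - timedelta(minutes=1), now))  # True
```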

Choosing the Right Distribution

Content Type | Recommended Distribution | Reasoning
--- | --- | ---
News Articles | Private or tiered pricing | Preserve exclusive reporting while allowing personal AI assistance
Educational Content | Private initially, public at higher price | Allow individual learning while charging for broader distribution
Creative Writing | Private only | Protect artistic expression from unauthorized public distribution
Reference Data | Public | Factual information benefits from broader accessibility

Combined Stage and Distribution Examples

<!-- Block training, allow private inference -->
<meta name="ai-license-train" content="deny">
<meta name="ai-license-infer" content="allow; distribution:private; price:0.10">

<!-- Free private, paid public -->
<meta name="ai-license" content="allow; distribution:private">
<meta name="ai-license" content="allow; distribution:public; price:0.20">
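When a page carries several tags, a client has to decide which one governs a given request. The sketch below assumes a precedence rule this guide does not spell out: a stage-specific tag (ai-license-<stage>) is checked before the generic ai-license tag, and an allow tag applies only if its distribution matches. resolve_terms is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def _parse(content: str) -> dict:
    # "allow; distribution:private; price:0.10" -> allow flag, distribution, price
    parts = [p.strip() for p in content.split(";") if p.strip()]
    terms = {"allow": parts[0].lower() == "allow", "price": 0.0}
    for part in parts[1:]:
        key, _, value = part.partition(":")
        key, value = key.strip(), value.strip()
        terms[key] = float(value) if key == "price" else value
    return terms


def resolve_terms(tags: List[Tuple[str, str]], stage: str, distribution: str) -> Optional[dict]:
    """Return the terms governing one (stage, distribution) request, or None.

    ASSUMED precedence: ai-license-<stage> is checked before ai-license.
    """
    for name in (f"ai-license-{stage}", "ai-license"):
        for tag_name, content in tags:
            if tag_name != name:
                continue
            terms = _parse(content)
            if not terms["allow"]:
                return terms  # a deny applies whatever the distribution
            if terms.get("distribution", distribution) == distribution:
                return terms
    return None  # no matching tag: treat as not licensed


tags = [
    ("ai-license-train", "deny"),
    ("ai-license-infer", "allow; distribution:private; price:0.10"),
]
print(resolve_terms(tags, "infer", "private"))
```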

Pricing Strategy Guide

Setting the right price balances fair compensation with encouraging AI innovation. Our research shows creators are successfully charging these rates:

Industry Benchmark Rates (USD per 1K tokens)

Content Type | Market Rate | Premium Rate | Real-World Examples
--- | --- | --- | ---
News Articles | $0.02 - $0.05 | $0.05 - $0.10 | News Corp × OpenAI deal benchmark
Blog Posts | $0.04 - $0.08 | $0.08 - $0.15 | Professional blogs, niche expertise
Technical Docs | $0.05 - $0.12 | $0.12 - $0.25 | API docs, tutorials, how-to guides
Research Papers | $0.10 - $0.25 | $0.25 - $0.50 | Academic work, original research
Creative Writing | $0.08 - $0.20 | $0.20 - $0.40 | Fiction, poetry, original expression

Stage-Based Pricing Strategy

AI Stage | Multiplier | Reasoning
--- | --- | ---
train | 3-5× base | Permanent inclusion in model weights
infer | 1× base | Standard rate for real-time usage
embed | 0.5× base | Lower impact, indexing only
tune | 2× base | Specialized adaptation of models
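Applying the multipliers to a base rate is simple arithmetic. With a base of $0.10 per 1K tokens, the sketch below yields ranges consistent with the per-stage examples earlier in this guide (train 0.35, infer 0.10, embed 0.05, tune 0.20). stage_prices is an illustrative helper, not a copyright.sh function.

```python
# Multipliers from the table above; train is the only range (3-5x).
STAGE_MULTIPLIERS = {
    "train": (3.0, 5.0),
    "infer": (1.0, 1.0),
    "embed": (0.5, 0.5),
    "tune": (2.0, 2.0),
}


def stage_prices(base: float) -> dict:
    """Return (low, high) USD-per-1K-token prices per stage, rounded to cents."""
    return {
        stage: (round(base * low, 2), round(base * high, 2))
        for stage, (low, high) in STAGE_MULTIPLIERS.items()
    }


for stage, (low, high) in stage_prices(0.10).items():
    print(f"{stage}: ${low:.2f} - ${high:.2f}")
```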

Pricing Factors to Consider

  • Content quality: Exclusive, well-researched content commands higher rates
  • Freshness: Breaking news or cutting-edge research can be priced premium
  • Exclusivity: If you're the only source, you have pricing power
  • Volume: Consider offering volume discounts for bulk usage
  • Competition: Research what similar creators charge

💰 Pricing Strategy Examples

Conservative Approach

<!-- Encourage adoption with competitive rates -->
<meta name="ai-license" content="allow; distribution:private; price:0.10">

Balanced Approach

<!-- Different rates by stage and distribution -->
<meta name="ai-license" content="allow; distribution:private; price:0.10">
<meta name="ai-license-train" content="allow; distribution:private; price:0.40">
<meta name="ai-license-infer" content="allow; distribution:public; price:0.25">

Premium Approach

<!-- High-value, exclusive content -->
<meta name="ai-license-train" content="allow; distribution:private; price:0.50">
<meta name="ai-license-infer" content="deny">

Tiered Distribution Approach

<!-- Free private, paid public -->
<meta name="ai-license" content="allow; distribution:private">
<meta name="ai-license" content="allow; distribution:public; price:0.25">

🎯 Remember: You Set the Terms

These are guidelines, not rules. You own your content and can price it however you see fit. Start conservative and adjust based on demand and your content's unique value.

Verify License

POST /v1/verify-license

Check if content is licensed and get pricing information.

Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | URL of the content to verify
tokens | integer | Yes | Number of tokens to be used
stage | string | No | infer, embed, tune, or train (defaults to infer)
distribution | string | No | private or public (defaults to public)

Example Request

{
  "url": "https://example.com/article",
  "tokens": 1500,
  "stage": "train",
  "distribution": "private"
}

Response

{
  "licensed": true,
  "rate": 0.05,
  "cost": 0.075,
  "currency": "EUR",
  "license_id": "lic_abc123",
  "creator": {
    "name": "John Smith",
    "verified": true
  },
  "terms_url": "https://copyright.sh/terms"
}
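The request above can be built with Python's standard library as sketched below. The Content-Type header is an assumption, and the helper names are hypothetical. The cost in the response is tokens / 1000 multiplied by rate, so 1,500 tokens at 0.05 gives 0.075, which expected_cost reproduces.

```python
import json
import urllib.request

API_BASE = "https://api.copyright.sh/v1"


def build_verify_request(api_key: str, url: str, tokens: int,
                         stage: str = "infer", distribution: str = "public") -> urllib.request.Request:
    """Build (without sending) a POST /v1/verify-license request."""
    body = json.dumps({"url": url, "tokens": tokens,
                       "stage": stage, "distribution": distribution}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/verify-license",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; not stated in these docs
        },
    )


def expected_cost(tokens: int, rate: float) -> float:
    """Cost for `tokens` at `rate` per 1,000 tokens, e.g. 1500 at 0.05 -> 0.075."""
    return round(tokens / 1000 * rate, 6)


req = build_verify_request("sk_test_abc123", "https://example.com/article", 1500,
                           stage="train", distribution="private")
# To send: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```

A 403 from this call means the content is not licensed for the requested stage and distribution.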

Get Content

Retrieve normalized metadata and the latest licensing directives for a specific URL. This endpoint is ideal for dashboards, preflight checks, and surfacing creator attribution alongside AI usage.

GET /v1/content

Provide a canonical URL and we return the creator, pricing, and license settings you should respect before generating outputs.

Example Request

fetch('https://api.copyright.sh/v1/content?url=' + encodeURIComponent('https://example.com/article'), {
  headers: {
    Authorization: 'Bearer sk_live_abc123'
  }
}).then(res => res.json());

Response

{
  "url": "https://example.com/article",
  "title": "Generative AI Licensing Guide",
  "creator": {
    "name": "Jordan Lee",
    "verified": true
  },
  "license": {
    "stage": "infer",
    "distribution": "private",
    "price": 0.08
  },
  "updated_at": "2025-09-18T14:32:00Z"
}
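An equivalent request in Python, percent-encoding the target URL much as encodeURIComponent does in the JavaScript example. build_content_request is a hypothetical helper; send the result with urllib.request.urlopen and decode the JSON body.

```python
import urllib.parse
import urllib.request


def build_content_request(api_key: str, content_url: str) -> urllib.request.Request:
    """Build (without sending) a GET /v1/content request for one URL."""
    # safe="" also escapes "/" and ":", like encodeURIComponent
    encoded = urllib.parse.quote(content_url, safe="")
    return urllib.request.Request(
        f"https://api.copyright.sh/v1/content?url={encoded}",
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_content_request("sk_live_abc123", "https://example.com/article")
print(req.full_url)
```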

For pagination, filtering, and expanded schema details, see the full API reference.

Log Usage

POST /v1/log-usage

Record content usage and process payment.

{
  "license_id": "lic_abc123",
  "tokens_used": 1500,
  "hmac_signature": "sha256=abc123...",
  "timestamp": "2025-01-23T10:00:00Z"
}
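The sketch below assembles a log-usage body. It assumes the signature covers license_id, tokens_used, and timestamp (the same three fields signed in the HMAC Verification example) and is then added as hmac_signature; utc_timestamp and build_log_usage_body are hypothetical names.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def utc_timestamp(now: datetime) -> str:
    # ISO 8601 in UTC with a trailing "Z", e.g. 2025-01-23T10:00:00Z
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_log_usage_body(license_id: str, tokens_used: int, secret: str, now: datetime) -> dict:
    payload = {"license_id": license_id, "tokens_used": tokens_used,
               "timestamp": utc_timestamp(now)}
    # ASSUMPTION: sign the payload without the signature field, using
    # sorted-key JSON and HMAC-SHA256 as in the HMAC Verification section.
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {**payload, "hmac_signature": f"sha256={digest}"}


body = build_log_usage_body("lic_abc123", 1500, "your_secret_key",
                            datetime.now(timezone.utc))
```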

Webhooks

Subscribe to webhook events to keep your internal systems synchronized with creator preferences and usage receipts in near real time.

Retry policy: We retry delivery up to three times with exponential backoff. Any 2xx response marks the attempt as successful.

Available Events

  • usage.logged — fired after a usage record posts to the ledger.
  • license.updated — sent when a creator updates pricing, stages, or distribution settings.
  • license.revoked — triggered if a creator withdraws access for a URL you previously verified.
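A receiver typically routes each verified event to its own handler. This sketch assumes the event name arrives in a top-level "type" field and that revocations carry a "url" field; neither is documented here, so check the payload schemas before relying on them. Returning a 2xx acknowledges delivery, while an unhandled exception typically surfaces as a 5xx that the retry policy would redeliver.

```python
from typing import Callable, Dict

Handler = Callable[[dict], None]


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[dict], int]:
    """Return a function that routes a verified event and yields an HTTP status."""

    def dispatch(event: dict) -> int:
        # ASSUMPTION: the event name lives in a top-level "type" field.
        handler = handlers.get(event.get("type", ""))
        if handler is None:
            return 204  # acknowledge events you do not handle so they are not retried
        handler(event)  # an exception here should become a 5xx, triggering a retry
        return 200

    return dispatch


dispatch = make_dispatcher({
    "license.revoked": lambda event: print("stop using", event.get("url")),
})
```

Because a delivery can be retried, handlers should tolerate receiving the same event more than once.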

Verify Signatures

Webhook payloads use the same HMAC signature scheme as the REST API. Compare the x-cs-signature header against a locally generated hash.

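import hmac
import hashlib
import json

def verify_webhook(payload: dict, signature: str, secret: str) -> bool:
    # Rebuild the signed message the same way as the REST API: sorted-key JSON
    message = json.dumps(payload, sort_keys=True)
    expected = hmac.new(secret.encode(), message.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison with the x-cs-signature header value ("sha256=<hex>")
    return hmac.compare_digest(f"sha256={expected}", signature)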

See /docs-api.html#webhooks for payload schemas, testing utilities, and replay attack guidance.

HMAC Verification

All usage logging requires HMAC signatures for security and tamper-proofing.

Python Example

import hmac
import hashlib
import json

def create_hmac_signature(payload, secret_key):
    message = json.dumps(payload, sort_keys=True)
    signature = hmac.new(
        secret_key.encode('utf-8'),
        message.encode('utf-8'),
        hashlib.sha256
    ).hexdigest()
    return f"sha256={signature}"

# Usage
payload = {
    "license_id": "lic_abc123",
    "tokens_used": 1500,
    "timestamp": "2025-01-23T10:00:00Z"
}
signature = create_hmac_signature(payload, "your_secret_key")

Rate Limits

Rate limits protect creators and maintain reliable performance for every integration. Limits are enforced per API key.

Endpoint | Limit | Window | Notes
--- | --- | --- | ---
POST /v1/verify-license | 60 requests | Per minute | Burstable to 120 for short periods. Contact support if you regularly exceed this.
GET /v1/content | 90 requests | Per minute | Responses are cached at the edge; repeated lookups reuse the same quota bucket.
POST /v1/log-usage | 60 requests | Per minute | Use batching or idempotency keys to avoid spikes from retries.

Tip: If you need higher throughput, email support with projected volumes and we can raise the limits on a per-partner basis.
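When a 429 does occur, retrying with exponential backoff keeps a client under the limit. In this sketch, send stands in for your HTTP call and returns (status, body); the delay schedule and attempt count are illustrative choices, not documented server requirements, and call_with_backoff is a hypothetical name.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, dict]], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send(), retrying on HTTP 429 with exponential backoff plus jitter."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # 1s, 2s, 4s, ... plus up to 10% jitter so clients do not retry in lockstep
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.1))
        status, body = send()
    return status, body
```

For POST /v1/log-usage, blind retries could record the same usage twice, which is why the table above recommends idempotency keys; how to pass one is not covered in this section.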

Python SDK

🚀 Coming Soon! Our Python SDK is currently in development. For now, use our REST API directly or integrate via our MCP server.

The Python SDK will provide:

  • Simple client initialization with API keys
  • License verification and validation
  • Usage logging with HMAC signatures
  • Automatic retry logic and error handling
  • Type hints and comprehensive documentation

Expected release: Q1 2025. Contact us for early access.

JavaScript SDK

🚀 Coming Soon! Our JavaScript/TypeScript SDK is currently in development. For now, use our REST API directly.

The JavaScript SDK will provide:

  • Browser and Node.js compatibility
  • TypeScript definitions included
  • Promise-based async operations
  • Automatic token counting utilities
  • Built-in caching for license lookups

Expected release: Q1 2025. Contact us for early access.

Go SDK

🚀 Coming Soon! Our Go SDK is currently in development. For now, use our REST API directly.

The Go SDK will provide:

  • Idiomatic Go interfaces
  • Concurrent request handling
  • Context support for cancellation
  • Efficient memory usage
  • gRPC support for high-performance applications

Expected release: Q2 2025. Contact us for early access.

Error Handling

The API returns standard HTTP status codes and detailed error messages.

Status Code | Description
--- | ---
200 | Success
400 | Bad Request - Invalid parameters
401 | Unauthorized - Invalid API key
403 | Forbidden - Content not licensed
429 | Rate limit exceeded
500 | Internal server error
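One way to act on these codes in Python is sketched below. ApiError and check_status are hypothetical, and treating 429 and 500 as retryable is a judgment call rather than documented behavior. A 403 means the content is not licensed, so it should be neither retried nor used.

```python
class ApiError(Exception):
    """Hypothetical exception type; not part of an official SDK."""

    def __init__(self, status: int, reason: str, retryable: bool):
        super().__init__(f"HTTP {status}: {reason}")
        self.status = status
        self.retryable = retryable


# Status codes from the table above, with whether a retry could help.
_STATUS = {
    400: ("Bad Request - invalid parameters", False),
    401: ("Unauthorized - invalid API key", False),
    403: ("Forbidden - content not licensed", False),
    429: ("Rate limit exceeded", True),
    500: ("Internal server error", True),
}


def check_status(status: int) -> None:
    """Return normally for 2xx responses; otherwise raise ApiError."""
    if 200 <= status < 300:
        return
    reason, retryable = _STATUS.get(status, ("Unexpected status", status >= 500))
    raise ApiError(status, reason, retryable)
```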

Robots.txt for AI Protection

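While meta tags provide licensing terms for AI systems, you can also use robots.txt to ask unwanted AI crawlers to stay off your site entirely while preserving search engine access. Well-behaved crawlers follow these rules, but robots.txt cannot technically stop a bot that ignores it.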

Why Use Robots.txt?

Complete Control: Robots.txt allows you to:

  • Block specific AI companies from accessing your content entirely
  • Preserve access for search engines (Google, Bing) for SEO
  • Prevent unauthorized scraping while allowing licensed access
  • Control which parts of your site are accessible to different bots

How to Implement

Step 1: Download our Pre-configured Robots.txt

Our robots.txt template blocks major AI crawlers while allowing search engines:

# Copyright.sh Robots.txt - AI Protection Template
# Block AI training bots while allowing search engines

# Allow search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

# Block OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

# Block Anthropic
User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

# Block Common Crawl (used by many AI companies)
User-agent: CCBot
Disallow: /

# Block other AI/ML bots
User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Default: Allow all other bots (customize as needed)
User-agent: *
Allow: /

# Sitemap location (optional)
Sitemap: https://yoursite.com/sitemap.xml

Step 2: Upload to Your Website Root

The robots.txt file must be placed at the root of your domain:

# File location
https://yoursite.com/robots.txt

# For WordPress
Upload to your WordPress root directory via FTP or file manager

# For static sites
Place in the root of your public HTML folder

# For Next.js/React
Place in the /public directory

Step 3: Verify It's Working

Test your robots.txt implementation:

# Check if accessible
curl https://yoursite.com/robots.txt

# Test with Google Search Console
Use the robots.txt report in Google Search Console

# Monitor your server logs
Look for bot user agents respecting your rules
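Python's standard library includes urllib.robotparser, which can sanity-check rules before you deploy them. It implements a simpler matcher than Google's (plain prefix paths, first matching rule wins), so treat this as a quick check only; yoursite.com is a placeholder and the rules are a trimmed copy of the template.

```python
from urllib.robotparser import RobotFileParser

# A trimmed copy of the template above; paste your full robots.txt here.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str = "/") -> bool:
    """True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, f"https://yoursite.com{path}")


for agent in ("Googlebot", "GPTBot", "CCBot", "SomeOtherBot"):
    print(agent, "allowed" if allowed(agent) else "blocked")
```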

Combining with AI Licensing

Best Practice: Use both approaches for maximum protection:

  1. Meta Tags: Define licensing terms for AI companies that respect them
  2. Robots.txt: Block unauthorized scrapers and bad actors
  3. Legal Terms: Include AI licensing in your Terms of Service

This multi-layered approach combines technical and legal protection for your content.

Advanced Configuration

Selective Path Blocking

Allow AI access to some content while protecting sensitive areas:

# Let GPTBot read your about and contact pages, but not your blog or premium content
User-agent: GPTBot
Allow: /about
Allow: /contact
Disallow: /blog/
Disallow: /premium-content/

# Block specific file types
User-agent: CCBot
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /downloads/
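The same standard-library parser can confirm the GPTBot path rules. The wildcard lines are left out because urllib.robotparser does plain prefix matching and does not understand * or $. Note that paths not mentioned, such as /pricing, stay allowed; adding Disallow: / after the Allow lines would limit GPTBot to the listed pages only.

```python
from urllib.robotparser import RobotFileParser

# The GPTBot group from the example above (wildcard lines omitted).
RULES = """\
User-agent: GPTBot
Allow: /about
Allow: /contact
Disallow: /blog/
Disallow: /premium-content/
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

for path in ("/about", "/contact", "/blog/ai-post", "/premium-content/guide", "/pricing"):
    verdict = "allowed" if rp.can_fetch("GPTBot", "https://yoursite.com" + path) else "blocked"
    print(path, verdict)
```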

Rate Limiting with Crawl-delay

Slow down aggressive crawlers:

# Limit crawl rate (in seconds between requests)
User-agent: *
Crawl-delay: 10

# Note: Not all bots respect crawl-delay
# Consider using server-side rate limiting for enforcement
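Because Crawl-delay is only advisory, enforcement has to happen on your server. This sketch reads the delay with urllib.robotparser (crawl_delay, Python 3.6+) and rejects requests that arrive too soon. Keying on the user agent is a simplification, since user agents are easy to spoof; CrawlDelayEnforcer is a hypothetical class.

```python
import time
from typing import Dict, Optional
from urllib.robotparser import RobotFileParser


class CrawlDelayEnforcer:
    """Reject requests that arrive sooner than robots.txt's Crawl-delay allows."""

    def __init__(self, robots_txt: str):
        self.parser = RobotFileParser()
        self.parser.parse(robots_txt.splitlines())
        self.last_seen: Dict[str, float] = {}  # user agent -> last accepted time

    def allow(self, user_agent: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        delay = self.parser.crawl_delay(user_agent) or 0
        previous = self.last_seen.get(user_agent)
        if previous is not None and now - previous < delay:
            return False  # too soon: respond with HTTP 429 or 503
        self.last_seen[user_agent] = now
        return True


enforcer = CrawlDelayEnforcer("User-agent: *\nCrawl-delay: 10\n")
```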

Known AI Bot User Agents

Company | User Agent(s) | Purpose
--- | --- | ---
OpenAI | GPTBot, ChatGPT-User | Training & Web browsing
Anthropic | anthropic-ai, Claude-Web | Training & Web access
Google | Google-Extended | Gemini (formerly Bard) training
Common Crawl | CCBot | Dataset collection
Perplexity | PerplexityBot | Search & answers
You.com | YouBot | Search & AI chat
ByteDance | Bytespider | TikTok AI training

Important: Robots.txt is a public file. Anyone can view your robots.txt to see which bots you're blocking. Don't include sensitive paths or information in your robots.txt file.

Monitoring & Enforcement

While robots.txt is a widely respected standard, it is neither legally binding nor technically enforced. For stronger protection:

  • Monitor your server logs for non-compliant bots
  • Implement server-side blocking for persistent violators
  • Use Cloudflare or similar services for advanced bot management
  • Include legal terms prohibiting unauthorized scraping in your ToS
  • Consider implementing rate limiting and CAPTCHAs for suspicious traffic
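For server-side blocking of persistent violators, a WSGI middleware can refuse requests whose User-Agent contains a known AI bot token. This sketch uses the tokens from the table above and is easily bypassed by spoofed user agents, so it complements rather than replaces a bot-management service. AI_BOT_TOKENS, is_blocked, and block_ai_bots are hypothetical names.

```python
# Tokens from the "Known AI Bot User Agents" table, lowercased for matching.
# Google-Extended is omitted: Google documents it as a robots.txt product
# token, not a separate user agent sent in requests.
AI_BOT_TOKENS = ("gptbot", "chatgpt-user", "anthropic-ai", "claude-web",
                 "ccbot", "perplexitybot", "youbot", "bytespider")


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match against known AI bot tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_BOT_TOKENS)


def block_ai_bots(app):
    """WSGI middleware that answers 403 to matching user agents."""

    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"AI crawling is not permitted without a license.\n"]
        return app(environ, start_response)

    return middleware
```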

Need Help? Download our pre-configured robots.txt from your Copyright.sh dashboard, or contact support for assistance with custom configurations.