User Guide¶
Complete guide to using Clean Docs CLI for documentation maintenance.
Table of Contents¶
- Getting Started
- Core Commands
- Configuration
- Link Checking
- Fixing Links
- CI/CD Integration
- Advanced Features
- Troubleshooting
Getting Started¶
Installation¶
Choose your installation method:
# Via pip (recommended)
pip install clean-docs # Core features
pip install 'clean-docs[snippets]' # + Code snippet validation
pip install 'clean-docs[semantic]' # + AI-powered analysis
pip install 'clean-docs[snippets,semantic]' # All features
# Or via curl installer
curl -fsSL https://raw.githubusercontent.com/Algiras/clean-docs/main/install.sh | bash
Verify Installation¶
Core Commands¶
1. Doctor Command¶
Check your system setup:
clean-docs doctor
What it checks:
- Python version (>= 3.10 required)
- GitHub CLI installation and authentication
- GITHUB_TOKEN environment variable
- Cache directory permissions
- Optional: Semantic analysis dependencies
- Optional: Copilot CLI availability
2. Scan Command¶
Find broken links in your documentation:
# Scan current directory
clean-docs scan .
# Scan specific directory
clean-docs scan ./docs
# Scan single file
clean-docs scan ./README.md
# Verbose mode (shows all links)
clean-docs scan ./docs --verbose
# JSON output for automation
clean-docs scan ./docs --format json
# Use custom config
clean-docs scan ./docs --config ./my-config.yaml
3. Cache Command¶
Manage the link status cache:
# Clear cache
clean-docs cache --clear
Configuration¶
Create .clean-docs.yaml in your project root:
# Basic configuration
links:
  timeout: 10                    # HTTP timeout in seconds
  concurrency: 20                # Parallel checks
  ignore_patterns:               # URLs to skip
    - "localhost"
    - "127.0.0.1"
    - "example.com"
    - "*.local"
    - "file://"
cache:
  ttl_hours: 24                  # Cache expiration
  max_size_mb: 100               # Max cache size
  # dir: ~/.cache/clean-docs     # Custom cache location
output:
  show_progress: true            # Show progress bars
  colors: auto                   # auto/always/never
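The guide does not spell out how ignore_patterns entries are matched. As a rough illustration only, the sketch below assumes that entries containing "://" are treated as URL prefixes and all other entries as shell-style globs against the hostname. The function name is_ignored and these matching rules are assumptions, not Clean Docs' actual behavior.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Patterns copied from the sample configuration above.
IGNORE_PATTERNS = ["localhost", "127.0.0.1", "example.com", "*.local", "file://"]


def is_ignored(url: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if url matches any pattern under the assumed rules."""
    host = urlparse(url).hostname or ""
    for pattern in patterns:
        if "://" in pattern:
            # Assumed rule: scheme-like patterns are URL prefixes.
            if url.startswith(pattern):
                return True
        elif fnmatch(host, pattern):
            # Assumed rule: everything else is a glob on the hostname.
            return True
    return False


if __name__ == "__main__":
    for url in ("http://localhost:8000/docs", "https://docs.python.org/3/"):
        print(url, is_ignored(url))
```

If the real matcher works on substrings or full URLs instead, patterns such as "example.com" would also catch subdomains, so it is worth confirming with a verbose scan before relying on a pattern.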
Generate Default Config¶
This creates .clean-docs.yaml with sensible defaults.
Link Checking¶
Supported Link Types¶
| Link Type | Example | Validation |
|---|---|---|
| Internal (relative) | ./file.md | File exists? |
| Internal (absolute) | /docs/file.md | File exists? |
| External (HTTP) | https://example.com | HTTP 200? |
| GitHub | github.com/user/repo | Repo/branch/file exists? |
| Anchor | #section-name | Heading exists? |
| Reference | [text][ref] | Reference defined? |
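To make the categories in the table concrete, here is a minimal sketch of how a link target could be sorted into them. The function classify_link and its rules are hypothetical and simplified; reference-style links are omitted because they depend on Markdown syntax rather than on the target string, and GitHub links are assumed to be written with an explicit https:// scheme.

```python
from urllib.parse import urlparse


def classify_link(target: str) -> str:
    """Classify a link target into one of the table's categories (simplified)."""
    if target.startswith("#"):
        return "anchor"
    parsed = urlparse(target)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host == "github.com" or host.endswith(".github.com"):
            return "github"
        return "external"
    if parsed.scheme:
        # mailto:, ftp:, and similar schemes are outside the table.
        return "other"
    if target.startswith("/"):
        return "internal-absolute"
    return "internal-relative"


if __name__ == "__main__":
    for t in ("./file.md", "/docs/file.md", "https://example.com",
              "https://github.com/user/repo", "#section-name"):
        print(f"{t:35} {classify_link(t)}")
```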
Checking GitHub Links¶
Clean Docs can verify GitHub repository links:
- Using the gh CLI (recommended): authenticate with gh auth login.
- Using GITHUB_TOKEN: set the GITHUB_TOKEN environment variable.
Performance Tips¶
- First run: Takes longer (no cache)
- Subsequent runs: Fast (cached results)
- Cache location: System temp directory
- Cache TTL: 24 hours by default
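The speed-up on later runs comes from reusing results that are younger than the TTL. A minimal sketch of that freshness test follows; the function name is_fresh and the use of Unix timestamps are assumptions for illustration, not a description of how Clean Docs stores its cache.

```python
import time
from typing import Optional

TTL_HOURS = 24  # default from the configuration section


def is_fresh(checked_at: float, now: Optional[float] = None,
             ttl_hours: float = TTL_HOURS) -> bool:
    """Return True if a result recorded at checked_at is still within the TTL.

    Both timestamps are seconds since the epoch.
    """
    if now is None:
        now = time.time()
    return (now - checked_at) < ttl_hours * 3600


if __name__ == "__main__":
    one_hour_ago = time.time() - 3600
    two_days_ago = time.time() - 48 * 3600
    print(is_fresh(one_hour_ago))   # reused from cache
    print(is_fresh(two_days_ago))   # would be re-checked
```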
Fixing Links¶
Dry Run (Preview)¶
See what would be fixed without making changes:
clean-docs scan ./docs --fix --dry-run
Interactive Mode¶
Review and approve each fix:
clean-docs scan ./docs --fix
You'll be prompted to confirm each iteration of fixes.
Auto-Fix Mode¶
Apply all auto-fixable changes without prompting:
clean-docs scan ./docs --fix --yes
What Gets Fixed¶
- ✅ Missing .md extensions (./file → ./file.md)
- ✅ Anchor typos (suggestions like #sectin → #section)
- ✅ Case sensitivity issues (File.md → file.md)
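The three kinds of fix listed above can be approximated with the standard library. The sketch below is illustrative only: suggest_anchor and suggest_file are hypothetical helpers, and the 0.8 similarity cutoff is an arbitrary choice, not a value taken from Clean Docs.

```python
import difflib
from typing import Optional


def suggest_anchor(broken: str, headings: list) -> Optional[str]:
    """Suggest the closest existing anchor for a misspelled one, if any."""
    matches = difflib.get_close_matches(broken, headings, n=1, cutoff=0.8)
    return matches[0] if matches else None


def suggest_file(target: str, known_files: list) -> Optional[str]:
    """Suggest a fix for a broken file link: missing .md, then wrong case."""
    if target + ".md" in known_files:
        return target + ".md"
    by_lowercase = {name.lower(): name for name in known_files}
    return by_lowercase.get(target.lower())


if __name__ == "__main__":
    print(suggest_anchor("sectin", ["section", "install", "usage"]))
    print(suggest_file("./file", ["./file.md"]))
    print(suggest_file("./File.md", ["./file.md"]))
```

Returning None when nothing is close enough mirrors the "needs manual review" cases: a missing file with no clear suggestion is better left to a person than guessed.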
What Needs Manual Review¶
- ❌ External broken URLs
- ❌ Missing files without clear suggestions
- ❌ Non-existent GitHub repos
CI/CD Integration¶
GitHub Actions¶
name: Documentation Check
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install Clean Docs
        run: pip install clean-docs
      - name: Check documentation
        # Save the JSON report so the comment step can read it
        run: clean-docs scan . --format json > link-check-results.json
        continue-on-error: true
      - name: Comment PR with results
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('link-check-results.json', 'utf8'));
            let body = '## 🔗 Link Check Results\n\n';
            body += `**Status:** ${results.summary.broken_links === 0 ? '✅ All good' : '❌ Issues found'}\n\n`;
            body += `- Files: ${results.summary.files_checked}\n`;
            body += `- Links: ${results.summary.total_links}\n`;
            body += `- Broken: ${results.summary.broken_links}\n`;
            // Wait for the API call so failures surface in the step log
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });
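Outside GitHub Actions, the same summary can be produced from the JSON report with a few lines of Python. This sketch assumes only the three summary fields the workflow script reads (files_checked, total_links, broken_links); the sample values are invented, and the rest of the report schema is not assumed.

```python
import json


def format_summary(report: dict) -> str:
    """Render the report's summary block as a short Markdown comment."""
    summary = report["summary"]
    status = "✅ All good" if summary["broken_links"] == 0 else "❌ Issues found"
    return "\n".join([
        "## 🔗 Link Check Results",
        "",
        f"**Status:** {status}",
        "",
        f"- Files: {summary['files_checked']}",
        f"- Links: {summary['total_links']}",
        f"- Broken: {summary['broken_links']}",
    ])


if __name__ == "__main__":
    # Hypothetical report; in practice, load the file written by
    # `clean-docs scan . --format json > link-check-results.json`.
    sample = json.loads(
        '{"summary": {"files_checked": 12, "total_links": 340, "broken_links": 2}}'
    )
    print(format_summary(sample))
```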
GitLab CI¶
# .gitlab-ci.yml
docs-check:
  image: python:3.12
  script:
    - pip install clean-docs
    - clean-docs scan . --format json > results.json
  artifacts:
    when: always          # keep the report even when the scan fails
    paths:
      - results.json      # JSON report, not a JUnit file
  allow_failure: true
Exit Codes¶
- 0: All checks passed
- 1: Broken links or errors found
Use for CI gates:
# Strict mode (fail CI on broken links)
clean-docs scan . || exit 1
# Lenient mode (report but don't fail)
clean-docs scan . || true
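When the gate lives in a Python script rather than a shell step, the strict and lenient modes reduce to how the child process's exit code is propagated. The helper run_gate below is a hypothetical wrapper, shown with a generic command list so it does not depend on any particular clean-docs flags.

```python
import subprocess
import sys


def run_gate(command: list, strict: bool = True) -> int:
    """Run command and return the exit code the CI job should use.

    strict=True  -> propagate the command's exit code (fail on broken links)
    strict=False -> report a non-zero code on stderr but return 0
    """
    result = subprocess.run(command)
    if result.returncode != 0 and not strict:
        print(f"exit code {result.returncode} ignored (lenient mode)",
              file=sys.stderr)
        return 0
    return result.returncode


if __name__ == "__main__":
    # Assumes clean-docs is installed and on PATH.
    sys.exit(run_gate(["clean-docs", "scan", "."], strict=True))
```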
Advanced Features¶
Semantic Analysis (Optional)¶
When installed with pip install 'clean-docs[semantic]':
# Find docs without related code
clean-docs semantic --orphaned ./docs
# Find code without documentation
clean-docs semantic --missing-docs ./src
# Suggest related code
clean-docs semantic --suggest ./docs/api.md ./src
Use case: Identify orphaned documentation or code that needs documentation.
Working with Monorepos¶
# Scan specific packages
for pkg in packages/*/; do
  pkg="${pkg%/}"  # drop the trailing slash left by the glob
  echo "Checking $pkg..."
  clean-docs scan "$pkg/docs" --config "$pkg/.clean-docs.yaml" || true
done
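The shell loop assumes every package has both a docs directory and its own config file. A slightly more defensive variant, sketched here in Python, builds the scan commands first and skips packages without docs. The function package_scan_commands and the packages/ layout are assumptions carried over from the loop above.

```python
from pathlib import Path


def package_scan_commands(root: Path) -> list:
    """Build one `clean-docs scan` command per package that has a docs dir."""
    commands = []
    packages = sorted(p for p in (root / "packages").iterdir() if p.is_dir())
    for pkg in packages:
        docs = pkg / "docs"
        if not docs.is_dir():
            continue  # nothing to scan in this package
        cmd = ["clean-docs", "scan", str(docs)]
        config = pkg / ".clean-docs.yaml"
        if config.is_file():
            # Only pass --config when the package actually ships one.
            cmd += ["--config", str(config)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    root = Path(".")
    if (root / "packages").is_dir():
        for cmd in package_scan_commands(root):
            print(" ".join(cmd))
```

Each returned list can be handed to subprocess.run; keeping command construction separate from execution makes the selection logic easy to check without running any scans.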
Pre-commit Hook¶
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: clean-docs
        name: Check documentation links
        entry: clean-docs scan .
        language: system
        pass_filenames: false
        always_run: true
Troubleshooting¶
Common Issues¶
"clean-docs: command not found"
# Add to PATH (see install.sh output)
export PATH="$HOME/.local/bin:$PATH"
# Or reinstall with proper permissions
pip install --force-reinstall clean-docs
"Cannot connect to host" errors
"Rate limit exceeded" for GitHub - Authenticate with gh auth login - Or set GITHUB_TOKEN environment variable
Cache issues
# Clear cache
clean-docs cache --clear
# Or use a config that sets a custom cache location (cache.dir)
clean-docs scan . --config ./no-cache-config.yaml
Debug Mode¶
# Verbose output
clean-docs scan . --verbose
# Check specific file
clean-docs scan ./problematic.md --verbose
# Dry run to see what's happening
clean-docs scan . --fix --dry-run --verbose
Getting Help¶
# General help
clean-docs --help
# Command-specific help
clean-docs scan --help
clean-docs doctor --help
clean-docs cache --help
Best Practices¶
- Run regularly: Add to CI/CD pipeline
- Fix incrementally: Use --fix in interactive mode first
- Cache wisely: Default cache location is fine for most use cases
- Ignore wisely: Use ignore_patterns for known exceptions
- Version control: Commit .clean-docs.yaml to share settings
Examples¶
Example 1: Fix all internal links¶
# Preview
clean-docs scan ./docs --fix --dry-run
# Apply fixes
clean-docs scan ./docs --fix --yes
# Verify
clean-docs scan ./docs
Example 2: Check before release¶
# Full check
clean-docs doctor
clean-docs scan . --verbose
# Export results
clean-docs scan . --format json > link-report.json
Example 3: Daily cron job¶
#!/bin/bash
# daily-check.sh
cd /path/to/docs || exit 1   # stop if the docs directory is missing
if ! clean-docs scan . --format json > /tmp/daily-check.json; then
  echo "Broken links detected!"
  jq '.summary' /tmp/daily-check.json
  exit 1
fi
Next: Check the CI/CD Integration guide for automation setup.