Diff Mode (Incremental Exports)
GitHub Extractor CLI v0.4.0 introduced diff mode (also called incremental exports), which reduces API calls and export times by exporting only items created or updated since the last run.
Why Diff Mode?
When regularly exporting repository data, you often only need the new or updated items rather than re-exporting everything. Diff mode solves this by:
- Reducing API calls by 80-95% on subsequent exports
- Making exports 10x faster for large repositories
- Minimizing impact on GitHub rate limits
- Maintaining persistent state for intelligent tracking
How It Works
Diff mode automatically tracks the last export timestamp for each repository and export type. On subsequent runs with --diff, it will:
- ✅ Only fetch items updated since the last export
- ✅ Skip unchanged items
- ✅ Maintain incremental state in ~/.ghextractor/state/exports.json
Implementation Details
- Pull Requests & Issues: Filters by updatedAt date
- Commits: Uses GitHub API's since parameter for optimal performance
- Branches: Filters by last commit date
- Releases: Filters by publishedAt date
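The filtering described above can be sketched in a few lines. This is an illustration under assumptions, not the tool's actual code: the helper names (`parse_ts`, `updated_since`, `commits_query`) and the sample items are hypothetical, and the strict "after" comparison is a guess at how boundary timestamps are treated. The `since` and `per_page` query parameters are the ones GitHub's REST commit listing accepts.

```python
from datetime import datetime
from urllib.parse import urlencode


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2025-11-21T10:30:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def updated_since(items: list[dict], last_export: str, field: str = "updatedAt") -> list[dict]:
    """Keep only items whose timestamp field is strictly after last_export."""
    cutoff = parse_ts(last_export)
    return [item for item in items if parse_ts(item[field]) > cutoff]


def commits_query(last_export: str) -> str:
    """Build a query string that restricts a commit listing with `since`."""
    return urlencode({"since": last_export, "per_page": 100})


# Hypothetical sample data, not real export output.
issues = [
    {"number": 1, "updatedAt": "2025-11-19T08:00:00Z"},
    {"number": 2, "updatedAt": "2025-11-21T12:00:00Z"},
]
changed = updated_since(issues, "2025-11-20T15:45:00Z")
```

For releases, the same helper would be called with `field="publishedAt"`. Commits differ in that the server does the filtering, which is why no client-side comparison is needed for them.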
Quick Start
# First run: Full export (creates baseline state)
ghextractor --diff
# Second run: Only exports new/updated items since last run
ghextractor --diff
Command Line Options
| Option | Alias | Description |
|---|---|---|
| --diff | --incremental | Enable incremental export mode |
| --force-full | | Force full export even if previous state exists |
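How the two options interact with stored state can be read as a small decision function. The sketch below is an interpretation of the behavior described on this page (a first run with no state is a full export, and --force-full overrides existing state); the function name `plan_export` and its return shape are hypothetical.

```python
from typing import Optional


def plan_export(diff: bool, force_full: bool, last_export: Optional[str]) -> tuple[str, Optional[str]]:
    """Return (mode, since), where mode is 'full' or 'incremental'.

    last_export is the stored timestamp for this repository and export type,
    or None when no baseline exists yet.
    """
    if not diff or force_full or last_export is None:
        return ("full", None)
    return ("incremental", last_export)
```

Under this reading, `plan_export(True, False, None)` yields a full export that establishes the baseline, and the same call with a stored timestamp yields an incremental export from that timestamp.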
Examples
Basic Diff Export
ghextractor --diff
Force Full Export
Even with diff mode enabled, you can force a full export:
ghextractor --diff --force-full
Combine with Other Options
# Diff mode with custom output and both formats
ghextractor --diff --format both --output ./exports --verbose
# Diff mode with date filtering
ghextractor --diff --since 2024-01-01
# Diff mode with label filtering
ghextractor --diff --labels bug,enhancement
State Management
Diff mode maintains state in ~/.ghextractor/state/exports.json with information about:
- Last export timestamp for each repository
- Last export timestamp for each export type
- Successful completion status
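For scripts that need to inspect this state, a read-only lookup might look like the following. It assumes the file is plain JSON keyed by repository and then by export type, as in the State File Structure section; `load_state` and `last_export` are hypothetical helper names, not part of the CLI.

```python
import json
from pathlib import Path
from typing import Optional

# Location taken from this page; the helpers below are illustrative only.
STATE_PATH = Path.home() / ".ghextractor" / "state" / "exports.json"


def load_state(path: Path = STATE_PATH) -> dict:
    """Return the parsed state file, or an empty dict if it does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def last_export(state: dict, repo: str, export_type: str) -> Optional[str]:
    """Look up the stored timestamp for one repository and export type."""
    return state.get(repo, {}).get(export_type)
```

A missing file, repository, or export type all resolve to "no baseline", which matches the first-run case where a full export is performed.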
State File Structure
{
  "owner/repo": {
    "prs": "2025-11-21T10:30:00Z",
    "issues": "2025-11-20T15:45:00Z",
    "commits": "2025-11-19T09:15:00Z"
  }
}
Manual State Management
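Removing one repository's entry from the state file can also be scripted instead of edited by hand. The sketch below assumes the JSON layout shown in the state file structure (one top-level key per repository); `reset_repo_state` is a hypothetical helper, and the write-then-rename step is a precaution chosen for this example rather than something the tool is known to do.

```python
import json
from pathlib import Path


def reset_repo_state(path: Path, repo: str) -> bool:
    """Remove one repository's entry; return True if something was removed."""
    state = json.loads(path.read_text(encoding="utf-8"))
    if repo not in state:
        return False
    del state[repo]
    # Write to a sibling temp file first so an interrupted write
    # cannot leave a truncated state file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)
    return True
```

After the entry is removed, the next diff run for that repository has no baseline and therefore behaves like a first run.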
You can manually edit or reset the state file if needed:
# View current state
cat ~/.ghextractor/state/exports.json
# Reset state for a specific repository (forces full export next time)
# Edit the file to remove or modify timestamps
Batch Processing Integration
Diff mode works seamlessly with batch processing:
# Batch export with incremental mode
ghextractor --batch-repos "repo1,repo2,repo3" --diff
# In JSON configuration
{
  "repositories": ["repo1", "repo2"],
  "diffMode": true
}
Best Practices
- First Run: Always run with diff mode enabled for the first export to establish baseline state
- Regular Updates: Use diff mode for regular updates to minimize API usage
- Periodic Full Exports: Occasionally run with --force-full to ensure data completeness
- State Backup: Consider backing up your state file if you rely heavily on incremental exports
- Error Recovery: If an export fails, the state won't be updated, so the next diff export will include the missed items
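The error-recovery rule in the last item (state is left untouched when an export fails) can be illustrated as follows. This is a sketch of the idea, not the tool's implementation: `export_and_record` and the `run_export` callback are hypothetical, and recording the start time rather than the finish time is an assumption made here so that items updated while the export is running are picked up next time.

```python
from datetime import datetime, timezone
from typing import Callable


def export_and_record(state: dict, repo: str, export_type: str,
                      run_export: Callable[[], None]) -> bool:
    """Run an export and advance the stored timestamp only if it succeeds."""
    started = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    try:
        run_export()
    except Exception:
        # Keep the old timestamp so the next diff run covers the same window.
        return False
    state.setdefault(repo, {})[export_type] = started
    return True
```

Because a failed run returns before the state is touched, retrying simply repeats the same time window and no items are skipped.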
Troubleshooting
State Conflicts
If you encounter issues with incremental exports:
# Force a full export to reset state
ghextractor --diff --force-full
Missing Items
If you suspect items are missing from incremental exports:
- Check the state file timestamps
- Verify the item's update timestamp is after the last export
- Run with --verbose to see detailed filtering information
Performance Issues
For very large repositories:
- Use --verbose to monitor API usage
- Consider combining with date filters (--since, --until)
- Monitor GitHub rate limits during export
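Rate limits can be checked outside the tool through GitHub's REST endpoint GET /rate_limit. The sketch below is independent of ghextractor: the helper names and the `reserve` threshold of 100 requests are arbitrary choices for illustration, and `token` is assumed to be a personal access token you supply.

```python
import json
import urllib.request


def fetch_rate_limit(token: str) -> dict:
    """Call GET /rate_limit on the GitHub REST API and return the parsed JSON."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def core_budget(payload: dict) -> tuple[int, int]:
    """Return (remaining, limit) for the core REST API bucket."""
    core = payload["resources"]["core"]
    return core["remaining"], core["limit"]


def should_pause(payload: dict, reserve: int = 100) -> bool:
    """True when fewer than `reserve` core requests remain."""
    remaining, _ = core_budget(payload)
    return remaining < reserve
```

Checking the remaining budget before starting a large export, for example with `should_pause(fetch_rate_limit(token))`, gives an early signal to wait or to narrow the export with date filters.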