Migrating from pyEGA3
EGAfetch is a drop-in replacement for pyEGA3 with a similar interface. This guide maps pyEGA3 commands to their EGAfetch equivalents.
Command Mapping
Authentication
Download a Dataset
Download Specific Files
List Dataset Files
Config File
The credentials file format is identical:
The only difference is the flag name: pyEGA3 uses -cf, EGAfetch uses --cf (double dash).
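Before pointing EGAfetch at an existing pyEGA3 credentials file, it can help to confirm the file parses. The sketch below is a minimal check, assuming the file is a JSON object with `username` and `password` keys (the layout pyEGA3 documents). The function name `load_credentials` is hypothetical and not part of either tool.

```python
import json
from pathlib import Path


def load_credentials(path):
    """Load a credentials file and check that the two assumed keys are set."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file is a JSON object with "username" and "password".
    if not isinstance(data, dict):
        raise ValueError("credentials file must contain a JSON object")
    missing = [key for key in ("username", "password") if not data.get(key)]
    if missing:
        raise ValueError(f"credentials file is missing: {', '.join(missing)}")
    return data["username"], data["password"]
```

If this returns without raising, the same file should be usable with either tool, given that the format is identical.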
Key Differences¶

| Feature | pyEGA3 | EGAfetch |
|---|---|---|
| Parallelism | Sequential (1 file, 1 stream) | Configurable (4 files x 8 chunks default) |
| Resume | Re-downloads from scratch | Byte-precise resume with HTTP Range |
| Token refresh | Fails when token expires | Automatic refresh before expiry |
| Progress | Basic text output | Live progress bars per file |
| Interruption | May corrupt state | Safe at any point (atomic state writes) |
| Metadata | Not available | egafetch metadata exports TSV/CSV/JSON |
| Metadata auto-download | Not available | Auto-fetched after dataset download (with --cf) |
| Bandwidth throttling | Not available | --max-bandwidth global limit |
| File filtering | Not available | --include / --exclude glob patterns |
| Adaptive chunk sizing | Not available | --adaptive-chunks auto-tunes based on throughput |
| Persistent config | Not available | ~/.egafetch/config.yaml for default settings |
| Installation | Python + pip | Single binary, zero dependencies |
| Batch file input | Not available | Text file with identifiers (one per line) |
| MD5 sidecar files | Not available | .md5 file written alongside each download |
| .cip stripping | Strips .cip extension | Strips .cip extension (same behavior) |
| Checksum | After download | After download (same, but automatic) |
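Two rows in the table, byte-precise resume and atomic state writes, describe general techniques that are easy to illustrate. The following is a sketch of those ideas only and does not show how EGAfetch implements them. The function names and the JSON state layout are hypothetical.

```python
import json
import os
import tempfile


def resume_range(chunk_start, chunk_end, bytes_done):
    """Return the HTTP Range header value for the unfinished part of a chunk.

    chunk_start and chunk_end are inclusive byte offsets, matching the
    "bytes=first-last" form of the Range header.
    Returns None when the chunk is already complete.
    """
    first_missing = chunk_start + bytes_done
    if first_missing > chunk_end:
        return None
    return f"bytes={first_missing}-{chunk_end}"


def save_state_atomically(state, path):
    """Write state as JSON so that readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the target; os.replace is atomic when both paths share a filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For a chunk covering bytes 0 to 99 with 40 bytes already on disk, `resume_range(0, 99, 40)` yields `bytes=40-99`, so only the missing tail is requested. Because the state file is replaced in one step, an interruption leaves either the old state or the new one on disk, never a partial write.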
New Features in EGAfetch
Features not available in pyEGA3:
- `egafetch metadata` -- Export dataset metadata as TSV, CSV, or JSON with a merged master file
- `egafetch status` -- Check download progress without re-running the download
- `egafetch verify` -- Re-verify checksums at any time
- `egafetch clean` -- Remove temporary files while keeping completed downloads
- `--restart` -- Force a fresh download, discarding all progress
- `--parallel-files` / `--parallel-chunks` / `--chunk-size` -- Fine-grained control over download parallelism
- `--max-bandwidth` -- Global bandwidth throttling (e.g., `100M`, `1G`)
- `--include` / `--exclude` -- Glob-based file filtering (e.g., `--include "*.bam"`)
- `--adaptive-chunks` -- Auto-adjust chunk sizes based on measured throughput
- `--no-metadata` / `--metadata-format` -- Control automatic metadata download during dataset downloads
- `~/.egafetch/config.yaml` -- Persistent config file for default settings (chunk size, parallelism, bandwidth, output dir)
- Identifier files -- Pass a text file with one EGAD/EGAF per line instead of listing IDs on the command line
- MD5 sidecar files -- `.md5` checksum file written alongside each downloaded file for easy verification
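As an illustration of how a sidecar file can be used downstream, the sketch below re-checks a download against its `.md5` file from a separate script. It rests on two assumptions that are not confirmed here: the sidecar is named by appending `.md5` to the data file's name, and its first whitespace-separated token is the hex digest. The helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, block_size=1024 * 1024):
    """Hash a file in blocks so large downloads need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_sidecar(path):
    """Compare a file against the digest stored in '<path>.md5'."""
    sidecar = Path(str(path) + ".md5")
    # Assumption: the first whitespace-separated token is the hex digest,
    # which covers a bare digest and md5sum-style "digest  filename" lines.
    expected = sidecar.read_text(encoding="utf-8").split()[0].lower()
    return md5_of(path) == expected
```

Within EGAfetch itself, the `egafetch verify` command listed above covers the same need; a script like this is only useful where the binary is not available.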