Changelog
All notable changes to EGAfetch are documented here.
v1.1.0 (2026-02-16)
New Features
- Bandwidth throttling -- Global rate limit across all connections with `--max-bandwidth` (e.g., `100M`, `1G`). Uses a single `golang.org/x/time/rate.Limiter` shared across all goroutines.
- Adaptive chunk sizing -- `--adaptive-chunks` monitors throughput over a rolling window and automatically adjusts chunk sizes (8 MB to 256 MB) based on connection speed.
- File filtering -- `--include`/`--exclude` glob patterns to selectively download files from a dataset (e.g., `--include "*.bam"`).
- Persistent configuration -- `~/.egafetch/config.yaml` for default settings (chunk size, parallelism, bandwidth, output directory, metadata format). CLI flags override config values.
- Identifier file input -- Pass a text file with one EGAD/EGAF identifier per line instead of listing IDs on the command line. Blank lines and `#` comments are supported.
- MD5 sidecar files -- An `.md5` checksum file in standard `md5sum` format is written alongside each downloaded file after verification.
- Automatic metadata download -- Dataset metadata (TSV/CSV/JSON + PEP) is fetched automatically after the data download when using `--cf`. Control with `--no-metadata` and `--metadata-format`.
- Dataset details in `list` -- `egafetch list` now shows the dataset title, description, and number of samples from the public metadata API.
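The `--max-bandwidth` entry accepts human-readable sizes such as `100M` and `1G`. A minimal sketch of how such a value could be turned into a byte rate, assuming binary multiples (K = 1024) and bytes per second; `parseBandwidth` is a hypothetical helper, not EGAfetch's own parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBandwidth converts a value such as "100M" or "1G" into bytes per second.
// ASSUMPTION: K/M/G are binary multiples and the unit is bytes; EGAfetch's own
// parser may use different rules.
func parseBandwidth(value string) (int64, error) {
	s := strings.ToUpper(strings.TrimSpace(value))
	if s == "" {
		return 0, fmt.Errorf("empty bandwidth value")
	}
	multipliers := map[byte]int64{'K': 1 << 10, 'M': 1 << 20, 'G': 1 << 30}
	mult := int64(1)
	if m, ok := multipliers[s[len(s)-1]]; ok {
		mult = m
		s = s[:len(s)-1] // drop the unit suffix before parsing the number
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("invalid bandwidth %q", value)
	}
	return n * mult, nil
}

func main() {
	for _, v := range []string{"100M", "1G", "512K"} {
		bps, err := parseBandwidth(v)
		fmt.Println(v, bps, err)
	}
}
```

A single limiter built from the result, e.g. `rate.NewLimiter(rate.Limit(bps), burst)` from `golang.org/x/time/rate`, can then be shared by every connection: each reader calls `WaitN(ctx, n)` before consuming `n` bytes, keeping `n` at or below the burst size because `WaitN` returns an error otherwise.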
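The adaptive chunk sizing entry can be illustrated with a rolling average that sizes each new chunk to take a fixed amount of time at the observed speed, clamped to the 8 MB to 256 MB range. This is a hedged sketch: `throughputWindow`, `nextChunkSize`, the window length, and the per-chunk target duration are all assumptions, not EGAfetch's actual policy.

```go
package main

import "fmt"

const (
	minChunk = 8 << 20   // 8 MB lower bound from the changelog (taken as MiB here)
	maxChunk = 256 << 20 // 256 MB upper bound
)

// throughputWindow keeps the most recent `size` throughput samples (bytes/sec).
// HYPOTHETICAL type; the window length is an assumption.
type throughputWindow struct {
	samples []float64
	size    int
}

// add records a sample and evicts the oldest one once the window is full.
func (w *throughputWindow) add(bps float64) {
	w.samples = append(w.samples, bps)
	if len(w.samples) > w.size {
		w.samples = w.samples[1:]
	}
}

// mean returns the average throughput over the window (0 if empty).
func (w *throughputWindow) mean() float64 {
	if len(w.samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range w.samples {
		sum += s
	}
	return sum / float64(len(w.samples))
}

// nextChunkSize sizes the next chunk so it takes roughly targetSeconds at the
// observed mean throughput, clamped to [minChunk, maxChunk].
func nextChunkSize(w *throughputWindow, targetSeconds float64) int64 {
	size := int64(w.mean() * targetSeconds)
	if size < minChunk {
		return minChunk
	}
	if size > maxChunk {
		return maxChunk
	}
	return size
}

func main() {
	w := &throughputWindow{size: 5}
	for _, bps := range []float64{1 << 20, 2 << 20, 50 << 20} {
		w.add(bps)
		fmt.Println(nextChunkSize(w, 4))
	}
}
```

Sizing by time rather than by a fixed byte count keeps fast links from paying per-request overhead on tiny chunks, while slow links still get chunks small enough to retry cheaply.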
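One plausible reading of the `--include`/`--exclude` rules, sketched with the standard library's `path.Match`: a file must match at least one include pattern (when any are given) and no exclude pattern. Matching against the base name and letting excludes win are assumptions; `keepFile` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// keepFile reports whether name passes hypothetical --include/--exclude rules.
// With no include patterns every file is a candidate; otherwise it must match
// at least one. Any matching exclude pattern removes it.
// ASSUMPTION: patterns are matched against the base name with path.Match.
func keepFile(name string, includes, excludes []string) bool {
	base := path.Base(name)
	matchesAny := func(patterns []string) bool {
		for _, p := range patterns {
			// A malformed pattern returns an error and is treated as no match.
			if ok, err := path.Match(p, base); err == nil && ok {
				return true
			}
		}
		return false
	}
	if len(includes) > 0 && !matchesAny(includes) {
		return false
	}
	return !matchesAny(excludes)
}

func main() {
	files := []string{"sample1.bam", "sample1.bam.bai", "sample1.vcf.gz"}
	for _, f := range files {
		fmt.Println(f, keepFile(f, []string{"*.bam"}, nil))
	}
}
```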
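The precedence rule for `~/.egafetch/config.yaml` (CLI flags override config values) can be sketched with the standard `flag` package by seeding each flag's default from the merged configuration. The `Config` fields, the flag names `--parallel-files` and `--chunk-size-mb`, and the use of `flag` in place of EGAfetch's real CLI parser are all hypothetical; YAML decoding (e.g. with `gopkg.in/yaml.v3`) is omitted, and only the 4-file and 64 MB defaults come from v1.0.0.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds two HYPOTHETICAL settings standing in for keys in
// ~/.egafetch/config.yaml (YAML decoding omitted to stay stdlib-only).
type Config struct {
	ParallelFiles int
	ChunkSizeMB   int
}

// resolve layers settings: built-in defaults, then non-zero config-file
// values, then CLI flags. Flag defaults are seeded from the merged config,
// so a flag the user did not pass leaves the config value in place.
func resolve(fileCfg Config, args []string) (Config, error) {
	cfg := Config{ParallelFiles: 4, ChunkSizeMB: 64} // v1.0.0 defaults
	if fileCfg.ParallelFiles != 0 {
		cfg.ParallelFiles = fileCfg.ParallelFiles
	}
	if fileCfg.ChunkSizeMB != 0 {
		cfg.ChunkSizeMB = fileCfg.ChunkSizeMB
	}
	fs := flag.NewFlagSet("egafetch", flag.ContinueOnError)
	pf := fs.Int("parallel-files", cfg.ParallelFiles, "files downloaded concurrently")
	cs := fs.Int("chunk-size-mb", cfg.ChunkSizeMB, "chunk size in MB")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	cfg.ParallelFiles, cfg.ChunkSizeMB = *pf, *cs
	return cfg, nil
}

func main() {
	fromFile := Config{ParallelFiles: 6} // as if read from config.yaml
	cfg, err := resolve(fromFile, []string{"--chunk-size-mb", "128"})
	fmt.Printf("%+v %v\n", cfg, err)
}
```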
Bug Fixes
- Checksum verification -- `GetChecksum()` no longer falls back to the encrypted file's checksum (`Checksum` field). Only `PlainChecksum` and `UnencryptedChecksum` are used, preventing false mismatches when downloading in plain mode.
- `.cip` extension stripping -- Output filenames now have the `.cip` extension stripped (e.g., `sample.bam.cip` becomes `sample.bam`), matching pyEGA3 behavior.
- Resume safety (HTTP 200 vs 206) -- When resuming a partial chunk, if the server returns HTTP 200 (ignoring the Range header) instead of 206 Partial Content, the existing `.part` file is now truncated instead of appended to, preventing data corruption.
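The checksum fix amounts to a fallback chain that never reaches the encrypted-file checksum. A sketch with a hypothetical `FileInfo` type whose field names follow the entry above; preferring `PlainChecksum` over `UnencryptedChecksum` is an assumption about the order:

```go
package main

import "fmt"

// FileInfo is a HYPOTHETICAL subset of the file metadata; the field names
// follow the changelog, but the real struct may differ.
type FileInfo struct {
	Checksum            string // checksum of the encrypted payload: never used for verification
	PlainChecksum       string
	UnencryptedChecksum string
}

// GetChecksum returns the plaintext checksum, or "" if none is known.
// It deliberately never returns Checksum, because that value describes the
// encrypted bytes and would always mismatch a plain-mode download.
func (f FileInfo) GetChecksum() string {
	if f.PlainChecksum != "" {
		return f.PlainChecksum
	}
	return f.UnencryptedChecksum
}

func main() {
	f := FileInfo{Checksum: "enc-md5", UnencryptedChecksum: "plain-md5"}
	fmt.Println(f.GetChecksum())
}
```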
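The `.cip` stripping rule is a suffix trim. A sketch, assuming the extension is matched case-sensitively and only at the end of the name; `outputName` is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// outputName maps a remote file name to the local output name by removing a
// trailing ".cip" extension; names without it are returned unchanged.
func outputName(remote string) string {
	return strings.TrimSuffix(remote, ".cip")
}

func main() {
	for _, n := range []string{"sample.bam.cip", "sample.bam.bai"} {
		fmt.Println(n, "->", outputName(n))
	}
}
```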
Other Changes
- Added `golang.org/x/time` and `gopkg.in/yaml.v3` dependencies.
- New `internal/config` package for YAML configuration.
- Adaptive chunk sizing logic in `internal/download/file.go` with batch dispatch and rechunking.
- Updated documentation across all pages.
v1.0.0
Initial Release
- Two-level parallelism -- Configurable parallel files (default 4) and parallel chunks per file (default 8) for maximum throughput.
- Chunked downloads -- Files split into configurable chunks (default 64 MB) downloaded via HTTP Range requests.
- Byte-precise resume -- Interrupted downloads resume from the exact byte using persisted chunk state and append-mode writes.
- Atomic state persistence -- All state files written via temp file + fsync + rename to prevent corruption on crash.
- Automatic token refresh -- OAuth2 tokens refreshed 5 minutes before expiry using the refresh token.
- Checksum verification -- MD5/SHA256 verification after merge, with a standalone `verify` command.
- Exponential backoff with jitter -- Up to 5 retries per chunk (1s base, 60s max) and 3 retries per file.
- Graceful shutdown -- SIGINT/SIGTERM saves state and preserves partial chunks for resume.
- Subcommands -- `auth login/status/logout`, `download`, `list`, `info`, `metadata`, `status`, `verify`, `clean`.
- pyEGA3-compatible credentials -- Same JSON format (`{"username": "...", "password": "..."}`).
- Single static binary -- No runtime dependencies; cross-compiled for Linux, macOS, and Windows.
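Splitting a file into Range requests is plain arithmetic: chunk `i` covers bytes `i*chunkSize` through `min((i+1)*chunkSize, size) - 1`, and HTTP Range end offsets are inclusive. A sketch with hypothetical `splitChunks`/`chunkRange` names, using the 64 MB default:

```go
package main

import "fmt"

// chunkRange is one inclusive byte range [Start, End] of a file.
type chunkRange struct{ Start, End int64 }

// splitChunks divides size bytes into consecutive chunks of at most chunkSize bytes.
func splitChunks(size, chunkSize int64) []chunkRange {
	var out []chunkRange
	for start := int64(0); start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end > size-1 {
			end = size - 1 // the last chunk may be shorter
		}
		out = append(out, chunkRange{start, end})
	}
	return out
}

// header renders the Range header value; the end offset is inclusive.
func (c chunkRange) header() string {
	return fmt.Sprintf("bytes=%d-%d", c.Start, c.End)
}

func main() {
	const mb = 1 << 20
	for _, c := range splitChunks(150*mb, 64*mb) {
		fmt.Println(c.header())
	}
}
```

With a 150 MiB file and 64 MiB chunks this yields three ranges, the last one 22 MiB long.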
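Byte-precise resume means a chunk restarts at `start + bytesAlreadyWritten` rather than at its beginning. A sketch that derives the remaining Range from the size of the `.part` file; EGAfetch persists chunk state separately, so relying on the on-disk size here is a simplifying assumption, and `partSize`/`resumeRange` are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// partSize returns how many bytes of a chunk are already on disk; a missing
// .part file simply means nothing has been downloaded yet.
func partSize(path string) (int64, error) {
	info, err := os.Stat(path)
	if errors.Is(err, fs.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

// resumeRange returns the Range header for the bytes of chunk [start, end]
// still missing after `have` bytes were written, or done=true if none remain.
func resumeRange(start, end, have int64) (header string, done bool) {
	next := start + have
	if next > end {
		return "", true
	}
	return fmt.Sprintf("bytes=%d-%d", next, end), false
}

func main() {
	have, err := partSize("chunk-0003.part") // hypothetical file name
	if err != nil {
		panic(err)
	}
	h, done := resumeRange(192<<20, 256<<20-1, have)
	fmt.Println(h, done)
}
```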
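Because the credentials file uses pyEGA3's JSON layout, decoding it takes a two-field struct. A sketch; `Credentials` and `parseCredentials` are hypothetical names, and rejecting empty fields is an assumed validation step:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Credentials matches the pyEGA3 credentials JSON layout.
type Credentials struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

// parseCredentials decodes a credentials file and requires both fields.
func parseCredentials(data []byte) (Credentials, error) {
	var c Credentials
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	if c.Username == "" || c.Password == "" {
		return c, fmt.Errorf("credentials must include username and password")
	}
	return c, nil
}

func main() {
	c, err := parseCredentials([]byte(`{"username": "user@example.org", "password": "secret"}`))
	fmt.Println(c.Username, err)
}
```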