disk.grep searches the files on an Archil disk for lines matching a regular expression. Local tools like grep and ripgrep are fast, but they can only use the resources of a single machine — one host’s CPU, memory, and disk throughput — and they need the data on that machine to begin with. disk.grep runs next to your data and fans the listing and matching out across many ephemeral containers at once, so a search scales across many machines instead of being capped by one.
Search is available today for disks in our AWS regions (aws-us-east-1, aws-us-west-2, aws-eu-west-1).
disk.grep is the specialized primitive for searching. If you need to run an arbitrary command — transform files, run a script, build something — rather than match a pattern, use serverless execution (disk.exec) instead.
This is regex line-matching (grep -E), not indexed full-text search. There is no ranking, stemming, or relevance score: a line either matches the pattern or it doesn’t.
How it works
When you call disk.grep:
- List. Worker containers page through the directory’s contents. With recursive: true, subdirectories are walked breadth-first.
- Match. As soon as files are listed, grep workers start consuming them and matching your pattern. Listing and matching overlap, so the pipeline produces results before listing finishes.
- Aggregate. An aggregator collects matches from all workers and stops the pipeline once the search completes, hits maxResults, or hits the deadline.
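The three stages can be sketched as a small single-process model. This is only an illustration over an in-memory tree, not Archil's implementation: the real service runs many workers in parallel, and the names Tree, listDirectory, matchFile, grepSketch, and the stoppedReason labels used here are all hypothetical.

```typescript
// Single-process illustration only; this is not Archil's implementation.
type Tree = { [name: string]: string | Tree }; // string = file contents, object = directory
type Match = { path: string; line: number; text: string };
// Hypothetical labels; the real stoppedReason values are in the SDK and API reference.
type StoppedReason = "completed" | "maxResults" | "deadline";

interface SketchOptions {
  recursive: boolean;
  maxResults: number;
  maxDurationSeconds: number;
}

// List: return one directory's files and subdirectories.
function listDirectory(prefix: string, dir: Tree) {
  const files: Array<[string, string]> = [];
  const subdirs: Array<[string, Tree]> = [];
  for (const name of Object.keys(dir)) {
    const entry = dir[name];
    const path = prefix + "/" + name;
    if (typeof entry === "string") files.push([path, entry]);
    else subdirs.push([path, entry]);
  }
  return { files, subdirs };
}

// Match: test every line of one file against the pattern (use a non-global RegExp).
function matchFile(path: string, contents: string, pattern: RegExp): Match[] {
  const found: Match[] = [];
  contents.split("\n").forEach((text, index) => {
    if (pattern.test(text)) found.push({ path, line: index + 1, text });
  });
  return found;
}

// Aggregate: collect matches and stop on completion, maxResults, or the deadline.
function grepSketch(
  root: Tree,
  pattern: RegExp,
  opts: SketchOptions
): { matches: Match[]; stoppedReason: StoppedReason } {
  const deadline = Date.now() + opts.maxDurationSeconds * 1000;
  const matches: Match[] = [];
  const pending: Array<[string, Tree]> = [["", root]]; // breadth-first directory queue
  while (pending.length > 0) {
    const [prefix, dir] = pending.shift()!;
    const { files, subdirs } = listDirectory(prefix, dir);
    if (opts.recursive) pending.push(...subdirs);
    // Files are matched as soon as their directory is listed,
    // before deeper directories have been listed at all.
    for (const [path, contents] of files) {
      if (Date.now() > deadline) return { matches, stoppedReason: "deadline" };
      for (const match of matchFile(path, contents, pattern)) {
        matches.push(match);
        if (matches.length >= opts.maxResults) return { matches, stoppedReason: "maxResults" };
      }
    }
  }
  return { matches, stoppedReason: "completed" };
}

const sampleTree: Tree = {
  "a.log": "ok\nerror: disk full",
  logs: { "b.log": "error: timeout\nerror: retry", "c.txt": "fine" },
};
const demo = grepSketch(sampleTree, /error/, { recursive: true, maxResults: 2, maxDurationSeconds: 60 });
console.log(demo.stoppedReason, demo.matches.length); // prints: maxResults 2
```

In this model the early exit is deterministic because there is only one worker. With many parallel workers, which matches arrive before the cutoff depends on timing.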
| Knob | Controls |
|---|---|
| maxDurationSeconds | How long the search runs before it stops and returns whatever it has found so far. |
| concurrency | How many grep workers run in parallel. More workers scan a large dataset faster, at proportionally more compute. |
| maxResults | Stops the search once this many matches have been collected. |
Completeness
A search doesn’t always scan every file: it can stop early once it has collected maxResults matches or reached the maxDurationSeconds deadline. So a result isn’t necessarily exhaustive: check the stoppedReason on every response to know whether the search scanned everything or returned a partial answer.
This matters because when a search stops early, the matches it returns come from whichever workers reported first, not the lexicographically first N. If you need deterministic, complete results, give the search enough budget that it scans the whole directory before returning. See the TypeScript SDK or the API reference for the individual stoppedReason values.
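One way to act on stoppedReason is to retry with a larger budget until the search reports that it scanned everything. The sketch below is a minimal illustration under stated assumptions: GrepFn, GrepOptions, GrepResponse, grepUntilComplete, and the "completed" value are hypothetical stand-ins, and only the option and field names quoted above come from this page. Check the SDK for the real signature and stoppedReason values.

```typescript
// Hypothetical shapes. Only the names recursive, concurrency, maxResults,
// maxDurationSeconds, and stoppedReason appear in the documentation.
interface GrepOptions {
  recursive: boolean;
  concurrency: number;
  maxResults: number;
  maxDurationSeconds: number;
}
interface GrepResponse {
  matches: string[]; // assumed shape; real match objects may differ
  stoppedReason: string; // real values are listed in the SDK and API reference
}
// Stand-in for the SDK call; wrap the real disk.grep in a function of this shape.
type GrepFn = (pattern: string, options: GrepOptions) => Promise<GrepResponse>;

// Placeholder for "the search scanned everything"; substitute the documented value.
const COMPLETED_REASON = "completed";

// Re-run the search with a doubled result cap and deadline until it completes
// or maxAttempts is reached. Each attempt is a full new search.
async function grepUntilComplete(
  grep: GrepFn,
  pattern: string,
  initial: GrepOptions,
  maxAttempts = 3
): Promise<GrepResponse & { exhaustive: boolean }> {
  let options = { ...initial };
  let response = await grep(pattern, options);
  for (
    let attempt = 1;
    attempt < maxAttempts && response.stoppedReason !== COMPLETED_REASON;
    attempt++
  ) {
    options = {
      ...options,
      maxResults: options.maxResults * 2,
      maxDurationSeconds: options.maxDurationSeconds * 2,
    };
    response = await grep(pattern, options);
  }
  return { ...response, exhaustive: response.stoppedReason === COMPLETED_REASON };
}
```

Because every attempt runs a new search, retries add compute. If you already know the dataset is large, starting with a generous budget is likely cheaper than doubling up to it.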
Cost
Search runs on the same container runtime as disk.exec and bills the same way: on computeSecondsUsed, the summed execution time across every container it dispatched (1ms increments, 100ms minimum per container). A higher concurrency finishes faster but dispatches more containers, so wall-clock latency and compute cost trade off against each other. See the pricing page for current rates.
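The billing rule can be worked through with a little arithmetic. The sketch below assumes that each container's time is rounded up to the next millisecond (the page states 1ms increments but not the rounding direction) and that work splits perfectly evenly with no startup overhead. The function names billedMs and evenSplit are hypothetical.

```typescript
const MIN_BILLED_MS = 100; // per-container minimum, from the text above

// Sum billed time across containers: round each up to 1ms (assumed direction),
// then apply the 100ms per-container minimum.
function billedMs(containerDurationsMs: number[]): number {
  let total = 0;
  for (const ms of containerDurationsMs) {
    total += Math.max(MIN_BILLED_MS, Math.ceil(ms));
  }
  return total;
}

// Idealized model: total work divides evenly across containers, zero overhead.
function evenSplit(totalWorkMs: number, containers: number): number[] {
  const durations: number[] = [];
  for (let i = 0; i < containers; i++) durations.push(totalWorkMs / containers);
  return durations;
}

// 3,200ms of matching work on 4 containers: 4 x 800ms = 3,200ms billed.
const fewContainers = billedMs(evenSplit(3200, 4));
// The same work on 64 containers: each runs 50ms but bills the 100ms minimum,
// so 64 x 100ms = 6,400ms billed.
const manyContainers = billedMs(evenSplit(3200, 64));

console.log(fewContainers / 1000, manyContainers / 1000); // prints: 3.2 6.4
```

Under these assumptions, the 64-container run finishes in a fraction of the wall-clock time but bills twice the compute, because each short-lived container is charged the minimum.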
Next steps
- TypeScript SDK: searching files — the disk.grep API surface
- API reference: Search Files — the raw HTTP endpoint
- Serverless execution — run arbitrary commands when you need more than search