Search Files

disk.grep searches the files on an Archil disk for lines matching a regular expression. Local tools like grep and ripgrep are fast, but they can only use the resources of a single machine — one host’s CPU, memory, and disk throughput — and they need the data on that machine to begin with. disk.grep runs next to your data and fans the listing and matching out across many ephemeral containers at once, so a search scales across many machines instead of being capped by one. Search is available today for disks in our AWS regions (aws-us-east-1, aws-us-west-2, aws-eu-west-1).

// TypeScript
import * as archil from "disk";

const d = await archil.getDisk("dsk-abc123");

const { matches, stoppedReason, durationMs } = await d.grep({
  directory: "logs",
  pattern: "ERROR|FATAL",
  recursive: true,
});

for (const m of matches) {
  console.log(`${m.file}:${m.line}: ${m.text}`);
}
console.log(`${matches.length} matches in ${durationMs}ms (${stoppedReason})`);

# Python
import archil

d = archil.get_disk("dsk-abc123")

res = d.grep(directory="logs", pattern="ERROR|FATAL", recursive=True)

for m in res.matches:
    print(f"{m.file}:{m.line}: {m.text}")
print(f"{len(res.matches)} matches in {res.duration_ms}ms ({res.stopped_reason})")

disk.grep is the specialized primitive for searching. If you need to run an arbitrary command — transform files, run a script, build something — rather than match a pattern, use serverless execution (disk.exec) instead.

This is regex line-matching (grep -E), not indexed full-text search. There is no ranking, stemming, or relevance score — a line either matches the pattern or it doesn’t.
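To make the line-matching semantics concrete, here is a minimal local sketch in Python. It is not the Archil implementation: it uses the standard `re` module, whose syntax is close to but not identical to `grep -E`, and the function name `grep_lines`, the 1-based line numbers, and the sample log are all assumptions made for illustration.

```python
import re

def grep_lines(text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that contains a match.

    Hypothetical helper: a line is either included or not. There is no
    score, ranking, or stemming, and matching is case-sensitive.
    """
    rx = re.compile(pattern)
    return [
        (n, line)
        for n, line in enumerate(text.splitlines(), start=1)
        if rx.search(line)
    ]

sample = "INFO boot ok\nERROR disk full\nWARN slow\nFATAL crash\n"
hits = grep_lines(sample, "ERROR|FATAL")
for n, line in hits:
    print(f"{n}: {line}")
```

Because the match is purely textual, a query such as `error` returns nothing for this sample: the lines contain `ERROR`, and nothing normalizes case or word forms.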

How it works

When you call disk.grep:

1. List. Worker containers page through the directory’s contents. With recursive: true, subdirectories are walked breadth-first.
2. Match. As soon as files are listed, grep workers start consuming them and matching your pattern. Listing and matching overlap, so the pipeline produces results before listing finishes.
3. Aggregate. An aggregator collects matches from all workers and stops the pipeline once the search completes, hits maxResults, or hits the deadline.
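The three stages above can be sketched locally with threads and queues. This is a toy model under stated assumptions, not how Archil runs the search: threads stand in for containers, an in-memory dict stands in for the disk, the deadline is left out for brevity, and the names `lister`, `matcher`, `search`, and the reason strings `"completed"` and `"max_results"` are hypothetical.

```python
import queue
import re
import threading

DONE = object()  # sentinel marking the end of a stream

def lister(files, files_q, n_workers):
    # Emit shallower paths first to mimic a breadth-first walk.
    for path in sorted(files, key=lambda p: (p.count("/"), p)):
        files_q.put(path)
    for _ in range(n_workers):
        files_q.put(DONE)  # one end marker per matcher

def matcher(files, rx, files_q, matches_q, stop):
    # Consume paths as soon as they are listed; listing may still be running.
    while (path := files_q.get()) is not DONE:
        if stop.is_set():
            continue  # keep draining the queue, but skip the work
        for n, line in enumerate(files[path].splitlines(), start=1):
            if rx.search(line):
                matches_q.put((path, n, line))
    matches_q.put(DONE)

def search(files, pattern, concurrency=2, max_results=None):
    rx = re.compile(pattern)
    files_q, matches_q = queue.Queue(), queue.Queue()
    stop = threading.Event()
    threads = [threading.Thread(target=lister, args=(files, files_q, concurrency))]
    threads += [
        threading.Thread(target=matcher, args=(files, rx, files_q, matches_q, stop))
        for _ in range(concurrency)
    ]
    for t in threads:
        t.start()

    # Aggregator: collect matches until every matcher is done or the cap is hit.
    matches, finished, reason = [], 0, "completed"
    while finished < concurrency:
        item = matches_q.get()
        if item is DONE:
            finished += 1
        elif not stop.is_set():
            matches.append(item)
            if max_results is not None and len(matches) >= max_results:
                reason = "max_results"
                stop.set()  # tell matchers to stop doing work
    for t in threads:
        t.join()
    return matches, reason

files = {
    "logs/a.log": "ok\nERROR one\n",
    "logs/b.log": "FATAL two\nok\n",
    "logs/app/c.log": "ERROR three\n",
}
full, full_reason = search(files, "ERROR|FATAL", concurrency=3)
partial, partial_reason = search(files, "ERROR|FATAL", concurrency=3, max_results=2)
print(len(full), full_reason, len(partial), partial_reason)
```

In this model the order of `full` depends on thread scheduling, which is why the capped run can return any two of the three matches.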

Three knobs trade cost off against latency:

| Knob | Controls |
| --- | --- |
| `maxDurationSeconds` | How long the search runs before it stops and returns whatever it has found so far. |
| `concurrency` | How many grep workers run in parallel. More workers scan a large dataset faster, at proportionally more compute. |
| `maxResults` | Stops the search once this many matches have been collected. |

See the API reference for defaults and limits.

Completeness

A search doesn’t always scan every file: it can stop early once it has collected maxResults matches or reached the maxDurationSeconds deadline. A result therefore isn’t necessarily exhaustive, so check the stoppedReason on every response to know whether the search scanned everything or returned a partial answer. This matters because when a search stops early, the matches it returns are a sample from whichever workers reported first, not the lexicographically first N. If you need deterministic, complete results, set maxResults and maxDurationSeconds high enough that the search scans the whole directory before returning. See the TypeScript SDK, the Python SDK, or the API reference for the individual stoppedReason values.
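The difference between "first N to arrive" and "lexicographically first N" can be illustrated with a small simulation. This is only a sketch: a seeded shuffle stands in for worker timing, and `first_n_by_arrival` and the sample paths are hypothetical names, not part of any SDK.

```python
import random

def first_n_by_arrival(matches, n, seed):
    """Simulate an early stop: keep the first n matches in arrival order.

    The seeded shuffle is a stand-in for which workers happen to
    report first on a given run.
    """
    arrival = list(matches)
    random.Random(seed).shuffle(arrival)
    return arrival[:n]

all_matches = [f"logs/{name}.log:1" for name in "abcdef"]

# What a caller might expect from a capped search...
lexicographic = sorted(all_matches)[:3]

# ...versus what two capped runs with different timing could return.
run_1 = first_n_by_arrival(all_matches, 3, seed=1)
run_2 = first_n_by_arrival(all_matches, 3, seed=2)
print(lexicographic, run_1, run_2)

# With enough budget to collect everything, the set of matches
# no longer depends on arrival order.
complete_1 = first_n_by_arrival(all_matches, len(all_matches), seed=1)
complete_2 = first_n_by_arrival(all_matches, len(all_matches), seed=2)
```

Only the complete runs are guaranteed to agree as sets; a capped run is some subset of size N, with no promise about which one.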

Cost

Search runs on the same container runtime as disk.exec and bills the same way: on computeSecondsUsed, the summed execution time across every container it dispatched (1ms increments, 100ms minimum per container). A higher concurrency finishes faster but dispatches more containers, so wall-clock latency and compute cost trade off against each other. See the pricing page for current rates.
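As a worked example of the billing arithmetic, the sketch below sums per-container time with the 100ms minimum and 1ms granularity described above. Rounding partial milliseconds up is an assumption (the text does not say which way increments round), the function names are hypothetical, and rates are deliberately left out.

```python
import math

MIN_BILLED_MS = 100  # per-container minimum, from the text above

def billed_ms(run_ms: float) -> int:
    """Billable milliseconds for one container.

    Assumption: partial milliseconds round up to the next 1ms increment.
    """
    return max(MIN_BILLED_MS, math.ceil(run_ms))

def compute_seconds_used(container_runs_ms) -> float:
    """Sum billable time across every dispatched container, in seconds."""
    return sum(billed_ms(ms) for ms in container_runs_ms) / 1000

# Four containers with different run times (illustrative numbers):
# 2501 + 100 + 100 + 1000 = 3701 ms
mixed = compute_seconds_used([2500.2, 40, 100, 999.5])

# The minimum matters for very short work: one 240ms container
# versus the same work split across eight 30ms containers.
one_container = compute_seconds_used([240])
eight_containers = compute_seconds_used([30] * 8)
print(mixed, one_container, eight_containers)
```

Under these assumptions, splitting a small job across many containers finishes sooner in wall-clock terms but bills more, because each container is charged at least 100ms.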

Next steps

TypeScript SDK: searching files — the disk.grep API surface
Python SDK: searching files — the Disk.grep API surface
API reference: Search Files — the raw HTTP endpoint
Serverless execution — run arbitrary commands when you need more than search
