# Search Files

> Search files on a disk with a regular expression, parallelized across many machines.

`disk.grep` searches the files on an Archil disk for lines matching a regular expression. Local tools like `grep` and `ripgrep` are fast, but they are limited to a single host's CPU, memory, and disk throughput, and they need the data on that machine to begin with. `disk.grep` runs next to your data and fans the listing and matching out across many ephemeral containers at once, so a search scales across many machines instead of being capped by one.

Search is available today for disks in our AWS regions (`aws-us-east-1`, `aws-us-west-2`, `aws-eu-west-1`).

<CodeGroup>
  ```typescript TypeScript
  import * as archil from "disk";

  const d = await archil.getDisk("dsk-abc123");

  const { matches, stoppedReason, durationMs } = await d.grep({
    directory: "logs",
    pattern: "ERROR|FATAL",
    recursive: true,
  });

  for (const m of matches) {
    console.log(`${m.file}:${m.line}: ${m.text}`);
  }
  console.log(`${matches.length} matches in ${durationMs}ms (${stoppedReason})`);
  ```

  ```python Python
  import archil

  d = archil.get_disk("dsk-abc123")

  res = d.grep(directory="logs", pattern="ERROR|FATAL", recursive=True)

  for m in res.matches:
      print(f"{m.file}:{m.line}: {m.text}")
  print(f"{len(res.matches)} matches in {res.duration_ms}ms ({res.stopped_reason})")
  ```
</CodeGroup>

`disk.grep` is the specialized primitive for searching. If you need to run an arbitrary command — transform files, run a script, build something — rather than match a pattern, use [serverless execution](/compute/serverless-execution) (`disk.exec`) instead.

<Note>This is regex line-matching (`grep -E`), not indexed full-text search. There is no ranking, stemming, or relevance score — a line either matches the pattern or it doesn't.</Note>
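To make the matching model concrete, here is a minimal local sketch of line matching: split text into lines and keep each line where the pattern matches anywhere. It is an illustration, not the service's implementation. `grep_lines` is a hypothetical helper, the 1-based line numbering is an assumption, and Python's `re` dialect differs from `grep -E` in places, so only the line-by-line behavior carries over.

```python
import re


def grep_lines(text: str, pattern: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that contains a match."""
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)  # a line matches or it does not; no scoring
    ]


sample = "INFO boot ok\nERROR disk full\nWARN retrying\nFATAL out of memory\n"
for number, line in grep_lines(sample, "ERROR|FATAL"):
    print(f"{number}: {line}")
# prints:
# 2: ERROR disk full
# 4: FATAL out of memory
```

Results come back in whatever order the lines were scanned. There is nothing to sort by, because no match is "better" than another.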

## How it works

When you call `disk.grep`:

1. **List.** Worker containers page through the directory's contents. With `recursive: true`, subdirectories are walked breadth-first.
2. **Match.** As soon as files are listed, grep workers start consuming them and matching your pattern — listing and matching overlap, so the pipeline produces results before listing finishes.
3. **Aggregate.** An aggregator collects matches from all workers and stops the pipeline once the search completes, hits `maxResults`, or hits the deadline.

Three knobs trade cost off against latency:

| Knob                 | Controls                                                                                                          |
| -------------------- | ----------------------------------------------------------------------------------------------------------------- |
| `maxDurationSeconds` | How long the search runs before it stops and returns whatever it has found so far.                                |
| `concurrency`        | How many grep workers run in parallel. More workers scan a large dataset faster — at proportionally more compute. |
| `maxResults`         | Stops the search once this many matches have been collected.                                                      |

See the [API reference](/api-reference/serverless-execution/grep-disk) for defaults and limits.
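The list, match, and aggregate stages and the three knobs can be pictured with a single-machine analogy. The sketch below uses threads and a queue where the service uses containers. It is not Archil's implementation, and every name in it (`local_grep`, its parameters, the boolean `hit_limit`) is hypothetical. It shows listing and matching overlapping, and the pipeline stopping at a result cap or a deadline.

```python
import os
import queue
import re
import tempfile
import threading
import time
from collections import deque

_DONE = object()  # sentinel telling a matcher that listing has finished


def local_grep(directory, pattern, *, recursive=True, concurrency=4,
               max_results=1000, max_duration_seconds=5.0):
    """Local analogy of list -> match -> aggregate; returns (matches, hit_limit)."""
    regex = re.compile(pattern)
    files = queue.Queue()          # listed paths flow from the lister to matchers
    matches, lock = [], threading.Lock()
    stop = threading.Event()       # set once max_results or the deadline is hit
    deadline = time.monotonic() + max_duration_seconds

    def lister():
        pending = deque([directory])   # FIFO order gives a breadth-first walk
        try:
            while pending and not stop.is_set():
                with os.scandir(pending.popleft()) as entries:
                    for entry in entries:
                        if entry.is_file():
                            files.put(entry.path)   # matchers can start right away
                        elif recursive and entry.is_dir(follow_symlinks=False):
                            pending.append(entry.path)
        finally:
            for _ in range(concurrency):
                files.put(_DONE)       # always release every matcher

    def matcher():
        while (path := files.get()) is not _DONE:
            if stop.is_set():
                continue               # drain the queue without doing more work
            try:
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if time.monotonic() > deadline:
                            stop.set()
                        if stop.is_set():
                            break
                        if regex.search(line):
                            with lock:  # the shared list plays the aggregator
                                if len(matches) < max_results:
                                    matches.append((path, number, line.rstrip("\n")))
                                if len(matches) >= max_results:
                                    stop.set()
            except OSError:
                continue               # unreadable file: skip it

    threads = [threading.Thread(target=lister)]
    threads += [threading.Thread(target=matcher) for _ in range(concurrency)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return matches, stop.is_set()


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "logs", "app"))
    with open(os.path.join(root, "logs", "a.log"), "w") as f:
        f.write("INFO ok\nERROR disk full\n")
    with open(os.path.join(root, "logs", "app", "b.log"), "w") as f:
        f.write("FATAL out of memory\nINFO ok\n")
    found, hit_limit = local_grep(os.path.join(root, "logs"), "ERROR|FATAL")
    summary = sorted((os.path.basename(p), n, t) for p, n, t in found)
    print(summary, hit_limit)
```

The demo sorts its results before printing because matchers append in whatever order they finish. The same property explains why an early-stopped search returns an arbitrary subset rather than the first matches in path order.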

## Completeness

A search doesn't always scan every file — it can stop early once it has collected `maxResults` matches or reached the `maxDurationSeconds` deadline. So a result isn't necessarily exhaustive: check the **`stoppedReason`** on every response to know whether the search scanned everything or returned a partial answer.

This matters because when a search stops early, the matches it returns are **a sample of whichever workers reported first, not the lexicographically first N**, so repeating the same search can return a different subset. If you need deterministic, complete results, give the search enough budget that it scans the whole directory before returning. See the [TypeScript SDK](/sdks/typescript#searching-files), the [Python SDK](/sdks/python#searching-files), or the [API reference](/api-reference/serverless-execution/grep-disk) for the individual `stoppedReason` values.
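One way to act on `stoppedReason` is to retry with a larger deadline until the response reports a full scan. The sketch below assumes you wrap your own `grep` call in a function that takes a deadline and returns `(matches, stopped_reason)`. The helper `search_until_complete` is hypothetical, and the reason strings in the demo are placeholders, not real `stoppedReason` values. Take the real ones from the SDK or API reference.

```python
from typing import Callable

SearchResult = tuple[list, str]  # (matches, stopped_reason)


def search_until_complete(
    run_search: Callable[[int], SearchResult],
    complete_reasons: set[str],
    budgets_seconds: list[int],
) -> tuple[list, str, bool]:
    """Retry with growing deadlines until the stop reason signals a full scan."""
    matches, reason = [], ""
    for budget in budgets_seconds:
        matches, reason = run_search(budget)
        if reason in complete_reasons:
            return matches, reason, True   # exhaustive result
    return matches, reason, False          # still partial after the last budget


# Stand-in for a real search: "finishes" only when given at least 60 seconds.
# The reason strings are placeholders, NOT real stoppedReason values.
def fake_search(budget_seconds: int) -> SearchResult:
    if budget_seconds >= 60:
        return ["a.log:2", "b.log:1"], "placeholder-complete"
    return ["b.log:1"], "placeholder-deadline"


matches, reason, exhaustive = search_until_complete(
    fake_search, {"placeholder-complete"}, [10, 60, 300]
)
print(matches, reason, exhaustive)
```

Each attempt is a separate search and is billed as one, so starting from a realistic deadline is usually cheaper than stepping up from a very small one.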

## Cost

Search runs on the same container runtime as `disk.exec` and bills the same way: by `computeSecondsUsed`, the execution time summed across every container the search dispatched, metered in 1 ms increments with a 100 ms minimum per container. A higher `concurrency` finishes faster but dispatches more containers, so wall-clock latency and compute cost trade off against each other. See the [pricing page](https://archil.com/pricing) for current rates.
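The metering rule can be worked through with a small calculation. This sketch assumes that each container's time is rounded up to the next millisecond (the rounding direction is not stated here) and then raised to the 100 ms minimum. `billed_ms` and the durations are hypothetical, chosen only to show how the per-container minimum behaves.

```python
import math

MIN_BILLED_MS = 100  # per-container minimum from the metering rule


def billed_ms(container_durations_ms: list[float]) -> int:
    """Sum billed time: round each container up to 1 ms, floor it at 100 ms."""
    return sum(max(MIN_BILLED_MS, math.ceil(d)) for d in container_durations_ms)


# Three containers: 40 ms bills as 100, 250.2 ms as 251, 1000 ms as 1000.
mixed = billed_ms([40, 250.2, 1000])

# Hypothetical split of the same 8 s of work across 4 vs. 400 containers.
few = billed_ms([2000.0] * 4)
many = billed_ms([20.0] * 400)   # every 20 ms container bills as 100 ms

print(mixed, few / 1000, many / 1000)
# prints: 1351 8.0 40.0
```

Under these assumptions, spreading work so thinly that containers finish in under 100 ms raises the bill without a matching drop in latency.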

## Next steps

* [TypeScript SDK: searching files](/sdks/typescript#searching-files) — the `disk.grep` API surface
* [Python SDK: searching files](/sdks/python#searching-files) — the `Disk.grep` API surface
* [API reference: Search Files](/api-reference/serverless-execution/grep-disk) — the raw HTTP endpoint
* [Serverless execution](/compute/serverless-execution) — run arbitrary commands when you need more than search
