# Data Sources

> Archil disks automatically synchronize data from one or more data sources

An Archil disk can optionally synchronize data with one or more data sources. **Data sources are not
required** — if you create a disk without one, Archil automatically tiers its data to storage we manage
for you, and you get a fully elastic file system with no bucket to set up. Add a data source only when
you want your disk to read from and write back to object storage you already own.

When you attach a data source, you choose which data sources to attach and the path on disk where each
one should appear. For example, suppose you have an Amazon S3 bucket named `model-data` that you want to
make available in the `/models` directory of your Archil disk. When you create the disk, select
"Add a Data Source", choose "Amazon S3", and enter `model-data` as the bucket name and `/models` as the
disk path. Any data in the `model-data` bucket is then accessible at `/models` on your Archil disk, and
any changes made to that data on disk are synchronized back to the Amazon S3 bucket.
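
The relationship between bucket keys and disk paths in this example can be pictured with a small sketch. The helper name `disk_path_for_key` and the mount point `/mnt/archil` are hypothetical, and the one-to-one key-to-path mapping is an assumption made for illustration; this is path arithmetic only, not an Archil API.

```python
from pathlib import PurePosixPath


def disk_path_for_key(key: str, disk_path: str = "/models",
                      mount_point: str = "/mnt/archil") -> str:
    """Return the local path where an object key from the bucket would appear.

    Assumptions (not taken from the Archil docs): the disk is mounted at
    `mount_point`, and keys map one-to-one onto paths under `disk_path`.
    """
    # Strip leading slashes so the disk path and key join as relative parts.
    relative = PurePosixPath(disk_path.lstrip("/")) / key.lstrip("/")
    return str(PurePosixPath(mount_point) / relative)


# An object stored at s3://model-data/llama/weights.bin
print(disk_path_for_key("llama/weights.bin"))
# prints /mnt/archil/models/llama/weights.bin
```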

<Note>
  Disks that support [branches and checkpoints](/concepts/branches-and-checkpoints) do not synchronize to a data
  source — a disk can have data sources or support branches and checkpoints, but not both.
</Note>

When you connect to non-public data sources, you'll need to configure credentials that allow Archil to
access the data source. For read-only access, Archil needs permission to read and list objects; for
read-write access, it also needs permission to write and delete objects.
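
For an Amazon S3 bucket, these permissions correspond roughly to the IAM actions `s3:GetObject` and `s3:ListBucket` (read and list) plus `s3:PutObject` and `s3:DeleteObject` (write and delete). The sketch below builds such a policy document as a plain dictionary. The function name `s3_access_policy` is hypothetical, and the exact set of actions Archil requires is an assumption here; the S3 data source page is the authoritative reference.

```python
import json


def s3_access_policy(bucket: str, read_write: bool = False) -> dict:
    """Build an IAM policy document granting object access to one bucket.

    Assumption: this action list is a plausible mapping of "read, list,
    write, delete" onto S3 IAM actions, not Archil's documented requirement.
    """
    object_actions = ["s3:GetObject"]
    if read_write:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": f"arn:aws:s3:::{bucket}"},
            # Object-level actions apply to the keys inside the bucket.
            {"Effect": "Allow", "Action": object_actions,
             "Resource": f"arn:aws:s3:::{bucket}/*"},
        ],
    }


print(json.dumps(s3_access_policy("model-data", read_write=True), indent=2))
```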

## Supported data sources

Archil supports a wide variety of data sources out of the box. Each data source has specific configuration requirements and credential setup procedures.

For detailed configuration instructions for each supported data source, see the dedicated pages:

* [**S3 Object Storage**](/data-sources/s3-object-storage) - Configure Amazon S3, Google Cloud Storage, Cloudflare R2, and other S3-compatible providers

## Synchronization details

Archil is designed to provide eventually consistent, high-performance synchronization between the disk and its data sources. For clients
connected directly to the Archil disk, Archil is always strongly read-after-write consistent; see the [consistency details](/details/consistency).

Archil automatically attempts to combine multiple writes into a single PutObject call to the data source. While Archil combines the
writes, it can take tens of seconds after an application stops writing to a file for the data source to reflect the new data.
There is no hard upper bound on this lag for a disk that mirrors its data source in native format: it can grow with workload
complexity, and directory renames in particular generate a large amount of background work as the underlying objects are rewritten.
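
Because this lag is variable, code that reads directly from the data source after writing through the disk may need to poll instead of assuming the object is already there. The following is a minimal sketch of such a polling loop; `wait_until` and its parameters are hypothetical names, and the timeout is a caller-chosen limit, not an Archil guarantee.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float = 120.0,
               interval_s: float = 5.0) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate succeeded, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With boto3, the predicate could wrap `head_object(Bucket=..., Key=...)` and return `False` when it raises `botocore.exceptions.ClientError`. Since the lag has no hard upper bound, treat a `False` result as "not yet visible" and not as a failed write.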

Writes committed to the disk are not at risk during this lag. The Archil disk durably stores all committed writes redundantly across
multiple Availability Zones, so the failure of an individual node — or an entire Availability Zone — does not lose data that
has not yet reached the data source. See [Data durability](/details/architecture#data-durability) for details.

Synchronization is bidirectional. Because Archil automatically caches active data from S3 onto high-performance SSD storage, an
object written directly to the bucket can take minutes to appear in the corresponding directory on disk.

## Concurrent writes

While a data source is attached to an Archil disk, Archil supports concurrent writes to both the disk and
the underlying data source. However, writes made directly to the data source must target different paths from
those being actively written from the disk. Writing directly to a file or directory that is being actively
changed from the disk may result in undefined behavior, including data loss or corruption.
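
One way to respect this rule is to check a key against the paths your application is currently writing through the disk before writing it directly to the bucket. The sketch below assumes your application tracks those active paths itself; `conflicts_with_active_paths` is a hypothetical helper, not something Archil provides.

```python
from pathlib import PurePosixPath
from typing import Iterable


def conflicts_with_active_paths(key: str, active_paths: Iterable[str]) -> bool:
    """Return True if `key` equals, lies inside, or contains an active path."""
    target = PurePosixPath("/" + key.strip("/"))
    for active in active_paths:
        other = PurePosixPath("/" + active.strip("/"))
        # Same path, target under an active directory, or target is a
        # directory that contains an active path.
        if target == other or other in target.parents or target in other.parents:
            return True
    return False


active = ["models/run1"]  # paths being written from the disk (app-tracked)
print(conflicts_with_active_paths("models/run1/ckpt.bin", active))  # True
print(conflicts_with_active_paths("models/run2/ckpt.bin", active))  # False
```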

## Atomicity

Some data sources, such as Amazon S3, do not offer an atomic equivalent for every POSIX file operation (for example,
directory renames). When you run such an operation against an Archil disk, a reader accessing the underlying data source
directly may see partial results while it is in progress. Reads from the Archil disk itself are always strongly consistent.
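
To see why partial results can appear, consider a toy model in which a directory rename on an object store is carried out as a copy and delete for each object. The sketch below uses an in-memory dictionary as a stand-in for a bucket; it is an illustration of the general idea under that assumption, not Archil's actual synchronization logic, and `rename_prefix` is a hypothetical name.

```python
from typing import Dict, Iterator, List


def rename_prefix(bucket: Dict[str, bytes], src: str, dst: str) -> Iterator[List[str]]:
    """Rename every key under `src` to `dst`, one object at a time.

    Yields the sorted key listing after each object moves, which is what a
    direct reader of the bucket could observe while the rename is in flight.
    """
    # Materialize the key list first so the dict can be mutated in the loop.
    for key in sorted(k for k in bucket if k.startswith(src)):
        new_key = dst + key[len(src):]
        bucket[new_key] = bucket[key]  # copy to the new key
        del bucket[key]                # delete the old key
        yield sorted(bucket)


bucket = {"old/a.txt": b"a", "old/b.txt": b"b"}
for listing in rename_prefix(bucket, "old/", "new/"):
    print(listing)
# ['new/a.txt', 'old/b.txt']   <- partial result: both prefixes visible
# ['new/a.txt', 'new/b.txt']
```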

## If the data source is unreachable

If the configured data source becomes unreachable while a disk is mounted, writes are not lost: Archil buffers them
durably in the disk's high-speed cache and flushes them to the data source once it is reachable again. This buffering
can continue indefinitely. Reads served from cache are unaffected; a read that misses the cache and requires the data
source blocks until it can be satisfied.

## POSIX metadata in the data source

When Archil synchronizes a disk to a data source, file contents are written as ordinary objects at the matching key.
POSIX metadata is represented as follows:

* **Mode bits, ownership, and extended attributes** are preserved by Archil, but are kept in Archil's internal system
  rather than written as object metadata — they are not visible by reading the bucket directly.
* **Symbolic links** are stored as a small text object containing the path the link points to.
* **Hard links** are stored as an object containing a full copy of the linked data.

## Deleting a disk

Deleting an Archil disk does **not** delete the data in its data source. An Archil disk is a synchronized
view over the bucket you own — when you delete the disk, Archil stops synchronizing and discards its cached
copy, but the objects already written to your data source remain in your bucket, under your control.
