Data Sources

Each Archil disk is configured to synchronize data from one or more data sources. When you create a new Archil disk, you choose which data sources to attach and the location on the disk where each data source should be available. For example, suppose you have an Amazon S3 bucket named “model-data” that you want to make available in the “/models” directory of your Archil disk. When you create the disk, click “Add a Data Source”, select “Amazon S3”, and enter “model-data” as the bucket name and “/models” as the disk path. Any data in the “model-data” bucket is then accessible at “/models” on your Archil disk, and any changes you make to that data on the disk are synchronized back to the bucket.

When you connect to non-public data sources, you need to configure credentials that allow Archil to access them. For read-only access, Archil needs permission to read and list objects; for read-write access, it also needs permission to write and delete objects.
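To make the mapping between a bucket and its disk path concrete, here is a minimal Python sketch. It assumes that an object's key is simply its path relative to the mount point, so “/models/llama/weights.bin” corresponds to the key “llama/weights.bin” in “model-data”. DataSourceMount and resolve are hypothetical names for illustration, not part of any Archil API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class DataSourceMount:
    """Hypothetical record: a bucket and the disk path where it appears."""
    bucket: str
    disk_path: str  # absolute path on the Archil disk, e.g. "/models"


def resolve(mounts: list[DataSourceMount], path: str) -> Optional[tuple[str, str]]:
    """Map an on-disk path to (bucket, object key).

    Assumes the object key is the path relative to the mount point.
    Returns None when no data source is mounted at or above the path.
    """
    p = PurePosixPath(path)
    for mount in mounts:
        root = PurePosixPath(mount.disk_path)
        # Compare whole path components, so "/modelsX" is not treated as inside "/models".
        if p == root or root in p.parents:
            key = p.relative_to(root).as_posix()
            return mount.bucket, ("" if key == "." else key)
    return None


mounts = [DataSourceMount(bucket="model-data", disk_path="/models")]
print(resolve(mounts, "/models/llama/weights.bin"))  # ('model-data', 'llama/weights.bin')
```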

Supported data sources

Archil supports a wide variety of data sources out of the box, including:

Amazon S3

Amazon S3 supports authorization either by configuring the bucket's IAM resource policy or by using static AWS credentials from an IAM user that can access the bucket.

Log in to the Amazon S3 console
Browse to the bucket you want to attach
Update the bucket policy to allow Archil to access the bucket. The Archil console provides the exact policy to add, which looks something like this:

{
    "Version": "2012-10-17", 
    "Statement": [
        {
            "Sid": "AllowArchilAccess",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::123456789:role/archil-s3.prod.us-east-1"
            },
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::YOUR-BUCKET-NAME",
                "arn:aws:s3:::YOUR-BUCKET-NAME/*"
            ]
        }
    ]
}
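The example policy grants s3:*, which is broader than the read, list, write, and delete permissions described above. If you prefer a narrower policy, the following Python sketch builds one. The mapping of those permissions to the IAM actions s3:ListBucket, s3:GetObject, s3:PutObject, and s3:DeleteObject is an assumption, as is the principal ARN (copied from the example). Archil may need additional actions (for example, for multipart uploads), so treat the policy shown in the Archil console as authoritative.

```python
import json


def archil_bucket_policy(bucket: str, principal_arn: str, read_write: bool) -> dict:
    """Build a narrower bucket policy than "s3:*" (a sketch; the action list is assumed)."""
    principal = {"AWS": principal_arn}
    # Listing is a bucket-level action, so it targets the bucket ARN itself.
    list_statement = {
        "Sid": "AllowArchilList",
        "Effect": "Allow",
        "Principal": principal,
        "Action": ["s3:ListBucket"],
        "Resource": f"arn:aws:s3:::{bucket}",
    }
    # Object reads (and, for read-write disks, writes and deletes) target keys in the bucket.
    object_actions = ["s3:GetObject"]
    if read_write:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    object_statement = {
        "Sid": "AllowArchilObjects",
        "Effect": "Allow",
        "Principal": principal,
        "Action": object_actions,
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }
    return {"Version": "2012-10-17", "Statement": [list_statement, object_statement]}


policy = archil_bucket_policy(
    "YOUR-BUCKET-NAME",
    "arn:aws:iam::123456789:role/archil-s3.prod.us-east-1",  # placeholder from the example policy
    read_write=True,
)
print(json.dumps(policy, indent=4))
```

Splitting the policy into two statements keeps the bucket-level list permission separate from the object-level permissions, which is how S3 scopes these actions.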

Google Cloud Storage

Google Cloud Storage supports authorization through static AWS-compatible HMAC credentials.

Log in to your Google Cloud Storage console
Click Settings
Click Interoperability
Under Service account HMAC, click Create a key for another service account
Grant the Cloud Storage - Storage Object Admin role to the new service account
Record the Access key and Secret provided for the new service account HMAC key

Cloudflare R2

Cloudflare R2 supports authorization through the static, AWS-compatible credentials that R2 provides.

Log in to the Cloudflare console
Browse to R2
Click Manage R2 API Tokens
Create a new token with Object Read & Write permissions
Record the Access Key ID, Secret Access Key, and default endpoint

Generic S3-compatible storage

Many other providers and clouds offer storage that is API-compatible with S3. Archil can use these services as data sources if you provide the API endpoint and the static, AWS-compatible credentials required to access the data.
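For a generic S3-compatible provider, you supply a bucket, an API endpoint, and static credentials. The Python sketch below bundles these values and sanity-checks them before use. The class name and the HTTPS-only check are assumptions for illustration, not documented Archil requirements. The endpoints shown are the standard S3-compatible endpoints for Google Cloud Storage (https://storage.googleapis.com) and Cloudflare R2 (https://<ACCOUNT_ID>.r2.cloudflarestorage.com); always prefer the endpoint your provider gives you.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Well-known S3-compatible endpoints (confirm against your provider's documentation).
GCS_ENDPOINT = "https://storage.googleapis.com"


def r2_endpoint(account_id: str) -> str:
    """Cloudflare R2's default S3 API endpoint for an account."""
    return f"https://{account_id}.r2.cloudflarestorage.com"


@dataclass(frozen=True)
class S3CompatibleSource:
    """Hypothetical bundle of the values a generic S3-compatible data source needs."""
    bucket: str
    endpoint: str
    access_key_id: str
    secret_access_key: str = field(repr=False)  # keep the secret out of logs and tracebacks

    def __post_init__(self) -> None:
        url = urlparse(self.endpoint)
        # Conservative assumption for this sketch: only accept HTTPS endpoints.
        if url.scheme != "https" or not url.netloc:
            raise ValueError(f"expected an https:// endpoint, got {self.endpoint!r}")
        if not (self.bucket and self.access_key_id and self.secret_access_key):
            raise ValueError("bucket name and static credentials are required")


source = S3CompatibleSource(
    bucket="model-data",
    endpoint=r2_endpoint("YOUR-ACCOUNT-ID"),
    access_key_id="YOUR-ACCESS-KEY-ID",
    secret_access_key="YOUR-SECRET-ACCESS-KEY",
)
print(source)
```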

Synchronization details

Archil is designed to provide eventually consistent, high-performance synchronization with data sources. Archil automatically tries to combine multiple writes into a single PutObject call to the data source, so after an application stops writing to a file, it can take tens of seconds for the data source to reflect the new data. In the other direction, because Archil caches active data from the data source on high-performance SSD storage, it can take minutes for an object written directly to the data source to appear in the corresponding directory on the disk.

Concurrent writes

While a data source is attached to an Archil disk, Archil supports concurrent writes to both the disk and the underlying data source. However, writes made directly to the data source must target different paths from those being actively written through the disk. Writing to the same file or directory that is being actively changed through the disk results in undefined behavior, which may include data loss or corruption.
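If you write to the data source directly while a disk is attached, one way to respect this rule is to check each destination key against the paths you know are being written through the disk. The Python sketch below treats two paths as conflicting when they are the same or one is a directory containing the other. Archil does not necessarily expose a list of active paths; the active list here is something you would track yourself, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def paths_conflict(a: str, b: str) -> bool:
    """True if a and b are the same path or one is an ancestor directory of the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def safe_for_direct_write(key: str, active_paths: list[str]) -> bool:
    """True if writing key straight to the data source avoids every actively written path."""
    return not any(paths_conflict(key, active) for active in active_paths)


active = ["checkpoints/run-7"]  # directory the disk is currently writing into
print(safe_for_direct_write("checkpoints/run-7/step-100.pt", active))  # False
print(safe_for_direct_write("checkpoints/run-8/step-100.pt", active))  # True
```

Comparing path components rather than raw string prefixes keeps “checkpoints/run-70” from being flagged as inside “checkpoints/run-7”.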

Atomicity

Some data sources, such as Amazon S3, do not have atomic operations that mirror every POSIX file operation (for example, directory renames). When you perform such an operation on an Archil disk, reading directly from the underlying data source may show partial results while the operation is in progress. Reads from the Archil disk itself are always strongly consistent.
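To see why partial results can appear, consider how a directory rename can be emulated on a store that only offers per-object operations: each object under the old prefix is copied to the new prefix and then deleted. The Python sketch below models the store as a dict and records its keys after every step. The function name and the copy-then-delete strategy are illustrative assumptions, not a description of Archil's internals.

```python
def rename_prefix(store: dict[str, bytes], src: str, dst: str) -> list[list[str]]:
    """Emulate renaming directory src to dst on a flat object store.

    Each copy and delete is a separate request, so the returned snapshots show
    what a reader listing the store could observe between steps.
    """
    src_prefix, dst_prefix = src.rstrip("/") + "/", dst.rstrip("/") + "/"
    snapshots = []
    # sorted() materializes the key list before the loop mutates the store.
    for key in sorted(k for k in store if k.startswith(src_prefix)):
        store[dst_prefix + key[len(src_prefix):]] = store[key]  # like CopyObject
        snapshots.append(sorted(store))
        del store[key]                                          # like DeleteObject
        snapshots.append(sorted(store))
    return snapshots


store = {"runs/a/x.bin": b"1", "runs/a/y.bin": b"2"}
snaps = rename_prefix(store, "runs/a", "runs/b")
print(snaps[1])  # ['runs/a/y.bin', 'runs/b/x.bin']: the "directory" is half renamed
```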
