Logging

Configure your applications with robust logging using Archil disks, whose contents automatically flow to Amazon S3. Simply mount an Archil disk and point your existing application loggers at it - no S3 APIs required.

Transparent S3 Integration

Write to normal filesystem paths and logs automatically sync to S3

Instance Isolation

Each instance creates its own directory for proper ownership separation

Infinite Capacity

Never run out of space, because your logs are offloaded to S3

Standard Logging Libraries

Use your existing logging setup - Python logging, winston, etc.

Create and mount an Archil disk

First, create an Archil disk for your logging infrastructure. Follow the Quickstart Guide to create a disk, then mount it in shared mode to allow multiple instances to write logs simultaneously.

Mount the logging disk

# Create the mount directory
sudo mkdir -p /mnt/logs

# Mount the Archil disk in shared mode
sudo archil mount dsk-0123456789abcdef /mnt/logs --region aws-us-east-1 --shared
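
Before pointing loggers at the mount, it can be worth verifying that the disk is actually mounted; otherwise logs would silently land on the instance's local root volume instead of flowing to S3. A minimal sketch (the /mnt/logs path matches the mount command above):
import os
import sys

LOG_MOUNT = "/mnt/logs"

# Fail fast if the Archil disk is not mounted at the expected path.
if not os.path.ismount(LOG_MOUNT):
    sys.exit(f"{LOG_MOUNT} is not a mount point - is the Archil disk mounted?")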

Create instance-specific directories

Each application instance should create its own directory within the mounted filesystem. We recommend doing this in a startup script or during application initialization:
# Create instance-specific log directory using hostname
INSTANCE_DIR="/mnt/logs/$(hostname)"
mkdir -p "$INSTANCE_DIR"

# Or use a custom instance identifier
INSTANCE_ID="${INSTANCE_ID:-$(date +%s)-$$}"
INSTANCE_DIR="/mnt/logs/instance-${INSTANCE_ID}"
mkdir -p "$INSTANCE_DIR"

Application configuration

Configure your application loggers to write to the mounted Archil filesystem. Your logs will automatically flow to S3 without any S3 API calls.
import logging
import os
import socket
from datetime import datetime, timezone

def setup_logging():
    # Create instance-specific directory
    instance_id = socket.gethostname()
    log_dir = f"/mnt/logs/{instance_id}"
    os.makedirs(log_dir, exist_ok=True)
    
    # Configure standard Python logging
    logger = logging.getLogger('app')
    logger.setLevel(logging.INFO)
    
    # Name the log file with the startup time (UTC, truncated to the minute)
    timestamp = datetime.now(timezone.utc).strftime('%Y%m%dT%H_%M_00Z')
    log_file = f"{log_dir}/app-{timestamp}.log"
    
    file_handler = logging.FileHandler(log_file)
    file_handler.setFormatter(
        logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    )
    
    logger.addHandler(file_handler)
    return logger

# Usage
logger = setup_logging()
logger.info("Application started - logs automatically flowing to S3")
logger.error("Sample error message")
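
The plain-text format above is easy to read directly from the mount. If you plan to query logs with Athena (covered below), a structured format such as one JSON object per line can be easier to map onto table columns, since the Athena table definitions later in this guide assume log lines can be split into fields. A minimal standard-library sketch; the field names mirror the Athena columns used below:
import json
import logging

class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
        })

# Swap this into the file handler from setup_logging() above:
# file_handler.setFormatter(JsonLineFormatter())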

Log organization and management

With Archil, you don’t need traditional log rotation because logs are automatically offloaded to S3, where there’s infinite capacity. The application configuration examples above already demonstrate time-based file creation for chronological organization.
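
If you prefer not to embed timestamps in filenames yourself, the standard library's TimedRotatingFileHandler gives you the same time-based organization: it rolls to a new file on a schedule, and with backupCount left at 0 nothing is ever deleted. A minimal sketch using the same paths as the example above:
import logging
import os
import socket
from logging.handlers import TimedRotatingFileHandler

log_dir = f"/mnt/logs/{socket.gethostname()}"
os.makedirs(log_dir, exist_ok=True)

# Roll to a new file at the top of each UTC hour. Completed files are renamed
# with a timestamp suffix; backupCount=0 means nothing is ever deleted, which
# is fine because capacity is handled by S3.
handler = TimedRotatingFileHandler(
    f"{log_dir}/app.log", when="H", interval=1, backupCount=0, utc=True
)
handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("Hourly log files, no rotation cleanup needed")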

S3 lifecycle management

One of the key advantages of using Archil for logging is that your logs automatically flow to S3, giving you access to all of S3's built-in cost-optimization features without any application changes. Configure lifecycle policies on your S3 bucket (or enable S3 Intelligent-Tiering) to automatically transition older logs to cheaper storage classes:
{
  "Rules": [
    {
      "ID": "LogRetentionPolicy",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
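
The policy can be attached in the S3 console or programmatically. A sketch using boto3 (the bucket name is a placeholder; the rule mirrors the JSON above):
import boto3

s3 = boto3.client("s3")

# Apply the lifecycle rule shown above to the bucket backing the Archil disk.
s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "LogRetentionPolicy",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)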

Querying logs with Amazon Athena

Once your logs are flowing to S3 through Archil, you can use Amazon Athena to run SQL queries directly against your log data without needing to load it into a database.

Setting up Athena for log analysis

First, create a table in Athena that maps to your S3 log structure. The column layout assumes each log line can be parsed into these fields. Since logs are organized by instance ID with timestamps in filenames, we'll use a simpler partitioning approach:
CREATE EXTERNAL TABLE application_logs (
  `timestamp` string,
  level string,
  message string,
  module string,
  `function` string,
  line int
)
PARTITIONED BY (
  instance_id string
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/logs/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.instance_id.type' = 'injected',
  'storage.location.template' = 's3://your-bucket/logs/${instance_id}/'
)
Alternatively, you can reorganize your log structure for time-based partitioning (which improves query performance), or use a simpler non-partitioned table that works with the current layout:
-- Non-partitioned approach (simpler, works with current structure)
CREATE EXTERNAL TABLE application_logs_simple (
  `timestamp` string,
  level string,
  message string,
  instance_id string,
  module string,
  `function` string,
  line int
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/logs/'

Example Athena queries

-- Find all errors in the last 24 hours
SELECT timestamp, instance_id, message
FROM application_logs
WHERE level = 'ERROR'
  AND timestamp >= date_format(date_add('hour', -24, now()), '%Y-%m-%d %H:%i:%s')
ORDER BY timestamp DESC
LIMIT 100;

-- Count errors by instance
SELECT instance_id, COUNT(*) as error_count
FROM application_logs
WHERE level = 'ERROR'
  AND timestamp >= date_format(date_add('day', -1, now()), '%Y-%m-%d')
GROUP BY instance_id
ORDER BY error_count DESC;
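
These queries can be run from the Athena console or programmatically. A minimal boto3 sketch (the database name and results location are placeholders):
import time

import boto3

athena = boto3.client("athena")

QUERY = """
SELECT instance_id, COUNT(*) AS error_count
FROM application_logs
WHERE level = 'ERROR'
GROUP BY instance_id
ORDER BY error_count DESC
"""

# Start the query and poll until it finishes.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "your_database"},
    ResultConfiguration={"OutputLocation": "s3://your-bucket/athena-results/"},
)
query_id = execution["QueryExecutionId"]

while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])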

Optimizing Athena performance

For better query performance with large log volumes:
  1. Partition your data by time (year/month/day/hour) as shown in the table definition
  2. Use columnar formats like Parquet for better compression and query speed:
CREATE TABLE application_logs_parquet
WITH (
  format = 'PARQUET',
  external_location = 's3://your-bucket/logs-parquet/',
  partitioned_by = ARRAY['year', 'month', 'day']
) AS
SELECT *,
  year(from_iso8601_timestamp(timestamp)) as year,
  month(from_iso8601_timestamp(timestamp)) as month,
  day(from_iso8601_timestamp(timestamp)) as day
FROM application_logs
WHERE timestamp >= '2024-01-01'
  3. Use projection to avoid expensive partition discovery operations
  4. Compress your log files using gzip or other compression formats (see the sketch below)
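
For the compression suggestion above, a simple approach is to gzip log files that the application has finished writing; Athena can query gzip-compressed text files directly. A sketch, assuming the instance directory layout from earlier and that the listed files are no longer open for writing:
import gzip
import os
import shutil
import socket

def gzip_completed_logs(log_dir):
    """Compress every .log file in log_dir and remove the original."""
    for name in os.listdir(log_dir):
        if not name.endswith(".log"):
            continue
        path = os.path.join(log_dir, name)
        # Only run this on files the application has already finished writing.
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)

gzip_completed_logs(f"/mnt/logs/{socket.gethostname()}")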