Configure robust application logging with Archil disks that automatically sync your log files to Amazon S3. Simply mount an Archil disk and point your existing application loggers at it - no S3 APIs required.

Transparent S3 Integration

Write to normal filesystem paths and logs automatically sync to S3

Instance Isolation

Each instance creates its own directory for proper ownership separation

Infinite Capacity

Never run out of space - your logs are automatically offloaded to S3

Standard Logging Libraries

Use your existing logging setup - Python logging, winston, etc.

Create and mount an Archil disk

First, create an Archil disk for your logging infrastructure. Follow the Quickstart Guide to create a disk, then mount it in shared mode to allow multiple instances to write logs simultaneously.

Mount the logging disk

# Create the mount directory
sudo mkdir -p /mnt/logs

# Mount the Archil disk in shared mode
sudo archil mount <disk-name> /mnt/logs --region aws-us-east-1 --shared

Create instance-specific directories

Each application instance should create its own directory within the mounted filesystem. We recommend doing this as part of a startup script or on application startup:
# Create instance-specific log directory using hostname
INSTANCE_DIR="/mnt/logs/$(hostname)"
mkdir -p "$INSTANCE_DIR"

# Or use a custom instance identifier
INSTANCE_ID="${INSTANCE_ID:-$(date +%s)-$$}"
INSTANCE_DIR="/mnt/logs/instance-${INSTANCE_ID}"
mkdir -p "$INSTANCE_DIR"

Application configuration

Configure your application loggers to write to the mounted Archil filesystem. Your logs will automatically flow to S3 without any S3 API calls.
import logging
import os
import socket
from datetime import datetime

class MinuteRotatingFileHandler(logging.FileHandler):
    """File handler that starts a new log file at the top of each minute."""
    def __init__(self, log_dir, service_name):
        self.log_dir = log_dir
        self.service_name = service_name
        self.current_minute = None
        super().__init__(self._get_current_filename())
        
    def _get_current_filename(self):
        now = datetime.utcnow()
        timestamp = now.strftime('%Y%m%dT%H_%M_00Z')
        return f"{self.log_dir}/{self.service_name}-{timestamp}.log"
        
    def emit(self, record):
        now = datetime.utcnow()
        current_minute = now.strftime('%Y%m%dT%H_%M')
        
        if self.current_minute != current_minute:
            self.close()
            self.baseFilename = self._get_current_filename()
            self.stream = self._open()
            self.current_minute = current_minute
            
        super().emit(record)

def setup_logging():
    # Create instance-specific directory
    instance_id = socket.gethostname()
    log_dir = f"/mnt/logs/{instance_id}"
    os.makedirs(log_dir, exist_ok=True)
    
    # Configure standard Python logging
    logger = logging.getLogger('app')
    logger.setLevel(logging.INFO)
    
    # Create minute-rotating file handler
    file_handler = MinuteRotatingFileHandler(log_dir, 'app')
    file_handler.setFormatter(
        logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    )
    
    logger.addHandler(file_handler)
    return logger

# Usage
logger = setup_logging()
logger.info("Application started - logs automatically flowing to S3")
logger.error("Sample error message")

Log organization and management

With Archil, you don’t need traditional size-based log rotation because logs are automatically offloaded to S3, where capacity is effectively unlimited. Instead, the handler shown above rotates on time, creating a new log file every minute with the consistent naming pattern service-YYYYMMDDTHH_MM_00Z.log. This keeps your logs in natural chronological order and keeps each file small enough to query efficiently.
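
Because each filename encodes the minute it covers, you can locate the logs for any time window with simple path arithmetic instead of scanning file metadata. The snippet below is a minimal sketch that assumes the /mnt/logs mount point and the app service name used above; the helper name is illustrative, so adapt it to your own tooling:
from datetime import datetime, timedelta
from pathlib import Path

def log_files_for_window(base_dir, service, start, end):
    """Yield per-minute log files whose minute falls within [start, end]."""
    minute = start.replace(second=0, microsecond=0)
    while minute <= end:
        # Filenames follow the service-YYYYMMDDTHH_MM_00Z.log convention
        stamp = minute.strftime('%Y%m%dT%H_%M_00Z')
        # One directory per instance, so search every instance directory
        yield from Path(base_dir).glob(f"*/{service}-{stamp}.log")
        minute += timedelta(minutes=1)

# Example: collect the last 15 minutes of logs across all instances
now = datetime.utcnow()
recent = list(log_files_for_window('/mnt/logs', 'app', now - timedelta(minutes=15), now))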

S3 lifecycle management

One of the key advantages of using Archil for logging is that your logs automatically flow to S3, so you get S3’s built-in cost-optimization features without any application changes. Configure lifecycle policies on your S3 bucket to automatically transition older logs to cheaper storage classes and expire them when they are no longer needed:
{
  "Rules": [
    {
      "ID": "LogRetentionPolicy",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
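
You can also apply the same policy from code. The following is a minimal sketch using boto3’s put_bucket_lifecycle_configuration call; your-bucket is a placeholder for whichever bucket backs your Archil disk, and the logs/ prefix should match where your log data actually lands in S3:
import boto3

s3 = boto3.client("s3")

# Placeholder bucket name - use the bucket that backs your Archil disk
s3.put_bucket_lifecycle_configuration(
    Bucket="your-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "LogRetentionPolicy",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Delete logs entirely after one year
                "Expiration": {"Days": 365},
            }
        ]
    },
)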

Querying logs with Amazon Athena

Once your logs are flowing to S3 through Archil, you can use Amazon Athena to run SQL queries directly against your log data without needing to load it into a database.

Setting up Athena for log analysis

Create a table in Athena that leverages the standardized filename format for efficient time-based partitioning. With partition projection, Athena derives the time partitions directly from the service-YYYYMMDDTHH_MM_00Z.log file paths, so you never have to add partitions manually. Depending on how your formatter structures each line, you may also want to add a ROW FORMAT clause (for example, a regex SerDe) so that Athena can split log lines into the individual columns below:
CREATE EXTERNAL TABLE application_logs (
  `timestamp` string,
  level string,
  message string,
  module string,
  function string,
  line int
)
PARTITIONED BY (
  year string,
  month string,
  day string,
  hour string,
  minute string,
  instance_id string
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/logs/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2020,2030',
  'projection.year.interval' = '1',
  'projection.month.type' = 'integer',
  'projection.month.range' = '1,12',
  'projection.month.interval' = '1',
  'projection.month.digits' = '2',
  'projection.day.type' = 'integer',
  'projection.day.range' = '1,31',
  'projection.day.interval' = '1',
  'projection.day.digits' = '2',
  'projection.hour.type' = 'integer',
  'projection.hour.range' = '0,23',
  'projection.hour.interval' = '1',
  'projection.hour.digits' = '2',
  'projection.minute.type' = 'integer',
  'projection.minute.range' = '0,59',
  'projection.minute.interval' = '1',
  'projection.minute.digits' = '2',
  'projection.instance_id.type' = 'injected',
  'storage.location.template' = 's3://your-bucket/logs/${instance_id}/app-${year}${month}${day}T${hour}_${minute}_00Z.log'
)
This time-based partitioning approach provides several advantages:
  • Efficient time-range queries: Athena can skip entire partitions when filtering by time
  • Automatic partition discovery: No need to manually add partitions as new log files are created
  • Cost optimization: Only scan the data you need for time-based analysis
  • Scalability: Performance remains consistent as log volume grows
For simpler use cases where you don’t need fine-grained time partitioning, you can use a non-partitioned table:
-- Non-partitioned approach (simpler setup, less efficient for large datasets)
CREATE EXTERNAL TABLE application_logs_simple (
  `timestamp` string,
  level string,
  message string,
  instance_id string,
  module string,
  function string,
  line int
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/logs/'

Example Athena queries

-- Find all errors in the last 24 hours (leverages time partitioning)
SELECT timestamp, instance_id, message
FROM application_logs
WHERE level = 'ERROR'
  AND year = '2024' AND month = '07' AND day = '29'
  AND timestamp >= date_format(date_add('hour', -24, now()), '%Y-%m-%d %H:%i:%s')
ORDER BY timestamp DESC
LIMIT 100;

-- Count errors by instance (with partition pruning)
SELECT instance_id, COUNT(*) as error_count
FROM application_logs
WHERE level = 'ERROR'
  AND year = '2024' AND month = '07' AND day >= '28'
  AND timestamp >= date_format(date_add('day', -1, now()), '%Y-%m-%d')
GROUP BY instance_id
ORDER BY error_count DESC;
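
These queries can also be run programmatically, which is useful for dashboards or scheduled reports. The sketch below submits the error-count query with boto3’s Athena client and polls for the result; the default database and the s3://your-bucket/athena-results/ output location are placeholders for your own values:
import time
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT instance_id, COUNT(*) AS error_count
FROM application_logs
WHERE level = 'ERROR'
  AND year = '2024' AND month = '07' AND day = '29'
GROUP BY instance_id
ORDER BY error_count DESC
"""

# Submit the query; Athena writes results to the output location
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://your-bucket/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query reaches a terminal state
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"][1:]:  # skip the header row
        print([col.get("VarCharValue") for col in row["Data"]])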