Persistent Storage
All notebooks, datasets, and trained models are automatically stored in S3 and persist across compute sessions
Cost Efficiency
Pay only for high-speed storage when active. Data remains in S3 at standard costs when idle
Scalability
Scale compute resources up or down without data migration or storage limits
Collaboration
Multiple team members can access the same datasets and notebooks by mounting the same Archil disk
Create an Archil disk
First, follow the Archil Getting Started Guide to create an Archil disk that will serve as your data science workspace.
Mount your Archil disk
Mount your Archil disk to create your data science workspace:
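The exact mount command, disk ID, and region come from the Archil console and the Getting Started Guide; the invocation below is a hypothetical placeholder, assuming a mount point of /mnt/archil:

```bash
# Create a mount point for the disk (the path is just an example)
sudo mkdir -p /mnt/archil

# Mount the disk -- replace this with the command shown for your disk in the
# Archil console; the syntax here is a placeholder, not verified CLI usage
sudo archil mount <your-disk-id> /mnt/archil
```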
Set up Python virtual environment on Archil disk
Create a virtual environment directly on your Archil disk to ensure all dependencies persist in S3:
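A minimal sketch, assuming the /mnt/archil mount point from the previous step and an environment directory of envs/datasci (both names are illustrative):

```bash
# Create the environment on the mounted disk so installed packages persist in S3
python3 -m venv /mnt/archil/envs/datasci

# Activate it for the current shell session
source /mnt/archil/envs/datasci/bin/activate

# Install the packages the tutorial notebook uses
pip install torch torchvision jupyter matplotlib
```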
Configure Jupyter to use the virtual environment
Set up Jupyter to use your virtual environment and Archil workspace:
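One common way to do this is to register the environment as a named Jupyter kernel; the kernel name and display name below are illustrative:

```bash
# Install ipykernel inside the activated environment
pip install ipykernel

# Register the environment as a Jupyter kernel (names are examples)
python -m ipykernel install --user --name=archil-datasci --display-name "Python (Archil)"
```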
Download the MNIST training notebook
Instead of creating the notebook from scratch, download our pre-built tutorial notebook:
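The notebook's download URL isn't reproduced here, so the command below uses a placeholder; substitute the URL from the published tutorial:

```bash
# Fetch the tutorial notebook onto the Archil disk
# (<notebook-url> is a placeholder for the tutorial's published URL)
wget -O /mnt/archil/mnist-pytorch-tutorial.ipynb <notebook-url>
```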
Start Jupyter and run the tutorial
Start the Jupyter notebook server with your virtual environment (a sketch follows below); once it's running, you'll see the mnist-pytorch-tutorial.ipynb notebook ready to run.
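A minimal way to start the server, assuming the mount point and environment paths from the earlier steps:

```bash
# Activate the persistent environment and start Jupyter with the Archil disk as its root
source /mnt/archil/envs/datasci/bin/activate
jupyter notebook --notebook-dir=/mnt/archil --ip=0.0.0.0 --no-browser
```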
The notebook includes:
- Data loading: Downloads MNIST dataset directly to your Archil disk
- Model definition: Simple neural network for digit classification
- Training loop: Complete training pipeline with progress tracking
- Evaluation: Model accuracy assessment on test data
- Persistence: Automatic model saving to S3 via Archil
- Visualization: Training progress and prediction examples
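For orientation, here is a condensed sketch of the kind of pipeline the notebook walks through; the model, hyperparameters, and paths are illustrative rather than the notebook's exact code:

```python
import os
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

DATA_DIR = "/mnt/archil/data"               # MNIST downloads straight to the Archil disk
MODEL_PATH = "/mnt/archil/models/mnist.pt"  # saved weights persist to S3 via Archil

# Data loading: download MNIST directly to the Archil disk
train_ds = datasets.MNIST(DATA_DIR, train=True, download=True,
                          transform=transforms.ToTensor())
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)

# Model definition: a simple feed-forward network for digit classification
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop with basic progress tracking
for epoch in range(3):
    for step, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        if step % 100 == 0:
            print(f"epoch {epoch} step {step} loss {loss.item():.4f}")

# Persistence: writing to the Archil disk stores the model in S3
os.makedirs(os.path.dirname(MODEL_PATH), exist_ok=True)
torch.save(model.state_dict(), MODEL_PATH)
```

The actual notebook layers the evaluation and visualization steps listed above on top of this skeleton.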
Advanced workflows
Loading pre-trained models
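Because anything written to the Archil disk persists in S3, a later compute session can load a model saved by an earlier one straight from the same path. A sketch assuming the path and architecture from the training example above:

```python
import torch
import torch.nn as nn

MODEL_PATH = "/mnt/archil/models/mnist.pt"  # illustrative path from the training sketch

# Re-create the architecture, then load the persisted weights from the Archil disk
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.load_state_dict(torch.load(MODEL_PATH))
model.eval()
```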
Working with larger datasets
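Larger datasets can live on the Archil disk and be streamed in batches, so they never need to fit in local instance storage. A sketch assuming a hypothetical image dataset under /mnt/archil/data:

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# The dataset directory is illustrative; any folder-of-images layout on the disk works
dataset = datasets.ImageFolder(
    "/mnt/archil/data/my-image-dataset",
    transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ]),
)

# Multiple worker processes read batches from the mounted disk in parallel
loader = DataLoader(dataset, batch_size=128, shuffle=True,
                    num_workers=4, pin_memory=True)
```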
Monitoring and optimization
You can monitor your data science workspace and optimize performance by checking disk usage and managing your virtual environment dependencies as needed.
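For example (assuming the /mnt/archil mount point and environment path used earlier):

```bash
# See how much space the workspace is using on the mounted disk
df -h /mnt/archil
du -sh /mnt/archil/*

# Review (and prune) the packages installed in the persistent environment
source /mnt/archil/envs/datasci/bin/activate
pip list
```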
Cleanup
When you’re done with your session, you can safely stop Jupyter:
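For example, press Ctrl+C in the terminal running the server, or stop it by port (8888 is Jupyter's default); your notebooks, data, and environment stay on the Archil disk in S3:

```bash
# Stop the notebook server running on the default port
jupyter notebook stop 8888
```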
Next steps
This tutorial demonstrated the basics of serverless Jupyter notebooks with PyTorch. You can extend this setup for more complex workflows:
- Distributed training across multiple compute instances
- Hyperparameter tuning with automated experiment tracking
- Model serving by deploying trained models from S3
- Data pipelines that process large datasets stored in object storage