# Jupyter Notebooks

> Learn how to run Jupyter notebooks with PyTorch and MNIST training directly on Amazon S3 using Archil

[Jupyter Notebook](https://jupyter.org/) is an open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. Archil lets you run Jupyter notebooks in a serverless manner: all data and notebooks are stored directly on Amazon S3, and caching keeps performance interactive.

* All notebooks, datasets, and trained models are automatically stored in S3 and persist across compute sessions.
* You pay for high-speed storage only while it is active. When idle, data remains in S3 at standard costs.
* You can scale compute resources up or down without data migration or storage limits.
* Multiple team members can access the same datasets and notebooks by mounting the same Archil disk.

This guide walks you through setting up a serverless Jupyter environment for data science workflows, including training a PyTorch model on the MNIST dataset with all data stored in S3.

## Create an Archil disk

First, follow the Archil [Getting Started Guide](/getting-started/quickstart) to create an Archil disk that will serve as your data science workspace.
## Mount your Archil disk

Mount your Archil disk to create your data science workspace:

```bash theme={null}
# Create mount directory
sudo mkdir -p /mnt/archil

# Mount Archil disk
export ARCHIL_MOUNT_TOKEN=""
sudo --preserve-env=ARCHIL_MOUNT_TOKEN archil mount /mnt/archil --region aws-us-east-1

# Create datascience workspace directories
sudo mkdir -p /mnt/archil/datascience/notebooks
sudo mkdir -p /mnt/archil/datascience/datasets
sudo mkdir -p /mnt/archil/datascience/models
sudo mkdir -p /mnt/archil/datascience/venv
sudo chown -R $USER:$USER /mnt/archil/datascience
```

## Set up Python virtual environment on Archil disk

Create a virtual environment directly on your Archil disk so that all dependencies persist in S3:

```bash theme={null}
# Create virtual environment on Archil disk
cd /mnt/archil/datascience
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate

# Upgrade pip and install data science packages
pip install --upgrade pip
pip install jupyter torch torchvision matplotlib numpy pandas

# Verify installation
python -c "import torch; print(f'PyTorch version: {torch.__version__}')"
```

## Configure Jupyter to use the virtual environment

Set up Jupyter to use your virtual environment and Archil workspace:

```bash theme={null}
# Generate Jupyter config (while venv is activated)
jupyter notebook --generate-config

# Create Jupyter configuration (overwrites the generated file)
cat > ~/.jupyter/jupyter_notebook_config.py << 'EOF'
c.NotebookApp.notebook_dir = '/mnt/archil/datascience/notebooks'
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.port = 8888
c.NotebookApp.open_browser = False
c.NotebookApp.allow_root = True
EOF
```

## Download the MNIST training notebook

Instead of creating the notebook from scratch, download our pre-built tutorial notebook:

```bash theme={null}
# Navigate to notebooks directory
cd /mnt/archil/datascience/notebooks

# Download the MNIST PyTorch tutorial notebook
curl -O https://s3.amazonaws.com/archil-client/docs/artifacts/guides/data-science/mnist-pytorch-tutorial.ipynb

# Verify the download
ls -la mnist-pytorch-tutorial.ipynb
```

## Start Jupyter and run the tutorial

Start the Jupyter notebook server from your virtual environment:

```bash theme={null}
# Make sure you're in the Archil disk and virtual environment is activated
cd /mnt/archil/datascience
source venv/bin/activate

# Start Jupyter notebook
jupyter notebook --notebook-dir=/mnt/archil/datascience/notebooks
```

Open your browser and navigate to the Jupyter interface (port 8888 with the configuration above). You'll see the `mnist-pytorch-tutorial.ipynb` notebook ready to run.

The notebook includes:

* **Data loading**: Downloads MNIST dataset directly to your Archil disk
* **Model definition**: Simple neural network for digit classification
* **Training loop**: Complete training pipeline with progress tracking
* **Evaluation**: Model accuracy assessment on test data
* **Persistence**: Automatic model saving to S3 via Archil
* **Visualization**: Training progress and prediction examples

## Advanced workflows

### Loading pre-trained models

```python theme={null}
import torch

# Load a previously saved model
# `model` and `optimizer` must already be constructed the same way
# they were when the checkpoint was saved
checkpoint = torch.load('../models/mnist_model.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
print(f'Loaded model with accuracy: {checkpoint["accuracy"]:.2f}%')
```

### Working with larger datasets

```python expandable theme={null}
import os

import torch
from PIL import Image  # Pillow is installed as a torchvision dependency

# For larger datasets, you can stream data directly from S3
# The Archil cache will intelligently manage frequently accessed data

# Example: Custom dataset class for large image datasets
class LargeImageDataset(torch.utils.data.Dataset):
    def __init__(self, data_dir, transform=None):
        self.data_dir = data_dir  # Points to Archil-mounted directory
        self.transform = transform
        # Sorted so that an index maps to the same file on every run
        self.image_files = sorted(os.listdir(data_dir))

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        # Files are loaded on-demand from S3 via Archil cache
        img_path = os.path.join(self.data_dir, self.image_files[idx])
        # One possible way to load and process the image (an assumption:
        # files are Pillow-readable images; adapt this to your format and labels)
        image = Image.open(img_path).convert('RGB')
        if self.transform is not None:
            image = self.transform(image)
        return image
```

## Monitoring and optimization

To keep the workspace efficient, periodically check disk usage under `/mnt/archil/datascience` and remove virtual environment dependencies you no longer need.

## Cleanup

When you're done with your session, you can safely stop Jupyter:

```bash theme={null}
# Stop Jupyter
# Ctrl+C in the terminal where Jupyter is running
```

Your data remains safely stored in S3 and can be accessed again by mounting the same disk in future sessions.

## Next steps

This tutorial demonstrated the basics of serverless Jupyter notebooks with PyTorch. You can extend this setup for more complex workflows:

* **Distributed training** across multiple compute instances
* **Hyperparameter tuning** with automated experiment tracking
* **Model serving** by deploying trained models from S3
* **Data pipelines** that process large datasets stored in object storage

Each of these workflows keeps the serverless benefits of Archil's intelligent caching and S3 integration.
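
For the disk usage check mentioned under "Monitoring and optimization", a minimal Python sketch might look like the following. It is not part of Archil or Jupyter: the helper names `directory_size_bytes` and `workspace_report` are hypothetical, and the workspace path and subdirectory names are the ones created in the mount step of this guide. It sums file sizes by walking the directory tree, so it reads metadata for every file and may be slow on very large datasets.

```python
import os


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks (virtual environments contain several).
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file disappeared or is unreadable; ignore it.
                continue
    return total


def workspace_report(workspace, subdirs=("notebooks", "datasets", "models", "venv")):
    """Map each existing workspace subdirectory to its size in MiB."""
    report = {}
    for sub in subdirs:
        path = os.path.join(workspace, sub)
        if os.path.isdir(path):
            report[sub] = directory_size_bytes(path) / (1024 * 1024)
    return report


if __name__ == "__main__":
    # Workspace path assumed from the mount step in this guide.
    for name, size_mib in workspace_report("/mnt/archil/datascience").items():
        print(f"{name:<10} {size_mib:10.1f} MiB")
```

Running this from a notebook cell or a terminal gives a quick view of which part of the workspace (datasets, models, or the virtual environment) is growing, which can help decide what to prune.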