rework examples for a broader audience
wbbaker committed Jul 31, 2024
1 parent d890092 commit 2b01af7
Showing 2 changed files with 122 additions and 48 deletions.
167 changes: 119 additions & 48 deletions README.md
Expand Up @@ -2,53 +2,124 @@

Files & notes for a series of docker tutorials for coworkers.

## Session 1: Run a Jupyter Notebook

1. Getting things working
1. Start Docker Desktop
2. Open a terminal — either your built-in terminal or from Docker Desktop itself.
3. Run `docker run hello-world` to make sure Docker is installed and working.
4. `docker run <your-favorite-distro>` such as debian, rocky, alpine, etc. — what happens?
5. `docker run -it <your-favorite-distro>` — just add `-it`. Better?
6. Quiz: What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/.
## 1. Getting Started with Docker

1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/)
2. Open a terminal. Some options:
* Windows: Lots of options
* Use the Docker Desktop terminal
* Or use the Windows Subsystem for Linux (WSL) with Ubuntu or another distro
* Or Windows Terminal (new default in Windows 11)
* Or Git Bash
* Or PowerShell
* Mac: Use the Terminal app
* Linux: I think you know already. By the way, is anyone using desktop Linux?
3. Run `docker run hello-world` to make sure Docker is installed and working.

## 2. Running Python inside Docker

1. Run a Python container
```shell
$ docker run -it python:3.9
```
1. Try a python command such as `print("Hello, world!")`
* What happens when you `exit`?
* What does `-it` do? You can check with `docker run --help` or at https://docs.docker.com/reference/.
2. How about `print(f"List: {",".join(["one", "two", "three"])}")`?
* What's an easy way to fix that, without changing the code?
3. Try a command that requires a library such as `import pandas`.
* What happens? Any ideas how to fix it?
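If you want to compare how different Python images treat the nested-quote f-string from step 2, the sketch below may help. It keeps the snippet as a plain string and asks the running interpreter to compile it, so the script itself loads on any Python 3 version. The names `SNIPPET` and `snippet_parses` are made up for this illustration.

```python
import sys

# The exact snippet from the exercise, stored as a plain string so that
# this script parses regardless of which Python version runs it.
SNIPPET = 'print(f"List: {",".join(["one", "two", "three"])}")'


def snippet_parses(source: str) -> bool:
    """Return True if the running interpreter can compile `source`."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Report the interpreter version next to the result.
version = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {version}: snippet parses = {snippet_parses(SNIPPET)}")
```

Running the same script under two image tags, for example `python:3.9` and `python:3.12`, should show whether the image tag alone changes the result.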

## 3. Theory

* How are containers and virtual machines different? How are they similar?
* What are images and containers?
* What other container systems exist?

## 4. Dockerfiles

Dockerfiles let you customize images systematically, so they're convenient to reuse and share.

1. Create a file in your current directory named `Dockerfile`
```Dockerfile
FROM python:3.12
RUN pip install pandas
```
1. What does `FROM` do?
* What is the grandparent image — in other words, what is `python:3.12` based on?
2. What does `RUN` do?

2. Use it to build a custom Docker image and tag it `python-pandas`.
```shell
$ docker build . -t python-pandas
```

2. Running something useful: Jupyter Notebooks
1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook`
2. Find the login URL in the output and open it in your browser. You should see the notebook interface.
3. Do something.
4. Try something with a missing library such as `import sklearn` (the import name for scikit-learn).
3. Run a container from the new image.
```shell
$ docker run -it python-pandas
```
```python
>>> import pandas as pd
>>> print(pd.__version__)
```

3. Dockerfiles
1. Create a `Dockerfile` that installs the missing library.
```Dockerfile
FROM quay.io/jupyter/datascience-notebook:latest
RUN conda install --quiet --yes scikit-learn tensorflow
```
1. Note: Instead, you could `docker exec -it <container-id> bash` and install the library manually, but then it wouldn't be sharable or saved for future work.
2. Think of Docker images as snapshots that you can save for later or share. _"Infrastructure as code."_
2. Build a new local Docker image
```shell
$ docker build . -t my-notebook
$ docker run -p 8888:8888 my-notebook
```
3. [Try again with the missing library](https://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_biclustering.html#generate-sample-data).
```python
from matplotlib import pyplot as plt
from sklearn.datasets import make_checkerboard
n_clusters = (4, 3)
data, rows, columns = make_checkerboard(
shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42
)
plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Original dataset")
_ = plt.show()
```
* Recommendations for other libraries to try out: https://www.mygreatlearning.com/blog/open-source-python-libraries/
## Session 2: Run multiple containers together
See this repo: [mq-experiments](https://github.itap.purdue.edu/AgIT/mq-experiments)
## 5. More Dockerfile details

Dockerfiles can do a lot for you. For example, `COPY` brings a local file (here, `requirements.txt`) into the image so that later instructions can use it. This is an example of [infrastructure as code](https://en.wikipedia.org/wiki/Infrastructure_as_code).

For more details, see the
[Dockerfile reference](https://docs.docker.com/reference/dockerfile/).

```Dockerfile
FROM python:3.12
WORKDIR /work
COPY requirements.txt .
RUN pip install -r requirements.txt --no-cache-dir
```
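After building an image like this, it can be handy to confirm which versions actually got installed. The sketch below uses the standard library's `importlib.metadata` for that. The helper name `installed_version` is made up for this illustration, and the package list simply mirrors the three names in `requirements.txt`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as they appear in requirements.txt (distribution names,
# which can differ from import names: scikit-learn is imported as sklearn).
for package in ["pandas", "scikit-learn", "matplotlib"]:
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

Run it inside a container started from the image. Each package should report a version rather than `not installed`.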

## 6. Mounting a volume

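By default, a Docker container can't see the host filesystem, and it is ephemeral: files written inside it are lost when the container is removed. Mounting a host directory as a volume gets around both limits.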
```shell
$ echo 'print("Hello, world!")' > hello.py
$ docker run -v "$(pwd)":/work -w /work python-pandas python hello.py
```
1. What does `-v $(pwd):/work` do?
2. What does `-w /work` do? What happens if you don't use it?
3. What does the `python hello.py` part do?
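To explore the questions above, a slightly richer script than `hello.py` can report where it is running and leave a file behind. This is only a sketch: the function names and the `note.txt` filename are made up for this illustration.

```python
from pathlib import Path


def describe(directory: Path) -> dict:
    """Return the absolute path of `directory` and the names of its entries."""
    return {
        "path": str(directory.resolve()),
        "files": sorted(entry.name for entry in directory.iterdir()),
    }


def leave_note(directory: Path, name: str = "note.txt") -> Path:
    """Write a small text file into `directory` and return its path."""
    note = directory / name
    note.write_text("written from inside the container\n")
    return note


if __name__ == "__main__":
    here = Path.cwd()
    print(describe(here))             # working directory and its contents
    print("wrote", leave_note(here))  # lands on the host if `here` is a mounted volume
```

Saved next to `hello.py` and run with the same `docker run` command, it should print `/work` as the path and list your host files. The note should still be in the host directory after the container exits. Try dropping `-v` or `-w` and compare the output.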

## 7. Docker images as software packaging

Open source software is often available as a Docker image, which makes it easy to install and run. For example:

* [TensorFlow](https://www.tensorflow.org/install/docker)
* [PostgreSQL](https://hub.docker.com/_/postgres)
* [ollama](https://hub.docker.com/r/ollama/ollama)

### Example: Jupyter Lab

[Jupyter Lab](https://jupyter.org/), the successor to the classic Jupyter Notebook interface, is a popular tool for data science and machine learning. It's used by Purdue researchers to share code and reproduce results.
1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook:latest`
2. Find the login URL in the output and open it in your browser. You should see the notebook interface.
3. Open a new notebook and draw a graph:
```python
import matplotlib.pyplot as plt
# plt.plot returns a list of Line2D objects, which has no .show() method
plt.plot([0, 1, 2, 3, 5], [0, 1, 4, 9, 25])
plt.show()
```
4. [A cooler plot](https://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_biclustering.html#generate-sample-data).
```python
from matplotlib import pyplot as plt
from sklearn.datasets import make_checkerboard
n_clusters = (4, 3)
data, rows, columns = make_checkerboard(
shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42
)
plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Original dataset")
_ = plt.show()
```
3 changes: 3 additions & 0 deletions requirements.txt
@@ -0,0 +1,3 @@
pandas==2.2.*
scikit-learn==1.5.*
matplotlib==3.9.*
