From aec33c6e7ee776b98dd319dff5f9964b61b4c5e4 Mon Sep 17 00:00:00 2001 From: W Beecher Baker Date: Tue, 2 Jul 2024 14:17:15 -0400 Subject: [PATCH 1/7] Update README.md --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 5059ea4..e862c87 100644 --- a/README.md +++ b/README.md @@ -42,4 +42,5 @@ Files & notes for a series of docker tutorials for coworkers. plt.matshow(data, cmap=plt.cm.Blues) plt.title("Original dataset") _ = plt.show() - ``` + ``` + * Recommendations for other libraries to try out: https://www.mygreatlearning.com/blog/open-source-python-libraries/ From 207271b21518b49bc5240f4be36dad021daa372b Mon Sep 17 00:00:00 2001 From: W Beecher Baker Date: Tue, 2 Jul 2024 14:17:33 -0400 Subject: [PATCH 2/7] fix bullet format --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e862c87..1911284 100644 --- a/README.md +++ b/README.md @@ -43,4 +43,5 @@ Files & notes for a series of docker tutorials for coworkers. plt.title("Original dataset") _ = plt.show() ``` - * Recommendations for other libraries to try out: https://www.mygreatlearning.com/blog/open-source-python-libraries/ + + * Recommendations for other libraries to try out: https://www.mygreatlearning.com/blog/open-source-python-libraries/ From cd24cb74d9cb4d31ac847b97e11b4fd5ab9d5169 Mon Sep 17 00:00:00 2001 From: W Beecher Baker Date: Fri, 26 Jul 2024 14:55:20 -0400 Subject: [PATCH 3/7] Update README.md --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 1911284..1044a17 100644 --- a/README.md +++ b/README.md @@ -6,8 +6,9 @@ Files & notes for a series of docker tutorials for coworkers. 1. Getting things working 1. Everybody run `docker run hello-world` to make sure Docker is installed and working. - 2. `docker run -it ` such as debian, rocky, alpine, etc. - 3. Quiz: What does `-it` do? 
Could check with `docker run --help` or at https://docs.docker.com/reference/. + 2. `docker run ` such as debian, rocky, alpine, etc. — what happens? + 3. `docker run -it ` — just add `-it`. Better? + 4. Quiz: What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/. 2. Running something useful: Jupyter Notebooks 1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook:latest` From 27b7437826eb8e0384b1ad5390004b12dc8f9fc8 Mon Sep 17 00:00:00 2001 From: W Beecher Baker Date: Fri, 26 Jul 2024 14:57:57 -0400 Subject: [PATCH 4/7] Update README.md --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 1044a17..6f75cdc 100644 --- a/README.md +++ b/README.md @@ -46,3 +46,7 @@ Files & notes for a series of docker tutorials for coworkers. ``` * Recommendations for other libraries to try out: https://www.mygreatlearning.com/blog/open-source-python-libraries/ + +## Session 2: Run multiple containers together + +See this repo: [mq-experiments](https://github.itap.purdue.edu/AgIT/mq-experiments) From 7bb2211d1d5ebeaaf9024e38f14eeacf514caec1 Mon Sep 17 00:00:00 2001 From: W Beecher Baker Date: Fri, 26 Jul 2024 15:27:43 -0400 Subject: [PATCH 5/7] Update README.md --- README.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 6f75cdc..cb21773 100644 --- a/README.md +++ b/README.md @@ -5,10 +5,12 @@ Files & notes for a series of docker tutorials for coworkers. ## Session 1: Run a Jupyter Notebook 1. Getting things working - 1. Everybody run `docker run hello-world` to make sure Docker is installed and working. - 2. `docker run ` such as debian, rocky, alpine, etc. — what happens? - 3. `docker run -it ` — just add `-it`. Better? - 4. Quiz: What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/. + 1. Start Docker Desktop + 2. 
Open a terminal — either your built-in terminal or from Docker Desktop itself. + 3. Run `docker run hello-world` to make sure Docker is installed and working. + 4. `docker run ` such as debian, rocky, alpine, etc. — what happens? + 5. `docker run -it ` — just add `-it`. Better? + 6. Quiz: What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/. 2. Running something useful: Jupyter Notebooks 1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook:latest` From d8900928e937f3109ccc0c252feb204c72b15e6a Mon Sep 17 00:00:00 2001 From: W Beecher Baker Date: Fri, 26 Jul 2024 16:01:12 -0400 Subject: [PATCH 6/7] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index cb21773..ba66ad5 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ Files & notes for a series of docker tutorials for coworkers. 6. Quiz: What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/. 2. Running something useful: Jupyter Notebooks - 1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook:latest` + 1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook` 2. Find the login URL in the output and open it in your browser. Should see the notebook interface. 3. Do something. 4. Try something with a missing library such as `import scikit-learn`. From 2b01af7b5e1d7ace81cb11280324805315dbec29 Mon Sep 17 00:00:00 2001 From: Beecher Baker Date: Wed, 31 Jul 2024 12:26:21 -0400 Subject: [PATCH 7/7] rework examples for a broader audience --- README.md | 167 +++++++++++++++++++++++++++++++++-------------- requirements.txt | 3 + 2 files changed, 122 insertions(+), 48 deletions(-) create mode 100644 requirements.txt diff --git a/README.md b/README.md index ba66ad5..3416946 100644 --- a/README.md +++ b/README.md @@ -2,53 +2,124 @@ Files & notes for a series of docker tutorials for coworkers. -## Session 1: Run a Jupyter Notebook - -1. 
Getting things working - 1. Start Docker Desktop - 2. Open a terminal — either your built-in terminal or from Docker Desktop itself. - 3. Run `docker run hello-world` to make sure Docker is installed and working. - 4. `docker run ` such as debian, rocky, alpine, etc. — what happens? - 5. `docker run -it ` — just add `-it`. Better? - 6. Quiz: What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/. +## 1. Getting Started with Docker + +1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) +2. Open a terminal. Some options: + * Windows: Lots of options + * Use the Docker Desktop terminal + * Or use the Windows Subsystem for Linux (WSL) with Ubuntu or another distro + * Or Windows Terminal (new default in Windows 11) + * Or Git Bash + * Or PowerShell + * Mac: Use the Terminal app + * Linux: I think you know already. By the way, is anyone using desktop Linux? +3. Run `docker run hello-world` to make sure Docker is installed and working. + +## 2. Running Python inside Docker + +1. Run a Python container + ```shell + $ docker run -it python:3.9 + ``` + 1. Try a Python command such as `print("Hello, world!")` + * What happens when you `exit`? + * What does `-it` do? Could check with `docker run --help` or at https://docs.docker.com/reference/. + 2. How about `print(f"List: {",".join(["one", "two", "three"])}")`? + * What's an easy way to fix that, without changing the code? + 3. Try a command that requires a library such as `import pandas`. + * What happens? Any ideas how to fix it? + +## 3. Theory + +* How are containers and virtual machines different? How are they similar? +* What are images and containers? +* What other container systems exist? + +## 4. Dockerfiles + +Dockerfiles let you customize images systematically, so they're convenient to reuse and share. + +1. Create a file named `Dockerfile` in your current directory: + ```Dockerfile + FROM python:3.12 + RUN pip install pandas + ``` + 1.
What does `FROM` do? + * What is the grandparent image — in other words, what is `python:3.12` based on? + 2. What does `RUN` do? + +2. Use it to build a custom Docker image tagged `python-pandas`. + ```shell + $ docker build . -t python-pandas + ``` -2. Running something useful: Jupyter Notebooks - 1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook` - 2. Find the login URL in the output and open it in your browser. Should see the notebook interface. - 3. Do something. - 4. Try something with a missing library such as `import scikit-learn`. +3. Run a container from the new image. + ```shell + $ docker run -it python-pandas + ``` + ```python + >>> import pandas as pd + >>> print(pd.__version__) + ``` -3. Dockerfiles - 1. Create a `Dockerfile` that installs the missing library. - ```Dockerfile - from quay.io/jupyter/datascience-notebook:latest - RUN conda install --quiet --yes scikit-learn tensorflow - ``` - 1. Note: Instead, you could `docker exec -it bash` and install the library manually, but then it wouldn't be sharable or saved for future work. - 2. Think of Docker images as snapshots that you can save for later or share. _"Infrastructure as code."_ - 2. Build a new local Docker image - ```shell - $ docker build . -t my-notebook - $ docker run -p 8888:8888 my-notebook - ``` - 3. [Try again with the missing library](https://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_biclustering.html#generate-sample-data).
- ```python - from matplotlib import pyplot as plt - - from sklearn.datasets import make_checkerboard - - n_clusters = (4, 3) - data, rows, columns = make_checkerboard( - shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42 - ) - - plt.matshow(data, cmap=plt.cm.Blues) - plt.title("Original dataset") - _ = plt.show() - ``` - - * Recommendations for other libraries to try out: https://www.mygreatlearning.com/blog/open-source-python-libraries/ - -## Session 2: Run multiple containers together - -See this repo: [mq-experiments](https://github.itap.purdue.edu/AgIT/mq-experiments) +## 5. More Dockerfile details + +Dockerfiles can do a lot for you. For example, they can copy a local file into the image and use it during the build; here, `requirements.txt` lists the Python packages to install. This is an example of [infrastructure as code](https://en.wikipedia.org/wiki/Infrastructure_as_code). + +For more details, see the +[Dockerfile reference](https://docs.docker.com/reference/dockerfile/). + + ```Dockerfile + FROM python:3.12 + WORKDIR /work + COPY requirements.txt . + RUN pip install -r requirements.txt --no-cache-dir + ``` + +## 6. Mounting a volume + +By default, a Docker container can't see the host filesystem, and anything written inside the container is lost when the container is removed. Mounting a host directory as a volume addresses both. + + ```shell + $ echo 'print("Hello, world!")' > hello.py + $ docker run -v $(pwd):/work -w /work python-pandas python hello.py + ``` + 1. What does `-v $(pwd):/work` do? + 2. What does `-w /work` do? What happens if you don't use it? + 3. What does the `python hello.py` part do? + +## 7. Docker images as software packaging + +Open source software is often available as a Docker image, which makes it easy to install and run. For example: + +* [TensorFlow](https://www.tensorflow.org/install/docker) +* [PostgreSQL](https://hub.docker.com/_/postgres) +* [ollama](https://hub.docker.com/r/ollama/ollama) + +### Example: Jupyter Lab + +[Jupyter Lab](https://jupyter.org/) (the next-generation interface for Jupyter notebooks) is a popular tool for data science and machine learning.
It's used by Purdue researchers to share code and reproduce results. + +1. `docker run -p 8888:8888 quay.io/jupyter/datascience-notebook:latest` +2. Find the login URL in the output and open it in your browser. You should see the notebook interface. +3. Open a new notebook and draw a graph: + ```python + import matplotlib.pyplot as plt + plt.plot([0, 1, 2, 3, 5], [0, 1, 4, 9, 25]); plt.show() + ``` + +4. [A cooler plot](https://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_biclustering.html#generate-sample-data). + ```python + from matplotlib import pyplot as plt + from sklearn.datasets import make_checkerboard + + n_clusters = (4, 3) + data, rows, columns = make_checkerboard( + shape=(300, 300), n_clusters=n_clusters, noise=10, shuffle=False, random_state=42 + ) + + plt.matshow(data, cmap=plt.cm.Blues) + plt.title("Original dataset") + _ = plt.show() + ``` diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..d2a2d2c --- /dev/null +++ b/requirements.txt @@ -0,0 +1,3 @@ +pandas==2.2.* +scikit-learn==1.5.* +matplotlib==3.9.* \ No newline at end of file