Setting up Anaconda environment for Plant Science data pipelines
- Manual by Sungchan Oh (Sun)
- First Created: November 13, 2024
- Last Updated: November 13, 2024
- Contact: oh231@purdue.edu
- Prerequisite
- Access to Purdue Compute Resources (e.g., Negishi)
- Terminal software (MobaXterm (Windows), iTerm2 (macOS), Terminator (Linux))
-
Manually create Anaconda environment
- Construct an Anaconda environment step-by-step, customizing it to specific requirements.
- This method offers granular control but requires careful package management.
- Note: This approach has been successfully tested on Negishi (Linux-64) systems.
-
Create Anaconda environment using environment.yml
- Leverage an environment.yml file to define the environment's dependencies.
- This approach is more automated and reproducible, especially for those using Negishi.
- Note: This method has also been tested on Negishi (Linux-64) systems.
-
Use prebuilt Apptainer/Docker container
- Use a preconfigured Apptainer/Docker container that encapsulates the pipelines and their environment.
- This method is convenient for immediate use and portability.
- It eliminates the need for manual environment setup and ensures consistency across different systems.
- Environment Compatibility: While these methods have been tested on Negishi (Linux-64), compatibility with other systems may vary. Users should be aware that successful installation and execution may depend on specific system configurations and package versions.
- Anaconda and Python Version Compatibility: If manual environment setup is preferred, it's crucial to identify compatible versions of Anaconda and Python that support the required Python packages.
- Apptainer/Docker Container Adoption: As the pre-built Apptainer container is under development, it's recommended to stay updated on its availability and usage instructions. This approach promises a simplified and standardized environment for running plant science data pipelines.
By carefully considering these factors, users can effectively establish the necessary environment for their plant science data analysis tasks.
-
Launch a terminal emulator and connect to the Negishi login node.
ssh [your_id]@negishi.rcac.purdue.edu
- Replace your_id with your actual Purdue ID (e.g., oh231).
- Remove the square brackets from all placeholders.
- For the remaining placeholders like ["something"], replace something with the appropriate value and remove the square brackets.
-
Load the Anaconda module and create a new Anaconda environment.
module load anaconda/2024.02-py311.lua
conda create --name ps37 python=3.7.9 -y
- Note: Python 3.7 is highly compatible with the Python dependencies used in Plant Science pipelines.
-
Activate the new Anaconda environment.
conda activate ps37
-
Install Python dependencies.
conda install -c conda-forge ipython numpy gdal pandas matplotlib -y
conda install -c conda-forge asteval pyproj scikit-image opencv -y
conda install -c conda-forge rasterio fiona spectral openpyxl -y
conda install -c conda-forge laspy lazrs-python lastools python-pdal pdal -y
- Installation may take several minutes.
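Once the installs finish, a quick check that the active environment can find each package can save debugging later. The following is a minimal sketch, not part of the original manual: the mapping from conda package names to Python import names is an assumption, and `find_missing` is a hypothetical helper name. lastools is left out on the assumption that it provides command-line tools rather than a Python module.

```python
"""Check that the packages installed above are visible to the active Python."""
import importlib.util
from typing import Dict, List

# Assumed conda package -> import name mapping (not taken from the manual).
IMPORT_NAMES: Dict[str, str] = {
    "ipython": "IPython",
    "numpy": "numpy",
    "gdal": "osgeo",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "asteval": "asteval",
    "pyproj": "pyproj",
    "scikit-image": "skimage",
    "opencv": "cv2",
    "rasterio": "rasterio",
    "fiona": "fiona",
    "spectral": "spectral",
    "openpyxl": "openpyxl",
    "laspy": "laspy",
    "lazrs-python": "lazrs",
    "python-pdal": "pdal",
}


def find_missing(import_names: List[str]) -> List[str]:
    """Return the top-level module names that cannot be located.

    find_spec only searches for the module; it does not import it.
    """
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(list(IMPORT_NAMES.values()))
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Run it with the ps37 environment activated. A module reported as missing usually means the corresponding conda install step did not complete.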
-
Launch a terminal emulator and connect to the Negishi login node as instructed in 1. Manually create Anaconda environment.
-
Load the Anaconda module.
module load anaconda/2024.02-py311.lua
-
To proceed, either navigate to the directory containing the environment.yml file or download the file to your current working directory.
-
Create a new Anaconda environment and install required dependencies using the following command:
conda env create -f environment.yml
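For readers who do not have the file at hand, the sketch below shows what an environment.yml equivalent to the manual steps in this guide might look like. It is a hypothetical reconstruction from the package list above, not the project's actual file, which may pin different versions. `render_environment_yml` is an illustrative helper name. The script only prints the text and does not write or overwrite any file.

```python
"""Print a hypothetical environment.yml built from the manual install steps."""
from typing import List

# Values copied from the manual steps in this guide; the project's real
# environment.yml may differ (this is an assumption-laden reconstruction).
ENV_NAME = "ps37"
PYTHON_SPEC = "python=3.7.9"
PACKAGES: List[str] = [
    "ipython", "numpy", "gdal", "pandas", "matplotlib",
    "asteval", "pyproj", "scikit-image", "opencv",
    "rasterio", "fiona", "spectral", "openpyxl",
    "laspy", "lazrs-python", "lastools", "python-pdal", "pdal",
]


def render_environment_yml(name: str, python_spec: str,
                           packages: List[str]) -> str:
    """Return environment.yml text using conda-forge as the only channel."""
    lines = [
        "name: " + name,
        "channels:",
        "  - conda-forge",
        "dependencies:",
        "  - " + python_spec,
    ]
    lines.extend("  - " + package for package in packages)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_environment_yml(ENV_NAME, PYTHON_SPEC, PACKAGES), end="")
```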
- TODO
- Run Python and import laspy.
import laspy
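To see exactly why an import fails, it can help to catch the error and print its message instead of reading a full traceback. This is a minimal sketch and `try_import` is a hypothetical helper name. It catches only ImportError (which includes ModuleNotFoundError), so any other error still surfaces normally.

```python
"""Import a module and report whether it succeeded."""
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Return (success, message) for an attempted import."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Covers a missing package and a failed "from x import y" inside it.
        return False, "{}: {}".format(type(exc).__name__, exc)
    version = getattr(module, "__version__", "unknown version")
    return True, "{} {} imported".format(module_name, version)


if __name__ == "__main__":
    ok, message = try_import("laspy")
    print(message)
```

If the message names the queue module, that suggests the failure is the incompatibility this guide addresses rather than a missing installation.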
-
If the laspy import fails, the following steps describe a possible workaround for an incompatibility between laspy and the queue module.
-
Locate copc.py (Environment-Specific): The exact path to copc.py may vary depending on your Python installation and environment. As general guidance, use the command conda env list or which python to find the Python executable and its associated site-packages path, then navigate to that site-packages directory.
- Common locations for the site-packages directory include:
~/.conda/envs/<module_name>?/<env_name>/lib/python<version>/site-packages
- Once in the site-packages directory, search for the laspy package and locate the copc.py file within its subdirectories. Example:
~/.conda/envs/2024.02-py311/ps37/lib/python3.7/site-packages/laspy/copc.py
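As an alternative to browsing directories by hand, Python itself can report where a package is installed. The sketch below uses the standard-library `importlib.util.find_spec`, which locates a top-level package without executing it, so it works even when `import laspy` fails. `locate_package_file` is a hypothetical helper name, and the sketch assumes copc.py sits directly inside the laspy package directory, as in the example path above.

```python
"""Find a file inside an installed package without importing the package."""
import importlib.util
from pathlib import Path
from typing import Optional


def locate_package_file(package: str, filename: str) -> Optional[Path]:
    """Return the path to filename inside package, or None if not found."""
    spec = importlib.util.find_spec(package)
    # submodule_search_locations is None for plain modules (non-packages).
    if spec is None or spec.submodule_search_locations is None:
        return None
    for directory in spec.submodule_search_locations:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    path = locate_package_file("laspy", "copc.py")
    print(path if path is not None else "copc.py not found")
```

Run it with the target environment activated so that the search uses that environment's site-packages.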
-
Edit copc.py (with Caution ❗)
- Warning: Improper modification of installed package files can lead to unintended consequences. Proceed with caution and back up the file before making changes.
- Assuming you've found copc.py, open it in a text editor (Notepad, vim, etc.). Locate the following line:
from queue import Queue, SimpleQueue
- Comment out this line with #. Here's the modified code:
#from queue import Queue, SimpleQueue
- Before or after the commented-out line, add the following code:
from queue import Queue
- Run Python and import laspy
import laspy
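The manual edit described above can also be scripted, which makes the backup step hard to forget. This is a minimal sketch under stated assumptions: `patch_queue_import` is a hypothetical helper, it expects the import line to appear exactly as shown in this guide, and it writes the backup next to the original with a .bak suffix. It applies the same change as the manual steps and does not check whether SimpleQueue is used elsewhere in the file.

```python
"""Back up a file, then replace the queue import as described in this guide."""
import shutil
from pathlib import Path

OLD_IMPORT = "from queue import Queue, SimpleQueue"
NEW_IMPORT = "from queue import Queue"


def patch_queue_import(path: Path) -> bool:
    """Comment out OLD_IMPORT and add NEW_IMPORT right after it.

    Returns True if the file was changed, False if the line was not found
    (for example, because the file was already patched).
    """
    patched = []
    changed = False
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip() == OLD_IMPORT:
            # Preserve any leading indentation of the original line.
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(indent + "#" + line.lstrip())
            patched.append(indent + NEW_IMPORT)
            changed = True
        else:
            patched.append(line)
    if not changed:
        return False
    # Keep an untouched copy before overwriting the original.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text("\n".join(patched) + "\n", encoding="utf-8")
    return True
```

Call it with the path found in the previous step, for example `patch_queue_import(Path("…/site-packages/laspy/copc.py"))`, substituting your own path. To undo the change, copy the .bak file back over copc.py.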