
Setting up anaconda environment for Plant Science data pipelines


PlantScience/ps-conda-environment

ps-conda-environment



Three ways to get your environment ready for plant science data pipelines

  1. Manually create Anaconda environment

    • Construct an Anaconda environment step-by-step, customizing it to specific requirements.
    • This method offers granular control but requires careful package management.
    • Note: This approach has been successfully tested on Negishi (Linux-64) systems.
  2. Create Anaconda environment using environment.yml

    • Use an environment.yml file to define the environment's dependencies.
    • This approach is more automated and reproducible, especially for those using Negishi.
    • Note: This method has also been tested on Negishi (Linux-64) systems.
  3. Use prebuilt Apptainer/Docker container

    • Use a preconfigured Apptainer/Docker container that encapsulates the pipelines together with their environment.
    • This method is convenient for immediate use and portability.
    • It eliminates the need for manual environment setup and ensures consistency across different systems.

Important Considerations

  • Environment Compatibility: While these methods have been tested on Negishi (Linux-64), compatibility with other systems may vary. Users should be aware that successful installation and execution may depend on specific system configurations and package versions.
  • Anaconda and Python Version Compatibility: If manual environment setup is preferred, it's crucial to identify compatible versions of Anaconda and Python that support the required Python packages.
  • Apptainer/Docker Container Adoption: Because the prebuilt Apptainer container is still under development, check back for its availability and usage instructions. This approach promises a simplified, standardized environment for running plant science data pipelines.

By carefully considering these factors, users can effectively establish the necessary environment for their plant science data analysis tasks.

1. Manually create Anaconda environment

  • Launch a terminal emulator and connect to the Negishi login node.

    ssh [your_id]@negishi.rcac.purdue.edu
    • Replace your_id with your actual Purdue ID (e.g., oh231) and remove the square brackets.
    • Treat every other bracketed placeholder the same way: replace ["something"] with the appropriate value and remove the square brackets.
  • Load the Anaconda module and create a new Anaconda environment.

    module load anaconda/2024.02-py311.lua
    conda create --name ps37 python=3.7.9 -y
    • Note: Python 3.7 is used because it is compatible with the Python dependencies used in the Plant Science pipelines.
  • Activate the new Anaconda environment.

    conda activate ps37
  • Install Python dependencies.

    conda install -c conda-forge ipython numpy gdal pandas matplotlib -y
    conda install -c conda-forge laspy lastools rasterio fiona python-pdal pdal -y
    conda install -c conda-forge asteval pyproj scikit-image spectral opencv -y
    • Installation may take several minutes.
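To confirm that the installation succeeded, each package can be imported from the activated ps37 environment. The script below is a minimal sketch, not part of the original pipelines: the file name check_imports.py is hypothetical, and the mapping from conda package names to Python import names (for example opencv to cv2, scikit-image to skimage, gdal to osgeo.gdal, python-pdal to pdal) is an assumption based on the usual import names of these packages. lastools is left out because it provides command-line tools rather than a Python module.

```python
"""Check that the packages installed into the ps37 environment can be imported.

Hypothetical helper (check_imports.py); run it with the environment activated.
The conda-name to import-name mapping below is an assumption, not taken from
the pipelines themselves.
"""
import importlib
import sys
from typing import Dict, List

# Assumed mapping from conda package name to Python import name.
# lastools is omitted: it ships command-line tools, not a Python module.
PACKAGES: Dict[str, str] = {
    "ipython": "IPython",
    "numpy": "numpy",
    "gdal": "osgeo.gdal",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "laspy": "laspy",
    "rasterio": "rasterio",
    "fiona": "fiona",
    "python-pdal": "pdal",
    "asteval": "asteval",
    "pyproj": "pyproj",
    "scikit-image": "skimage",
    "spectral": "spectral",
    "opencv": "cv2",
}


def find_missing(modules: Dict[str, str]) -> List[str]:
    """Try to import each module; return the conda names of those that fail."""
    missing = []
    for conda_name, module_name in modules.items():
        try:
            importlib.import_module(module_name)
        except Exception as exc:  # ImportError, or any error raised while importing
            print("FAILED  {:<14} ({}): {}".format(conda_name, module_name, exc))
            missing.append(conda_name)
        else:
            print("ok      {:<14} ({})".format(conda_name, module_name))
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    failed = find_missing(PACKAGES)
    # Print a one-line summary of the results.
    print("All imports succeeded." if not failed else "Failed: " + ", ".join(failed))
```

Run it with python check_imports.py after conda activate ps37. A FAILED line for laspy is the case covered by the laspy verification section of this README.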

2. Create Anaconda environment using environment.yml

  • Launch a terminal emulator and connect to the Negishi login node as described in section 1 (Manually create Anaconda environment).

  • Load the Anaconda module.

    module load anaconda/2024.02-py311.lua
  • To proceed, either navigate to the directory containing the environment.yml file or download the file to your current working directory.

  • Create a new Anaconda environment and install required dependencies using the following command:

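    conda env create -f environment.yml
  • Activate the new environment, replacing env_name with the name field defined in environment.yml.

    conda activate [env_name]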
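With the environment activated (conda activate followed by the name from environment.yml), the sketch below checks that the running interpreter really belongs to it. It reads the CONDA_DEFAULT_ENV variable that conda activate sets (the environment name, or its path if it was activated by path). The file name check_env.py, the default name ps37, and the expected Python 3.7 are assumptions carried over from section 1; environment.yml may define a different name or Python version, so pass the actual name on the command line and adjust expected_python if needed.

```python
"""Confirm that the active interpreter belongs to the expected conda environment.

Hypothetical helper (check_env.py). Assumes conda sets CONDA_DEFAULT_ENV on
activation; the default name "ps37" and Python 3.7 come from section 1 and
may differ from what environment.yml defines.
"""
import os
import sys
from typing import List, Mapping, Optional, Tuple


def check_environment(expected_env: str,
                      expected_python: Tuple[int, int] = (3, 7),
                      environ: Optional[Mapping[str, str]] = None,
                      version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    environ = os.environ if environ is None else environ
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append("active conda env is {!r}, expected {!r}".format(active, expected_env))
    if version != expected_python:
        problems.append("Python {}.{} found, expected {}.{}".format(
            version[0], version[1], expected_python[0], expected_python[1]))
    return problems


if __name__ == "__main__":
    # Pass the name field from environment.yml; "ps37" is only a default.
    env_name = sys.argv[1] if len(sys.argv) > 1 else "ps37"
    issues = check_environment(env_name)
    if issues:
        print("\n".join(issues))
    else:
        print("OK: running in {!r} at {}".format(env_name, sys.prefix))
```

For example: python check_env.py [env_name]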

3. Use prebuilt Apptainer/Docker container

  • TODO

Verify laspy dependency and potential workaround

  • Run Python and import laspy.

    import laspy
  • If the laspy import fails, the following steps describe a possible workaround for an incompatibility between laspy and Python's queue module.

  • Locate copc.py (environment-specific): The exact path to copc.py depends on your Python installation and environment. With the environment activated, run conda env list or which python to find where the environment and its Python executable are located, then navigate to that environment's site-packages directory.

    • Common locations for the site-packages directory include:
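    ~/.conda/envs/<module_name>/<env_name>/lib/python<version>/site-packages
    • Here <module_name> is the Anaconda module directory (e.g., 2024.02-py311) and may be absent in some setups.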
    • Once in the site-packages directory, open the laspy package directory and locate copc.py. For example:
    ~/.conda/envs/2024.02-py311/ps37/lib/python3.7/site-packages/laspy/copc.py
  • Edit copc.py (with Caution ❗)

    • Warning: Improper modification of installed package files can lead to unintended consequences. Proceed with caution and back up the file before making changes.
    • Once you've found copc.py, open it in a text editor (e.g., vim or nano) and locate the following line:
    from queue import Queue, SimpleQueue
    • Comment out this line with #. Here's the modified code:
    #from queue import Queue, SimpleQueue
    • Before or after the commented-out line, add the following code:
    from queue import Queue
    • Run Python and import laspy again to confirm that the import now succeeds.
    import laspy
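The manual edit above can also be scripted. The sketch below is one possible approach, not part of laspy or of this repository; locate_package_file and patch_queue_import are hypothetical helpers. It finds copc.py with importlib.util.find_spec, which locates the laspy package directory without running laspy's own import code (so it works even while import laspy fails), then applies the same edit, saving a copc.py.bak backup first. It assumes the line appears exactly as from queue import Queue, SimpleQueue. By default it only reports the path; pass --apply to modify the file. Like the manual edit, it removes SimpleQueue from the imports, so any use of SimpleQueue elsewhere in copc.py would still fail.

```python
"""Locate laspy's copc.py without importing laspy, and optionally patch it.

Hypothetical helper script; the patch mirrors the manual steps in this README.
"""
import importlib.util
import shutil
import sys
from pathlib import Path
from typing import Optional

OLD_LINE = "from queue import Queue, SimpleQueue"


def locate_package_file(package: str, filename: str) -> Optional[Path]:
    """Find `filename` inside an installed top-level package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


def patch_queue_import(path: Path) -> bool:
    """Comment out OLD_LINE and add `from queue import Queue` below it.

    Keeps the original indentation, writes a .bak copy before changing the
    file, and returns False (leaving the file untouched) if OLD_LINE is absent.
    """
    patched = []
    changed = False
    for line in path.read_text().splitlines(keepends=True):
        if line.strip() == OLD_LINE:
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(indent + "#" + OLD_LINE + "\n")
            patched.append(indent + "from queue import Queue\n")
            changed = True
        else:
            patched.append(line)
    if changed:
        shutil.copy2(str(path), str(path.with_name(path.name + ".bak")))  # back up first
        path.write_text("".join(patched))
    return changed


if __name__ == "__main__":
    copc = locate_package_file("laspy", "copc.py")
    if copc is None:
        print("laspy/copc.py was not found in the active environment")
    elif "--apply" not in sys.argv:
        print("Found {}; rerun with --apply to patch it".format(copc))
    elif patch_queue_import(copc):
        print("Patched {} (backup saved as {}.bak)".format(copc, copc.name))
    else:
        print("{!r} not found in {}; no changes made".format(OLD_LINE, copc))
```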
