diff --git a/PrimerInclusivity/LAMPAnalysisWorkflow.ipynb b/PrimerInclusivity/LAMPAnalysisWorkflow.ipynb new file mode 100644 index 0000000..6cb4a60 --- /dev/null +++ b/PrimerInclusivity/LAMPAnalysisWorkflow.ipynb @@ -0,0 +1,666 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# LAMP Primer Inclusivity Workflow\n", + "\n", + "This code contains v1.0 of the LAMP inclusivity workflow.
\n", + "Author: Josiah Davidson
\n", + "Release data: 6 Jan 2026\n", + "\n", + "\n", + "

© 2026 - Purdue Research Foundation. LAMP Primer Inclusivity Workflow by Josiah Davidson is licensed under CC BY-NC 4.0

" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Accessing and Condensing target query sequences\n", + "\n", + "### Dependenices\n", + "This code is intended to be used on a linux OS running python 3.9 or above. The code has not been tested for other configurations. \n", + "\n", + "To properly function, this notebook requires the following dependencies:\n", + "- [Biopython](https://biopython.org/wiki/Download) (for handling sequences)\n", + "- msa.sh (for alignment of sequences; contained inside [BBMap](https://sourceforge.net/projects/bbmap/))\n", + "- [pandas](https://pypi.org/project/pandas/) (for construction of dataframes)\n", + "- [xlwt](https://pypi.org/project/xlwt/) (for writing of excel files)\n", + "\n", + "### Configuration of query sequences\n", + "Query sequences for this analysis should be placed inside the folder `data\\query`. The following script wil then concatenate all fasta files together into one ouput `QUERY_LIBRARY.fasta` file in the `data` folder. Ensure files are named by intended target as this will be used when sorting sequences for inclusivity.\n", + "\n", + "Note: This script does not differentiate between target type. Therefore, ensure that all primers are intentionally aligned to sequences in this folder. This is of particular importance for closely related sequences when one or more primers may align to unintended target sequences." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "from Bio import SeqIO\n", + "\n", + "def build_query_library(query_fp: str = \"data/query\", output_fp: str = \"data/QUERY_LIBRARY.fasta\") -> None:\n", + " \"\"\"\n", + " Concatenate all FASTA files in data/query into data/QUERY_LIBRARY.fasta,\n", + " and append the source filename (without extension) to each sequence header.\n", + " Prints a confirmation upon completion.\n", + " \"\"\"\n", + " #Define query paths\n", + " query_dir = Path(query_fp)\n", + " output_file = Path(output_fp)\n", + " output_file.parent.mkdir(parents=True, exist_ok=True)\n", + "\n", + " #Define fasta extensions\n", + " fasta_exts = {\".fasta\", \".fa\", \".fna\", \".fas\"}\n", + " fasta_files = sorted(f for f in query_dir.iterdir() if f.is_file() and f.suffix.lower() in fasta_exts)\n", + "\n", + " #Return if no FASTA files found\n", + " if not fasta_files:\n", + " print(\"No FASTA files found in data/query.\")\n", + " return\n", + "\n", + " #Loop through all fasta files in directory\n", + " records_out = []\n", + " for fasta in fasta_files:\n", + " #Define stem as the filename which will be used to sort by target later one.\n", + " stem = fasta.stem # filename without extension\n", + "\n", + " #Loop through all records in fasta file\n", + " for rec in SeqIO.parse(fasta, \"fasta\"):\n", + " old_id = rec.id\n", + " new_id = f\"{stem}|{old_id}\"\n", + " rec.id = new_id\n", + " rec.name = new_id\n", + " # If description starts with old_id, replace that prefix; otherwise append.\n", + " if rec.description.startswith(old_id):\n", + " rec.description = new_id + rec.description[len(old_id):]\n", + " else:\n", + " rec.description = f\"{new_id} {rec.description}\"\n", + " records_out.append(rec)\n", + "\n", + " SeqIO.write(records_out, output_file, \"fasta\")\n", + " print(f\"✅ Concatenated {len(fasta_files)} files into {output_file}\")\n", + " print(f\"Total sequences written: {len(records_out)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + 
"metadata": {}, + "outputs": [], + "source": [ + "build_query_library()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Configuration of primers\n", + "\n", + "Place all primer files in fasta format in the folder `data\\primers`. The following script will then concatenate all fasta files together into one ouput `PRIMER_LIBRARY.fasta` file in the `data` folder.\n", + "\n", + "After creating the resulting compiled `fasta` file, we will read these into a dictionary. \n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def create_primer_dict(fasta_path: str) -> dict[str, str]:\n", + " \"\"\"Return {primer_name: sequence} from a FASTA file.\"\"\"\n", + " return {rec.id: str(rec.seq) for rec in SeqIO.parse(fasta_path, \"fasta\")}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "output_fasta_path = \"[INSERT FULL PATH OF OUTPUT FILE HERE]\"\n", + "primerDict = create_primer_dict(output_fasta_path)\n", + "primerDict" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Align primers to sequences\n", + "\n", + "Primer identity can be returned by looking at the CIGAR string from the `.sam` file resulting from the ouput of `msa.sh` in `BBMap`. For more details on the installation of `msa.sh`, see the [BBMap documentation](https://sourceforge.net/projects/bbmap/).
\n", + "\n", + "To run `msa.sh` on multiple primers, execute the following code (**Note: Primer library and query library must be defined as above**):" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "from Bio import SeqIO\n", + "import subprocess, shlex\n", + "from concurrent.futures import ThreadPoolExecutor, as_completed\n", + "\n", + "\"\"\"\n", + "Align each primer with the query library and save output\n", + "\"\"\"\n", + "def run_msa_from_fasta(primers_library, query_library, outdir,msa_path, max_workers=None):\n", + " # Ensure output directory exists\n", + " Path(outdir).mkdir(parents=True, exist_ok=True)\n", + "\n", + " # Load primers from input fasta library\n", + " primers = [(rec.id, str(rec.seq)) for rec in SeqIO.parse(primers_library, \"fasta\")]\n", + "\n", + " # Get total number of primers.\n", + " total = len(primers)\n", + " print(f\"▶ Starting {total} mappings from {primers_library}...\")\n", + "\n", + " # Construct commands to run msa.sh against all primers and targets\n", + " cmds = {\n", + " f\"{msa_path} in={shlex.quote(query_library)} literal={shlex.quote(seq)} \"\n", + " f\"out={shlex.quote(f'{outdir}/{name}_Mapping.sam')}\": name\n", + " for name, seq in primers\n", + " }\n", + "\n", + " # Execute msa.sh in parallel\n", + " with ThreadPoolExecutor(max_workers=max_workers) as ex:\n", + " futures = {ex.submit(subprocess.run, cmd, shell=True): name for cmd, name in cmds.items()}\n", + " for i, fut in enumerate(as_completed(futures), 1):\n", + " name = futures[fut]\n", + " code = fut.result().returncode\n", + " status = \"✅\" if code == 0 else \"❌\"\n", + " print(f\"[{i}/{total}] {status} {name}\")\n", + "\n", + " print(f\"🏁 Completed all {total} mappings → {outdir}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "Modify the below paths to point to the correct input/output. \n", + "\n", + "primers_fasta := FASTA file containing all primers\n", + "query_fasta := FASTA file containing all targets for query\n", + "output_dir := Output directory for output alignment files (SAM)\n", + "msa_path := Path to msa.sh script\n", + "\n", + "max_workers := Number of threads to use (defaults to 10, change in function header to modify.)\n", + "\"\"\"\n", + "\n", + "\n", + "primers_fasta = \"[INSERT FULL PATH HERE]\"\n", + "query_fasta = \"[INSERT FULL PATH HERE]\"\n", + "output_dir = \"[INSERT FULL PATH HERE]\"\n", + "msa_progPath = \"[INSERT FULL PATH HERE]\"\n", + "run_msa_from_fasta(primers_fasta, query_fasta, output_dir, msa_progPath, max_workers=10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Alignment processing\n", + "\n", + "The following code processes all output SAM files to cacluate inclusivity. For proper functioning, do not modify output file paths or file names." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path\n", + "import pandas as pd\n", + "\n", + "# Parse all .sam files in sam_dir and return a DataFrame\n", + "def parse_sam_directory(sam_dir: str | Path) -> pd.DataFrame:\n", + " \"\"\"\n", + " Parse all .sam files in sam_dir and return a DataFrame with columns:\n", + " Primer Set, Primer Type, Query, Accession, CIGAR, sequence.\n", + "\n", + " Rules:\n", + " - Ignore first line (header)\n", + " - entry[2]: split at '|'\n", + " * before '|' → Query\n", + " * between first '|' and first ' ' → Accession\n", + " - A line is considered if Query == \".\" from file name split by '.'.\n", + " - Primer Set = everything before last '.' in \n", + " - Primer Type = everything after last '.' in \n", + " \"\"\"\n", + " # Construct path for input SAM directory\n", + " sam_dir = Path(sam_dir)\n", + "\n", + " # Construct empty array for iterative creation of rows in DF.\n", + " rows = []\n", + "\n", + " # Loop through all SAM files in alphabetical order\n", + " for sam_path in sorted(sam_dir.glob(\"*.sam\")):\n", + "\n", + " # Extract file name from path\n", + " fname = sam_path.name\n", + " print(f\"Now processing {fname}\")\n", + "\n", + " # Split file name along delimiter according to naming format (\".\")\n", + " parts_dot = fname.split(\".\")\n", + "\n", + " # Skip improperly named files\n", + " if len(parts_dot) < 2:\n", + " continue\n", + "\n", + " # Extract target from filename\n", + " file_key = f\"{parts_dot[0]}.{parts_dot[1]}\" # e.g., BAV-3.E1b\n", + "\n", + " # Extract primer name and primer type from file name\n", + " primer_full = fname.split(\"_\", 1)[0] # e.g., BAV-3.E1b.1_B2\n", + "\n", + " # Extract primer name components\n", + " primer_parts = primer_full.split(\".\")\n", + "\n", + " # Construct primer_set and primer type from primer name components.\n", + " if len(primer_parts) > 1:\n", + " primer_set = \".\".join(primer_parts[:-1])\n", + " primer_type = primer_parts[-1]\n", + " else:\n", + " primer_set = primer_full\n", + " primer_type = \"\"\n", + "\n", + " # Open SAM file for reading\n", + " with sam_path.open(\"r\", encoding=\"utf-8\", errors=\"replace\") as f:\n", + " # Skip header row\n", + " next(f, None)\n", + "\n", + " # loop through all lines\n", + " for line in f:\n", + " # Skip empty lines\n", + " if not line.strip():\n", + " continue\n", + "\n", + " # Separate by fields\n", + " fields = line.rstrip(\"\\n\").split(\"\\t\")\n", + "\n", + " # Skip improperly formatted SAM files for alignments\n", + " if len(fields) <= 9:\n", + " continue\n", + "\n", + " # Extract entry and query\n", + " entry2 = fields[2]\n", + " query = entry2.split(\"|\", 1)[0].strip()\n", + "\n", + " # Skip queries not part of current primer target\n", + " if query != file_key:\n", + " continue\n", + "\n", + " # Split name from primer set\n", + " after_pipe = entry2.split(\"|\", 1)[1] if \"|\" in entry2 else \"\"\n", + "\n", + " # Extract accession field, CIGAR string for alignment, and sequence fields for determination of sequences\n", + " accession = after_pipe.split(\" \", 1)[0] if after_pipe else \"\"\n", + " cigar = fields[5]\n", + " seq = fields[9]\n", + "\n", + " # Append rows to dataframe\n", + " rows.append({\n", + " \"Primer Set\": primer_set,\n", + " \"Primer Type\": primer_type,\n", + " \"Query\": query,\n", + " \"Accession\": accession,\n", + " \"CIGAR\": cigar,\n", + " \"sequence\": seq\n", + " })\n", + "\n", + " # Return dataframe\n", + " return 
{ + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\"\"\"\n", + "Modify the path below to point to the input directory containing the SAM alignment files.\n", + "\"\"\"\n", + "input_sam_dir = '[INSERT FULL PATH HERE]'\n", + "alignment_df = parse_sam_directory(input_sam_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code pivots the processed alignments so that each (primer set, accession) pair is matched with all available primer alignments for that pair." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "# Define primers for a LAMP primer set. In theory, this can also be modified to account for other primer set types.\n", + "PRIMERS = [\"F3\",\"B3\",\"F2\",\"F1C\",\"B1C\",\"B2\",\"LF\",\"LB\"]\n", + "\n", + "def pivot_cigar_table(df: pd.DataFrame) -> pd.DataFrame:\n", + "    \"\"\"\n", + "    Create a wide table:\n", + "      Columns: ['Primer Set','Accession'] + one column per Primer Type\n", + "      Values: CIGAR string for each (Primer Set, Accession, Primer Type)\n", + "\n", + "    If duplicates exist per cell, the first CIGAR is taken.\n", + "    Missing entries are left blank.\n", + "    \"\"\"\n", + "    # Pivot to a wide table with one column per primer type, then reset the index.\n", + "    wide = (\n", + "        df.pivot_table(\n", + "            index=[\"Primer Set\", \"Accession\"],\n", + "            columns=\"Primer Type\",\n", + "            values=\"CIGAR\",\n", + "            aggfunc=\"first\",\n", + "        )\n", + "        .reindex(columns=PRIMERS, fill_value=None)  # ensure all primer columns exist, in order\n", + "        .reset_index()\n", + "    )\n", + "\n", + "    # Replace NaNs with empty strings for cleaner display/CSV\n", + "    return wide.fillna(\"\")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we will write the DataFrame to a tab-separated file so that the alignment parsing does not need to be rerun later; a sketch for reloading it appears after the next cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Note: alignment_df must still be defined, i.e. the kernel has not been restarted since the cells above were run.\n", + "output_csv_path = '[INSERT FULL PATH HERE]'\n", + "\n", + "# Construct pivot table, save to a tab-separated file, and display.\n", + "alignment_pivot_df = pivot_cigar_table(alignment_df)\n", + "alignment_pivot_df.to_csv(output_csv_path, sep='\\t')\n", + "alignment_pivot_df" + ] + }, + 
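{ + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Optional sketch (an assumed convenience, not part of the original workflow):\n", + "# reload a previously saved pivot table instead of re-running the parsing above.\n", + "alignment_pivot_df = pd.read_csv(output_csv_path, sep=\"\\t\", index_col=0).fillna(\"\")\n", + "alignment_pivot_df" + ] + }, + 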
\n", + "\n", + "`conda install -c bioconda cigar`
\n", + "\n", + "Instructions for usage can be found __[here.](https://github.com/brentp/cigar)__" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "from cigar import Cigar" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We must first calculate the number of sequences that we have aligned our primer sets to for each primer set. Ideally, this will correspond to the number of sequences in our `QUERY_LIBRARY.fasta` file for each target, but this may not necessarily be the case if each all primers in a primer set did not align with a given query sequence (as may be the case for very low sequence identity)." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "# Define function to count primer sets given input dataframe. Returns dataframe with count.\n", + "def count_primer_sets(df: pd.DataFrame) -> pd.DataFrame:\n", + " return (\n", + " df[\"Primer Set\"]\n", + " .value_counts(dropna=False)\n", + " .rename_axis(\"Primer Set\")\n", + " .reset_index(name=\"count\")\n", + " .sort_values(\"Primer Set\", ignore_index=True)\n", + " )\n", + "\n", + "df_Counts = count_primer_sets(alignment_pivot_df)\n", + "df_Counts" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have the total count we now need to determine the number of mismatches from the CIGAR string." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "# Helper function to count mismatches. Given the operations of msa.sh, this function counts the matches and subtracts it from the total length to get the mismatches. Using CIGAR functions to count mismatches does not work.\n", + "def getTotalMismatches_CountM(cigarString, primerLen):\n", + " totalMatches = 0\n", + " # Split cigar string into components and loop through all components.\n", + " for item in list(Cigar(cigarString).items()):\n", + " # If component is an M or = character, add it to the total.\n", + " if (item[1] == \"=\") or (item[1] == \"M\"):\n", + " totalMatches = totalMatches + item[0]\n", + " # Return total mismatches as primer length minus the total matches.\n", + " return primerLen - totalMatches" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import Callable, Mapping, Sequence\n", + "\n", + "# Build mismatch table by using callable.\n", + "def build_mismatch_table(\n", + " df: pd.DataFrame,\n", + " primerDict: Mapping[str, str],\n", + " PRIMERS: Sequence[str],\n", + " getTotalMismatches_CountM: Callable[[str, int], int],\n", + ") -> pd.DataFrame:\n", + " out = df[[\"Primer Set\", \"Accession\"]].copy()\n", + "\n", + " def _cell_mm(row, p):\n", + " cigar = row.get(p, \"\")\n", + " if not isinstance(cigar, str) or cigar == \"\":\n", + " return pd.NA\n", + " key = f\"{row['Primer Set']}.{p}\"\n", + " seq = primerDict.get(key)\n", + " if not seq:\n", + " return pd.NA\n", + " return getTotalMismatches_CountM(cigar, len(seq))\n", + "\n", + " for p in PRIMERS:\n", + " out[p] = df.apply(lambda r, p=p: _cell_mm(r, p), axis=1)\n", + "\n", + " numeric = out[PRIMERS].apply(pd.to_numeric, errors=\"coerce\")\n", + " out[\"Total_Mismatches\"] = numeric.fillna(0).sum(axis=1).astype(int)\n", + " return out" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# Construct mismatch table using previously define 
{ + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import Callable, Mapping, Sequence\n", + "\n", + "# Build the mismatch table using the supplied mismatch-counting callable.\n", + "def build_mismatch_table(\n", + "    df: pd.DataFrame,\n", + "    primerDict: Mapping[str, str],\n", + "    PRIMERS: Sequence[str],\n", + "    getTotalMismatches_CountM: Callable[[str, int], int],\n", + ") -> pd.DataFrame:\n", + "    out = df[[\"Primer Set\", \"Accession\"]].copy()\n", + "\n", + "    def _cell_mm(row, p):\n", + "        cigar = row.get(p, \"\")\n", + "        if not isinstance(cigar, str) or cigar == \"\":\n", + "            return pd.NA\n", + "        key = f\"{row['Primer Set']}.{p}\"\n", + "        seq = primerDict.get(key)\n", + "        if not seq:\n", + "            return pd.NA\n", + "        return getTotalMismatches_CountM(cigar, len(seq))\n", + "\n", + "    for p in PRIMERS:\n", + "        out[p] = df.apply(lambda r, p=p: _cell_mm(r, p), axis=1)\n", + "\n", + "    numeric = out[PRIMERS].apply(pd.to_numeric, errors=\"coerce\")\n", + "    out[\"Total_Mismatches\"] = numeric.fillna(0).sum(axis=1).astype(int)\n", + "    return out" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "# Construct mismatch table using previously defined variables.\n", + "mm_table = build_mismatch_table(alignment_pivot_df, primerDict, PRIMERS, getTotalMismatches_CountM)\n", + "mm_table" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Construct primer set total counts across all sequences\n", + "\n", + "The following code counts the number of sequences with a given number of total mismatches, binned from 0 to 20+. It then constructs a second dataframe giving, for each target, the proportion of sequences with a given number of mismatches; a sketch deriving cumulative inclusivity follows these two cells.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "# Summarize the mismatch table across all sequences for each primer set.\n", + "def summarize_mismatch_histogram(mm_table: pd.DataFrame) -> pd.DataFrame:\n", + "    \"\"\"\n", + "    Build a summary table per Primer Set:\n", + "      Columns: ['Primer Set', 'NumSeqs', '0','1',...,'19','20+']\n", + "    - NumSeqs = total rows in mm_table for that Primer Set (including NaNs)\n", + "    - 'k' = count of rows with Total_Mismatches == k\n", + "    - '20+' = count of rows with Total_Mismatches >= 20\n", + "    \"\"\"\n", + "    mis_col = \"Total_Mismatches\" if \"Total_Mismatches\" in mm_table.columns else \"Total_Mismatch\"\n", + "    if mis_col not in mm_table.columns:\n", + "        raise KeyError(\"Mismatch column not found. Expected 'Total_Mismatches' or 'Total_Mismatch'.\")\n", + "\n", + "    # Total rows per set (including rows with NaN mismatches)\n", + "    numseqs = mm_table.groupby(\"Primer Set\").size()\n", + "\n", + "    # Histogram bins per set (exclude NaNs for binning)\n", + "    binned = (\n", + "        mm_table.assign(_mm=pd.to_numeric(mm_table[mis_col], errors=\"coerce\"))\n", + "        .dropna(subset=[\"_mm\"])\n", + "        .assign(bin=lambda d: d[\"_mm\"].where(d[\"_mm\"] < 20, 20).astype(int))\n", + "        .groupby([\"Primer Set\", \"bin\"])\n", + "        .size()\n", + "        .unstack(fill_value=0)\n", + "    )\n", + "\n", + "    # Ensure all desired columns exist (0..19, 20)\n", + "    desired_bins = list(range(20)) + [20]\n", + "    binned = binned.reindex(columns=desired_bins, fill_value=0)\n", + "\n", + "    # Assemble result\n", + "    out = binned.copy()\n", + "    out.insert(0, \"NumSeqs\", numseqs)  # aligned by index\n", + "    # Rename columns to strings and '20' -> '20+'\n", + "    rename_map = {i: str(i) for i in range(20)}\n", + "    rename_map[20] = \"20+\"\n", + "    out = out.rename(columns=rename_map).reset_index()\n", + "\n", + "    # Column order\n", + "    ordered_cols = [\"Primer Set\", \"NumSeqs\"] + [str(i) for i in range(20)] + [\"20+\"]\n", + "    return out[ordered_cols]\n", + "\n", + "# Construct mismatch histogram dataframe and display.\n", + "mm_hist = summarize_mismatch_histogram(mm_table)\n", + "mm_hist\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [], + "source": [ + "# Convert mismatch histogram counts to proportions\n", + "def mismatch_histogram_proportions(mm_hist: pd.DataFrame) -> pd.DataFrame:\n", + "    \"\"\"\n", + "    Convert mismatch histogram counts to proportions per Primer Set.\n", + "    Keeps ['Primer Set','NumSeqs'] and divides each mismatch bin by NumSeqs.\n", + "    \"\"\"\n", + "    out = mm_hist.copy()\n", + "    mm_cols = [c for c in ([str(i) for i in range(20)] + [\"20+\"]) if c in out.columns]\n", + "    out[mm_cols] = out[mm_cols].apply(pd.to_numeric, errors=\"coerce\")\n", + "    denom = out[\"NumSeqs\"].replace(0, pd.NA).astype(float)\n", + "    out[mm_cols] = out[mm_cols].div(denom, axis=0).fillna(0.0)\n", + "    return out[[\"Primer Set\", \"NumSeqs\"] + mm_cols]\n", + "\n", + "# Construct dataframe and display\n", + "mm_inclusivity = mismatch_histogram_proportions(mm_hist)\n", + "mm_inclusivity" + ] + }, + 
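{ + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Optional sketch (an assumed convenience, not part of the original workflow):\n", + "# cumulative inclusivity, i.e. the fraction of sequences per primer set with at\n", + "# most k total mismatches, obtained as a running sum across the mismatch bins.\n", + "mm_cols = [str(i) for i in range(20)] + [\"20+\"]\n", + "mm_cumulative = mm_inclusivity.copy()\n", + "mm_cumulative[mm_cols] = mm_cumulative[mm_cols].cumsum(axis=1)\n", + "mm_cumulative" + ] + }, + 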
\"NumSeqs\"] + mm_cols]\n", + "\n", + "# Construct dataframe and display\n", + "mm_inclusivity = mismatch_histogram_proportions(mm_hist)\n", + "mm_inclusivity" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import xlwt\n", + "\n", + "# Modify the variable below to define output file path\n", + "output_excel_path = '[INSERT FULL PATH HERE]'\n", + "\n", + "# Save to excel\n", + "with pd.ExcelWriter(output_excel_path) as writer:\n", + " mm_hist.to_excel(writer, sheet_name='SequenceNumber')\n", + " mm_inclusivity.to_excel(writer, sheet_name='Percentage')" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "biopy", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.16" + }, + "orig_nbformat": 4 + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/PrimerInclusivity/README.md b/PrimerInclusivity/README.md new file mode 100644 index 0000000..95f98e4 --- /dev/null +++ b/PrimerInclusivity/README.md @@ -0,0 +1,5 @@ + # Primer Inclusivity + +This folder contains a jupyter notebook documenting the primer inclusivity calculation workflow. Please refer to the jupyter notebook (`LAMPAnalysisWorkflow.ipynb`) for details on usage and setup. + +

© 2026 - Purdue Research Foundation. Primer Inclusivity Documentation by Josiah Davidson is licensed under CC BY-NC 4.0

\ No newline at end of file