Skip to content

Code used in the scoring of primers developed by the Verma Lab

License

Notifications You must be signed in to change notification settings

VermaLab/PrimerScoring

main
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

PrimerScoring

Contains scripts used to score primer sets.

Setup

Dependencies

  • Pandas
  • Numpy

Installation

Copy the PrimerScore.py file to either a valid location in the search path or to the current director.

Usage

Data formatting

The input data file must be in .xls format NOT .XLSX. Additionally, the file MUST be formmatted in the following manner, with each column representing the data for a single reaction for a specific primer set:

  • First row should contain primer set IDs in the form of {Target}.{SetID}. For example, orf1ab.2.
  • Second row should contain reaction type annotation.
    • For negative reaction, put '-' in the cell.
    • For positive reactions, put '+' in the cell.
  • Rows three and onward should contain data.

NOTE:

  1. The data should be rectangular; this means that all data is of the same length. This has not been tested for data of different lengths.
  2. There should be an equal amount of positive and negatives for a speific primer set.
  3. For proper scoring, each primer set should have the same number of replicates

Execution

Importing

If the PrimerScore.py file is located in a valid path or the current directory, simply execute the following line to import the module.

import PrimerScore

Initialization

To use the primer scoring method, the initialization method must first be called to set weights and tolerances.

To initialize with default weights, simply execute the following line of code:

PrimerScore.intialize()

This will initialize weights to the following value:

Metric Weight
Average Maximum Intensity 5
Max. Intensity Std. Dev. 5
Reaction Time Std. Dev. 10
Average Reaction Time 20
Number of False Positives 60

Additionally, the thresholds will be set at:

Threshold Value
Exponential Phase 3000
Plateau Phase 200
Positive Reaction Threshold 0.9
False Positive Threshold 0.2

Finally, the number of replicates will be set at 4:

Parameter Value
Number of Replicates 4
Instrument Saturation/Maximum Intensity 140000
Positive Threshold Percentage 0.1

The thresholds above are presently non-functional, but are retained for potential future use. Reaction time is determined as the maximum of the second derivative of the intensity over time.

To use custom weightings, execute the following line of code with the given arguments.

PrimerScoring.initialize(set_weights, set_thresholds, set_replicates, set_instrumentMax, set_threshold_perc)

The arguments in the above expression are defined as:

  • set_weights: An array containing weights for the following metrics in the following order:
    • Avg. Max. Intensity
    • Max. Intensity Std. Dev.
    • Reaction time Std. Dev.
    • Avg. Reaction Time
    • Number of False Positives
    • THE ARRAY MUST CONTAIN WEIGHTS FOR ALL OF THE ABOVE METRICS.
  • set_thresholds: An array contianing weights for the threshold tolerances of the exponential and plateau phases, positive reaction threshold, and false positive threshold, respectively. Whereas this value can be set, it is currently not in use.
  • set_replicates: An integer value greater than 3 for the number of replicates. Must be the same for all primer sets scored.

An example to initialize to default settings as above would be as follows:

PrimerScoring.intialize([5,5,10,20,60], [3000, 200, 0.9, 0.2], 4, 140000, 0.1)

Scoring

To calculate the scores for each primer, execute the following line of code with the given arguments:

PrimerScore.scorePrimers(primerData, output)

The arguments in the above line of code are defined as follow:

  • primerData: Path (relative or absolute) or file name of excel (.xls) file containing primer data formated properly as outlined above.
  • output: Output excel file (.xlsx) name or path containing primer scores and metrics. Please include .xslx extension.

An example to take an input file "WaterScore.xls" and write values to "PrimerScores_Water.xlsx" is as follows:

PrimerScore.scorePrimer('WaterScore.xls', 'PrimerScores_Water.xlsx')

Output

After scoring primers as outlined above, the output file will have the following column headers:

  • Primer Set: Column contains Primer Set ID (copied from input)
  • TruePos: Column contains the number of true positives (completed reaction)
  • Intensity_Avg: Calculated average maximum intensity.
  • Intensity_StdDev: Calculated maximum intensity standard deviation.
  • RxnTime_Avg: Calculated average reaction time.
  • RxnTime_StdDev: Calculated reaction time standard deviation.
  • False_Positives: Number of total false positives.
  • FP_...: There will be a false positive column for each replicate containing the contribution of that false positive to the overall score.
  • Overall Score: Calculated overall score for each primer set.

Methodology

Positive amplification detection

LAMP amplification reactions typically produce a sigmoidal amplification; however, given fluorometric methods typically have some background auto-flourescence or variable response over time, it is not sufficient to simply check for an increase in signal over time. To this end, the following methodology was used to determine a "positive amplification", regardless of designation (true positive or No Template Control (NTC)):

  1. The series containing the fluorometric reads over time was duplicated and reversed.
  2. The intersection of the series and the reversed series was determined by the time point at which the forward time-series first exceeded the reverse time-series.
  3. Two vectors were created, one for the forward time series and one for the reverse, using the following definition:
    • $\overrightarrow{\text{Forward/Reverse Data Vector}} = <\text{Time of Intersection }, y_{Intersection} - y_0>$
  4. The cosine between to two vectors was calculated
    • $\theta = \arccos(\frac{\overrightarrow{\text{Forward}}\cdot\overrightarrow{\text{Reverse}}}{|\overrightarrow{\text{Forward}}| | \overrightarrow{\text{Reverse}}|})$
  5. Using the understanding that $\cos(x) \approx 1$ if $x \approx 0 $, we check that the $\cos{\theta} > 0.95$ assuming a 95% error in the approximatation. Essentially, these steps are checking to see if our data is "flat" or relatively constant within error.
    • If it is, we will return that this reaction is not a positive amplification
    • If it is not, continue to step 6.
  6. Check if the maximum of the time series is above some threshold percentage of the maximum fluorescent intensity of the instrument, and if it does return a positive amplification.
    • Default parameters for maximum fluorescent intensity taken on an Analytik-Jena qTower 3G is approximately 140000 Relative Fluorescence Units.
    • Default threshold percentage is 10%.

Reaction time

Reaction time is determined as the maximum of the 2nd derivative of the fluorescent time series data. This is implemented using numpy.gradient.

Weighting of False Positives

False positives are undesireable in the context of the developed diagnostics and hence are weighted very strongly to filter out primer sets that produce false positives. Additionally, it is possible to have a one-off or rare occurrence false positive due to operator error or contamination, rather than an inherent interaction of the primers in the primer set, which should be strongly discouraged.

To this end, a "progressive" penalty for increasing occurrence of false positives was implemented to select for primer sets with less "persistent" false positives.

About

Code used in the scoring of primers developed by the Verma Lab

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Languages