Contains scripts used to score primer sets.
- Pandas
- Numpy
Copy the PrimerScore.py file to either a valid location in the search path or to the current director.
The input data file must be in .xls format NOT .XLSX. Additionally, the file MUST be formmatted in the following manner, with each column representing the data for a single reaction for a specific primer set:
- First row should contain primer set IDs in the form of {Target}.{SetID}. For example, orf1ab.2.
- Second row should contain reaction type annotation.
- For negative reaction, put '-' in the cell.
- For positive reactions, put '+' in the cell.
- Rows three and onward should contain data.
NOTE:
- The data should be rectangular; this means that all data is of the same length. This has not been tested for data of different lengths.
- There should be an equal amount of positive and negatives for a speific primer set.
- For proper scoring, each primer set should have the same number of replicates
If the PrimerScore.py file is located in a valid path or the current directory, simply execute the following line to import the module.
import PrimerScore
To use the primer scoring method, the initialization method must first be called to set weights and tolerances.
To initialize with default weights, simply execute the following line of code:
PrimerScore.intialize()
This will initialize weights to the following value:
Metric | Weight |
---|---|
Average Maximum Intensity | 5 |
Max. Intensity Std. Dev. | 5 |
Reaction Time Std. Dev. | 10 |
Average Reaction Time | 20 |
Number of False Positives | 60 |
Additionally, the thresholds will be set at:
Threshold | Value |
---|---|
Exponential Phase | 3000 |
Plateau Phase | 200 |
Positive Reaction Threshold | 0.9 |
False Positive Threshold | 0.2 |
Finally, the number of replicates will be set at 4:
Parameter | Value |
---|---|
Number of Replicates | 4 |
Instrument Saturation/Maximum Intensity | 140000 |
Positive Threshold Percentage | 0.1 |
The thresholds above are presently non-functional, but are retained for potential future use. Reaction time is determined as the maximum of the second derivative of the intensity over time.
To use custom weightings, execute the following line of code with the given arguments.
PrimerScoring.initialize(set_weights, set_thresholds, set_replicates, set_instrumentMax, set_threshold_perc)
The arguments in the above expression are defined as:
- set_weights: An array containing weights for the following metrics in the following order:
- Avg. Max. Intensity
- Max. Intensity Std. Dev.
- Reaction time Std. Dev.
- Avg. Reaction Time
- Number of False Positives
- THE ARRAY MUST CONTAIN WEIGHTS FOR ALL OF THE ABOVE METRICS.
- set_thresholds: An array contianing weights for the threshold tolerances of the exponential and plateau phases, positive reaction threshold, and false positive threshold, respectively. Whereas this value can be set, it is currently not in use.
- set_replicates: An integer value greater than 3 for the number of replicates. Must be the same for all primer sets scored.
An example to initialize to default settings as above would be as follows:
PrimerScoring.intialize([5,5,10,20,60], [3000, 200, 0.9, 0.2], 4, 140000, 0.1)
To calculate the scores for each primer, execute the following line of code with the given arguments:
PrimerScore.scorePrimers(primerData, output)
The arguments in the above line of code are defined as follow:
- primerData: Path (relative or absolute) or file name of excel (.xls) file containing primer data formated properly as outlined above.
- output: Output excel file (.xlsx) name or path containing primer scores and metrics. Please include .xslx extension.
An example to take an input file "WaterScore.xls" and write values to "PrimerScores_Water.xlsx" is as follows:
PrimerScore.scorePrimer('WaterScore.xls', 'PrimerScores_Water.xlsx')
After scoring primers as outlined above, the output file will have the following column headers:
- Primer Set: Column contains Primer Set ID (copied from input)
- TruePos: Column contains the number of true positives (completed reaction)
- Intensity_Avg: Calculated average maximum intensity.
- Intensity_StdDev: Calculated maximum intensity standard deviation.
- RxnTime_Avg: Calculated average reaction time.
- RxnTime_StdDev: Calculated reaction time standard deviation.
- False_Positives: Number of total false positives.
- FP_...: There will be a false positive column for each replicate containing the contribution of that false positive to the overall score.
- Overall Score: Calculated overall score for each primer set.
LAMP amplification reactions typically produce a sigmoidal amplification; however, given fluorometric methods typically have some background auto-flourescence or variable response over time, it is not sufficient to simply check for an increase in signal over time. To this end, the following methodology was used to determine a "positive amplification", regardless of designation (true positive or No Template Control (NTC)):
- The series containing the fluorometric reads over time was duplicated and reversed.
- The intersection of the series and the reversed series was determined by the time point at which the forward time-series first exceeded the reverse time-series.
- Two vectors were created, one for the forward time series and one for the reverse, using the following definition:
$\overrightarrow{\text{Forward/Reverse Data Vector}} = <\text{Time of Intersection }, y_{Intersection} - y_0>$
- The cosine between to two vectors was calculated
$\theta = \arccos(\frac{\overrightarrow{\text{Forward}}\cdot\overrightarrow{\text{Reverse}}}{|\overrightarrow{\text{Forward}}| | \overrightarrow{\text{Reverse}}|})$
- Using the understanding that
$\cos(x) \approx 1$ if$x \approx 0 $ , we check that the$\cos{\theta} > 0.95$ assuming a 95% error in the approximatation. Essentially, these steps are checking to see if our data is "flat" or relatively constant within error.- If it is, we will return that this reaction is not a positive amplification
- If it is not, continue to step 6.
- Check if the maximum of the time series is above some threshold percentage of the maximum fluorescent intensity of the instrument, and if it does return a positive amplification.
- Default parameters for maximum fluorescent intensity taken on an Analytik-Jena qTower 3G is approximately 140000 Relative Fluorescence Units.
- Default threshold percentage is 10%.
Reaction time is determined as the maximum of the 2nd derivative of the fluorescent time series data. This is implemented using numpy.gradient
.
False positives are undesireable in the context of the developed diagnostics and hence are weighted very strongly to filter out primer sets that produce false positives. Additionally, it is possible to have a one-off or rare occurrence false positive due to operator error or contamination, rather than an inherent interaction of the primers in the primer set, which should be strongly discouraged.
To this end, a "progressive" penalty for increasing occurrence of false positives was implemented to select for primer sets with less "persistent" false positives.