Completed methodology of documentation

VermaLab · Apr 2, 2025 · db7380e · db7380e
1 parent 5c8e6d8
commit db7380e
Showing 1 changed file with 26 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -115,6 +115,9 @@ The arguments in the above line of code are defined as follow:
 
 # Methodology
 
+## Nomenclature
+The weights input during the initialization procedure are indicated by $\omega_{x}$ where $x$ is a given feature and can be $\bar{I}$ for average maximum intensity, $\sigma_{I}$ for maximum intensity standard deviation, $\sigma_{t_{rxn}}$ for reaction time standard deviation, $\bar{t_{rxn}}$ for average reaction time, and FP for false positives.
+
 ## Positive amplification detection
 LAMP amplification reactions typically produce a sigmoidal amplification; however, given that fluorometric methods typically have some background auto-fluorescence or variable response over time, it is not sufficient to simply check for an increase in signal over time. To this end, the following methodology was used to determine a "positive amplification", regardless of designation (true positive or No Template Control (NTC)):
 
@@ -131,10 +134,31 @@ LAMP amplification reactions typically produce a sigmoidal amplification; howeve
     - Default parameters for maximum fluorescent intensity taken on an Analytik-Jena qTower 3G is approximately 140000 Relative Fluorescence Units.
     - Default threshold percentage is 10%. 
 
+## False negatives
+If any reaction labelled "+" is detected as a negative amplification, it is labelled as a false negative. This script does not check the number of replicates a user inputs, so it cannot ensure that the statistical "power" for averages and standard deviations is comparable across all primer sets compared for scoring. For this reason, any false negative reaction automatically results in a score of 0 for that primer set, and the primer set is removed from consideration for scoring metrics.
+
 ## Reaction time
 Reaction time is determined as the time at which the 2nd derivative of the fluorescent time series data reaches its maximum. This is implemented using [`numpy.gradient`](https://numpy.org/devdocs/reference/generated/numpy.gradient.html).
 
+Penalties are incurred for later reaction times; thus, the feature that is used for scoring is the value $60 - t_{rxn}$.
+
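The reaction time calculation described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `reaction_time` and `reaction_time_feature`, the synthetic sigmoid, and the time axis in minutes are all assumptions made for the example. Only the use of `numpy.gradient` and the value $60 - t_{rxn}$ come from the text.

```python
import numpy as np


def reaction_time(time, fluorescence):
    """Return the time at which the 2nd derivative of the signal is largest.

    Hypothetical helper: the name and signature are illustrative only.
    """
    t = np.asarray(time, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    first = np.gradient(f, t)       # numerical 1st derivative
    second = np.gradient(first, t)  # numerical 2nd derivative
    return t[np.argmax(second)]


def reaction_time_feature(t_rxn):
    """Scoring feature from the text: later reactions give smaller values."""
    return 60 - t_rxn


# Synthetic sigmoidal curve (assumed shape, for illustration only).
t = np.linspace(0, 60, 121)
signal = 1000 + 140000 / (1 + np.exp(-(t - 20) / 1.5))

t_rxn = reaction_time(t, signal)
feature = reaction_time_feature(t_rxn)
```

For this synthetic curve the maximum of the 2nd derivative falls shortly before the inflection point at 20, which is the point where the signal starts to rise sharply.
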
+## Average and standard deviation of reaction features
+All averages and standard deviations are calculated from individual reaction metrics over all replicates that are labelled as positive reactions in input data and detected as positives.
+
+
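The labelling rules and the replicate statistics described above can be sketched as follows. The data layout (a list of dictionaries), the key names, the helper names `classify` and `summarize`, and the use of the population standard deviation (NumPy's default) are assumptions for illustration; the text does not specify them.

```python
import numpy as np


def classify(label, detected_positive):
    """Combine the input label ("+" or "-") with the detection result."""
    if label == "+":
        return "true positive" if detected_positive else "false negative"
    return "false positive" if detected_positive else "true negative"


def summarize(reactions):
    """Average and standard deviation over replicates labelled "+" and detected.

    Returns None when any false negative is present, mirroring the rule that
    such a primer set scores 0 and is removed from consideration.
    """
    outcomes = [classify(r["label"], r["detected"]) for r in reactions]
    if "false negative" in outcomes:
        return None
    kept = [r for r, o in zip(reactions, outcomes) if o == "true positive"]
    intensity = np.array([r["max_intensity"] for r in kept], dtype=float)
    t_rxn = np.array([r["t_rxn"] for r in kept], dtype=float)
    return {
        "mean_intensity": intensity.mean(),
        "std_intensity": intensity.std(),  # assumption: population std (ddof=0)
        "mean_t_rxn": t_rxn.mean(),
        "std_t_rxn": t_rxn.std(),
    }


# Hypothetical replicates for one primer set.
replicates = [
    {"label": "+", "detected": True, "max_intensity": 100000, "t_rxn": 18},
    {"label": "+", "detected": True, "max_intensity": 120000, "t_rxn": 20},
    {"label": "+", "detected": True, "max_intensity": 110000, "t_rxn": 22},
    {"label": "-", "detected": False, "max_intensity": 900, "t_rxn": 0},
]
stats = summarize(replicates)

# A single missed "+" reaction disqualifies the primer set.
missed = replicates[:2] + [
    {"label": "+", "detected": False, "max_intensity": 800, "t_rxn": 0}
]
disqualified = summarize(missed)
```
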
 ## Weighting of False Positives
-False positives are undesireable in the context of the developed diagnostics and hence are weighted very strongly to filter out primer sets that produce false positives. Additionally, it is possible to have a one-off or rare occurrence false positive due to operator error or contamination, rather than an inherent interaction of the primers in the primer set, which should be strongly discouraged. 
+False positives are undesirable in the context of the developed diagnostics and hence are weighted very strongly to filter out primer sets that produce them. A false positive can be a one-off or rare occurrence caused by operator error or contamination, or it can result from an inherent interaction of the primers in the primer set; the latter should be strongly discouraged. When a reaction is labelled as "-" in the input data, but is detected as a positive during the positive amplification detection, it is labelled as a false positive.
+
+To this end, a "progressive" penalty for increasing occurrence of false positives was implemented to select for primer sets with less "persistent" false positives. This is accomplished by dividing the total weight allocated to false positives during initialization, $\omega_{FP}$, by the sum of the integers from 1 to $n$, which gives the factor $\alpha = \omega_{FP} / \sum_{i=1}^{n} i$, where $n$ is the number of replicates. The penalty then increases linearly with the number of false positives: the false positive order is multiplied by $\alpha$ (i.e. the first false positive receives a penalty of $\alpha$, the second false positive receives a penalty of $2 \cdot \alpha$, etc.).
+
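As a small worked example of the progressive penalty, the sketch below computes $\alpha$ and the penalty for each false positive order. The function names and the example weight of 6 with 3 replicates are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def false_positive_alpha(fp_weight, n_replicates):
    """alpha = omega_FP / (1 + 2 + ... + n)."""
    return fp_weight / sum(range(1, n_replicates + 1))


def progressive_penalties(fp_weight, n_replicates):
    """Penalty for the 1st, 2nd, ..., nth false positive: i * alpha."""
    alpha = false_positive_alpha(fp_weight, n_replicates)
    return [i * alpha for i in range(1, n_replicates + 1)]


# Hypothetical values: omega_FP = 6 and n = 3 replicates, so alpha = 6 / 6 = 1.
alpha = false_positive_alpha(6, 3)
penalties = progressive_penalties(6, 3)
```

Because the denominator is the sum of the orders, the penalties for all $n$ orders add up to the full weight $\omega_{FP}$.
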
+Furthermore, an overall "reaction penalty" $\left( \Omega \right)$ is calculated by multiplying the maximum intensity of a replicate by the reaction time of a replicate. This penalty is on a per reaction basis, *not* averaged across all replicates. In this manner, if a reaction is detected as a false positive, but only amplifies a small amount compared to other reactions, it is not penalized as heavily. Likewise, late stage false positives are penalized less.
+
+Lastly, for all false positive calculations, primer sets being compared must all have at least $i$ false positives. Therefore, a primer set with 3 false positives will only be compared against primer sets that also have at least 3 false positives. This analysis is conducted for all numbers between 1 and the number of replicates. The value that is weighted is the reaction penalty for the $i$th false positive when all reaction penalties are sorted in ascending order for each primer set. The resulting value from each primer set is then compared and weighted in a manner similar to other reaction features.
+
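The rank-by-rank comparison of false positives can be sketched as follows. This is an illustration under stated assumptions rather than the script's code: the helper names, the input layout of `(max_intensity, reaction_time)` pairs, and the example numbers are hypothetical, and the reaction penalty is taken literally from the text as intensity multiplied by the reaction time value. All penalties are assumed to be positive.

```python
def reaction_penalties(false_positives):
    """Per-reaction penalty (intensity * reaction time), sorted ascending."""
    return sorted(intensity * t for intensity, t in false_positives)


def ranked_fp_terms(penalties_by_set, n_replicates):
    """For each order i, compare only primer sets with at least i false positives.

    Returns, per primer set, the value 1 - penalty_i / max(penalty_i)
    for every order i that the primer set reaches.
    """
    terms = {name: {} for name in penalties_by_set}
    for i in range(1, n_replicates + 1):
        eligible = {
            name: p[i - 1] for name, p in penalties_by_set.items() if len(p) >= i
        }
        if not eligible:
            continue  # no primer set has an i-th false positive
        largest = max(eligible.values())
        for name, omega in eligible.items():
            terms[name][i] = 1 - omega / largest
    return terms


# Hypothetical data: set "A" has two false positives, set "B" has one.
penalties_by_set = {
    "A": reaction_penalties([(2000, 20), (1000, 10)]),  # [10000, 40000]
    "B": reaction_penalties([(500, 40)]),               # [20000]
}
terms = ranked_fp_terms(penalties_by_set, n_replicates=3)
```

Here set "A" is compared against "B" only for the first false positive; for the second false positive it is the only eligible primer set, so its own penalty is also the maximum.
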
+## Scoring
 
-To this end, a "progressive" penalty for increasing occurrence of false positives was implemented to select for primer sets with less "persistent" false positives.
+Once all primer sets have had their performance features calculated, an overall score is calculated by ordering the primer sets and weighting each primer set's individual score according to its placement in the resulting order amongst all primer sets. This is achieved using the following formulation:
+$$
+S_k = \omega_{\bar{I}} \cdot \left( 1 - \frac{\text{max} \left( \bar{I} \right) - \bar{I}_k}{\text{Range} \left( \bar{I} \right)} \right) + \sum_x \omega_x \cdot \left( 1 - \frac{\text{min}(x) - x_k}{\text{Range}(x)} \right) + \sum_{i=1}^n \left( i \cdot \alpha \cdot \left( 1 - \frac{\phi \left( \Omega \right)_i}{\text{max}(\phi (\Omega))_i} \right) \right)
+$$
+where $k$ is a given primer set, $x$ indicates a given feature, $\omega_x$ is the weight allocated to feature $x$, $x_k$ is the feature value for primer set $k$, $\alpha$ is the false positive weighting factor, $n$ is the number of replicates, $\Omega_i$ is the reaction penalty for reaction $i$, $\phi$ is the set of reaction penalties for each *false positive* reaction of a specific primer set, ordered from smallest to largest, such that the element $\phi(\Omega)_i$ exists only if the primer set has at least $i$ false positives, and $\text{max}(\phi (\Omega))_i$ is the maximum value of the $i$th reaction penalty across all primer sets containing at least $i$ false positives.
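
To make the first term of the scoring formula concrete, the sketch below evaluates the average maximum intensity contribution for three hypothetical primer sets. The function name, the example intensities, the weight of 5, and the handling of a zero range (every primer set receives the full weight) are assumptions for illustration only.

```python
def intensity_term(mean_intensities, weight):
    """weight * (1 - (max(I) - I_k) / Range(I)) for every primer set k."""
    values = list(mean_intensities.values())
    largest = max(values)
    value_range = largest - min(values)
    if value_range == 0:
        # Assumption: with identical values, all sets receive the full weight.
        return {name: float(weight) for name in mean_intensities}
    return {
        name: weight * (1 - (largest - value) / value_range)
        for name, value in mean_intensities.items()
    }


# Hypothetical average maximum intensities (RFU) for three primer sets.
scores = intensity_term({"A": 140000, "B": 100000, "C": 120000}, weight=5)
```

The primer set with the highest average intensity receives the full weight, the lowest receives 0, and the others fall linearly in between according to their position within the range.
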