Initial README.md edit

nlichti · Jan 28, 2026 · 55b8dec
1 parent 5328336
commit 55b8dec
Showing 1 changed file with 63 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -1,2 +1,65 @@
 # Beckett_et_al_2026-HF-Diet-FSR-code
 R code for Beckett et al. (2026) The impact of high fat diet on global protein abundance and fractional synthetic rate in liver and mammary gland of peak lactation ICR mice.  
+
+## Contents
+There are two source code files:
+1. `run_fsr.R` executes the analysis.
+2. `Beckett_et_al_2026_fsr_functions.R` defines the top-level `maxQuantFsr()` function used to run the analysis, along with its internal functions and utilities to read data files, calculate theoretical isotope patterns, and perform other tasks. It
+   also includes data frames with information on amino acids and stable isotopes.
+
+## Basic usage
+1. If necessary, install [R](https://cran.r-project.org/) and (optionally) [RStudio](https://posit.co/downloads/).
+2. Download the code files to a directory on your local system (e.g., `~/fsr`, where `~` represents your home directory).
+3. Place the output files from MaxQuant in a second directory (e.g., `~/fsr/mq_data`).
+4. In RStudio, call:
+   ```
+   install.packages(c("dplyr", "ggplot2", "gtools", "magrittr", "minpack.lm", "mvtnorm", "openxlsx", "pbapply", "readxl", "stringr"))
+   ```
+   to install required packages. Optionally, if you wish to use the `buildHyperCube()` function in `fsr_simulation_functions.R`, `"lhs"` should also be included in the list of packages above.
+5. Next, to run an analysis, call:
+   ```
+   setwd("~/fsr") # substitute your path here
+   source("Beckett_et_al_2026_fsr_functions.R")
+   result <- maxQuantFsr()
+   ```
+   By default, `maxQuantFsr()` will call `choose.dir()` (from the `utils` package that ships with R; this function is available only on Windows) and prompt you to select the folder that contains your MaxQuant files.
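+
+To skip the folder prompt, the folder can be passed explicitly. The following is a minimal sketch, assuming the example paths from the steps above (`~/fsr` and `~/fsr/mq_data`); the `mq_folder`, `save`, and `seed` arguments are described under "Optional arguments", and the seed value shown is arbitrary.
+
+```r
+# Sketch only: these paths are placeholders; substitute your own.
+fsr_dir   <- path.expand("~/fsr")
+mq_folder <- file.path(fsr_dir, "mq_data")
+fsr_file  <- file.path(fsr_dir, "Beckett_et_al_2026_fsr_functions.R")
+
+if (file.exists(fsr_file) && dir.exists(mq_folder)) {
+  source(fsr_file)
+  # Passing mq_folder avoids the interactive folder prompt;
+  # save = "xlsx" writes the results to an Excel workbook.
+  result <- maxQuantFsr(mq_folder = mq_folder, save = "xlsx", seed = 1)
+} else {
+  message("Code file or MaxQuant folder not found; check the paths above.")
+}
+```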
+
+## Results
+Results are returned in the R workspace as a list object containing tibbles for rate estimates and tests, as well as auxiliary metadata and quality control information. By default, the results are not saved to disk (i.e., `save = FALSE`). However, we **strongly** recommend setting `save = "xlsx"`. This will write the results to an Excel workbook in the folder `./mq_folder/fsr_estimation`. The workbook contains the following sheets:
+* readme - basic metadata about the dataset and analysis.
+* rates - rate estimates, standard errors, and confidence intervals for each protein (subdivided by experimental group, if groups exist). If rates are obtained for MaxQuant protein groups (not recommended), then these groups are flagged as `TRUE` in the "Group" column.
+* tests - hypothesis testing results for each protein, including estimated differences in rate, standard errors and confidence intervals, effective degrees of freedom, t-values, raw p-values, and adjusted p-values.
+* metadata - information on the type of model, whether the optimizer converged, and the number of time points, specimens, sequences, isotope peaks, and measurements for each protein.
+* quality control - information summarizing the results of quality control filters.
+* not run - a list of proteins that are present in the MaxQuant data but that were excluded from analyses by QC filters.
+* session - metadata on the R session and computing platform, including the random number seed used in the analysis.
+
+## Cautions
+Please note that changes in quality control criteria directly affect the data that are used for analyses. As a result, analyses run using different criteria can produce substantially different results and are not directly comparable. In addition, nonlinear least squares optimization algorithms are generally sensitive to initial parameter values, so minor variation from run to run is expected (this can be eliminated by setting a random number seed). Optimization algorithms may also occasionally become stuck in local (rather than global) minima. The code here uses multiple restarts with random initial values to minimize this risk.
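+
+The idea of multiple restarts can be illustrated with a small, self-contained sketch. This is not the implementation used in `Beckett_et_al_2026_fsr_functions.R`: it uses base R `optim()` on simulated data, and the model form `y = a * (1 - exp(-k * t))`, the parameter values, and all object names are assumptions chosen for illustration only.
+
+```r
+set.seed(42)  # fixing the seed makes the random starts, and so the result, reproducible
+
+# Simulated data from y = a * (1 - exp(-k * t)) plus noise (illustrative values only)
+t_obs <- rep(c(1, 2, 4, 8, 16, 24), each = 3)
+y_obs <- 0.8 * (1 - exp(-0.15 * t_obs)) + rnorm(length(t_obs), sd = 0.02)
+
+# Residual sum of squares, with parameters on the log scale to keep a and k positive
+rss <- function(par) {
+  a <- exp(par[1])
+  k <- exp(par[2])
+  sum((y_obs - a * (1 - exp(-k * t_obs)))^2)
+}
+
+# Run the optimizer from several random starting points and keep the best fit
+n_starts <- 20
+fits <- lapply(seq_len(n_starts), function(i) {
+  start <- c(log(runif(1, 0.1, 2)), log(runif(1, 0.01, 1)))
+  optim(start, rss, method = "BFGS")
+})
+best <- fits[[which.min(vapply(fits, function(f) f$value, numeric(1)))]]
+
+# Back-transform the best parameters to the original scale
+estimate <- setNames(exp(best$par), c("a", "k"))
+estimate
+```
+
+Keeping the fit with the smallest residual sum of squares across starts lowers the chance of reporting a local minimum, and rerunning with the same seed reproduces the same starting values.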
+
+## Optional arguments
+There are several arguments to `maxQuantFsr()` that can be used to adjust the analysis:  
+* `mq_folder`: the path for a folder where the MaxQuant files are located, as a character value. By default (`NULL`), the user will be prompted to select a folder.
+* `sample_data`: a file path or a data frame containing sample times and BWE estimates for each sample (i.e., each MaxQuant Raw file), as well as any variables referenced in `formula`. If `NULL` (the default), the sample data are assumed to be in a comma-separated values file named `"sampleData.csv"`, located in the `mq_folder`.
+* `formula` - an optional right-hand-side model formula for a linear predictor that relates FSR to experimental variables in `sample_data`. Note that this `formula` defines the study design, not the functional form of the FSR model to be fit. By default, `formula = ~ 1`, which assumes that all samples come from the same treatment group.
+* `model` - a keyword specifying the functional form of the FSR model to be fit. The default (`"e1c"`) is a one-compartment, exponential model. It is currently the only available option.
+* `contrasts` - a list providing a contrast specification. Defaults to all pairwise comparisons among unique experimental conditions defined by `formula`.
+* `time_unit` - the unit for time measurements in `sample_data`.
+* `protein_aliases` - an optional file or data frame containing two columns, `Alias` and `Name`, which identify proteins that should be combined or renamed. The value in the `Name` column will be retained. By default, no aliasing is performed.
+* `msms_count` - a numeric value giving the minimum proportion of samples with an isotope pattern for a given sequence. Only peptides that meet this criterion are analyzed. Defaults to `0.5`.
+* `scan_corr` - a numeric value setting the minimum within-sample correlation of isotope patterns for a given sequence, when more than one pattern is present. Only peptides that meet this criterion are analyzed. Defaults to `0.8`.
+* `corr_method` - the correlation method used to compute the correlations that are compared to `scan_corr`. See `stats::cor` for details. Defaults to `"spearman"`.
+* `use_single_scans` - a logical value defaulting to `FALSE`, which excludes peptides for which isotope distributions are available in only one scan. If `TRUE`, these are used for estimation.
+* `minSeq` - an integer value $\ge$ 1, setting the minimum number of uniquely associated peptides that are required to do the FSR calculation for a given protein. Defaults to 1.
+* `unique_only` - a logical value. By default (`TRUE`), estimation will be restricted to peptides that are uniquely associated with a single protein. If `FALSE`, estimates will also be generated for maxQuant protein groups.
+* `mMax` - an integer value $\ge$ 1 that sets the maximum number of isotopic peaks to consider. Smaller numbers may improve computational performance, but may also sacrifice information. Defaults to 10.
+* `p_adjustment` - the method to be used for p-value adjustments. See the R function `stats::p.adjust()`.
+* `plot` - a logical value defaulting to `FALSE`, indicating whether distribution plots should be drawn.
+* `save` - defaults to `FALSE`, in which case results are not saved to disk. If `"xlsx"`, results will be saved to disk as an Excel workbook in a new folder, at `~/mq_folder/fsr_estimation/fsr_results_<datestamp>.xlsx`.
+* `ncore` - an integer $\ge$ 1 setting the number of parallel processing threads.
+* `seed` - an optional random number seed. By default, `seed` is set based on the system clock.
+* `.source` - the path to the R source file for the analysis functions. This is used only when `ncore` > 1. It defaults to the file name as published, placed in the current working directory (i.e., `./Beckett_et_al_2026_fsr_functions.R`). *You will need to change it if you alter the file name or place it in a different location and want to use parallel processing*.
+* `...` - optional additional arguments passed to internal functions. Of these, the most useful are `level`, which sets the probability level for confidence intervals (default = 95%) and `imethod = "delta"` or `"simulation"`, to control the method of standard error estimation (the default delta method or a parametric bootstrap). See the source code for other options.
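+
+As a rough illustration of a few of these arguments, the sketch below uses only base R and does not call `maxQuantFsr()`. The column name `diet`, the protein names, and the p-values are hypothetical, and `model.matrix()` is used here only to show what a right-hand-side formula implies about the study design; how `maxQuantFsr()` processes `formula` internally is not shown.
+
+```r
+# Hypothetical sample table; the column name `diet` is an assumption
+samples <- data.frame(
+  diet = factor(c("control", "control", "high_fat", "high_fat"))
+)
+
+# `~ 1` describes a single group; `~ diet` describes one group per diet level
+X_single <- model.matrix(~ 1, data = samples)
+X_group  <- model.matrix(~ diet, data = samples)
+
+# A `protein_aliases` table needs the columns `Alias` and `Name`
+# (the protein names below are placeholders)
+aliases <- data.frame(
+  Alias = c("PROT1_OLD", "PROT2_VARIANT"),
+  Name  = c("PROT1", "PROT2")
+)
+
+# `p_adjustment` names a method understood by stats::p.adjust(), e.g. "BH"
+p_raw <- c(0.001, 0.02, 0.04, 0.30)
+p_adj <- p.adjust(p_raw, method = "BH")
+p_adj
+```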
+
+
+