Fits TOP model with M5 bins (fit_TOP_M5_model)

Fits the TOP model with M5 bins. By default, it runs Gibbs sampling for all 10 partitions in parallel on 10 CPU cores and returns a list of posterior samples, one element per partition. Alternatively, you can fit the model for each partition separately (for example, on different machines) by specifying the partition(s) to run with the partitions argument.
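The per-partition parallelism described above can be illustrated with base R's parallel package. This is a minimal sketch of the general pattern only: it does not claim that TOP uses mclapply() internally, and fit_one_partition() and fit_partitions() are hypothetical helpers whose placeholder "fit" merely counts rows instead of running JAGS.

```r
library(parallel)

# Hypothetical placeholder for fitting one partition. The real
# fit_TOP_M5_model() runs JAGS Gibbs sampling; this stub only reports
# the partition index and its number of rows.
fit_one_partition <- function(k, all_training_data) {
  list(partition = k, n_rows = nrow(all_training_data[[k]]))
}

# Fit the selected partitions in parallel, one task per partition.
fit_partitions <- function(all_training_data,
                           partitions = seq_along(all_training_data),
                           n_cores = length(partitions)) {
  # mclapply() forks worker processes; Windows only supports mc.cores = 1
  if (.Platform$OS.type == "windows") n_cores <- 1L
  results <- mclapply(partitions, fit_one_partition,
                      all_training_data = all_training_data,
                      mc.cores = n_cores)
  names(results) <- paste0("partition", partitions)
  results
}

# Toy data: 10 partitions, where partition k has k rows
toy_data <- lapply(1:10, function(k) data.frame(x = seq_len(k)))
all_fits <- fit_partitions(toy_data)                   # all 10 partitions
part3_fit <- fit_partitions(toy_data, partitions = 3)  # partition 3 only
```

In a multi-machine setup, each machine would instead call fit_TOP_M5_model() with a different partitions value, so no single machine needs 10 cores.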

fit_TOP_M5_model(
  all_training_data,
  all_training_data_files,
  model_file,
  logistic_model = FALSE,
  transform = c("asinh", "log2", "sqrt", "none"),
  partitions = 1:10,
  n_iter = 2000,
  n_burnin = floor(n_iter/2),
  n_chains = 3,
  n_thin = max(1, floor((n_iter - n_burnin)/1000)),
  n_cores = length(partitions),
  save = TRUE,
  outdir = "TOP_fit",
  return_type = c("samples", "jagsfit", "samplefiles"),
  quiet = FALSE
)
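The defaults for n_burnin and n_thin are computed from n_iter. The base-R sketch below simply re-evaluates the default expressions shown in the usage above, so you can preview what a given n_iter implies. The helper name mcmc_defaults() is hypothetical, and the retained-draw count is an approximation, not a value reported by TOP.

```r
# Re-evaluates the default burn-in and thinning expressions from the usage.
mcmc_defaults <- function(n_iter = 2000) {
  n_burnin <- floor(n_iter / 2)
  n_thin <- max(1, floor((n_iter - n_burnin) / 1000))
  # Approximate number of retained draws per chain (rounding is an assumption)
  n_kept <- floor((n_iter - n_burnin) / n_thin)
  c(n_iter = n_iter, n_burnin = n_burnin, n_thin = n_thin, n_kept = n_kept)
}

mcmc_defaults(2000)  # burn-in 1000, n_thin 1 (no thinning)
mcmc_defaults(5000)  # burn-in 2500, n_thin 2
```

Thinning only kicks in once there are at least 2000 post-burn-in iterations per chain (for example, n_iter = 3999 gives n_thin = 2, while n_iter = 3998 gives n_thin = 1).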

Arguments

all_training_data: A list of the assembled training data of all partitions.
all_training_data_files: A vector of the assembled training data files of all partitions. If all_training_data is missing, the training data are loaded from all_training_data_files.
model_file: TOP model file written in JAGS. By default, the model file included in the TOP package is used.
logistic_model: Logical; if TRUE, fit the logistic version of the TOP model; if FALSE (default), fit the quantitative occupancy model.
transform: Type of transformation applied to the ChIP-seq read counts. Options are ‘asinh’ (asinh transformation, default), ‘log2’ (log2 transformation), ‘sqrt’ (square root transformation), and ‘none’ (no transformation). Only applies when logistic_model = FALSE.
partitions: A vector of the partition(s) to run. Default: all 10 partitions. If only some partitions are specified, models are fitted only to the data in those partitions.
n_iter: Total number of iterations per chain, including burn-in iterations (default: 2000).
n_burnin: Number of burn-in iterations, i.e. the number of samples to discard at the beginning of each chain. Default is floor(n_iter/2), which discards the first half of the samples.
n_chains: Number of Markov chains (default: 3).
n_thin: Thinning rate; must be a positive integer. Default is max(1, floor((n_iter - n_burnin) / 1000)), which thins only when there are at least 2000 post-burn-in iterations per chain. No thinning is performed if n_thin = 1.
n_cores: Number of cores to use in parallel (default: equal to the number of partitions, i.e. length(partitions)).
save: Logical; if TRUE (default), saves posterior samples as ‘.rds’ files in outdir.
outdir: Directory in which to save the TOP model posterior samples (default: ‘TOP_fit’).
return_type: Type of result to return. Options: ‘samples’ (posterior samples, default), ‘jagsfit’ (jagsfit object), or ‘samplefiles’ (file names of posterior samples).
quiet: Logical, if TRUE, suppress model fitting messages. Otherwise, only show progress bars.
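To make the transform options concrete, here is a minimal base-R sketch of the four count transformations. It is illustrative only: the helper transform_counts() is hypothetical, and the pseudocount of 1 in the ‘log2’ branch is an assumption (log2(0) would otherwise be -Inf); TOP's internal implementation may differ.

```r
# Hypothetical helper mirroring the transform options of fit_TOP_M5_model().
transform_counts <- function(counts, transform = c("asinh", "log2", "sqrt", "none")) {
  transform <- match.arg(transform)  # the first choice, "asinh", is the default
  switch(transform,
         asinh = asinh(counts),
         log2  = log2(counts + 1),   # pseudocount of 1 is an assumption
         sqrt  = sqrt(counts),
         none  = counts)
}

counts <- c(0, 3, 15)
transform_counts(counts)          # asinh transform (default)
transform_counts(counts, "log2")  # 0 2 4
```

The match.arg() idiom is also why listing the options as c("asinh", "log2", "sqrt", "none") in the usage makes ‘asinh’ the default.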

Value

A list with one element per partition, containing the posterior samples, the jagsfit object, or the posterior sample file names, depending on return_type.
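When save = TRUE, the posterior samples are written as ‘.rds’ files to outdir, so they can be reloaded later without refitting. The sketch below assumes outdir contains only posterior-sample ‘.rds’ files; the helper load_posterior_files() is hypothetical, and TOP's actual file-naming scheme is not described on this page.

```r
# Load every .rds file in an output directory into a list named by file stem.
load_posterior_files <- function(outdir = "TOP_fit") {
  files <- list.files(outdir, pattern = "\\.rds$", full.names = TRUE)
  samples <- lapply(files, readRDS)
  names(samples) <- tools::file_path_sans_ext(basename(files))
  samples
}

# Usage (assuming a previous run saved its samples to 'TOP_fit'):
# all_TOP_samples <- load_posterior_files("TOP_fit")
```

With return_type = ‘samplefiles’, the returned file names can likewise be passed to readRDS() directly, e.g. lapply(unlist(sample_files), readRDS).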

Examples

if (FALSE) {
# Example: train the TOP quantitative occupancy model.

# The example below first applies the 'asinh' transformation to the ChIP-seq
# counts in 'assembled_training_data', then runs Gibbs sampling
# for each of the 10 partitions in parallel.
# It runs 5000 Gibbs sampling iterations per chain in total,
# including 1000 burn-in iterations, with 3 Markov chains and a thinning
# rate of 2, and saves the posterior samples to the 'TOP_fit' directory.
all_TOP_samples <- fit_TOP_M5_model(assembled_training_data,
                                    logistic_model = FALSE,
                                    transform = 'asinh',
                                    n_iter = 5000,
                                    n_burnin = 1000,
                                    n_chains = 3,
                                    n_thin = 2,
                                    outdir = 'TOP_fit')

# We can also obtain the posterior samples separately for each partition.
# For example, to obtain the posterior samples for partition #3 only:
TOP_samples_part3 <- fit_TOP_M5_model(assembled_training_data,
                                      logistic_model = FALSE,
                                      transform = 'asinh',
                                      partitions = 3,
                                      n_iter = 5000,
                                      n_burnin = 1000,
                                      n_chains = 3,
                                      n_thin = 2,
                                      outdir = 'TOP_fit')


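# Example: train the TOP logistic (binary) model.
# 'transform' is not needed here, since it only applies to the
# quantitative occupancy model.
all_TOP_logistic_samples <- fit_TOP_M5_model(assembled_training_data,
                                             logistic_model = TRUE,
                                             n_iter = 5000,
                                             n_burnin = 1000,
                                             n_chains = 3,
                                             n_thin = 2,
                                             outdir = 'TOP_fit')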

}