R/predict_occupancy.R
predict_TOP.Rd
Predicts quantitative TF occupancy or TF binding probability using a TOP model trained on ChIP-seq read counts or binary labels.
A data frame containing motif PWM scores and DNase (or ATAC) bins.
A list containing the posterior mean of TOP regression coefficients.
TF name to make predictions for.
The function looks up the model parameters trained for this TF.
Not needed (and not used) when level = 'top'.
Cell type to make predictions for.
The function looks up the model parameters trained for this cell type.
Not needed (and not used) when level = 'middle' or level = 'top'.
Uses the pretrained model if TOP_coef is not supplied.
Options: ‘ATAC’, ‘DukeDNase’, ‘UwDNase’.
TOP model level to use.
Options: ‘best’, ‘bottom’, ‘middle’, or ‘top’.
When level = 'best', uses the best (lowest available) level of the
hierarchy for the TF x cell type combination:
if both the TF motif and the cell type are available in the training data,
uses the bottom level (TF- and cell-type-specific model);
otherwise, if the TF motif (but not the cell type) is available in the training data,
uses the middle level (TF-specific model) of that TF motif;
otherwise, uses the top level TF-generic model.
When level = 'bottom', uses the bottom level (TF- and cell-type-specific model),
if the TF motif and cell type are available in the training data.
When level = 'middle', uses the middle level (TF-specific model) of that TF.
When level = 'top', uses the top level TF-generic model.
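The fallback order used by level = 'best' can be sketched as a small helper. This is only an illustration of the rule described above, not the package's implementation: `select_top_level`, `trained_tfs`, `trained_pairs`, and the toy training set are all hypothetical names and values introduced here.

```r
# Hypothetical helper illustrating the 'best' level fallback.
# Not part of the package API; inputs are made up for illustration.
select_top_level <- function(tf_name, cell_type = NULL,
                             trained_tfs, trained_pairs) {
  # trained_pairs holds "TF|cell_type" combinations seen in training
  pair <- paste(tf_name, cell_type, sep = "|")
  if (!is.null(cell_type) && pair %in% trained_pairs) {
    "bottom"  # TF- and cell-type-specific model
  } else if (tf_name %in% trained_tfs) {
    "middle"  # TF-specific model
  } else {
    "top"     # TF-generic model
  }
}

# Toy training set (assumed, for illustration only)
trained_tfs   <- c("CTCF")
trained_pairs <- c("CTCF|K562")

select_top_level("CTCF", "K562", trained_tfs, trained_pairs)        # "bottom"
select_top_level("CTCF", "OtherCell", trained_tfs, trained_pairs)   # "middle"
select_top_level("OtherTF", "K562", trained_tfs, trained_pairs)     # "top"
```

The helper always falls through to the top level, which matches the description of 'best' as choosing the lowest level that is actually available.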
Logical. Whether to use the logistic version of the TOP model.
If logistic_model = TRUE,
uses the logistic version of the TOP model to predict TF binding probability.
If logistic_model = FALSE, uses the quantitative occupancy model (default).
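To make the difference between the two model versions concrete, the sketch below contrasts a linear prediction with a logistic one. It assumes the prediction is an intercept plus a weighted sum of the features, and that the logistic version applies a standard logistic link; the column names, feature values, and coefficients are invented for illustration and do not come from the package. In practice the two versions would have separately trained coefficients; the same toy vector is reused here only to show the change in output scale.

```r
# Toy features: one PWM score and two accessibility bins (hypothetical names)
features <- data.frame(
  pwm_score = c(10, 12, 8),
  bin1      = c(0.5, 2.0, 0.1),
  bin2      = c(1.0, 3.0, 0.0)
)

# Made-up coefficients: intercept first, then one per feature
beta <- c(intercept = -4, pwm_score = 0.2, bin1 = 0.8, bin2 = 0.5)

# Linear predictor: intercept + sum of feature * coefficient for each site
eta <- as.vector(cbind(1, as.matrix(features)) %*% beta)

# Quantitative model: the linear predictor is the predicted occupancy
# (on the transformed ChIP-seq scale)
predicted_occupancy <- eta

# Logistic model (assumed standard logistic link): probability in (0, 1)
predicted_probability <- plogis(eta)
```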
Type of transformation applied to ChIP-seq read counts
when preparing the input training data.
Options: ‘asinh’ (asinh transformation),
‘log2’ (log2 transformation),
‘sqrt’ (sqrt transformation),
and ‘none’ (no transformation).
This only applies when logistic_model = FALSE.
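The four transformation options can be illustrated with a short sketch. The helpers `transform_counts` and `back_transform` are hypothetical and not part of the package; the pseudocount of 1 in the log2 case is an assumption made here so that zero counts stay finite, and the back-transformation is shown only to suggest why the value should match the one used in training.

```r
# Hypothetical helper: apply one of the four transformations to read counts
transform_counts <- function(x, transform = c("asinh", "log2", "sqrt", "none")) {
  transform <- match.arg(transform)
  switch(transform,
    asinh = asinh(x),
    log2  = log2(x + 1),  # pseudocount of 1 is an assumption
    sqrt  = sqrt(x),
    none  = x
  )
}

# Hypothetical inverse: map values on the transformed scale back to counts
back_transform <- function(y, transform = c("asinh", "log2", "sqrt", "none")) {
  transform <- match.arg(transform)
  switch(transform,
    asinh = sinh(y),
    log2  = 2^y - 1,
    sqrt  = y^2,
    none  = y
  )
}

counts <- c(0, 1, 10, 100)
transformed <- transform_counts(counts, "asinh")
recovered   <- back_transform(transformed, "asinh")
```

Using a different `transform` value at prediction time than at training time would put the predictions on the wrong scale, which is presumably why the argument is exposed.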
Returns a list with the following elements:
TOP model name.
Selected hierarchy level.
Posterior mean of regression coefficients.
A data frame with the data and predicted values.
if (FALSE) {
  # Predict CTCF occupancy in K562 using the quantitative occupancy model.

  # Predict using the 'bottom' level model
  result <- predict_TOP(data, TOP_coef,
                        tf_name = 'CTCF', cell_type = 'K562',
                        level = 'bottom',
                        logistic_model = FALSE,
                        transform = 'asinh')

  # Predict using the 'best' model.
  # Since CTCF in the K562 cell type is included in training,
  # the 'best' model is the 'bottom' level model.
  result <- predict_TOP(data, TOP_coef,
                        tf_name = 'CTCF', cell_type = 'K562', level = 'best',
                        logistic_model = FALSE,
                        transform = 'asinh')

  # We can use the 'middle' model to predict CTCF in K562
  # or in other cell types or conditions
  result <- predict_TOP(data, TOP_coef,
                        tf_name = 'CTCF', level = 'middle',
                        logistic_model = FALSE,
                        transform = 'asinh')

  # Predict CTCF binding probability using the logistic version of the model.
  # There is no need to set 'transform' for the logistic model.

  # Predict using the 'best' model
  # (the 'bottom' level model for CTCF in K562, as noted above)
  result <- predict_TOP(data, TOP_coef,
                        tf_name = 'CTCF', cell_type = 'K562',
                        level = 'best',
                        logistic_model = TRUE)

  # Predict using the 'middle' level model
  result <- predict_TOP(data, TOP_coef,
                        tf_name = 'CTCF', level = 'middle',
                        logistic_model = TRUE)

  # If TOP_coef is not specified, the pretrained models
  # included in the package are used automatically.

  # Predict using the pretrained ATAC quantitative occupancy model
  result <- predict_TOP(data,
                        tf_name = 'CTCF', cell_type = 'K562',
                        use_model = 'ATAC', level = 'best',
                        logistic_model = FALSE,
                        transform = 'asinh')

  # Predict using the pretrained ATAC logistic model
  result <- predict_TOP(data,
                        tf_name = 'CTCF', cell_type = 'K562',
                        use_model = 'ATAC', level = 'best',
                        logistic_model = TRUE)
}