Title: | Fitting Discrete Distribution Models to Count Data |
---|---|
Description: | Provides functions for fitting discrete distribution models to count data. Included are the Poisson, the negative binomial, the Poisson-inverse gaussian and, most importantly, a new implementation of the Poisson-beta distribution (density, distribution and quantile functions, and random number generator) together with a needed new implementation of Kummer's function (also: confluent hypergeometric function of the first kind). Three different implementations of the Gillespie algorithm allow data simulation based on the basic, switching or bursting mRNA generating processes. Moreover, likelihood functions for four variants of each of the three aforementioned distributions are also available. The variants include one population and two population mixtures, both with and without zero-inflation. The package depends on the 'MPFR' libraries (<https://www.mpfr.org/>) which need to be installed separately (see description at <https://github.com/fuchslab/scModels>). This package is supplement to the paper "A mechanistic model for the negative binomial distribution of single-cell mRNA counts" by Lisa Amrhein, Kumar Harsha and Christiane Fuchs (2019) <doi:10.1101/657619> available on bioRxiv. |
Authors: | Lisa Amrhein [aut, cre] , Kumar Harsha [aut] , Christiane Fuchs [aut] , Pavel Holoborodko [ctb] (Author and copyright holder of 'mpreal.h') |
Maintainer: | Lisa Amrhein <[email protected]> |
License: | GPL-3 |
Version: | 1.0.4 |
Built: | 2024-11-25 04:24:11 UTC |
Source: | https://github.com/cran/scModels |
Kummer's function (also: confluent hypergeometric function of the first kind) for numeric (non-complex) values and input parameters in log-scale.
chf_1F1(x, a, b)
chf_1F1(x, a, b)
x |
numeric value or vector |
a , b
|
numeric parameters of the Kummer function |
Note that the output is in log-scale. So the evaluated function is:
where and
describe the rising factorial.
x <- chf_1F1(-100:100, 5, 7) plot(-100:100, x, type='l')
x <- chf_1F1(-100:100, 5, 7) plot(-100:100, x, type='l')
Functions to estimate parameters of probability distributions by fitting the distributions using optim()
fit_params(x, type, optim_control = list(maxit = 1000))
fit_params(x, type, optim_control = list(maxit = 1000))
x |
Vector containing the discrete observations |
type |
Keyword for the probability distribution the data is to be fitted against. Possible values are ("pois", "nb", "del", pig", "pb", "pois2", "nb2", "del2", "pig2" "pb2", "zipois", "zinb", "zidel", "zipg", zipb", "zipois2", "zinb2", "zidel2", "zipig2", zipb2") |
optim_control |
List of options to override presets in the optim function; Set to list(maxit = 1000) by default. For more details, please refer to the 'control' parameter in the standard 'optim' function in package 'stats'. |
x1 <- rnbinom(100, size = 13, mu = 9) p1 <- fit_params(x1, "nb") s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x2 <- s*x1 + (1-s) * rnbinom(100, size = 15, mu = 53) p2 <- fit_params(x2, "nb2")
x1 <- rnbinom(100, size = 13, mu = 9) p1 <- fit_params(x1, "nb") s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x2 <- s*x1 + (1-s) * rnbinom(100, size = 15, mu = 53) p2 <- fit_params(x2, "nb2")
Gillespie algorithms allow synthetic data simulation via three different underlying mRNA generating processes: the basic process consists of a simple death-birth model of mRNA transcription and degradation; the switching process considers additionally gene activation and deactivation, with mRNA transcription only happening in active gene states; the bursting process, transcribes mRNA in bursts with geometrically distributed burst sizes. The basic_burst model combines both the basic and the burst model. The IGbasic burst model describes the basic model with non-constant transcription rates, but transcription rates follow an inverse Gaussian distribution governed by one parameter, the mean parameter of the inverse Gaussian distribution. Additionally a burst transcription occures (with NB distributed burst sizes), the whole burst (rate and burst sizes) are determined by the rate parameter.
gmRNA_basic(n, r.on, r.degr) gmRNA_switch(n, r.act, r.deact, r.on, r.degr) gmRNA_burst(n, r.burst, s.burst, r.degr) gmRNA_basic_burst(n, r.on, r.burst, s.burst, r.degr) gmRNA_IGbasic_burst(n, r.mu, r.burst, r.degr)
gmRNA_basic(n, r.on, r.degr) gmRNA_switch(n, r.act, r.deact, r.on, r.degr) gmRNA_burst(n, r.burst, s.burst, r.degr) gmRNA_basic_burst(n, r.on, r.burst, s.burst, r.degr) gmRNA_IGbasic_burst(n, r.mu, r.burst, r.degr)
n |
Number of observations |
r.on |
Transcription rate during gene activation (Switching model) |
r.degr |
mRNA degradation rate (all models) |
r.act |
DNA activation rate (Switching Model) |
r.deact |
DNA deactivation rate (Switching Model) |
r.burst |
Bursty transcription rate (Bursting model, Basic Burst model and IG Basic Burst model) |
s.burst |
Mean burst size (Bursting Model and Basic Burst model) |
r.mu |
Mean parameter for the inverse Gaussian distribution (IG Basic Burst model) |
x <- gmRNA_basic(100, 0.75, 0.001) plot(density(x)) x <- gmRNA_switch(100, 0.23, 0.15, 0.75, 0.001) plot(density(x)) x <- gmRNA_burst(10, 0.15, 0.75, 0.001) plot(density(x)) x <- gmRNA_basic_burst(10, 0.75, 0.15, 0.5, 0.001) plot(density(x)) x <- gmRNA_IGbasic_burst(10, 2, 0.5, 0.1) plot(density(x))
x <- gmRNA_basic(100, 0.75, 0.001) plot(density(x)) x <- gmRNA_switch(100, 0.23, 0.15, 0.75, 0.001) plot(density(x)) x <- gmRNA_burst(10, 0.15, 0.75, 0.001) plot(density(x)) x <- gmRNA_basic_burst(10, 0.75, 0.15, 0.5, 0.001) plot(density(x)) x <- gmRNA_IGbasic_burst(10, 2, 0.5, 0.1) plot(density(x))
random generation function for the inverse Gaussian distribution: Mu and lambda are the parameters of this distribution.
rInvGaus(n, mu, lambda)
rInvGaus(n, mu, lambda)
n |
Number of observations |
mu , lambda
|
Non-negative parameters of the inverse Gaussian distribution (mean and shape) |
RV <- rInvGaus(n = 100, mu = 10, lambda = 2)
RV <- rInvGaus(n = 100, mu = 10, lambda = 2)
The negative log Likelihood functions for Poisson, negative binomial, Delaporte, Poisson-inverse Gaussian and Poisson-beta distributions. Mixing two distributions of the same kind and/or adding zero-inflation allows to take characteristics of real data into account. Additionally, one population and two population mixtures - with and without zero-inflation - allow distribution fitting of the Poisson, negative binomial, Delaporte, Poisson-inverse Gaussian and the Poisson-beta distribution.
nlogL_pois(data, par.pois) nlogL_nb(data, par.nb) nlogL_del(data, par.del) nlogL_pig(data, par.pig) nlogL_pb(data, par.pb) nlogL_pois2(data, par.pois2) nlogL_nb2(data, par.nb2) nlogL_del2(data, par.del2) nlogL_pig2(data, par.pig2) nlogL_pb2(data, par.pb2) nlogL_zipois(data, par.zipois) nlogL_zinb(data, par.zinb) nlogL_zidel(data, par.zidel) nlogL_zipig(data, par.zipig) nlogL_zipb(data, par.zipb) nlogL_zipois2(data, par.zipois2) nlogL_zinb2(data, par.zinb2) nlogL_zidel2(data, par.zidel2) nlogL_zipig2(data, par.zipig2) nlogL_zipb2(data, par.zipb2)
nlogL_pois(data, par.pois) nlogL_nb(data, par.nb) nlogL_del(data, par.del) nlogL_pig(data, par.pig) nlogL_pb(data, par.pb) nlogL_pois2(data, par.pois2) nlogL_nb2(data, par.nb2) nlogL_del2(data, par.del2) nlogL_pig2(data, par.pig2) nlogL_pb2(data, par.pb2) nlogL_zipois(data, par.zipois) nlogL_zinb(data, par.zinb) nlogL_zidel(data, par.zidel) nlogL_zipig(data, par.zipig) nlogL_zipb(data, par.zipb) nlogL_zipois2(data, par.zipois2) nlogL_zinb2(data, par.zinb2) nlogL_zidel2(data, par.zidel2) nlogL_zipig2(data, par.zipig2) nlogL_zipb2(data, par.zipb2)
data |
Vector containing the discrete observations |
par.pois |
Scalar containing the lambda parameter of the Poisson distribution |
par.nb |
Vector of length 2, containing the size and the mu parameter of the negative binomial distribution |
par.del |
Vector of length 3, containing the mu, sigma and the nu parameter of the Delaporte distribution |
par.pig |
Vector of length 2, containing the mu and the sigma parameter of the Poisson-inverse Gaussian distribution |
par.pb |
Vector of length 3, containing the alpha, beta and c parameter of the Poisson-beta distribution |
par.pois2 , par.nb2 , par.del2 , par.pig2 , par.pb2
|
Vector containing the parameters of the two mixing distributions. First entry represents the fraction of the first distribution, followed by all parameters of the first, then all of the second distribution. |
par.zipois , par.zinb , par.zidel , par.zipig , par.zipb
|
Vector containing the respective zero-inflated distribution parameters. The additional first entry is the inflation parameter for all cases. |
par.zipois2 , par.zinb2 , par.zidel2 , par.zipig2 , par.zipb2
|
Parameters for the zero-inflated two population model. |
Functions nlogL_pois, nlogL_nb, nlogL_del, nlogL_pig, nlogL_pb compute the negative log-likelihood of Poisson, negative binomial, Poisson-inverse Gaussian and the Poisson-beta distributions given the data. Functions nlogL_pois2, nlogL_nb2, nlogL_del2, nlogL_pig2 and nlogL_pb2 compute the negative log-likelihood values for a two population mixture of distributions whereas nlogL_zipois, nlogL_zinb, nlogL_zidel, nlogL_zipig, nlogL_zipb compute the same for the zero-inflated distributions. Furthermore, nlogL_zipois2, nlogL_zinb2, nlogL_zidel2, nlogL_zipig2 and nlogL_zipb2 are for two population mixtures with zero-inflation.
x <- rpois(100, 11) nl1 <- nlogL_pois(x, 11) nl2 <- nlogL_pois(x, 13) x <- rnbinom(100, size = 13, mu = 9) nl <- nlogL_nb(x, c(13, 9)) x <- gamlss.dist::rDEL(100, mu = 5, sigma = 0.2, nu= 0.5) nl <- nlogL_del(x, c(5, 0.2, 0.5)) x <- gamlss.dist::rPIG(100, mu = 5, sigma = 0.2) nl <- nlogL_pig(x, c(5, 0.2)) x <- rpb(n = 1000, alpha=5, beta= 3, c=20) nl <- nlogL_pb(x, c(5, 3, 20)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s*rpois(100, 7) + (1-s)*rpois(100, 13) nl <- nlogL_pois2(x, c(0.3, 13, 7)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s*rnbinom(100, size = 13, mu = 9) + (1-s)*rnbinom(100, size = 17, mu = 29) nl <- nlogL_nb2(x, c(0.3, 17, 29, 13, 9)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s * gamlss.dist::rDEL(100, mu = 5, sigma = 0.2, nu = 0.5) + (1 - s) * gamlss.dist::rDEL(100, mu = 20, sigma = 2, nu = 0.1) nl <- nlogL_del2(x, c(0.7,5, 0.2, 20, 2)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s * gamlss.dist::rPIG(100, mu = 5, sigma = 0.2) + (1 - s) * gamlss.dist::rPIG(100, mu = 20, sigma = 2) nl <- nlogL_pig2(x, c(0.7, 5, 0.2, 20, 2)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s*rpb(100, 5, 3, 20) + (1-s)*rpb(100, 7, 13, 53) nl <- nlogL_pb2(x, c(0.7, 7, 13, 53, 5, 3, 20)) x <- c(rep(0, 10), rpois(90, 7)) nl <- nlogL_zipois(x, c(0.1, 7)) x <- c(rep(0,10), rnbinom(90, size = 13, mu = 9)) nl <- nlogL_zinb(x, c(0.1, 13, 9)) x <- c(rep(0,10), gamlss.dist::rDEL(90, mu = 13, sigma = 2, nu = 0.5)) nl <- nlogL_zidel(x, c(0.1, 13, 2, 0.5)) x <- c(rep(0,10), gamlss.dist::rPIG(90, mu = 13, sigma = 2)) nl <- nlogL_zipig(x, c(0.1, 13, 2)) x <- c(rep(0, 10), rpb(n = 90, alpha=5, beta= 3, c=20)) nl <- nlogL_zipb(x, c(0.1, 5, 3, 20)) s <- sample(x = c(0, 1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * rpois(90, 7) + (1 - s) * rpois(90, 13)) nl1 <- nlogL_zipois2(x, c(0.1, 0.63, 7, 13)) s <- sample(x = c(0, 1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * rnbinom(90, size = 13, mu = 9) + (1 - s) * rnbinom(90, size = 17, mu = 29)) nl <- nlogL_zinb2(x, c(0.1, 0.63, 13, 9, 17, 29)) s <- sample(x = c(0, 1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * gamlss.dist::rDEL(90, mu = 13, sigma = 9, nu = 0.5) + (1 - s) * gamlss.dist::rDEL(90, mu = 17, sigma = 29, nu = 0.1)) nl <- nlogL_zidel2(x, c(0.1, 0.63, 13, 9, 0.5, 17, 29, 0.1)) s <- sample(x = c(0,1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * gamlss.dist::rPIG(90, mu = 13, sigma = 0.2) + (1-s) * gamlss.dist::rPIG(90, mu = 17, sigma = 2)) nl <- nlogL_zipig2(x, c(0.1, 0.63, 13, 0.2, 17, 2)) s <- sample(x = c(0,1), size = 90, replace = TRUE, prob = c(0.3,0.7)) x <- c(rep(0,10), s*rpb(90, 5, 3, 20) + (1-s)*rpb(90, 7, 13, 53)) nl <- nlogL_zipb2(x, c(0.1, 0.63, 7, 13, 53, 5, 3, 20))
x <- rpois(100, 11) nl1 <- nlogL_pois(x, 11) nl2 <- nlogL_pois(x, 13) x <- rnbinom(100, size = 13, mu = 9) nl <- nlogL_nb(x, c(13, 9)) x <- gamlss.dist::rDEL(100, mu = 5, sigma = 0.2, nu= 0.5) nl <- nlogL_del(x, c(5, 0.2, 0.5)) x <- gamlss.dist::rPIG(100, mu = 5, sigma = 0.2) nl <- nlogL_pig(x, c(5, 0.2)) x <- rpb(n = 1000, alpha=5, beta= 3, c=20) nl <- nlogL_pb(x, c(5, 3, 20)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s*rpois(100, 7) + (1-s)*rpois(100, 13) nl <- nlogL_pois2(x, c(0.3, 13, 7)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s*rnbinom(100, size = 13, mu = 9) + (1-s)*rnbinom(100, size = 17, mu = 29) nl <- nlogL_nb2(x, c(0.3, 17, 29, 13, 9)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s * gamlss.dist::rDEL(100, mu = 5, sigma = 0.2, nu = 0.5) + (1 - s) * gamlss.dist::rDEL(100, mu = 20, sigma = 2, nu = 0.1) nl <- nlogL_del2(x, c(0.7,5, 0.2, 20, 2)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s * gamlss.dist::rPIG(100, mu = 5, sigma = 0.2) + (1 - s) * gamlss.dist::rPIG(100, mu = 20, sigma = 2) nl <- nlogL_pig2(x, c(0.7, 5, 0.2, 20, 2)) s <- sample(x = c(0,1), size = 100, replace = TRUE, prob = c(0.3,0.7)) x <- s*rpb(100, 5, 3, 20) + (1-s)*rpb(100, 7, 13, 53) nl <- nlogL_pb2(x, c(0.7, 7, 13, 53, 5, 3, 20)) x <- c(rep(0, 10), rpois(90, 7)) nl <- nlogL_zipois(x, c(0.1, 7)) x <- c(rep(0,10), rnbinom(90, size = 13, mu = 9)) nl <- nlogL_zinb(x, c(0.1, 13, 9)) x <- c(rep(0,10), gamlss.dist::rDEL(90, mu = 13, sigma = 2, nu = 0.5)) nl <- nlogL_zidel(x, c(0.1, 13, 2, 0.5)) x <- c(rep(0,10), gamlss.dist::rPIG(90, mu = 13, sigma = 2)) nl <- nlogL_zipig(x, c(0.1, 13, 2)) x <- c(rep(0, 10), rpb(n = 90, alpha=5, beta= 3, c=20)) nl <- nlogL_zipb(x, c(0.1, 5, 3, 20)) s <- sample(x = c(0, 1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * rpois(90, 7) + (1 - s) * rpois(90, 13)) nl1 <- nlogL_zipois2(x, c(0.1, 0.63, 7, 13)) s <- sample(x = c(0, 1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * rnbinom(90, size = 13, mu = 9) + (1 - s) * rnbinom(90, size = 17, mu = 29)) nl <- nlogL_zinb2(x, c(0.1, 0.63, 13, 9, 17, 29)) s <- sample(x = c(0, 1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * gamlss.dist::rDEL(90, mu = 13, sigma = 9, nu = 0.5) + (1 - s) * gamlss.dist::rDEL(90, mu = 17, sigma = 29, nu = 0.1)) nl <- nlogL_zidel2(x, c(0.1, 0.63, 13, 9, 0.5, 17, 29, 0.1)) s <- sample(x = c(0,1), size = 90, replace = TRUE, prob = c(0.3, 0.7)) x <- c(rep(0, 10), s * gamlss.dist::rPIG(90, mu = 13, sigma = 0.2) + (1-s) * gamlss.dist::rPIG(90, mu = 17, sigma = 2)) nl <- nlogL_zipig2(x, c(0.1, 0.63, 13, 0.2, 17, 2)) s <- sample(x = c(0,1), size = 90, replace = TRUE, prob = c(0.3,0.7)) x <- c(rep(0,10), s*rpb(90, 5, 3, 20) + (1-s)*rpb(90, 7, 13, 53)) nl <- nlogL_zipb2(x, c(0.1, 0.63, 7, 13, 53, 5, 3, 20))
Density, distribution function, quantile function and random generation for the Poisson-beta distribution: a Poisson distribution whose parameter itself follows a beta distribution. Alpha and beta are the parameters of this specific beta distribution which is scaled on (0, c) in contrast to the usual scaling of the standard beta distribution on (0,1).
dpb(x, alpha, beta, c = 1, log = FALSE) ppb(q, alpha, beta, c = 1, lower.tail = TRUE, log.p = FALSE) qpb(p, alpha, beta, c = 1, lower.tail = TRUE, log.p = FALSE) rpb(n, alpha, beta, c = 1)
dpb(x, alpha, beta, c = 1, log = FALSE) ppb(q, alpha, beta, c = 1, lower.tail = TRUE, log.p = FALSE) qpb(p, alpha, beta, c = 1, lower.tail = TRUE, log.p = FALSE) rpb(n, alpha, beta, c = 1)
x , q
|
Vector of (non-negative integer) quantiles |
alpha , beta
|
Non-negative parameters of the beta distribution (shape1 and shape2) |
c |
Numeric scaling parameter of the beta distribution. The standard beta is scaled on (0,1) (default) and can be transformed to (0,c). |
log , log.p
|
Logical; if TRUE, probabilities p are given as log(p) |
lower.tail |
Logical; if TRUE (default), probabilities are |
p |
Vector of probabilities |
n |
Number of observations |
X <- dpb(x=0:200, alpha=5, beta=3, c=20) plot(0:200, X, type='l') Y <- dpb(0:10, seq(10.0,11.0,by=0.1), seq(30.0,31.0,by=0.1), seq(10.2,11.2,by=0.1)) Y <- ppb(q= 0 :200, alpha=5, beta= 3, c=20) plot(0:200, Y, type="l") Z <- qpb(p= seq(0,1, by= 0.01), alpha=5, beta= 3, c=20) plot(seq(0,1, by= 0.01),Z, type="l") RV <- rpb(n = 1000, alpha=5, beta= 3, c=20) plot(0 : 200, X, type="l") lines(density(RV), col="red") R2 <- rpb(11, seq(10.0,11.0,by=0.1), seq(30.0,31.0,by=0.1), seq(10.2,11.2,by=0.1))
X <- dpb(x=0:200, alpha=5, beta=3, c=20) plot(0:200, X, type='l') Y <- dpb(0:10, seq(10.0,11.0,by=0.1), seq(30.0,31.0,by=0.1), seq(10.2,11.2,by=0.1)) Y <- ppb(q= 0 :200, alpha=5, beta= 3, c=20) plot(0:200, Y, type="l") Z <- qpb(p= seq(0,1, by= 0.01), alpha=5, beta= 3, c=20) plot(seq(0,1, by= 0.01),Z, type="l") RV <- rpb(n = 1000, alpha=5, beta= 3, c=20) plot(0 : 200, X, type="l") lines(density(RV), col="red") R2 <- rpb(11, seq(10.0,11.0,by=0.1), seq(30.0,31.0,by=0.1), seq(10.2,11.2,by=0.1))