| Title: | Constrained Mixture of Generalized Normal Distributions |
|---|---|
| Description: | The 'cmgnd' package implements the constrained mixture of generalized normal distributions model, a flexible statistical framework for modelling univariate data exhibiting non-normal features such as skewness, multi-modality, and heavy tails. By imposing constraints on model parameters, the package reduces estimation complexity while maintaining high descriptive power, offering an efficient solution in the presence of distributional irregularities. For more details see Duttilo and Gattone (2025) <doi:10.1007/s00180-025-01638-x> and Duttilo et al. (2025) <doi:10.48550/arXiv.2506.03285>. |
| Authors: | Pierdomenico Duttilo [aut, cre] (ORCID: <https://orcid.org/0000-0002-2036-7163>), Stefano Antonio Gattone [aut] (ORCID: <https://orcid.org/0000-0002-6143-9012>), Alfred Kume [aut] |
| Maintainer: | Pierdomenico Duttilo <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-05-24 06:57:02 UTC |
| Source: | https://github.com/pierdutt/cmgnd |
Fits univariate constrained mixture of generalized normal distribution models by imposing mixture partitions. Models are estimated by the ECM algorithm initialized by k-means.
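As an illustration of the k-means initialization mentioned above, the following base R sketch derives starting values for the weights, locations and scales from a k-means partition. It is not the package's internal code: `init_kmeans()` is a hypothetical helper, and using the within-cluster standard deviation as the starting scale is an assumption made here for simplicity.

```r
# Sketch only: k-means starting values for a K-component mixture.
# init_kmeans() is a hypothetical helper, not a function of the cmgnd package.
init_kmeans <- function(x, K = 2, nu0 = rep(2, K)) {
  km  <- kmeans(x, centers = K, nstart = 10)
  ord <- order(km$centers[, 1])     # cluster labels sorted by centre
  cl  <- factor(match(km$cluster, ord), levels = seq_len(K))  # relabel 1..K
  data.frame(
    pi    = as.numeric(table(cl)) / length(x),  # cluster proportions
    mu    = as.numeric(tapply(x, cl, mean)),    # cluster means
    sigma = as.numeric(tapply(x, cl, sd)),      # assumed starting scale
    nu    = nu0                                 # user-chosen starting shape
  )
}

set.seed(1)
init <- init_kmeans(faithful$eruptions, K = 2)
init
```

Sorting the clusters by their centres keeps the component labels stable across runs, which makes different starting points easier to compare.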
cmgnd(
  x, K = 2, Cmu = rep(0, K), Csigma = rep(0, K), Cnu = rep(0, K),
  nstart = 50, theta = FALSE, nustart = rep(2, K), nustartype = "random",
  gauss = FALSE, laplace = FALSE, scale = FALSE, eps = 10^-4, maxit = 999,
  verbose = TRUE, sigbound = c(0.1, 5), sr = "like", eta = 0.5,
  seed = 12345, seed.nstart = seq(1:nstart)
)
x |
A numeric vector of observations. |
K |
An integer specifying the number of mixture components to fit. Default is 2. |
Cmu |
A binary vector indicating mixture components for the location parameter. The k-th element is set to 1 if the k-th mixture component belongs to the Cr partition, and 0 otherwise. Default is rep(0, K). |
Csigma |
A binary vector indicating mixture components for the scale parameter. The k-th element is set to 1 if the k-th mixture component belongs to the Cr partition, and 0 otherwise. Default is rep(0, K). |
Cnu |
A binary vector indicating mixture components for the shape parameter. The k-th element is set to 1 if the k-th mixture component belongs to the Cr partition, and 0 otherwise. Default is rep(0, K). |
nstart |
An integer specifying the number of starting points for the shape parameter. Default is 50. |
theta |
A parameter matrix used to initialize the estimation for the first starting point. |
nustart |
A numeric vector containing the starting values for the shape parameter |
nustartype |
A character string indicating whether the initialization of |
gauss |
A logical value indicating whether the algorithm should use the Gaussian distribution. Default is FALSE. |
laplace |
A logical value indicating whether the algorithm should use the Laplace distribution. Default is FALSE. |
scale |
A logical value indicating whether the function should scale the data. Default is FALSE. |
eps |
A numeric value specifying the tolerance level of the ECM algorithm. |
maxit |
An integer specifying the maximum number of iterations. |
verbose |
A logical value indicating whether to display running output. Default is TRUE. |
sigbound |
A numeric vector of length two specifying the lower and upper bounds for resetting the sigma estimates. Default is c(0.1, 5). |
sr |
A character string specifying the type of convergence criterion to use. The default is "like". |
eta |
A numeric value specifying the tolerance level for the likelihood-based convergence. Default is 0.5. |
seed |
Optional integer used to set the random seed via set.seed(). Default is 12345. |
seed.nstart |
Optional numeric vector used to set the random seed via set.seed(). Default is seq(1:nstart). |
The constrained mixture of generalized normal distributions (CMGND) model is an advanced statistical tool designed for analyzing univariate data characterized by non-normal features such as asymmetry, multi-modality, leptokurtosis, and heavy tails. This model extends the mixture of generalized normal distributions (MGND) by incorporating constraints on the parameters, thereby reducing the number of parameters to be estimated and improving model performance. The CMGND model is defined by the following components:
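$$f(x \mid \theta) = \sum_{k=1}^{K} \pi_k f_k(x \mid \mu_k, \sigma_k, \nu_k)$$
where:
$\pi_k$ are the mixture weights, satisfying $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$.
$f_k(x \mid \mu_k, \sigma_k, \nu_k)$ is the generalized normal distribution (GND) density for the k-th component with mean $\mu_k$, scale $\sigma_k$, and shape parameter $\nu_k$.
The parameter space can be constrained by imposing equality constraints such as $\mu_k = \mu_r$, $\sigma_k = \sigma_r$, and/or $\nu_k = \nu_r$ for all $k \in C_r$, where $C_r$ is a partition of the set $\{1, 2, \dots, K\}$.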
The partition for each parameter can be specified
by the binary vectors Cmu, Csigma and Cnu.
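To make the parameter reduction concrete, the sketch below counts the free parameters implied by a set of binary constraint vectors. It assumes that all components flagged with 1 share a single common value and that the mixture weights contribute K - 1 free parameters. `n_free()` and `count_cmgnd_par()` are hypothetical helpers written for this illustration, not package functions.

```r
# Sketch: number of free parameters implied by binary constraint vectors.
# Assumption: components flagged with 1 share one common value.
n_free <- function(C) {
  K <- length(C)
  s <- sum(C)
  K - s + as.integer(s > 0)   # unconstrained values plus one shared value
}

count_cmgnd_par <- function(Cmu, Csigma, Cnu) {
  K <- length(Cmu)
  (K - 1) + n_free(Cmu) + n_free(Csigma) + n_free(Cnu)  # K - 1 free weights
}

count_cmgnd_par(c(0, 0), c(0, 0), c(0, 0))  # unconstrained: 7
count_cmgnd_par(c(0, 0), c(1, 1), c(0, 0))  # common scale:  6
```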
ll |
The log-likelihood corresponding to the estimated model. |
nobs |
Number of observations. |
parameters |
Data frame of the estimated parameters. |
ic |
Data frame of information criteria. AIC, BIC, HQIC and EDC are returned. |
res |
Matrix of posterior probabilities or responsibilities. |
clus |
Vector of group classifications. |
op_it |
List containing three integers: |
cputime |
A numeric value indicating the cpu time employed. |
info |
List containing a few of the original user inputs,
for use by other dedicated functions of the |
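The posterior probabilities (`res`) and the classification (`clus`) listed above can be illustrated with a short base R sketch. It assumes the GND density in the Nadarajah (2005) parameterisation and assigns each observation to the component with the largest posterior probability. `gnd_pdf()` and `posterior_sketch()` are hypothetical names, and the package's own computation may differ.

```r
# Assumed GND density: nu / (2 sigma Gamma(1/nu)) * exp(-(|x - mu| / sigma)^nu)
gnd_pdf <- function(x, mu, sigma, nu) {
  nu / (2 * sigma * gamma(1 / nu)) * exp(-(abs(x - mu) / sigma)^nu)
}

# Sketch: responsibilities and maximum a posteriori classification.
posterior_sketch <- function(x, pi, mu, sigma, nu) {
  dens <- sapply(seq_along(pi), function(k) {
    pi[k] * gnd_pdf(x, mu[k], sigma[k], nu[k])   # weighted component density
  })
  dens <- matrix(dens, nrow = length(x))         # n x K, also when n = 1
  res  <- dens / rowSums(dens)                   # each row sums to one
  list(res = res, clus = max.col(res, ties.method = "first"))
}

post <- posterior_sketch(c(1, 3.2, 5), pi = c(0.5, 0.5), mu = c(1, 5),
                         sigma = c(1, 1), nu = c(2, 2))
round(post$res, 3)
post$clus
```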
Bazi, Y., Bruzzone, L., and Melgani, F. (2006). Image thresholding based on the EM algorithm and the generalized Gaussian distribution. Pattern Recognition, 40(2), pp 619–634.
Wen, L., Qiu, Y., Wang, M., Yin, J., and Chen, P. (2022). Numerical characteristics and parameter estimation of finite mixed generalized normal distribution. Communications in Statistics - Simulation and Computation, 51(7), pp 3596–3620.
Duttilo, P. (2024). Modelling financial returns with mixtures of generalized normal distributions. PhD Thesis, University “G. d’Annunzio” of Chieti-Pescara, pp. 1-166, doi:10.48550/arXiv.2411.11847
Duttilo, P. and Gattone, S.A. (2025). Enhancing parameter estimation in finite mixture of generalized normal distributions, Computational Statistics, pp. 1-28, doi:10.1007/s00180-025-01638-x
Duttilo, P., Gattone, S.A., and Kume, A. (2025). Constrained mixtures of generalized normal distributions, pp. 1-34, doi:10.48550/arXiv.2506.03285
# Old Faithful dataset
x <- faithful$eruptions

# Unconstrained model estimation
Cmu <- c(0, 0)
Csigma <- c(0, 0)
Cnu <- c(0, 0)
model_unc <- cmgnd(x, nstart = 2, K = 2, Cmu, Csigma, Cnu)
model_unc$parameters
plot_cmgnd(x, model_unc)

# Constrained model estimation with common scale parameters
Csigma <- c(1, 1)
model_con <- cmgnd(x, nstart = 2, K = 2, Cmu, Csigma, Cnu)
model_con$parameters
plot_cmgnd(x, model_con)
This function estimates the marginal density for univariate constrained mixture of generalized normal distribution (CMGND) models.
dcmgnd(x, parameters)
x |
A numeric vector representing the observed data points. |
parameters |
A matrix or data.frame containing the parameters of the CMGND model. This can also be an object returned from the 'cmgnd()' function, representing a previously estimated CMGND model. |
The function computes the marginal density based on the provided parameters of the CMGND model. It can handle either newly supplied parameters or those extracted from an existing CMGND model object.
A vector of density estimates corresponding to the input data 'x'.
'cmgnd()' for estimating the model parameters.
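The marginal density is the weighted sum of the component densities. The following base R sketch evaluates it from a parameter table and checks numerically that it integrates to one. The column names `pi`, `mu`, `sigma` and `nu`, the helper names, and the GND parameterisation (Nadarajah, 2005) are assumptions of this sketch and are not taken from the package.

```r
# Assumed GND density: nu / (2 sigma Gamma(1/nu)) * exp(-(|x - mu| / sigma)^nu)
gnd_density <- function(x, mu, sigma, nu) {
  nu / (2 * sigma * gamma(1 / nu)) * exp(-(abs(x - mu) / sigma)^nu)
}

# Sketch: marginal mixture density from a table with assumed column names.
mix_density_sketch <- function(x, parameters) {
  p <- as.data.frame(parameters)
  dens <- numeric(length(x))
  for (k in seq_len(nrow(p))) {
    dens <- dens + p$pi[k] * gnd_density(x, p$mu[k], p$sigma[k], p$nu[k])
  }
  dens
}

# Illustrative parameter values, not estimates from data
pars <- data.frame(pi = c(0.4, 0.6), mu = c(1, 5),
                   sigma = c(1, 1.5), nu = c(2, 1.5))
mix_density_sketch(c(0, 1, 5), pars)

# Numerical check: the density should integrate to approximately 1
area <- integrate(mix_density_sketch, lower = -10, upper = 20,
                  parameters = pars)$value
area
```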
Density function for the GND with location parameter mu,
scale parameter sigma and shape parameter nu.
dgnd(x, mu = 0, sigma = 1, nu = 2)
x |
A numeric vector of observations. |
mu |
A numeric value indicating the location parameter |
sigma |
A numeric value indicating the scale parameter |
nu |
A numeric value indicating the shape parameter |
If mu, sigma and nu are not specified
they assume the default values of 0, 1 and 2, respectively.
The GND distribution has density
$$f(x \mid \mu, \sigma, \nu) = \frac{\nu}{2 \sigma \Gamma(1/\nu)} \exp\left\{ -\left( \frac{|x - \mu|}{\sigma} \right)^{\nu} \right\}.$$
The shape parameter $\nu$ controls both the peakedness and tail weights. If $\nu = 1$ the GND reduces to the Laplace distribution, and if $\nu = 2$ it coincides with the normal distribution. Values $1 < \nu < 2$ yield an intermediate distribution between the normal and the Laplace distribution. As limit cases, for $\nu \to \infty$ the distribution tends to a uniform distribution, while for $\nu \to 0$ it will be impulsive.
dgnd returns the density.
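The special cases of the shape parameter can be verified numerically. The sketch below assumes the Nadarajah (2005) form of the density, nu / (2 sigma Gamma(1/nu)) * exp(-(|x - mu| / sigma)^nu); `dgnd_sketch()` is a stand-alone illustration rather than the package's `dgnd()`. Under this parameterisation, nu = 2 gives a normal distribution with standard deviation sigma / sqrt(2), and nu = 1 gives a Laplace distribution with scale sigma.

```r
# Stand-alone sketch of the GND density (assumed parameterisation, see above)
dgnd_sketch <- function(x, mu = 0, sigma = 1, nu = 2) {
  nu / (2 * sigma * gamma(1 / nu)) * exp(-(abs(x - mu) / sigma)^nu)
}

x <- seq(-3, 3, by = 0.5)

# nu = 2: normal distribution with sd = sigma / sqrt(2)
is_normal <- isTRUE(all.equal(dgnd_sketch(x, nu = 2),
                              dnorm(x, mean = 0, sd = 1 / sqrt(2))))

# nu = 1: Laplace distribution with scale sigma
is_laplace <- isTRUE(all.equal(dgnd_sketch(x, nu = 1), 0.5 * exp(-abs(x))))

c(normal = is_normal, laplace = is_laplace)
```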
Nadarajah, S. (2005). A generalized normal distribution.
Journal of Applied Statistics.
This function generates a plot displaying both the marginal density and individual mixture component densities for univariate constrained mixture of generalized normal distribution (CMGND) models. It visually represents how the different components of the mixture model contribute to the overall density.
hist_cmgnd(x, parameters, bins = 80)
x |
A numeric vector representing the observed data points. |
parameters |
A matrix or data.frame containing the parameters of the CMGND model. Alternatively, this can be an object returned from the 'cmgnd()' function, representing an estimated CMGND model. |
bins |
Number of bins. Defaults to 80. |
The function plots the overall (marginal) density curve for the CMGND model, as well as the density curves of each mixture component. This visualization helps in understanding how each component contributes to the model and provides insights into the data distribution.
A plot illustrating the marginal density along with the densities of the individual mixture components for the given data 'x'.
'cmgnd()' for estimating the model parameters.
Computes the mean, variance, skewness, and kurtosis of the marginal distribution of a univariate Constrained Mixture of Generalized Normal Distributions (CMGND) model, given the model parameters.
moments_cmgnd(parameters)
parameters |
A matrix or data frame where each row corresponds to a component of the mixture. Columns must be ordered as follows:
|
The function assumes that the parameters define a valid CMGND model and uses
analytical expressions to compute the first four moments of the marginal distribution.
The shape parameter governs the kurtosis of each component.
A named list with the following elements:
The marginal mean of the CMGND distribution
The marginal variance
The marginal skewness
The marginal kurtosis
cmgnd for estimating CMGND model parameters.
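The analytical moments can be sketched in base R as follows. The sketch assumes the Nadarajah (2005) parameterisation, under which a GND component has variance sigma^2 Gamma(3/nu) / Gamma(1/nu), zero third central moment, and fourth central moment sigma^4 Gamma(5/nu) / Gamma(1/nu). The function name and the names of the returned elements are hypothetical, and the kurtosis reported here is the raw (not excess) kurtosis; the package may use different conventions.

```r
# Sketch: first four moments of a GND mixture (hypothetical helper).
mix_moments_sketch <- function(pi, mu, sigma, nu) {
  v  <- sigma^2 * gamma(3 / nu) / gamma(1 / nu)  # component variances
  m4 <- sigma^4 * gamma(5 / nu) / gamma(1 / nu)  # component 4th central moments
  m  <- sum(pi * mu)                             # mixture mean
  d  <- mu - m                                   # component offsets from the mean
  v_mix  <- sum(pi * (v + d^2))
  c3_mix <- sum(pi * (3 * d * v + d^3))          # component skewness is zero
  c4_mix <- sum(pi * (m4 + 6 * d^2 * v + d^4))
  list(mean = m, variance = v_mix,
       skewness = c3_mix / v_mix^1.5,
       kurtosis = c4_mix / v_mix^2)
}

# One component with nu = 2 and sigma = sqrt(2) is a standard normal
normal_case <- mix_moments_sketch(pi = 1, mu = 0, sigma = sqrt(2), nu = 2)

# Unequal weights induce skewness even though each component is symmetric
skewed_case <- mix_moments_sketch(pi = c(0.3, 0.7), mu = c(1, 5),
                                  sigma = c(1, 1), nu = c(2, 2))
str(skewed_case)
```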
This function generates a plot displaying both the marginal density and individual mixture component densities for univariate constrained mixture of generalized normal distribution (CMGND) models. It visually represents how the different components of the mixture model contribute to the overall density.
plot_cmgnd(x, parameters, model = "")
x |
A numeric vector representing the observed data points. |
parameters |
A matrix or data.frame containing the parameters of the CMGND model. Alternatively, this can be an object returned from the 'cmgnd()' function, representing an estimated CMGND model. |
model |
A character string giving the model type name. Default is model = "". |
The function plots the overall (marginal) density curve for the CMGND model, as well as the density curves of each mixture component. This visualization helps in understanding how each component contributes to the model and provides insights into the data distribution.
A plot illustrating the marginal density along with the densities of the individual mixture components for the given data 'x'.
'cmgnd()' for estimating the model parameters.
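A comparable figure can be drawn with base graphics alone. The sketch below overlays the weighted component densities and the marginal density on a histogram of the data. It is an independent illustration, not the code behind `plot_cmgnd()`: `plot_mix_sketch()` is a hypothetical helper, the GND parameterisation follows Nadarajah (2005), and the parameter values in the usage lines are illustrative rather than fitted.

```r
# Sketch: histogram with component and marginal densities (base graphics).
plot_mix_sketch <- function(x, pi, mu, sigma, nu, breaks = 30) {
  gnd <- function(x, mu, sigma, nu) {
    nu / (2 * sigma * gamma(1 / nu)) * exp(-(abs(x - mu) / sigma)^nu)
  }
  grid <- seq(min(x), max(x), length.out = 400)
  comp <- sapply(seq_along(pi), function(k) {
    pi[k] * gnd(grid, mu[k], sigma[k], nu[k])   # weighted component density
  })
  marg <- rowSums(comp)                         # marginal density
  h <- hist(x, breaks = breaks, plot = FALSE)
  plot(h, freq = FALSE, col = "grey90", border = "white",
       main = "Mixture density sketch", xlab = "x",
       ylim = c(0, max(h$density, marg)))
  matlines(grid, comp, lty = 2, col = seq_along(pi) + 1)  # components
  lines(grid, marg, lwd = 2)                              # marginal
  invisible(data.frame(x = grid, density = marg))
}

# Illustrative parameter values for the Old Faithful eruptions, not estimates
curve_data <- plot_mix_sketch(faithful$eruptions,
                              pi = c(0.35, 0.65), mu = c(2, 4.3),
                              sigma = c(0.4, 0.6), nu = c(2, 2))
```

Returning the evaluated curve invisibly makes it possible to reuse the same values in another plot without recomputing them.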
Simulates data from univariate constrained mixture of generalized normal distribution models. Remember to call set.seed() before sim_cmgnd() to obtain reproducible results.
sim_cmgnd(n = 1000, pi = rep(0.5, 2), mu = c(1, 5), sigma = c(1, 1), nu = c(2, 2))
n |
A numeric value indicating the total number of observations to simulate. |
pi |
A numeric vector of the mixture weights |
mu |
A numeric vector of the location parameter |
sigma |
A numeric vector of the scale parameter |
nu |
A numeric vector of the shape parameter |
sim_data |
The simulated data. |
sim_clus |
The cluster indication of simulated data. |
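One way to simulate from such a mixture is sketched below in base R. It relies on a standard property of the GND in the Nadarajah (2005) parameterisation: if G follows a Gamma distribution with shape 1/nu and rate 1, and S is a random sign, then mu + sigma * S * G^(1/nu) follows a GND with parameters mu, sigma and nu. `rgnd_sketch()` and `sim_mix_sketch()` are hypothetical helpers, and the package's `sim_cmgnd()` may use a different sampling scheme.

```r
# Sketch: GND random numbers via the gamma transformation described above.
rgnd_sketch <- function(n, mu, sigma, nu) {
  g <- rgamma(n, shape = 1 / nu, rate = 1)
  s <- sample(c(-1, 1), n, replace = TRUE)   # random sign
  mu + sigma * s * g^(1 / nu)
}

# Sketch: draw component labels first, then observations given the labels.
sim_mix_sketch <- function(n, pi, mu, sigma, nu) {
  clus <- sample(seq_along(pi), n, replace = TRUE, prob = pi)
  x <- rgnd_sketch(n, mu[clus], sigma[clus], nu[clus])
  list(sim_data = x, sim_clus = clus)
}

set.seed(123)  # set the seed first for reproducibility
sim <- sim_mix_sketch(n = 1000, pi = rep(0.5, 2), mu = c(1, 5),
                      sigma = c(1, 1), nu = c(2, 2))
table(sim$sim_clus)
tapply(sim$sim_data, sim$sim_clus, mean)
```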