NEWS
nmfkc 0.7.3
Documentation
- README and
nmf-sem-with-nmfkc.Rmd vignette code now reference the
canonical nmf.ffb.* aliases (nmf.ffb(), nmf.ffb.cv(),
nmf.ffb.DOT()) instead of the legacy nmf.sem.* names. Both
names continue to work; the change only affects what users see on
the GitHub Pages homepage and in the vignette source.
nmfkc 0.7.2
Headline: NMF-FFB rebrand and full bootstrap inference
nmf.ffb* family added as the canonical alias for nmf.sem* (Satoh
2025, arXiv:2512.18250 adopts "NMF-FFB" — Non-negative Matrix
Factorization with Feed-Forward + Feedback — as the model's
canonical name). nmf.sem* continues to work and shares the same
return classes (c("nmf.ffb", "nmf.sem") and
c("nmf.ffb.inference", "nmf.sem.inference", ...)), so existing
scripts are unaffected.
nmf.sem.inference() / nmf.ffb.inference(): replaced the legacy
1-step Newton wild bootstrap with a full X-fixed pair bootstrap.
Resamples columns of (Y1, Y2), refits (C1, C2) with X held at the
original fit, and reports per-element support_rate = mean(|c_b| > threshold) together with percentile CIs. Significance markers
(* / ** / *** at sup > 0.95 / 0.99 / 0.999) follow the lavaan
convention. Both Theta_1 (feedback) and Theta_2 (exogenous) are
inference targets (previous version covered only Theta_2).
nmf.sem() / nmf.ffb(): now runs nmfkc(Y1, A = Y2)
internally by default when X.init is a string method,
forwarding X.init, X.L2.ortho, epsilon, maxit, seed. The
feedforward fit is used both as the X warm-start and as the
baseline for SC.map. nmfkc.baseline = FALSE opts out.
Bug Fixes
nmf.sem.inference(): fixed dimension bug in the Leontief identity
matrix (I_mat <- diag(Q) should have been diag(P1)); previously
every replicate was silently marked invalid when P1 != Q.
nmfkc.net(): now auto-masks NA entries of Y (parity with the
other four NMF variants); previously errored at the min(Y) < 0
check when Y contained NA.
nmfkc(): Fixed C matrix asymmetry in tri-symmetric NMF (Y.symmetric = "tri"). The C update was using stale B and XB computed from the old X; now B and XB are recomputed after X is updated. Also fixed column reordering to permute both rows and columns of C. Previously the relative asymmetry could reach ~46%; now it is at machine precision (~1e-14).
Improvements
Y.weights semantics unified to lm()-style weighted least squares
across nmfkc(), nmfae(), nmfkc.net(), nmfkc.signed(),
nmfae.signed(): loss is now sum(W * (Y - Yhat)^2) (linear in W,
matching lm()'s weights argument). Binary masks (W ∈ {0, 1};
the standard ECV / NA-mask case) are unaffected since W = W^2.
- All MU functions now emit a
"maximum iterations (N) reached..."
warning when maxit is exhausted without meeting the relative-
tolerance criterion (previously silent in nmfae, nmfae.signed,
nmfkc.net, nmfkc.signed, nmfre, and nmf.sem).
- All MU functions now share
maxit = 5000 as the default (was 5000
/ 20000 / 50000 inconsistently). Together with the maxit warning
above, users see explicit feedback when 5000 is insufficient and
can opt into a larger cap.
- New shared internal helper
.init_X_method() for X initialization
via "nndsvd" / "kmeans" / "kmeansar" / "runif" / numeric
matrix. All NMF families now use the same dispatch logic; previous
ad-hoc inline implementations are removed.
nmf.sem() returns SC.map (input-output structural fidelity:
correlation between the equilibrium operator and the feedforward
baseline mapping; Satoh 2025 §4.SC.map) automatically when
nmfkc.baseline is supplied or computed internally.
summary.nmf.sem(): rewritten to display the full-bootstrap
inference output — separate Theta_1 / Theta_2 blocks with
Estimate | CI_low | CI_high | support | Pr(>0) | sig, plus a
bootstrap meta-info header.
coef.nmf.sem(): now returns a long-format data frame with rows
for every entry of both C1 and C2 (Type | Basis | Covariate | Estimate); previously returned only the C2 matrix when no
inference had been run. Schema matches the inference-augmented
output for uniformity.
plot.nmf.sem(): default trace is now objfunc.full (loss +
penalties — the actual monotonically-decreasing quantity that the
multiplicative updates minimize) instead of objfunc (reconstruction
only). New argument which = "full" | "reconstruction" | "both".
nmf.sem.DOT(): significance stars now appear on Theta_1 (feedback
Y1 → F) edges in addition to Theta_2 (exogenous Y2 → F); X (F → Y1)
edges remain unstarred since the basis is not the inference target.
plot.nmfae.ecv(): Heatmap cell text color is now always black for better readability on light-colored cells.
nmfkc(): X.init = "runif" now supports nstart > 1 for multi-start initialization. Multiple random starting points are evaluated with 10 standard NMF iterations, and the best (lowest Frobenius error) is selected.
nmfae(), nmfre(): r.squared is now computed as cor(Y, fitted)^2 (squared correlation between observed and fitted values), consistent with nmfkc(). Previously nmfae() used 1 - SS_res/SS_tot and nmfre() used the same regression-style R-squared, which can behave unexpectedly for intercept-free non-negative models.
nmfkc.kernel.beta.nearest.med(): added a candidates argument controlling the bandwidth grid. Options: "7points" (new default, t = {-1,-2/3,-1/3,0,1/3,2/3,1}), "4points" (t = {-1/2, 0, 1/2, 1}), or a user-supplied numeric vector of \eqn{t} values. Previously the grid silently differed between the no-landmark (Uk = NULL; 4 points) and landmark (7 points) branches.
New Functions (Signed NMF family)
nmfkc.signed(): NMF-KC with signed covariate/coefficient. Model \eqn{Y \approx X \Theta A} with \eqn{X \ge 0}, \eqn{\Theta = C_{+} - C_{-}} (signed), \eqn{A} real-valued. Uses Ding et al. (2010) sign-splitting + Direct MU; \eqn{Y} may also contain negative entries (semi-NMF regression). Supports Y.weights for element-wise masking.
nmfkc.signed.cv(), nmfkc.signed.ecv(): column-wise and element-wise k-fold CV for rank selection on signed data.
nmfae.signed(): Three-layer autoencoder with \strong{signed bottleneck} \eqn{Y_1 \approx X_1 (C_{+} - C_{-}) X_2 Y_2}. \eqn{X_1, X_2 \ge 0} preserve soft clustering on both decoder and encoder sides while the bottleneck \eqn{\Theta} can carry negative weights (e.g., anti-correlated properties). Hybrid warm-start (from nmfae()) + Direct MU with multi-restart.
nmfae.signed.ecv(): element-wise CV for (decoder-rank, encoder-rank) selection.
nmfae.signed.inference(): sandwich SE + wild bootstrap for \eqn{\Theta} (no non-negativity projection on \eqn{\Theta} since it is signed).
- S3 methods
predict.*.signed(), plot.*.signed(), summary.*.signed(), and nmfae.signed.rename() helper.
New Functions (Network NMF family)
nmfkc.net(): Single unified entry point for symmetric NMF of network data, with type = "tri" | "bi" | "signed". All three variants use the Frobenius-full bilateral gradient (supersedes the one-sided approximation in nmfkc(Y.symmetric = ...)). type = "signed" supports signed \eqn{C = C_{+} - C_{-}} via Ding et al. (2010) sign-splitting, preserving \eqn{X \ge 0} for soft clustering while allowing inter-cluster repulsion. The returned object's fields are uniform across types: \code{$Cp} and \code{$Cn} are \code{NULL} for tri/bi, and populated matrices for signed. \code{$C} is always populated (identity for bi, non-negative for tri, signed for signed).
nmfkc.net.ecv(): Element-wise cross-validation with upper-triangle folds (mirrored to the lower triangle to prevent symmetry leakage). Unified entry point for type = "tri" | "bi" | "signed" (calls nmfkc.net() with the matching type for each fold).
nmfkc.net.DOT(): Graphviz DOT visualization for symmetric NMF networks. Displays basis-to-node membership edges and inter-basis interaction edges (C matrix) with significance stars. Now has signed parameter (auto-detected from class) to render negative C entries as dashed edges.
nmfkc.net.inference(): Statistical inference for symmetric NMF. Wrapper around nmfkc.inference() with A = t(X). Returns off-diagonal C coefficients with sandwich SE and wild bootstrap.
Deprecations
nmfkc(Y, Y.symmetric = "bi"|"tri"): Deprecated in favor of nmfkc.net(Y, type = "bi"|"tri"). The old implementation uses a one-sided gradient approximation that empirically converges for \eqn{C \ge 0} but is theoretically incorrect and does not extend to signed \eqn{C}. The deprecated branch still works in v0.6.8 (with a deprecation warning) and will be removed in a future release.
Parameter Renames (old names remain usable for backward compatibility)
nmf.sem.DOT(): weight_scale_y2f → weight_scale_c2, weight_scale_fy1 → weight_scale_x1 (matrix-name-based naming, consistent with nmfae.DOT() and nmfkc.DOT()).
nmf.sem.DOT(): sig.level moved to after threshold for consistency with other .DOT functions.
Documentation
- README, vignettes, and roxygen
@title / @description updated to
use NMF-FFB as the canonical model name (with "(formerly
NMF-SEM)" attached on first mention for discoverability of the
legacy term). File names (R/nmf.sem.R, vignettes/nmf-sem-with- nmfkc.Rmd, man/nmf.sem.Rd), function names (nmf.sem*), and S3
classes ("nmf.sem") are unchanged so URLs and existing scripts
continue to work.
nmfkc 0.6.7 (2026-04-15)
Bug Fixes
- Added
fitted.nmfae() and residuals.nmfae() S3 methods; previously fitted() on an nmfae object silently returned NULL because the wrong field name ($XB instead of $Y1hat) was used.
Naming Unification (old names remain usable for backward compatibility)
- Coefficient tables: all inference functions now use
Basis / Covariate columns (was Factor/Exogenous in nmf.sem.inference(), Decoder/Encoder in nmfae.inference()).
- Wild bootstrap defaults unified:
wild.B = 500, wild.seed = 123 across all inference functions.
- First argument of all
.DOT functions renamed to result for consistency.
- CV tuning parameters (
nfolds, seed, shuffle) moved to ... in nmfkc.ecv(), nmfae.ecv(), nmfae.cv(), nmf.sem.cv(); div also accepted for backward compatibility.
nmfkc 0.6.6
New Functions
nmfkc.criterion(): Extracted criterion computation from nmfkc() as a standalone exported function. Supports detail = "full" / "fast" / "minimal" to control computation cost.
nmfre.inference(): Separated statistical inference from nmfre() optimization. Returns coefficient table with SE, z-values, and p-values via wild bootstrap.
nmf.sem.inference(): Statistical inference for the C2 parameter matrix in NMF-SEM. Uses sandwich SE and wild bootstrap.
- S3 methods
coef(), fitted(), residuals() for all model classes (nmfkc, nmfae, nmfre, nmf.sem).
- S3 methods
plot() for nmfre and nmf.sem (convergence diagnostics).
summary.nmf.sem(): Stability diagnostics, fit statistics, and C2 coefficient table.
Parameter Renames (old names remain usable for backward compatibility)
nmfkc(), nmfkc.rank(): save.time / save.memory → detail
nmfae(): Q → rank, R → rank.encoder
nmfre(): Q → rank, dfU.cap.rate → df.rate
nmfre.dfU.scan(), nmfkc.ar.degree.cv(): Q → rank
nmfkc.residual.plot(): Y_XB_palette → fitted.palette, E_palette → residual.palette
nmfkc.kernel.beta.nearest.med(): block_size → block.size, sample_size → sample.size
Other Improvements
hide.isolated option added to all .DOT functions (default TRUE).
nmf.sem.DOT(): Added sig.level parameter; C2 edges decorated with significance stars.
nmfkc(): Added X.restriction = "none" option and X.init = "kmeansar" initialization.
- Added arXiv/DOI references to roxygen documentation for all main functions.
@section Lifecycle: Experimental added to nmfae().
- Removed
mc.cores parallel option from nmfae.ecv() for CRAN compliance.
nmfkc 0.6.0
Bug Fixes
- Fixed variable
T shadowing TRUE in information criterion computation.
- Fixed
nmfkc.ecv() to use KL divergence for evaluation when method="KL".
- Added performance flags (
save.time=TRUE) to nmfkc.ecv() inner calls.
- Fixed zero-division in
nmfkc.rank() elbow normalization when R-squared values are identical.
- Fixed parameter name mismatch (
rank → Q) in nmfkc.rank() call to nmfkc.ecv().
- Fixed descending loop in
nmf.sem.split() when P=2.
- Added input validation for
n.exogenous in nmf.sem.split().
Documentation
- Added roxygen documentation for
summary.nmfkc() and print.summary.nmfkc().
- Added
@return for plot.nmfkc() and predict.nmfkc().
- Added missing
@return items (method, n.missing, n.total, rank, mae) to nmfkc().
Code Quality
- Replaced
T/F with TRUE/FALSE.
- Replaced
1:length() with seq_along().
- Changed default font from Meiryo to Arial in DOT functions.
- Aligned
nmf.sem.cv() defaults with nmf.sem().
nmfkc 0.5.8
Graphviz DOT Output Consolidation and Cleanup
- Harmonized all DOT-generating functions (
nmf.sem.DOT, nmfkc.DOT, nmfkc.ar.DOT) for consistent structure, naming conventions, and visualization logic.
- Standardized node and edge formatting rules, including unified cluster behavior, color schemes, and edge-scaling conventions.
- Implemented threshold-aware coefficient labeling so that displayed numerical precision aligns with the visualization threshold, preventing misleadingly detailed labels.
- Removed unused or redundant DOT fragments and improved compatibility across Graphviz engines.
- Enhanced layout readability through consistent indentation, node grouping, and suppression of isolated nodes in specific visualization modes (e.g.,
type = "YA" in nmfkc.DOT).
- Refactored and expanded internal DOT helper functions (
.nmfkc_dot_format_coef, .nmfkc_dot_digits_from_threshold, .nmfkc_dot_cluster_nodes, etc.) for better maintainability and uniform behavior.
- New Function: Implemented
nmfkc.ecv() for Element-wise Cross-Validation (Wold's CV).
- This function randomly masks elements of the observation matrix to evaluate structural reconstruction error.
- It provides a statistically robust criterion for rank selection, avoiding the monotonic error decrease often seen in standard column-wise CV.
- Supports vector input for
rank to evaluate multiple ranks simultaneously.
- Missing Value & Weight Support:
nmfkc() and nmfkc.cv() now fully support missing values (NA) and observation weights via the hidden argument Y.weights (passed through ...).
- If
Y contains NAs, they are automatically detected and masked (assigned a weight of 0) during optimization.
- Rank Selection Diagnostics (
nmfkc.rank):
- Dual-Axis Visualization: The plot now displays fitting metrics ($R^2$, etc.) on the left axis and ECV Sigma (RMSE) on the right axis (blue line).
- Automatic Best Rank labeling: The plot explicitly marks the "Best" rank based on two criteria:
- Elbow: Geometric elbow point of the $R^2$ curve.
- Min: Minimum error point of the Element-wise CV.
save.time defaults to FALSE, enabling the robust Element-wise CV calculation by default.
- Argument Standardization:
- Unified the rank argument name to
rank across all functions (nmfkc, nmfkc.cv, nmfkc.ecv, nmfkc.rank).
- The legacy argument
Q is still supported for backward compatibility but internally mapped to rank.
- Summary Improvements:
- Updated
summary() and print() methods to report:
- Sparsity of Basis ($X$) and Coefficients ($B$).
- Clustering Entropy (indicating "Crisp" vs "Ambiguous" clustering).
- Clustering Crispness (Mean Max Probability).
- Number and percentage of missing values in $Y$.
- Other Improvements:
- Added a validation check in
nmfkc.ar() to ensure the input Y has no missing values (as they cannot be propagated to the covariate matrix A in VAR models).
- Refined
nmfkc.residual.plot() layout margins for better visibility of titles.
- Updated documentation to reflect all changes.
- Regularization Update:
The regularization scheme has been revised from L2 (ridge) to L1 (lasso-type) penalties.
gamma now controls the L1 penalty on the coefficient matrix ( B = C A ), promoting sparsity in sample-wise coefficients.
- A new argument
lambda has been added to control the L1 penalty on the parameter matrix ( C ), encouraging sparsity in the shared template structure.
Both parameters can be passed through the ellipsis (...) to nmfkc() and related functions.
- Function Signature Simplification:** Many less-frequently used arguments in
nmfkc() (e.g., gamma, X.restriction, X.init) and in nmfkc.cv() (e.g., div, seed) have been moved into the ellipsis (...) for a cleaner function signature.
- Performance Improvement: The internal function
.silhouette.simple was vectorized and optimized to reduce computational cost, particularly for the calculation of a(i) and b(i).
- Removed the
fast.calc option from the nmfkc() function.
- Added the
X.init argument to the nmfkc() function, allowing selection between 'kmeans' and 'nndsvd' initialization methods.
- The penalty term has been changed from
tr(CC') to tr(BB') = tr(CAA'C').
- Implemented the internal
.z and xnorm functions.
- Added the fast.calc option to the
nmfkc() function.
- Optimized internal calculations for improved performance.
- Updated
citation("nmfkc") and added AIC/BIC to the output.
- Implemented the
nmfkc.ar.stationarity() function.
- Modified the
z() function.
- Used
crossprod() for faster matrix multiplication.
- Implemented the
nmfkc.ar.DOT() function.
- Added logic to sort the columns of
X to form a unit matrix in special cases.
- Implemented
nmfkc.kernel.beta.cv() and nmfkc.ar.degree.cv() functions.
- Set the default column names of
X to Basis1, Basis2, etc.
- Added
X.prob and X.cluster to the return object.
- Skipped CPCC and silhouette calculations when
save.time = TRUE.
- Added a prototype for the
nmfkc.ar() function.
- Added the
criterion argument to the nmfkc() function to support multiple criteria.
- Updated the
nmfkc.rank() function.
- Added the
criterion argument to the nmfkc.rank() function.
- Implemented the
save.time argument.
- Implemented the
nmfkc.rank() function.
- Implemented the
nstart option from the kmeans() function.
- Added an experimental implementation of the
nmfkc.rank() function.
- Removed zero-variance columns and rows with a warning.
- Added source and references to the documentation.
- Renamed several components for clarity:
nmfkcreg to nmfkc
create.kernel to nmfkc.kernel
nmfkcreg.cv to nmfkc.cv
P to B.prob
cluster to B.cluster
unit to X.column
trace to print.trace
dims to print.dims
- Added the
r.squared argument to the nmfkcreg.cv() function.
- In
nmfkcreg():
- Added the
dims argument to check matrix sizes.
- Added the
unit argument to normalize the basis matrix columns.
- Modified the
create.kernel() function to support prediction.
- Updated examples on GitHub.
- Removed the
YHAT return value; use XB instead.
- Added the
cluster return value for hard clustering.