## Abstract

The brain consists of specialized cortical regions that exchange information with one another, reflecting a combination of segregated (local) and integrated (distributed) processes that define brain function. Functional magnetic resonance imaging (fMRI) is widely used to characterize these functional relationships, although it is an ongoing challenge to develop robust, interpretable models for high-dimensional fMRI data. Gaussian mixture models (GMMs) are a powerful tool for parcellating the brain, based on the similarity of voxel time series. However, conventional GMMs have limited parametric flexibility: they only estimate segregated structure and do not model interregional functional connectivity, nor do they account for network variability across voxels or between subjects. To address these issues, this letter develops the functional segregation and integration model (FSIM). This extension of the GMM framework simultaneously estimates spatial clustering and the most consistent group functional connectivity structure. It also explicitly models network variability, based on voxel- and subject-specific network scaling profiles. We compared the FSIM to standard GMM in a predictive cross-validation framework and examined the importance of different model parameters, using both simulated and experimental resting-state data. The reliability of parcellations is not significantly altered by the flexibility of the FSIM, whereas voxel- and subject-specific network scaling profiles significantly improve the ability to predict functional connectivity in independent test data. Moreover, the FSIM provides a set of interpretable parameters to characterize both consistent and variable aspects of functional connectivity structure. As an example of its utility, we use subject-specific network profiles to identify brain regions where network expression predicts subject age in the experimental data. Thus, the FSIM is effective at summarizing functional connectivity structure in group-level fMRI, with applications in modeling the relationships between network variability and behavioral/demographic variables.

## 1 Introduction

The brain exhibits network organization, consisting of functionally distinct cortical regions that share information with one another. Thus, cognitive states are defined by two modes of brain function: specialized, local processing (segregation) and interactions between these segregated regions (integration) (Tononi, Sporns, & Edelman, 1994). A critical goal of neuroscience research is to learn the “segregative” and “integrative” properties of brain function that underlie healthy and impaired cognition (Friston, 2002; Fair et al., 2007; Rubinov & Sporns, 2010). Blood oxygenation level dependent functional magnetic resonance imaging (BOLD fMRI) provides a powerful technique for investigating the network properties of brain function, typically by measuring the covariance (or correlation) between BOLD time series of different brain regions. This approach to measuring functional connectivity has seen increasing use over the past decade, spurred by the identification of reliable functional networks in the brain (Raichle et al., 2001; Greicius, Krasnow, Reiss, & Menon, 2003), with the literature showing numerous applications, including aging (Grady, 2008), maturation (Power, Fair, Schlaggar, & Petersen, 2010), and characterizing clinical populations (Greicius, Srivastava, Reiss, & Menon, 2004; Garrity et al., 2007; Lowe et al., 2002; Grefkes et al., 2008).

Functional connectivity studies typically attempt to quantify integrative relationships between brain regions. However, it is an ongoing challenge to identify generalizable features of functional connectivity, as the number of brain voxels *V* usually exceeds the number of samples *N* by orders of magnitude, and we must summarize covariance relationships among a massive set of possible elements. Functional connectivity estimates may also be altered by noise confounds (Chang & Glover, 2009; Van Dijk, Sabuncu, & Buckner, 2012) and variability in functional networks, both within and between subjects. A simple modeling approach is to parcellate the cortex based on an anatomical atlas and measure functional connectivity between mean time series of these regions. However, atlas-based parcellations may be inconsistent, as they are highly sensitive to spatial normalization methods (Bohland, Bokil, Allen, & Mitra, 2009), and may poorly represent populations that deviate from the anatomical template, including child, aging, and clinical populations (Crinion et al., 2007; Rorden, Bonilha, Fridriksson, Bender, & Karnath, 2012). This is a major issue, as suboptimal parcels can lead to significant network modeling errors (Craddock, James, Holtzheimer, Hu, & Mayberg, 2012; Smith et al., 2013).

As an alternative, there is growing interest in unsupervised clustering models, which automatically parcellate brain regions based on the similarity of their BOLD time series. A variety of approaches have been successfully applied to fMRI data, including variants of k-means clustering (Goutte, Toft, Rostrup, Nielsen, & Hansen, 1999; Golland, Golland, Bentin, & Malach, 2008; Bellec, Rosa-Neto, Lyttelton, Benali, & Evans, 2010), spectral clustering (Van den Heuvel, Mandl, & Hulshoff Pol, 2008; Craddock et al., 2012), region-growing approaches (Lu, Jiang, & Zang, 2003; Bellec et al., 2006), hierarchical models (Cordes, Haughton, Carew, Arfanakis, & Maravilla, 2002; Thirion et al., 2006), and mixture models (Woolrich, Behrens, Beckmann, & Smith, 2005; Lashkari et al., 2012). Gaussian mixture models (GMMs) are particularly appealing, as they employ a probabilistic framework that does not require clustering threshold heuristics, and have a rich, well-developed theory for the analysis of large data sets (McLachlan & Peel, 2000; Fraley & Raftery, 2002; Melnykov & Maitra, 2010). This latent variable modeling approach assumes voxels originate from a set of *K* clusters, each with a mean time series and normally distributed variability.

When parcellating fMRI data, conventional GMMs have two significant limitations, which are common to all temporal clustering models. The first issue is that GMMs treat individual cluster means independently and only estimate “segregative” brain structure (i.e., within-cluster mean BOLD time series), requiring users to measure between-cluster functional connectivity as a post hoc exercise. This is potentially an oversimplified representation of the highly integrated human brain and does not exploit between-cluster functional connectivity information during model learning. The second issue is a lack of parametric flexibility in GMMs, as within-cluster mean BOLD time series (and post hoc connectivity estimates) represent the BOLD response across all voxels and subjects. They do not model voxel-dependent variability in signal amplitude or subject-dependent variations in functional connectivity. The inclusion of these parameters in GMMs would provide a more detailed description of functional structure in fMRI data and potentially make clustering models robust to outliers.

This letter addresses the above issues by extending the GMM framework to identify the most stable parcellation for group fMRI in terms of between-cluster functional connectivity. Termed the FSIM (functional segregation and integration model), the model reparameterizes cluster means using a constraint adapted from multilinear component modeling (Harshman, 1972; Kiers, Ten Berge, & Bro, 1999), in order to learn the most consistent group functional connectivity structure. Moreover, FSIM estimates voxel-specific and subject-specific network scaling profiles. This framework allows us to identify a core functional connectivity structure that is most consistent within the group, as well as spatial and subject-dependent variations in network scaling. Figure 1 depicts the features estimated in standard mixture models and compares them to the proposed FSIM framework.

The scaling profile approach of FSIM also bridges the gap between discrete mixture modeling and linear component-based profile models, which are well established in the functional neuroimaging literature. This includes bilinear techniques such as scaled subprofile models (Moeller, Strother, Sidtis, & Rottenberg, 1987) and partial least squares (McIntosh, Chau, & Protzner, 2004), which characterize structured covariance across subjects and task conditions. Multilinear models such as parallel factor analysis (PARAFAC) have also been applied to functional neuroimaging (Andersen & Rayens, 2004; Miwakeichi et al., 2004; Martínez-Montes, Vega-Hernández, Sánchez-Bornot, & Valdés-Sosa, 2008), providing scaling profiles that describe *voxel* × *time* × *subject* structure in group data. A more flexible extension of PARAFAC, termed PARAFAC2, was proposed by Harshman (1972) and further refined by Kiers et al. (1999). It allows for component time series to be asynchronous across subjects, under the constraint of a consistent between-component correlation matrix. The PARAFAC2 model forms the basis for the proposed FSIM by applying a constraint of consistent between-cluster functional connectivity to the GMM. The discrete latent variable approach to modeling functional connectivity has less flexibility compared to component-based models (which represent voxel time series as a linear combination of components) but is able to define discrete cortical subdivisions, making it possible to fully describe data in terms of the extracted connectivity networks. This is of particular interest for research domains associated with network and graph-theoretic models of brain function.

In this letter, we develop the FSIM framework and an inference procedure based on expectation-maximization (EM). We then implement a cross-validation framework in order to determine how different parameters of the FSIM affect the model’s ability to predict between-voxel connectivity relationships and the stability of model parameters, including brain parcellations and the between-cluster connectivity matrix. The results are evaluated for both simulated and experimental resting-state data sets. In addition, the FSIM generates a set of parameters that can be used to characterize the functional organization of group fMRI data. As an example of the utility of these parameters, we demonstrate how network scaling profiles can be used to predict age for the experimental data set.

This letter focuses on the unsupervised modeling of functional connectivity, quantifying the linear dependencies between brain regions based on their BOLD time series. This is distinct from models of directed or effective connectivity, of the sort commonly applied in dynamic causal modeling (e.g., Penny, Stephan, Mechelli, & Friston, 2004). These approaches serve complementary roles, as functional connectivity methods are useful exploratory tools that are able to identify structure in high-dimensional data, whereas models of effective connectivity are better suited for directed hypothesis testing. (See section 4 for further review of these issues.)

## 2 Materials and Methods

The FSIM framework is established in section 2.1, based on the GMM formalism and solved via the expectation-maximization (EM) algorithm. Different possible FSIM implementations and parameter choices are reviewed in section 2.2, followed in section 2.3 by the cross-validation framework for measuring model generalizability. Finally, we describe the experimental data and preprocessing parameters used to evaluate the models in section 2.4. For this letter, the EM algorithm is used to optimize model log likelihood as the most direct extension of the PARAFAC2 fitting approach (Harshman, 1972; Kiers et al., 1999) to the mixture-modeling framework. This approach, combined with the proposed cross-validation framework, allows us to explore trade-offs in both predictive generalization and parameter reproducibility as a function of FSIM parameter choices. Alternative Bayesian approaches to model estimation, along with implementation challenges in the FSIM framework, are reviewed in section 4.

### 2.1 The FSIM Framework

#### 2.1.1 Model Definition

The observed group data set consists of $S$ subjects, each comprising $V$ voxels acquired over $T$ time points, and it is assumed that the voxel time-series vectors $\mathbf{x}_{sv} \in \mathbb{R}^{T}$ are generated from a mixture of $K$ multivariate gaussian distributions. Each voxel has a set of binary latent variables defining the cluster it originates from: for voxel $v$, a value $z_{kv} = 1$ indicates it originates from cluster $k$, and thus all other $z_{jv} = 0$ ($j \neq k$). In this model, subjects are governed by the same latent variables—that is, they have the same spatial clustering structure but different model parameters. Our goal is to estimate maximum-likelihood model parameters for the full observed data set by marginalizing over the latent variables $z_{kv}$ of the likelihood on observed data:

$$p(\mathbf{X}) = \prod_{v=1}^{V} \sum_{k=1}^{K} \pi_{k} \prod_{s=1}^{S} \mathcal{N}\!\left(\mathbf{x}_{sv} \,\middle|\, w_{kv}\boldsymbol{\mu}_{sk},\ \sigma_{sk}^{2}\mathbf{I}\right), \tag{2.2}$$

where $\pi_{k}$ is the prior likelihood of the $k$th cluster and $\mathcal{N}(\cdot)$ is the likelihood of voxel time series $\mathbf{x}_{sv}$ originating from the $k$th cluster under an assumed gaussian model. The mean cluster time series $\boldsymbol{\mu}_{sk}$ are subject dependent. In analogy to component-based models, we also assume that the cluster means have a voxel-dependent scaling factor $w_{kv}$.

In this GMM representation, we now enforce a consistent covariance structure between the cluster means over all subjects. This is done by reparameterizing them using the approach devised by Harshman (1972) and Kiers et al. (1999) for component-based group covariance estimation in the PARAFAC2 tensor model. For each subject, the (column) matrix of cluster means $\mathbf{M}_{s} = [\boldsymbol{\mu}_{s1}, \ldots, \boldsymbol{\mu}_{sK}]$ is defined by

$$\mathbf{M}_{s} = \mathbf{U}_{s}\mathbf{F}\mathbf{C}_{s},$$

where the $\mathbf{U}_{s}$ are constrained to be columnwise orthonormal ($\mathbf{U}_{s}^{\mathsf{T}}\mathbf{U}_{s} = \mathbf{I}_{K}$), $\mathbf{F}$ is fixed across subjects, and $\mathbf{C}_{s}$ is a diagonal matrix of cluster scalings. For mean-centered time series, the covariance between clusters is therefore $\mathbf{M}_{s}^{\mathsf{T}}\mathbf{M}_{s} = \mathbf{C}_{s}\mathbf{F}^{\mathsf{T}}\mathbf{F}\mathbf{C}_{s}$.

To resolve scaling ambiguity, we constrain the $w_{kv}$ and the columns of $\mathbf{F}$ to unit norm and absorb subject-specific scaling into the elements of $\mathbf{C}_{s}$. This defines an explicit network-driven clustering model, where individual clusters form network elements. We directly estimate the most stable intercluster correlation structure across subjects while also learning subject- and voxel-specific network scaling. The FSIM therefore clusters voxels into groups (segregation) in terms of consistent between-cluster correlations (integration).
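A minimal numerical sketch of this reparameterization may help make the constraint concrete. It generates random parameters under the stated constraints (illustrative sizes; this is not the fitting procedure) and confirms that every subject's between-cluster covariance shares the same connectivity-forming structure:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, S = 100, 4, 5  # time points, clusters, subjects (illustrative sizes)

# Fixed connectivity-forming matrix F (K x K) with unit-norm columns
F = rng.standard_normal((K, K))
F /= np.linalg.norm(F, axis=0)

for s in range(S):
    # Subject-specific columnwise-orthonormal basis U_s (T x K)
    U_s, _ = np.linalg.qr(rng.standard_normal((T, K)))
    # Diagonal subject scaling C_s with nonnegative entries
    C_s = np.diag(rng.uniform(0.5, 2.0, K))
    M_s = U_s @ F @ C_s  # matrix of cluster mean time series
    # Between-cluster covariance factors identically for every subject:
    # M_s^T M_s = C_s F^T F C_s, since U_s^T U_s = I
    cov = M_s.T @ M_s
    assert np.allclose(cov, C_s @ F.T @ F @ C_s)
```

The subject-specific orthonormal bases allow cluster time courses to differ arbitrarily across subjects, while the shared factor couples their covariance structure.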

#### 2.1.2 Expectation-Maximization Solution

The EM algorithm optimizes the model likelihood via an auxiliary function $Q$ that we must iteratively optimize. If we define cluster responsibilities $\gamma_{kv} = \mathbb{E}[z_{kv}]$ and replace the per-voxel likelihood with the joint gaussian probability over all subjects, this expands into the expression

$$Q = \sum_{v=1}^{V}\sum_{k=1}^{K} \gamma_{kv}\left(\ln \pi_{k} + \sum_{s=1}^{S} \ln \mathcal{N}\!\left(\mathbf{x}_{sv} \,\middle|\, w_{kv}\boldsymbol{\mu}_{sk},\ \sigma_{sk}^{2}\mathbf{I}\right)\right). \tag{2.3}$$

We now solve for expectation and maximization steps of this function. The parameter update equations are defined below (refer to appendix A for full solution details). We first update cluster responsibilities (E-step), then compute maximum-likelihood estimates of all model parameters for the auxiliary function $Q$ (see equation 2.3; M-step). The M-step updates are obtained by solving $\partial Q / \partial \theta = 0$ for each parameter $\theta$. For notational convenience, the updated parameter estimates are simply referred to by their symbols below. Resolving each in turn:

**E-Step**

**Cluster responsibilities**: The probability of assigning voxel time series $\mathbf{x}_{v}$ to cluster $k$, conditional on the current estimates of the model parameters. It is obtained via Bayes’ rule:

$$\gamma_{kv} = \frac{\pi_{k} \prod_{s} \mathcal{N}\!\left(\mathbf{x}_{sv} \,\middle|\, w_{kv}\boldsymbol{\mu}_{sk},\ \sigma_{sk}^{2}\mathbf{I}\right)}{\sum_{j=1}^{K} \pi_{j} \prod_{s} \mathcal{N}\!\left(\mathbf{x}_{sv} \,\middle|\, w_{jv}\boldsymbol{\mu}_{sj},\ \sigma_{sj}^{2}\mathbf{I}\right)}.$$

Specifically, these values reflect the likelihood that the $v$th voxel belongs to cluster $k$. The $\gamma_{kv}$ estimates are used to parcellate brain voxels into $K$ subregions by assigning each voxel to the cluster with the highest responsibility. (See the paragraphs following the M-step list for further details.)

**M-Step**

**Voxel-specific network scaling**: Computed as a variance-weighted regression of voxel time series against cluster means (with all terms except $w_{kv}$ held fixed):

$$w_{kv} = \frac{\sum_{s} \boldsymbol{\mu}_{sk}^{\mathsf{T}}\mathbf{x}_{sv}\,/\,\sigma_{sk}^{2}}{\sum_{s} \boldsymbol{\mu}_{sk}^{\mathsf{T}}\boldsymbol{\mu}_{sk}\,/\,\sigma_{sk}^{2}}.$$

This measure therefore has a practical interpretation; it produces a set of $K$ voxel scaling profiles (brain maps), reflecting the similarity between each voxel time series and the $k$th cluster mean.

**Subject-specific network scaling**: The elements $c_{ks}$ of the diagonal network scaling matrices $\mathbf{C}_{s}$ can be solved independently. They are computed as variance-weighted regressions of voxel time series against cluster means (with all terms except $c_{ks}$ held fixed). This subject scaling profile also has a direct interpretation, as it quantifies the relative amplitude of network expression for each subject.

**Group connectivity matrix and orthonormal subject matrices**: The update for $\mathbf{F}$ is obtained from weighted cross-product terms over voxels and subjects (full definitions in appendix A), where, for example, the $K$-dimensional vector of all responsibility and scaling values for a fixed $v$ is converted into a $K$-dimensional diagonal matrix. For the orthonormal subject matrices, the singular value decomposition $\mathbf{B}_{s} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}}$ of the relevant cross-product matrix is obtained, such that the solution is given by $\mathbf{U}_{s} = \mathbf{P}\mathbf{Q}^{\mathsf{T}}$, with further details in appendix A.

From the above equations, we see that the parameters have complex interdependencies. Due to EM convergence properties, it is sufficient to increase the likelihood of individual parameters for each cycle in order to converge to a (local) maximum, in an approach sometimes termed “generalized EM” (Laird, Lange, & Stram, 1987). To fit the FSIM, we iteratively perform an E-step update, followed by cyclic updating of the model parameters during the M-step.
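The update for the orthonormal subject matrices is an instance of the orthogonal Procrustes problem, whose trace-maximizing solution follows directly from a singular value decomposition. A minimal sketch, with `B_s` standing in for the weighted cross-product matrix whose exact definition is given in appendix A:

```python
import numpy as np

def update_orthonormal_basis(B_s):
    """Orthogonal Procrustes-style update: maximize trace(U_s^T B_s)
    over columnwise-orthonormal U_s via the SVD B_s = P Delta Q^T,
    giving U_s = P Q^T. B_s is a stand-in for the weighted
    cross-product matrix defined in the paper's appendix A."""
    P, _, Qt = np.linalg.svd(B_s, full_matrices=False)
    return P @ Qt

rng = np.random.default_rng(1)
B = rng.standard_normal((100, 4))  # illustrative T x K cross-product
U = update_orthonormal_basis(B)
# The returned basis is columnwise orthonormal, as the model requires
assert np.allclose(U.T @ U, np.eye(4), atol=1e-10)
```

Because each cyclic M-step update of this form cannot decrease the auxiliary function, the generalized-EM iteration converges to a local maximum.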

To facilitate model interpretation at the group level, nonnegativity constraints are imposed on the voxel scaling $w_{kv}$ and subject scaling $c_{ks}$ terms. Unlike component models, where individual elements are interdependent, necessitating numerical optimization methods (Gillis, 2014), the FSIM update equations show independence between elements of $w_{kv}$ and between elements of $c_{ks}$ when keeping other parameters fixed. Thus, the maximum-likelihood solution at a stationary point $x_{0}$ is given simply by $\max(x_{0}, 0)$.

For GMMs, the brain is parcellated into $K$ regions, based on the assumption that each voxel originates from one of $K$ latent clusters. The binary latent variables $z_{kv}$ dictate cluster assignments, with $z_{kv} = 1$ indicating that voxel $v$ originates from cluster $k$, and $z_{jv} = 0$ for all $j \neq k$. However, the latent variables are unobservable quantities. Thus, GMMs typically use responsibilities $\gamma_{kv}$ to infer the most probable parcellation of brain voxels. Voxels are assigned to their highest-probability clusters: for the $v$th voxel, we set $\hat{z}_{kv} = 1$ for the cluster $k$ with the highest responsibility (and $\hat{z}_{jv} = 0$ for all $j \neq k$). In principle, responsibilities freely range between 0 and 1. However, for fMRI data, we generally observe that responsibilities converge to binary values, which directly form the expected latent variable assignments. For notational convenience, we therefore denote the expected latent variable assignments of a clustering solution by $\hat{z}_{kv}$, although they are not the “true” latent variables.
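The hard-assignment rule above can be sketched as follows, with illustrative responsibility values:

```python
import numpy as np

# Responsibilities gamma (K x V): each column sums to 1 (illustrative)
gamma = np.array([[0.98, 0.10, 0.01],
                  [0.01, 0.85, 0.02],
                  [0.01, 0.05, 0.97]])
K, V = gamma.shape

# Assign each voxel to its highest-responsibility cluster
labels = gamma.argmax(axis=0)

# Build the binary assignment matrix z_hat from the labels
z_hat = np.zeros((K, V), dtype=int)
z_hat[labels, np.arange(V)] = 1
```

Each column of `z_hat` contains exactly one nonzero entry, mirroring the constraint that every voxel belongs to a single cluster.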

#### 2.1.3 Variance Prior and Model Convergence

A weak prior is imposed on the cluster variances so that, for each cluster $k$ and voxel $v$, we now obtain a nonzero variance estimate. In order to minimize the influence of the prior on our solution, the hyperparameter is set to a minimal value. This provides a maximum a posteriori (MAP) solution to the problem, based on an augmented M-step. In more general variational Bayes settings, the prior would be naturally absorbed into the variational free energy.

### 2.2 Model Validation

We examined how the addition of flexible modeling parameters affects model convergence, predictive generalization of functional connectivity, and parameter reproducibility compared to standard GMMs. The following four models were examined, ordered by increasing number of fitted parameters:

- **Segregative model, no voxel scaling (SX)**: The standard GMM signal model, with unconstrained, subject-dependent cluster means $\boldsymbol{\mu}_{sk}$ and no voxel scaling.
- **Segregative model, voxel scaling (SV)**: This model includes the voxel-specific network scaling parameter $w_{kv}$, so that the signal for voxel $v$ in cluster $k$ is $w_{kv}\boldsymbol{\mu}_{sk}$.
- **Integrative model, no subject scaling (IX)**: The cluster means are now constrained to have a consistent connectivity matrix over all subjects, with $\mathbf{M}_{s} = \mathbf{U}_{s}\mathbf{F}$.
- **Integrative model, subject scaling (IS)**: The full FSIM, which also admits subject-specific network scaling terms $c_{ks}$, giving $\mathbf{M}_{s} = \mathbf{U}_{s}\mathbf{F}\mathbf{C}_{s}$.

The optimization space for GMMs is nonconvex. While the continuous latent variable model of PARAFAC2 has been shown to find a global optimum under most conditions (Kiers et al., 1999), this is not guaranteed for FSIM. For all models, to reduce the risk of identifying local minima, clustering was initialized by 10 runs of *k*-means clustering with random seed initialization, and the maximum-likelihood *k*-means solution was used to initialize the considered mixture model, which was then iteratively updated until convergence (see section 2.2.1 for further details on convergence criteria). To further avoid local minima in the mixture analyses, we performed 10 restarts of the entire process (10 *k*-means restarts, followed by full mixture-model convergence), and selected the maximum-likelihood overall solution. To determine how parameter choices affect the performance of the mixture models, we examined three issues: model convergence properties, generalization of between-voxel covariance relationships, and parameter reproducibility.

#### 2.2.1 Model Convergence

We compared the training model likelihoods and convergence rates for the different mixture models. The log likelihood was plotted as a function of the EM iteration step, for the 10 initializations of each mixture model. We also compared the average number of EM iterations until the fractional change in log likelihood fell below a fixed threshold, and the average number of EM iterations until there were no measured changes in cluster assignments $z_{kv}$. For all subsequent clustering results, EM iteration was terminated when both of these criteria were met.

For a clustering solution, the latent variable vector for cluster $k$ indicates whether voxels belong to cluster $k$. For each pair of mixture model solutions $(\hat{\mathbf{Z}}^{(1)}, \hat{\mathbf{Z}}^{(2)})$, we computed the normalized mutual information (NMI) between $\hat{\mathbf{Z}}^{(1)}$ and $\hat{\mathbf{Z}}^{(2)}$. This is a nonparametric measure of shared information, analogous to the Pearson correlation coefficient, and is invariant to permutations in the clustering partitions. It is computed as

$$\mathrm{NMI}\left(\hat{\mathbf{Z}}^{(1)}, \hat{\mathbf{Z}}^{(2)}\right) = \frac{I\left(\hat{\mathbf{Z}}^{(1)}; \hat{\mathbf{Z}}^{(2)}\right)}{\sqrt{H\left(\hat{\mathbf{Z}}^{(1)}\right) H\left(\hat{\mathbf{Z}}^{(2)}\right)}}, \tag{2.4}$$

based on cluster entropies $H(\hat{\mathbf{Z}}^{(1)})$ and $H(\hat{\mathbf{Z}}^{(2)})$ and unnormalized mutual information $I(\hat{\mathbf{Z}}^{(1)}; \hat{\mathbf{Z}}^{(2)})$. Entropy measures were obtained by computing the discrete probabilities of each cluster in $\hat{\mathbf{Z}}$, and mutual information is given by the identity $I(\hat{\mathbf{Z}}^{(1)}; \hat{\mathbf{Z}}^{(2)}) = H(\hat{\mathbf{Z}}^{(1)}) + H(\hat{\mathbf{Z}}^{(2)}) - H(\hat{\mathbf{Z}}^{(1)}, \hat{\mathbf{Z}}^{(2)})$. We compared the average convergence rates and NMI values for different mixture models for a set of representative model orders $K$.
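A minimal implementation of NMI for two partitions can be sketched as below, assuming the square-root normalization (the paper gives its exact form in equation 2.4). The permutation invariance is checked by relabeling one partition:

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions,
    NMI = I(A;B) / sqrt(H(A) * H(B)), using the identity
    I(A;B) = H(A) + H(B) - H(A,B). Labels are integers 0..K-1."""
    def entropy(counts):
        p = counts / counts.sum()
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    a, b = np.asarray(labels_a), np.asarray(labels_b)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):
        joint[i, j] += 1
    Ha = entropy(joint.sum(axis=1))
    Hb = entropy(joint.sum(axis=0))
    Hab = entropy(joint.ravel())
    return (Ha + Hb - Hab) / np.sqrt(Ha * Hb)

# Identical partitions (up to label permutation) give NMI = 1
x = np.array([0, 0, 1, 1, 2, 2])
y = np.array([2, 2, 0, 0, 1, 1])  # same partition, permuted labels
assert np.isclose(nmi(x, y), 1.0)
```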

#### 2.2.2 Covariance Generalization and Model Reproducibility

The EM approach is prone to overfitting, and thus we cannot directly perform model comparison based on the (biased) likelihood defined in equation 2.2. Instead, split-half cross-validation is used to obtain unbiased estimates of model generalization error and parameter reproducibility. Generalization error quantifies the model’s ability to predict structure in independent data, and statistical models are often chosen to minimize generalization error, as this provides an optimal balance between bias and variance in parameter estimates (Efron & Tibshirani, 1986). Reproducibility determines the stability of the model parameters, which are used to visualize and interpret brain function. The proposed cross-validation framework allows us to evaluate both metrics, which exhibit complex trade-offs when fitting multivariate models of neuroimaging data (Rasmussen, Hansen, Madsen, Churchill, & Strother, 2012; Strother, Rasmussen, Churchill, & Hansen, 2015).

*Covariance Generalization*. We evaluated the generalization of clustering models, based on the predictive likelihood of subject covariance for independent test data, evaluated at the temporal degrees of freedom of the test set. This approach is similar to the one used for component subspace selection in Hansen et al. (1999), but here used to directly predict covariance matrices. We employed a within-subject split-half framework for group data, as individual subjects are expected to have different covariance structure. For each subject data set, the data matrix was split into two halves containing an equal number of time points. The FSIM was fit on split-half 1 and used to predict covariance for split-half 2. The split halves were then exchanged, with FSIM fit to split-half 2 and covariance predicted for split-half 1. Afterward, prediction was averaged across the two splits. For a *K*-component clustering model, the predictive likelihood is defined as follows.

For subject $s$, the $K$-cluster representation of the signal subspace is obtained from the fitted cluster means, scaled by the voxel weights $w_{kv}$ under the cluster assignments $\hat{z}_{kv}$, and the covariance of this estimated signal matrix is computed. To model total covariance, we also include noise estimated by the mixture models. For subject $s$, a diagonal matrix $\mathbf{D}_{s}$ of voxel-wise noise estimates is defined: for voxel $v$ and subject $s$, we assign the variance $\sigma_{sk}^{2}$ from the “true” latent class. To correctly account for degrees of freedom, the values are rescaled by the temporal degrees of freedom, giving a full-rank covariance estimate for subject $s$ as the sum of the signal covariance and $\mathbf{D}_{s}$. The predictive likelihood of held-out covariance under this estimate is then evaluated as a function of model order $K$.
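The predictive likelihood of held-out data under a fixed covariance estimate can be sketched as a zero-mean gaussian log likelihood. This is a simplified stand-in for the paper's estimator (which combines signal and noise covariance as described above); variable names are illustrative:

```python
import numpy as np

def gaussian_cov_loglik(S_test, Sigma_pred, n_dof):
    """Log likelihood of held-out data with sample covariance S_test
    and n_dof temporal degrees of freedom, under a zero-mean gaussian
    with predicted covariance Sigma_pred (a sketch of covariance
    generalization scoring)."""
    V = S_test.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma_pred)
    assert sign > 0, "predicted covariance must be positive definite"
    return -0.5 * n_dof * (V * np.log(2 * np.pi) + logdet
                           + np.trace(np.linalg.solve(Sigma_pred, S_test)))

rng = np.random.default_rng(2)
V, n = 6, 200
X = rng.standard_normal((n, V))
S = X.T @ X / n  # held-out sample covariance
# The true covariance (identity) should score better than a
# badly scaled alternative.
ll_true = gaussian_cov_loglik(S, np.eye(V), n)
ll_bad = gaussian_cov_loglik(S, 10.0 * np.eye(V), n)
assert ll_true > ll_bad
```

Because the score is a proper likelihood, over- or under-parameterized covariance models are penalized on independent data, which is what drives the saturation behavior reported in the results.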

*Parameter reproducibility*. It is crucial to evaluate the reliability of GMM parameters, as they are used to interpret network structure in the brain. Reproducibility of spatial parcellations was evaluated using the split-half approach defined for covariance generalization. FSIM models were fitted independently for data split halves 1 and 2, and the resulting latent variable matrices were compared via normalized mutual information (NMI) as defined in equation 2.4. This quantifies the test-retest reliability of clustering models for a fixed group of subjects.

We also evaluated the reliability of between-cluster connectivity for the full FSIM in order to demonstrate that it produces robust network models. For this letter, we focused on the partial correlation matrix, which may be computed from the estimated between-cluster covariance (Marrelec et al., 2006). This measure is of particular interest for connectivity modeling, as it accounts for interdependencies between brain regions. Because split halves may have different parcellations, their connectivity values cannot be directly compared. As an alternative, we employed a within-split bootstrapping procedure. The FSIM was initially fit using all subject data, and the resulting parcellation held fixed. Subjects were then sampled with replacement over the bootstrap iterations. For each bootstrap sample, we refit the FSIM for the fixed parcellation, producing a new estimate of the between-cluster covariance, which was used to compute the partial correlation matrix. For each connectivity element, we obtained an empirical significance estimate by converting bootstrapped partial correlations $\rho$ to normally distributed variables via Fisher’s Z-transform, $z = \operatorname{arctanh}(\rho)$. From the set of transformed values, we computed the bootstrap ratio as the mean divided by the standard error, which has a *z*-score distribution (Efron & Tibshirani, 1986), allowing us to estimate *p*-values based on the two-tailed normal distribution. We estimated *p*-values for all elements of the partial correlation matrix, adjusted for multiple comparisons at a false-discovery rate (FDR) threshold of 0.05.
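The bootstrap-ratio test for a single connectivity element can be sketched as follows; `bootstrap_ratio_pvalue` is a hypothetical helper (not part of any published FSIM code), and the bootstrap values are illustrative:

```python
import numpy as np
from math import erfc, sqrt

def bootstrap_ratio_pvalue(partial_corrs):
    """Bootstrap ratio for one connectivity element: apply Fisher's
    Z-transform (arctanh) to the bootstrapped partial correlations,
    divide the mean by the standard error, and convert to a two-tailed
    p-value under the standard normal distribution."""
    z = np.arctanh(np.asarray(partial_corrs))   # Fisher's Z-transform
    se = z.std(ddof=1) / sqrt(len(z))
    ratio = z.mean() / se
    p = erfc(abs(ratio) / sqrt(2.0))            # two-tailed normal p-value
    return ratio, p

rng = np.random.default_rng(3)
# Bootstrapped partial correlations for a strongly connected pair
boot = 0.5 + 0.05 * rng.standard_normal(1000)
ratio, p = bootstrap_ratio_pvalue(boot)
assert p < 0.05
```

In the full procedure, these element-wise p-values would then be corrected across all connectivity elements at the FDR threshold.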

### 2.3 Interpreting Model Parameters

This section focuses on the information provided by the full parametric FSIM implementation for experimental fMRI data. We demonstrated example results and interpreted (1) the segregative structure, including latent variable parcellations $\hat{z}_{kv}$ and noise variances $\sigma_{sk}^{2}$, and (2) integrative structure, including the group connectivity-forming matrix $\mathbf{F}$, voxel-specific scaling profiles $w_{kv}$, and subject-specific scaling profiles $c_{ks}$. These results were demonstrated in experimental resting-state data for the full FSIM model and representative clusters. To better visualize results, we performed *k*-means clustering directly on the connectivity matrix and grouped parcels into subclusters. For this step, the optimal number of clusters was based on the variance ratio criterion (Caliński & Harabasz, 1974).

We also demonstrated the utility of the subject-specific network scaling profile $c_{ks}$ by using it to identify parcellations where network scaling is predictive of subject age. This was estimated by performing principal component regression (PCR) on the scaling profile matrix, using it to predict the vector of subject ages in a 10-fold cross-validation framework. For each cluster size $K$, we measured prediction using the $Q^{2}$ statistic (Consonni, Ballabio, & Todeschini, 2010), which is the cross-validated equivalent of the coefficient of determination; significance was determined via bootstrapped confidence bounds on the estimates (1000 iterations). We plotted $Q^{2}$ as a function of model order $K$, and for the model that had optimal prediction, we identified parcels showing a significant change in network expression with age. This was done by computing the bootstrap ratio on regression weights for each parcel (bootstrapped mean / standard error) and plotting significant parcels after multiple comparison correction for an FDR threshold of 0.05.
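The cross-validated PCR step can be sketched with synthetic scaling profiles and ages; all names, sizes, and the number of retained components are illustrative, and the $Q^{2}$ statistic is the cross-validated analogue of $R^{2}$:

```python
import numpy as np

def pcr_q2(C, age, n_comp=3, n_folds=10, seed=0):
    """Principal component regression of subject scaling profiles C
    (subjects x clusters) on age, with k-fold cross-validation,
    returning the Q^2 statistic: 1 - SS_res / SS_tot over held-out
    predictions. A sketch with illustrative parameter choices."""
    n = len(age)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, n_folds)
    pred = np.zeros(n)
    for test in folds:
        train = np.setdiff1d(idx, test)
        mu = C[train].mean(axis=0)
        # Principal components of the training scaling profiles
        _, _, Vt = np.linalg.svd(C[train] - mu, full_matrices=False)
        P = Vt[:n_comp].T
        Xtr = (C[train] - mu) @ P
        Xte = (C[test] - mu) @ P
        # Least-squares regression on component scores (with intercept)
        A = np.column_stack([np.ones(len(train)), Xtr])
        beta, *_ = np.linalg.lstsq(A, age[train], rcond=None)
        pred[test] = np.column_stack([np.ones(len(test)), Xte]) @ beta
    ss_res = ((age - pred) ** 2).sum()
    ss_tot = ((age - age.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(4)
n, K = 60, 8
C = rng.standard_normal((n, K))
C[:, 0] *= 3.0                              # one dominant cluster profile
age = 40 + 5.0 * C[:, 0] + rng.standard_normal(n)
q2 = pcr_q2(C, age)
assert q2 > 0.5                             # strong held-out prediction
```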

### 2.4 Simulation Data

To validate the proposed metrics of covariance generalization and NMI of spatial parcellations, we analyzed simulated data with a known number of parcels. We simulated 2D brain slices, consisting of six square brain parcels (864 pixels total), acquired at 100 time points per split-half. For this study, simulated data were generated from the FSIM model (the IS model defined in section 2.2) by randomly sampling all parameters of the cluster means for each data set, under the appropriate constraints. They were generated as follows: the orthonormal bases $\mathbf{U}_{s}$ were obtained by generating random normal vectors and applying sequential Gram-Schmidt orthogonalization, the connectivity-forming matrix $\mathbf{F}$ was generated as a random normal matrix with unit-normalized columns, and the subject scaling profiles were generated as random uniformly distributed values with range [0, 1]. We also generated spatial weightings $w_{kv}$ within each parcel, using a square cosine basis to define relative scaling from 0.1 (edge) to 1 (center). Independent gaussian noise was also simulated for each pixel, time point, and subject. To account for the temporally smooth hemodynamic response in fMRI, both signal and noise time series were convolved with AFNI’s SPMG1 function (afni.nimh.nih.gov/pub/dist/doc/program_help/3dDeconvolve.html), using standard parameter settings and a TR of 2.0 s. In order to test FSIM performance for a range of signal strengths, background noise was scaled to fix the average signal-to-noise ratio (SNR) in regions of peak scaling, across all clusters and subjects, at each of the tested levels; averaging over all pixels, this corresponded to a lower mean SNR over the entire “brain slice.” This scheme was used to simulate data sets of 20 subjects, with 100 time points per split-half.
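The sequential Gram-Schmidt step used to generate the random orthonormal bases can be sketched as follows (`np.linalg.qr` would serve equally well; classic Gram-Schmidt is shown for transparency):

```python
import numpy as np

def gram_schmidt(A):
    """Sequential Gram-Schmidt orthonormalization of the columns of A,
    as used to generate random columnwise-orthonormal bases for the
    simulations."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        v = A[:, j].astype(float)
        for i in range(j):
            # Remove the projection onto each previous basis vector
            v = v - (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

rng = np.random.default_rng(5)
U = gram_schmidt(rng.standard_normal((100, 6)))
assert np.allclose(U.T @ U, np.eye(6), atol=1e-10)
```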

Along with results in the main text, appendix D shows the effects on model prediction and reproducibility due to including or excluding different network variability parameters. It also shows the impact of between-subject variability in the underlying connectivity matrix, to determine whether FSIM is robust in cases of inconsistent core connectivity. In addition, appendix E shows the performance of the FSIM for data generated under a continuous latent variable model, where independent components may be spatially overlapped; this is the basis for standard principal component analysis (PCA) and independent component analysis (ICA) models. These results demonstrate how the FSIM recovers and represents overlapped signals.

### 2.5 Experimental Data

To evaluate the different clustering models, we used a resting-state data set previously published in Andersen et al. (2014), consisting of 30 healthy subjects (median age 45, range 22–69; 15 female). The data were acquired during 20 minutes of eyes-closed resting-state scanning, using a Siemens Magnetom Trio 3T scanner. Scan volumes were obtained using echo-planar imaging at isotropic spatial resolution, with sequence parameters TR = 2.49 s, TE = 30 ms, FA = 90°, and 42 slices. This resulted in 480 acquired volumes per subject, which we preprocessed using the SPM8 software package (SPM8, rev. 5236). The standard preprocessing steps included rigid-body realignment to the subject mean, slice timing correction with sinc interpolation, and spatial normalization to an MNI template using affine warping with discrete cosine transform basis functions (Ashburner & Friston, 1999). To further reduce noise and variability in the data, we performed spatial smoothing with an isotropic 6 mm FWHM gaussian kernel and corrected for head motion spikes using the algorithm established in Campbell, Grigg, Saverino, Churchill, and Grady (2013). Global signal fluctuations were removed by regressing out the first principal component of each run (Carbonell, Bellec, & Shmuel, 2011), and residual variance due to nonrigid head motion effects was modeled using the first two principal components of the six rigid-body realignment parameters. Low-frequency signal fluctuations potentially caused by hardware instability or gradual motion were modeled using a third-order Legendre polynomial basis (chosen using AFNI’s selection heuristic; afni.nimh.nih.gov/pub/dist/doc/program_help/3dDetrend.html). We controlled physiological fluctuations in the data by removing vascular, CSF, and white matter signal using the PHYCAA+ algorithm (Churchill & Strother, 2013). To reduce dependence between data split halves, temporal preprocessing steps were performed separately on the split halves.

## 3 Results

### 3.1 Simulation Data

Figure 2 plots the true simulation parameters (left column) and the parameters estimated by the four different clustering models at the true model order. For spatial parcellations (row 1), all models except SX (the standard GMM) correctly identify clusters based on *z _{kv}*, with few errors near the between-parcel boundaries, where signal scaling is lowest. The SX fails due to an inability to account for voxel scaling *w _{kv}*, allocating voxels with low scaling to incorrect clusters. In contrast, SV, IX, and IS successfully reconstruct the spatial scaling of cluster signals in the *w _{kv}* maps (row 2). Both IX and IS also accurately model between-cluster connectivity (row 3), although IS appears more similar to the true model. Finally, the IS estimates subject-specific network scaling via *c _{ks}* profiles (row 4). These are highly consistent with the true subject scaling, showing a mean correlation of 0.92 ± 0.03 (average ± SD over the six clusters).

Figure 3 depicts cross-validation metrics as a function of model order *K* for the simulated data. For all tested SNR levels, the predictive log likelihood saturates at the true model order, and the simplest SX model (i.e., a standard GMM) shows the lowest prediction, followed by the highly constrained IX model. The full IS model and the highly flexible SV model show no significant differences in performance (Wilcoxon paired-rank test on subject prediction values), indicating that the more heavily parameterized IS approach is not significantly more biased. However, as shown in appendix D, in cases with inconsistent between-subject connectivity, the SV has higher predictive log likelihood than IS. This is expected, as SV does not impose any consistency constraints on between-cluster connectivity. Examining reproducibility of the spatial parcellations, all data sets show a peak at the true model order, and there are no consistent differences between models. Both the predictive log likelihood and NMI decrease for data with lower SNR, as expected.
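For reference, the parcellation-similarity metric (NMI) used throughout can be sketched in a few lines. This is a generic, illustrative implementation in pure Python, assuming arithmetic-mean normalization (the letter does not restate which NMI variant is used); it is not the authors' code.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a discrete labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized mutual information between two parcellations of the same
    voxels, using arithmetic-mean normalization of the two entropies."""
    assert len(a) == len(b)
    n = len(a)
    joint = Counter(zip(a, b))
    pa, pb = Counter(a), Counter(b)
    mi = sum((c / n) * math.log((c / n) / ((pa[i] / n) * (pb[j] / n)))
             for (i, j), c in joint.items())
    denom = 0.5 * (entropy(a) + entropy(b))
    return mi / denom if denom > 0 else 1.0

labels1 = [0, 0, 1, 1, 2, 2]
labels2 = [1, 1, 0, 0, 2, 2]          # same partition, permuted labels
print(round(nmi(labels1, labels2), 6))  # → 1.0
```

Because NMI compares partitions rather than label values, it is invariant to label permutation, which is why it is suitable for comparing clustering runs whose cluster indices are arbitrary.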

The FSIM and cross-validation metrics are also robust in cases where simulated data do not conform to discrete latent variable modeling assumptions. Appendix E shows that the FSIM also provides interpretable results for data generated from a component-based model with significant spatial overlap, although it requires a higher model order than the intrinsic subspace dimensionality in order to fully represent functional connectivity structure.

### 3.2 Model Convergence

Figure 4A plots the 10 log-likelihood curves for training data for each of the different mixture models at a representative number of clusters. Log likelihood is plotted across EM iterations, showing that all models rapidly plateau within five iterations, with more gradual improvements beyond this point. The models with constrained covariance (IX, IS) have consistently lower training likelihood compared to the purely segregative models (SX, SV), whereas the addition of voxel-scaling profiles (SV) and subject-scaling profiles (IS) increases model likelihood. Figure 4B plots the 10 log-likelihood values at EM convergence as a function of *K*, demonstrating that training likelihood consistently grows with the number of clusters and that models without constrained covariance (SX, SV) show a more rapid increase in likelihood than those with constrained covariance (IX, IS).

Table 1 provides convergence statistics for the different models, at *K* = 8, 60, 100. The consistency of parcellations (based on pairwise NMI between clustering runs) is not significantly different between models (nonparametric Friedman rank test) and increases for higher numbers of clusters. In addition, the parcellations *z _{kv}* tend to converge within five EM iterations for all models and across all tested cluster sizes. Conversely, the log likelihood converges relatively rapidly for the standard SX model but tends to require a greater number of iterations for more complex models. However, no consistent differences were identified between models (Friedman test) due to the relatively high variability across EM runs.

| | | SX | SV | IX | IS |
|---|---|---|---|---|---|
| *K* = 8 | Parcellation NMI | 0.672 ± 0.08 | 0.681 ± 0.08 | 0.693 ± 0.07 | 0.657 ± 0.04 |
| | Convergence (*z _{kv}*) | 3.5 ± 0.8 | 5.6 ± 1.6 | 5.3 ± 1.5 | 6.1 ± 1.6 |
| | Convergence (likelihood) | 6.1 ± 0.3 | 14.1 ± 3.5 | 21.3 ± 6.7 | 21.3 ± 8.7 |
| *K* = 60 | Parcellation NMI | 0.732 ± 0.01 | 0.738 ± 0.04 | 0.737 ± 0.04 | 0.732 ± 0.01 |
| | Convergence (*z _{kv}*) | 3.9 ± 0.8 | 5.1 ± 1.3 | 5.0 ± 1.3 | 5.7 ± 1.3 |
| | Convergence (likelihood) | 7.3 ± 0.9 | 25.4 ± 11.2 | 15.5 ± 5.5 | 32.5 ± 9.3 |
| *K* = 100 | Parcellation NMI | 0.735 ± 0.01 | 0.743 ± 0.04 | 0.741 ± 0.04 | 0.741 ± 0.04 |
| | Convergence (*z _{kv}*) | 3.9 ± 0.6 | 4.2 ± 1.6 | 4.7 ± 1.5 | 5.6 ± 1.2 |
| | Convergence (likelihood) | 8.2 ± 1.2 | 24.4 ± 11.3 | 12.4 ± 2.3 | 26.8 ± 7.3 |


Notes: Parcellation NMI is the mean pairwise NMI between all 10 clustering runs. Convergence (*z _{kv}*) is the mean number of iterations before cluster assignments no longer change. Convergence (likelihood) is the mean number of iterations before the change in log likelihood falls below the convergence tolerance. Results are shown ± standard deviation.
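The two convergence criteria reported in Table 1 (stable hard assignments versus a small change in log likelihood) can be illustrated with a minimal two-component, one-dimensional GMM fit by EM. This is only a sketch: the initialization, tolerance, and data here are placeholders, not the FSIM implementation.

```python
import math
import random

def em_gmm_1d(x, tol=1e-6, max_iter=200):
    """EM for a two-component 1-D GMM, tracking the iteration at which
    (a) hard assignments z stop changing and (b) the log-likelihood
    improvement drops below tol (both thresholds are placeholders)."""
    mu = [min(x), max(x)]              # spread-out initialization
    var, pi = [1.0, 1.0], [0.5, 0.5]
    prev_ll, prev_z = -math.inf, None
    it_z = it_ll = None
    for it in range(1, max_iter + 1):
        # E-step: responsibilities via Bayes' rule
        resp, ll = [], 0.0
        for xi in x:
            p = [pi[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            s = p[0] + p[1]
            ll += math.log(s)
            resp.append((p[0] / s, p[1] / s))
        z = [0 if r[0] >= r[1] else 1 for r in resp]   # hard assignments
        if it_z is None and z == prev_z:
            it_z = it                                  # criterion (a) met
        if it_ll is None and ll - prev_ll < tol:
            it_ll = it                                 # criterion (b) met
            break
        prev_ll, prev_z = ll, z
        # M-step: weighted priors, means, and variances
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (xi - mu[k]) ** 2
                             for r, xi in zip(resp, x)) / nk, 1e-8)
    return mu, it_z, it_ll

random.seed(0)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(8, 1) for _ in range(200)])
mu, it_z, it_ll = em_gmm_1d(data)
```

As in Table 1, the hard assignments typically stabilize within a handful of iterations, while the likelihood criterion continues to register small parameter refinements for longer.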

### 3.3 Covariance Generalization and Model Reproducibility

Figure 5 depicts independent cross-validation metrics for the different clustering models. Predictive log likelihood is depicted in Figure 5A. Unlike the training log likelihood (see Figure 4), both the SV and IS models outperform SX (the standard GMM), although SV shows the highest likelihood, particularly for larger numbers of clusters. This is expected, since there are no constraints on individual subjects’ covariance, allowing a more flexible subject fit. The models show a consistent ranking on prediction across *K* (Friedman test), where SV > IS > IX > SX (post hoc Nemenyi permutation test). The SV consistently outperforms the full IS model in experimental data, which matches simulation results with simulated between-subject variability in connectivity networks (refer to section 4 for more on this point). For all models, the predictive log-likelihood curves do not saturate within the tested range of clusters, indicating that high-order clustering models continue to identify generalizable covariance structure. Figure 5B plots reproducibility of the spatial parcellations. There is no significant ranking in reproducibility between the different models (Friedman test). While the reproducibility curves are somewhat noisy, they tend to increase until reaching a plateau for all models, indicating that the stability of parcellations is consistently maximized beyond this point for the tested GMMs.
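The split-half predictive log likelihood behind Figure 5A can be illustrated in miniature: fit distribution parameters on one half of the data and evaluate the average log likelihood of the held-out half. The univariate gaussian sketch below is only an analogue of the full covariance model, and the variable names (`half1`, `half2`) are illustrative.

```python
import math
import random

def gauss_loglik(x, mu, var):
    """Mean log likelihood of samples x under N(mu, var)."""
    return sum(-0.5 * math.log(2 * math.pi * var)
               - (xi - mu) ** 2 / (2 * var) for xi in x) / len(x)

def predictive_loglik(train, test):
    """Fit gaussian parameters on the training half and evaluate the
    held-out half: a univariate analogue of split-half covariance
    generalization."""
    mu = sum(train) / len(train)
    var = sum((xi - mu) ** 2 for xi in train) / len(train)
    return gauss_loglik(test, mu, var)

random.seed(1)
half1 = [random.gauss(0, 1) for _ in range(500)]
half2 = [random.gauss(0, 1) for _ in range(500)]

good = predictive_loglik(half1, half2)        # matched halves generalize
bad = gauss_loglik(half2, mu=0.0, var=25.0)   # a badly mis-specified model
```

A model whose parameters generalize to the held-out half scores higher than a mis-specified one; in the full FSIM setting, the same logic is applied to the estimated covariance structure rather than a single mean and variance.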

Figure 6 displays the full network estimated by the FSIM at a representative model order, including the spatial parcellations and between-cluster connectivity values, based on partial correlation estimates with bootstrapped significance estimates. The FSIM produces significantly reliable correlations between the majority of brain parcellations, even after correcting for multiple comparisons at an FDR of 0.05.
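FDR control at q = 0.05, as used above, is typically applied with the Benjamini-Hochberg step-up procedure. The following is a generic sketch of that procedure, not tied to the letter's bootstrap pipeline; the p-values are made up for illustration.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean mask of
    which hypotheses are rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= (k/m) * q ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # ... and reject all hypotheses up to and including rank k.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

pv = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pv, q=0.05))
# → [True, True, False, False, False, False]
```

Note the step-up property: a hypothesis can be rejected even when its p-value exceeds q/m, which is what makes FDR control less conservative than Bonferroni correction across the many between-cluster connections tested here.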

### 3.4 Interpreting Model Parameters

Figure 7 depicts the segregative structure identified by the FSIM. Figure 7 (top) plots the whole-brain parcellation based on the *z _{kv}* estimates for each voxel; each color corresponds to a different cluster *k*. We observe clear delineation of cortical subregions. At this clustering scale, many bilateral brain regions are combined into the same cluster, indicating strong coherence in BOLD fluctuations. Figure 7 (bottom) plots the noise variance at each voxel, obtained from its assigned cluster. This map quantifies how well individual voxels are represented by their cluster means. Many interior gray matter regions are highly consistent (e.g., the insula and thalamus), as are superior frontal regions. Conversely, dorsal areas show comparatively high noise variance, including the cuneus, middle cingulum, and parietal lobes, indicating relatively high signal heterogeneity within these regions.

Figure 8 represents the integrative structure obtained by the FSIM. Figure 8A shows the most consistent group connectivity matrix, with strong correlations (positive and negative) and a clear grouping structure within this matrix. Figures 8B and 8C show representative parcels from two subgroups, illustrating the relationship between these brain regions. For each subgroup, we plot (left) the parcellation given by *z _{kv}* and (right) the map of voxel scaling values *w _{kv}* for network components. In Figure 8B, we observe elements of the default mode network, including parcels within the posterior cingulate, left and right middle temporal lobes, and ventromedial prefrontal cortex. Figure 8C depicts elements of the sensorimotor network, with parcels located in the supplementary motor area and both precentral and postcentral gyri.

The binary plots of *z _{kv}* show the brain regions that have been assigned to each cluster. In contrast, the voxel scaling profiles *w _{kv}* show nonzero values in voxels extending outside each cluster. These plots can be interpreted as regression maps, showing the relative association between the cluster prototype and all brain voxels. This has two uses: (1) determining which brain regions within a cluster of interest are well represented by the mean time series and (2) identifying brain regions from other clusters that show highly similar BOLD fluctuations. Given a specific cluster of interest, the latter may be used to identify other clusters in the brain that are likely to show strong associations in the integrative matrix.

Figure 9 plots the results of a regression analysis using cluster scalings *c _{ks}* to predict subject age. One subject was identified as an outlier across all *K*, based on Cook’s distance criterion (Cook, 1977), and was excluded (no values were significantly greater than zero if this subject was included). The predicted regression coefficient is plotted in Figure 9A as a function of *K*. For low *K*, we obtain negative values, indicating worse prediction than a flat model that assigns all subjects the mean age. However, prediction improves at higher model orders before reaching a peak.

Figure 9B depicts *c _{ks}* values for the best-predicting model, with (excluded) outlier subject #22, showing both subject-specific and cluster-specific trends in network scaling. Figure 9C shows the predicted ages based on the *c _{ks}* values, indicating a good fit for most of the subjects. Figure 9D displays maps of the significant clusters. All significant parcels have negative bootstrap ratios, including the sensorimotor cortex and cingulum, indicating reduced network expression in these brain regions as a function of age.
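The behavior in Figure 9A, where low-*K* models predict worse than a flat mean-age model, is characteristic of cross-validated prediction scores. The sketch below uses leave-one-out simple linear regression with a Q²-style score as an illustrative stand-in (the letter's exact regression procedure and metric are not reproduced here); the variable names and toy values are hypothetical.

```python
def loo_q2(x, y):
    """Leave-one-out cross-validated prediction score for simple linear
    regression: Q^2 = 1 - SS_pred / SS_tot.  Values below zero mean the
    model predicts held-out subjects worse than a flat model that
    assigns everyone the mean of y."""
    n = len(y)
    ss_pred = 0.0
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]   # training fold
        ys = [y[j] for j in range(n) if j != i]
        mx, my = sum(xs) / (n - 1), sum(ys) / (n - 1)
        sxx = sum((v - mx) ** 2 for v in xs)
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
        pred = my + slope * (x[i] - mx)           # predict the held-out point
        ss_pred += (y[i] - pred) ** 2
    my_all = sum(y) / n
    ss_tot = sum((v - my_all) ** 2 for v in y)
    return 1.0 - ss_pred / ss_tot

# Hypothetical toy data: a cluster-scaling profile that tracks age well.
ages = [22.0, 31.0, 45.0, 52.0, 60.0, 69.0]
scaling = [0.9, 0.8, 0.6, 0.5, 0.4, 0.3]
q2_good = loo_q2(scaling, ages)
q2_weak = loo_q2([0.5, 0.1, 0.9, 0.2, 0.8, 0.4], ages)
```

With an informative predictor the score approaches 1, while an uninformative predictor typically yields a near-zero or negative score, mirroring the low-*K* behavior in Figure 9A.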

## 4 Discussion

One of the principal goals of functional connectivity analysis is to learn the properties of brain function, both segregative and integrative, that underlie cognition. Unsupervised clustering models are increasingly applied to BOLD fMRI data, to parcellate the brain based on similarity of BOLD time series and measure functional connectivity between clusters (Li, Guo, Nie, Li, & Liu, 2009; Thirion, Varoquaux, Dohmatob, & Poline, 2014). However, these clustering models have limited parametric flexibility and do not account for integrative between-cluster relationships. In this letter, we propose FSIM (functional segregation and integration model), a major extension of GMMs that learns the most consistent between-cluster connectivity matrix in group fMRI data, along with profiles of voxel- and subject-specific network expression. This is the first attempt to jointly model segregative and integrative properties of brain networks in a mixture-model framework. The FSIM provides a descriptive model of the functional connectivity structure in fMRI data, with interpretable parameters. We also examined how different choices in model parameters affect covariance generalization and the stability of model parameters.

### 4.1 Effects of FSIM Model Parameters

The addition of constraints (e.g., consistent group connectivity) and flexibility terms (e.g., voxel and subject scaling) have a modest effect on rates of model convergence. Moreover, they do not affect the reproducibility of spatial parcellations across split halves, as there was no significant difference in NMI for the different clustering models. This indicates that the use of more complex clustering models does not significantly reduce the stability of spatial clustering.

However, measures of covariance generalization showed significant model effects. For example, the voxel scaling profile *w _{kv}* appears to be critical for modeling group-level functional networks, as all models that include it (SV, IX, IS) outperform the model that does not (SX). Moreover, these spatial scaling values *w _{kv}* must be relatively consistent across subjects and data splits to give high generalization. Similarly, subject scaling profiles *c _{ks}* improve the generalization of IS compared to IX; this indicates that subjects have sufficiently reliable network scaling (i.e., between test-retest splits) that it improves generalization. Conversely, the fixed between-cluster connectivity matrix in IS leads to reduced generalization compared to the unconstrained SV. Taken with the simulation data, this is evidence of significant between-subject variability in the network connectivity of experimental data, which is unsurprising. For example, Damoiseaux et al. (2006) found that many functional networks are highly variable across subjects. We emphasize that this reduced generalizability should not be considered a failing of the FSIM model. Rather, it demonstrates the inherent challenge of identifying consistent functional connectivity structure in heterogeneous group data. FSIM provides an optimal parametric description of this connectivity structure, which is not estimated by standard clustering models but must otherwise be extracted as a postprocessing exercise.

Regarding the training log likelihood, standard GMMs show higher likelihood relative to the FSIM model. However, this measure does not account for integrative relationships, whereas our predictive covariance model considers both within- and between-cluster relationships. The covariance log likelihood is a more generalizable metric, since it does not presume a fixed BOLD time series across test-retest sessions, an issue for resting-state tasks without temporally constrained stimuli. Although there is a consistent difference between different GMMs, the log likelihoods are relatively similar; the choice in number of clusters (*K*) has a larger impact than model parameterization. The reduced importance of model parameterization is also likely due to the flexible noise estimator, which absorbs voxel/subject-specific variability.

### 4.2 Model Order and Cross-Validation Metrics

Although predictive likelihood saturated at the true model order for simulated data, it did not reach a maximum for experimental data within the tested range (up to *K* = 100) but increased continually. This suggests that there is generalizable covariance structure for cortical divisions well beyond the tested range of *K*. These findings agree with literature showing reliable cortical subdivisions in fMRI down to the millimeter scale, dictated by the spatial extent of the arterioles that irrigate neural tissue (Detre & Wang, 2002). This is corroborated by Thirion et al. (2014), who found that the predictive likelihood of clustering models saturates only at high model orders for task-based fMRI data. However, this level of granularity is not always desirable, as higher model orders *K* trade off better generalization against decreased model simplicity (and interpretability). Studies of cross-validated task fMRI have shown that optimization based solely on prediction tends to select excessively high-order models, whereas reproducibility often provides a better metric for choosing robust, interpretable models (Strother et al., 2015). In this study, this is supported by the NMI of spatial parcellations, which plateaus with no significant improvements beyond that point. Current results show reliable spatial parcellations in experimental data (NMI > 0.70) and highly robust functional connectivity structure based on bootstrapped partial correlations.

These findings also provide evidence that when FSIM clustering is performed, the criterion for model order selection may depend on the specific research aims. If the goal is simply to predict covariance structure, the user may prefer the highest attainable model order *K*, determined by available computational resources. If instead the goal is to define the most parsimonious segregative structure, NMI may be used to identify the lowest-order *K* at which reproducibility plateaus. Alternatively, if the goal is to use brain function to predict specific behavioral, demographic, or clinical correlations, the user may select the *K* directly optimizing cross-validated regression scores as in Figure 9.

### 4.3 Interpretation of FSIM Parameters

In this letter, we demonstrated that FSIM provides a detailed parametric model of connectivity structure and subject-dependent variability. These parameters provide information about connectivity structure beyond standard GMMs. The group connectivity matrix provides a direct estimate of integrative structure without relying on post hoc connectivity estimation. Spatial scaling profiles *w _{kv}* may be used to determine which brain regions within a cluster are well represented by the cluster means and which voxels outside a cluster show strong associations and, thus, potentially high connectivity. Subject scaling profiles *c _{ks}* may be used to characterize between-subject differences in network expression and to identify potential outliers.

Importantly, FSIM model parameters are also relevant for studies of behavior and cognition, as they can be used to model the relationships between network modulation and behavioral measures. As an example, we demonstrated this by investigating changes in network scaling as a function of subject age. The analysis revealed age-related decreases in variance in elements of the sensorimotor cortex and cingulum. This is consistent with prior studies showing decreased BOLD signal as a hallmark of aging, with an emphasis on motor cortex (D’Esposito, Deouell, & Gazzaley, 2003), and prior studies in aging that reported decreased BOLD variance (Garrett, Kovacevic, McIntosh, & Grady, 2010; McIntosh et al., 2010). Interestingly, we did not observe decreases in visual cortex, which is also commonly reported (D’Esposito et al., 2003). This may be due to prior studies focusing on task-based activation, which may have a distinct profile from resting-state network expression.

### 4.4 Future Directions

While current results are promising, further development of FSIM will be an important area of future research. The maximum a posteriori (MAP) approach provides robust model solutions; however, these methods are prone to model overfitting, necessitating cross-validated model comparison. An appealing alternative is to extend the current model to fully Bayesian inference. Variational Bayes (VB) provides a natural extension of the current framework, in which a solution is obtained by minimizing variational free energy (Corduneanu & Bishop, 2001). This would allow the computation of a bound on model evidence to compare between models and optimize model order *K*. The VB approach represents an alternative to cross-validation, as it automatically balances between model generalization and parsimony. It will be important to investigate trade-offs between these approaches: VB methods have a larger sample on which to compute parameter estimates compared to split-half methods but may also be prone to underestimating parameter uncertainty.

The development of a complete VB model is beyond the scope of this letter, as this requires distributions over all model parameters, including the orthonormal matrices needed for consistent covariance estimation. There exist distributions on orthonormal matrices, typically based on the hypergeometric function of a matrix argument, including the Von Mises-Fisher (or Langevin) and more general matrix-Bingham distributions (Turaga, Veeraraghavan, & Chellappa, 2008). An alternative to VB is the variational Laplace method (Friston, Mattout, Trujillo-Barreto, Ashburner, & Penny, 2007), which finesses many of the difficulties of VB. This method uses approximate forms to represent unknown variables, and priors need not be conjugate. Future work will attempt to develop a complete variational framework based on these distributions and building on methods proposed by Šmídl and Quinn (2007) for Bayesian PCA.

The FSIM characterizes linear dependence between regional BOLD responses in terms of unknown regional clusters and an unknown functional connectivity matrix that is conserved over subjects. As we have shown in this letter, this approach is effective at summarizing high-dimensional fMRI data in a low-rank representation. This linear gaussian model can be considered one approach among a range of possible dependency measures, chosen for its analytic tractability and robust results. In addition, there is evidence that linear measures can reliably account for the majority of fMRI signal (Hlinka, Paluš, Vejmelka, Mantini, & Corbetta, 2011) and provide a useful, reliable metric of brain function (Raichle et al., 2001; Greicius et al., 2003).

However, fMRI signal arises from a network of nonlinear neural oscillators, mediated via complex neurovascular response. As such, measures of linear dependency provide limited information about the neural architecture and causal processes that underlie these dependencies. In this respect, nonlinear effective connectivity models such as dynamic causal modeling (DCM) are well suited, identifying models based on Bayesian evidence (Penny et al., 2004; Stephan et al., 2010). However, due to their inherent complexity, these models are also highly dependent on the a priori selection of brain regions of interest. This suggests the utility of combining these approaches, using FSIM to identify linear dependencies in the brain, and then applying models of effective connectivity to characterize underlying neuronal structure.

Finally, the current FSIM framework adapts elements of the PARAFAC2 model (a continuous latent variable model) into the discrete latent variable framework. This involves significant trade-offs, as discrete models are able to parameterize data specifically in terms of segregative structure (discrete parcellations) and integrative structure (interregional connectivity), which is particularly relevant to domains such as graph-theoretical network analysis. As shown in appendix E, we also successfully applied clustering models to simulated continuous latent variable (i.e., component-based) data. Nonetheless, clustering models tend to have higher computational burden and nonconvex optimization space, often making them more challenging to implement. To date, there has been limited direct comparison between these approaches to functional connectivity modeling, although they are equivalent under certain sparsity conditions (Vinnikov & Shalev-Shwartz, 2014), and in some cases, clustering models may outperform component-based models in separating signal from correlated noise (Baumgartner et al., 2000). Further research is needed to compare the utility of these approaches under a wider set of cohorts and experimental conditions.

## Appendix A: Deriving the EM Update Equations

In this appendix, we provide full, explicit solutions for the parameter update equations of the EM algorithm. All M-steps (i.e., all solutions except cluster responsibilities) are obtained by finding the maximum-likelihood solution of the auxiliary equation, equation 2.3.

### A.1 Cluster Responsibilities

In the E-step, the current parameter estimates are held fixed for each voxel *v*. This allows us to use Bayes’ rule to directly evaluate the conditional likelihood that each voxel belongs to cluster *k*, where the data likelihood is given by marginalizing over the full latent space.
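Since the original equation did not survive formatting, note that the Bayes'-rule update narrated above takes the standard mixture-model form shown below; the symbols (γ_{kv} for responsibilities, π_k for priors, θ_k for cluster parameters) are assumed notation, not necessarily the letter's own.

```latex
\gamma_{kv} \equiv p(z_{kv} = 1 \mid \mathbf{x}_v)
  = \frac{\pi_k \, p(\mathbf{x}_v \mid \theta_k)}{p(\mathbf{x}_v)},
\qquad
p(\mathbf{x}_v) = \sum_{j=1}^{K} \pi_j \, p(\mathbf{x}_v \mid \theta_j).
```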

### A.2 Cluster Priors

Dropping the terms of *Q* that are constant in the cluster priors, we must optimize the remaining expression. This can be solved in the manner of standard GMM solutions (Dempster et al., 1977), as follows. Because the priors are cluster probabilities, they are constrained to sum to one. Using the method of Lagrange multipliers, this becomes a constrained optimization; taking the partial derivative yields the stationarity condition, and summing over *k* on both sides of the equation solves for the Lagrange multiplier. Resubstituting, the cluster priors are given by the average responsibility over all voxels.
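The narrated derivation corresponds to the textbook GMM prior update; a reconstruction in assumed notation (γ_{kv} for responsibilities, *V* for the number of voxels) follows, since the original equations did not survive extraction.

```latex
\mathcal{L} = \sum_{k=1}^{K} \sum_{v=1}^{V} \gamma_{kv} \ln \pi_k
            + \lambda \Big( \sum_{k=1}^{K} \pi_k - 1 \Big),
\qquad
\frac{\partial \mathcal{L}}{\partial \pi_k}
  = \frac{1}{\pi_k} \sum_{v=1}^{V} \gamma_{kv} + \lambda = 0.

% Summing the stationarity condition over k and using
% \sum_k \gamma_{kv} = 1 gives \lambda = -V, hence:
\pi_k = \frac{1}{V} \sum_{v=1}^{V} \gamma_{kv}.
```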

### A.3 Noise Variance

### A.4 Voxel-Specific Network Scaling

### A.5 Subject-Specific Network Scaling

Dropping the terms of *Q* that are constant in *c _{ks}*, taking the partial derivative to find the stationary point, and rearranging yields a closed-form update for *c _{ks}*.

### A.6 Group Connectivity Matrix

Dropping the terms of *Q* that are constant in the group connectivity matrix, the remaining expression must be optimized. Unlike the scaling matrices, where the nonzero elements are fully independent (i.e., each contributes to a unique cluster/subject), all elements of the group connectivity matrix contribute across subjects and clusters. Thus, we must optimize the entire matrix simultaneously. This can be done by re-expressing the objective in terms of matrix traces, expanding, and dropping terms constant in the connectivity matrix. By cyclic permutation and reordering of the diagonal matrices, the expression simplifies; taking the partial derivative and rearranging then yields the closed-form update for the group connectivity matrix.

### A.7 Orthonormal Subject Matrices

## Appendix B: A Weak Prior on Noise Variance

## Appendix C: Solving the Singular Wishart Distribution

Each voxel is scaled by its *w _{kv}* value, using element-wise multiplication.

## Appendix D: Effects of Simulated Variability Parameters

To demonstrate the impact of various aspects of signal variability, we examined four simulation parameter settings, based on the initial simulation model described in section 2.4: (1) a data set without voxel or subject scaling variability and with a consistent core network (i.e., all scaling terms set to unity); (2) a data set with voxel scaling only; (3) a data set with subject and voxel scaling; and (4) a data set with subject and voxel scaling, along with intersubject variability in the core network. Intersubject variability was simulated by generating subject-specific connectivity matrices and computing cluster means based on their average. For each simulated data set, we measured covariance generalization as a function of *K* for the four clustering models.

The predictive likelihood curves are plotted in Figure 10. For data without signal variability, all models show comparable performance. The addition of spatial variability reduces generalization for SX (i.e., the standard GMM), which assumes uniform spatial scaling, relative to the other models. The addition of subject variability reduces generalization for the highly constrained IX model, which assumes consistent subject scaling, relative to SV and IS. Finally, the addition of between-subject variability in connectivity (see Figure 10, rightmost panel) causes the IS model to show worse performance compared to SV, which assumes no consistent connectivity structure.

## Appendix E: Clustering and Continuous Latent Variable Data

This appendix demonstrates that discrete latent variable clustering models can also robustly describe data derived from a continuous latent variable model. As a simple example, we simulated data consisting of three temporally independent, spatially overlapped components (see Figure 11A), instead of the spatially disjoint parcels *z _{kv}* and spatial weights *w _{kv}* described in section 2.4. The components were constructed to be temporally uncorrelated, and simulation parameters were otherwise the same as in the previous simulation modeling.

Figures 11B and 11C show that the predictive log likelihood saturates at a model order above the number of components, and NMI rapidly decreases beyond this point, indicating that this is the optimal model order. Figure 11D shows the parameters of the FSIM at this model order. Two parcels form the parts of components 1 and 2 that do not overlap with other components; this is shown by their near-zero correlation in the connectivity matrix. A third parcel identifies the overlapping region between components 1 and 3, which is reflected in its strong correlations with the corresponding parcels in the connectivity matrix. Here, overlap is represented as multiple (high) correlations between spatially disjoint parcels. A similar relationship is seen for the remaining parcel.

Thus, the discrete FSIM model gives a robust alternative representation of overlapped components, although it requires a higher model order than the intrinsic subspace dimensionality of 3 in order to fully represent the integrative structure. In general, for *P* overlapped components, clustering models may find a correspondingly larger number of disjoint regions with distinct connectivity structure. However, in practice, the optimal model order will depend on the effective spatial extent of the components and their signal-to-noise ratio (SNR).

## Acknowledgments

This work was supported by the Lundbeckfonden (fellowship grant R105-9813 to M.M.). K.H.M. was supported by Lundbeckfonden (Grant of Excellence “ContAct” R59-5399 to Hartwig Roman Siebner) and by a Novo Nordisk Foundation Interdisciplinary Synergy Grant (NNF14OC0011413). The Magnetom Trio MR scanner was donated by the Simon Spies Foundation.

## References

*Optimization for machine learning*