conpagnon.connectivity_statistics package

Submodules

conpagnon.connectivity_statistics.parametric_tests module

conpagnon.connectivity_statistics.parametric_tests.design_matrix_builder(dataframe, formula, return_type='dataframe')[source]

Build a design matrix based on a dataframe

Parameters
  • dataframe (pandas.DataFrame) – A pandas dataframe containing the data.

  • formula (str) – The formula, written in R style, with the variable to explain on the left side of the ~, and the explanatory variables on the right side.

  • return_type (str, optional) – Return type of the response variable and design matrix. Default is ‘dataframe’. The other choice is ‘matrix’.

Returns

  • output1 (pandas.DataFrame) – A dataframe of shape (n_observations, ). This is the variable to explain.

  • output2 (pandas.DataFrame) – The design matrix, of shape (n_observations, n_explanatory_variables + 1).
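
Examples

A minimal usage sketch, assuming the formula follows the R-style syntax described above; the dataframe contents and column names (‘score’, ‘group’, ‘age’) are illustrative only, and the two outputs are assumed to come back in the documented order.

import pandas as pd
from conpagnon.connectivity_statistics.parametric_tests import design_matrix_builder

# Illustrative behavioral dataframe (hypothetical columns).
df = pd.DataFrame({'score': [1.2, 0.8, 1.5, 0.9],
                   'group': ['patients', 'controls', 'patients', 'controls'],
                   'age': [34, 29, 41, 37]})

# Explain 'score' with 'group' and 'age'; return the response and the design
# matrix as dataframes (the default return_type).
response, design = design_matrix_builder(dataframe=df,
                                         formula='score ~ group + age',
                                         return_type='dataframe')
print(design.columns)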

conpagnon.connectivity_statistics.parametric_tests.distribution_estimation_mean_subjects_connectivity(mean_matrix_for_each_subjects, groupes, kinds)[source]

Provides an estimation of the mean and standard deviation of the distribution of the per-subject mean connectivity, for the two groups under study.

Parameters
  • mean_matrix_for_each_subjects (dict) – A dictionary containing, at the first level, the groups as keys and the kinds as values. Inside each kind, a dictionary containing the subject identifiers as keys and the mean functional connectivity of that subject as values.

  • kinds (list) – The list of kinds to test.

  • groupes (list) – The list of the two groups in the study.

Returns

output – A dictionary with the groups as keys, and for each group the estimated standard deviation, the estimated mean of the distribution, and the array of shape (n_subjects, ) of the per-subject mean connectivity.

Return type

dict
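
Examples

A minimal sketch of the expected input structure, with hypothetical group names, kind and subject identifiers; the numeric values are random placeholders.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import (
    distribution_estimation_mean_subjects_connectivity)

rng = np.random.default_rng(0)
groupes = ['patients', 'controls']   # hypothetical group names
kinds = ['correlation']              # hypothetical kind

# {group: {kind: {subject_id: mean functional connectivity of that subject}}}
mean_matrix_for_each_subjects = {
    group: {kind: {f'sub-{i:02d}': float(rng.normal(0.3, 0.05))
                   for i in range(10)}
            for kind in kinds}
    for group in groupes}

estimation = distribution_estimation_mean_subjects_connectivity(
    mean_matrix_for_each_subjects=mean_matrix_for_each_subjects,
    groupes=groupes,
    kinds=kinds)
# For each group: the estimated mean and standard deviation of the
# distribution, and the (n_subjects, ) array of per-subject mean connectivity.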

conpagnon.connectivity_statistics.parametric_tests.functional_connectivity_distribution_estimation(functional_connectivity_estimate)[source]

Estimates the mean and standard deviation of functional connectivity distribution assuming a Gaussian behavior.

Parameters
  • functional_connectivity_estimate (numpy.array, shape (n_features, n_features) or (0.5*n_features*(n_features + 1), )) – A functional connectivity matrix; if a 2D array is provided, it will be vectorized, discarding the diagonal.

Returns

  • output 1 (numpy.array of shape (0.5*n_features*(n_features + 1), )) – The vectorized functional connectivity array.

  • output 2 (float) – The estimated mean of the data.

  • output 3 (float) – The estimated standard deviation of the data.

See also

scipy.stats.norm()

This function from the scipy library is used here to estimate the mean and the standard deviation of the data.

pre_preprocessing.fisher_transform()

To ensure a normal behavior of the connectivity coefficients, this function applies a classical Fisher transform to the data.

Notes

We assume here that the functional connectivity you’re dealing with has a Gaussian behavior, and therefore can be described properly with two parameters: the mean and the standard deviation. To ensure a Gaussian behavior, a transformation of the connectivity coefficients should be used, such as the Fisher transform for correlation or partial correlation.
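
Examples

A minimal sketch on a random symmetric matrix; the matrix is a placeholder for a real (Fisher-transformed) connectivity matrix.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import (
    functional_connectivity_distribution_estimation)

# Build a small symmetric 'connectivity' matrix for illustration.
rng = np.random.default_rng(42)
a = rng.normal(size=(10, 10))
connectivity_matrix = (a + a.T) / 2.0

# A 2D matrix is vectorized (diagonal discarded) before the Gaussian fit.
vectorized, mean_estimate, std_estimate = (
    functional_connectivity_distribution_estimation(connectivity_matrix))
print(vectorized.shape, mean_estimate, std_estimate)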

conpagnon.connectivity_statistics.parametric_tests.inter_network_two_sample_t_test(subjects_inter_network_connectivity_matrices, groupes, kinds, contrast, network_label_list, alpha=0.05, p_value_correction_method='fdr_bh', assuming_equal_var=True, nan_policy='omit')[source]

Test the difference of connectivity between networks by performing a two sample t-test.

Parameters
  • subjects_inter_network_connectivity_matrices (dict) – A subjects connectivity dictionary, with the groups in the study as the first level of keys, the kinds in the study as the second level of keys, and the inter network connectivity matrices as values, of shape (number of networks, number of networks).

  • groupes (list) – The list of the groups under the study.

  • kinds (list) – The list of kinds in the study.

  • contrast (list) – The contrast vector used for the t-test. Choices are [1.0, -1.0], or [-1.0, 1.0].

  • network_label_list (list) – The list of the names of the different networks.

  • alpha (float, optional) – The type I error rate threshold. For p-values under alpha, the null hypothesis can be rejected. Default is 0.05.

  • p_value_correction_method (string, optional) – The correction method accounting for the multiple comparison problem. Default is ‘fdr_bh’, the traditional False Discovery Rate correction from Benjamini & Hochberg.

  • assuming_equal_var (bool, optional) – If False, the Welch t-test is performed, accounting for different variances between the tested samples.

  • nan_policy (string, optional) – Behavior regarding possible missing data (nan values). Default is ‘omit’.

Returns

output – A dictionary structure containing the t-test results: the raw t statistic array, the corrected and uncorrected p-values arrays, and the masked t statistic array for corrected p-values under the alpha threshold.

Return type

dict

conpagnon.connectivity_statistics.parametric_tests.intra_network_two_samples_t_test(intra_network_connectivity_dictionary, groupes, kinds, contrast, network_labels_list, alpha=0.05, p_value_correction_method='fdr_bh', assume_equal_var=True, nan_policy='omit', paired=False)[source]

Test the difference of intra network connectivity between the groups under the study.

Parameters
  • intra_network_connectivity_dictionary (dict) – A subjects connectivity dictionary, with the groups in the study as the first level of keys, the kinds in the study as the second level of keys, and the intra network connectivity for each network in the study.

  • groupes (list) – The list of the groups under the study.

  • kinds (list) – The list of kinds in the study.

  • contrast (list) – The contrast vector used for the t-test. Choices are [1.0, -1.0], or [-1.0, 1.0].

  • network_labels_list (list) – The list of the names of the different networks.

  • alpha (float, optional) – The type I error rate threshold. For p-values under alpha, the null hypothesis can be rejected. Default is 0.05.

  • p_value_correction_method (string, optional) – The correction method accounting for the multiple comparison problem. Default is ‘fdr_bh’, the traditional False Discovery Rate correction from Benjamini & Hochberg.

  • assume_equal_var (bool, optional) – If False, the Welch t-test is performed, accounting for different variances between the tested samples.

  • nan_policy (string, optional) – Behavior regarding possible missing data (nan values). Default is ‘omit’.

Returns

  • output (dict) – A dictionary structure containing, for each network, the t-test results: the t statistic value, the corrected and uncorrected p-values, the intra network connectivity array of shape (number of subjects, ) for each group, and the contrast vector.

  • TODO: add an argument to specify the field name in the dictionary containing the network strength.

conpagnon.connectivity_statistics.parametric_tests.linear_regression(connectivity_data, data, formula, NA_action, kind, subjects_to_drop=None, sheetname=0, save_regression_directory=None, contrasts='Id', compute_pvalues=True, pvalues_tail='two_tailed', alpha=0.05, pvals_correction_method=['fdr_bh'], nperms_maxT=10000, vectorize=False, discard_diagonal=False)[source]

Fit a linear model on connectivity coefficients across subjects.

Parameters
  • connectivity_data (dict) – The connectivity matrices organised in a dictionary, with the subject identifiers as keys and the metrics as values, i.e. the matrices containing the connectivity coefficients. As the metrics are symmetric, each matrix has to be vectorized into a 1D array of shape (n_features, ).

  • formula (string) – The model, in an R fashion.

  • data (string) – The full path to the xlsx data file, containing all the dependent variables in the model you want to estimate.

  • NA_action (string) – Directive for handling missing data in the xlsx file. Choices are: ‘drop’: the subject will be discarded in the analysis; ‘raise’: raise an error if missing data is present.

  • sheetname (int) – The position in your excel file of the sheet of interest.

  • subjects_to_drop (list, optional) – List of subjects you want to discard in the analysis. If None, all the row in the dataframe are kept. Default is None.

  • kind (string) – The metric, present in the provided connectivity data, on which you want to perform the analysis.

  • save_regression_directory (string) – The full path to a directory for saving the regression results.

  • contrasts (string or numpy.array of shape (n_features, n_features), optional) – The contrast vector for inferring the regression coefficients. Default is ‘Id’, all regressors are tested.

  • compute_pvalues (bool, optional) – If True, p-values are computed. Default is True.

  • pvalues_tail (string, optional) – If ‘two_tailed’, a two-sided t-test is computed. If ‘one_tailed’, a one-tailed t-test is computed.

  • alpha (float, optional) – The type I error rate. Corrected p-values above alpha will be discarded. Default is 0.05.

  • pvals_correction_method (string, optional) – The method for accounting for the multiple comparison problems. Choices are among the statsmodels library : {‘bonferroni’, ‘sidak’, ‘holm-sidak’, ‘holm’, ‘simes-hochberg’, ‘hommel’, ‘fdr_bh’, ‘fdr_by’, ‘fdr_tsbh’, ‘fdr_tsbky’}, and the ‘maxT’ method in the mulm library. Default is ‘fdr_bh’.

  • nperms_maxT (int, optional) – If maximum statistic correction is chosen, the number of permutations. Default is 10000.

Returns

  • output 1 (dict) – The regression results, with the regressor variables as keys, and the corrected and uncorrected p-values matrices, the significant t-values matrix, and the t-values matrix under the uncorrected alpha threshold.

  • output 2 (numpy.array of shape (n_samples, n_features)) – The design matrix of the analysis.

  • output 3 (numpy.array of shape (n_samples, q), q: number of models to fit) – The matrix of dependent variables, with all the multiple outcomes to fit.
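
Examples

A call sketch only: the xlsx path, sheet position, output directory, formula and variable names below are placeholders for your own study, and the connectivity dictionary layout assumes one vectorized array per subject and kind, as described above.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import linear_regression

# Placeholder data: 20 subjects, vectorized matrices of 45 coefficients each.
rng = np.random.default_rng(0)
connectivity_data = {f'sub-{i:02d}': {'correlation': rng.normal(size=45)}
                     for i in range(20)}

results, design_matrix, response_matrix = linear_regression(
    connectivity_data=connectivity_data,
    data='/path/to/behavioral_data.xlsx',       # placeholder path
    formula='age + Sex',                        # placeholder regressors
    NA_action='drop',
    kind='correlation',
    sheetname=0,
    save_regression_directory='/path/to/regression_results',  # placeholder path
    contrasts='Id',
    compute_pvalues=True,
    pvalues_tail='two_tailed',
    alpha=0.05,
    pvals_correction_method=['fdr_bh'])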

conpagnon.connectivity_statistics.parametric_tests.mean_functional_connectivity_distribution_estimation(mean_groups_connectivity_matrices)[source]

Estimates, for the mean connectivity matrices of each group, the mean and standard deviation, assuming a Gaussian distribution.

Parameters

  • mean_groups_connectivity_matrices (dict) – A multi-level dictionary organised as follows:
  • The first level of keys is the different groups in the study.

  • The second level of keys is the mean connectivity matrices for the different kinds. They are arrays of shape (number of regions, number of regions).

Returns

output

A dictionary organised as follows:
  • The first level of keys is the different groups in the study.

  • The second level of keys is the kinds present in the provided dictionary.

  • The third level of keys contains the estimated mean, the estimated standard deviation, and the vectorized array of connectivity coefficients.

Return type

dict

Notes

Applying a Z-Fisher transformation to the input matrices can be useful to improve and ensure a normal behavior of the data.
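
Examples

A minimal sketch of the expected input: one mean connectivity matrix per group and kind; the group names, the kind and the matrices are placeholders.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import (
    mean_functional_connectivity_distribution_estimation)

rng = np.random.default_rng(0)
n_regions = 10

def random_mean_matrix():
    # Correlation matrix of random time series, standing in for a real group mean.
    time_series = rng.normal(size=(100, n_regions))
    return np.corrcoef(time_series, rowvar=False)

# {group: {kind: mean connectivity matrix of shape (n_regions, n_regions)}}
mean_groups_connectivity_matrices = {
    group: {'correlation': random_mean_matrix()}
    for group in ['patients', 'controls']}

estimation = mean_functional_connectivity_distribution_estimation(
    mean_groups_connectivity_matrices)
# estimation[group][kind] then holds the estimated mean, the estimated
# standard deviation and the vectorized connectivity coefficients.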

conpagnon.connectivity_statistics.parametric_tests.ols_regression(y, X)[source]

Fit a linear model with ordinary least squares regression from the statsmodels library.

Parameters
  • y (array-like) – The variable to explain of shape (n_observations, )

  • X (array-like) – The design matrix, of shape (n_observations, n_regressors), or (n_observations, n_regressors + 1) if an intercept is added to the model.

Returns

output – A statsmodels regression object containing the fit of the model.

Return type

statsmodels.regression.linear_model.RegressionResultsWrapper
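
Examples

A minimal sketch with simulated data; the intercept column is added by hand, following the shape convention described above.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import ols_regression

rng = np.random.default_rng(0)
n_observations = 50

# Design matrix: explicit intercept column plus one regressor.
X = np.column_stack([np.ones(n_observations), rng.normal(size=n_observations)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.1, size=n_observations)

results = ols_regression(y=y, X=X)
print(results.params)   # statsmodels results: fitted coefficients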

conpagnon.connectivity_statistics.parametric_tests.ols_regression_formula(formula, data)[source]

Fit a linear model with a formula API in R style.

Parameters
  • formula

  • data

Returns
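
Examples

The parameters are not documented above; the sketch below assumes, by analogy with the other formula-based helpers of this module, that formula is an R-style string and data a pandas dataframe holding the variables named in the formula.

import pandas as pd
from conpagnon.connectivity_statistics.parametric_tests import ols_regression_formula

# Hypothetical data: explain 'score' by 'age'.
data = pd.DataFrame({'score': [1.2, 0.8, 1.5, 0.9, 1.1],
                     'age': [34, 29, 41, 37, 45]})
fit = ols_regression_formula(formula='score ~ age', data=data)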

conpagnon.connectivity_statistics.parametric_tests.partial_corr(C)[source]

Returns the sample linear partial correlation coefficients between pairs of variables in C, controlling for the remaining variables in C.

Parameters
  • C (array-like, shape (n, p)) – Array with the different variables. Each column of C is taken as a variable.

Returns

P – P[i, j] contains the partial correlation of C[:, i] and C[:, j] controlling for the remaining variables in C.

Return type

array-like, shape (p, p)
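
Examples

A minimal sketch on random data; each column of C stands for one variable.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import partial_corr

# 100 observations of 4 variables.
rng = np.random.default_rng(1)
C = rng.normal(size=(100, 4))

P = partial_corr(C)
print(P.shape)   # (4, 4): P[i, j] is the partial correlation of C[:, i] and C[:, j]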

conpagnon.connectivity_statistics.parametric_tests.regress_confounds(vectorize_subjects_connectivity, confound_dictionary, groupes, kinds, data, sheetname, NA_action='drop')[source]

Regress out confounds from the connectivity matrices.

Parameters
  • vectorize_subjects_connectivity (dict) – The subject connectivity matrices dictionary, with vectorized matrices, WITHOUT the diagonal.

  • confound_dictionary (dict) – The nested dictionary containing, for each group and kind: a field named ‘confounds’ containing a list of confounds, and a second field named ‘subjects to drop’ containing a list of subject identifiers to drop, or None to keep all of the subjects.

  • groupes (list) – The list of groups for which you want to regress out confounds.

  • kinds (list) – The list of kinds.

  • data (str) – The full path, including extension, to the excel file containing the confounds for each subject. This will be read by pandas. Note that the index of the resulting dataframe must be the subject identifiers.

  • sheetname (str) – The sheet name of the excel file containing the confounds for each subject of each group.

  • NA_action (str, optional) – Behavior regarding the missing values. If ‘drop’, the entire row is deleted from the design matrix.

conpagnon.connectivity_statistics.parametric_tests.two_sample_t_test_(connectivity_dictionnary_, groupes, kinds, field, contrast, assume_equal_var=True, nan_policy='omit', paired=False)[source]

Perform a simple two sample t test between two sets of connectivity matrices.

Parameters
  • connectivity_dictionnary_ (dict) – A dictionary which contains some connectivity measures of interest, organised like a ConPagnon subjects connectivity matrices dictionary.

  • groupes (list) – The list of groups in the study.

  • kinds (list) – The list of kinds in the study.

  • field (string) – The field name in the dictionary containing the connectivity coefficient array. It should be a 1D vector of shape (number of subjects, ).

  • contrast (list) – The contrast vector. Choices are [1.0, -1.0], or [-1.0, 1.0].

  • assume_equal_var (bool, optional) – If False, a Welch t-test is performed, assuming different variances between the two groups.

  • nan_policy (string, optional) – Behavior regarding missing values in the tested data. Default is ‘omit’.

Returns

output – A dictionary with the raw t statistic, the contrast vector used, and the uncorrected p-values.

Return type

dict
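
Examples

A minimal sketch; the nested dictionary layout (group, then kind, then the field holding a 1D array of per-subject values) is an assumption consistent with the parameter description above, and the group names, kind and field name are placeholders.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import two_sample_t_test_

rng = np.random.default_rng(0)
groupes = ['patients', 'controls']   # hypothetical group names
kinds = ['correlation']
field = 'network_strength'           # hypothetical field name

# Assumed layout: {group: {kind: {field: array of shape (n_subjects, )}}}
connectivity_dictionnary_ = {
    group: {kind: {field: rng.normal(loc=0.3, scale=0.05, size=20)}
            for kind in kinds}
    for group in groupes}

t_test_result = two_sample_t_test_(
    connectivity_dictionnary_=connectivity_dictionnary_,
    groupes=groupes,
    kinds=kinds,
    field=field,
    contrast=[1.0, -1.0])   # groupes[0] - groupes[1]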

conpagnon.connectivity_statistics.parametric_tests.two_sample_t_test_on_mean_connectivity(mean_matrix_for_each_subjects, kinds, groupes, contrast)[source]

Performs a two sample t-test between the two groups in the study on the mean matrix of each subject.

Parameters
  • mean_matrix_for_each_subjects (dict) – A dictionary containing, at the first level, the groups as keys and the kinds as values. Inside each kind, a dictionary containing the subject identifiers as keys and the mean functional connectivity of that subject as values.

  • kinds (list) – The list of kinds to test.

  • groupes (list) – The list of the two groups in the study.

  • contrast (list of int) – A list of int specifying the contrast between the two groups. If contrast is set to the vector [1.0, -1.0], the computed t-values are based on the contrast groupes[0] - groupes[1]; if contrast is [-1.0, 1.0], it is the opposite.

Returns

output – A dictionary containing the kinds as keys, and the t-statistic and p-values as values for each kind.

Return type

dict

Notes

The p-values are based on a two-sided statistic.
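
Examples

A minimal sketch reusing the same nested layout as for distribution_estimation_mean_subjects_connectivity; group names, kind and values are placeholders.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import (
    two_sample_t_test_on_mean_connectivity)

rng = np.random.default_rng(0)
groupes = ['patients', 'controls']   # hypothetical group names
kinds = ['correlation']

# {group: {kind: {subject_id: mean functional connectivity of that subject}}}
mean_matrix_for_each_subjects = {
    group: {kind: {f'sub-{i:02d}': float(rng.normal(0.3, 0.05))
                   for i in range(15)}
            for kind in kinds}
    for group in groupes}

t_test = two_sample_t_test_on_mean_connectivity(
    mean_matrix_for_each_subjects=mean_matrix_for_each_subjects,
    kinds=kinds,
    groupes=groupes,
    contrast=[1.0, -1.0])   # groupes[0] - groupes[1]
# t_test holds, for each kind, the t-statistic and the (two-sided) p-value.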

conpagnon.connectivity_statistics.parametric_tests.two_samples_t_test(subjects_connectivity_matrices_dictionnary, groupes, kinds, contrast, preprocessing_method='fisher', alpha=0.05, multicomp_method='fdr_bh')[source]

Perform two samples t-test on connectivity matrices to detect group differences in connectivity using different kinds.

The t-test accounts for discarded ROIs you might want to exclude from the analysis.

Parameters
  • subjects_connectivity_matrices_dictionnary (dict) –

    A multi-level dictionary organised as follows:
    • The first level of keys is the different groups in the study.

    • The second level of keys is the subjects IDs.

    • The third level is, for each subject, the different kind matrices, a ‘discarded_rois’ key for the discarded ROIs array index, and a ‘masked_array’ key containing a Boolean array, True for the discarded_rois index and False elsewhere.

  • groupes (list) – The list of the two groups to detect group differences.

  • kinds (list) – The list of metrics on which you want to perform the group comparison. Choices are: ‘correlation’, ‘covariances’, ‘tangent’, ‘partial correlation’, ‘precision’.

  • preprocessing_method (string, optional) – The type of preprocessing to apply to connectivity coefficients of type ‘correlation’, ‘partial correlation’, ‘covariances’, ‘precision’. Choices are: ‘fisher’.

  • contrast (list, optional) – The contrast you want to compute in the t-test. Default is [1.0, -1.0] to compute mean(groupes[0]) - mean(groupes[1]). The other contrast is [-1.0, 1.0] for mean(groupes[1]) - mean(groupes[0]).

  • alpha (float, optional) – The false positive proportion, commonly named the alpha level. Default is 0.05.

  • multicomp_method (str, optional) – The inference method accounting for the multiple comparison problem. Default is the classic False Discovery Rate (FDR) proposed by Benjamini & Hochberg, see Notes.

Returns

output

A dictionary containing multiple keys:
  • ’tstatistic’ : The raw t statistic map for the chosen contrast, a 2D numpy array of shape (number of regions, number of regions).

  • ’uncorrected pvalues’ : The raw p-values, a 2D numpy array of shape (number of regions, number of regions).

  • ’corrected pvalues’ : The p-values corrected with the chosen method, a 2D numpy array of shape (number of regions, number of regions).

  • ’significant edges’ : The significant t-values after masking the non-significant p-values at the alpha level, a 2D numpy array of shape (number of regions, number of regions).

  • ’significant pvalues’ : The significant p-values at level alpha, a 2D numpy array of shape (number of regions, number of regions).

  • ’significant mean effect’ : The difference of mean connectivity between the two groups according to the chosen contrast, with non-significant connections masked at the alpha level, a 2D numpy array of shape (number of regions, number of regions).

Return type

dict

Raises

  • ValueError – If the number of groups in groupes is strictly less than 2, a ValueError is raised; otherwise, if more than two groups are given, a warning is printed and the two first groups in the list are taken.

  • ValueError – If an unrecognized contrast is entered.

See also

compute_connectivity_matrices.individual_connectivity_matrices()

This is the function which computes the connectivity matrices for the chosen kinds and returns a structured dictionary which can be used as the subjects_connectivity_matrices_dictionnary argument.

pre_preprocessing.fisher_transform()

Computes the Fisher transform of the connectivity coefficients.

Notes

The two sample t-test performed here accounts for masked arrays, using the statistical tests for masked arrays in the SciPy package. That is, if some regions are masked in a subject's masked_array, they will be discarded when performing the t-test. If the mask contains only False values, no region is discarded.

All the metrics are symmetric, therefore the t-test is only computed on the lower part of the matrices, of shape n_columns * (n_columns + 1) / 2.

The correction for the multiple comparison problem is applied to the resulting t-values, and the full matrices are reconstructed afterwards, with resulting shape (number of regions, number of regions).

Different methods for accounting for the multiple comparison problem exist; please refer to the corresponding function in the statsmodels library: statsmodels.sandbox.stats.multicomp.multipletests.

For the tangent kind, keep in mind that the mean effect is computed in the tangent space only, that is, not in the same space as classical metrics like covariances, correlation or partial correlation.
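
Examples

A minimal sketch building the nested dictionary described above with simulated correlation matrices; the group names, subject identifiers and the absence of discarded ROIs are placeholders, and the diagonal is set to zero so that the Fisher transform stays finite.

import numpy as np
from conpagnon.connectivity_statistics.parametric_tests import two_samples_t_test

rng = np.random.default_rng(0)
n_regions = 10
groupes = ['patients', 'controls']   # hypothetical group names

def random_correlation_matrix():
    time_series = rng.normal(size=(100, n_regions))
    matrix = np.corrcoef(time_series, rowvar=False)
    np.fill_diagonal(matrix, 0.0)    # zero diagonal, common for connectivity matrices
    return matrix

# {group: {subject_id: {kind matrix, 'discarded_rois', 'masked_array'}}}
subjects_connectivity_matrices = {
    group: {f'sub-{i:02d}': {
                'correlation': random_correlation_matrix(),
                'discarded_rois': np.array([], dtype=int),                  # nothing discarded
                'masked_array': np.zeros((n_regions, n_regions), dtype=bool)}
            for i in range(15)}
    for group in groupes}

t_test_results = two_samples_t_test(
    subjects_connectivity_matrices_dictionnary=subjects_connectivity_matrices,
    groupes=groupes,
    kinds=['correlation'],
    contrast=[1.0, -1.0],
    preprocessing_method='fisher',
    alpha=0.05,
    multicomp_method='fdr_bh')
# The result holds the raw t-map, the uncorrected and corrected p-values, and
# the significant edges, each of shape (n_regions, n_regions).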

conpagnon.connectivity_statistics.regression_analysis_model module

Modules to perform connectivity analysis at the network level

Author: Dhaif BEKHA.

conpagnon.connectivity_statistics.regression_analysis_model.joint_models_correction(root_analysis_directory, kinds, models, correction_methods, networks=None, alpha=0.05)[source]

Performs a joint correction across models, for the whole brain models or the network models.

conpagnon.connectivity_statistics.regression_analysis_model.one_way_anova(models, groups, behavioral_dataframe, kinds, correction_method, root_analysis_directory, variables_in_model, alpha=0.05)[source]

Perform a one-way ANOVA analysis, followed by a post hoc analysis with t-tests and multiple comparison correction.

Parameters
  • models (list) – The list of models. Each model is defined by the string in the filename containing the raw data for each group.

  • groups (list) – The list of groups in the study.

  • behavioral_dataframe (pandas.DataFrame) – The dataframe containing the variable to study in the ANOVA analysis. The dataframe must contain a column named “subjects” containing the identifier of each subject.

  • kinds (list) – The list of the different connectivity metrics on which you want to perform the analysis.

  • correction_method (list) – The list of multiple comparison correction methods, as available in the statsmodels library.

  • root_analysis_directory (string) – The full path to the directory containing the raw data.

  • variables_in_model (list) – A list containing the categorical variable to study.

  • alpha (float, optional) – The type I error threshold. The default is 0.05.

Notes

If models is a list containing more than one element, the p-values are jointly corrected across all models.

conpagnon.connectivity_statistics.regression_analysis_model.one_way_anova_network(root_analysis_directory, kinds, groups, networks_list, models, behavioral_dataframe, variables_in_model, correction_method, alpha=0.05)[source]

Perform a one way ANOVA at the network level.

Parameters
  • models (list) – The list of models. Each model is defined by the string in the filename containing the raw data for each group.

  • groups (list) – The list of groups in the study.

  • networks_list (list) – The list of networks to include in the analysis.

  • behavioral_dataframe (pandas.DataFrame) – The dataframe containing the variable to study in the ANOVA analysis. The dataframe must contain a column named “subjects” containing the identifier of each subject.

  • kinds (list) – The list of the different connectivity metrics on which you want to perform the analysis.

  • correction_method (list) – The list of multiple comparison correction methods, as available in the statsmodels library.

  • root_analysis_directory (string) – The full path to the directory containing the raw data.

  • variables_in_model (list) – A list containing the categorical variable to study.

  • alpha (float, optional) – The type I error threshold. The default is 0.05.

Notes

If models and networks_list are lists containing more than one element, the p-values are jointly corrected across the number of models * number of networks.

conpagnon.connectivity_statistics.regression_analysis_model.regression_analysis_internetwork_level(internetwork_subjects_connectivity_dictionary, groups_in_model, behavioral_data_path, sheet_name, subjects_to_drop, model_formula, kinds_to_model, root_analysis_directory, inter_network_model, network_labels_list, network_labels_colors, pvals_correction_method=['fdr_bh'], vectorize=True, discard_diagonal=False, nperms_maxT=10000, contrasts='Id', compute_pvalues='True', pvalues_tail='True', NA_action='drop', alpha=0.05)[source]
conpagnon.connectivity_statistics.regression_analysis_model.regression_analysis_network_level(groups, kinds, networks_list, root_analysis_directory, network_model, variables_in_model, behavioral_dataframe, correction_method=['fdr_bh'], alpha=0.05, two_tailed=True, n_permutations=10000)[source]

Regress a linear model at the network level

conpagnon.connectivity_statistics.regression_analysis_model.regression_analysis_network_level_v2(groups, kinds, networks_list, root_analysis_directory, network_model, variables_in_model, score_of_interest, behavioral_dataframe, correction_method=['fdr_bh'], alpha=0.05)[source]

Perform a linear regression between a behavioral score and a functional connectivity “score”. The connectivity score is usually a simple mean of the connectivity coefficients within a single network. ConPagnon has functions to compute those kinds of scores from an atlas in which at least a few networks are identified by the user.

Parameters
  • groups (list) – The list of names (strings) of the groups involved in the analysis. Usually, these are simply the keys of the subjects connectivity dictionary.

  • kinds (list) – Repeat the statistical analysis for all the connectivity metrics present in the list.

  • networks_list (list) – Repeat the statistical analysis for all the networks in the list. The network names should match the keys of the network dictionary containing the connectivity scores.

  • root_analysis_directory (str) – The full path to the directory containing all the text data to read and feed to the linear model.

  • network_model (list) – A list containing the name of the model, matching the prefix of the corresponding text data to read.

  • variables_in_model (list) – A list of the variables to put in the linear model. The variable names should match the column names present in the text data.

  • score_of_interest (str) – The behavioral score to analyze, it’s simply the Y variable in the classical linear model: Y = X*Beta

  • behavioral_dataframe (pandas.DataFrame) – The dataframe containing all the behavioral variables for each subject.

  • correction_method (list, optional) – A list containing all the desired correction methods. Be careful, we do not stack all the models before the correction. The default is the Benjamini-Hochberg FDR correction (‘fdr_bh’).

  • alpha (float, optional) – The type I error rate, set to 0.05 by default.

See also

compute_connectivity_matrices.intra_network_functional_connectivity()

This function compute a intra-network connectivity score, by computing the mean of all connectivity coefficient inside a single network.

conpagnon.connectivity_statistics.regression_analysis_model.regression_analysis_whole_brain(groups, kinds, root_analysis_directory, whole_brain_model, variables_in_model, behavioral_dataframe, correction_method=['FDR'], alpha=0.05, n_permutations=10000, two_tailed=True)[source]

Regression analysis on composite connectivity score over the whole brain.

conpagnon.connectivity_statistics.regression_analysis_model.regression_analysis_whole_brain_v2(groups, kinds, root_analysis_directory, whole_brain_model, variables_in_model, score_of_interest, behavioral_dataframe, correction_method=['fdr_bh'], alpha=0.05)[source]

Compute a linear model with a continuous score as the response variable.

Parameters
  • groups

  • kinds

  • root_analysis_directory

  • whole_brain_model

  • variables_in_model

  • score_of_interest

  • behavioral_dataframe

  • correction_method

  • alpha

Returns

conpagnon.connectivity_statistics.regression_analysis_model.write_raw_data(output_csv_directory, kinds, groups, models, variables_in_model, behavioral_dataframe)[source]

Write the response variable along with the co-variables in a single dataframe inside a csv file.

Parameters
  • output_csv_directory (str) – The directory path containing the data for each connectivity metric. It should contain one folder for each metric in your analysis.

  • kinds (list) – The list of connectivity metrics needed in the statistical analysis.

  • groups (list) – The list of subject group names needed in the statistical analysis.

  • models (list) – The list of models needed in the analysis. The model name should be contained in the group csv data.

  • variables_in_model (list) – The list of co-variables to add to the response variable.

  • behavioral_dataframe (pandas.DataFrame) – A pandas dataframe containing the co-variables to add. Note that the dataframe must have a column called ‘subjects’ with the identifier of each subject. The response variable dataframe and the behavioral dataframe will be merged using the subjects column as index.

conpagnon.connectivity_statistics.regression_analysis_model.write_raw_data_network(output_csv_directory, kinds, groups, models, variables_in_model, behavioral_dataframe, network_list)[source]

Write the response variable along with the co-variables in a single dataframe inside a csv file, at the network level.

Parameters
  • output_csv_directory (str) – The directory path containing the data for each connectivity metric. It should contain one folder for each metric in your analysis.

  • kinds (list) – The list of connectivity metrics needed in the statistical analysis.

  • groups (list) – The list of subject group names needed in the statistical analysis.

  • models (list) – The list of models needed in the analysis. The model name should be contained in the group csv data.

  • variables_in_model (list) – The list of co-variables to add to the response variable.

  • behavioral_dataframe (pandas.DataFrame) – A pandas dataframe containing the co-variables to add. Note that the dataframe must have a column called ‘subjects’ with the identifier of each subject. The response variable dataframe and the behavioral dataframe will be merged using the subjects column as index.

  • network_list (list) – The list of network names. Each network in each connectivity metric should have a directory containing the data for each group.

Module contents

Created on Fri Oct 6 17:03:39 2017

@author: db242421