conpagnon.machine_learning package

Submodules

conpagnon.machine_learning.CPM_method module

This module is designed to study the link between functional connectivity and behavior. The algorithm, named Connectome Predictive Modelling (CPM), is adapted from [1].

[1] Shen et al., “Using connectome-based predictive modeling to predict individual behavior from brain connectivity”, Nature Protocols, 2017.

author: Dhaif BEKHA.

conpagnon.machine_learning.CPM_method.compute_summary_subjects_summary_values(training_connectivity_matrices, significance_selection_threshold, R_mat, P_mat)[source]
conpagnon.machine_learning.CPM_method.fit_model_on_training_set(negative_edges_summary_values, positive_edges_summary_values, training_set_behavioral_score, add_predictive_variables=None)[source]

Fit a linear model on the training set for the positive and the negative models, with the behavioral score as the dependent variable (a minimal sketch follows the parameter list).

Parameters
  • negative_edges_summary_values (numpy.array, shape (n_subjects_in_training_set, )) – The sum, for each subject in the training set, of the predictors negatively correlated with the behavioral scores.

  • positive_edges_summary_values (numpy.array, shape (n_subjects_in_training_set, )) – The sum, for each subject in the training set, of the predictors positively correlated with the behavioral scores.

  • training_set_behavioral_score (numpy.array, shape (n_subjects_in_training_set, 1)) – The behavioral scores of the training set.

  • add_predictive_variables (pandas.DataFrame, optional) – If not None, a pandas DataFrame of shape (n_subjects_in_training_set, n_variables) should be given. These variables will be included in the predictive model.
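The sketch below illustrates, on randomly generated stand-ins, the kind of ordinary least-squares fit described above: one linear model per summary feature, with an intercept. It is a minimal illustration of the documented behavior, not the package's exact implementation.

import numpy as np

rng = np.random.default_rng(0)
n_train = 30
positive_edges_summary_values = rng.normal(size=n_train)
negative_edges_summary_values = rng.normal(size=n_train)
training_set_behavioral_score = rng.normal(size=(n_train, 1))

# Design matrices: an intercept column plus the summary feature.
X_pos = np.column_stack([np.ones(n_train), positive_edges_summary_values])
X_neg = np.column_stack([np.ones(n_train), negative_edges_summary_values])

# Ordinary least-squares fit of each model.
beta_pos, *_ = np.linalg.lstsq(X_pos, training_set_behavioral_score, rcond=None)
beta_neg, *_ = np.linalg.lstsq(X_neg, training_set_behavioral_score, rcond=None)
print(beta_pos.ravel(), beta_neg.ravel())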

conpagnon.machine_learning.CPM_method.predict_behavior(vectorized_connectivity_matrices, behavioral_scores, selection_predictor_method='correlation', significance_selection_threshold=0.01, confounding_variables_matrix=None, add_predictive_variables=None, verbose=0)[source]

Predict behavior from connectivity matrices with a simple linear model (a usage sketch follows this entry).

Parameters
  • vectorized_connectivity_matrices (numpy.array, shape (n_subjects, n_features)) – The subjects' connectivity matrices in vectorized form, i.e., an array of shape (n_subjects, 0.5*n_regions*(n_regions - 1)).

  • behavioral_scores (pandas.DataFrame or numpy.array of shape (n_subjects, 1)) – The behavioral scores, as a pandas DataFrame or a numpy array. Of course, the behavioral scores should be ordered in the same order as the subjects' matrices.

  • selection_predictor_method (str, optional) – The selection method for the predictors. When relating behavior to functional connectivity, multiple choices are possible: simple correlation, partial correlation, or a linear model. Default is ‘correlation’. Note that partial correlation and the linear model should give the same results.

  • significance_selection_threshold (float, optional) – The threshold level for the p-values resulting from the predictor-selection step. Default is 0.01.

  • confounding_variables_matrix (pandas.DataFrame of shape (n_subjects, n_variables), optional) – A dataframe containing the confounding/controlling variables, which may be used in the predictor-selection step when the selection method is partial correlation or linear regression. Default is None.

  • add_predictive_variables (pandas.DataFrame of shape (n_subjects_in_training_set, n_variables) or None, optional) – If not None, these additional variables are included in the predictive model, besides the negative and positive summary features.

  • verbose (int, optional) – If set to 0, nothing is printed. Default is 0.

Returns

  • output 1 (float) – The correlation coefficient for the positive features model.

  • output 2 (float) – The correlation coefficient for the negative features model.
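A hypothetical call on synthetic data, assuming the two documented outputs unpack as shown; the shapes follow the parameter descriptions above and the data are random placeholders.

import numpy as np
from conpagnon.machine_learning.CPM_method import predict_behavior

rng = np.random.default_rng(42)
n_subjects, n_regions = 40, 10
n_edges = n_regions * (n_regions - 1) // 2   # lower-triangle size

vectorized_connectivity_matrices = rng.normal(size=(n_subjects, n_edges))
behavioral_scores = rng.normal(size=(n_subjects, 1))

# Correlation coefficients for the positive and negative features models.
r_positive, r_negative = predict_behavior(
    vectorized_connectivity_matrices=vectorized_connectivity_matrices,
    behavioral_scores=behavioral_scores,
    selection_predictor_method='correlation',
    significance_selection_threshold=0.01)
print(r_positive, r_negative)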

conpagnon.machine_learning.CPM_method.predictor_selection_pcorrelation(training_connectivity_matrices, training_set_behavioral_scores, training_set_confounding_variables)[source]
conpagnon.machine_learning.CPM_method.predictors_selection_correlation(training_connectivity_matrices, training_set_behavioral_scores)[source]
conpagnon.machine_learning.CPM_method.predictors_selection_linear_model(training_connectivity_matrices, training_confound_variable_matrix, training_set_behavioral_score)[source]

Relate each edge of the subjects' connectivity matrices in the training set to the behavioral scores using a linear model.
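Below is a schematic of edge-wise selection with a linear model, not ConPagnon's actual code: each edge is regressed on the score and the slope's p-value is kept. The data and the 0.01 threshold are illustrative.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_train, n_edges = 30, 45
edges = rng.normal(size=(n_train, n_edges))   # vectorized training matrices
score = rng.normal(size=n_train)              # behavioral scores

p_values = np.empty(n_edges)
r_values = np.empty(n_edges)
for j in range(n_edges):
    # Slope test of edge_j ~ score; equivalent to a Pearson correlation test.
    result = stats.linregress(score, edges[:, j])
    r_values[j], p_values[j] = result.rvalue, result.pvalue

selected_edges = np.where(p_values < 0.01)[0]
print(selected_edges)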

conpagnon.machine_learning.classification module

Created on Fri Oct 6 13:40:49 2017

@author: db242421 (dhaif.bekha@cea.fr)

conpagnon.machine_learning.classification.two_groups_classification(pooled_groups_connectivity_matrices, kinds, labels, n_splits, test_size, train_size, C=1.0, dual=True, fit_intercept=True, loss='squared_hinge', max_iter=1000, penalty='l2', random_state=0, scoring='accuracy')[source]

Perform binary classification between two classes, using connectivity coefficients as features and a support vector machine with a linear kernel.

Parameters
  • pooled_groups_connectivity_matrices (dict) –

    A multi-level dictionary organized as follows:
    • The first-level keys are the different groups to compare.

    • The second-level keys are the different kinds you want to test for classification. The values are the stacked connectivity matrices, as an ndarray of shape (…, number of regions, number of regions).

  • kinds (list) – The different kinds you want to test for classification.

  • labels (numpy.array of shape (number of subjects, )) – A one-dimensional numpy array of binary labels: 0 for the first class and 1 for the second class.

  • n_splits (int) – The number of splitting operations performed by the cross-validation scheme, StratifiedShuffleSplit.

  • test_size (float) – Between 0 and 1, the proportion of the testing set. This is the complement of train_size: test_size = 1 - train_size.

  • train_size (float) – Between 0 and 1, the proportion of the training set. This is the complement of test_size: train_size = 1 - test_size.

  • C (float, optional) – Penalty parameter of the error term. Default is 1.0.

  • dual (bool, optional) – Whether to solve the dual or the primal formulation of the problem. When n_samples < n_features, prefer dual=True, and False otherwise.

  • fit_intercept (bool, optional) – Fit an intercept in the model. If False, the data are assumed to be centered. Default is True.

  • loss (str, optional) – The loss function you want to use. Choices are ‘hinge’ or ‘squared_hinge’. Default is ‘squared_hinge’.

  • max_iter (int, optional) – The maximum number of iterations. Default is 1000.

  • penalty (str, optional) – The type of penalty term added to the model. Choices are the L1 norm ‘l1’ or the L2 norm ‘l2’. Default is ‘l2’.

  • scoring (str, optional) – The way the classification is evaluated, i.e., the scoring metric. Default is ‘accuracy’. See Notes.

Returns

  • output 1 (list) – List of mean scores for each kind.

  • output 2 (dict) – Dictionnary of mean scores with each kind as keys.

Raises

ValueError – If the number of distinct labels is less than two. Binary classification requires at least two different classes.

Notes

The classifier and the cross-validation scheme come from the scikit-learn library. Users are encouraged to consult the scikit-learn documentation cited in the references for further details on using support vector machines as classifiers and on the different ways of evaluating a classification algorithm.
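A minimal sketch of the scikit-learn building blocks named above: a linear SVM scored under a stratified shuffle-split scheme. The data are random placeholders, and this reproduces only the spirit of the function, not its exact internals.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score

rng = np.random.default_rng(0)
features = rng.normal(size=(40, 100))      # vectorized connectivity matrices
labels = np.array([0] * 20 + [1] * 20)     # two balanced classes

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.3, train_size=0.7,
                            random_state=0)
svc = LinearSVC(C=1.0, penalty='l2', loss='squared_hinge', max_iter=1000)
scores = cross_val_score(svc, features, labels, cv=cv, scoring='accuracy')
print(scores.mean())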

References

[1] The scikit-learn official documentation: http://scikit-learn.org/stable/index.html

conpagnon.machine_learning.cpm_predict_behavior module

conpagnon.machine_learning.cpm_predict_behavior.predict_behavior(vectorized_connectivity_matrices, behavioral_scores, selection_predictor_method='correlation', significance_selection_threshold=0.01, confounding_variables=None, confounding_variables_kwarg=None)[source]

The Connectome Predictive Modelling pipeline. This function selects the predictors, then trains and tests a linear model on the selected predictors following a leave-one-out cross-validation scheme (a schematic of the loop follows this entry).

Parameters
  • vectorized_connectivity_matrices (numpy.array of shape (n_subjects, n_features)) – The stack of vectorized (lower or upper triangle of the connectivity matrices) connectivity matrices. Be careful: the matrices should be stacked in the same order as the vector of scores to predict!

  • behavioral_scores (numpy.array of shape (n_subjects, 1)) – The vector of scores to predict. The scores should be in the same order as the vectorized connectivity matrices stack.

  • selection_predictor_method (str, optional) – The predictor-selection method. By default, a correlation between each connectivity coefficient and the scores is computed, and the resulting correlation matrix is thresholded at a type I error rate equal to 0.01. Other selection methods are available: ‘linear_model’, ‘partial correlation’.

  • significance_selection_threshold (float, optional) – The significance threshold used during the selection procedure. By default, set to 0.01.

  • confounding_variables (list, optional) – A list of the possible confounding variables you might want to add, used during the selection procedure only.

  • confounding_variables_kwarg (dict, optional) – A dictionary with a field called ‘file_path’. This field should contain the full path to a file with as many columns as there are confounding variables.

Returns

  • output 1 (float) – The correlation coefficient between the predicted and true scores from the positively correlated set of features.

  • output 2 (float) – The correlation coefficient between the predicted and true scores from the negatively correlated set of features.
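The loop below is a schematic of the documented leave-one-out scheme, not the package's exact code: each subject is held out once, a toy selection step keeps the edges most correlated with the score on the training fold, and a least-squares model predicts the held-out subject.

import numpy as np
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(3)
n_subjects, n_edges = 25, 45
X = rng.normal(size=(n_subjects, n_edges))
y = rng.normal(size=n_subjects)

predictions = np.empty(n_subjects)
for train_idx, test_idx in LeaveOneOut().split(X):
    # Toy selection step: keep the 10 edges most correlated with the score.
    r = np.array([np.corrcoef(X[train_idx, j], y[train_idx])[0, 1]
                  for j in range(n_edges)])
    selected = np.argsort(np.abs(r))[-10:]
    # Toy model: predict from the summed selected edges via least squares.
    summary_train = X[train_idx][:, selected].sum(axis=1)
    design = np.column_stack([np.ones(len(train_idx)), summary_train])
    beta, *_ = np.linalg.lstsq(design, y[train_idx], rcond=None)
    summary_test = X[test_idx][:, selected].sum(axis=1)
    predictions[test_idx] = beta[0] + beta[1] * summary_test

print(np.corrcoef(predictions, y)[0, 1])   # predicted vs. true scores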

conpagnon.machine_learning.features_indentification module

This module enables the identification of discriminative brain connections when performing classification between two groups, with connectivity coefficients as features.

The classification is performed with a Support Vector Machine (SVM) algorithm with a linear kernel. The C constant is set to 1.
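A minimal sketch of the weight extraction this module builds on, with placeholder data; the module may use a different SVM class internally, but the idea is the same: fit a linear-kernel SVM with C=1 and read one coefficient per connectivity edge.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features = rng.normal(size=(30, 45))    # vectorized connectivity matrices
labels = np.array([0] * 15 + [1] * 15)

svc = LinearSVC(C=1).fit(features, labels)
weights = svc.coef_.ravel()             # one weight per connectivity edge
print(weights.shape)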

References

[1] Bernard Ng, Gaël Varoquaux, Jean-Baptiste Poline, Michael D. Greicius, Bertrand Thirion, “Transport on Riemannian Manifold for Connectivity-based Brain Decoding”, IEEE Transactions on Medical Imaging, 2015.

Author: Dhaif BEKHA.

# TODO: code a function permutation_svc to estimate the null distribution without bootstrapping, because when the sample size is too small, bootstrap with replacement fails.

conpagnon.machine_learning.features_indentification.bootstrap_classification(features, class_labels, boot_indices, C=1)[source]

Perform binary classification on a sample generated by bootstrap (with replacement).

Parameters
  • features (numpy.ndarray, shape (n_samples, n_features)) – The connectivity matrices in vectorized form: each row is a subject and each column is a pair of regions. Only the lower triangle of the connectivity matrices should be given.

  • class_labels (numpy.ndarray, shape (n_samples, )) – The class labels of each subject, permuted once.

  • boot_indices (numpy.ndarray, shape (n_samples, )) – The array containing the indices of the bootstrapped subjects.

Returns

output – The weights of the linear SVM estimated on the bootstrap sample.

Return type

numpy.ndarray, shape (n_features, )
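A schematic of one bootstrap iteration on placeholder data: resample subjects with replacement, fit the linear SVM, return its weights. This mirrors the documented inputs and output, not the function's exact internals.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features = rng.normal(size=(30, 45))
class_labels = np.array([0] * 15 + [1] * 15)

# One bootstrap sample: subject indices drawn with replacement.
boot_indices = rng.integers(0, len(class_labels), size=len(class_labels))
svc = LinearSVC(C=1).fit(features[boot_indices], class_labels[boot_indices])
print(svc.coef_.ravel().shape)   # (n_features, )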

conpagnon.machine_learning.features_indentification.bootstrap_svc(features, class_labels, bootstrap_array_indices, n_cpus_bootstrap=1, verbose=0, backend='multiprocessing', C=1)[source]

Perform binary classification between two classes on bootstrapped samples (a joblib-based sketch follows this entry).

Parameters
  • features (numpy.ndarray, shape (n_samples, n_features)) – The connectivity matrices in vectorized form: each row is a subject and each column is a pair of regions. Only the lower triangle of the connectivity matrices should be given.

  • class_labels (numpy.ndarray, shape (n_samples, )) – The class labels of each subject.

  • bootstrap_array_indices (numpy.ndarray, shape (n_bootstrap, n_samples)) – An array containing the bootstrap indices. Each row contains the indices used to generate one bootstrap sample.

  • n_cpus_bootstrap (int, optional) – The number of CPUs used concurrently during the computation on bootstrap samples. Default is 1, equivalent to a classical for loop over the bootstrap samples.

  • backend (str, optional) – The method used to execute concurrent tasks. This argument is passed to the Parallel function of the joblib package. Default is ‘multiprocessing’.

  • verbose (int, optional) – The verbosity level during parallel computation. This argument is passed to the Parallel function of the joblib package. Default is 0.

Returns

output – The array of estimated feature weights, one row per bootstrap sample.

Return type

numpy.ndarray, shape (n_bootstrap, n_features)
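A sketch of running the per-bootstrap fits concurrently with joblib, as the backend and verbosity parameters above suggest. The data are placeholders; the 'threading' backend keeps the sketch self-contained, while the function itself defaults to 'multiprocessing', which may require a __main__ guard.

import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features = rng.normal(size=(30, 45))
class_labels = np.array([0] * 15 + [1] * 15)
bootstrap_array_indices = rng.integers(0, 30, size=(100, 30))

def fit_one(indices):
    # One bootstrap fit: resample subjects, fit the SVM, return its weights.
    svc = LinearSVC(C=1).fit(features[indices], class_labels[indices])
    return svc.coef_.ravel()

weights = np.array(Parallel(n_jobs=2, backend='threading', verbose=0)(
    delayed(fit_one)(idx) for idx in bootstrap_array_indices))
print(weights.shape)   # (n_bootstrap, n_features)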

conpagnon.machine_learning.features_indentification.bootstrap_svc_(vectorized_connectivity_matrices, class_labels, bootstrap_number)[source]

Fit a support vector machine with a linear kernel on bootstrap samples.

conpagnon.machine_learning.features_indentification.compute_weight_distribution(vectorized_connectivity_matrices, bootstrap_number, n_permutations, class_labels, C=1, n_cpus_bootstrap=1, verbose_bootstrap=1, verbose_permutations=1)[source]
conpagnon.machine_learning.features_indentification.discriminative_brain_connection_identification(vectorized_connectivity_matrices, class_labels, class_names, save_directory, n_permutations, bootstrap_number, features_labels, features_colors, n_nodes, atlas_nodes, first_class_mean_matrix, second_class_mean_matrix, top_features_number=100, correction='fdr_bh', alpha=0.05, n_cpus_bootstrap=1, verbose_bootstrap=1, verbose_permutations=1, write_report=True, node_size=15, C=1)[source]

Identify important connections when performing a binary classification task.

conpagnon.machine_learning.features_indentification.features_weights_max_t_correction(null_distribution_features_weights, normalized_mean_weight)[source]

Compute corrected p-values for the feature weights, using the estimated null distribution of the normalized mean feature weights (a maximum-statistic correction; a schematic follows this entry).

Parameters
  • null_distribution_features_weights (numpy.ndarray, shape (n_permutations, (n_features*(n_features - 1)/2))) – The estimated null distribution of the normalized mean feature weights, built with class-label permutations and bootstrap.

  • normalized_mean_weight (numpy.ndarray, shape ((n_features*(n_features - 1)/2), )) – The normalized mean of the feature weights estimated on bootstrapped samples.

Returns

output – The corrected p-values array.

Return type

numpy.ndarray, shape ((n_features*(n_features - 1)/2), )
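A schematic of the maximum-statistic correction on placeholder arrays, assuming the shapes documented above: the null distribution of the maximum absolute weight across edges gives, for each observed weight, the fraction of permutation maxima that exceed it.

import numpy as np

rng = np.random.default_rng(0)
n_permutations, n_edges = 1000, 45
null_distribution = rng.normal(size=(n_permutations, n_edges))
observed = rng.normal(size=n_edges)

# Null distribution of the maximum absolute weight across edges.
max_null = np.abs(null_distribution).max(axis=1)

# Corrected p-value: fraction of permutation maxima exceeding each weight.
p_corrected = (max_null[:, None] >= np.abs(observed)[None, :]).mean(axis=0)
print(p_corrected.shape)   # (n_edges, )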

conpagnon.machine_learning.features_indentification.features_weights_parametric_correction(null_distribution_features_weights, normalized_mean_weight, method='fdr_bh', alpha=0.05)[source]

Parametric estimation of the p-value of each feature weight, using the estimated null distribution: a normal law is fitted to the null via its mean and standard deviation (a schematic follows this entry).

Parameters
  • null_distribution_features_weights (numpy.ndarray of shape (n_permutations, (n_features*(n_features - 1)/2))) – The normalized mean feature weights for each permutation.

  • normalized_mean_weight (numpy.ndarray, shape ((n_features*(n_features - 1)/2), )) – The estimated normalized mean feature weights from bootstrapped samples.

  • method (str, optional) – The correction method. There are multiple possible choices; please consult the statsmodels library. Default is the False Discovery Rate correction (‘fdr_bh’).

  • alpha (float, optional) – The type I error rate threshold. Default is 0.05.

Returns

output – The corrected p-values array.

Return type

numpy.ndarray, shape ((n_features*(n_features - 1)/2), )
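A schematic of the parametric route described above, on placeholder data: fit a normal law to the permutation null, convert each observed weight to a two-sided p-value, then correct for multiple comparisons with statsmodels.

import numpy as np
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
null_distribution = rng.normal(size=(1000, 45))
observed = rng.normal(size=45)

# Fit the null with its mean and standard deviation, then standardize.
mu, sigma = null_distribution.mean(), null_distribution.std()
z = (observed - mu) / sigma
p_values = 2 * norm.sf(np.abs(z))   # two-sided p-values

reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05,
                                          method='fdr_bh')
print(p_corrected.shape)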

conpagnon.machine_learning.features_indentification.find_significant_features_indices(p_positive_features_significant, p_negative_features_significant, features_labels)[source]

Return region indices and the corresponding labels of the features surviving permutation testing, for both negative and positive feature weights.

Parameters
  • p_positive_features_significant (numpy.ndarray, shape (n_features, n_features)) – An array containing the weights of the features with significant p-values for positive weights, and zeros elsewhere.

  • p_negative_features_significant (numpy.ndarray, shape (n_features, n_features)) – An array containing the weights of the features with significant p-values for negative weights, and zeros elsewhere.

  • features_labels (numpy.ndarray, shape (n_features, )) – The features labels.

Returns

  • output 1 (numpy.ndarray, shape (n_significant_features, 2)) – The indices array of significant positive weighted features.

  • output 2 (numpy.ndarray, shape (n_significant_features, 2)) – The indices array of significant negative weighted features.

  • output 3 (numpy.ndarray, shape (n_significant_features, 2)) – The labels array of significant positive weighted features.

  • output 4 (numpy.ndarray, shape (n_significant_features, 2)) – The labels array of significant negative weighted features.

conpagnon.machine_learning.features_indentification.find_top_features(normalized_mean_weight_array, labels_regions, top_features_number=50)[source]

Find the top feature weights in the normalized mean weight array, and mask the feature weights outside the ranking (a toy version of the masking step follows this entry).

Parameters
  • normalized_mean_weight_array (numpy.ndarray, shape (n_features, n_features)) – The array of normalized mean feature weights, computed after bootstrapping.

  • labels_regions (list) – The list of feature labels.

  • top_features_number (int) – The number of top features to keep.

Returns

  • output 1 (numpy.ndarray, shape (n_features, n_features)) – The normalized mean weight array containing the top feature weight values and zeros elsewhere.

  • output 2 (numpy.ndarray, shape (top_features_number + 1, )) – The top feature weights.

  • output 3 (numpy.ndarray, shape (top_features_number + 1, 2)) – The indices of the top features in the normalized mean weights array.

  • output 4 (numpy.ndarray, shape (top_features_number + 1, 2)) – The labels of the top features in the normalized mean weights array.
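A toy version of the masking step, on a random square weight matrix: keep the k largest absolute weights and zero everything else. This illustrates the idea only; the function itself also returns the ranked weights, indices, and labels described above.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 10))
k = 5

# Threshold at the k-th largest absolute weight, zero the rest.
threshold = np.sort(np.abs(weights).ravel())[-k]
masked = np.where(np.abs(weights) >= threshold, weights, 0.0)
print(np.count_nonzero(masked))   # k, up to ties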

conpagnon.machine_learning.features_indentification.k_largest_index_argsort(arr, k, reverse_order=False)[source]

Return the k+1 largest element indices in an array (a sketch of one common implementation follows this entry).

Parameters
  • arr (numpy.ndarray) – A multi-dimensional array.

  • k (int) – The number of largest elements indices to return.

  • reverse_order (bool) – If True, the indices are returned from the largest to smallest element.

Returns

output – The array of the k+1 largest element indices, expressed in the coordinate system of the input array.

Return type

numpy.ndarray
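One common way to implement this kind of lookup (an assumption, not necessarily the package's code): argsort the flattened array, take the last k positions, and map them back to multi-dimensional indices.

import numpy as np

def k_largest_indices(arr, k, reverse_order=False):
    # Flat positions of the k largest elements.
    idx = np.argsort(arr.ravel())[-k:]
    if reverse_order:
        idx = idx[::-1]   # largest first
    # Map flat positions back to multi-dimensional indices, one row each.
    return np.column_stack(np.unravel_index(idx, arr.shape))

arr = np.arange(12).reshape(3, 4)
print(k_largest_indices(arr, k=3, reverse_order=True))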

conpagnon.machine_learning.features_indentification.k_smallest_index_argsort(arr, k, reverse_order=False)[source]

Return the k+1 smallest element indices in an array.

Parameters
  • arr (numpy.ndarray) – A multi-dimensional array.

  • k (int) – The number of smallest elements indices to return.

  • reverse_order (bool) – If True, the indices are returned from the largest to smallest element.

Returns

output – The array of the k+1 smallest element indices, expressed in the coordinate system of the input array.

Return type

numpy.ndarray

conpagnon.machine_learning.features_indentification.one_against_all_bootstrap(features, class_labels, bootstrap_array_indices, n_cpus_bootstrap=1, verbose=0, backend='multiprocessing')[source]

Perform multi-class classification on bootstrapped samples with a one-versus-all strategy.

Parameters
  • features (numpy.ndarray, shape (n_samples, n_features)) – The connectivity matrices in vectorized form: each row is a subject and each column is a pair of regions. Only the lower triangle of the connectivity matrices should be given.

  • class_labels (numpy.ndarray, shape (n_samples, )) – The class labels of each subject.

  • bootstrap_array_indices (numpy.ndarray, shape (n_bootstrap, n_samples)) – An array containing the bootstrap indices. Each row contains the indices used to generate one bootstrap sample.

  • n_cpus_bootstrap (int, optional) – The number of CPUs used concurrently during the computation on bootstrap samples. Default is 1, equivalent to a classical for loop over the bootstrap samples.

  • backend (str, optional) – The method used to execute concurrent tasks. This argument is passed to the Parallel function of the joblib package. Default is ‘multiprocessing’.

  • verbose (int, optional) – The verbosity level during parallel computation. This argument is passed to the Parallel function of the joblib package. Default is 0.

Returns

output – The array of estimated feature weights, one row per bootstrap sample.

Return type

numpy.ndarray, shape (n_bootstrap, n_features)

conpagnon.machine_learning.features_indentification.one_against_all_classification(features, class_labels, boot_indices)[source]

Perform multi-class classification on a bootstrapped sample with a one-versus-all strategy.

Parameters
  • features (numpy.ndarray, shape (n_samples, n_features)) – The connectivity matrices in vectorized form: each row is a subject and each column is a pair of regions. Only the lower triangle of the connectivity matrices should be given.

  • class_labels (numpy.ndarray, shape (n_samples, )) – The class labels of each subject, permuted once.

  • boot_indices (numpy.ndarray, shape (n_samples, )) – The array containing the indices of the bootstrapped subjects.

Returns

output – The weights of the linear SVM estimated on the bootstrap sample.

Return type

numpy.ndarray, shape (n_features, )

conpagnon.machine_learning.features_indentification.one_against_all_permutation_bootstrap(features, class_labels_perm, bootstrap_array_perm, n_classes, n_permutations=1000, n_cpus_bootstrap=1, backend='multiprocessing', verbose_bootstrap=0, verbose_permutation=0)[source]

Perform one-versus-all classification for each sample generated by bootstrap (with replacement) under a permuted class-labels vector.

Parameters
  • features (numpy.ndarray, shape (n_samples, n_features)) – The connectivity matrices in vectorized form: each row is a subject and each column is a pair of regions. Only the lower triangle of the connectivity matrices should be given.

  • class_labels_perm (numpy.ndarray, shape (n_permutations, n_samples)) – The class labels array: each row contains the subject labels, permuted once.

  • n_permutations (int, optional) – The number of permutations. Default is 1000.

  • bootstrap_array_perm (numpy.ndarray, shape (n_permutations, n_bootstrap, n_samples)) – An array containing the bootstrap indices arrays for each permutation.

  • n_cpus_bootstrap (int, optional) – The number of CPUs used concurrently during the computation on bootstrap samples. Default is 1, equivalent to a classical for loop over the bootstrap samples.

  • backend (str, optional) – The method used to execute concurrent tasks. This argument is passed to the Parallel function of the joblib package. Default is ‘multiprocessing’.

  • verbose_bootstrap (int, optional) – The verbosity level during parallel computation. This argument is passed to the Parallel function of the joblib package. Default is 0.

  • verbose_permutation (int, optional) – If equal to 1, print the progression of the permutation testing. Default is 0.

Returns

output – The normalized mean of the feature weights over bootstrap samples, estimated by classification with a linear SVM.

Return type

numpy.ndarray, shape (n_features, )

conpagnon.machine_learning.features_indentification.permutation_bootstrap_svc(features, class_labels_perm, bootstrap_array_perm, n_permutations=1000, n_cpus_bootstrap=1, backend='multiprocessing', verbose_bootstrap=0, verbose_permutation=0, C=1)[source]

Perform binary classification for each sample generated by bootstrap (with replacement) under a permuted class-labels vector (a schematic of the permutation layer follows this entry).

Parameters
  • features (numpy.ndarray, shape (n_samples, n_features)) – The connectivity matrices in vectorized form: each row is a subject and each column is a pair of regions. Only the lower triangle of the connectivity matrices should be given.

  • class_labels_perm (numpy.ndarray, shape (n_permutations, n_samples)) – The class labels array: each row contains the subject labels, permuted once.

  • n_permutations (int, optional) – The number of permutations. Default is 1000.

  • bootstrap_array_perm (numpy.ndarray, shape (n_permutations, n_bootstrap, n_samples)) – An array containing the bootstrap indices arrays for each permutation.

  • n_cpus_bootstrap (int, optional) – The number of CPUs used concurrently during the computation on bootstrap samples. Default is 1, equivalent to a classical for loop over the bootstrap samples.

  • backend (str, optional) – The method used to execute concurrent tasks. This argument is passed to the Parallel function of the joblib package. Default is ‘multiprocessing’.

  • verbose_bootstrap (int, optional) – The verbosity level during parallel computation. This argument is passed to the Parallel function of the joblib package. Default is 0.

  • verbose_permutation (int, optional) – If equal to 1, print the progression of the permutation testing. Default is 0.

Returns

output – The normalized mean of the feature weights over bootstrap samples, estimated by classification with a linear SVM.

Return type

numpy.ndarray, shape (n_features, )
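A schematic of the permutation layer on placeholder data, assuming the normalized mean weight is the bootstrap mean divided by the bootstrap standard deviation: for each permutation, the labels are shuffled, the bootstrap statistic is recomputed, and the collected rows form the null distribution.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
features = rng.normal(size=(30, 45))
labels = np.array([0] * 15 + [1] * 15)
n_permutations, n_bootstrap = 10, 50   # small values for the sketch

null_distribution = np.empty((n_permutations, features.shape[1]))
for p in range(n_permutations):
    permuted = rng.permutation(labels)
    boot_weights = np.empty((n_bootstrap, features.shape[1]))
    for b in range(n_bootstrap):
        idx = rng.integers(0, len(labels), size=len(labels))
        svc = LinearSVC(C=1).fit(features[idx], permuted[idx])
        boot_weights[b] = svc.coef_.ravel()
    # Normalized mean weight: bootstrap mean over bootstrap standard deviation.
    null_distribution[p] = boot_weights.mean(axis=0) / boot_weights.std(axis=0)

print(null_distribution.shape)   # (n_permutations, n_features)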

conpagnon.machine_learning.features_indentification.rank_top_features_weight(coefficients_array, top_features_number, features_labels)[source]

Establish a ranking of the most important features of the classifier, based on weight magnitude.

Parameters
  • coefficients_array (numpy.ndarray, shape (n_features, n_features)) – The 2D array containing the features weights to rank.

  • top_features_number (int) – The desired number of top features.

  • features_labels (list) – The list of feature labels.

Returns

  • output 1 (numpy.ndarray, shape (n_top_features + 1, )) – The desired top feature weights.

  • output 2 (numpy.ndarray, shape (n_top_features + 1, 2)) – The positions of each top feature weight.

  • output 3 (numpy.ndarray, shape (n_top_features + 1, 2)) – The ROI label pairs corresponding to the top features.

conpagnon.machine_learning.features_indentification.remove_reversed_duplicates(iterable)[source]
conpagnon.machine_learning.features_indentification.timer(start, end)[source]

Print the measured time between two points in the code.

Parameters
  • start (float) – The start time of the measurement.

  • end (float) – The end time of the measurement.

conpagnon.machine_learning.scores_predictions module

Created by db242421 at 14/01/19

conpagnon.machine_learning.scores_predictions.predict_scores(connectivity_matrices, raw_score, alpha=0.01, C=1, alpha_ridge=1, estimator='svr', with_mean=False, with_std=False)[source]
conpagnon.machine_learning.scores_predictions.vcorrcoef(X, y)[source]

Module contents

Created on Fri Oct 6 13:40:24 2017

@author: db242421