conpagnon.data_handling package
Submodules
conpagnon.data_handling.atlas module
-
class conpagnon.data_handling.atlas.Atlas(path, name)
Bases: object
A class for computing useful information when working with atlases.
-
loadAtlas()
Load the atlas images and return a 4D numpy array.
-
GetRegionNumbers()
Return the number of regions in the atlas.
-
GetCenterOfMass()
Return the array of the center-of-mass coordinates of each atlas region.
-
UserLabelsColors()
Generate an array of user-defined label colors for display purposes.
-
GetLabels(labelsFile, colname='labels')
Read the labels text file of the atlas.
- Parameters
labelsFile (str) – The full path to the label file of the atlas. Supported extensions are .csv, .txt, .xlsx and .xls. By default, the labels are read from the column whose header is ‘labels’.
colname (str, optional) – The name of the column containing the labels. Default is ‘labels’. If the file has no header, set it to None.
- Returns
output – The list of the labels.
- Return type
list
-
get_center_of_mass(asanarray=False)
Compute the centers of mass of the different atlas regions.
- Parameters
asanarray (bool, optional) – If True, the coordinates are returned as a numpy.array of shape (number of regions, 3).
- Returns
output – The coordinates of the center of mass of each region of the atlas.
- Return type
list or numpy.array
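For illustration, the center of mass of each region of a 4D atlas (one map per region along the last axis) can be computed as the weighted mean of the voxel indices, mapped to world space with the image affine. The numpy sketch below assumes the atlas data and its 4x4 affine are already loaded; `centers_of_mass_4d` is a hypothetical helper, not the ConPagnon implementation.

```python
import numpy as np


def centers_of_mass_4d(atlas_data, affine):
    """Hypothetical helper: world-space center of mass of each region.

    atlas_data: array of shape (x, y, z, n_regions), one (possibly
    weighted) map per region. affine: 4x4 voxel-to-world matrix.
    Returns an array of shape (n_regions, 3).
    """
    n_regions = atlas_data.shape[-1]
    # Voxel index grid flattened in C order, shape (x*y*z, 3)
    grid = np.indices(atlas_data.shape[:3]).reshape(3, -1).T
    centers = np.empty((n_regions, 3))
    for r in range(n_regions):
        weights = atlas_data[..., r].ravel()
        # Weighted mean of the voxel indices of region r
        voxel_com = weights @ grid / weights.sum()
        # Voxel coordinates -> world coordinates
        centers[r] = affine[:3, :3] @ voxel_com + affine[:3, 3]
    return centers
```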
-
user_labels_colors(networks, colors)
Generate user-defined label colors for each label of the atlas.
- Parameters
networks (list) – The list containing the number of regions in each network.
colors (list) – The list of colors, one for each network.
- Returns
output – The array containing the RGB values of the colors given in colors. To obtain normalized RGB values, divide this array by 255.
- Return type
numpy.array of shape (number of regions, 3)
References
All possible color names are listed at [1] https://matplotlib.org/examples/color/named_colors.html
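The per-region color array amounts to repeating each network color once per region of that network. A minimal numpy sketch, assuming colors are already given as (R, G, B) triplets in the 0-255 range rather than matplotlib names; `expand_network_colors` is hypothetical.

```python
import numpy as np


def expand_network_colors(networks, colors, normalize=False):
    """Hypothetical sketch: one RGB row per region, repeated per network.

    networks: number of regions in each network, in atlas order.
    colors: one (R, G, B) triplet in 0-255 per network (assumption: the
    documented method accepts matplotlib color names instead).
    """
    if len(networks) != len(colors):
        raise ValueError("exactly one color per network is required")
    rgb = np.repeat(np.asarray(colors, dtype=float), networks, axis=0)
    # Optional scaling to the normalized [0, 1] RGB space
    return rgb / 255.0 if normalize else rgb
```

If the colors are matplotlib names, `matplotlib.colors.to_rgb` returns a normalized triplet that can be multiplied by 255 before calling the sketch.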
-
conpagnon.data_handling.atlas.fetch_atlas(atlas_folder, atlas_name, colors_labels='auto', network_regions_number='auto', labels='auto', normalize_colors=False)
Return the centers of mass, labels, colors and number of ROIs of an atlas file.
- Parameters
atlas_folder (str) – The full path to the directory containing the atlas.
atlas_name (str) – The filename of the atlas file.
colors_labels (str or list, optional) – If set to ‘auto’, each ROI label gets a random color. If a list of colors is provided, the ROIs belonging to a network all get the color of that network. The colors must be given in the same order as the networks in the atlas file, and the length of the list must match the number of networks in the atlas.
network_regions_number (str or list, optional) – If set to ‘auto’, random colors are chosen. If a list giving the number of regions in each network is provided, each color in the color list is applied to the corresponding number of regions.
labels (list, optional) – The list of the ROI labels. If not provided, the name of each ROI is simply its position in the atlas file.
normalize_colors (bool, optional) – If True, all RGB triplets are divided by 255, the maximum value.
- Returns
output 1 (numpy.array) – The coordinates of the center of mass of each ROI in the atlas, as an array of shape (n_rois, 3).
output 2 (list) – The name of each ROI in the atlas, as a list of length n_rois.
output 3 (numpy.array) – The RGB colors of each ROI, as an array of shape (n_rois, 3).
output 4 (int) – The number of ROIs in the atlas.
-
conpagnon.data_handling.atlas.fetch_atlas_functional_network(atlas_excel_file, sheetname, network_column_name)
Return a dictionary containing information on all functional networks.
The information for each network is read directly from the Excel file.
- Parameters
atlas_excel_file (str) – The full path to the Excel file containing all the information on your atlas.
sheetname (str) – The name of the sheet to read in the atlas Excel file.
network_column_name (str) – The name of the column containing the functional network labels.
- Returns
output – A dictionary with the network names as keys and, as values, the sub-dataframe and the number of ROIs of each network.
- Return type
dict
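The per-network dictionary can be sketched with a pandas groupby on the network column. Here `atlas_df` stands in for the sheet that would be read with `pandas.read_excel(atlas_excel_file, sheet_name=sheetname)`; the helper name and the inner keys `dataframe` and `number_of_roi` are hypothetical, not the ConPagnon ones.

```python
import pandas as pd


def split_by_network(atlas_df, network_column_name):
    """Hypothetical sketch: {network: {'dataframe': sub_df, 'number_of_roi': n}}.

    atlas_df has one row per ROI; the inner key names are illustrative.
    """
    networks = {}
    # sort=False keeps the networks in order of first appearance
    for network, sub_df in atlas_df.groupby(network_column_name, sort=False):
        networks[network] = {"dataframe": sub_df, "number_of_roi": len(sub_df)}
    return networks
```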
-
conpagnon.data_handling.atlas.generate_3d_img_network(reference_4datlas, atlas_information_xlsx_file, network_column_name, sheetname, atlas4d_index_keys, atlas3d_label_key, save_network_img_directory)
Generate a 3D NIfTI file for each functional network defined in a 4D atlas.
conpagnon.data_handling.data_architecture module
Created on Mon Sep 18 17:32:38 2017
@author: Dhaif BEKHA
ConPagnon version 2.0
-
conpagnon.data_handling.data_architecture.create_group_dictionnary(subjects_id_data_path, root_fmri_data_directory, groupes)
Initialise a dictionary with groups as keys and subject IDs as values.
- Parameters
subjects_id_data_path (str) – The full path to the data file containing the subjects IDs.
root_fmri_data_directory (str) – The full path of a root directory containing one or more sub-directories where the functional images are stored.
groupes (list) – List of the names of the sub-directories containing the fMRI files to use.
- Returns
output – A dictionary with groups as keys and, for each group, the subject IDs as values.
- Return type
dict
See also
list_fmri_data()
Fetch functional images in all sub-directories.
read_text_data_file()
Read a text data file.
Notes
Whatever its format, the subject IDs data file must not contain a header: it should consist of a single column of subject IDs.
-
conpagnon.data_handling.data_architecture.fetch_data(subjects_id_data_path, root_fmri_data_directory, groupes, individual_confounds_directory=None)
Fetch a complete, organised data structure for a group study in which all subjects share a common atlas.
- Parameters
subjects_id_data_path (str) – The full path to the data file containing the subjects IDs.
root_fmri_data_directory (str) – The full path of a root directory containing one or more sub-directories where the functional images are stored.
groupes (list) – List of the names of the sub-directories containing the fMRI files to use.
individual_confounds_directory (None or str) – Full path to the confound files for all subjects.
- Returns
output – A multi-level dictionary containing all the data. The first level keys are the groups, the second level keys are the subject IDs, and the last level holds all the relevant files for one subject: the fMRI image and, if required, the confound file.
- Return type
dict
Notes
Whatever its format, the subject IDs data file must not contain a header: it should consist of a single column of subject IDs.
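The nested group, subject, files layout described above can be sketched from a mapping of group names to image paths. This is a hypothetical illustration, not the ConPagnon code: it assumes a file belongs to a subject when the subject ID appears in its file name, and the inner keys `functional_file` and `confounds_file` are made-up names. Plain substring matching would also match `sub01` inside `sub010`, so a stricter rule may be needed in practice.

```python
import os


def organise_by_subject(fmri_files_by_group, subject_ids, confounds_by_subject=None):
    """Hypothetical sketch of the group -> subject -> files dictionary.

    fmri_files_by_group: {group: [paths to functional images]}.
    confounds_by_subject: optional {subject_id: path to confound file}.
    """
    data = {}
    for group, files in fmri_files_by_group.items():
        data[group] = {}
        for subject_id in subject_ids:
            matches = [f for f in files if subject_id in os.path.basename(f)]
            if not matches:
                continue  # this subject has no image in this group
            entry = {"functional_file": matches}
            if confounds_by_subject is not None:
                entry["confounds_file"] = confounds_by_subject.get(subject_id)
            data[group][subject_id] = entry
    return data
```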
-
conpagnon.data_handling.data_architecture.fetch_data_with_individual_atlases(subjects_id_data_path, root_fmri_data_directory, groupes, individual_atlases_directory, individual_atlases_labels_directory, individual_atlas_file_extension, individual_atlas_labels_extension, individual_counfounds_directory=None)
Fetch a complete, organised data structure for a group study that requires individual atlases.
- Parameters
subjects_id_data_path (str) – The full path to the data file containing the subjects IDs.
root_fmri_data_directory (str) – The full path of a root directory containing one or more sub-directories where the functional images are stored.
groupes (list) – List of the names of the sub-directories containing the fMRI files to use.
individual_atlases_directory (str) – Full path to the directory of the individual atlases of all subjects.
individual_atlases_labels_directory (str) – Full path to the directory of the individual atlas labels of all subjects.
individual_atlas_file_extension (str) – File extension of the individual atlas images of all subjects.
individual_atlas_labels_extension (str) – File extension of the text files containing the individual atlas labels of all subjects.
individual_counfounds_directory (None or str) – Full path to the confound files for all subjects.
- Returns
output – A multi-level dictionary containing all the data. The first level keys are the groups, the second level keys are the subject IDs, and the last level holds all the relevant files for one subject: the fMRI image, the subject atlas, the subject atlas labels file and, if required, the confound file. A key called ‘discarded_rois’ contains the excluded ROIs (see Notes).
- Return type
dict
Notes
Whatever its format, the subject IDs data file must not contain a header: it should consist of a single column of subject IDs.
A discarded ROI is a subject atlas ROI whose label is ‘void’. It can be an empty ROI or an ROI that is not needed for the analysis. Discarded ROIs are excluded from the connectivity analysis, for example when computing t-tests.
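Finding and excluding the ‘void’ ROIs can be sketched in a few lines of numpy. The helper names are hypothetical, and the sketch assumes 0-based ROI positions that index the rows and columns of a square connectivity matrix.

```python
import numpy as np


def find_discarded_rois(labels, void_label="void"):
    # 0-based positions of the ROIs whose label is 'void'
    return [i for i, label in enumerate(labels) if label == void_label]


def drop_rois(matrix, discarded):
    # Remove the rows and columns of the discarded ROIs from a square matrix
    keep = np.setdiff1d(np.arange(matrix.shape[0]), discarded)
    return matrix[np.ix_(keep, keep)]
```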
-
conpagnon.data_handling.data_architecture.fetch_fmri_data(root_fmri_data_directory, groupes)
Fetch the functional images found in a list of sub-directories.
- Parameters
root_fmri_data_directory (str) – The full path of a root directory containing one or more sub-directories where the functional images are stored.
groupes (list) – List of the names of the sub-directories containing the fMRI files to use.
- Returns
output – A dictionary with the sub-directory names as keys and the full paths to the functional images as values.
- Return type
dict
See also
list_fmri_data()
Fetch functional images in all sub-directories.
-
conpagnon.data_handling.data_architecture.list_fmri_data(root_fmri_data_directory)
Fetch all the functional images found in the sub-directories of a root directory.
- Parameters
root_fmri_data_directory (str) – The full path of a root directory containing one or more sub-directories where the functional images are stored.
- Returns
output – A dictionary with the sub-directory names as keys and the full paths to the functional images as values.
- Return type
dict
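Scanning the sub-directories of a root directory, optionally restricted to a list of groups as in fetch_fmri_data, can be sketched with the standard library. `list_functional_images` is hypothetical; it only looks at the direct sub-directories of the root, and the accepted extensions (.nii, .nii.gz) are an assumption.

```python
import os


def list_functional_images(root_dir, groups=None, extensions=(".nii", ".nii.gz")):
    """Hypothetical sketch: {sub-directory name: sorted full paths to images}."""
    images = {}
    for name in sorted(os.listdir(root_dir)):
        sub_dir = os.path.join(root_dir, name)
        # Skip plain files and, if groups is given, unrequested sub-directories
        if not os.path.isdir(sub_dir) or (groups is not None and name not in groups):
            continue
        images[name] = sorted(
            os.path.join(sub_dir, f)
            for f in os.listdir(sub_dir)
            if f.endswith(extensions)
        )
    return images
```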
-
conpagnon.data_handling.data_architecture.read_text_data_file(file_path, colname=None, header=None)
Read a data file.
The data file can be a .csv, .txt, .xlsx or .xls file.
- Parameters
file_path (str) – Full path to the file to read.
colname (None or str) – The column name to extract.
header (None or int) – Row number to use as the column names, and the start of the data.
- Returns
output – The extracted column, as a pandas DataFrame.
- Return type
pandas.core.frame.DataFrame
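Dispatching on the file extension can be sketched with pandas. `read_data_file` is hypothetical; it assumes .csv and .txt files are comma separated, and reading .xlsx/.xls through `pandas.read_excel` requires an Excel engine such as openpyxl to be installed.

```python
import os

import pandas as pd


def read_data_file(file_path, colname=None, header=None):
    """Hypothetical sketch of extension-based reading of a data file."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension in (".csv", ".txt"):
        df = pd.read_csv(file_path, header=header)
    elif extension in (".xlsx", ".xls"):
        df = pd.read_excel(file_path, header=header)
    else:
        raise ValueError(f"Unsupported extension: {extension}")
    # Keep only the requested column, still as a DataFrame
    return df[[colname]] if colname is not None else df
```

With header=None, pandas names the columns 0, 1, …, which suits a headerless subject IDs file.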
conpagnon.data_handling.data_management module
-
conpagnon.data_handling.data_management.concatenate_dataframes(list_of_dataframes, axis=0)
Concatenate a list of pandas DataFrames.
-
conpagnon.data_handling.data_management.csv_from_dictionary(subjects_dictionary, groupes, kinds, field_to_write, header, csv_filename, output_directory, delimiter=',')
Write a CSV file from a subjects dictionary.
- Parameters
subjects_dictionary (dict) – A dictionary with the same structure as a subjects connectivity matrices dictionary.
groupes (list) – The list of groups to write.
kinds (list) – The list of kinds to write.
field_to_write (str) – The field containing the value to write for each subject.
header (list) – The header of the CSV file, as a list of column names.
csv_filename (str) – The end of the CSV filename, including the extension.
output_directory (str) – The full path to a directory for saving the CSV file.
delimiter (str, optional) – The delimiter between columns. Default is a comma.
-
conpagnon.data_handling.data_management.csv_from_intra_network_dictionary(subjects_dictionary, groupes, kinds, network_labels_list, field_to_write, output_directory, csv_prefix, delimiter=',')
Write a CSV file from the intra-network connectivity dictionary structure.
-
conpagnon.data_handling.data_management.dataframe_to_csv(dataframe, path, delimiter=',', index=False)
Create and write a CSV file from a DataFrame.
-
conpagnon.data_handling.data_management.dictionary_to_csv(dictionary, output_dir, output_filename)
Write the (key, value) pairs of a dictionary to a CSV file.
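Writing one (key, value) pair per row can be sketched with the standard csv module. `write_dict_to_csv` is hypothetical, and the absence of a header row is an assumption.

```python
import csv
import os


def write_dict_to_csv(dictionary, output_dir, output_filename):
    """Hypothetical sketch: one (key, value) pair per row, no header."""
    path = os.path.join(output_dir, output_filename)
    # newline="" lets the csv module control line endings
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for key, value in dictionary.items():
            writer.writerow([key, value])
    return path
```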
-
conpagnon.data_handling.data_management.flatten(values)
Flatten a list of numpy N-D arrays.
- Parameters
values (list) – A list of numpy array, with same or different dimensions.
- Returns
output – A flat (one-dimensional) array containing all the values, in the same order as in the list of arrays.
- Return type
numpy.array
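Flattening a list of arrays in order can be done by raveling each array and concatenating the results. A minimal sketch; `flatten_arrays` is a hypothetical name, not the documented function.

```python
import numpy as np


def flatten_arrays(values):
    """Sketch: concatenate the raveled arrays, in list order."""
    return np.concatenate([np.ravel(v) for v in values])
```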
-
conpagnon.data_handling.data_management.group_by_factors(dataframe, list_of_factors, return_type='list_of_dataframe')
Group a dataframe by one or more of its factors.
- Parameters
dataframe (pandas.DataFrame) – A pandas dataframe.
list_of_factors (list) – The list of factors, i.e. the column names in the dataframe, to group by.
return_type (str) – The output format: list_of_dataframe or dictionary. With the former, a list of dataframes of length equal to the number of groups is returned; with the latter, a dictionary with the group names as keys and the corresponding dataframes as values is returned. Default is list_of_dataframe.
- Returns
output – A list or dictionary of the dataframes grouped by the given factors.
- Return type
list or dict
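Both return types map naturally onto a pandas groupby. A sketch under that assumption; `group_dataframe` is hypothetical. When grouping by a list, recent pandas versions use tuples as group names, even for a single factor.

```python
import pandas as pd


def group_dataframe(dataframe, list_of_factors, return_type="list_of_dataframe"):
    """Hypothetical sketch of grouping by one or several columns."""
    groups = dataframe.groupby(list_of_factors)
    if return_type == "list_of_dataframe":
        return [sub_df for _, sub_df in groups]
    if return_type == "dictionary":
        return {name: sub_df for name, sub_df in groups}
    raise ValueError("return_type must be 'list_of_dataframe' or 'dictionary'")
```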
-
conpagnon.data_handling.data_management.merge_by_index(dataframe1, dataframe2, left_index=True, right_index=True)
Merge two dataframes on their matching index values.
- Parameters
dataframe1 (pandas.DataFrame) – A pandas dataframe.
dataframe2 (pandas.DataFrame) – A pandas dataframe.
left_index (bool, optional) – If True, the merge operation is based on the left index.
right_index (bool, optional) – If True, the merge operation is based on the right index.
- Returns
output – The merged dataframe.
- Return type
pandas.DataFrame
Notes
If left_index and right_index are both True, the merge is based on the intersection of the two indexes, i.e. an index missing from either dataframe is dropped from the merged dataframe.
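The intersection behaviour in the note is that of an inner merge in pandas. The snippet below illustrates it with `pandas.merge` directly, on made-up subject data; that merge_by_index wraps this call is an assumption.

```python
import pandas as pd

left = pd.DataFrame({"age": [10, 12, 9]}, index=["sub01", "sub02", "sub03"])
right = pd.DataFrame({"score": [0.5, 0.7]}, index=["sub02", "sub03"])
# Default how="inner": only subjects present in both indexes are kept
merged = pd.merge(left, right, left_index=True, right_index=True)
```

Here sub01 has no score, so it does not appear in merged.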
-
conpagnon.data_handling.data_management.merge_list_dataframes(list_dataframes)
Merge a list of dataframes.
-
conpagnon.data_handling.data_management.read_csv(csv_file, delimiter=',')
Read a CSV file and return a pandas.DataFrame.
- Parameters
csv_file (str) – The full path to the CSV file to read.
delimiter (str) – The separator used in the CSV file.
-
conpagnon.data_handling.data_management.read_excel_file(excel_file_path, sheetname, subjects_column_name)
Read an Excel document.
- Parameters
excel_file_path (str) – Full path to the Excel document.
sheetname (str) – The name of the sheet to read in the Excel document.
subjects_column_name (str) – The column name containing the subjects identifiers.
- Returns
output – A pandas DataFrame, indexed by subject name.
- Return type
pandas.DataFrame
-
conpagnon.data_handling.data_management.remove_duplicate(seq)
Remove duplicates from a sequence of items while preserving their order.
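One common way to drop duplicates while keeping the order is to rely on insertion-ordered dictionaries (Python 3.7 and later). A sketch with a hypothetical name; it requires hashable items.

```python
def remove_duplicates(seq):
    """Keep the first occurrence of each item, preserving order.

    Items must be hashable; dict keys keep insertion order (Python 3.7+).
    """
    return list(dict.fromkeys(seq))
```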
-
conpagnon.data_handling.data_management.shift_index_column(panda_dataframe, columns_to_index)
Shift the index column of a pandas DataFrame.
- Parameters
panda_dataframe (pandas.DataFrame) – A pandas dataframe.
columns_to_index (list) – Column label, or list of column labels or arrays, to use as the new index.
- Returns
output – A new pandas DataFrame with the shifted columns as index.
- Return type
pandas.DataFrame
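Moving columns into the index corresponds to `DataFrame.set_index` in pandas; that shift_index_column relies on it is an assumption. A short illustration on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"subject": ["sub01", "sub02"], "group": ["ctrl", "pat"], "age": [10, 12]})
# set_index returns a new DataFrame; df itself is left unchanged
indexed = df.set_index(["subject"])
```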
-
conpagnon.data_handling.data_management.unflatten(flat_values, prototype)
Unflatten a one-dimensional array of values back into the original list of arrays.
- Parameters
flat_values (numpy.ndarray) – The numpy array containing the values.
prototype (list) – The original list of numpy array.
- Returns
output – A list of array with the same structure as prototype.
- Return type
list
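Unflattening is the inverse of the flatten operation: walk through the prototype, take as many values as each array holds and reshape them. A minimal sketch with a hypothetical name.

```python
import numpy as np


def unflatten_values(flat_values, prototype):
    """Sketch: split a 1D array back into arrays shaped like prototype."""
    arrays, start = [], 0
    for proto in prototype:
        size = np.size(proto)
        # Take the next `size` values and give them the prototype's shape
        arrays.append(np.reshape(flat_values[start:start + size], np.shape(proto)))
        start += size
    return arrays
```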
-
conpagnon.data_handling.data_management.write_ols_results(ols_fit, design_matrix, response_variable, output_dir, model_name, design_matrix_index_name=None)
Write the OLS results, along with the design matrix and the response variable.
conpagnon.data_handling.dictionary_operations module
-
conpagnon.data_handling.dictionary_operations.groupby_factor_connectivity_matrices(population_data_file, sheetname, subjects_connectivity_matrices_dictionnary, groupes, factors, drop_subjects_list=None, index_col=0)
Group the subjects connectivity matrices by attribute.
index_col (added on 18/09/2019) specifies which column to use as the index of the whole dataframe. This function also works with a time series dictionary.
TODO: refactor subjects_connectivity_matrices_dictionary to subjects_dictionary.
-
conpagnon.data_handling.dictionary_operations.merge_dictionary(dict_list, new_key=None)
Merge a list of dictionaries.
- Parameters
new_key (str, optional) – The key of the new merged dictionary. If None, the dictionaries in the list are simply merged together. Default is None.
dict_list (list) – A list of the dictionaries to merge.
- Returns
output – A dictionary with one key, new_key, and the merged dictionary as value; if new_key is None, the merged dictionary itself.
- Return type
dict
Notes
All the dictionaries to merge must have distinct keys.
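The merge with the distinct-keys requirement made explicit can be sketched as follows. `merge_dicts` is hypothetical; raising an error on shared keys is a choice of this sketch, not necessarily what merge_dictionary does.

```python
def merge_dicts(dict_list, new_key=None):
    """Hypothetical sketch of merging dictionaries with distinct keys."""
    merged = {}
    for d in dict_list:
        # Refuse silently overwriting a value from an earlier dictionary
        overlap = merged.keys() & d.keys()
        if overlap:
            raise ValueError(f"duplicate keys: {sorted(overlap)}")
        merged.update(d)
    return merged if new_key is None else {new_key: merged}
```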
-
conpagnon.data_handling.dictionary_operations.random_draw_of_connectivity_matrices(subjects_connectivity_dictionary, groupe, n_matrices, subjects_id_list=None, random_state=None, extract_kwargs=None)
Randomly pick N connectivity matrices from a subjects connectivity dictionary.
- Parameters
subjects_connectivity_dictionary (dict) – The subjects dictionary containing connectivity matrices.
groupe (str) – The group from which to pick the matrices.
n_matrices (int) – The number of connectivity matrices to draw at random.
subjects_id_list (list, optional) – The list of subject identifiers from which to draw the matrices. If None, the matrices are drawn from the entire group. Default is None.
random_state (int, RandomState instance or None, optional) – The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
extract_kwargs (dict, optional) – A dictionary of arguments passed to the extract_sub_connectivity_matrices function. Default is None.
- Returns
output 1 (dict) – The connectivity matrices dictionary, with subjects chosen randomly.
output 2 (list) – The list of randomly chosen subject identifiers.
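The random draw, with the random_state convention described above, can be sketched with numpy. `random_draw` is hypothetical, it ignores extract_kwargs, and it assumes the dictionary is laid out as {group: {subject_id: data}} and that the returned dictionary keeps that layout.

```python
import numpy as np


def random_draw(connectivity, groupe, n_matrices, subjects_id_list=None, random_state=None):
    """Hypothetical sketch: draw n_matrices distinct subjects of one group."""
    pool = list(subjects_id_list) if subjects_id_list is not None else list(connectivity[groupe])
    if random_state is None:
        rng = np.random  # module-level functions use the global RandomState
    elif isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    # Sampling without replacement: each subject is chosen at most once
    indices = rng.choice(len(pool), size=n_matrices, replace=False)
    chosen_ids = [pool[i] for i in indices]
    return {groupe: {s: connectivity[groupe][s] for s in chosen_ids}}, chosen_ids
```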
-
conpagnon.data_handling.dictionary_operations.rebuild_subject_connectivity_matrices(subjects_connectivity_dictionary, groupes, kinds, diagonal_were_kept=False)
Rebuild the full subject connectivity matrices from their vectorized form in the subjects connectivity dictionary.
- Parameters
subjects_connectivity_dictionary (dict) – The subjects connectivity dictionary.
groupes (list) – The list of groups whose subject matrices are rebuilt.
kinds (list) – The list of kinds to rebuild.
diagonal_were_kept (bool, optional) – If True, the diagonal of the reconstructed matrix is the one stored in the kind diagonal field of the dictionary, and the diagonal of the mask is the one stored in the mask diagonal field. If False, the reconstructed matrix has a zero diagonal and the mask has a True diagonal.
- Returns
output – The reconstructed subjects connectivity matrices. All the matrices now have shape (number_of_regions, number_of_regions).
- Return type
dict
Notes
If, in the input dictionary, the matrices and the corresponding masks were vectorized with the diagonal kept, the argument diagonal_were_kept must be set to False; otherwise a dimension error is raised.
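Rebuilding a symmetric matrix from a vector can be sketched with numpy. The vectorization convention is an assumption here: the vector is taken to hold the strict lower triangle in `numpy.tril_indices` order, with the diagonal either supplied separately or left at zero. `vector_to_symmetric` is hypothetical.

```python
import numpy as np


def vector_to_symmetric(vector, n_regions, diagonal=None):
    """Sketch: rebuild a symmetric matrix from its strict lower triangle."""
    matrix = np.zeros((n_regions, n_regions))
    rows, cols = np.tril_indices(n_regions, k=-1)
    # Fill the lower triangle, then mirror it into the upper triangle
    matrix[rows, cols] = vector
    matrix[cols, rows] = vector
    if diagonal is not None:
        np.fill_diagonal(matrix, diagonal)
    return matrix
```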
-
conpagnon.data_handling.dictionary_operations.stack_subjects_connectivity_matrices(subjects_connectivity_dictionary, groupes, kinds)
Re-arrange the subjects connectivity dictionary into a stacked version per group and kind.
- Parameters
subjects_connectivity_dictionary – The subjects connectivity dictionary.
groupes – The list of groups to stack.
kinds – The list of kinds to stack.
- Returns
Module contents
Created on Mon Sep 18 16:38:48 2017
@author: db242421
The data_handling module provides convenient functions for storing the various pieces of information related to the useful files (fMRI, individual atlases…) in an easy-to-use dictionary structure.