Title: | Tools for Assessing Clustering |
---|---|
Description: | A set of tools for evaluating clustering robustness using the proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis. |
Authors: | Andi Munteanu [aut, cre], Arash Shahsavari [aut], Rafael Kollyfas [ctb], Miguel Larraz Lopez de Novales [aut], Liviu Ciortuz [ctb], Irina Mohorianu [aut] |
Maintainer: | Andi Munteanu |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2025-02-13 12:40:04 UTC |
Source: | https://github.com/core-bioinformatics/clustassess |
Adds new metadata into the ClustAssess ShinyApp without having to update the object and re-create the app.
add_metadata(app_folder, metadata, qualpalr_colorspace = "pretty")
app_folder |
The folder containing the ClustAssess ShinyApp |
metadata |
The new metadata to be added. This parameter should be a data frame whose rows follow the same ordering as the metadata already stored in the ClustAssess app. |
qualpalr_colorspace |
The colorspace to be used for the metadata |
NULL - the metadata object is updated in the app folder
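The metadata passed to add_metadata must follow the row ordering of the metadata already stored in the app. A minimal sketch for realigning it by cell name before the call is shown below; it assumes that both tables use cell names as row names, and the helper align_metadata is hypothetical, not part of ClustAssess.

```r
# Hypothetical helper (not part of ClustAssess): reorder a new metadata data
# frame so that its rows follow the cell order of the existing app metadata.
align_metadata <- function(new_metadata, existing_cell_names) {
  missing_cells <- setdiff(existing_cell_names, rownames(new_metadata))
  if (length(missing_cells) > 0) {
    stop("The new metadata lacks ", length(missing_cells), " cell(s) present in the app")
  }
  # match() returns, for each existing cell, its row position in new_metadata
  new_metadata[match(existing_cell_names, rownames(new_metadata)), , drop = FALSE]
}

# Toy data: the new metadata arrives in a different order than the app's cells
existing_cells <- c("cell_1", "cell_2", "cell_3")
new_md <- data.frame(
  condition = c("b", "a", "c"),
  row.names = c("cell_2", "cell_1", "cell_3"),
  stringsAsFactors = FALSE
)
aligned_md <- align_metadata(new_md, existing_cells)
print(aligned_md)
```

The aligned data frame could then be passed as the metadata argument of add_metadata.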
Evaluates the stability of different graph clustering methods in the clustering pipeline. The method will iterate through different values of the resolution parameter and compare, using the EC Consistency score, the partitions obtained at different seeds.
assess_clustering_stability( graph_adjacency_matrix, resolution, n_repetitions = 100, seed_sequence = NULL, ecs_thresh = 1, clustering_algorithm = 1:3, clustering_arguments = list(), verbose = TRUE )
graph_adjacency_matrix |
A square adjacency matrix based on which an igraph object will be built. The matrix should have rownames and colnames that correspond to the names of the cells. |
resolution |
A sequence of resolution values. The resolution parameter controls the coarseness of the clustering. The higher the resolution, the more clusters will be obtained. The resolution parameter is used in the community detection algorithms. |
n_repetitions |
The number of repetitions of applying the pipeline with
different seeds; ignored if seed_sequence is provided by the user. Defaults to 100. |
seed_sequence |
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100. |
ecs_thresh |
The ECS threshold used for merging similar clusterings. |
clustering_algorithm |
An index or a vector of indices indicating which community detection
algorithm(s) will be used: Louvain (1), Louvain refined (2), SLM (3) or Leiden (4).
More details can be found in the Seurat documentation. |
clustering_arguments |
A list of additional arguments that will be passed to the
clustering method. More details can be found in the Seurat documentation. |
verbose |
Boolean value used for displaying the progress bar. |
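The default seed sequence described for seed_sequence can be sketched as follows, assuming it holds n_repetitions seeds starting from 1 with a step of 100. The per-seed loop uses stats::kmeans purely as a stand-in for the community detection algorithm, to illustrate how one partition is obtained per seed; default_seed_sequence is a hypothetical name.

```r
# Sketch (assumption): n_repetitions seeds starting from 1 with a step of 100
default_seed_sequence <- function(n_repetitions) {
  seq(from = 1, by = 100, length.out = n_repetitions)
}

seeds <- default_seed_sequence(5)
print(seeds)  # the seeds 1, 101, 201, 301 and 401

# One clustering run per seed; kmeans is only a stand-in for the
# graph-based community detection used by ClustAssess
set.seed(2024)
toy_embedding <- matrix(runif(100 * 2), ncol = 2)
partitions <- lapply(seeds, function(s) {
  set.seed(s)
  stats::kmeans(toy_embedding, centers = 3)$cluster
})
```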
A list having two fields:
all - a list that contains, for each clustering method and each resolution value, the EC consistency between the partitions obtained by changing the seed
filtered - similar to all, but for each configuration, we determine the number of clusters that appears most often and use only the partitions with this size
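The filtering step described for the filtered field can be sketched on plain integer label vectors. This is an illustration under assumptions (partitions stored as label vectors; ties between equally frequent cluster counts resolved by taking the smaller count), not the ClustAssess implementation; filter_by_modal_k is a hypothetical helper.

```r
# Sketch: keep only the partitions whose number of clusters is the most
# frequent one across the repetitions
filter_by_modal_k <- function(partitions) {
  n_clusters <- vapply(partitions, function(p) length(unique(p)), integer(1))
  k_counts <- table(n_clusters)
  # names(k_counts) are the observed cluster counts, sorted increasingly;
  # which.max picks the first (smallest) count in case of ties
  modal_k <- as.integer(names(k_counts)[which.max(k_counts)])
  partitions[n_clusters == modal_k]
}

# Toy example: four runs produced 3 clusters, one run produced 2
toy_partitions <- list(
  c(1, 1, 2, 2, 3, 3),
  c(2, 2, 1, 1, 3, 3),
  c(1, 1, 1, 2, 2, 2),
  c(3, 3, 2, 2, 1, 1),
  c(1, 2, 2, 2, 3, 3)
)
kept <- filter_by_modal_k(toy_partitions)
length(kept)  # 4
```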
set.seed(2024)
# create an artificial PCA embedding
pca_embedding <- matrix(runif(100 * 30), nrow = 100)
rownames(pca_embedding) <- paste0("cell_", seq_len(nrow(pca_embedding)))
colnames(pca_embedding) <- paste0("PC_", 1:30)

adj_matrix <- getNNmatrix(
  RANN::nn2(pca_embedding, k = 10)$nn.idx,
  10,
  0,
  -1
)$nn
rownames(adj_matrix) <- paste0("cell_", seq_len(nrow(adj_matrix)))
colnames(adj_matrix) <- paste0("cell_", seq_len(ncol(adj_matrix)))
# alternatively, the adj_matrix can be calculated
# using the `Seurat::FindNeighbors` function.

clust_diff_obj <- assess_clustering_stability(
  graph_adjacency_matrix = adj_matrix,
  resolution = c(0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1:2,
  verbose = TRUE
)
plot_clustering_overall_stability(clust_diff_obj)
Evaluate the stability of clusterings obtained based on incremental subsets of a given feature set.
assess_feature_stability(
  data_matrix,
  feature_set,
  steps,
  feature_type,
  resolution,
  n_repetitions = 100,
  seed_sequence = NULL,
  graph_reduction_type = "PCA",
  ecs_thresh = 1,
  matrix_processing = function(dt_mtx, actual_npcs = 30, ...) {
    actual_npcs <- min(actual_npcs, ncol(dt_mtx) %/% 2)
    RhpcBLASctl::blas_set_num_threads(foreach::getDoParWorkers())
    embedding <- stats::prcomp(x = dt_mtx, rank. = actual_npcs)$x
    RhpcBLASctl::blas_set_num_threads(1)
    rownames(embedding) <- rownames(dt_mtx)
    colnames(embedding) <- paste0("PC_", seq_len(ncol(embedding)))
    return(embedding)
  },
  umap_arguments = list(),
  prune_value = -1,
  clustering_algorithm = 1,
  clustering_arguments = list(),
  verbose = FALSE
)
data_matrix |
A data matrix having the features on the rows and the observations on the columns. |
feature_set |
A set of feature names that can be found on the rownames of the data matrix. |
steps |
Vector containing the sizes of the subsets; negative values will be interpreted as using all features. |
feature_type |
A name associated with the feature_set. |
resolution |
A vector containing the resolution values used for clustering. |
n_repetitions |
The number of repetitions of applying the pipeline with
different seeds; ignored if seed_sequence is provided by the user. Defaults
to 100. |
seed_sequence |
A custom seed sequence; if the value is NULL, the
sequence will be built starting from 1 with a step of 100. Defaults to
NULL. |
graph_reduction_type |
The graph reduction type, denoting if the graph
should be built on either the PCA or the UMAP embedding. Defaults to "PCA". |
ecs_thresh |
The ECS threshold used for merging similar clusterings. We
recommend using a value of 1. Defaults to 1. |
matrix_processing |
A function that will be used to process the data
matrix by using a dimensionality reduction technique. The function should take
the data matrix as its first argument and return an embedding describing the
reduced space. By default, the function uses the exact PCA method (stats::prcomp)
with 30 principal components, capped at half the number of columns of the
processed matrix. |
umap_arguments |
A list containing the arguments that will be passed
to the UMAP function. Refer to the UMAP function documentation for the available options. |
prune_value |
Argument indicating whether to prune the SNN graph. If the value is 0, the graph won't be pruned. If the value is between 0 and 1, the edges with weight under the pruning value will be removed. If the value is -1, the highest pruning value will be calculated automatically and used. |
clustering_algorithm |
An index indicating which community detection
algorithm will be used: Louvain (1), Louvain refined (2), SLM (3) or
Leiden (4). More details can be found in the Seurat documentation. |
clustering_arguments |
A list containing the arguments that will be
passed to the community detection algorithm, such as the number of iterations
and the number of starts. Refer to the Seurat documentation for details. |
verbose |
A boolean indicating if the intermediate progress will be printed or not. |
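Any function that turns the data matrix into an embedding can be supplied as matrix_processing. The sketch below follows the structure of the default shown in the usage (an exact PCA through stats::prcomp, with observations on the rows of dt_mtx), but omits the BLAS thread handling, additionally scales the features and requests 10 components; the function name and these choices are illustrative assumptions.

```r
# Hypothetical replacement for the default matrix_processing function.
# Like the default, it expects observations on rows and returns an embedding
# with one row per observation and columns named PC_1, PC_2, ...
scaled_pca_processing <- function(dt_mtx, actual_npcs = 10, ...) {
  # never request more components than half the number of columns
  actual_npcs <- min(actual_npcs, ncol(dt_mtx) %/% 2)
  embedding <- stats::prcomp(
    x = dt_mtx, center = TRUE, scale. = TRUE, rank. = actual_npcs
  )$x
  rownames(embedding) <- rownames(dt_mtx)
  colnames(embedding) <- paste0("PC_", seq_len(ncol(embedding)))
  embedding
}

set.seed(2024)
toy_matrix <- matrix(runif(50 * 20), nrow = 50)  # 50 observations x 20 features
rownames(toy_matrix) <- paste0("cell_", 1:50)
emb <- scaled_pca_processing(toy_matrix)
dim(emb)  # 50 observations x 10 components
```

Such a function could then be supplied as matrix_processing = scaled_pca_processing.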
A list with one field per step value. Each field contains a list with three fields:
ecc - the EC-Consistency of the partitions obtained on all repetitions
embedding - one UMAP embedding generated on the feature subset
most_frequent_partition - the most common partition obtained across repetitions
The algorithm assumes that the feature_set is already sorted when subsetting it based on the steps values. For example, to analyze a set of highly variable features, the user should provide them sorted by their variability.
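The subsetting of a sorted feature set by the steps values can be sketched as below. It assumes that each subset consists of the first step features of feature_set and that negative steps select all features, as stated for the steps argument; build_feature_subsets is a hypothetical helper, and ranking by variance is just one possible ordering.

```r
# Sketch (not the ClustAssess code): feature subsets implied by `steps`,
# assuming feature_set is already sorted by priority
build_feature_subsets <- function(feature_set, steps) {
  subsets <- lapply(steps, function(step) {
    # negative values are interpreted as using all features
    if (step < 0) feature_set else utils::head(feature_set, step)
  })
  names(subsets) <- as.character(steps)
  subsets
}

# Toy genes whose standard deviation grows with their index
set.seed(2024)
expr <- matrix(rnorm(10 * 30, sd = rep(1:10, each = 30)), nrow = 10, byrow = TRUE)
rownames(expr) <- paste0("gene_", 1:10)
# rank the genes by decreasing variance
ranked_features <- names(sort(apply(expr, 1, stats::var), decreasing = TRUE))
subsets <- build_feature_subsets(ranked_features, steps = c(3, 5, -1))
lengths(subsets)
```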
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(100 * 10), runif(100 * 10, min = 3, max = 4)),
  nrow = 200, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:200)
colnames(expr_matrix) <- paste("feature", 1:10)

feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = 5,
  feature_type = "feature_name",
  resolution = c(0.1, 0.5, 1),
  n_repetitions = 10,
  umap_arguments = list(
    # the following parameters are used by the umap function
    # and are not mandatory
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  clustering_algorithm = 1
)
plot_feature_overall_stability_boxplot(feature_stability_result)
Evaluates clustering stability when changing the values of different parameters involved in the graph building step, namely the base embedding, the graph type and the number of neighbours.
assess_nn_stability( embedding, n_neigh_sequence, n_repetitions = 100, seed_sequence = NULL, graph_reduction_type = "PCA", ecs_thresh = 1, graph_type = 2, prune_value = -1, clustering_algorithm = 1, clustering_arguments = list(), umap_arguments = list() )
embedding |
A matrix associated with a PCA embedding. Embeddings from other dimensionality reduction techniques (such as LSI) can be used. |
n_neigh_sequence |
A sequence of the number of nearest neighbours. |
n_repetitions |
The number of repetitions of applying the pipeline with different seeds; ignored if seed_sequence is provided by the user. |
seed_sequence |
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100. |
graph_reduction_type |
The graph reduction type, denoting if the graph should be built on either the PCA or the UMAP embedding. |
ecs_thresh |
The ECS threshold used for merging similar clusterings. |
graph_type |
Argument indicating whether the graph should be unweighted (0), weighted (1) or both (2). |
prune_value |
Argument indicating whether to prune the SNN graph. If the value is 0, the graph won't be pruned. If the value is between 0 and 1, the edges with weight under the pruning value will be removed. If the value is -1, the highest pruning value will be calculated automatically and used. |
clustering_algorithm |
An index indicating which community detection algorithm will
be used: Louvain (1), Louvain refined (2), SLM (3) or Leiden (4). More
details can be found in the Seurat documentation. |
clustering_arguments |
A list of arguments that will be passed to the
clustering algorithm. See the Seurat documentation for details. |
umap_arguments |
Additional arguments passed to the UMAP function. |
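The graph built in this step is a shared nearest neighbour (SNN) graph. The sketch below builds one from an embedding by weighting each pair of cells with the Jaccard overlap of their k-nearest-neighbour sets, a common SNN construction in Seurat-like pipelines, and then applies a fixed prune_value between 0 and 1 as described above. It is an assumption that ClustAssess weights edges this way; the automatic pruning triggered by -1 is not reproduced, and build_pruned_snn is a hypothetical name. The dense distance matrix limits it to small toy inputs.

```r
# Hypothetical helper: SNN graph with Jaccard weights, followed by pruning
# of weak edges. Not the ClustAssess implementation.
build_pruned_snn <- function(embedding, k, prune_value) {
  d <- as.matrix(stats::dist(embedding))
  n <- nrow(d)
  # k nearest neighbours of each cell; the cell itself is included (distance 0)
  neighbours <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])
  knn_mat <- matrix(0, n, n)
  knn_mat[cbind(rep(seq_len(n), each = k), unlist(neighbours))] <- 1
  # shared[i, j] = number of neighbours that cells i and j have in common
  shared <- knn_mat %*% t(knn_mat)
  # Jaccard index: |intersection| / |union| of the two neighbour sets
  snn <- shared / (2 * k - shared)
  # edges below the pruning value are removed (a value of 0 keeps every edge)
  snn[snn < prune_value] <- 0
  dimnames(snn) <- list(rownames(embedding), rownames(embedding))
  snn
}

set.seed(2024)
toy_emb <- matrix(runif(30 * 5), nrow = 30)
snn <- build_pruned_snn(toy_emb, k = 5, prune_value = 1 / 15)
```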
A list having three fields:
n_neigh_k_corresp - list containing the number of clusters obtained by running the pipeline multiple times with different seeds, numbers of neighbours and graph types (weighted vs unweighted)
n_neigh_ec_consistency - list containing the EC consistency of the partitions obtained at multiple runs when changing the number of neighbours or the graph type
n_different_partitions - the number of different partitions obtained for each number of neighbours
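Counting the number of different partitions requires treating partitions that differ only by their cluster labels as identical. A minimal sketch, assuming partitions are integer label vectors and that only exactly identical partitions are merged (as with ecs_thresh = 1), relabels each partition by order of first appearance before counting; both helpers are hypothetical.

```r
# Relabel clusters by order of first appearance, so that c(2, 2, 1) and
# c(1, 1, 2) map to the same canonical vector
canonical_labels <- function(partition) {
  match(partition, unique(partition))
}

# Number of distinct partitions, ignoring label permutations
count_distinct_partitions <- function(partitions) {
  keys <- vapply(partitions, function(p) paste(canonical_labels(p), collapse = ","),
                 character(1))
  length(unique(keys))
}

runs <- list(c(1, 1, 2, 2), c(2, 2, 1, 1), c(1, 2, 2, 2))
count_distinct_partitions(runs)  # 2
```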
set.seed(2024)
# create an artificial PCA embedding
pca_emb <- matrix(runif(100 * 30), nrow = 100, byrow = TRUE)
rownames(pca_emb) <- as.character(1:100)
colnames(pca_emb) <- paste0("PC_", 1:30)

nn_stability_obj <- assess_nn_stability(
  embedding = pca_emb,
  n_neigh_sequence = c(10, 15, 20),
  n_repetitions = 10,
  graph_reduction_type = "PCA",
  clustering_algorithm = 1
)
plot_n_neigh_ecs(nn_stability_obj)
Evaluates the stability of the entire clustering pipeline in a single call, by assessing in turn the dimensionality reduction step (feature sets and their subset sizes), the graph construction step (number of nearest neighbours) and the clustering step (community detection algorithm and resolution values).
automatic_stability_assessment(
  expression_matrix,
  n_repetitions,
  n_neigh_sequence,
  resolution_sequence,
  features_sets,
  steps,
  seed_sequence = NULL,
  graph_reduction_embedding = "PCA",
  include_umap_nn_assessment = FALSE,
  n_top_configs = 3,
  ranking_criterion = "iqr",
  overall_summary = "median",
  ecs_threshold = 1,
  matrix_processing = function(dt_mtx, actual_npcs = 30, ...) {
    actual_npcs <- min(actual_npcs, ncol(dt_mtx) %/% 2)
    RhpcBLASctl::blas_set_num_threads(foreach::getDoParWorkers())
    embedding <- stats::prcomp(x = dt_mtx, rank. = actual_npcs)$x
    RhpcBLASctl::blas_set_num_threads(1)
    rownames(embedding) <- rownames(dt_mtx)
    colnames(embedding) <- paste0("PC_", seq_len(ncol(embedding)))
    return(embedding)
  },
  umap_arguments = list(),
  prune_value = -1,
  algorithm_dim_reduction = 1,
  algorithm_graph_construct = 1,
  algorithms_clustering_assessment = 1:3,
  clustering_arguments = list(),
  verbose = TRUE,
  temp_file = NULL,
  save_temp = TRUE
)
expression_matrix |
An expression matrix having the features on the rows and the cells on the columns. |
n_repetitions |
The number of repetitions of applying the pipeline with
different seeds; ignored if seed_sequence is provided by the user. Defaults to 100. |
n_neigh_sequence |
A sequence of the number of nearest neighbours. |
resolution_sequence |
A sequence of resolution values. The resolution parameter controls the coarseness of the clustering. The higher the resolution, the more clusters will be obtained. The resolution parameter is used in the community detection algorithms. |
features_sets |
A list of the feature sets. A feature set is a list of genes from the expression matrix that will be used in the dimensionality reduction. |
steps |
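A list with the same names as features_sets, where each element contains the sizes of the feature subsets tested for the corresponding feature set. |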
seed_sequence |
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100. |
graph_reduction_embedding |
The type of dimensionality reduction used for
the graph construction. The options are "PCA" and "UMAP". Defaults to "PCA". |
include_umap_nn_assessment |
A boolean value indicating if the UMAP embeddings
will be used for the nearest neighbours assessment. Defaults to FALSE. |
n_top_configs |
The number of top configurations that will be used for the
downstream analysis in the dimensionality reduction step. Defaults to 3. |
ranking_criterion |
The criterion used for ranking the configurations from
the dimensionality reduction step. The options are "iqr", "median", "max", "top_qt",
"top_qt_max", "iqr_median", "iqr_median_coeff" and "mean". Defaults to "iqr". |
overall_summary |
A function used to summarize the stability of the configurations
from the dimensionality reduction step across the different resolution values.
The options are "median", "max", "top_qt", "top_qt_max", "iqr", "iqr_median",
"iqr_median_coeff" and "mean". Defaults to "median". |
ecs_threshold |
The ECS threshold used for merging similar clusterings. |
matrix_processing |
A function that will be used to process the data matrix
by using a dimensionality reduction technique. The function should take
the data matrix as its first argument and return an embedding describing the
reduced space. By default, the function uses the exact PCA method (stats::prcomp)
with 30 principal components, capped at half the number of columns of the
processed matrix. |
umap_arguments |
A list containing the arguments that will be passed to the
UMAP function. Refer to the UMAP function documentation for the available options. |
prune_value |
Argument indicating whether to prune the SNN graph. If the value is 0, the graph won't be pruned. If the value is between 0 and 1, the edges with weight under the pruning value will be removed. If the value is -1, the highest pruning value will be calculated automatically and used. |
algorithm_dim_reduction |
An index indicating the community detection algorithm that will be used in the dimensionality reduction step. |
algorithm_graph_construct |
An index indicating the community detection algorithm that will be used in the graph construction step. |
algorithms_clustering_assessment |
An index or a vector of indices indicating which community
detection algorithm(s) will be used for the clustering step: Louvain (1),
Louvain refined (2), SLM (3) or Leiden (4). More details can be found in
the Seurat documentation. |
clustering_arguments |
A list containing the arguments that will be passed to the
community detection algorithm, such as the number of iterations and the number of starts.
Refer to the Seurat documentation for details. |
verbose |
Boolean value used for displaying the progress of the assessment. |
temp_file |
The path to the file where the object will be saved. |
save_temp |
A boolean value indicating if the object will be saved to a file. |
A list containing the results of the assessment steps. As illustrated in the example below, the feature_stability field stores the dimensionality reduction assessment, while the entries indexed by feature set name and step size (e.g. autom_object$set1$"5") store the nn_stability (graph construction) and clustering_stability (clustering) assessments.
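How exactly ranking_criterion and overall_summary are combined is not detailed here. The sketch below only illustrates the general idea of summarizing each configuration's stability across resolution values (with the median, the default overall_summary) and keeping the n_top_configs best ones; the input layout, the configuration names and select_top_configs are hypothetical.

```r
# Sketch (assumption, not ClustAssess code): summarise each configuration's
# stability across resolution values, then keep the highest-scoring ones
select_top_configs <- function(ecc_by_resolution, n_top_configs = 3,
                               overall_summary = stats::median) {
  # ecc_by_resolution: named list, one numeric vector per configuration,
  # holding a stability score for each resolution value
  summary_scores <- vapply(ecc_by_resolution, overall_summary, numeric(1))
  ranked <- sort(summary_scores, decreasing = TRUE)
  names(ranked)[seq_len(min(n_top_configs, length(ranked)))]
}

toy_scores <- list(
  "set1_500" = c(0.95, 0.90, 0.85),
  "set1_1000" = c(0.99, 0.97, 0.96),
  "set1_2000" = c(0.80, 0.75, 0.90),
  "set1_3000" = c(0.92, 0.94, 0.93)
)
select_top_configs(toy_scores, n_top_configs = 2)  # "set1_1000" "set1_3000"
```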
## Not run:
set.seed(2024)
# create an already-transposed artificial expression matrix
expr_matrix <- matrix(
  c(runif(20 * 10), runif(30 * 10, min = 3, max = 4)),
  nrow = 10, byrow = FALSE
)
colnames(expr_matrix) <- as.character(seq_len(ncol(expr_matrix)))
rownames(expr_matrix) <- paste("feature", seq_len(nrow(expr_matrix)))

autom_object <- automatic_stability_assessment(
  expression_matrix = expr_matrix,
  n_repetitions = 3,
  n_neigh_sequence = c(5),
  resolution_sequence = c(0.1, 0.5),
  features_sets = list(
    "set1" = rownames(expr_matrix)
  ),
  steps = list(
    "set1" = c(5, 7)
  ),
  umap_arguments = list(
    # the following parameters have been modified
    # from the default values to ensure that
    # the function will run under 5 seconds
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  n_top_configs = 1,
  algorithms_clustering_assessment = 1,
  save_temp = FALSE,
  verbose = FALSE
)

# the object can be further used to plot the assessment results
plot_feature_overall_stability_boxplot(autom_object$feature_stability)
plot_n_neigh_ecs(autom_object$set1$"5"$nn_stability)
plot_k_n_partitions(autom_object$set1$"5"$clustering_stability)
## End(Not run)
Performs the Wilcoxon rank sum test to identify differentially expressed genes between two groups of cells.
calculate_markers( expression_matrix, cells1, cells2, logfc_threshold = 0, min_pct_threshold = 0.1, avg_expr_threshold_group1 = 0, min_diff_pct_threshold = -Inf, rank_matrix = NULL, feature_names = NULL, used_slot = "data", norm_method = "SCT", pseudocount_use = 1, base = 2, adjust_pvals = TRUE, check_cells_set_diff = TRUE )
expression_matrix |
A matrix of gene expression values having genes in rows and cells in columns. |
cells1 |
A vector of cell indices for the first group of cells. |
cells2 |
A vector of cell indices for the second group of cells. |
logfc_threshold |
The minimum absolute log fold change to consider a
gene as differentially expressed. Defaults to 0. |
min_pct_threshold |
The minimum fraction of cells expressing a gene
from each cell population to consider the gene as differentially expressed.
Increasing the value will speed up the function. Defaults to 0.1. |
avg_expr_threshold_group1 |
The minimum average expression that a gene
should have in the first group of cells to be considered as differentially
expressed. Defaults to 0. |
min_diff_pct_threshold |
The minimum difference in the fraction of cells
expressing a gene between the two cell populations to consider the gene as
differentially expressed. Defaults to -Inf. |
rank_matrix |
A matrix where the cells are ranked based on their
expression levels with respect to each gene. Defaults to NULL. |
feature_names |
A vector of gene names. Defaults to NULL. |
used_slot |
Parameter that provides additional information about the
expression matrix, whether it was scaled or not. The value of this parameter
impacts the calculation of the fold change. If |
norm_method |
The normalization method used to normalize the expression
matrix. The value of this parameter impacts the calculation of the average
expression of the genes when |
pseudocount_use |
The pseudocount to add to the expression values when
calculating the average expression of the genes, to avoid a zero value in
the denominator. Defaults to 1. |
base |
The base of the logarithm. Defaults to 2. |
adjust_pvals |
A logical value indicating whether to adjust the p-values
for multiple testing using the Bonferroni method. Defaults to TRUE. |
check_cells_set_diff |
A logical value indicating whether to check if
the two cell groups are disjoint. Defaults to TRUE. |
A data frame containing the following columns:
gene - The gene name.
avg_log2FC - The average log fold change between the two cell groups.
p_val - The p-value of the Wilcoxon rank sum test.
p_val_adj - The adjusted p-value of the Wilcoxon rank sum test.
pct.1 - The fraction of cells expressing the gene in the first cell group.
pct.2 - The fraction of cells expressing the gene in the second cell group.
avg_expr_group1 - The average expression of the gene in the first cell group.
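To make the reported columns concrete, the sketch below computes them for every gene with base R: stats::wilcox.test for the p-values, Bonferroni adjustment via stats::p.adjust, detection rates as the fraction of non-zero values, and a log fold change of the group means offset by a pseudocount. This is a simplified assumption: the fold change computed by calculate_markers depends on used_slot and norm_method, and no threshold-based filtering is applied here. simple_markers is a hypothetical name.

```r
# Hypothetical helper (not the ClustAssess implementation): per-gene
# Wilcoxon rank sum test between two groups of cells (genes on rows)
simple_markers <- function(expr, cells1, cells2, pseudocount = 1, base = 2) {
  g1 <- expr[, cells1, drop = FALSE]
  g2 <- expr[, cells2, drop = FALSE]
  p_val <- vapply(seq_len(nrow(expr)), function(i) {
    stats::wilcox.test(g1[i, ], g2[i, ])$p.value
  }, numeric(1))
  data.frame(
    gene = rownames(expr),
    # simplified fold change of the group means, offset by the pseudocount
    avg_log2FC = log(rowMeans(g1) + pseudocount, base) -
      log(rowMeans(g2) + pseudocount, base),
    p_val = p_val,
    # Bonferroni correction across all tested genes
    p_val_adj = stats::p.adjust(p_val, method = "bonferroni"),
    # fraction of cells with a non-zero value in each group
    pct.1 = rowMeans(g1 > 0),
    pct.2 = rowMeans(g2 > 0),
    avg_expr_group1 = rowMeans(g1),
    row.names = NULL
  )
}

set.seed(2024)
expr_matrix <- matrix(
  c(runif(100 * 50), runif(100 * 50, min = 3, max = 4)),
  ncol = 200, byrow = FALSE
)
rownames(expr_matrix) <- paste("feature", 1:50)
res <- simple_markers(expr_matrix, cells1 = 101:200, cells2 = 1:100)
head(res)
```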
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(100 * 50), runif(100 * 50, min = 3, max = 4)),
  ncol = 200, byrow = FALSE
)
colnames(expr_matrix) <- as.character(1:200)
rownames(expr_matrix) <- paste("feature", 1:50)

calculate_markers(
  expression_matrix = expr_matrix,
  cells1 = 101:200,
  cells2 = 1:100
)
Performs the Wilcoxon rank sum test to identify differentially expressed genes between two groups of cells in the Shiny context. The method can also be used outside the Shiny context, as long as the expression matrix is stored in an h5 file.
calculate_markers_shiny( cells1, cells2, logfc_threshold = 0, min_pct_threshold = 0.1, average_expression_threshold = 0, average_expression_group1_threshold = 0, min_diff_pct_threshold = -Inf, used_slot = "data", norm_method = "SCT", expression_h5_path = "expression.h5", pseudocount_use = 1, base = 2, verbose = TRUE, check_difference = TRUE )
cells1 |
A vector of cell indices for the first group of cells. |
cells2 |
A vector of cell indices for the second group of cells. |
logfc_threshold |
The minimum absolute log fold change to consider a
gene as differentially expressed. Defaults to 0. |
min_pct_threshold |
The minimum fraction of cells expressing a gene
from each cell population to consider the gene as differentially expressed.
Increasing the value will speed up the function. Defaults to 0.1. |
average_expression_threshold |
The minimum average expression that a gene should have in order to be considered as differentially expressed. |
average_expression_group1_threshold |
The minimum average expression
that a gene should have in the first group of cells to be considered as
differentially expressed. Defaults to 0. |
min_diff_pct_threshold |
The minimum difference in the fraction of cells
expressing a gene between the two cell populations to consider the gene as
differentially expressed. Defaults to -Inf. |
used_slot |
Parameter that provides additional information about the
expression matrix, whether it was scaled or not. The value of this parameter
impacts the calculation of the fold change. If |
norm_method |
The normalization method used to normalize the expression
matrix. The value of this parameter impacts the calculation of the average
expression of the genes when |
expression_h5_path |
The path to the h5 file containing the expression
matrix. The h5 file should contain the following fields: |
pseudocount_use |
The pseudocount to add to the expression values when
calculating the average expression of the genes, to avoid a zero value in
the denominator. Defaults to 1. |
base |
The base of the logarithm. Defaults to 2. |
verbose |
Whether to print messages about the progress of the function. Defaults to TRUE. |
check_difference |
Whether to perform the set difference between the two groups of cells. Defaults to TRUE. |
A data frame containing the following columns:
gene - The gene name.
avg_log2FC - The average log fold change between the two cell groups.
p_val - The p-value of the Wilcoxon rank sum test.
p_val_adj - The adjusted p-value of the Wilcoxon rank sum test.
pct.1 - The fraction of cells expressing the gene in the first cell group.
pct.2 - The fraction of cells expressing the gene in the second cell group.
avg_expr_group1 - The average expression of the gene in the first cell group.
avg_expr - The average expression of the gene.
Filters the list of clusters obtained by the automatic ClustAssess pipeline using the ECC and frequency thresholds. The ECC threshold is meant to filter out the partitions that are highly sensitive to changes of the random seed, while the frequency threshold is meant to ensure the statistical significance of the inferred stability.
choose_stable_clusters( clusters_list, ecc_threshold = 0.9, freq_threshold = 30, summary_function = mean )
clusters_list |
List of clusters obtained from the automatic ClustAssess pipeline. |
ecc_threshold |
Minimum ECC value to consider a cluster as stable. Default is 0.9. |
freq_threshold |
Minimum total frequency of the partitions to consider. Default is 30. |
summary_function |
Function to summarize the ECC values. Default
is mean. |
A list of stable clusters that satisfy the ECC and frequency thresholds.
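The filtering performed by choose_stable_clusters can be illustrated with a toy input. The list layout used below (per-cell ecc values and partition frequencies for each configuration) is a hypothetical stand-in for the real clusters_list, and filter_stable_configs is not part of ClustAssess; it only shows how the two thresholds and summary_function interact.

```r
# Hypothetical input layout (an assumption, not the ClustAssess format):
# one entry per configuration, with per-cell ECC values and the
# frequencies of the partitions obtained across runs
filter_stable_configs <- function(clusters_list, ecc_threshold = 0.9,
                                  freq_threshold = 30, summary_function = mean) {
  is_stable <- vapply(clusters_list, function(entry) {
    summary_function(entry$ecc) >= ecc_threshold &&
      sum(entry$freq) >= freq_threshold
  }, logical(1))
  clusters_list[is_stable]
}

toy_clusters <- list(
  k_5 = list(ecc = c(0.98, 0.95, 0.97), freq = c(40, 10)),  # stable, frequent
  k_6 = list(ecc = c(0.70, 0.85, 0.60), freq = c(45, 5)),   # low ECC
  k_7 = list(ecc = c(0.99, 0.96, 0.95), freq = c(12, 8))    # too infrequent
)
names(filter_stable_configs(toy_clusters))  # "k_5"
```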
Calculate consensus clustering and proportion of ambiguously clustered pairs (PAC) with hierarchical clustering.
consensus_cluster( x, k_min = 3, k_max = 100, n_reps = 100, p_sample = 0.8, p_feature = 1, p_minkowski = 2, dist_method = "euclidean", linkage = "complete", lower_lim = 0.1, upper_lim = 0.9, verbose = TRUE )
x |
A samples x features normalized data matrix. |
k_min |
The minimum number of clusters calculated. |
k_max |
The maximum number of clusters calculated. |
n_reps |
The total number of subsamplings and reclusterings of the data; this value needs to be high enough to ensure PAC converges; convergence can be assessed with pac_convergence. |
p_sample |
The proportion of samples included in each subsample. |
p_feature |
The proportion of features included in each subsample. |
p_minkowski |
The power of the Minkowski distance. |
dist_method |
The distance measure for the distance matrix used in hclust; must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". |
linkage |
The linkage method used in hclust; must be one of "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid". |
lower_lim |
The lower limit for determining whether a pair is clustered ambiguously; the lower this value, the higher the PAC. |
upper_lim |
The upper limit for determining whether a pair is clustered ambiguously; the higher this value, the higher the PAC. |
verbose |
A logical value indicating whether to display a progress bar. |
A data.frame with PAC values across iterations, as well as parameter values used when calling the method.
Monti, S., Tamayo, P., Mesirov, J., & Golub, T. (2003). Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine learning, 52(1), 91-118. https://doi.org/10.1023/A:1023949509487
Senbabaoglu, Y., Michailidis, G., & Li, J. Z. (2014). Critical limitations of consensus clustering in class discovery. Scientific reports, 4(1), 1-13. https://doi.org/10.1038/srep06207
pac.res <- consensus_cluster(iris[, 1:4], k_max = 20)
pac_convergence(pac.res, k_plot = c(3, 5, 7, 9))
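The PAC statistic can be illustrated with a self-contained sketch for a single k. It assumes complete-linkage hierarchical clustering on Euclidean distances. It computes PAC as CDF(upper_lim) - CDF(lower_lim) of the consensus indices, following Senbabaoglu et al. (2014). pac_sketch is a hypothetical helper and is not part of ClustAssess. It omits feature subsampling (p_feature) and the sweep over k_min..k_max.

```r
# Minimal sketch of consensus clustering and PAC for one k (not the package code).
pac_sketch <- function(x, k, n_reps = 50, p_sample = 0.8,
                       lower_lim = 0.1, upper_lim = 0.9) {
  x <- as.matrix(x)
  n <- nrow(x)
  together <- matrix(0, n, n) # how often each pair ended up in the same cluster
  sampled <- matrix(0, n, n)  # how often each pair was subsampled together
  for (iter in seq_len(n_reps)) {
    idx <- sample(n, size = floor(p_sample * n))
    labels <- cutree(hclust(dist(x[idx, , drop = FALSE]), method = "complete"), k = k)
    together[idx, idx] <- together[idx, idx] + outer(labels, labels, "==")
    sampled[idx, idx] <- sampled[idx, idx] + 1
  }
  # consensus index of every distinct pair that was subsampled at least once
  keep <- upper.tri(sampled) & sampled > 0
  consensus <- together[keep] / sampled[keep]
  # PAC = CDF(upper_lim) - CDF(lower_lim): fraction of ambiguously clustered pairs
  mean(consensus > lower_lim & consensus <= upper_lim)
}

set.seed(1)
two_groups <- rbind(matrix(rnorm(40), ncol = 2), matrix(rnorm(40, mean = 10), ncol = 2))
pac_sketch(two_groups, k = 2)
```

On two well-separated groups every pair is either always or never co-clustered. The PAC for k = 2 should therefore be 0.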
Use a normalized expression matrix and, potentially, an already generated PCA / UMAP embedding, to create a Monocle object.
create_monocle_default( normalized_expression_matrix, count_matrix = NULL, pca_embedding = NULL, umap_embedding = NULL, metadata_df = NULL )
normalized_expression_matrix |
The normalized expression matrix having genes on rows and cells on columns. |
count_matrix |
The count matrix having genes on rows and cells on columns. If NULL, the normalized_expression_matrix will be used. |
pca_embedding |
The PCA embedding of the expression matrix. If NULL, the
pca will be created using the |
umap_embedding |
The UMAP embedding of the expression matrix. If NULL, the
umap will be created using the |
metadata_df |
The metadata dataframe having the cell names as rownames.
If NULL, a dataframe with a single column named |
A Monocle object of the expression matrix, having the stable number of clusters identified by ClustAssess.
## Not run:
set.seed(2024)
# create an already-transposed artificial expression matrix
expr_matrix <- matrix(
  c(runif(20 * 10), runif(30 * 10, min = 3, max = 4)),
  nrow = 10, byrow = FALSE
)
colnames(expr_matrix) <- as.character(seq_len(ncol(expr_matrix)))
rownames(expr_matrix) <- paste("feature", seq_len(nrow(expr_matrix)))
# create the monocle object
mon_obj <- create_monocle_default(
  normalized_expression_matrix = expr_matrix,
  pca_embedding = NULL,
  umap_embedding = NULL,
  metadata_df = NULL
)
## End(Not run)
Use the object generated using the ClustAssess
automatic_stability_assessment
function to create a Monocle object
which has the stable number of clusters.
create_monocle_from_clustassess( normalized_expression_matrix, count_matrix = NULL, clustassess_object, metadata_df, stable_feature_type, stable_feature_set_size, stable_clustering_method, stable_n_clusters = NULL, use_all_genes = FALSE )
normalized_expression_matrix |
The normalized expression matrix having genes on rows and cells on columns. |
count_matrix |
The count matrix having genes on rows and cells on columns. If NULL, the normalized_expression_matrix will be used. |
clustassess_object |
The output of the |
metadata_df |
The metadata dataframe having the cell names as rownames.
If NULL, a dataframe with a single column named |
stable_feature_type |
The feature type which leads to stable clusters. |
stable_feature_set_size |
The feature size which leads to stable clusters. |
stable_clustering_method |
The clustering method which leads to stable clusters. |
stable_n_clusters |
The number of clusters that are stable. If NULL,
all the clusters will be provided. Defaults to |
use_all_genes |
A boolean value indicating if the expression matrix
should be truncated to the genes used in the stability assessment. Defaults
to |
A Monocle object of the expression matrix, having the stable number of clusters identified by ClustAssess.
## Not run:
set.seed(2024)
# create an already-transposed artificial expression matrix
expr_matrix <- matrix(
  c(runif(20 * 10), runif(30 * 10, min = 3, max = 4)),
  nrow = 10, byrow = FALSE
)
colnames(expr_matrix) <- as.character(seq_len(ncol(expr_matrix)))
rownames(expr_matrix) <- paste("feature", seq_len(nrow(expr_matrix)))
autom_object <- automatic_stability_assessment(
  expression_matrix = expr_matrix,
  n_repetitions = 3,
  n_neigh_sequence = c(5),
  resolution_sequence = c(0.1, 0.5),
  features_sets = list(
    "set1" = rownames(expr_matrix)
  ),
  steps = list(
    "set1" = c(5, 7)
  ),
  umap_arguments = list(
    # the following parameters have been modified
    # from the default values to ensure that the function
    # will run under 5 seconds
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  n_top_configs = 1,
  algorithms_clustering_assessment = 1,
  save_temp = FALSE,
  verbose = FALSE
)
# uncomment to create the monocle object
# mon_obj <- create_monocle_from_clustassess(
#   normalized_expression_matrix = expr_matrix,
#   clustassess_object = autom_object,
#   metadata_df = NULL,
#   stable_feature_type = "set1",
#   stable_feature_set_size = "5",
#   stable_clustering_method = "Louvain"
# )
## End(Not run)
Use the files generated in the ClustAssess app to create a Monocle object which has the stable number of clusters.
create_monocle_from_clustassess_app( app_folder, stable_feature_type, stable_feature_set_size, stable_clustering_method, stable_n_clusters = NULL, use_all_genes = FALSE )
app_folder |
Path pointing to the folder containing a ClustAssess app. |
stable_feature_type |
The feature type which leads to stable clusters. |
stable_feature_set_size |
The feature size which leads to stable clusters. |
stable_clustering_method |
The clustering method which leads to stable clusters. |
stable_n_clusters |
The number of clusters that are stable. If NULL,
all the clusters will be provided. Defaults to |
use_all_genes |
A boolean value indicating if the expression matrix
should be truncated to the genes used in the stability assessment. Defaults
to |
A Monocle object of the expression matrix, having the stable number of clusters identified by ClustAssess.
Use a normalized expression matrix and, potentially, an already generated PCA / UMAP embedding, to create a Seurat object.
create_seurat_object_default( normalized_expression_matrix, count_matrix = NULL, pca_embedding = NULL, umap_embedding = NULL, metadata_df = NULL )
normalized_expression_matrix |
The normalized expression matrix having genes on rows and cells on columns. |
count_matrix |
The count matrix having genes on rows and cells on columns. If NULL, the normalized_expression_matrix will be used. |
pca_embedding |
The PCA embedding of the expression matrix. If NULL, the
pca will be created using the |
umap_embedding |
The UMAP embedding of the expression matrix. If NULL, the
umap will be created using the |
metadata_df |
The metadata dataframe having the cell names as rownames.
If NULL, a dataframe with a single column named |
A Seurat object of the expression matrix, having the stable number of clusters identified by ClustAssess.
Use the files generated in the ClustAssess app to create a Seurat object which has the stable number of clusters.
create_seurat_object_from_clustassess_app( app_folder, stable_feature_type, stable_feature_set_size, stable_clustering_method, stable_n_clusters = NULL, use_all_genes = FALSE )
app_folder |
Path pointing to the folder containing a ClustAssess app. |
stable_feature_type |
The feature type which leads to stable clusters. |
stable_feature_set_size |
The feature size which leads to stable clusters. |
stable_clustering_method |
The clustering method which leads to stable clusters. |
stable_n_clusters |
The number of clusters that are stable. If NULL,
all the clusters will be provided. Defaults to |
use_all_genes |
A boolean value indicating if the expression matrix
should be truncated to the genes used in the stability assessment. Defaults
to |
A Seurat object of the expression matrix, having the stable number of clusters identified by ClustAssess.
Inspect how consistently a set of clusterings agrees with a reference clustering by calculating their element-wise average agreement.
element_agreement( reference_clustering, clustering_list, alpha = 0.9, r = 1, rescale_path_type = "max", ppr_implementation = "prpack", dist_rescaled = FALSE, row_normalize = TRUE )
reference_clustering |
The reference clustering, that each clustering in clustering_list is compared to. It can be either:
|
clustering_list |
The list of clustering results, each of which is either:
|
alpha |
A numeric giving the personalized PageRank damping factor; 1 - alpha is the restart probability for the PPR random walk. |
r |
A numeric hierarchical scaling parameter. |
rescale_path_type |
A string; rescale the hierarchical height by:
|
ppr_implementation |
Choose an implementation for the personalized PageRank calculation:
|
dist_rescaled |
A logical: if TRUE, the linkage distances are linearly rescaled to be in-between 0 and 1. |
row_normalize |
Whether to normalize all rows in clustering_result so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
A vector containing the element-wise average agreement.
Gates, A. J., Wood, I. B., Hetrick, W. P., & Ahn, Y. Y. (2019). Element-centric clustering comparison unifies overlaps and hierarchy. Scientific reports, 9(1), 1-13. https://doi.org/10.1038/s41598-019-44892-y
# perform k-means clustering across 20 random seeds
reference.clustering <- iris$Species
clustering.list <- lapply(1:20, function(x) {
  set.seed(x)
  kmeans(iris[, 1:4], centers = 3)$cluster
})
element_agreement(reference.clustering, clustering.list)
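For flat, disjoint partitions, the personalized PageRank affinities of Gates et al. (2019) have a closed form in which alpha cancels. If element i belongs to cluster A in one clustering and to cluster B in the other, its score is 1 - (|A ∩ B| * |1/|A| - 1/|B|| + (|A| - |A ∩ B|)/|A| + (|B| - |A ∩ B|)/|B|) / 2. The sketch below uses this derivation to compute an element-wise average agreement with a reference. ecs_flat and element_agreement_sketch are hypothetical helpers. They do not model hierarchical clusterings or the row_normalize option, so their values may differ slightly from those of element_agreement.

```r
# Per-element ECS between two flat, disjoint clusterings (closed form; alpha cancels).
ecs_flat <- function(cl1, cl2) {
  vapply(seq_along(cl1), function(i) {
    in_a <- cl1 == cl1[i] # members of element i's cluster in cl1
    in_b <- cl2 == cl2[i] # members of element i's cluster in cl2
    a <- sum(in_a); b <- sum(in_b); ab <- sum(in_a & in_b)
    1 - 0.5 * (ab * abs(1 / a - 1 / b) + (a - ab) / a + (b - ab) / b)
  }, numeric(1))
}

# Average, for each element, of its ECS against the reference over all clusterings.
element_agreement_sketch <- function(reference, clustering_list) {
  scores <- vapply(clustering_list, function(cl) ecs_flat(reference, cl),
                   numeric(length(reference)))
  rowMeans(scores)
}

element_agreement_sketch(c(1, 1, 2, 2), list(c(1, 1, 2, 2), c(1, 1, 1, 1)))
```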
Inspect the consistency of a set of clusterings by calculating their element-wise clustering consistency (also known as element-wise frustration).
element_consistency( clustering_list, alpha = 0.9, r = 1, rescale_path_type = "max", ppr_implementation = "prpack", dist_rescaled = FALSE, row_normalize = TRUE )
clustering_list |
The list of clustering results, each of which is either:
|
alpha |
A numeric giving the personalized PageRank damping factor; 1 - alpha is the restart probability for the PPR random walk. |
r |
A numeric hierarchical scaling parameter. |
rescale_path_type |
A string; rescale the hierarchical height by:
|
ppr_implementation |
Choose an implementation for the personalized PageRank calculation:
|
dist_rescaled |
A logical: if TRUE, the linkage distances are linearly rescaled to be in-between 0 and 1. |
row_normalize |
Whether to normalize all rows in clustering_result so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
A vector containing the element-wise consistency. If calculate_sim_matrix is set to TRUE, the element similarity matrix will be returned as well.
Gates, A. J., Wood, I. B., Hetrick, W. P., & Ahn, Y. Y. (2019). Element-centric clustering comparison unifies overlaps and hierarchy. Scientific reports, 9(1), 1-13. https://doi.org/10.1038/s41598-019-44892-y
# cluster across 20 random seeds
clustering.list <- lapply(1:20, function(x) {
  set.seed(x)
  kmeans(mtcars, centers = 3)$cluster
})
element_consistency(clustering.list)
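Element-wise consistency can be sketched in a similar way, as the per-element ECS averaged over all pairs of clusterings. The sketch relies on the closed form for flat, disjoint partitions (our derivation from Gates et al. (2019)). Hierarchical inputs and row normalisation are not modelled, and the helper names are hypothetical.

```r
# Per-element ECS between two flat, disjoint clusterings (closed form; alpha cancels).
ecs_flat <- function(cl1, cl2) {
  vapply(seq_along(cl1), function(i) {
    in_a <- cl1 == cl1[i]
    in_b <- cl2 == cl2[i]
    a <- sum(in_a); b <- sum(in_b); ab <- sum(in_a & in_b)
    1 - 0.5 * (ab * abs(1 / a - 1 / b) + (a - ab) / a + (b - ab) / b)
  }, numeric(1))
}

# Element-wise consistency: mean per-element ECS over every pair of clusterings.
element_consistency_sketch <- function(clustering_list) {
  pairs <- combn(length(clustering_list), 2) # one column per pair of clusterings
  scores <- apply(pairs, 2, function(p) {
    ecs_flat(clustering_list[[p[1]]], clustering_list[[p[2]]])
  })
  rowMeans(scores)
}

element_consistency_sketch(list(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 1, 1, 1)))
```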
Calculates the average element-centric similarity between two clustering results.
element_sim( clustering1, clustering2, alpha = 0.9, r_cl1 = 1, rescale_path_type_cl1 = "max", ppr_implementation_cl1 = "prpack", dist_rescaled_cl1 = FALSE, row_normalize_cl1 = TRUE, r_cl2 = 1, rescale_path_type_cl2 = "max", ppr_implementation_cl2 = "prpack", dist_rescaled_cl2 = FALSE, row_normalize_cl2 = TRUE )
clustering1 |
The first clustering result, which can be one of:
|
clustering2 |
The second clustering result, which can be one of:
|
alpha |
A numeric giving the personalized PageRank damping factor; 1 - alpha is the restart probability for the PPR random walk. |
r_cl1 |
A numeric hierarchical scaling parameter for the first clustering. |
rescale_path_type_cl1 |
A string; rescale the hierarchical height of the first clustering by:
|
ppr_implementation_cl1 |
Choose an implementation for the personalized PageRank calculation for the first clustering:
|
dist_rescaled_cl1 |
A logical: if TRUE, the linkage distances of the first clustering are linearly rescaled to be in-between 0 and 1. |
row_normalize_cl1 |
Whether to normalize all rows in the first clustering so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
r_cl2 |
A numeric hierarchical scaling parameter for the second clustering. |
rescale_path_type_cl2 |
A string; rescale the hierarchical height of the second clustering by:
|
ppr_implementation_cl2 |
Choose an implementation for the personalized PageRank calculation for the second clustering:
|
dist_rescaled_cl2 |
A logical: if TRUE, the linkage distances of the second clustering are linearly rescaled to be in-between 0 and 1. |
row_normalize_cl2 |
Whether to normalize all rows in the second clustering so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
The average element-wise similarity between the two clusterings.
km.res <- kmeans(mtcars, centers = 3)$cluster
hc.res <- hclust(dist(mtcars))
element_sim(km.res, hc.res)
Calculates the element-wise element-centric similarity between two clustering results.
element_sim_elscore( clustering1, clustering2, alpha = 0.9, r_cl1 = 1, rescale_path_type_cl1 = "max", ppr_implementation_cl1 = "prpack", dist_rescaled_cl1 = FALSE, row_normalize_cl1 = TRUE, r_cl2 = 1, rescale_path_type_cl2 = "max", ppr_implementation_cl2 = "prpack", dist_rescaled_cl2 = FALSE, row_normalize_cl2 = TRUE )
clustering1 |
The first clustering result, which can be one of:
|
clustering2 |
The second clustering result, which can be one of:
|
alpha |
A numeric giving the personalized PageRank damping factor; 1 - alpha is the restart probability for the PPR random walk. |
r_cl1 |
A numeric hierarchical scaling parameter for the first clustering. |
rescale_path_type_cl1 |
A string; rescale the hierarchical height of the first clustering by:
|
ppr_implementation_cl1 |
Choose an implementation for the personalized PageRank calculation for the first clustering:
|
dist_rescaled_cl1 |
A logical: if TRUE, the linkage distances of the first clustering are linearly rescaled to be in-between 0 and 1. |
row_normalize_cl1 |
Whether to normalize all rows in the first clustering so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
r_cl2 |
A numeric hierarchical scaling parameter for the second clustering. |
rescale_path_type_cl2 |
A string; rescale the hierarchical height of the second clustering by:
|
ppr_implementation_cl2 |
Choose an implementation for the personalized PageRank calculation for the second clustering:
|
dist_rescaled_cl2 |
A logical: if TRUE, the linkage distances of the second clustering are linearly rescaled to be in-between 0 and 1. |
row_normalize_cl2 |
Whether to normalize all rows in the second clustering so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
Vector of element-centric similarity between the two clusterings for each element.
Gates, A. J., Wood, I. B., Hetrick, W. P., & Ahn, Y. Y. (2019). Element-centric clustering comparison unifies overlaps and hierarchy. Scientific reports, 9(1), 1-13. https://doi.org/10.1038/s41598-019-44892-y
km.res <- kmeans(iris[, 1:4], centers = 8)$cluster
hc.res <- hclust(dist(iris[, 1:4]))
element_sim_elscore(km.res, hc.res)
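For two flat, disjoint partitions, a per-element score and its average can be computed with the closed form below. The closed form follows from the personalized PageRank formulation of Gates et al. (2019) when every cluster induces a fully connected subgraph; alpha then cancels out. ecs_flat is a hypothetical helper. It does not model hierarchical clusterings or the row_normalize option, so its results may differ slightly from those of element_sim_elscore and element_sim.

```r
# Per-element ECS for flat, disjoint partitions. For element i in cluster A (first
# clustering) and B (second): 1 - (|A&B| * |1/|A| - 1/|B|| + |A\B|/|A| + |B\A|/|B|) / 2
ecs_flat <- function(cl1, cl2) {
  vapply(seq_along(cl1), function(i) {
    in_a <- cl1 == cl1[i]
    in_b <- cl2 == cl2[i]
    a <- sum(in_a); b <- sum(in_b); ab <- sum(in_a & in_b)
    1 - 0.5 * (ab * abs(1 / a - 1 / b) + (a - ab) / a + (b - ab) / b)
  }, numeric(1))
}

scores <- ecs_flat(c(1, 1, 2), c(1, 2, 3))
scores       # per-element similarity
mean(scores) # average similarity over all elements
```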
Compare a set of clusterings by calculating their pairwise average element-centric clustering similarities.
element_sim_matrix( clustering_list, output_type = "matrix", alpha = 0.9, r = 1, rescale_path_type = "max", ppr_implementation = "prpack", dist_rescaled = FALSE, row_normalize = TRUE )
clustering_list |
The list of clustering results, each of which is either:
|
output_type |
A string specifying whether the output should be a matrix or a data.frame. |
alpha |
A numeric giving the personalized PageRank damping factor; 1 - alpha is the restart probability for the PPR random walk. |
r |
A numeric hierarchical scaling parameter. |
rescale_path_type |
A string; rescale the hierarchical height by:
|
ppr_implementation |
Choose an implementation for the personalized PageRank calculation:
|
dist_rescaled |
A logical: if TRUE, the linkage distances are linearly rescaled to be in-between 0 and 1. |
row_normalize |
Whether to normalize all rows in clustering_result so they sum to one before calculating ECS. It is recommended to set this to TRUE, which will lead to slightly different ECS values compared to clusim. |
A matrix or data.frame containing the pairwise ECS values.
Gates, A. J., Wood, I. B., Hetrick, W. P., & Ahn, Y. Y. (2019). Element-centric clustering comparison unifies overlaps and hierarchy. Scientific reports, 9(1), 1-13. https://doi.org/10.1038/s41598-019-44892-y
# cluster across 20 random seeds
clustering.list <- lapply(1:20, function(x) {
  set.seed(x)
  kmeans(mtcars, centers = 3)$cluster
})
element_sim_matrix(clustering.list, output_type = "matrix")
Given the output of the automatic_stability_assessment function, extract the clusters that are specific to a particular configuration of feature type, feature size, clustering method and, optionally, the number of clusters.
get_clusters_from_clustassess_object( clustassess_object, feature_type = NULL, feature_size = NULL, clustering_method = NULL, nclusters = NULL )
clustassess_object |
Output of the |
feature_type |
Type of feature used for dimensionality reduction. If
|
feature_size |
Size of the feature set used for clustering. If |
clustering_method |
Clustering method used. If |
nclusters |
Number of clusters to extract. If |
A list of clusters that are specific to the given configuration. Each number of clusters contains the list of partitions with that specific k and the ECC value indicating the overall stability of k.
Given an NN adjacency matrix, the function calculates the highest pruning parameter for the SNN graph that preserves the connectivity of the graph.
get_highest_prune_param(nn_matrix, n_neigh)
nn_matrix |
The adjacency matrix of the nearest neighbour graph. |
n_neigh |
The number of nearest neighbours. |
A list with the following fields:
prune_value
: The value of the highest pruning parameter.
adj_matrix
: The adjacency matrix of the SNN graph after pruning.
Given the way the SNN graph is built, the possible values for the pruning parameter are limited and can be determined by the formula i / (2 * n_neigh - i), where i is the number of nearest neighbours shared by two cells, between 0 and n_neigh (this is the Jaccard index of their two neighbourhoods).
set.seed(2024)
# create an artificial pca embedding
pca_embedding <- matrix(
  c(runif(100 * 10), runif(100 * 10, min = 3, max = 4)),
  nrow = 200, byrow = TRUE
)
rownames(pca_embedding) <- as.character(1:200)
colnames(pca_embedding) <- paste("PC", 1:10)
# calculate the nn adjacency matrix
nn_matrix <- getNNmatrix(
  RANN::nn2(pca_embedding, k = 5)$nn.idx,
  5, 0, -1
)$nn
get_highest_prune_param(nn_matrix, 5)$prune_value
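The admissible pruning values follow directly from the Jaccard index of two neighbourhoods of size n_neigh that share i neighbours. Their union has 2 * n_neigh - i elements. The helper names below are hypothetical and used only for illustration.

```r
# Admissible SNN pruning values for n_neigh nearest neighbours: i / (2 * n_neigh - i)
possible_prune_values <- function(n_neigh) {
  i <- 0:n_neigh
  i / (2 * n_neigh - i)
}

# Jaccard index of two neighbour sets of the same size
neighbourhood_jaccard <- function(nn_a, nn_b) {
  length(intersect(nn_a, nn_b)) / length(union(nn_a, nn_b))
}

possible_prune_values(5)
neighbourhood_jaccard(1:5, 3:7) # 3 shared neighbours: 3 / (2 * 5 - 3)
```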
Given an embedding, the function calculates the highest pruning parameter for the SNN graph that preserves the connectivity of the graph.
get_highest_prune_param_embedding(embedding, n_neigh)
embedding |
A matrix associated with a PCA embedding. Embeddings from other dimensionality reduction techniques (such as LSI) can be used. |
n_neigh |
The number of nearest neighbours. |
The value of the highest pruning parameter.
Given the way the SNN graph is built, the possible values for the pruning parameter are limited and can be determined by the formula i / (2 * n_neigh - i), where i is the number of nearest neighbours shared by two cells, between 0 and n_neigh (this is the Jaccard index of their two neighbourhoods).
set.seed(2024)
# create an artificial pca embedding
pca_embedding <- matrix(
  c(runif(100 * 10), runif(100 * 10, min = 3, max = 4)),
  nrow = 200, byrow = TRUE
)
rownames(pca_embedding) <- as.character(1:200)
colnames(pca_embedding) <- paste("PC", 1:10)
get_highest_prune_param_embedding(pca_embedding, 5)
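A brute-force version of the search can be sketched in base R. It computes exact nearest neighbours, builds the Jaccard-weighted SNN graph and scans the admissible pruning values from the highest down. Two assumptions are made for illustration. First, edges whose Jaccard index is at or below the pruning value are removed, as with Seurat's prune.SNN. Second, "preserving connectivity" is taken to mean keeping the number of connected components of the unpruned graph. highest_prune_sketch is hypothetical and is not the package's algorithm.

```r
# Hypothetical brute-force sketch, not the ClustAssess implementation.
highest_prune_sketch <- function(embedding, n_neigh) {
  n <- nrow(embedding)
  d <- as.matrix(dist(embedding))
  # exact k nearest neighbours of each point (the point itself comes first)
  nn <- do.call(rbind, lapply(seq_len(n), function(i) order(d[i, ])[seq_len(n_neigh)]))
  nn_adj <- matrix(0, n, n)
  nn_adj[cbind(rep(seq_len(n), n_neigh), as.vector(nn))] <- 1
  shared <- nn_adj %*% t(nn_adj)         # shared neighbours for every pair
  snn <- shared / (2 * n_neigh - shared) # Jaccard index of the neighbourhoods
  count_components <- function(adj) {
    comp <- integer(n)
    n_comp <- 0L
    for (start in seq_len(n)) {
      if (comp[start] > 0L) next
      n_comp <- n_comp + 1L
      comp[start] <- n_comp
      frontier <- start
      while (length(frontier) > 0) { # breadth-first search
        nxt <- which(colSums(adj[frontier, , drop = FALSE]) > 0 & comp == 0L)
        comp[nxt] <- n_comp
        frontier <- nxt
      }
    }
    n_comp
  }
  baseline <- count_components(snn > 0) # components before pruning
  i <- n_neigh:0
  for (prune in i / (2 * n_neigh - i)) { # highest admissible value first
    # keep only the edges whose Jaccard index is above the pruning value
    if (count_components(snn > prune) == baseline) return(prune)
  }
}

set.seed(2024)
emb <- rbind(matrix(runif(40), ncol = 2), matrix(runif(40, min = 3, max = 4), ncol = 2))
highest_prune_sketch(emb, n_neigh = 5)
```

The loop always returns, because a pruning value of 0 reproduces the unpruned graph.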
One of the steps in the clustering pipeline is building a k-nearest neighbour graph on a reduced-space embedding. This method assesses the relationship between different numbers of nearest neighbours and the connectivity of the graph. In the context of graph clustering, the number of connected components can be used as a lower bound for the number of clusters. The calculations are performed multiple times by changing the seed at each repetition.
get_nn_conn_comps( embedding, n_neigh_sequence, n_repetitions = 100, seed_sequence = NULL, include_umap = FALSE, umap_arguments = list() )
embedding |
A matrix associated with a PCA embedding. Embeddings from other dimensionality reduction techniques (such as LSI) can be used. |
n_neigh_sequence |
A sequence of the number of nearest neighbours. |
n_repetitions |
The number of repetitions of applying the pipeline with different seeds; ignored if seed_sequence is provided by the user. Defaults to 100. |
seed_sequence |
A custom seed sequence; if the value is NULL, the sequence will be built starting from 1 with a step of 100. |
include_umap |
A boolean value indicating whether to calculate the number
of connected components for the UMAP embedding. Defaults to |
umap_arguments |
Additional arguments passed to the |
A list with one field for each number of nearest neighbours. Each field contains an array with the number of connected components obtained in each of the specified repetitions.
set.seed(2024)
# create an artificial PCA embedding
pca_emb <- matrix(runif(100 * 30), nrow = 100, byrow = TRUE)
rownames(pca_emb) <- as.character(1:100)
colnames(pca_emb) <- paste0("PCA_", 1:30)
nn_conn_comps_obj <- get_nn_conn_comps(
  embedding = pca_emb,
  n_neigh_sequence = c(2, 5),
  n_repetitions = 3,
  # arguments that are passed to the uwot function
  umap_arguments = list(
    min_dist = 0.3,
    metric = "cosine"
  )
)
plot_connected_comps_evolution(nn_conn_comps_obj)
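The quantity tracked by get_nn_conn_comps can be illustrated with a deterministic sketch. It builds an exact, symmetrised k-nearest-neighbour graph and counts its connected components by min-label propagation. Each point counts as its own first neighbour, which is what RANN::nn2 typically returns when the data are queried against themselves. knn_components_sketch is hypothetical and omits the repetitions over seeds and the UMAP option.

```r
# Hypothetical sketch: number of connected components of the kNN graph for each k.
knn_components_sketch <- function(embedding, n_neigh_sequence) {
  d <- as.matrix(dist(embedding))
  n <- nrow(d)
  vapply(n_neigh_sequence, function(k) {
    adj <- matrix(FALSE, n, n)
    for (i in seq_len(n)) adj[i, order(d[i, ])[seq_len(k)]] <- TRUE
    adj <- adj | t(adj) # treat the kNN graph as undirected
    labels <- seq_len(n)
    repeat { # propagate the smallest label of each component until nothing changes
      new_labels <- vapply(seq_len(n), function(i) min(labels[adj[i, ]]), integer(1))
      if (identical(new_labels, labels)) break
      labels <- new_labels
    }
    length(unique(labels))
  }, integer(1))
}

# two groups of points on a line, far apart from each other
line_data <- matrix(c(1:10, 101:110), ncol = 1)
knn_components_sketch(line_data, c(1, 2, 3))
```

With k = 1 every point is only its own neighbour, so each point forms a separate component.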
Computes the NN adjacency matrix (and, optionally, the pruned SNN matrix), given the nearest neighbours of each point.
getNNmatrix(nnRanked, k = -1L, start = 0L, prune = 0)
nnRanked |
A matrix with the lists of the nearest neighbours for each point. |
k |
The number of neighbours to consider. Defaults to |
start |
The index of the first neighbour to consider. Defaults to |
prune |
The threshold to prune the SNN matrix. If -1, the function will only return the NN matrix. Defaults to |
A list with the NN and SNN adjacency matrices.
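The documented arguments suggest the following construction, sketched here under several assumptions. nn_ranked is taken to hold 1-based neighbour indices sorted by distance, the SNN weight is taken to be the Jaccard index of two neighbourhoods, and edges at or below prune are zeroed. The start argument is not modelled. nn_snn_sketch is hypothetical and is not getNNmatrix itself.

```r
# Hypothetical sketch of building NN and SNN adjacency matrices from ranked neighbours.
nn_snn_sketch <- function(nn_ranked, k = ncol(nn_ranked), prune = 0) {
  n <- nrow(nn_ranked)
  nn <- matrix(0, n, n)
  # row i gets a 1 for each of its first k neighbours
  for (j in seq_len(k)) nn[cbind(seq_len(n), nn_ranked[, j])] <- 1
  if (prune == -1) return(list(nn = nn)) # only the NN matrix is requested
  shared <- nn %*% t(nn)           # shared neighbours for every pair
  snn <- shared / (2 * k - shared) # Jaccard index
  snn[snn <= prune] <- 0           # prune weak edges
  list(nn = nn, snn = snn)
}

nn_ranked <- rbind(c(1, 2), c(2, 1), c(3, 2))
res <- nn_snn_sketch(nn_ranked)
res$snn
```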
Calculates the per-cell overlap of previously calculated marker genes.
marker_overlap( markers1, markers2, clustering1, clustering2, n = 25, overlap_type = "jsi", rank_by = "-p_val", use_sign = TRUE )
markers1 |
The first data frame of marker genes, must contain columns called 'gene' and 'cluster'. |
markers2 |
The second data frame of marker genes, must contain columns called 'gene' and 'cluster'. |
clustering1 |
The first vector of cluster assignments. |
clustering2 |
The second vector of cluster assignments. |
n |
The number of top n markers (ranked by rank_by) to use when calculating the overlap. |
overlap_type |
The type of overlap to calculate: must be either 'jsi' for the Jaccard similarity index or 'intersect' for the intersection size. |
rank_by |
A character string giving the name of the column to rank marker genes by. Note the sign here: to rank by lowest p-value, preface the column name with a minus sign; to rank by highest value, where higher value indicates more discriminative genes (for example power in the ROC test), no sign is needed. |
use_sign |
A logical: should the sign of the markers match for overlap calculations? If so, a gene must be a positive or a negative marker in both clusters being compared. If TRUE, markers1 and markers2 must have an 'avg_logFC' or 'avg_log2FC' column, from which the sign of the DE will be extracted. |
A vector of the marker gene overlap per cell.
suppressWarnings({
  set.seed(1234)
  library(Seurat)
  data("pbmc_small")
  # cluster with Louvain algorithm
  pbmc_small <- FindClusters(pbmc_small, resolution = 0.8, verbose = FALSE)
  # cluster with k-means
  pbmc.pca <- Embeddings(pbmc_small, "pca")
  pbmc_small@meta.data$kmeans_clusters <- kmeans(pbmc.pca, centers = 3)$cluster
  # compare the markers
  Idents(pbmc_small) <- pbmc_small@meta.data$seurat_clusters
  louvain.markers <- FindAllMarkers(pbmc_small,
    logfc.threshold = 1,
    test.use = "t", verbose = FALSE
  )
  Idents(pbmc_small) <- pbmc_small@meta.data$kmeans_clusters
  kmeans.markers <- FindAllMarkers(pbmc_small,
    logfc.threshold = 1,
    test.use = "t", verbose = FALSE
  )
  pbmc_small@meta.data$jsi <- marker_overlap(
    louvain.markers, kmeans.markers,
    pbmc_small@meta.data$seurat_clusters, pbmc_small@meta.data$kmeans_clusters
  )
  # which cells have the same markers, regardless of clustering?
  FeaturePlot(pbmc_small, "jsi")
})
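The per-cell overlap can be sketched for the default settings (overlap_type = "jsi", ranking by ascending p_val). Each cell is scored by the Jaccard similarity between the top-n markers of its cluster in the first clustering and those of its cluster in the second. marker_jsi_sketch is hypothetical and ignores use_sign.

```r
# Hypothetical sketch of a per-cell Jaccard overlap of top-n marker genes.
marker_jsi_sketch <- function(markers1, markers2, clustering1, clustering2, n = 25) {
  top_genes <- function(markers, cl) {
    m <- markers[as.character(markers$cluster) == cl, ]
    head(m$gene[order(m$p_val)], n) # lowest p-values first
  }
  mapply(function(c1, c2) {
    g1 <- top_genes(markers1, c1)
    g2 <- top_genes(markers2, c2)
    length(intersect(g1, g2)) / length(union(g1, g2))
  }, as.character(clustering1), as.character(clustering2), USE.NAMES = FALSE)
}

markers1 <- data.frame(gene = c("a", "b", "c", "d"), cluster = c(1, 1, 2, 2),
                       p_val = c(0.01, 0.02, 0.01, 0.03))
markers2 <- data.frame(gene = c("a", "c", "d", "b"), cluster = c("x", "x", "y", "y"),
                       p_val = c(0.01, 0.02, 0.01, 0.03))
marker_jsi_sketch(markers1, markers2, c(1, 2), c("x", "y"), n = 2)
```

In a real analysis the markers of each cluster pair would be computed once rather than per cell. The per-cell loop only keeps the sketch short.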
Merge flat disjoint clusterings whose pairwise ECS score is above a given threshold. The merging is done using a complete linkage approach.
merge_partitions( partition_list, ecs_thresh = 1, order_logic = c("freq", "avg_agreement", "none"), return_ecs_matrix = FALSE, check_ties = TRUE )
partition_list |
A list of flat disjoint membership vectors. |
ecs_thresh |
A numeric: the ecs threshold. |
order_logic |
Variable indicating the method of ordering the partitions. It can take one of three values: "freq", "avg_agreement" or "none". |
return_ecs_matrix |
A logical: if TRUE, the function will add the ECS matrix to the return list. Defaults to FALSE. |
check_ties |
A logical value that indicates whether to check for ties in the highest-frequency partitions. If TRUE, the function places first the partition that has the highest similarity with the other partitions. Defaults to TRUE. |
A list of the merged partitions, together with their associated ECC score. If return_ecs_matrix is set to TRUE, the function also returns the ECS matrix.
initial_list <- list(c(1, 1, 2), c(2, 2, 2), c("B", "B", "A"))
merge_partitions(initial_list, 1)
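With ecs_thresh = 1, merging reduces to collapsing partitions that are identical up to a renaming of the cluster labels, because two flat disjoint partitions reach an ECS of 1 only in that case. The following base-R sketch covers only that special case. It relabels clusters by order of first appearance and counts how often each partition occurs. It does not implement the general complete-linkage merge, and it does not compute the ECC scores. The function names are hypothetical.

```r
# Relabel clusters by order of first appearance, so partitions that differ
# only in their label names get the same representation.
canonical_labels <- function(partition) {
  match(partition, unique(partition))
}

# Collapse identical partitions and record how often each one occurs.
merge_identical_partitions <- function(partition_list) {
  keys <- vapply(partition_list,
                 function(p) paste(canonical_labels(p), collapse = ","),
                 character(1))
  groups <- split(seq_along(partition_list), factor(keys, levels = unique(keys)))
  merged <- lapply(groups, function(idx) {
    list(partition = canonical_labels(partition_list[[idx[1]]]),
         freq = length(idx))
  })
  # most frequent first; order() is stable, so ties keep first-seen order
  freqs <- vapply(merged, function(m) m$freq, integer(1))
  unname(merged[order(-freqs)])
}

merged <- merge_identical_partitions(
  list(c(1, 1, 2), c(2, 2, 2), c("B", "B", "A"))
)
vapply(merged, function(m) m$freq, integer(1))  # 2 1
```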
Merge partitions obtained with different resolution values. The partitions are grouped by their number of clusters, and identical partitions are merged into a single partition, with its frequency updated, using the merge_partitions method.
merge_resolutions(res_obj)
res_obj |
A list associated with a configuration field from the object returned by the |
A list with one field for each number of clusters. Each field contains a list of all merged partitions with that number of clusters. To avoid duplicates, merge_partitions with threshold 1 is applied.
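The grouping step can be pictured with the following base-R sketch. It splits a flat list of partitions by their number of clusters and then keeps one copy of each distinct partition, treating partitions as identical if they differ only in label names, together with its frequency. The input here is a plain list of membership vectors, not the res_obj structure, and the function name is hypothetical.

```r
# Group partitions by number of clusters, then collapse identical ones.
group_by_n_clusters <- function(partition_list) {
  # number of clusters of every partition
  k <- vapply(partition_list, function(p) length(unique(p)), integer(1))
  by_k <- split(partition_list, k)
  lapply(by_k, function(parts) {
    # partitions that are identical up to relabelling share the same key
    keys <- vapply(parts,
                   function(p) paste(match(p, unique(p)), collapse = ","),
                   character(1))
    list(
      partitions = parts[!duplicated(keys)],
      freq = as.vector(table(factor(keys, levels = unique(keys))))
    )
  })
}

parts <- list(c(1, 1, 2), c(2, 2, 1), c(1, 2, 3), c(1, 1, 1))
grouped <- group_by_n_clusters(parts)
names(grouped)        # "1" "2" "3"
grouped[["2"]]$freq   # 2
```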
Plot PAC across iterations for a set of k to assess convergence.
pac_convergence(pac_res, k_plot)
pac_res |
The data.frame output by consensus_cluster. |
k_plot |
A vector with values of k to plot. |
A ggplot2 object with the convergence plot. Convergence has been reached when the lines corresponding to the k_plot values flatten out.
pac.res <- consensus_cluster(iris[, 1:4], k_max = 20)
pac_convergence(pac.res, k_plot = c(3, 5, 7, 9))
Plot final PAC values across a range of k to find the optimal number of clusters.
pac_landscape(pac_res, n_shade = max(pac_res$iteration)/5)
pac_res |
The data.frame output by consensus_cluster. |
n_shade |
The PAC values across the last n_shade iterations will be shaded to illustrate how stable the PAC score is. |
A ggplot2 object with the final PAC vs k plot. A local minimum in the landscape indicates an especially stable value of k.
pac.res <- consensus_cluster(iris[, 1:4], k_max = 20)
pac_landscape(pac.res)
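To make the PAC value concrete, here is a minimal base-R sketch, not ClustAssess's own implementation. It builds a consensus matrix by repeatedly subsampling the rows and running `stats::kmeans`. It then computes the proportion of ambiguously clustered pairs (Senbabaoglu et al., 2014) as the fraction of off-diagonal consensus values in (u1, u2], that is CDF(u2) - CDF(u1). The function names, the 80% subsampling fraction and the interval (0.1, 0.9) are assumptions for illustration. A lower PAC means fewer ambiguous pairs, which is why a dip in the landscape points to a stable k.

```r
# Consensus matrix: for every pair of rows, the fraction of subsamples
# containing both rows in which k-means put them in the same cluster.
consensus_matrix <- function(x, k, n_iter = 50, subsample = 0.8) {
  n <- nrow(x)
  co_clustered <- matrix(0, n, n)  # times i and j were in the same cluster
  co_sampled <- matrix(0, n, n)    # times i and j were subsampled together
  for (iter in seq_len(n_iter)) {
    idx <- sample.int(n, size = floor(subsample * n))
    labels <- stats::kmeans(x[idx, , drop = FALSE], centers = k)$cluster
    co_clustered[idx, idx] <- co_clustered[idx, idx] + outer(labels, labels, "==")
    co_sampled[idx, idx] <- co_sampled[idx, idx] + 1
  }
  # pairs never subsampled together have no consensus value
  ifelse(co_sampled > 0, co_clustered / co_sampled, NA_real_)
}

# PAC: share of pairs whose consensus lies in (u1, u2]
pac_score <- function(cm, u1 = 0.1, u2 = 0.9) {
  vals <- cm[lower.tri(cm)]
  vals <- vals[!is.na(vals)]
  mean(vals > u1 & vals <= u2)
}

set.seed(42)
cm <- consensus_matrix(as.matrix(iris[, 1:4]), k = 3, n_iter = 20)
pac <- pac_score(cm)
pac
```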
Display the distribution of the EC consistency for each clustering method and each resolution value on a given embedding. The all field of the object returned by the get_clustering_difference_object method is used.
plot_clustering_difference_facet( clust_object, embedding, low_limit = 0, high_limit = 1, grid = TRUE )
clust_object |
An object returned by the
|
embedding |
An embedding (only the first two dimensions will be used for visualization). |
low_limit |
The lowest value of ECC that will be displayed on the embedding. |
high_limit |
The highest value of ECC that will be displayed on the embedding. |
grid |
Boolean value indicating whether the facet should be a grid (where each row is associated with a resolution value and each column with a clustering method) or a wrap. |
A ggplot2 object.
# FIXME fix the examples
# set.seed(2021)
# # create an artificial PCA embedding
# pca_embedding <- matrix(runif(100 * 30), nrow = 100)
# rownames(pca_embedding) <- as.character(1:100)
# colnames(pca_embedding) <- paste0("PCA_", 1:30)
# adj_matrix <- Seurat::FindNeighbors(pca_embedding,
#   k.param = 10,
#   nn.method = "rann",
#   verbose = FALSE,
#   compute.SNN = FALSE
# )$nn
# clust_diff_obj <- assess_clustering_stability(
#   graph_adjacency_matrix = adj_matrix,
#   resolution = c(0.5, 1),
#   n_repetitions = 10,
#   clustering_algorithm = 1:2,
#   verbose = FALSE
# )
# plot_clustering_difference_facet(clust_diff_obj, pca_embedding)
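The EC consistency displayed on the embedding is built from element-centric similarity (Gates et al., 2019). The following is a minimal sketch for flat disjoint partitions, assuming the simplified affinity p[i, j] = 1/|C(i)| when j shares element i's cluster C(i) and 0 otherwise. Under that assumption, the similarity of element i between two partitions is S_i = 1 - 0.5 * sum_j |p1[i, j] - p2[i, j]|, and the consistency of an element is taken as its mean similarity over all pairs of partitions. The helper names are hypothetical. This is not ClustAssess's optimised implementation, and it ignores any frequency weighting of merged partitions.

```r
# Per-element similarity between two flat disjoint partitions, computed from
# cluster sizes and overlaps instead of full n x n affinity matrices.
element_sim_flat <- function(p1, p2) {
  p1 <- as.character(p1)
  p2 <- as.character(p2)
  size1 <- as.vector(table(p1)[p1])                   # |C1(i)|
  size2 <- as.vector(table(p2)[p2])                   # |C2(i)|
  overlap <- as.vector(table(p1, p2)[cbind(p1, p2)])  # |C1(i) intersect C2(i)|
  # L1 distance between row i of the two affinity matrices
  l1 <- overlap * abs(1 / size1 - 1 / size2) +
    (size1 - overlap) / size1 + (size2 - overlap) / size2
  1 - 0.5 * l1
}

# Per-element consistency: mean similarity over all pairs of partitions.
element_consistency_flat <- function(partition_list) {
  n <- length(partition_list[[1]])
  pairs <- utils::combn(length(partition_list), 2)
  sims <- vapply(seq_len(ncol(pairs)), function(m) {
    element_sim_flat(partition_list[[pairs[1, m]]], partition_list[[pairs[2, m]]])
  }, numeric(n))
  rowMeans(matrix(sims, nrow = n))
}

element_sim_flat(c(1, 1, 2), c(1, 2, 2))  # 0.5 0.5 0.5
element_consistency_flat(list(c(1, 1, 2), c(2, 2, 1), c(1, 2, 2)))
```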
Display EC consistency across clustering methods by summarising the distribution of the EC consistency for each number of clusters.
plot_clustering_overall_stability( clust_object, value_type = c("k", "resolution"), summary_function = stats::median )
clust_object |
An object returned by the
|
value_type |
A string that specifies the type of value that was used for grouping the partitions and calculating the ECC score. It can be either "k" or "resolution". |
summary_function |
The function that will be used to summarize the distribution of the ECC values obtained for each number of clusters. Defaults to stats::median. |
A ggplot2 object with the EC consistency distributions grouped by the clustering methods. Higher consistency indicates a more stable clustering.
set.seed(2024)
# create an artificial PCA embedding
pca_embedding <- matrix(runif(100 * 30), nrow = 100)
rownames(pca_embedding) <- paste0("cell_", seq_len(nrow(pca_embedding)))
colnames(pca_embedding) <- paste0("PC_", 1:30)
adj_matrix <- getNNmatrix(
  RANN::nn2(pca_embedding, k = 10)$nn.idx,
  10,
  0,
  -1
)$nn
rownames(adj_matrix) <- paste0("cell_", seq_len(nrow(adj_matrix)))
colnames(adj_matrix) <- paste0("cell_", seq_len(ncol(adj_matrix)))
# alternatively, the adj_matrix can be calculated
# using the `Seurat::FindNeighbors` function.
clust_diff_obj <- assess_clustering_stability(
  graph_adjacency_matrix = adj_matrix,
  resolution = c(0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1:2,
  verbose = FALSE
)
plot_clustering_overall_stability(clust_diff_obj)
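The role of summary_function can be illustrated with a small base-R sketch. It reduces the per-cell ECC values to one value per (clustering method, number of clusters) pair using `stats::aggregate`. A plot like the one above would then presumably show, for each method, the spread of these summarised values across k (for example with `boxplot(ecc ~ method, data = summary_df)`). The long-format input with `method`, `k` and `ecc` columns is a hypothetical layout, not the structure of the object returned by assess_clustering_stability.

```r
# Summarise per-cell ECC values for every (method, number of clusters) pair.
summarise_ecc <- function(ecc_long, summary_function = stats::median) {
  stats::aggregate(ecc ~ method + k, data = ecc_long, FUN = summary_function)
}

# Hypothetical toy input: per-cell ECC values for two methods and two k values
ecc_long <- data.frame(
  method = rep(c("Louvain", "Leiden"), each = 4),
  k = rep(c(3, 3, 4, 4), times = 2),
  ecc = c(0.9, 0.7, 0.5, 0.6, 1, 1, 0.2, 0.4)
)
summary_df <- summarise_ecc(ecc_long)
summary_df
```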
Display EC consistency across clustering methods, calculated for each value of the resolution parameter or the number of clusters.
plot_clustering_per_value_stability( clust_object, value_type = c("k", "resolution") )
clust_object |
An object returned by the
|
value_type |
A string that specifies the type of value that was used for grouping the partitions and calculating the ECC score. It can be either "k" or "resolution". |
A ggplot2 object with the EC consistency distributions grouped by the clustering methods. Higher consistency indicates a more stable clustering. The X axis is decided by the value_type parameter.
set.seed(2024)
# create an artificial PCA embedding
pca_embedding <- matrix(runif(100 * 30), nrow = 100)
rownames(pca_embedding) <- paste0("cell_", seq_len(nrow(pca_embedding)))
colnames(pca_embedding) <- paste0("PC_", 1:30)
adj_matrix <- getNNmatrix(
  RANN::nn2(pca_embedding, k = 10)$nn.idx,
  10,
  0,
  -1
)$nn
rownames(adj_matrix) <- paste0("cell_", seq_len(nrow(adj_matrix)))
colnames(adj_matrix) <- paste0("cell_", seq_len(ncol(adj_matrix)))
# alternatively, the adj_matrix can be calculated
# using the `Seurat::FindNeighbors` function.
clust_diff_obj <- assess_clustering_stability(
  graph_adjacency_matrix = adj_matrix,
  resolution = c(0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1:2,
  verbose = FALSE
)
plot_clustering_per_value_stability(clust_diff_obj)
Display the distribution of the number of connected components obtained for each number of neighbours across random seeds.
plot_connected_comps_evolution(nn_conn_comps_object)
nn_conn_comps_object |
An object or a concatenation of objects returned
by the |
A ggplot2 object with boxplots for the connected component distributions.
The number of connected components is displayed on a logarithmic scale.
set.seed(2024)
# create an artificial PCA embedding
pca_emb <- matrix(runif(100 * 30), nrow = 100, byrow = TRUE)
rownames(pca_emb) <- as.character(1:100)
colnames(pca_emb) <- paste0("PCA_", 1:30)
nn_conn_comps_obj <- get_nn_conn_comps(
  embedding = pca_emb,
  n_neigh_sequence = c(2, 5),
  n_repetitions = 3,
  # arguments that are passed to the uwot function
  umap_arguments = list(
    min_dist = 0.3,
    metric = "cosine"
  )
)
plot_connected_comps_evolution(nn_conn_comps_obj)
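Why the number of connected components depends on the number of neighbours can be seen with a small base-R sketch. It links every point to its n_neigh nearest points (Euclidean distance), treats the edges as undirected, and counts components with a union-find. This is a plain kNN graph, not the UMAP neighbour graph that get_nn_conn_comps works with, so it is only an illustration under that substitution. The function name is hypothetical.

```r
# Number of connected components of the undirected kNN graph of an embedding.
knn_connected_components <- function(embedding, n_neigh) {
  d <- as.matrix(stats::dist(embedding))
  diag(d) <- Inf  # a point is not its own neighbour
  n <- nrow(d)
  # row i holds the indices of the n_neigh nearest neighbours of point i
  nn <- matrix(apply(d, 1, function(row) order(row)[seq_len(n_neigh)]),
               nrow = n, byrow = TRUE)
  # union-find over the kNN edges (an edge in either direction joins points)
  parent <- seq_len(n)
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (i in seq_len(n)) {
    for (j in nn[i, ]) {
      ri <- find_root(i)
      rj <- find_root(j)
      if (ri != rj) parent[ri] <- rj
    }
  }
  length(unique(vapply(seq_len(n), find_root, integer(1))))
}

# two well-separated groups of three points
emb <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
sapply(1:3, function(k) knn_connected_components(emb, k))  # 2 2 1
```

Few neighbours can leave separated groups disconnected, and adding neighbours eventually forces edges across groups. This is the trend the boxplots track across seeds.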
Display EC consistency for each feature set and for each step. Above each boxplot there is a number representing the step (or the size of the subset). The ECC values are extracted for each resolution value and summarized using the summary_function parameter.
plot_feature_overall_stability_boxplot( feature_object_list, summary_function = stats::median, text_size = 4, boxplot_width = 0.4, dodge_width = 0.7, return_df = FALSE )
feature_object_list |
An object or a concatenation of objects returned
by the |
summary_function |
The function that will be used to summarize the ECC values. Defaults to stats::median. |
text_size |
The size of the labels above boxplots |
boxplot_width |
Used for adjusting the width of the boxplots; the value
will be passed to the |
dodge_width |
Used for adjusting the horizontal position of the boxplot;
the value will be passed to the |
return_df |
If TRUE, the function will return the ECS values as a dataframe. Default is FALSE. |
A ggplot2 object.
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(100 * 10), runif(100 * 10, min = 3, max = 4)),
  nrow = 200, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:200)
colnames(expr_matrix) <- paste("feature", 1:10)
feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = 5,
  feature_type = "feature_name",
  resolution = c(0.1, 0.5, 1),
  n_repetitions = 10,
  umap_arguments = list(
    # the following parameters are used by the umap function
    # and are not mandatory
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  clustering_algorithm = 1
)
plot_feature_overall_stability_boxplot(feature_stability_result)
Perform an incremental ECS between two consecutive feature steps. The ECS values are extracted for every resolution value and summarized using a function (e.g. median, mean, etc.).
plot_feature_overall_stability_incremental( feature_object_list, summary_function = stats::median, dodge_width = 0.7, text_size = 4, boxplot_width = 0.4, return_df = FALSE )
feature_object_list |
An object or a concatenation of objects returned
by the |
summary_function |
The function used to summarize the ECS values. Default is stats::median. |
dodge_width |
Used for adjusting the horizontal position of the boxplot;
the value will be passed to the |
text_size |
The size of the labels above boxplots. |
boxplot_width |
Used for adjusting the width of the boxplots; the value
will be passed to the |
return_df |
If TRUE, the function will return the ECS values as a dataframe. Default is FALSE. |
A ggplot2 object in which the ECS distribution is displayed as a boxplot. Above each boxplot, a pair of numbers indicates the two steps being compared.
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(50 * 10), runif(50 * 10, min = 3, max = 4)),
  nrow = 100, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:100)
colnames(expr_matrix) <- paste("feature", 1:10)
feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = c(5, 10),
  feature_type = "feature_name",
  resolution = c(0.1, 0.5),
  n_repetitions = 3,
  umap_arguments = list(
    # the following parameters are used by the umap function
    # and are not mandatory
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  clustering_algorithm = 1
)
plot_feature_overall_stability_incremental(feature_stability_result)
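"Incremental" here means that each step is compared with the next one rather than with a fixed reference. The base-R sketch below assumes one flat partition per step, for example the most frequent partition at a given resolution. It computes the per-element ECS between consecutive steps and names each result "previous-next", matching the pair of numbers shown above each boxplot. The ECS uses the simplified flat-partition form 1 - 0.5 * sum_j |p1[i, j] - p2[i, j]| with p[i, j] = 1/|C(i)| inside element i's cluster. The input layout and function names are hypothetical and do not match the object returned by assess_feature_stability.

```r
# Per-element ECS of two flat disjoint partitions (simplified form).
ecs_flat <- function(p1, p2) {
  p1 <- as.character(p1)
  p2 <- as.character(p2)
  a <- as.vector(table(p1)[p1])                   # size of i's cluster in p1
  b <- as.vector(table(p2)[p2])                   # size of i's cluster in p2
  ab <- as.vector(table(p1, p2)[cbind(p1, p2)])   # size of their overlap
  1 - 0.5 * (ab * abs(1 / a - 1 / b) + (a - ab) / a + (b - ab) / b)
}

# ECS between the partitions of consecutive steps, one vector per step pair.
incremental_ecs <- function(partitions_by_step) {
  steps <- names(partitions_by_step)
  n_steps <- length(steps)
  res <- lapply(seq_len(n_steps - 1), function(s) {
    ecs_flat(partitions_by_step[[s]], partitions_by_step[[s + 1]])
  })
  names(res) <- paste(steps[-n_steps], steps[-1], sep = "-")
  res
}

by_step <- list("5" = c(1, 1, 2, 2), "10" = c(1, 1, 2, 2), "15" = c(1, 2, 2, 2))
inc <- incremental_ecs(by_step)
sapply(inc, stats::median)
```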
Display EC consistency for each feature set and for each step. Above each boxplot there is a number representing the step (or the size of the subset). The ECC values are extracted for the resolution value provided by the user.
plot_feature_per_resolution_stability_boxplot( feature_object_list, resolution, violin_plot = FALSE, text_size = 4, boxplot_width = 0.4, dodge_width = 0.7, return_df = FALSE )
feature_object_list |
An object or a concatenation of objects returned
by the |
resolution |
The resolution value for which the ECC will be extracted. |
violin_plot |
If TRUE, the function will return a violin plot instead of a boxplot. Default is FALSE. |
text_size |
The size of the labels above boxplots |
boxplot_width |
Used for adjusting the width of the boxplots; the value
will be passed to the |
dodge_width |
Used for adjusting the horizontal position of the boxplot;
the value will be passed to the |
return_df |
If TRUE, the function will return the ECS values as a dataframe. Default is FALSE. |
A ggplot2 object.
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(100 * 10), runif(100 * 10, min = 3, max = 4)),
  nrow = 200, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:200)
colnames(expr_matrix) <- paste("feature", 1:10)
feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = 5,
  feature_type = "feature_name",
  resolution = c(0.1, 0.5, 1),
  n_repetitions = 10,
  umap_arguments = list(
    # the following parameters are used by the umap function
    # and are not mandatory
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  clustering_algorithm = 1
)
plot_feature_per_resolution_stability_boxplot(feature_stability_result, 0.5)
Perform an incremental ECS between two consecutive feature steps. The ECS values are extracted only for a specified resolution value.
plot_feature_per_resolution_stability_incremental( feature_object_list, resolution, dodge_width = 0.7, text_size = 4, boxplot_width = 0.4, return_df = FALSE )
feature_object_list |
An object or a concatenation of objects returned
by the |
resolution |
The resolution value for which the ECS will be extracted. |
dodge_width |
Used for adjusting the horizontal position of the boxplot;
the value will be passed to the |
text_size |
The size of the labels above boxplots. |
boxplot_width |
Used for adjusting the width of the boxplots; the value
will be passed to the |
return_df |
If TRUE, the function will return the ECS values as a dataframe. Default is FALSE. |
A ggplot2 object in which the ECS distribution is displayed as a boxplot. Above each boxplot, a pair of numbers indicates the two steps being compared.
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(50 * 10), runif(50 * 10, min = 3, max = 4)),
  nrow = 100, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:100)
colnames(expr_matrix) <- paste("feature", 1:10)
feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = c(5, 10),
  feature_type = "feature_name",
  resolution = c(0.1, 0.5),
  n_repetitions = 3,
  umap_arguments = list(
    # the following parameters are used by the umap function
    # and are not mandatory
    n_neighbors = 3,
    approx_pow = TRUE,
    n_epochs = 0,
    init = "random",
    min_dist = 0.3
  ),
  clustering_algorithm = 1
)
plot_feature_per_resolution_stability_incremental(feature_stability_result, 0.1)
Display a facet of plots where each subpanel is associated with a feature set and illustrates the distribution of the EC consistency score over the UMAP embedding.
plot_feature_stability_ecs_facet( feature_object_list, resolution, n_facet_cols = 3, point_size = 0.3 )
feature_object_list |
An object or a concatenation of objects returned
by the |
resolution |
The resolution value for which the ECS will be extracted. |
n_facet_cols |
The number of facet columns. |
point_size |
The size of the points displayed on the plot. |
A ggplot2 object
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(100 * 10), runif(50 * 10, min = 3, max = 4)),
  nrow = 150, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:150)
colnames(expr_matrix) <- paste("feature", 1:10)
feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = 5,
  feature_type = "feature_name",
  resolution = c(0.1, 0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1
)
plot_feature_stability_ecs_facet(
  feature_stability_result,
  0.5,
  point_size = 2
)
Display a facet of plots where each subpanel is associated with a feature set and illustrates the distribution of the most frequent partition over the UMAP embedding.
plot_feature_stability_mb_facet( feature_object_list, resolution, text_size = 5, n_facet_cols = 3, point_size = 0.3 )
feature_object_list |
An object or a concatenation of objects returned
by the |
resolution |
The resolution value for which the ECS will be extracted. |
text_size |
The size of the cluster label |
n_facet_cols |
The number of facet columns. |
point_size |
The size of the points displayed on the plot. |
A ggplot2 object.
set.seed(2024)
# create an artificial expression matrix
expr_matrix <- matrix(
  c(runif(100 * 10), runif(50 * 10, min = 3, max = 4)),
  nrow = 150, byrow = TRUE
)
rownames(expr_matrix) <- as.character(1:150)
colnames(expr_matrix) <- paste("feature", 1:10)
feature_stability_result <- assess_feature_stability(
  data_matrix = t(expr_matrix),
  feature_set = colnames(expr_matrix),
  steps = 5,
  feature_type = "feature_name",
  resolution = c(0.1, 0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1
)
plot_feature_stability_mb_facet(
  feature_stability_result,
  0.5,
  point_size = 2
)
For each configuration provided in clust_object, display how many different partitions with the same number of clusters can be obtained by changing the seed.
plot_k_n_partitions( clust_object, colour_information = c("ecc", "freq_part"), dodge_width = 0.3, pt_size_range = c(1.5, 4), summary_function = stats::median, y_step = 5 )
clust_object |
An object returned by the
|
colour_information |
String that specifies the information type that will be illustrated using gradient colour: either "ecc" or "freq_part". |
dodge_width |
Used for adjusting the distance between the boxplots representing a clustering method. Defaults to 0.3. |
pt_size_range |
Indicates the minimum and the maximum size a point on the plot can have. Defaults to c(1.5, 4). |
summary_function |
The function that will be used to summarize the distribution of the ECC values obtained for each number of clusters. Defaults to stats::median. |
y_step |
The step used for the y-axis. Defaults to 5. |
A ggplot2 object. The color gradient suggests the frequency of the most common partition relative to the total number of appearances of that specific number of clusters or the Element-Centric Consistency of the partitions. The size illustrates the frequency of the partitions with k clusters relative to the total number of partitions. The shape of the points indicates the clustering method.
set.seed(2024)
# create an artificial PCA embedding
pca_embedding <- matrix(runif(100 * 30), nrow = 100)
rownames(pca_embedding) <- paste0("cell_", seq_len(nrow(pca_embedding)))
colnames(pca_embedding) <- paste0("PC_", 1:30)
adj_matrix <- getNNmatrix(
  RANN::nn2(pca_embedding, k = 10)$nn.idx,
  10,
  0,
  -1
)$nn
rownames(adj_matrix) <- paste0("cell_", seq_len(nrow(adj_matrix)))
colnames(adj_matrix) <- paste0("cell_", seq_len(ncol(adj_matrix)))
# alternatively, the adj_matrix can be calculated
# using the `Seurat::FindNeighbors` function.
clust_diff_obj <- assess_clustering_stability(
  graph_adjacency_matrix = adj_matrix,
  resolution = c(0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1:2,
  verbose = FALSE
)
plot_k_n_partitions(clust_diff_obj)
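The quantities encoded in this plot can be computed by hand from a list of partitions obtained with one configuration over several seeds. The base-R sketch below reports three values for every number of clusters k: how many distinct partitions occur (treating relabelled partitions as identical), the share of the most common one among the partitions with k clusters (the freq_part colour), and the share of all runs that produced k clusters (the point size). The function name and input layout are hypothetical.

```r
# Per number of clusters: distinct partitions, dominance of the most common
# partition, and how often k itself occurs.
k_partition_summary <- function(partition_list) {
  k <- vapply(partition_list, function(p) length(unique(p)), integer(1))
  # identical partitions (up to relabelling) get the same key
  keys <- vapply(partition_list,
                 function(p) paste(match(p, unique(p)), collapse = ","),
                 character(1))
  by_k <- split(keys, k)
  rows <- lapply(names(by_k), function(kk) {
    counts <- table(by_k[[kk]])
    data.frame(
      k = as.integer(kk),
      n_partitions = length(counts),                 # distinct partitions with k clusters
      freq_part = max(counts) / sum(counts),         # share of the most common one
      freq_k = sum(counts) / length(partition_list)  # share of runs yielding k clusters
    )
  })
  do.call(rbind, rows)
}

runs <- list(c(1, 1, 2), c(2, 2, 1), c(1, 2, 2), c(1, 2, 3))
k_partition_summary(runs)
```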
For each configuration provided in the clust_object, display which numbers of clusters appear for different values of the resolution parameter.
plot_k_resolution_corresp( clust_object, colour_information = c("ecc", "freq_k"), dodge_width = 0.3, pt_size_range = c(1.5, 4), summary_function = stats::median )
clust_object |
An object returned by the
|
colour_information |
String that specifies the information type that will be illustrated using gradient colour: either "ecc" or "freq_k". |
dodge_width |
Used for adjusting the distance between the boxplots representing a clustering method. Defaults to 0.3. |
pt_size_range |
Indicates the minimum and the maximum size a point on the plot can have. Defaults to c(1.5, 4). |
summary_function |
The function that will be used to summarize the distribution of the ECC values obtained for each number of clusters. Defaults to stats::median. |
A ggplot2 object. Different point shapes indicate different parameter configurations, while the color illustrates either the frequency of the most common partition or the Element-Centric Consistency of the partitions. The frequency is calculated as the ratio between the total number of appearances of partitions with a specific number of clusters and resolution value, and the number of runs. The size illustrates the frequency of the most common partition with k clusters, relative to all partitions that were obtained with the same resolution value and have k clusters.
set.seed(2024)
# create an artificial PCA embedding
pca_embedding <- matrix(runif(100 * 30), nrow = 100)
rownames(pca_embedding) <- paste0("cell_", seq_len(nrow(pca_embedding)))
colnames(pca_embedding) <- paste0("PC_", 1:30)
adj_matrix <- getNNmatrix(
  RANN::nn2(pca_embedding, k = 10)$nn.idx,
  10,
  0,
  -1
)$nn
rownames(adj_matrix) <- paste0("cell_", seq_len(nrow(adj_matrix)))
colnames(adj_matrix) <- paste0("cell_", seq_len(ncol(adj_matrix)))
# alternatively, the adj_matrix can be calculated
# using the `Seurat::FindNeighbors` function.
clust_diff_obj <- assess_clustering_stability(
  graph_adjacency_matrix = adj_matrix,
  resolution = c(0.5, 1),
  n_repetitions = 10,
  clustering_algorithm = 1:2,
  verbose = FALSE
)
plot_k_resolution_corresp(clust_diff_obj)
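The frequency described above can be reproduced with a contingency table. The base-R sketch below takes the number of clusters found in each run together with the resolution of that run. For each resolution value, it returns the fraction of runs that produced each number of clusters. This assumes "the number of runs" means the runs performed at that resolution. The function name and toy data are hypothetical.

```r
# Fraction of runs, per resolution value, that produced each number of clusters.
k_resolution_frequency <- function(resolution, k) {
  prop.table(table(resolution = resolution, k = k), margin = 1)
}

# Hypothetical toy input: number of clusters found in 5 runs per resolution
runs <- data.frame(
  resolution = rep(c(0.5, 1), each = 5),
  k = c(3, 3, 3, 4, 3, 5, 6, 5, 5, 5)
)
freq <- k_resolution_frequency(runs$resolution, runs$k)
freq
```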
Display, for all configurations consisting of different numbers of neighbours, graph types and base embeddings, the EC Consistency of the partitions obtained over multiple runs on a UMAP embedding.
plot_n_neigh_ecs(nn_ecs_object, boxplot_width = 0.5)
nn_ecs_object |
An object or a concatenation of objects returned by the assess_nn_stability function. |
boxplot_width |
Used for adjusting the width of the boxplots; the value will be passed to the width argument of ggplot2::geom_boxplot. |
A ggplot2 object.
set.seed(2024)
# create an artificial PCA embedding
pca_emb <- matrix(runif(100 * 30), nrow = 100, byrow = TRUE)
rownames(pca_emb) <- as.character(1:100)
colnames(pca_emb) <- paste0("PC_", 1:30)
nn_stability_obj <- assess_nn_stability(
  embedding = pca_emb,
  n_neigh_sequence = c(10, 15, 20),
  n_repetitions = 10,
  graph_reduction_type = "PCA",
  clustering_algorithm = 1
)
plot_n_neigh_ecs(nn_stability_obj)
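The EC Consistency shown in this plot builds on the element-centric similarity of Gates et al. (2019). For flat (non-overlapping) partitions, each element's affinity is spread uniformly over the members of its own cluster, and the element-wise similarity between two partitions is one minus half the L1 distance between the two affinity vectors. The base-R sketch below computes that quantity and averages it over all pairs of partitions. It is a hand-written illustration for hard partitions only; the function names are hypothetical and it is not the package's own implementation.

```r
# Element-wise similarity between two flat partitions a and b
# (membership vectors of equal length).
element_similarity <- function(a, b) {
  stopifnot(length(a) == length(b))
  vapply(seq_along(a), function(i) {
    same_a <- a == a[i]
    same_b <- b == b[i]
    # affinity of element i: uniform over the members of its cluster
    p_a <- same_a / sum(same_a)
    p_b <- same_b / sum(same_b)
    # one minus half the L1 distance between the affinity vectors
    1 - 0.5 * sum(abs(p_a - p_b))
  }, numeric(1))
}

# Per-element consistency: mean element-wise similarity over all pairs
# of partitions (needs at least two partitions and two elements).
element_consistency_sketch <- function(partitions) {
  pairs <- utils::combn(length(partitions), 2)
  sims <- apply(pairs, 2, function(ij) {
    element_similarity(partitions[[ij[1]]], partitions[[ij[2]]])
  })
  rowMeans(sims)
}

a <- c(1, 1, 2, 2)
b <- c(1, 1, 1, 2)
element_similarity(a, b)
element_consistency_sketch(list(a, a, b))
```

Because only co-membership matters, relabelling the clusters of a partition (for example c(7, 7, 3, 3) instead of a) leaves every element-wise similarity at 1.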
Display the distribution of the number of clusters obtained for each number of neighbours across random seeds.
plot_n_neigh_k_correspondence(nn_object_n_clusters)
nn_object_n_clusters |
An object or a concatenation of objects returned by the assess_nn_stability function. |
A ggplot2 object with the distributions displayed as boxplots.
The number of clusters is displayed on a logarithmic scale.
set.seed(2024)
# create an artificial PCA embedding
pca_emb <- matrix(runif(100 * 30), nrow = 100, byrow = TRUE)
rownames(pca_emb) <- as.character(1:100)
colnames(pca_emb) <- paste0("PC_", 1:30)
nn_stability_obj <- assess_nn_stability(
  embedding = pca_emb,
  n_neigh_sequence = c(10, 15, 20),
  n_repetitions = 10,
  graph_reduction_type = "PCA",
  clustering_algorithm = 1
)
plot_n_neigh_k_correspondence(nn_stability_obj)
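The quantity summarised by these boxplots is the number of distinct clusters found in each run. A minimal base-R sketch, assuming a hypothetical named list of membership vectors indexed by the number of neighbours (not the structure of the assess_nn_stability output), builds the long-format table such a plot could be drawn from. The helper name n_clusters_by_neigh is illustrative only.

```r
# Long-format table: one row per run, with the number of clusters k
# obtained for each number of neighbours.
n_clusters_by_neigh <- function(partitions_by_neigh) {
  do.call(rbind, lapply(names(partitions_by_neigh), function(nn) {
    runs <- partitions_by_neigh[[nn]]
    data.frame(
      n_neigh = as.integer(nn),
      run = seq_along(runs),
      k = vapply(runs, function(p) length(unique(p)), integer(1))
    )
  }))
}

# Hypothetical toy input: two runs for each number of neighbours.
toy <- list(
  "10" = list(c(1, 2, 3, 4), c(1, 2, 3, 3)),
  "20" = list(c(1, 1, 2, 2), c(1, 1, 1, 2))
)
k_table <- n_clusters_by_neigh(toy)

# Base-graphics equivalent, with k on a log-scaled axis:
# graphics::boxplot(k ~ n_neigh, data = k_table, log = "y")
```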
Creates the backend interface for the comparison module inside the ClustAssess Shiny application.
server_comparisons(id, chosen_config, chosen_method)
id |
The id of the module, used to access the UI elements. |
chosen_config |
A reactive object that contains the chosen configuration from the Dimensionality Reduction tab. |
chosen_method |
A reactive object that contains the chosen method from the Clustering tab. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the backend interface for the dimensionality reduction module inside the ClustAssess Shiny application.
server_dimensionality_reduction(id, parent_session)
id |
The id of the module, used to access the UI elements. |
parent_session |
The session of the parent module, used to control the tabs of the application. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the backend interface for the graph clustering module inside the ClustAssess Shiny application.
server_graph_clustering(id, feature_choice, parent_session)
id |
The id of the module, used to access the UI elements. |
feature_choice |
A reactive object that contains the chosen configuration from the Dimensionality Reduction tab. |
parent_session |
The session of the parent module, used to control the tabs of the application. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the backend interface for the graph construction module inside the ClustAssess Shiny application.
server_graph_construction(id, chosen_config)
id |
The id of the module, used to access the UI elements. |
chosen_config |
A reactive object that contains the chosen configuration from the Dimensionality Reduction tab. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the backend interface for the landing page module inside the ClustAssess Shiny application.
server_landing_page( id, height_ratio, dimension, parent_session, organism = "hsapiens" )
id |
The id of the module, used to access the UI elements. |
height_ratio |
A reactive object that contains the height ratio of the plots in the application (the height of the plot is calculated using the height ratio and the height of the webpage). |
dimension |
A reactive object that contains the dimensions of the webpage. |
parent_session |
The session of the parent module, used to control the tabs of the application. |
organism |
The organism of the dataset, which will be used in the enrichment analysis. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the backend interface for the sandbox module inside the ClustAssess Shiny application.
server_sandbox(id)
id |
The id of the module, used to access the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the UI interface for the comparison module inside the ClustAssess Shiny application.
ui_comparisons(id)
id |
The id of the module, used to identify the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
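The server_* and ui_* functions come in pairs that share an id. Assuming they follow the standard Shiny module convention (NS() in the UI function, moduleServer() in the server function), the pairing works as in the sketch below. ui_example and server_example are hypothetical modules written for illustration; they are not part of ClustAssess and do not reproduce its modules.

```r
library(shiny)

# Hypothetical UI module: every input/output id is namespaced with NS(id),
# so the same module can be placed several times in one app.
ui_example <- function(id) {
  ns <- NS(id)
  tagList(
    numericInput(ns("n_points"), "Number of points", value = 10),
    textOutput(ns("summary"))
  )
}

# Hypothetical server module: moduleServer() resolves input$n_points and
# output$summary inside the namespace given by the same id.
server_example <- function(id) {
  moduleServer(id, function(input, output, session) {
    output$summary <- renderText(paste("Selected:", input$n_points))
  })
}

# Both halves are wired together with the same id ("demo").
app <- shinyApp(
  ui = fluidPage(ui_example("demo")),
  server = function(input, output, session) server_example("demo")
)
```

The rendered UI contains the namespaced id "demo-n_points", which is how the server half finds its inputs.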
Creates the UI interface for the dimensionality reduction module inside the ClustAssess Shiny application.
ui_dimensionality_reduction(id)
id |
The id of the module, used to identify the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the UI interface for the graph clustering module inside the ClustAssess Shiny application.
ui_graph_clustering(id)
id |
The id of the module, used to identify the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the UI interface for the graph construction module inside the ClustAssess Shiny application.
ui_graph_construction(id)
id |
The id of the module, used to identify the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the UI interface for the landing page module inside the ClustAssess Shiny application.
ui_landing_page(id)
id |
The id of the module, used to identify the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Creates the UI interface for the sandbox module inside the ClustAssess Shiny application.
ui_sandbox(id)
id |
The id of the module, used to identify the UI elements. |
This function should not be called directly, but only within the app created by the write_shiny_app function.
Calculate the weighted element-centric consistency of a set of clusterings. The weights are used to give more importance to some clusterings over others.
weighted_element_consistency( clustering_list, weights = NULL, calculate_sim_matrix = FALSE )
clustering_list |
The list of clustering results, each of which is either:
|
weights |
A numeric vector of weights for each clustering in clustering_list. |
calculate_sim_matrix |
A logical value that indicates whether to calculate the similarity matrix along with the consistency score. Defaults to FALSE. |
A vector containing the weighted element-wise consistency. If calculate_sim_matrix is set to TRUE, the element similarity matrix will be returned as well.
The weighted ECC will be calculated as
# cluster across 20 random seeds
clustering_list <- lapply(1:20, function(x) kmeans(mtcars, centers = 3)$cluster)
weights <- sample(1:10, 20, replace = TRUE)
weighted_element_consistency(clustering_list, weights = weights)
Given the output of the ClustAssess pipeline, the expression matrix and the metadata, this function creates the files needed for the ClustAssess ShinyApp. The files are written in the project_folder and are the following:
metadata.rds: the metadata file
stability.h5: contains the stability results
expression.h5: contains the expression matrix and the rank matrix
write_objects( clustassess_object, expression_matrix, metadata, project_folder = ".", compression_level = 6, chunk_size = 100, gene_variance_threshold = 0, summary_function = stats::median, qualpalr_colorspace = "pretty" )
clustassess_object |
The output of the ClustAssess automatic pipeline |
expression_matrix |
The expression matrix |
metadata |
The metadata |
project_folder |
The folder where the files will be written |
compression_level |
The compression level for the h5 files (see the level argument of rhdf5::h5createDataset for more details) |
chunk_size |
The chunk size for the rank matrix (See |
gene_variance_threshold |
The threshold for the gene variance; genes with variance below this threshold will be removed |
summary_function |
The function used for summarizing the stability values; the default is stats::median. |
qualpalr_colorspace |
The colorspace used for generating the colors; the default is "pretty". |
NULL (the files are written in the project_folder)
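To illustrate what the summary_function parameter controls, the sketch below applies two different summaries to hypothetical per-element stability vectors of three configurations. How write_objects applies the summary internally is an assumption here, and the helper summarise_stability is hypothetical. The point is that a robust summary such as the median can rank configurations differently from the mean when a few elements are unstable.

```r
# Hypothetical per-element stability (e.g. EC Consistency) values
# for three configurations.
stability <- list(
  config_a = c(0.95, 0.90, 0.99, 0.40),
  config_b = c(0.80, 0.85, 0.82, 0.81),
  config_c = c(0.60, 0.99, 0.98, 0.97)
)

# Summarise each configuration with a single score and rank them.
summarise_stability <- function(stability, summary_function = stats::median) {
  scores <- vapply(stability, summary_function, numeric(1))
  sort(scores, decreasing = TRUE)
}

summarise_stability(stability)        # median: config_c, config_a, config_b
summarise_stability(stability, mean)  # mean:   config_c, config_b, config_a
```

The single unstable element of config_a (0.40) barely affects its median but pulls its mean below that of config_b.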
Creates the ClustAssess ShinyApp based on the output of the automatic ClustAssess pipeline. In addition to that, the expression matrix and the metadata dataframe are provided as input to the ShinyApp.
write_shiny_app( object, metadata = NULL, assay_name = NULL, clustassess_object, project_folder, compression_level = 6, summary_function = stats::median, shiny_app_title = "", organism_enrichment = "hsapiens", height_ratio = 0.6, qualpalr_colorspace = "pretty" )
## S3 method for class 'Seurat'
write_shiny_app( object, metadata = NULL, assay_name, clustassess_object, project_folder, compression_level = 6, summary_function = stats::median, shiny_app_title = "", organism_enrichment = "hsapiens", height_ratio = 0.6, qualpalr_colorspace = "pretty" )
## Default S3 method:
write_shiny_app( object, metadata = NULL, assay_name = NULL, clustassess_object, project_folder, compression_level = 6, summary_function = stats::median, shiny_app_title = "", organism_enrichment = "hsapiens", height_ratio = 0.6, qualpalr_colorspace = "pretty" )
object |
A Seurat object or an expression matrix |
metadata |
The metadata dataframe. This parameter will be ignored if the object is a Seurat object. |
assay_name |
The name of the assay to be used to extract the expression matrix from the Seurat object. This parameter will be ignored if the object is not a Seurat object. |
clustassess_object |
The output of the ClustAssess automatic pipeline |
project_folder |
The folder where the files will be written |
compression_level |
The compression level for the h5 files (see the level argument of rhdf5::h5createDataset for more details) |
summary_function |
The function used for summarizing the stability values; the default is stats::median. |
shiny_app_title |
The title of the shiny app |
organism_enrichment |
The organism used for the enrichment analysis; the default is "hsapiens". |
height_ratio |
The ratio of the height of the plot to the height of the browser; the default is 0.6. |
qualpalr_colorspace |
The colorspace used for generating the colors; the default is "pretty". |
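The usage lists a method for Seurat objects and a default method, which is why metadata is ignored for Seurat input and assay_name is ignored otherwise. The base-R sketch below illustrates that S3 dispatch with a hypothetical generic, describe_inputs, which is not part of ClustAssess; the object of class "Seurat" is a bare stand-in used only to trigger dispatch, not a real Seurat object.

```r
# Hypothetical generic that dispatches on the class of `object`.
describe_inputs <- function(object, metadata = NULL, assay_name = NULL, ...) {
  UseMethod("describe_inputs")
}

# Seurat method: metadata would come from the object itself,
# so the `metadata` argument is ignored and `assay_name` is required.
describe_inputs.Seurat <- function(object, metadata = NULL, assay_name, ...) {
  sprintf("Seurat object; expression from assay '%s'", assay_name)
}

# Default method: `object` is an expression matrix and `assay_name` is ignored.
describe_inputs.default <- function(object, metadata = NULL, assay_name = NULL, ...) {
  sprintf("matrix with %d rows and %d columns; %d metadata rows",
          nrow(object), ncol(object), NROW(metadata))
}

fake_seurat <- structure(list(), class = "Seurat")  # dispatch stand-in only
describe_inputs(fake_seurat, assay_name = "RNA")
describe_inputs(matrix(0, nrow = 5, ncol = 3), metadata = data.frame(x = 1:3))
```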