API
Import:
import spycone as spy
Import dataset
- class spycone.dataset(ts, species, keytype, reps1, timepts, gtf=None, gene_id=None, transcript_id=None, timeserieslist=None, symbs=None, discretization_steps=None)
Input dataset.
- Parameters
ts – matrix or dataframe of the time series dataset.
gene_id – list of gene IDs. The IDs must match those used in the biological network (the default network uses Entrez IDs). If gene_id is not given, Entrez gene IDs will be mapped.
transcript_id – list of transcript IDs, if the expression matrix is at transcript level.
keytype – type of ID in gene list (‘entrezgeneid’, ‘ensemblgeneid’). If only transcript ID is given, this serves as the keytype that will be mapped to gene_id.
species – species ID (NCBI taxonomy ID) used for annotation (human: 9606, mouse: 10090)
reps1 – number of replicates
timepts – number of time points
gtf (optional) – provide corresponding gtf file for mapping gene names
symbs (optional) – list of gene symbols or gene names (can be automatically mapped for human and mouse)
discretization_steps (optional) – discretize the expression matrix according to the number of steps given
- timeserieslist
automatically generated 3D-array for analysis
- remove_objects()
remove the objects at the given indices
Import and store biological network
- class spycone.BioNetwork(path='human', keytype='entrezid', **kwargs)
Storage of biological network.
- Parameters
path – directory path to the network file, or “human” (default) to use the default human BioGRID network
- connected_nodes
- total_node_degree_counts
- total_undirected_edgecount_nodedegree
- max_degree
- all_degree
- g :
generate the network from the file path if the input is a file, or return the input itself if it is already a network
- lst_g :
return list of node names
- adj :
return adjacency matrix
- removing_nodes :
remove the nodes at the given indices
Preprocessing
- class spycone.preprocess(DataSet, BioNetwork=None, remove_low_var=False, cutoff=0)
Preprocess data, remove objects without expression along all timepoints.
- Parameters
DataSet – DataSet object.
BioNetwork – BioNetwork object. If provided, objects that are not in the network will be removed.
remove_low_var (boolean, default=False) – if True, objects with variance 0 will be removed.
cutoff (default=0) – if given, objects with mean expression across all time points lower than the cutoff will be removed.
- Return type
None, changes made directly in the DataSet object
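The filtering rules described above can be illustrated without Spycone. The following is a minimal sketch, not the library's implementation: `filter_objects` and the toy matrix are hypothetical, and it assumes one row of replicate-averaged expression values per object.

```python
from statistics import mean, pvariance


def filter_objects(matrix, remove_low_var=False, cutoff=0):
    """Return the indices of the rows that pass the filters.

    matrix: list of rows, one row of expression values per object
    (replicates already averaged; an assumption of this sketch).
    """
    kept = []
    for idx, row in enumerate(matrix):
        if all(value == 0 for value in row):
            continue  # no expression at any time point
        if remove_low_var and pvariance(row) == 0:
            continue  # constant profile, variance 0
        if mean(row) < cutoff:
            continue  # mean expression below the cutoff
        kept.append(idx)
    return kept


expression = [
    [0.0, 0.0, 0.0],  # never expressed
    [5.0, 5.0, 5.0],  # constant profile
    [1.0, 4.0, 9.0],  # varying profile
    [0.1, 0.2, 0.3],  # low mean expression
]
kept_default = filter_objects(expression)
kept_strict = filter_objects(expression, remove_low_var=True, cutoff=1)
```

Unlike `spycone.preprocess`, which changes the DataSet object in place, this sketch returns the surviving row indices so the effect of each option is easy to inspect.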
Isoform-level function
- class spycone.iso_function(dataset)
Isoform level analysis
- Parameters
DataSet – DataSet object
Species – Species ID
- detect_isoform_switch :
- Parameters
combine ({'median', 'mean'}) – aggregation method for replicates. Default='median'
filtering (boolean, default=True) – if True, low expression genes will be filtered out.
filter_cutoff (default=2) – mean expression cutoff used for filtering.
corr_cutoff (default = 0.7) – minimum correlation of isoform pairs to be included in the output.
p_val_cutoff (default = 0.05) – significant p-value cutoff to be included in the output.
min_diff (default = 0.1) – minimum difference of relative abundance to be included in the output.
event_im_cutoff (default = 0.1) – minimum event importance to be included in the output.
adjustp (str {'fdr_bh' (default), 'holm_bonf', 'bonf'}) – method for multiple testing correction. 'bonf': Bonferroni method; 'holm_bonf': Holm-Bonferroni method; 'fdr_bh': Benjamini-Hochberg false discovery rate.
n_permutations – Number of permutations if permutation test is used.
- total_isoform_usage :
- Parameters
ids_result – the result dataframe of isoform switch detection
norm (boolean, default=True) – if True, the time series matrix is normalized to abundance relative to gene expression.
gene_level (boolean, default=True) – if True, total isoform usage is calculated for each gene; otherwise, individual isoform usage is calculated for each isoform
- Return type
Create an instance for isoform switch analysis.
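To make the `norm` option concrete, here is a small sketch of normalizing isoform time series to abundance relative to their gene's total expression. It is an illustration under assumptions, not Spycone's code: `relative_abundance`, the isoform names, and the choice to return 0.0 where a gene has no expression are all hypothetical.

```python
def relative_abundance(isoform_expression, isoform_to_gene):
    """Divide each isoform's expression by its gene's total, per time point.

    isoform_expression: dict mapping isoform ID -> list of values over time
    isoform_to_gene: dict mapping isoform ID -> gene ID
    """
    n_timepoints = len(next(iter(isoform_expression.values())))

    # Sum the expression of all isoforms of a gene at each time point.
    gene_totals = {}
    for isoform, series in isoform_expression.items():
        totals = gene_totals.setdefault(isoform_to_gene[isoform], [0.0] * n_timepoints)
        for t, value in enumerate(series):
            totals[t] += value

    # Relative abundance; 0.0 where the gene is not expressed (assumption).
    result = {}
    for isoform, series in isoform_expression.items():
        totals = gene_totals[isoform_to_gene[isoform]]
        result[isoform] = [
            value / total if total > 0 else 0.0
            for value, total in zip(series, totals)
        ]
    return result


toy_expression = {
    "A-1": [2.0, 6.0, 0.0],
    "A-2": [2.0, 2.0, 0.0],
    "B-1": [5.0, 5.0, 5.0],
}
toy_mapping = {"A-1": "geneA", "A-2": "geneA", "B-1": "geneB"}
toy_relative = relative_abundance(toy_expression, toy_mapping)
```

In this toy input, isoform "A-1" rises from half to three quarters of geneA's expression between the first two time points, which is the kind of change in relative abundance that the switch detection thresholds refer to.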
Perform clustering
- class spycone.clustering(DataSet, algorithm, input_type, n_clusters=10, composite=False, BioNetwork=None, metric='euclidean', prototypefunction='median', linkage='average', searchspace=20, seed=1234321, transform=None, **kwargs)
Clustering object
- Parameters
DataSet – DataSet object
BioNetwork (needed if composite is True) – BioNetwork object
input_type (str) – use “expression” to cluster expression data, or “isoformusage” to cluster total isoform usage
algorithm ({'kmeans', 'kmedoids', 'dbscan', 'hierarchical', 'optics'}) – clustering algorithms from sklearn
composite (boolean, default=False) – if True, the distance metric is combined with the inverse shortest path
metric ({'euclidean', 'correlation'}) – metrics from sklearn
linkage ({'average' (default), 'complete', 'ward'}) – linkage criterion, used only for 'hierarchical' clustering
prototypefunction ({default='median', 'mean'}) – aggregation function for cluster prototypes
searchspace (default=20) – range to search for optimal number of clusters
- _prototype
- Type
dictionary of prototypes for each cluster (keys)
- _lables
The cluster label of each object
- Type
array with length of object
- genelist_clusters
Key-value pairs of the clusters with Entrez IDs (gene_list ID)
- Type
dictionary of clusters
- index_clusters
Key-value pairs of the clusters with indices
- Type
dictionary of clusters
- symbs_clusters
Key-value pairs of the clusters with gene symbols
- Type
dictionary of clusters
- _final_n_cluster
number of clusters
- _silhouette_index
silhouette index of this clustering
Examples
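As a library-independent illustration of the `prototypefunction` parameter, the sketch below computes one prototype per cluster by aggregating member profiles at each time point. `cluster_prototypes` and the toy data are hypothetical, and the sketch only assumes that a prototype is the per-time-point median or mean of a cluster's members; it is not taken from Spycone's source.

```python
from statistics import mean, median


def cluster_prototypes(profiles, labels, prototypefunction="median"):
    """Aggregate the profiles of each cluster into a single prototype.

    profiles: list of time series (one list of values per object)
    labels: cluster label of each object, same order as profiles
    """
    aggregate = {"median": median, "mean": mean}[prototypefunction]

    # Group the profiles by cluster label.
    members = {}
    for profile, label in zip(profiles, labels):
        members.setdefault(label, []).append(profile)

    # zip(*rows) iterates over time points across all members of a cluster.
    return {
        label: [aggregate(column) for column in zip(*rows)]
        for label, rows in members.items()
    }


toy_profiles = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 9.0, 7.0],
    [10.0, 10.0, 10.0],
]
toy_labels = [0, 0, 0, 1]
median_prototypes = cluster_prototypes(toy_profiles, toy_labels)
mean_prototypes = cluster_prototypes(toy_profiles, toy_labels, "mean")
```

The median is less sensitive than the mean to a single outlying member, which is visible at the second time point of cluster 0 in this toy input.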
GO terms enrichment analysis
- spycone.list_gsea(genelist, species, gene_sets=None, p_adjust_method='fdr_bh', cutoff=0.05, method='gsea', term_source='all')
Perform gene set enrichment on a list of genes
Parameters:
genelist: input list of genes
species: input taxonomy ID if method is “nease”, species name for “gsea” (e.g. hsapiens, mmusculus…)
gene_sets: input a valid database name for “nease”, ignore for “gsea”
p_adjust_method: input one of the following: “fdr_bh”, “bonf”, “holm_bonf”
cutoff: for adjusted p-value
- spycone.clusters_gsea(DataSet, species, gene_sets=None, is_results=None, cutoff=0.05, p_adjust_method='fdr_bh', method='nease', term_source='all')
Perform gene set enrichment on clusters (cluster object)
Parameters:
DataSet: Spycone dataset object
species: input taxonomy ID if method is “nease”, species name for “gsea” (e.g. hsapiens, mmusculus…)
p_adjust_method: input one of the following: “fdr_bh”, “bonf”, “holm_bonf”
cutoff: for adjusted p-value
gene_sets: needed when method is “nease”; input one of the following databases: ‘PharmGKB’, ‘HumanCyc’, ‘Wikipathways’, ‘Reactome’, ‘KEGG’, ‘SMPDB’, ‘Signalink’, ‘NetPath’, ‘EHMN’, ‘INOH’, ‘BioCarta’, ‘PID’
method: “nease” or “gsea”
Return:
It returns two objects:
A dictionary containing the enrichment dataframes for each cluster.
The nease object if method is “nease”, or None if method is “gsea”.
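The three `p_adjust_method` options correspond to standard multiple-testing corrections. A plain-Python sketch of the textbook definitions follows. `adjust_pvalues` is a hypothetical helper written for this illustration, and Spycone's own implementation may differ in details such as tie handling.

```python
def adjust_pvalues(pvalues, method="fdr_bh"):
    """Return adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * n

    if method == "bonf":
        # Bonferroni: multiply every p-value by the number of tests.
        return [min(1.0, p * n) for p in pvalues]

    if method == "holm_bonf":
        # Holm-Bonferroni: step-down, enforce non-decreasing adjusted values.
        running_max = 0.0
        for rank, i in enumerate(order):
            running_max = max(running_max, min(1.0, (n - rank) * pvalues[i]))
            adjusted[i] = running_max
        return adjusted

    if method == "fdr_bh":
        # Benjamini-Hochberg: step-up, enforce non-increasing values from the top.
        running_min = 1.0
        for rank in range(n - 1, -1, -1):
            i = order[rank]
            running_min = min(running_min, pvalues[i] * n / (rank + 1))
            adjusted[i] = running_min
        return adjusted

    raise ValueError(f"unknown method: {method}")


raw = [0.01, 0.04, 0.03, 0.005]
bonf = adjust_pvalues(raw, "bonf")
holm = adjust_pvalues(raw, "holm_bonf")
bh = adjust_pvalues(raw, "fdr_bh")
```

With a cutoff of 0.05, Bonferroni keeps two of the four toy p-values, Holm-Bonferroni keeps two, and Benjamini-Hochberg keeps all four, which reflects the usual ordering from most to least conservative.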
Run DOMINO
- spycone.run_domino(target, name=None, is_results=None, scores=None, network_file='/home/docs/checkouts/readthedocs.org/user_builds/spycone/checkouts/latest/spycone/data/network/mouse_biogrid_entrez.tab', output_file_path='./slices/slices.txt', run_cluster=None, slice_threshold=0.3, module_threshold=0.05, prize_factor=0, n_steps=20)
- Parameters
target – clustering object from spycone or gene list in entrez ID
is_results (DataFrame) – Data Frame of isoform switch detection result
scores (None) – activity scores of the genes (e.g. p-values from differential expression analysis)
run_cluster – Specify the cluster name if you only want to run a specific cluster
network_file – network file. Default: “data/network/network_human_PPIDDI.tab”
output_file_path – output slices file for DOMINO.
slice_threshold (float) –
module_threshold (float) –
prize_factor (float) –
n_steps (int) –
- spycone.run_domain_domino(target, is_results, name=None, scores=None, network_file='/home/docs/checkouts/readthedocs.org/user_builds/spycone/checkouts/latest/spycone/data/network/network_human_PPIDDI.tab', output_file_path='slices.txt', run_cluster=None, slice_threshold=0.3, module_threshold=0.05, prize_factor=0, n_steps=20)
- Parameters
target – clustering object from spycone or gene list in entrez ID
is_results (DataFrame) – Data Frame of isoform switch detection result
scores (None) – activity scores of the genes (e.g. p-values from differential expression analysis)
run_cluster – Specify the cluster name if you only want to run a specific cluster
network_file – network file. Default: “data/network/network_human_PPIDDI.tab”
output_file_path – output slices file for DOMINO.
slice_threshold (float) –
module_threshold (float) –
prize_factor (float) –
n_steps (int) –
Visualization
- spycone.vis_all_clusters(clusterObj, x_label='time points', y_label='expression', Titles='Cluster {col_name}', xtickslabels=None, **kwargs)
Visualize all the clusters with cluster prototype
- Parameters
clusterObj – input clustering object with results
x_label – x-axis label of the plot
y_label – y-axis label of the plot
Titles ("Cluster {col_name}") – titles for each cluster
- spycone.switch_plot(gene, DataSet, ascov, xaxis_label=None, all_isoforms=False, relative_abundance=False)
Switching plot for isoforms, or expression plot for non-switched genes. If the input gene has no isoform switch, its expression plot is drawn instead.
- Parameters
gene (str) – Input gene ID / symbs you would like to plot
DataSet (DataSet obj) –
ascov (DataFrame) – the result dataframe of your isoform switch detection
xaxis_label (list) – x axis label for the plots
- spycone.gsea_plot(gsea_result, cluster, modules=None, nterms=None)
Visualizing the functional enrichment
- Parameters
gsea_result (dict) – the results of gsea
cluster (str) – the cluster number you would like to visualize
nterms (int) – (optional) if you would like to visualize only a subset of terms, e.g. the top 10 terms
- spycone.vis_modules(mods, dataset, cluster, size=5, outputpng=None)
Visualize all modules from one cluster in one figure.
Parameters:
mods: modules result
dataset: spycone dataset object
cluster: cluster number to visualize
size: minimum number of nodes in one module
outputpng: file path to save the figure (png)
- spycone.vis_better_modules(dataset, mod, cluster, dir, related_genes={}, module=None)
Visualize modules with pydot in SVG format.
- Parameters
dataset – dataset obj
mod – DOMINO results
cluster – Cluster number to visualize
dir – Local directory to save the images
related_genes (optional) – Set of genes to change the color of the nodes (color of the node outer border)
Splicing factor analysis
- spycone.SF_coexpression(dataset, padj_method='bonf', corr_cutoff=0.7, padj_cutoff=0.05, method='pearson')
Return the co-expression between splicing factors and isoforms
- Parameters
dataset – input dataset object
padj_method – multiple testing method. Default=Bonferroni
corr_cutoff – correlation coefficient cutoff. Default=0.7
padj_cutoff – adjusted p-value cutoff. Default=0.05
method – method to calculate the correlation value. Default='pearson'
- Return type
Create an instance for co-expression analysis of splicing factors and transcript abundance.
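The role of `corr_cutoff` can be sketched with a plain Pearson correlation between splicing factor and isoform time series. This is an illustration only: `pearson`, `coexpressed_pairs`, and the toy profiles are hypothetical, the sketch assumes the cutoff is applied to the absolute correlation, and it omits the p-value adjustment controlled by `padj_method` and `padj_cutoff`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant series: treat as uncorrelated in this sketch
    return cov / (sd_x * sd_y)


def coexpressed_pairs(sf_profiles, isoform_profiles, corr_cutoff=0.7):
    """Return (SF, isoform, r) for pairs with |r| >= corr_cutoff."""
    pairs = []
    for sf, sf_series in sf_profiles.items():
        for isoform, iso_series in isoform_profiles.items():
            r = pearson(sf_series, iso_series)
            if abs(r) >= corr_cutoff:  # absolute value is an assumption
                pairs.append((sf, isoform, r))
    return pairs


toy_sf = {"SF1": [1.0, 2.0, 3.0, 4.0]}
toy_isoforms = {
    "iso_a": [2.0, 4.0, 6.0, 8.0],  # rises with SF1
    "iso_b": [4.0, 3.0, 2.0, 1.0],  # falls as SF1 rises
    "iso_c": [1.0, 3.0, 2.0, 2.0],  # weakly related
}
toy_pairs = coexpressed_pairs(toy_sf, toy_isoforms)
```

Keeping negatively correlated pairs is a design choice of this sketch; whether Spycone reports them is not stated in the documentation above.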
- spycone.SF_motifsearch(list_SF, list_genes, dataset, gtf, gc_ratio=0.6, flanking=400)
Return the PSSM scores for the binding of the target SFs to exons.
- Parameters
list_SF (list) – List of splicing factors / RBPs. E.g. the SFs that are co-expressed with the isoform abundance.
list_genes (list) – List of genes you want to check for SF binding sites. E.g. the cluster of genes with which the input SF is co-expressed.
gtf_df (str or dataframe) – gtf file path / dataframe
gc_ratio – GC content that makes up the background for PSSM score calculation. Default=0.6.
flanking – flanking region size. Default=400.
- Return type
Create an instance for SF motif enrichment analysis.
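To show how `gc_ratio` enters a PSSM score, the sketch below builds a log-odds matrix against a GC-dependent background and scans a sequence for the best-scoring window. It is a generic illustration, not Spycone's implementation: all function names, the toy motif, and the toy sequence are hypothetical, and it assumes uppercase A/C/G/T input and motif probabilities that are already non-zero (for example after adding pseudocounts).

```python
from math import log2


def background_from_gc(gc_ratio=0.6):
    """Split the GC fraction evenly over G and C, the rest over A and T."""
    at = (1 - gc_ratio) / 2
    gc = gc_ratio / 2
    return {"A": at, "C": gc, "G": gc, "T": at}


def build_pssm(position_probabilities, background):
    """Convert per-position base probabilities into log-odds scores."""
    return [
        {base: log2(column[base] / background[base]) for base in "ACGT"}
        for column in position_probabilities
    ]


def best_hit(sequence, pssm):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pssm)
    best_start, best_score = None, float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Toy motif that prefers "TGC" (hypothetical values).
toy_motif = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
toy_pssm = build_pssm(toy_motif, background_from_gc(0.6))
hit_position, hit_score = best_hit("AATGCA", toy_pssm)
```

A higher `gc_ratio` makes G and C more likely under the background, so GC-rich matches earn lower log-odds scores than they would against a uniform background.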