API

Import:

import spycone as spy

Import dataset

class spycone.dataset(ts, species, keytype, reps1, timepts, gtf=None, gene_id=None, transcript_id=None, timeserieslist=None, symbs=None, discretization_steps=None)

Input dataset.

Parameters

ts – matrix or dataframe of the time series dataset.
gene_id – list of gene IDs (IDs matching the biological network; the default network uses Entrez IDs). If gene_id is not given, Entrez gene IDs will be mapped.
transcript_id – list of transcript IDs, if the expression matrix is at the transcript level.
keytype – type of ID in the gene list ('entrezgeneid', 'ensemblgeneid'). If only transcript IDs are given, this is the key type that will be mapped to gene_id.
species – species taxonomy ID for annotation (human: 9606, mouse: 10090)
reps1 – number of replicates
timepts – number of time points
gtf (optional) – corresponding GTF file for mapping gene names
symbs (optional) – list of gene symbols or gene names (can be mapped automatically for human and mouse)
discretization_steps (optional) – discretize the expression matrix into the given number of steps
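The description of discretization_steps leaves the binning scheme open. As a rough illustration only, the sketch below assumes equal-width binning of each row (object) between its own minimum and maximum; the helper discretize is hypothetical and not part of spycone.

```python
def discretize(matrix, steps):
    """Map each row's values onto `steps` equal-width bins numbered 0 .. steps - 1.

    Assumption: bins are computed per row (per gene/transcript) between that
    row's minimum and maximum; spycone's actual scheme may differ.
    """
    result = []
    for row in matrix:
        lo, hi = min(row), max(row)
        if hi == lo:
            # Constant profile: every value falls into the first bin.
            result.append([0] * len(row))
            continue
        width = (hi - lo) / steps
        # min(..., steps - 1) keeps the row maximum inside the last bin.
        result.append([min(int((v - lo) / width), steps - 1) for v in row])
    return result


expr = [
    [0.0, 2.5, 5.0, 10.0],  # rising profile
    [3.0, 3.0, 3.0, 3.0],   # flat profile
]
print(discretize(expr, steps=4))  # [[0, 1, 2, 3], [0, 0, 0, 0]]
```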

timeserieslist: automatically generated 3D-array for analysis

__copy__()[source]: copy function for the dataset object

remove_objects(): remove the given index of the object

Import and store biological network

class spycone.BioNetwork(path='human', keytype='entrezid', **kwargs)

Storage of biological network.

Parameters: path – path to the network file, or “human” (default) to use the human BioGRID network

connected_nodes

total_node_degree_counts

total_undirected_edgecount_nodedegree

max_degree

all_degree

g :: generate network from file path if type is file or return itself if type is network

lst_g :: return list of node names

adj :: return adjacency matrix

removing_nodes :: remove the nodes at the given indices
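To make the network attributes above concrete, here is a plain-Python sketch that builds a node list, an adjacency matrix and node degrees from an undirected edge list. The helper build_network and the toy edges are hypothetical; the sketch only mirrors what lst_g, adj, all_degree and max_degree describe, not how BioNetwork stores them internally.

```python
def build_network(edges):
    """Return (nodes, adjacency matrix, degree dict) for an undirected edge list."""
    nodes = sorted({node for edge in edges for node in edge})  # cf. lst_g
    index = {node: i for i, node in enumerate(nodes)}
    adj = [[0] * len(nodes) for _ in nodes]  # cf. adj
    for a, b in edges:
        if a == b:
            continue  # self-loops are ignored in this sketch
        adj[index[a]][index[b]] = 1
        adj[index[b]][index[a]] = 1  # undirected: keep the matrix symmetric
    degree = {node: sum(adj[index[node]]) for node in nodes}  # cf. all_degree
    return nodes, adj, degree


edges = [("A", "B"), ("A", "C"), ("C", "D")]
nodes, adj, degree = build_network(edges)
print(nodes)                 # ['A', 'B', 'C', 'D']
print(max(degree.values()))  # 2, cf. max_degree
```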

Preprocessing

class spycone.preprocess(DataSet, BioNetwork=None, remove_low_var=False, cutoff=0)

Preprocess the data: remove objects with zero expression at all time points.

Parameters

DataSet – DataSet object
BioNetwork – BioNetwork object. If provided, objects that are not in the network will be removed.
remove_low_var (boolean, default=False) – if True, objects with variance 0 will be removed.
cutoff (default=0) – if given, objects with mean expression across all time points lower than the cutoff will be removed.

Return type

None; changes are made directly to the DataSet object
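The filtering rules of preprocess can be sketched on plain lists as follows. This is a hypothetical helper (preprocess_sketch) that applies the documented rules in one pass and returns new lists, whereas spycone modifies the DataSet object in place; the order of the checks is an assumption.

```python
from statistics import mean, pvariance


def preprocess_sketch(ids, rows, network_nodes=None, remove_low_var=False, cutoff=0):
    """Return the (ids, rows) that survive the documented filtering rules.

    ids are object IDs, rows their expression values across time points.
    """
    kept_ids, kept_rows = [], []
    for obj_id, row in zip(ids, rows):
        if network_nodes is not None and obj_id not in network_nodes:
            continue  # object is not a node of the biological network
        if all(v == 0 for v in row):
            continue  # no expression at any time point
        if remove_low_var and pvariance(row) == 0:
            continue  # flat profile (variance 0)
        if mean(row) < cutoff:
            continue  # mean expression below the cutoff
        kept_ids.append(obj_id)
        kept_rows.append(row)
    return kept_ids, kept_rows


ids = ["g1", "g2", "g3", "g4"]
rows = [[0, 0, 0], [5, 5, 5], [1, 2, 3], [0.1, 0.2, 0.3]]
print(preprocess_sketch(ids, rows, remove_low_var=True, cutoff=1)[0])  # ['g3']
```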

Isoform-level function

class spycone.iso_function(dataset)

Isoform-level analysis

Parameters

DataSet – DataSet object
Species – Species ID

detect_isoform_switch :

Parameters

combine ({'median', 'mean'}) – aggregation method for replicates. Default='median'
filtering (boolean, default=True) – if True, lowly expressed genes will be filtered out.
filter_cutoff (default=2) – mean-expression cutoff used for filtering.
corr_cutoff (default=0.7) – minimum correlation of isoform pairs to be included in the output.
p_val_cutoff (default=0.05) – p-value cutoff for significance to be included in the output.
min_diff (default=0.1) – minimum difference in relative abundance to be included in the output.
event_im_cutoff (default=0.1) – minimum event importance to be included in the output.
adjustp (str {'fdr_bh' (default), 'holm_bonf', 'bonf'}) – method for multiple testing correction. bonf: Bonferroni method; holm_bonf: Holm-Bonferroni method; fdr_bh: Benjamini-Hochberg false discovery rate
n_permutations – Number of permutations if permutation test is used.
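The three adjustp options are standard multiple-testing corrections. The plain-Python sketch below shows what each one computes; the function adjust_pvalues is hypothetical, and spycone may delegate this step to a library routine instead.

```python
def adjust_pvalues(pvals, method="fdr_bh"):
    """Adjust p-values with 'bonf', 'holm_bonf' or 'fdr_bh' (names as in adjustp)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * n
    if method == "bonf":
        # Bonferroni: multiply every p-value by the number of tests.
        return [min(1.0, p * n) for p in pvals]
    if method == "holm_bonf":
        # Holm step-down: the k-th smallest p (k from 0) is multiplied by n - k,
        # and adjusted values are forced to be non-decreasing.
        running = 0.0
        for k, i in enumerate(order):
            running = max(running, min(1.0, (n - k) * pvals[i]))
            adjusted[i] = running
        return adjusted
    if method == "fdr_bh":
        # Benjamini-Hochberg step-up: p * n / rank, then a running minimum
        # taken from the largest p-value downwards.
        running = 1.0
        for k in range(n - 1, -1, -1):
            i = order[k]
            running = min(running, pvals[i] * n / (k + 1))
            adjusted[i] = running
        return adjusted
    raise ValueError(f"unknown method: {method}")


pvals = [0.01, 0.04, 0.03, 0.005]
for method in ("bonf", "holm_bonf", "fdr_bh"):
    print(method, [round(p, 6) for p in adjust_pvalues(pvals, method)])
# bonf      [0.04, 0.16, 0.12, 0.02]
# holm_bonf [0.03, 0.06, 0.06, 0.02]
# fdr_bh    [0.02, 0.04, 0.04, 0.02]
```

Holm controls the family-wise error rate like Bonferroni but is never more conservative; Benjamini-Hochberg controls the false discovery rate and is usually the least strict of the three.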

total_isoform_usage :

Parameters

ids_result – the result dataframe of isoform switch detection
norm (boolean, default=True) – if True, the time series matrix is normalized to relative abundance (isoform expression relative to gene expression).
gene_level (boolean, default=True) – if True, it calculates total isoform usage for each gene, otherwise individual isoform usage for each isoform
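With norm=True, expression is converted to relative abundance, i.e. each isoform's share of its gene's total expression at every time point. The sketch below computes that and then flags time points where two isoforms' relative-abundance curves cross, which is the basic idea of a switch point. It is a simplified, hypothetical illustration: it skips the replicate aggregation and the correlation, p-value, min_diff and event-importance filters that detect_isoform_switch applies.

```python
def relative_abundance(isoform_expr):
    """Map each isoform to its share of the gene's total expression per time point.

    isoform_expr: dict isoform -> expression values over time (same length).
    Time points where the gene has zero total expression get 0.0.
    """
    n_time = len(next(iter(isoform_expr.values())))
    totals = [sum(expr[t] for expr in isoform_expr.values()) for t in range(n_time)]
    return {
        iso: [expr[t] / totals[t] if totals[t] > 0 else 0.0 for t in range(n_time)]
        for iso, expr in isoform_expr.items()
    }


def switch_points(ra_a, ra_b):
    """Indices t where the dominant isoform of the pair changes between t and t + 1."""
    points = []
    for t in range(len(ra_a) - 1):
        before = ra_a[t] - ra_b[t]
        after = ra_a[t + 1] - ra_b[t + 1]
        if before * after < 0:  # sign change: the two curves cross
            points.append(t)
    return points


gene = {"tx1": [8, 6, 2, 1], "tx2": [2, 4, 8, 9]}
ra = relative_abundance(gene)
print(switch_points(ra["tx1"], ra["tx2"]))  # [1]
```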

Return type: Create an instance for isoform switch analysis.

Perform clustering

class spycone.clustering(DataSet, algorithm, input_type, n_clusters=10, composite=False, BioNetwork=None, metric='euclidean', prototypefunction='median', linkage='average', searchspace=20, seed=1234321, transform=None, **kwargs)

Clustering object

Parameters

DataSet – DataSet object
BioNetwork – BioNetwork object (needed if composite is True)
input_type (str) – “expression” to cluster expression data, or “isoformusage” to cluster total isoform usage
algorithm ({'kmeans', 'kmedoids', 'dbscan', 'hierarchical', 'optics'}) – clustering algorithms from sklearn
composite (boolean, default=False) – if True, the distance metric is combined with the inverse shortest path between nodes in the BioNetwork
metric ({'euclidean', 'correlation'}) – metrics from sklearn
linkage ({'average' (default), 'complete', 'ward'}) – linkage criterion; only used for 'hierarchical' clustering
prototypefunction ({'median' (default), 'mean'}) – aggregation function for cluster prototypes
searchspace (default=20) – range to search for the optimal number of clusters

_prototype

Type: dictionary mapping each cluster (key) to its prototype

_lables

The cluster label of each object

Type: array with one label per object

genelist_clusters

Dictionary mapping each cluster to the Entrez IDs (gene_list IDs) of its members

Type: dictionary of clusters

index_clusters

Dictionary mapping each cluster to the indices of its members

Type: dictionary of clusters

symbs_clusters

Dictionary mapping each cluster to the gene symbols of its members

Type: dictionary of clusters

_final_n_cluster: number of clusters

_silhouette_index: silhouette index of this clustering


GO term enrichment analysis

spycone.list_gsea(genelist, species, gene_sets=None, p_adjust_method='fdr_bh', cutoff=0.05, method='gsea', term_source='all'): Perform gene set enrichment on a list of genes

Parameters:

genelist: list of genes to test

species: input taxonomy ID if method is “nease”, species name for “gsea” (e.g. hsapiens, mmusculus…)

gene_sets: a valid database name for “nease”; ignored for “gsea”

p_adjust_method: one of the following: “fdr_bh”, “bonf”, “holm_bonf”

cutoff: cutoff for the adjusted p-value
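For method="gsea", the species names (hsapiens, mmusculus) suggest a g:Profiler-style over-representation analysis, but that is an assumption. The sketch below shows such a test with a one-sided hypergeometric p-value per term; the helpers hypergeom_sf and enrich, the toy gene sets and the use of Bonferroni (instead of the default fdr_bh) are illustrative only.

```python
from math import comb


def hypergeom_sf(n_universe, n_term, n_query, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(population n_universe,
    n_term genes annotated to the term, n_query genes drawn)."""
    upper = min(n_term, n_query)
    hits = sum(comb(n_term, k) * comb(n_universe - n_term, n_query - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_universe, n_query)


def enrich(genelist, gene_sets, universe, cutoff=0.05):
    """Return {term: adjusted p-value} for terms over-represented in genelist.

    Bonferroni over all tested terms is used here for brevity.
    """
    query = set(genelist) & universe
    n_tests = len(gene_sets)
    results = {}
    for term, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        if overlap == 0:
            continue  # nothing to enrich; still counted in n_tests
        p = hypergeom_sf(len(universe), len(members), len(query), overlap)
        p_adj = min(1.0, p * n_tests)
        if p_adj <= cutoff:
            results[term] = p_adj
    return results


universe = {f"g{i}" for i in range(20)}
gene_sets = {"termA": ["g0", "g1", "g2", "g3", "g4"], "termB": ["g10", "g11"]}
print(enrich(["g0", "g1", "g2", "g3"], gene_sets, universe))  # only termA passes
```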

spycone.clusters_gsea(DataSet, species, gene_sets=None, is_results=None, cutoff=0.05, p_adjust_method='fdr_bh', method='nease', term_source='all')

Perform gene set enrichment on clusters (cluster object)

Parameters:

DataSet: Spycone dataset object

species: input taxonomy ID if method is “nease”, species name for “gsea” (e.g. hsapiens, mmusculus…)

p_adjust_method: input one of the following: “fdr_bh”, “bonf”, “holm_bonf”

cutoff: cutoff for the adjusted p-value

gene_sets: needed when method is “nease”; one of the databases: 'PharmGKB', 'HumanCyc', 'Wikipathways', 'Reactome', 'KEGG', 'SMPDB', 'Signalink', 'NetPath', 'EHMN', 'INOH', 'BioCarta', 'PID'

method: “nease” or “gsea”

Return:

It returns two objects:

Dictionary containing the enrichment dataframes for each cluster.
If method is “nease”, the second object is the NEASE object; for “gsea”, it is None.

spycone.modules_gsea(X, clu, species, type='PPI', p_adjust_method='fdr_bh', cutoff=0.05, method='nease', term_source='all'): Perform gene set enrichment on network modules found by DOMINO

Run DOMINO

spycone.run_domino(target, name=None, is_results=None, scores=None, network_file='/home/docs/checkouts/readthedocs.org/user_builds/spycone/checkouts/latest/spycone/data/network/mouse_biogrid_entrez.tab', output_file_path='./slices/slices.txt', run_cluster=None, slice_threshold=0.3, module_threshold=0.05, prize_factor=0, n_steps=20)

Parameters

target – clustering object from spycone, or a gene list of Entrez IDs
is_results (DataFrame) – DataFrame of isoform switch detection results
scores (optional, default=None) – activity scores of the genes (e.g. p-values from differential expression analysis)
run_cluster – name of the cluster to run, if you only want to run one specific cluster
network_file (str) – path to the network file. Default: data/network/mouse_biogrid_entrez.tab in the spycone package directory
output_file_path (str) – path of the output slices file for DOMINO. Default: './slices/slices.txt'
slice_threshold (float, default=0.3) –
module_threshold (float, default=0.05) –
prize_factor (float, default=0) –
n_steps (int, default=20) –

spycone.run_domain_domino(target, is_results, name=None, scores=None, network_file='/home/docs/checkouts/readthedocs.org/user_builds/spycone/checkouts/latest/spycone/data/network/network_human_PPIDDI.tab', output_file_path='slices.txt', run_cluster=None, slice_threshold=0.3, module_threshold=0.05, prize_factor=0, n_steps=20)

Parameters

target – clustering object from spycone, or a gene list of Entrez IDs
is_results (DataFrame) – DataFrame of isoform switch detection results
scores (optional, default=None) – activity scores of the genes (e.g. p-values from differential expression analysis)
run_cluster – name of the cluster to run, if you only want to run one specific cluster
network_file (str) – path to the network file. Default: data/network/network_human_PPIDDI.tab in the spycone package directory
output_file_path (str) – path of the output slices file for DOMINO. Default: 'slices.txt'
slice_threshold (float, default=0.3) –
module_threshold (float, default=0.05) –
prize_factor (float, default=0) –
n_steps (int, default=20) –

Visualization

spycone.vis_all_clusters(clusterObj, x_label='time points', y_label='expression', Titles='Cluster {col_name}', xtickslabels=None, **kwargs)

Visualize all clusters with their cluster prototypes

Parameters

clusterObj – input clustering object with results
x_label – x-axis label of the plot
y_label – y-axis label of the plot
Titles ("Cluster {col_name}") – titles for each cluster

spycone.switch_plot(gene, DataSet, ascov, xaxis_label=None, all_isoforms=False, relative_abundance=False)

Switching plot for isoforms, or expression plot for non-switched genes: if the input gene is not isoform-switched, its expression plot is drawn instead.

Parameters

gene (str) – gene ID or symbol to plot
DataSet (DataSet object) –
ascov (DataFrame) – the result dataframe of your isoform switch detection
xaxis_label (list) – x axis label for the plots

spycone.gsea_plot(gsea_result, cluster, modules=None, nterms=None)

Visualize the functional enrichment results

Parameters

gsea_result (dict) – the results of gsea
cluster (str) – the cluster number you would like to visualize
nterms (int) – (optional) number of terms to show if you only want a subset, e.g. 10 for the top 10 terms

spycone.vis_modules(mods, dataset, cluster, size=5, outputpng=None): Visualize all modules from one cluster in one figure.

Parameters:

mods: modules result

dataset: spycone dataset object

cluster: cluster number to visualize

size: minimum number of nodes in one module

outputpng: file path to save the figure (png)

spycone.vis_better_modules(dataset, mod, cluster, dir, related_genes={}, module=None)

Visualize modules with pydot in SVG format.

Parameters

dataset – dataset object
mod – DOMINO results
cluster – cluster number to visualize
dir – local directory to save the images
related_genes (optional) – set of genes whose nodes get a different color (outer border of the node)

Splicing factor analysis

spycone.SF_coexpression(dataset, padj_method='bonf', corr_cutoff=0.7, padj_cutoff=0.05, method='pearson')

Compute the co-expression between splicing factors and isoforms

Parameters

dataset – input dataset object
padj_method – multiple testing method. Default='bonf' (Bonferroni)
corr_cutoff – correlation coefficient cutoff. Default=0.7
padj_cutoff – adjusted p-value cutoff. Default=0.05
method – method used to calculate the correlation. Default='pearson'

Return type

Create an instance for co-expression analysis of splicing factors and transcript abundance.
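A rough sketch of the correlation step in SF_coexpression: Pearson correlation between each splicing factor's expression profile and each isoform's abundance profile, keeping pairs whose absolute correlation reaches corr_cutoff. Treating the cutoff as absolute and leaving out the p-value and padj_cutoff filtering are simplifications; the helpers pearson and coexpressed_pairs are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(sf_expr, iso_abundance, corr_cutoff=0.7):
    """List (SF, isoform, r) for every pair with |r| >= corr_cutoff."""
    pairs = []
    for sf, x in sf_expr.items():
        for iso, y in iso_abundance.items():
            r = pearson(x, y)
            if abs(r) >= corr_cutoff:
                pairs.append((sf, iso, round(r, 3)))
    return pairs


sf_expr = {"SF1": [1, 2, 3, 4, 5]}
iso_abundance = {"txA": [2, 4, 6, 8, 10], "txB": [5, 3, 4, 1, 2], "txC": [1, 3, 2, 3, 1]}
print(coexpressed_pairs(sf_expr, iso_abundance))
# [('SF1', 'txA', 1.0), ('SF1', 'txB', -0.8)]
```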

spycone.SF_motifsearch(list_SF, list_genes, dataset, gtf, gc_ratio=0.6, flanking=400)

Return the PSSM scores for binding of the target SFs to exons.

Parameters

list_SF (list) – list of splicing factors / RBPs, e.g. the SFs that are co-expressed with the isoform abundance.
list_genes (list) – list of genes to check for SF binding sites, e.g. the cluster of genes that the input SF is co-expressed with.
gtf (str or dataframe) – GTF file path or dataframe
gc_ratio – GC content that makes up the background for the PSSM score calculation. Default=0.6
flanking – size of the flanking region. Default=400

Return type

Create an instance for SF motif enrichment analysis.
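The PSSM scoring behind SF_motifsearch can be sketched as follows: a background in which G and C together make up gc_ratio (A and T share the rest), log2-odds scores per motif position, and a scan over an exon extended by flanking bases on each side. The made-up one-hot motif, the pseudocount and the helper names are assumptions for illustration; spycone's motif source and scoring details may differ.

```python
from math import log2


def gc_background(gc_ratio=0.6):
    """Background base frequencies: G and C share gc_ratio, A and T share the rest."""
    return {"G": gc_ratio / 2, "C": gc_ratio / 2,
            "A": (1 - gc_ratio) / 2, "T": (1 - gc_ratio) / 2}


def pssm_from_pfm(pfm, background, pseudocount=0.01):
    """Turn a position frequency matrix (one dict per position, base -> probability,
    each column summing to 1) into log2-odds scores against the background."""
    total = 1 + 4 * pseudocount  # renormalise after adding the pseudocount
    return [{b: log2((column.get(b, 0.0) + pseudocount) / total / background[b])
             for b in "ACGT"}
            for column in pfm]


def flanked_region(sequence, start, end, flanking=400):
    """Exon sequence[start:end] plus `flanking` bases on each side (0-based, end-exclusive)."""
    return sequence[max(0, start - flanking):end + flanking]


def best_hit(sequence, pssm):
    """Return (score, start) of the best window; windows with non-ACGT bases are skipped."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in "ACGT" for base in window):
            continue
        score = sum(pssm[i][base] for i, base in enumerate(window))
        best = max(best, (score, start))
    return best


motif = [{base: 1.0} for base in "TGCATG"]  # made-up one-hot motif
pssm = pssm_from_pfm(motif, gc_background(0.6))
region = flanked_region("CCCCCAAATGCATGAAACCCCC", start=5, end=17, flanking=3)
print(best_hit(region, pssm))  # best window starts at index 6 of the region
```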