API

Import:

import spycone as spy

Import dataset

class spycone.dataset(ts, species, keytype, reps1, timepts, gtf=None, gene_id=None, transcript_id=None, timeserieslist=None, symbs=None, discretization_steps=None)

Input dataset.

Parameters
  • ts – matrix or dataframe of the time series dataset.

  • gene_id – list of gene IDs. These must match the IDs used in the biological network (the default network uses Entrez IDs). If gene_id is not given, Entrez gene IDs will be mapped.

  • transcript_id – list of transcript IDs, if the expression matrix is at transcript level.

  • keytype – type of ID in the gene list (‘entrezgeneid’, ‘ensemblgeneid’). If only transcript IDs are given, this is the keytype that will be mapped to gene_id.

  • species – species ID used for annotation (human: 9606, mouse: 10090)

  • reps1 – number of replicates

  • timepts – number of time points

  • gtf (optional) – corresponding GTF file for mapping gene names

  • symbs (optional) – list of gene symbols or gene names (can be mapped automatically for human and mouse)

  • discretization_steps (optional) – discretize the expression matrix into the given number of steps

timeserieslist

automatically generated 3D-array for analysis
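
The layout of timeserieslist is not spelled out here. As a rough sketch only, assuming the input matrix has one row per object and reps1 × timepts columns ordered replicate by replicate (an assumption, not something this page confirms), a 3D structure indexed as [replicate][object][time point] could be built as follows. The helper name to_timeseries_3d is hypothetical and is not part of spycone.

```python
def to_timeseries_3d(ts, reps1, timepts):
    """Reshape a 2D matrix (objects x reps1*timepts) into [replicate][object][time point].

    Assumption: columns are ordered replicate-major, i.e. all time points
    of replicate 1 first, then all time points of replicate 2, and so on.
    """
    for row in ts:
        if len(row) != reps1 * timepts:
            raise ValueError("each row must have reps1 * timepts values")
    return [
        [row[r * timepts:(r + 1) * timepts] for row in ts]
        for r in range(reps1)
    ]


# Two objects, two replicates, three time points (made-up numbers).
ts = [
    [1.0, 2.0, 3.0, 1.1, 2.1, 3.1],
    [5.0, 4.0, 3.0, 5.2, 4.2, 3.2],
]
cube = to_timeseries_3d(ts, reps1=2, timepts=3)
print(len(cube), len(cube[0]), len(cube[0][0]))
```

If the real column order is time-major instead, only the slicing inside the comprehension would change.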

__copy__()

copy function for the dataset object

remove_objects()

remove the objects at the given indices

Import and store biological network

class spycone.BioNetwork(path='human', keytype='entrezid', **kwargs)

Storage of biological network.

Parameters

path – path to the network file, or “human” for the default human BioGRID network

connected_nodes
total_node_degree_counts
total_undirected_edgecount_nodedegree
max_degree
all_degree
g :

generate the network from the file path if the input is a file, or return the input itself if it is already a network

lst_g :

return list of node names

adj :

return adjacency matrix

removing_nodes :

remove nodes given the index

Preprocessing

class spycone.preprocess(DataSet, BioNetwork=None, remove_low_var=False, cutoff=0)

Preprocess the data: remove objects that have no expression at any time point.

Parameters
  • DataSet – DataSet object.

  • BioNetwork – BioNetwork object. If provided, objects that are not in the network will be removed.

  • remove_low_var (boolean, default=False) – if True, objects with variance 0 will be removed.

  • cutoff (default=0) – if given, objects whose mean expression across all time points is lower than the cutoff will be removed.

Return type

None; changes are made directly in the DataSet object
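
The three filters described above can be sketched in plain Python. This is an illustration of the stated rules, not spycone's implementation: the function name keep_object is hypothetical, each object is reduced to a single time series (how replicates are aggregated is not specified here), and the sketch returns a list instead of modifying a DataSet in place.

```python
from statistics import mean, pvariance


def keep_object(series, remove_low_var=False, cutoff=0):
    """Return True if one time series passes the filters described above."""
    if all(v == 0 for v in series):                 # no expression at any time point
        return False
    if remove_low_var and pvariance(series) == 0:   # constant over time
        return False
    if mean(series) < cutoff:                       # mean expression below the cutoff
        return False
    return True


# Made-up expression values for three objects over four time points.
expr = {
    "geneA": [0, 0, 0, 0],
    "geneB": [3, 3, 3, 3],
    "geneC": [1, 4, 2, 6],
}
kept = [g for g, s in expr.items() if keep_object(s, remove_low_var=True, cutoff=2)]
print(kept)
```

With the default arguments only the all-zero object would be dropped, since a cutoff of 0 and remove_low_var=False disable the other two filters.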

Isoform-level function

class spycone.iso_function(dataset)

Isoform level analysis

Parameters
  • DataSet – DataSet object

  • Species – Species ID

detect_isoform_switch :
Parameters
  • combine ({'median', 'mean'}) – aggregation method for replicates. Default='median'

  • filtering (boolean, default=True) – if True, low-expression genes will be filtered out.

  • filter_cutoff (default=2) – mean expression cutoff used for filtering.

  • corr_cutoff (default=0.7) – minimum correlation of isoform pairs to be included in the output.

  • p_val_cutoff (default=0.05) – p-value significance cutoff for inclusion in the output.

  • min_diff (default=0.1) – minimum difference in relative abundance to be included in the output.

  • event_im_cutoff (default=0.1) – minimum event importance to be included in the output.

  • adjustp (str, {'fdr_bh' (default), 'holm_bonf', 'bonf'}) – method for multiple testing correction. 'bonf': Bonferroni method; 'holm_bonf': Holm-Bonferroni method; 'fdr_bh': Benjamini-Hochberg false discovery rate.

  • n_permutations – number of permutations, if a permutation test is used.
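
To make the three adjustp options concrete, the sketch below implements the textbook Bonferroni, Holm-Bonferroni and Benjamini-Hochberg adjustments in plain Python. The function names are illustrative and are not spycone functions; which routine spycone calls internally is not stated on this page.

```python
def bonferroni(pvals):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm_bonferroni(pvals):
    """Step-down Holm adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, value)          # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up Benjamini-Hochberg adjustment, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k                                   # 1-based ascending rank
        value = min(1.0, pvals[i] * m / rank)
        running_min = min(running_min, value)          # enforce monotonicity
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]   # made-up p-values
for name, fn in [("bonf", bonferroni), ("holm_bonf", holm_bonferroni), ("fdr_bh", benjamini_hochberg)]:
    print(name, [round(p, 4) for p in fn(pvals)])
```

Bonferroni is the most conservative of the three, and Benjamini-Hochberg controls the false discovery rate rather than the family-wise error rate, so it typically retains more switches at the same p_val_cutoff.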

total_isoform_usage :
Parameters
  • ids_result – the result dataframe of isoform switch detection

  • norm (boolean, default=True) – if True, the time series matrix is normalized to abundance relative to gene expression.

  • gene_level (boolean, default=True) – if True, it calculates total isoform usage for each gene, otherwise individual isoform usage for each isoform

Return type

Create an instance for isoform switch analysis.
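
Both the norm option and the min_diff threshold refer to relative abundance. A minimal sketch, assuming relative abundance means each isoform's expression divided by the summed expression of all isoforms of the same gene at that time point (the usual definition, though not spelled out on this page). The function name relative_abundance is hypothetical.

```python
def relative_abundance(isoform_expr):
    """Map isoform -> relative abundance per time point for ONE gene.

    isoform_expr: dict of isoform name -> list of expression values,
    one value per time point, all isoforms belonging to the same gene.
    """
    n_time = len(next(iter(isoform_expr.values())))
    totals = [sum(series[t] for series in isoform_expr.values()) for t in range(n_time)]
    return {
        iso: [series[t] / totals[t] if totals[t] > 0 else 0.0 for t in range(n_time)]
        for iso, series in isoform_expr.items()
    }


# Made-up gene with two isoforms that swap dominance over three time points.
gene = {"iso1": [8.0, 5.0, 2.0], "iso2": [2.0, 5.0, 8.0]}
rel = relative_abundance(gene)
print(rel)
```

In this toy gene, iso1 dominates at the first time point and iso2 at the last, which is the kind of pattern isoform switch detection looks for.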

Perform clustering

class spycone.clustering(DataSet, algorithm, input_type, n_clusters=10, composite=False, BioNetwork=None, metric='euclidean', prototypefunction='median', linkage='average', searchspace=20, seed=1234321, transform=None, **kwargs)

Clustering object

Parameters
  • DataSet – DataSet object

  • BioNetwork (needed if composite is True) – BioNetwork object

  • input_type (str) – use “expression” to cluster expression data, or “isoformusage” to cluster total isoform usage

  • algorithm ({'kmeans', 'kmedoids', 'dbscan', 'hierarchical', 'optics'}) – clustering algorithms from sklearn

  • composite (boolean, default=False) – if True, the distance metric is combined with the inverse shortest path

  • metric ({'euclidean', 'correlation'}) – metrics from sklearn

  • linkage ({default='average', 'complete', 'ward'}) – linkage criterion, used only for 'hierarchical' clustering

  • prototypefunction ({default='median', 'mean'}) – aggregation function for cluster prototypes

  • searchspace (default=20) – range to search for the optimal number of clusters

_prototype
Type

dictionary of prototypes for each cluster (keys)

_lables

The cluster label of each object

Type

array whose length equals the number of objects

genelist_clusters

Key-value pairs of the clustering, with Entrez IDs (gene_list IDs) as values

Type

dictionary of clusters

index_clusters

Key-value pairs of the clustering, with object indices as values

Type

dictionary of clusters

symbs_clusters

Key-value pairs of the clustering, with gene symbols as values

Type

dictionary of clusters

_final_n_cluster

number of clusters

_silhouette_index

silhouette index of this clustering
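
For reference, the silhouette index follows a standard definition: for each object, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b), averaged over all objects. The sketch below computes it in plain Python with Euclidean distance. It is not spycone's code, and the function name silhouette_index is illustrative.

```python
from math import dist


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:                      # singleton cluster: score is defined as 0
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy clusters.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels = [0, 0, 1, 1]
score = silhouette_index(points, labels)
print(round(score, 3))
```

Values close to 1 indicate compact, well-separated clusters. Values near 0 or below suggest overlapping clusters or misassigned objects.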

Examples
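
As an illustration of how the attributes above relate to each other, the following sketch derives an index-based cluster dictionary and per-cluster prototypes from a label array. It is written in plain Python and does not call spycone. The function name cluster_prototypes and the toy data are hypothetical.

```python
from statistics import mean, median


def cluster_prototypes(series, labels, prototypefunction="median"):
    """Group object indices by label and aggregate each cluster per time point."""
    agg = {"median": median, "mean": mean}[prototypefunction]

    # Cluster label -> list of object indices (compare index_clusters).
    index_clusters = {}
    for idx, lab in enumerate(labels):
        index_clusters.setdefault(lab, []).append(idx)

    # Cluster label -> prototype time series (compare _prototype).
    n_time = len(series[0])
    prototypes = {
        lab: [agg([series[i][t] for i in members]) for t in range(n_time)]
        for lab, members in index_clusters.items()
    }
    return index_clusters, prototypes


# Four objects with three time points each, and one label per object.
series = [[1, 2, 3], [3, 4, 5], [9, 8, 7], [2, 3, 4]]
labels = [0, 0, 1, 0]
index_clusters, prototypes = cluster_prototypes(series, labels)
print(index_clusters)
print(prototypes)
```

Replacing each index with the corresponding gene ID or gene symbol would give dictionaries shaped like genelist_clusters and symbs_clusters.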

GO terms enrichment analysis

spycone.list_gsea(genelist, species, gene_sets=None, p_adjust_method='fdr_bh', cutoff=0.05, method='gsea', term_source='all')

Perform gene set enrichment on a list of genes

Parameters:

genelist: input list of genes

species: input taxonomy ID if method is “nease”, species name for “gsea” (e.g. hsapiens, mmusculus…)

gene_sets: input a valid database name for “nease”, ignore for “gsea”

p_adjust_method: input one of the following: “fdr_bh”, “bonf”, “holm_bonf”

cutoff: cutoff for the adjusted p-value

spycone.clusters_gsea(DataSet, species, gene_sets=None, is_results=None, cutoff=0.05, p_adjust_method='fdr_bh', method='nease', term_source='all')

Perform gene set enrichment on clusters (cluster object)

Parameters:

DataSet: Spycone dataset object

species: input taxonomy ID if method is “nease”, species name for “gsea” (e.g. hsapiens, mmusculus…)

p_adjust_method: input one of the following: “fdr_bh”, “bonf”, “holm_bonf”

cutoff: cutoff for the adjusted p-value

gene_sets: needed when method is “nease”; input one of the databases: 'PharmGKB', 'HumanCyc', 'Wikipathways', 'Reactome', 'KEGG', 'SMPDB', 'Signalink', 'NetPath', 'EHMN', 'INOH', 'BioCarta', 'PID'

method: “nease” or “gsea”

Return:

It returns two objects:

  1. Dictionary containing the enrichment dataframes for each cluster.

  2. The nease object if method is “nease”; None if method is “gsea”.

spycone.modules_gsea(X, clu, species, type='PPI', p_adjust_method='fdr_bh', cutoff=0.05, method='nease', term_source='all')

Perform gene set enrichment on network modules after running DOMINO

Run DOMINO

spycone.run_domino(target, name=None, is_results=None, scores=None, network_file='/home/docs/checkouts/readthedocs.org/user_builds/spycone/checkouts/latest/spycone/data/network/mouse_biogrid_entrez.tab', output_file_path='./slices/slices.txt', run_cluster=None, slice_threshold=0.3, module_threshold=0.05, prize_factor=0, n_steps=20)
Parameters
  • target – clustering object from spycone or gene list in entrez ID

  • is_results (DataFrame) – Data Frame of isoform switch detection result

  • scores (None) – activity scores of the genes (e.g. p-values from differential expression analysis)

  • run_cluster – Specify the cluster name if you only want to run a specific cluster

  • network_file – network file. Default: “data/network/network_human_PPIDDI.tab”

  • output_file_path – path of the output slices file for DOMINO.

  • slice_threshold (float) –

  • module_threshold (float) –

  • prize_factor (float) –

  • n_steps (int) –

spycone.run_domain_domino(target, is_results, name=None, scores=None, network_file='/home/docs/checkouts/readthedocs.org/user_builds/spycone/checkouts/latest/spycone/data/network/network_human_PPIDDI.tab', output_file_path='slices.txt', run_cluster=None, slice_threshold=0.3, module_threshold=0.05, prize_factor=0, n_steps=20)
Parameters
  • target – clustering object from spycone or gene list in entrez ID

  • is_results (DataFrame) – Data Frame of isoform switch detection result

  • scores (None) – activity scores of the genes (e.g. p-values from differential expression analysis)

  • run_cluster – Specify the cluster name if you only want to run a specific cluster

  • network_file – network file. Default: “data/network/network_human_PPIDDI.tab”

  • output_file_path – path of the output slices file for DOMINO.

  • slice_threshold (float) –

  • module_threshold (float) –

  • prize_factor (float) –

  • n_steps (int) –

Visualization

spycone.vis_all_clusters(clusterObj, x_label='time points', y_label='expression', Titles='Cluster {col_name}', xtickslabels=None, **kwargs)

Visualize all clusters together with their cluster prototypes

Parameters
  • clusterObj – input clustering object with results

  • x_label – x-axis label of the plot

  • y_label – y-axis label of the plot

  • Titles ("Cluster {col_name}") – titles for each cluster

spycone.switch_plot(gene, DataSet, ascov, xaxis_label=None, all_isoforms=False, relative_abundance=False)

Switch plot for isoforms, or expression plot for non-switched genes. If the input gene shows no isoform switch, its expression plot is drawn instead.

Parameters
  • gene (str) – gene ID or gene symbol to plot

  • DataSet (DataSet obj) –

  • ascov (DataFrame) – the result dataframe of your isoform switch detection

  • xaxis_label (list) – x axis label for the plots

spycone.gsea_plot(gsea_result, cluster, modules=None, nterms=None)

Visualizing the functional enrichment

Parameters
  • gsea_result (dict) – the results of gsea

  • cluster (str) – the cluster number you would like to visualize

  • nterms (int, optional) – number of terms to show if only a subset should be visualized, e.g. the top 10 terms

spycone.vis_modules(mods, dataset, cluster, size=5, outputpng=None)

Visualize all modules from one cluster in one figure.

Parameters:

mods: modules result

dataset: spycone dataset object

cluster: cluster number to visualize

size: minimum number of nodes in one module

outputpng: file path to save the figure (png)

spycone.vis_better_modules(dataset, mod, cluster, dir, related_genes={}, module=None)

Visualize modules with pydot in SVG format.

Parameters
  • dataset – dataset obj

  • mod – DOMINO results

  • cluster – Cluster number to visualize

  • dir – Local directory to save the images

  • related_genes (optional) – set of genes whose nodes are highlighted by changing the color of the node's outer border

Splicing factor analysis

spycone.SF_coexpression(dataset, padj_method='bonf', corr_cutoff=0.7, padj_cutoff=0.05, method='pearson')

Return the co-expression between splicing factors and isoforms

Parameters
  • dataset – input dataset object.

  • padj_method – multiple testing method. Default=Bonferroni

  • corr_cutoff – correlation coefficient cutoff. Default=0.7

  • padj_cutoff – adjusted p-value cutoff. Default=0.05

  • method – method used to calculate the correlation value.

Return type

Create an instance for co-expression analysis of splicing factors and transcript abundance.
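
The corr_cutoff filter can be illustrated with a plain-Python Pearson correlation between one splicing factor and several isoforms. This is a sketch under assumptions, not the library's implementation: the names pearson and coexpressed are hypothetical, the filter is applied to the absolute correlation (whether spycone does the same is not stated here), and the p-value computation and adjustment are left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:               # a constant series has no defined correlation
        return 0.0
    return cov / (sx * sy)


def coexpressed(sf_series, isoforms, corr_cutoff=0.7):
    """Return isoform -> r for isoforms with |r| at or above the cutoff."""
    result = {}
    for name, series in isoforms.items():
        r = pearson(sf_series, series)
        if abs(r) >= corr_cutoff:
            result[name] = r
    return result


# Made-up time series for one splicing factor and three isoforms.
sf = [1, 2, 3, 4]
isoforms = {"isoA": [2, 4, 6, 8], "isoB": [4, 3, 2, 1], "isoC": [1, 3, 2, 2]}
hits = coexpressed(sf, isoforms)
print(sorted(hits))
```

Here isoA rises with the splicing factor and isoB falls against it, so both pass the cutoff, while isoC is only weakly correlated and is dropped.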

spycone.SF_motifsearch(list_SF, list_genes, dataset, gtf, gc_ratio=0.6, flanking=400)

Return the PSSM score for binding between the target SFs and exons.

Parameters
  • list_SF (list) – list of splicing factors / RBPs, e.g. the SFs that are co-expressed with the isoform abundance.

  • list_genes (list) – list of genes to check for SF binding sites, e.g. the cluster of genes with which the input SF is co-expressed.

  • gtf (str or dataframe) – gtf file path / dataframe

  • gc_ratio – GC content that makes up the background for PSSM score calculation. Default=0.6.

  • flanking – flanking region size.

Return type

Create an instance for SF motif enrichment analysis.
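
To show how gc_ratio enters a PSSM score, here is a generic log-odds sketch in plain Python. It assumes a background in which G and C each have probability gc_ratio / 2 and A and T each have (1 - gc_ratio) / 2, and it scores every window of a sequence against a small made-up motif. The names background, pssm_from_ppm and best_hit are hypothetical, and spycone's actual scoring may differ.

```python
from math import log2


def background(gc_ratio=0.6):
    """Background base probabilities derived from the GC content."""
    at = (1 - gc_ratio) / 2
    gc = gc_ratio / 2
    return {"A": at, "C": gc, "G": gc, "T": at}


def pssm_from_ppm(ppm, gc_ratio=0.6):
    """Convert a position probability matrix into log-odds scores.

    All probabilities in ppm must be non-zero (use pseudocounts).
    """
    bg = background(gc_ratio)
    return [{base: log2(col[base] / bg[base]) for base in "ACGT"} for col in ppm]


def best_hit(sequence, pssm):
    """Return (score, start) of the best-scoring window in the sequence."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    scored = [
        (sum(pssm[k][sequence[start + k]] for k in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


# Made-up 3-base motif that prefers "GCA".
ppm = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
pssm = pssm_from_ppm(ppm, gc_ratio=0.6)
score, start = best_hit("TTGCATT", pssm)
print(start, round(score, 3))
```

A higher gc_ratio lowers the log-odds contribution of G and C matches, because those bases are then more likely to occur by chance in the background.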