geometricus package
Submodules
geometricus.geometricus module
- geometricus.geometricus.Shapemer
An integer key (in the case of a model) or a tuple of integers, one for each moment (the old way)
alias of Union[bytes, tuple]
- geometricus.geometricus.Shapemers
A list of Shapemer objects
alias of List[Union[bytes, tuple]]
- class geometricus.geometricus.Geometricus(protein_keys: List[Union[str, Tuple[str, str]]], shapemer_to_protein_indices: Dict[Union[bytes, tuple], List[Tuple[Union[str, Tuple[str, str]], int]]], proteins_to_shapemers: Dict[Union[str, Tuple[str, str]], List[Union[bytes, tuple]]], shapemer_keys: List[Union[bytes, tuple]], proteins_to_shapemer_residue_indices: Dict[Union[str, Tuple[str, str]], List[Union[bytes, tuple]]], resolution: Optional[Union[float, ndarray]] = None)
Bases: object
Class for storing embedding information
- protein_keys: List[Union[str, Tuple[str, str]]]
List of protein names, one per row of the output embedding
- shapemer_to_protein_indices: Dict[Union[bytes, tuple], List[Tuple[Union[str, Tuple[str, str]], int]]]
Maps each shapemer to the proteins that contain it and to the corresponding residue indices within those proteins
- proteins_to_shapemers: Dict[Union[str, Tuple[str, str]], List[Union[bytes, tuple]]]
Maps each protein to a list of shapemers in order of its residues
- shapemer_keys: List[Union[bytes, tuple]]
List of shapemers found
- proteins_to_shapemer_residue_indices: Dict[Union[str, Tuple[str, str]], List[Union[bytes, tuple]]]
Maps each protein to, for each of its residues in order, the set of residue indices covered by that residue’s shapemer
- resolution: Union[float, ndarray] = None
Multiplier that determines how coarse/fine-grained each shape is. This can be a single number, multiplied with all four moment invariants, or a numpy array of four numbers, one for each invariant (this is for the old way of binning shapemers)
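To illustrate the difference between a scalar resolution and a four-element one in the old binning approach, here is a hypothetical sketch. The function name `bin_moments` is ours, and applying `log1p` to the invariants before multiplying is an assumption, not something this page states; treat it as an illustration of multiplier-based binning rather than the library's code.

```python
import math
from typing import List, Sequence, Tuple, Union


def bin_moments(
    moments: Sequence[float], resolution: Union[float, Sequence[float]]
) -> Tuple[int, ...]:
    """Hypothetical old-style binning of one residue's moment invariants.

    ASSUMPTION: moments are non-negative and log1p-transformed before the
    multiplier is applied; the real geometricus transform may differ.
    """
    if isinstance(resolution, (int, float)):
        # A single number is applied to every invariant.
        factors: List[float] = [float(resolution)] * len(moments)
    else:
        # One multiplier per invariant (e.g. four for O_3, O_4, O_5, F).
        factors = [float(r) for r in resolution]
        if len(factors) != len(moments):
            raise ValueError("need one resolution value per invariant")
    # int() truncates, so a larger multiplier yields finer-grained bins.
    return tuple(int(math.log1p(m) * f) for m, f in zip(moments, factors))


moments = [120.0, 3500.0, 80000.0, 15.0]
print(bin_moments(moments, 2.0))
print(bin_moments(moments, [2.0, 2.0, 2.0, 4.0]))
```

Residues whose binned tuples are identical share a shapemer, so raising the multiplier for one invariant splits shapes more finely along that invariant only.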
- classmethod from_protein_files(input_files: Union[Path, str, List[str]], model: Optional[ShapemerLearn] = None, split_infos: Optional[List[SplitInfo]] = None, moment_types: Optional[List[str]] = None, resolution: Optional[Union[float, ndarray]] = None, n_threads: int = 1, verbose: bool = True)
Creates a Geometricus object from protein structure files
- Parameters:
input_files –
Can be one of:
  - a list of structure files (.pdb, .pdb.gz, .cif, .cif.gz)
  - a list of (structure_file, chain) tuples
  - a list of PDB IDs, PDBID_chain strings, or (PDB ID, chain) tuples
  - a folder with input structure files
  - a file which lists a structure filename or “structure_filename, chain” on each line
  - a file which lists a PDB ID, PDBID_chain, or “PDBID, chain” on each line
model – trained ShapemerLearn model. If this is not None, shapemers are generated using the trained model, and split_infos, moment_types, and resolution are ignored
split_infos – List of SplitInfo objects
moment_types – List of moment types to use
resolution – Multiplier that determines how coarse/fine-grained each shape is. This can be a single number, multiplied with all four moment invariants, or a numpy array of four numbers, one for each invariant (this is for the old way of binning shapemers)
n_threads – Number of threads to use
verbose – Whether to print progress
- Return type:
Geometricus object
- classmethod from_invariants(invariants: Union[Generator[MultipleMomentInvariants], List[MultipleMomentInvariants]], protein_keys: Optional[List[ProteinKey]] = None, model: Optional[ShapemerLearn] = None, resolution: Optional[Union[float, np.ndarray]] = None)
Make a Geometricus object from a list (or generator) of MultipleMomentInvariants objects
- Parameters:
invariants – List of MultipleMomentInvariants objects
protein_keys – List of protein names, one per row of the output embedding. If None, all keys in invariants are used
model – If given, this model is used to make the shapemers
resolution – Multiplier that determines how coarse/fine-grained each shape is. This can be a single number, multiplied with all four moment invariants, or a numpy array of four numbers, one for each invariant (this is for the old way of binning shapemers)
- map_shapemers_to_indices(protein_keys=None)
Maps each shapemer to the proteins that contain it and to the corresponding residue indices within those proteins, i.e. shapemer → list of (protein_key, residue_index)
- map_protein_to_shapemer_indices(protein_keys=None, shapemer_keys=None)
Maps each protein to a list of shapemer indices in order of its residues, where each index is the position of the corresponding shapemer in shapemer_keys
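The relationship between proteins_to_shapemers, shapemer_to_protein_indices, and shapemer_keys can be made concrete with ordinary dicts. The plain-Python sketch below is not the library's implementation: the real methods are bound to a Geometricus object and take protein_keys (and shapemer_keys) arguments, whereas these hypothetical standalone functions receive the protein-to-shapemers mapping directly.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, List, Tuple

ProteinKey = Hashable  # a PDB ID string or a (PDB ID, chain) tuple
Shapemer = Hashable    # bytes or a tuple of ints


def map_shapemers_to_indices(
    proteins_to_shapemers: Dict[ProteinKey, List[Shapemer]],
) -> Dict[Shapemer, List[Tuple[ProteinKey, int]]]:
    """shapemer -> list of (protein_key, residue_index) where it occurs."""
    index: Dict[Shapemer, List[Tuple[ProteinKey, int]]] = defaultdict(list)
    for protein_key, shapemers in proteins_to_shapemers.items():
        # Shapemers are stored in residue order, so the position is the residue index.
        for residue_index, shapemer in enumerate(shapemers):
            index[shapemer].append((protein_key, residue_index))
    return dict(index)


def map_protein_to_shapemer_indices(
    proteins_to_shapemers: Dict[ProteinKey, List[Shapemer]],
    shapemer_keys: List[Shapemer],
) -> Dict[ProteinKey, List[int]]:
    """protein -> per-residue positions of its shapemers in shapemer_keys."""
    position = {shapemer: i for i, shapemer in enumerate(shapemer_keys)}
    return {
        protein_key: [position[s] for s in shapemers]
        for protein_key, shapemers in proteins_to_shapemers.items()
    }


proteins_to_shapemers = {"1abc": [(1, 2), (3, 4), (1, 2)], ("2xyz", "A"): [(3, 4)]}
shapemer_keys = [(1, 2), (3, 4)]
print(map_shapemers_to_indices(proteins_to_shapemers))
print(map_protein_to_shapemer_indices(proteins_to_shapemers, shapemer_keys))
```

The second mapping is what turns each protein into a row of integer column indices, which is convenient for building a count-based embedding over shapemer_keys.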
geometricus.moment_utility module
- geometricus.moment_utility.nb_mean_axis_0(array: ndarray) → ndarray
Same as np.mean(array, axis=0) but compiled with numba’s njit
- class geometricus.moment_utility.MomentInfo(moment_function: Callable[[int, int, int, numpy.ndarray, numpy.ndarray], float], mu_arguments: List[Tuple[int, int, int]])
Bases: object
- moment_function: Callable[[int, int, int, ndarray, ndarray], float]
- mu_arguments: List[Tuple[int, int, int]]
- geometricus.moment_utility.F(mu_201, mu_021, mu_210, mu_300, mu_111, mu_012, mu_003, mu_030, mu_102, mu_120)
- geometricus.moment_utility.make_formula(name, formula_string)
Generate code from one of the formulas in Appendix 4A of “2D and 3D Image Analysis by Moments”
- Parameters:
name – name of the moment
formula_string – formula copy-pasted from the book’s PDF
- geometricus.moment_utility.phi_4(mu_030, mu_021, mu_120, mu_003, mu_111, mu_201, mu_102, mu_210, mu_012, mu_300)
- geometricus.moment_utility.phi_5(mu_030, mu_021, mu_120, mu_003, mu_201, mu_102, mu_210, mu_012, mu_300)
- geometricus.moment_utility.phi_6(mu_030, mu_021, mu_120, mu_003, mu_111, mu_201, mu_102, mu_210, mu_012, mu_300)
- geometricus.moment_utility.phi_7(mu_030, mu_021, mu_120, mu_003, mu_111, mu_201, mu_102, mu_210, mu_012, mu_300)
- geometricus.moment_utility.phi_8(mu_030, mu_021, mu_120, mu_003, mu_111, mu_201, mu_102, mu_210, mu_012, mu_300)
- geometricus.moment_utility.phi_9(mu_030, mu_021, mu_120, mu_101, mu_003, mu_200, mu_110, mu_201, mu_111, mu_102, mu_210, mu_020, mu_012, mu_002, mu_011, mu_300)
- geometricus.moment_utility.phi_10(mu_030, mu_021, mu_120, mu_101, mu_003, mu_200, mu_110, mu_201, mu_111, mu_102, mu_210, mu_020, mu_012, mu_002, mu_011, mu_300)
- geometricus.moment_utility.phi_11(mu_030, mu_021, mu_120, mu_101, mu_003, mu_200, mu_110, mu_201, mu_102, mu_210, mu_012, mu_020, mu_002, mu_011, mu_300)
- geometricus.moment_utility.phi_12(mu_030, mu_021, mu_120, mu_101, mu_003, mu_200, mu_110, mu_201, mu_111, mu_102, mu_210, mu_020, mu_012, mu_002, mu_011, mu_300)
- geometricus.moment_utility.phi_13(mu_030, mu_021, mu_120, mu_101, mu_003, mu_200, mu_110, mu_201, mu_111, mu_102, mu_210, mu_012, mu_020, mu_002, mu_011, mu_300)
- geometricus.moment_utility.CI(mu_000, mu_200, mu_020, mu_002, mu_110, mu_101, mu_011, mu_111, mu_210, mu_201, mu_120, mu_021, mu_012, mu_102, mu_003, mu_030, mu_300, mu_013, mu_103, mu_130, mu_310, mu_301, mu_031, mu_112, mu_121, mu_211, mu_022, mu_202, mu_220, mu_400, mu_040, mu_004)
- class geometricus.moment_utility.MomentType(value)
Bases: Enum
Different rotation-invariant moments (of order 2 and order 3, plus the chiral invariant CI, which also uses fourth-order moments)
Choose from [‘O_3’, ‘O_4’, ‘O_5’, ‘F’, ‘phi_2’, ‘phi_3’, ‘phi_4’, ‘phi_5’, ‘phi_6’, ‘phi_7’, ‘phi_8’, ‘phi_9’, ‘phi_10’, ‘phi_11’, ‘phi_12’, ‘phi_13’, ‘CI’]
O_3, O_4, and O_5 are second order moments from [1] and F is a third order moment from [2]. These four moments are used in the original Geometricus manuscript [3].
phi_{2-13} are independent rotation invariants from [4]: phi_2 and phi_3 use only second-order moments, while phi_4 to phi_13 involve third-order moments.
CI is the chiral invariant moment from [5].
[1] Mamistvalov, Alexander G. “N-dimensional moment invariants and conceptual mathematical theory of recognition n-dimensional solids.” IEEE Transactions on pattern analysis and machine intelligence 20.8 (1998): 819-831.
[2] Flusser, Jan, Jirí Boldys, and Barbara Zitová. “Moment forms invariant to rotation and blur in arbitrary number of dimensions.” IEEE Transactions on Pattern Analysis and Machine Intelligence 25.2 (2003): 234-246.
[3] Durairaj, Janani, et al. “Geometricus represents protein structures as shape-mers derived from moment invariants.” Bioinformatics 36.Supplement_2 (2020): i718-i725.
[4] Flusser, Jan, Tomas Suk, and Barbara Zitová. 2D and 3D image analysis by moments. John Wiley & Sons, 2016.
[5] Hattne, Johan, and Victor S. Lamzin. “A moment invariant for evaluating the chirality of three-dimensional objects.” Journal of The Royal Society Interface 8.54 (2011): 144-151.
- O_3 = MomentInfo(moment_function=CPUDispatcher(<function O_3>), mu_arguments=[(2, 0, 0), (0, 2, 0), (0, 0, 2)])
- O_4 = MomentInfo(moment_function=CPUDispatcher(<function O_4>), mu_arguments=[(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)])
- O_5 = MomentInfo(moment_function=CPUDispatcher(<function O_5>), mu_arguments=[(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1)])
- F = MomentInfo(moment_function=CPUDispatcher(<function F>), mu_arguments=[(2, 0, 1), (0, 2, 1), (2, 1, 0), (3, 0, 0), (1, 1, 1), (0, 1, 2), (0, 0, 3), (0, 3, 0), (1, 0, 2), (1, 2, 0)])
- phi_2 = MomentInfo(moment_function=CPUDispatcher(<function phi_2>), mu_arguments=[(0, 2, 0), (0, 1, 1), (1, 1, 0), (2, 0, 0), (0, 0, 2), (1, 0, 1)])
- phi_3 = MomentInfo(moment_function=CPUDispatcher(<function phi_3>), mu_arguments=[(0, 2, 0), (0, 1, 1), (1, 1, 0), (2, 0, 0), (0, 0, 2), (1, 0, 1)])
- phi_4 = MomentInfo(moment_function=CPUDispatcher(<function phi_4>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (0, 0, 3), (1, 1, 1), (2, 0, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (3, 0, 0)])
- phi_5 = MomentInfo(moment_function=CPUDispatcher(<function phi_5>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (0, 0, 3), (2, 0, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (3, 0, 0)])
- phi_6 = MomentInfo(moment_function=CPUDispatcher(<function phi_6>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (0, 0, 3), (1, 1, 1), (2, 0, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (3, 0, 0)])
- phi_7 = MomentInfo(moment_function=CPUDispatcher(<function phi_7>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (0, 0, 3), (1, 1, 1), (2, 0, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (3, 0, 0)])
- phi_8 = MomentInfo(moment_function=CPUDispatcher(<function phi_8>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (0, 0, 3), (1, 1, 1), (2, 0, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (3, 0, 0)])
- phi_9 = MomentInfo(moment_function=CPUDispatcher(<function phi_9>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (1, 0, 1), (0, 0, 3), (2, 0, 0), (1, 1, 0), (2, 0, 1), (1, 1, 1), (1, 0, 2), (2, 1, 0), (0, 2, 0), (0, 1, 2), (0, 0, 2), (0, 1, 1), (3, 0, 0)])
- phi_10 = MomentInfo(moment_function=CPUDispatcher(<function phi_10>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (1, 0, 1), (0, 0, 3), (2, 0, 0), (1, 1, 0), (2, 0, 1), (1, 1, 1), (1, 0, 2), (2, 1, 0), (0, 2, 0), (0, 1, 2), (0, 0, 2), (0, 1, 1), (3, 0, 0)])
- phi_11 = MomentInfo(moment_function=CPUDispatcher(<function phi_11>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (1, 0, 1), (0, 0, 3), (2, 0, 0), (1, 1, 0), (2, 0, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (0, 2, 0), (0, 0, 2), (0, 1, 1), (3, 0, 0)])
- phi_12 = MomentInfo(moment_function=CPUDispatcher(<function phi_12>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (1, 0, 1), (0, 0, 3), (2, 0, 0), (1, 1, 0), (2, 0, 1), (1, 1, 1), (1, 0, 2), (2, 1, 0), (0, 2, 0), (0, 1, 2), (0, 0, 2), (0, 1, 1), (3, 0, 0)])
- phi_13 = MomentInfo(moment_function=CPUDispatcher(<function phi_13>), mu_arguments=[(0, 3, 0), (0, 2, 1), (1, 2, 0), (1, 0, 1), (0, 0, 3), (2, 0, 0), (1, 1, 0), (2, 0, 1), (1, 1, 1), (1, 0, 2), (2, 1, 0), (0, 1, 2), (0, 2, 0), (0, 0, 2), (0, 1, 1), (3, 0, 0)])
- CI = MomentInfo(moment_function=CPUDispatcher(<function CI>), mu_arguments=[(0, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1), (2, 1, 0), (2, 0, 1), (1, 2, 0), (0, 2, 1), (0, 1, 2), (1, 0, 2), (0, 0, 3), (0, 3, 0), (3, 0, 0), (0, 1, 3), (1, 0, 3), (1, 3, 0), (3, 1, 0), (3, 0, 1), (0, 3, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (0, 2, 2), (2, 0, 2), (2, 2, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)])
- geometricus.moment_utility.get_moments_from_coordinates(coordinates: numpy.ndarray, moment_types: List[MomentType] = (MomentType.O_3, MomentType.O_4, MomentType.O_5, MomentType.F)) → List[float]
Gets rotation-invariant moments for a set of coordinates
- Parameters:
coordinates – numpy array of 3D coordinates; shape = (n, 3)
moment_types – Which moments to calculate. Choose from [‘O_3’, ‘O_4’, ‘O_5’, ‘F’, ‘phi_2’, ‘phi_3’, ‘phi_4’, ‘phi_5’, ‘phi_6’, ‘phi_7’, ‘phi_8’, ‘phi_9’, ‘phi_10’, ‘phi_11’, ‘phi_12’, ‘phi_13’, ‘CI’]
- Return type:
list of moments
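Each mu_arguments entry above is an exponent triple (p, q, r) of a central moment. The sketch below shows, under the assumption that mu_pqr is the usual unnormalized central moment (the sum over points of (x − x̄)^p (y − ȳ)^q (z − z̄)^r), how a MomentInfo-style specification can be evaluated on raw coordinates and checked for rotation invariance. MomentSpec, central_moment, get_moments, and rotate_z are hypothetical names; only an O_3-like invariant is reproduced, and treating O_3 as the trace of the second-order moment matrix is our assumption, consistent with its three diagonal mu_arguments.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def central_moment(points: Sequence[Point], p: int, q: int, r: int) -> float:
    """Unnormalized central moment mu_pqr of a 3D point set (assumed definition)."""
    n = len(points)
    cx = sum(x for x, _, _ in points) / n
    cy = sum(y for _, y, _ in points) / n
    cz = sum(z for _, _, z in points) / n
    return sum(
        (x - cx) ** p * (y - cy) ** q * (z - cz) ** r for x, y, z in points
    )


@dataclass
class MomentSpec:
    """Hypothetical stand-in for MomentInfo: a function and its mu arguments."""
    moment_function: Callable[..., float]
    mu_arguments: List[Tuple[int, int, int]]


# Trace of the second-order central moment matrix, evaluated on the same
# mu_arguments listed for MomentType.O_3 (assumed to be what O_3 computes).
O3_LIKE = MomentSpec(
    moment_function=lambda mu_200, mu_020, mu_002: mu_200 + mu_020 + mu_002,
    mu_arguments=[(2, 0, 0), (0, 2, 0), (0, 0, 2)],
)


def get_moments(points: Sequence[Point], specs: List[MomentSpec]) -> List[float]:
    """Compute each spec's mu values in order, then pass them to its function."""
    return [
        spec.moment_function(*(central_moment(points, *a) for a in spec.mu_arguments))
        for spec in specs
    ]


def rotate_z(points: Sequence[Point], angle: float) -> List[Point]:
    """Rotate points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


points = [(0.0, 0.0, 0.0), (1.5, 0.2, -0.3), (2.9, 1.1, 0.4), (3.8, 2.5, 1.7)]
before = get_moments(points, [O3_LIKE])[0]
after = get_moments(rotate_z(points, 0.7), [O3_LIKE])[0]
print(before, after)  # equal up to floating-point rounding
```

Centering on the centroid makes the moments translation invariant; the combination computed by the moment function is what adds rotation invariance.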
geometricus.protein_utility module
- geometricus.protein_utility.ProteinKey
A protein key is either its PDB ID (str) or a tuple of (PDB ID, chain)
alias of Union[str, Tuple[str, str]]
- class geometricus.protein_utility.Structure(name: Union[str, Tuple[str, str]], length: int, coordinates: ndarray)
Bases: object
Class to store basic protein structure information
- name: Union[str, Tuple[str, str]]
PDB ID or (PDB ID, chain)
- length: int
Number of residues
- coordinates: ndarray
Coordinates
- geometricus.protein_utility.parse_structure_file(input_value: Union[Path, Path, str, str])
Parses a protein structure file (.pdb, .pdb.gz, .cif, .cif.gz), a PDB ID, or a PDBID_Chain string and returns a prody AtomGroup object
- Parameters:
input_value (filename or (filename, chain) or PDBID or PDBID_Chain or (PDBID, chain)) –
- Return type:
prody AtomGroup object
- geometricus.protein_utility.get_structure_files(input_value: Union[Path, str, List[str]]) → List[Union[str, str, str]]
Get a list of structure files or PDB IDs from an input representing one of:
  - a list of structure files (.pdb, .pdb.gz, .cif, .cif.gz)
  - a list of (structure_file, chain) tuples
  - a list of PDB IDs, PDBID_chain strings, or (PDB ID, chain) tuples
  - a folder with input structure files
  - a file which lists a structure filename or “structure_filename, chain” on each line
  - a file which lists a PDB ID, PDBID_chain, or “PDBID, chain” on each line
- Parameters:
input_value –
- Return type:
List of structure files or (structure_file, chain) or PDBIDs or (PDB ID, chain)
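Several of the accepted forms put a chain identifier next to a structure name. One way to normalize a single line of such a list file into a ProteinKey is sketched below. The function parse_input_line and the rule that a bare PDB ID is exactly four alphanumeric characters are assumptions for illustration, not the library’s documented parsing rules.

```python
import re
from typing import Tuple, Union

ProteinKey = Union[str, Tuple[str, str]]

# ASSUMPTION: a PDB ID is four alphanumeric characters, followed by "_chain".
_PDBID_CHAIN = re.compile(r"^([0-9A-Za-z]{4})_([0-9A-Za-z]+)$")


def parse_input_line(line: str) -> ProteinKey:
    """Hypothetical parser for one line of an input list file."""
    line = line.strip()
    if "," in line:
        # "structure_filename, chain" or "PDBID, chain"
        name, chain = (part.strip() for part in line.split(",", 1))
        return (name, chain)
    match = _PDBID_CHAIN.match(line)
    if match:
        # "PDBID_chain" -> (PDBID, chain)
        return (match.group(1), match.group(2))
    # A bare structure filename or PDB ID is kept as-is.
    return line


for raw in ["1abc_A", "model.pdb, B", " 4hhb ", "my_model.pdb"]:
    print(repr(raw), "->", parse_input_line(raw))
```

Anchoring the PDBID_chain pattern to exactly four characters keeps filenames that merely contain an underscore from being split into a name and a chain.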
- geometricus.protein_utility.group_indices(input_list: List[int]) → List[List[int]]
Groups list positions by value, e.g. [1, 1, 1, 2, 2, 3, 3, 3, 4] -> [[0, 1, 2], [3, 4], [5, 6, 7], [8]]
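The example above can be reproduced with itertools.groupby. This sketch (group_indices_sketch is a hypothetical name) assumes that only consecutive runs of equal values are grouped, as with per-atom residue numbers; whether the library also merges non-adjacent repeats is not stated here.

```python
from itertools import groupby
from typing import List


def group_indices_sketch(input_list: List[int]) -> List[List[int]]:
    """Group positions of consecutive equal values (assumed semantics)."""
    groups: List[List[int]] = []
    # groupby walks the positions and starts a new group whenever the value changes.
    for _, run in groupby(range(len(input_list)), key=lambda i: input_list[i]):
        groups.append(list(run))
    return groups


print(group_indices_sketch([1, 1, 1, 2, 2, 3, 3, 3, 4]))
```

Iterating over positions rather than values lets groupby return the indices directly, without a separate enumerate-and-unzip step.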
- geometricus.protein_utility.get_alpha_indices(protein: AtomGroup) → List[int]
Get indices of alpha carbons of a prody AtomGroup object
- geometricus.protein_utility.get_beta_indices(protein: AtomGroup) → List[int]
Get indices of beta carbons of a prody AtomGroup object (if a beta carbon doesn’t exist, the alpha carbon index is returned)
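The alpha carbon fallback mainly matters for glycine, which has no beta carbon. Below is a prody-free sketch of the same rule applied to a hypothetical list of (residue number, atom name) pairs in atom order. The data layout, the function name beta_indices, and skipping residues that have neither atom are assumptions.

```python
from typing import Dict, List, Sequence, Set, Tuple

Atom = Tuple[int, str]  # (residue number, atom name), in atom order


def beta_indices(atoms: Sequence[Atom]) -> List[int]:
    """Per residue: index of its CB atom, or of its CA if there is no CB."""
    order: List[int] = []
    seen: Set[int] = set()
    ca: Dict[int, int] = {}
    cb: Dict[int, int] = {}
    for i, (residue, name) in enumerate(atoms):
        if residue not in seen:
            seen.add(residue)
            order.append(residue)
        if name == "CA":
            ca.setdefault(residue, i)
        elif name == "CB":
            cb.setdefault(residue, i)
    result: List[int] = []
    for residue in order:
        if residue in cb:
            result.append(cb[residue])
        elif residue in ca:  # e.g. glycine: fall back to the alpha carbon
            result.append(ca[residue])
        # ASSUMPTION: residues with neither CA nor CB are skipped.
    return result


atoms = [(1, "N"), (1, "CA"), (1, "C"), (1, "CB"),
         (2, "N"), (2, "CA"), (2, "C"),
         (3, "N"), (3, "CA"), (3, "CB")]
print(beta_indices(atoms))
```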
- geometricus.protein_utility.get_sequences_from_fasta_yield(fasta_file: Union[str, Path], comments='#') → tuple
Returns an iterator of (accession, sequence) tuples
- Parameters:
fasta_file – path to the FASTA file
comments – ignore lines containing any of these strings
- Return type:
(accession, sequence)
- geometricus.protein_utility.get_sequences_from_fasta(fasta_file: Union[str, Path], comments='#') → dict
Returns a dict mapping accession to sequence from a FASTA file
- Parameters:
fasta_file – path to the FASTA file
comments – ignore lines containing any of these strings
- Returns:
{accession: sequence}
- Return type:
dict
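The two functions follow the common pattern of a generator plus a dict wrapper. A self-contained sketch that reads from any text handle is shown below. fasta_records is a hypothetical name, and taking the first whitespace-separated word of the header as the accession is our assumption, not documented behaviour.

```python
import io
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple, Union


def fasta_records(
    handle: TextIO, comments: Union[str, Iterable[str]] = "#"
) -> Iterator[Tuple[str, str]]:
    """Yield (accession, sequence) pairs, skipping lines containing a comment marker."""
    markers = (comments,) if isinstance(comments, str) else tuple(comments)
    accession: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line or any(m in line for m in markers):
            continue
        if line.startswith(">"):
            if accession is not None:
                yield accession, "".join(chunks)
            # ASSUMPTION: the accession is the first word of the header line.
            header = line[1:].split()
            accession, chunks = (header[0] if header else ""), []
        else:
            # Sequence lines may be wrapped; join them per record.
            chunks.append(line)
    if accession is not None:
        yield accession, "".join(chunks)


text = ">seq1 some description\nMKV\nLLA\n# a comment\n>seq2\nGGS\n"
print(dict(fasta_records(io.StringIO(text))))
```

Wrapping the generator in dict() gives the accession-to-sequence mapping; with duplicate accessions, the last record wins.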
- geometricus.protein_utility.get_rmsd(coords_1: np.ndarray, coords_2: np.ndarray) → float
Root-mean-square deviation (RMSD) of paired coordinates: the square root of the mean of the squared Euclidean distances between corresponding points
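Written out, RMSD = sqrt((1/n) Σᵢ ‖aᵢ − bᵢ‖²) over n paired points. Below is a plain-Python sketch of that formula. rmsd is a hypothetical name; the library function takes numpy arrays instead of sequences of tuples.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_1: Sequence[Point], coords_2: Sequence[Point]) -> float:
    """Square root of the mean squared distance between paired points."""
    if len(coords_1) != len(coords_2) or len(coords_1) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared Euclidean distances over all pairs.
    total = sum(
        (a - b) ** 2 for p, q in zip(coords_1, coords_2) for a, b in zip(p, q)
    )
    return math.sqrt(total / len(coords_1))


print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]))
```

RMSD does not superpose the structures itself; it is normally computed after an optimal rotation has been applied.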
- geometricus.protein_utility.get_rotation_matrix(coords_1: np.ndarray, coords_2: np.ndarray)
Computes the rotation matrix that superposes paired coordinates on each other using Kabsch superposition (SVD). Assumes centered coordinates
- Parameters:
coords_1 – numpy array of coordinate data for the first protein; shape = (n, 3)
coords_2 – numpy array of corresponding coordinate data for the second protein; shape = (n, 3)
- Return type:
rotation matrix for optimal superposition
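The Kabsch step can be written in a few lines of numpy. The sketch below assumes centered (n, 3) arrays and the row-vector convention `coords_1 @ R ≈ coords_2`. kabsch_rotation is a hypothetical name, and whether get_rotation_matrix returns R or its transpose is not stated here.

```python
import numpy as np


def kabsch_rotation(coords_1: np.ndarray, coords_2: np.ndarray) -> np.ndarray:
    """Rotation R minimizing ||coords_1 @ R - coords_2|| for centered (n, 3) arrays."""
    h = coords_1.T @ coords_2  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last singular direction if needed so that det(R) = +1 (no reflection).
    d = np.sign(np.linalg.det(u @ vt))
    correction = np.diag([1.0, 1.0, d])
    return u @ correction @ vt


rng = np.random.default_rng(0)
p = rng.normal(size=(10, 3))
p -= p.mean(axis=0)  # center, as the function assumes
angle = 0.5
true_r = np.array([[np.cos(angle), np.sin(angle), 0.0],
                   [-np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
q = p @ true_r
r = kabsch_rotation(p, q)
print(np.allclose(r, true_r))
```

The sign correction matters for near-planar or noisy inputs, where the unconstrained SVD solution can be a reflection rather than a proper rotation.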