chemfp.highlevel.arena_tools module¶
This module should not be imported directly.
It contains internal implementation details of the high-level API available from the top-level chemfp module.
This module is documented because some of its objects are returned to the user and are therefore part of the public API.
- class chemfp.highlevel.arena_tools.SimHistogramResult(processor, *, times: dict, queries_close=None, targets_close=None)¶
Bases: object
The result from simhistogram() histogram generation.
This object acts like a two-element tuple, which matches the NumPy histogram API:
>>> import chemfp
>>> a = chemfp.load_fingerprints("chembl_35.fpb")
>>> bins, edges = chemfp.simhistogram(targets=a)
>>>
>>> import matplotlib.pyplot as plt
>>> plt.stairs(bins, edges)
>>> plt.show()
This object is a context manager which closes any files that were opened because the queries or targets were given as a filename.
The public attributes are:
- NxN: bool¶
True for a symmetric histogram, otherwise False.
- closed: bool¶
True if close() has been called, otherwise False.
- edges: array.array("d")¶
The edge locations for the bins. If there are B bins then there are B+1 edges, with values [0.0, 1/B, 2/B, ..., 1.0].
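As a small illustration of the edge layout described above, the following sketch builds the B+1 edge values for a given bin count. The helper name `make_edges` is an assumption made for this example and is not part of chemfp.

```python
from array import array

def make_edges(num_bins):
    """Return the num_bins+1 evenly spaced edges from 0.0 to 1.0."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    # Dividing i by num_bins makes the final edge exactly 1.0
    return array("d", (i / num_bins for i in range(num_bins + 1)))

edges = make_edges(4)
print(list(edges))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```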
- bins: array.array("Q")¶
The bin counts.
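One way to use the bin counts together with the edges is to estimate what fraction of the evaluated pairs falls at or above a similarity threshold. The sketch below uses made-up counts and a hypothetical helper, `fraction_at_or_above`. It assumes the threshold lies on a bin edge, because this page does not state which bin receives a score that falls exactly on an edge.

```python
from array import array

def fraction_at_or_above(bins, edges, threshold):
    """Fraction of counted pairs in bins whose lower edge is >= threshold."""
    total = sum(bins)
    if total == 0:
        return 0.0
    # edges[i] is the lower edge of bins[i]
    selected = sum(count for i, count in enumerate(bins) if edges[i] >= threshold)
    return selected / total

# Made-up counts for a 4-bin histogram; real values would come from simhistogram()
bins = array("Q", [60, 25, 10, 5])
edges = array("d", [0.0, 0.25, 0.5, 0.75, 1.0])
print(fraction_at_or_above(bins, edges, 0.5))  # 0.15
```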
- num_identical: int¶
The number of evaluated pairs with a score of 1.0.
- num_processed: int¶
The number of elements processed. It will be num_samples for a sample histogram and total_size for a full histogram.
- num_samples: int¶
The number of samples for a sample histogram, or 0.
- sampled: bool¶
True if this is a sample histogram, otherwise False.
- seed: int¶
The seed used for a sample histogram, otherwise 0. If the simhistogram seed was -1 then this attribute contains the value returned by Python’s random.randrange(2**32).
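The seed rule described above can be sketched as follows. The function `resolve_seed` is a hypothetical name used only for illustration. It mirrors the documented behavior and is not chemfp's own code.

```python
import random

def resolve_seed(seed):
    """Return the seed to record; -1 means choose one at random."""
    if seed == -1:
        # Same range as described above: an integer in [0, 2**32)
        return random.randrange(2**32)
    return seed

print(resolve_seed(12345))  # 12345
```

Recording the resolved value rather than -1 is what makes a sampled histogram reproducible: passing the stored seed back in should select the same samples.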
- times: dict[str, float | None]¶
A dictionary with elapsed times for different parts of the histogram generation, mapping string labels to elapsed time in seconds, or None if not relevant. The labels are:
load_queries - the time to load the queries
load_targets - the time to load the targets
init - the time to initialize the underlying processor
process - the time to compute the histogram counts
total - the total elapsed time
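Because any entry may be None, code that reports these timings needs to skip missing values. The following is a minimal sketch with invented timing values and a hypothetical helper, `format_times`. For the library's own formatting use get_times_description(), described below.

```python
def format_times(times):
    """Format the non-None entries of a times dictionary, one per line."""
    lines = []
    for label in ("load_queries", "load_targets", "init", "process", "total"):
        elapsed = times.get(label)
        if elapsed is None:
            # Not relevant for this run, e.g. no queries were loaded
            continue
        lines.append(f"{label}: {elapsed:.2f} s")
    return "\n".join(lines)

# Invented values, shaped like the dictionary described above
times = {
    "load_queries": None,
    "load_targets": 0.35,
    "init": 0.01,
    "process": 2.5,
    "total": 2.86,
}
print(format_times(times))
```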
- total_size: int¶
The total number of possible pairs. For a symmetric (NxN) search this is the size of the upper triangle without the diagonal, which is N*(N-1)/2. For an NxM search this is N*M, the product of the query and target sizes.
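The two formulas can be checked with a short sketch. The helper `total_pairs` is a hypothetical name for this example only.

```python
def total_pairs(num_targets, num_queries=None):
    """Number of possible pairs for an NxN or NxM comparison."""
    if num_queries is None:
        # Symmetric case: upper triangle without the diagonal
        return num_targets * (num_targets - 1) // 2
    # NxM case: every query is paired with every target
    return num_queries * num_targets

print(total_pairs(5))     # 10
print(total_pairs(5, 3))  # 15
```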
- close() → None¶
Close any files which may be open and set the processor to None.
If queries or targets is a memory-mapped FPB file then the respective arena keeps an open file handle so fingerprint and identifier lookups continue to work.
Call close() to close them explicitly, or use this object as a context manager to close them when exiting the context.
The close() method also sets the processor to None because its query and target arenas may refer to those open files.
The close() method may be called multiple times.
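The close semantics described above (idempotent close, processor reset to None, cleanup on leaving a with block) follow a common Python pattern. The class below is a hypothetical stand-in that only illustrates the pattern. It is not chemfp's SimHistogramResult, and the `closers` argument is an assumption made for this sketch.

```python
class ClosableResult:
    """Hypothetical stand-in showing an idempotent close() with context-manager support."""

    def __init__(self, processor, closers=()):
        self.processor = processor
        self._closers = list(closers)  # callables that release open files
        self.closed = False

    def close(self):
        if self.closed:
            return  # safe to call more than once
        self.closed = True
        self.processor = None  # drop references to arenas backed by open files
        for closer in self._closers:
            closer()
        self._closers.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not suppress exceptions

calls = []
with ClosableResult(object(), closers=[lambda: calls.append("targets")]) as result:
    pass
result.close()  # second call is a no-op
print(result.closed, result.processor, calls)  # True None ['targets']
```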
- get_description(include_times: bool = True) → str¶
Return a human-readable description of the histogram generation.
- Parameters:
include_times (bool) – if True (the default), include the histogram generation time and the full time.
- Returns:
str
- get_times_description() → str¶
Return a string containing a human-readable description of the timing details.
- property queries¶
The query arena (if present)
Returns None if the SimHistogramResult is closed.
- property targets¶
The target arena
This is also the arena used in NxN generation.
Returns None if the SimHistogramResult is closed.