chemfp.highlevel.arena_tools module¶

This module should not be imported directly.

It contains internal implementation details of the high-level API available from the top-level chemfp module.

This module is included in the documentation because parts of this module are returned to the user, and are part of the public API.

class chemfp.highlevel.arena_tools.SimHistogramResult(processor, *, times: dict, queries_close=None, targets_close=None)¶

Bases: object

The result from simhistogram() histogram generation.

This object acts like a two-element tuple which matches the NumPy histogram API:

>>> import chemfp
>>> a = chemfp.load_fingerprints("chembl_35.fpb")
>>> bins, edges = chemfp.simhistogram(targets=a)
>>> 
>>> import matplotlib.pyplot as plt
>>> plt.stairs(bins, edges)
>>> plt.show()

This object is a context manager which closes any files that were opened because the queries or targets were given as filenames.

The public attributes are:

NxN: bool¶: True for a symmetric histogram, otherwise False.

closed: bool¶: True if close() has been called, otherwise False.

edges: array.array("d")¶: The edge locations for the bins. If there are B bins then there are B+1 edges, with values [0.0, 1/B, 2/B, ..., 1.0].
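
The edge layout described above can be reproduced with a few lines of standard-library Python. This is only an illustrative sketch: make_edges is a hypothetical helper written for this example, not a chemfp function, and chemfp may compute its edges differently.

```python
from array import array


def make_edges(num_bins):
    """Return num_bins + 1 evenly spaced edges from 0.0 to 1.0.

    Hypothetical helper for illustration; not part of chemfp.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    # Edge i is located at i/B, for i = 0, 1, ..., B.
    return array("d", (i / num_bins for i in range(num_bins + 1)))


edges = make_edges(4)
print(list(edges))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With B bins, bin i covers the range from edges[i] to edges[i+1], which is why there is always one more edge than there are bins.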

bins: array.array("Q")¶: The bin counts.

num_identical: int¶: The number of evaluated pairs with a score of 1.0.

num_processed: int¶: The number of elements processed. It will be num_samples for a sample histogram and total_size for a full histogram.

num_samples: int¶: The number of samples for a sample histogram, or 0.

sampled: bool¶: True if this is a sample histogram, otherwise False.

seed: int¶: The seed used for a sample histogram, otherwise 0. If the simhistogram seed was -1 then this attribute contains the value chosen by Python’s random.randrange(2**32).

times: dict[str, float | None]¶

A dictionary with elapsed times for different parts of the histogram generation, mapping string labels to elapsed time in seconds, or None if not relevant. The labels are:

load_queries - the time to load the queries

load_targets - the time to load the targets

init - the time to initialize the underlying processor

process - the time to compute the histogram counts

total - the total elapsed time
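
Because a label may map to None when that step is not relevant, code that reports the times should skip or otherwise handle those entries. The following is a minimal sketch of one way to do that. The format_times helper and the numbers in example_times are made up for illustration and do not come from chemfp or from a real run.

```python
def format_times(times):
    """Format a times dictionary, skipping entries which are None.

    Hypothetical helper for illustration; not part of chemfp.
    """
    labels = ("load_queries", "load_targets", "init", "process", "total")
    parts = []
    for label in labels:
        elapsed = times.get(label)
        if elapsed is None:
            # Not relevant for this histogram, e.g. no queries were loaded.
            continue
        parts.append(f"{label}: {elapsed:.2f} s")
    return ", ".join(parts)


# Illustrative values only, shaped like the documented dictionary.
example_times = {
    "load_queries": None,
    "load_targets": 0.5,
    "init": 0.25,
    "process": 2.0,
    "total": 2.75,
}
print(format_times(example_times))
# load_targets: 0.50 s, init: 0.25 s, process: 2.00 s, total: 2.75 s
```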

total_size: int¶: The total number of possible pairs. For a symmetric (NxN) search this is the size of the upper triangle without the diagonal, which is N*(N-1)/2. For an NxM search this is N*M, the product of the query and target sizes.
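
As a quick check of the arithmetic, the two cases can be written out as follows. The total_pairs function is a hypothetical helper for this example, not part of chemfp.

```python
def total_pairs(num_targets, num_queries=None):
    """Return the number of possible pairs for a histogram.

    Hypothetical helper for illustration; not part of chemfp.
    With no queries, the search is symmetric (NxN) and only the upper
    triangle without the diagonal is counted. Otherwise it is NxM.
    """
    if num_queries is None:
        return num_targets * (num_targets - 1) // 2
    return num_queries * num_targets


print(total_pairs(1000))      # 499500
print(total_pairs(1000, 50))  # 50000
```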

close() → None¶

Close any files which may be open and set the processor to None.

If queries or targets is a memory-mapped FPB file then the respective arena keeps an open file handle so fingerprint and identifier lookups continue to work.

Call close() to close them explicitly, or use this object as a context manager to close them when exiting the context.

The close() method also sets the processor to None because its query and target arenas may refer to those open files.

The close() method may be called multiple times.
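
The behavior described above (close on context exit, drop the processor reference, and allow repeated calls) follows a common Python pattern. The sketch below shows that pattern with a stand-in class. ClosableResult and its to_close parameter are hypothetical names for this example; this is not chemfp's implementation of SimHistogramResult.

```python
import io


class ClosableResult:
    """Hypothetical stand-in showing the documented close() semantics."""

    def __init__(self, processor, to_close=None):
        self.processor = processor
        self._to_close = to_close  # a resource with its own close() method
        self.closed = False

    def close(self):
        if self.closed:
            return  # calling close() again is a no-op
        if self._to_close is not None:
            self._to_close.close()
        # Drop the processor, which may refer to the closed resource.
        self.processor = None
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()


resource = io.StringIO("stand-in for an open FPB file")
with ClosableResult(processor=object(), to_close=resource) as result:
    assert not result.closed

result.close()  # safe to call again after the context has exited
print(result.closed, result.processor, resource.closed)  # True None True
```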

get_description(include_times: bool = True) → str¶

Return a human-readable description of the histogram generation.

Parameters: include_times (bool) – if True (the default), include the histogram generation time and the full time.
Returns: str

get_times_description() → str¶: Return a string containing a human-readable description of the timing details.

property queries¶

The query arena (if present)

Returns None if the SimHistogramResult is closed.

property targets¶

The target arena

This is also the arena used in NxN generation.

Returns None if the SimHistogramResult is closed.