chemfp.highlevel.arena_tools module

This module should not be imported directly.

It contains internal implementation details of the high-level API available from the top-level chemfp module.

This module is included in the documentation because parts of this module are returned to the user, and are part of the public API.

class chemfp.highlevel.arena_tools.SimHistogramResult(processor, *, times: dict, queries_close=None, targets_close=None)

Bases: object

The result from simhistogram() histogram generation.

This object acts like a two-element tuple, which matches the NumPy histogram API:

>>> import chemfp
>>> a = chemfp.load_fingerprints("chembl_35.fpb")
>>> bins, edges = chemfp.simhistogram(targets=a)
>>> 
>>> import matplotlib.pyplot as plt
>>> plt.stairs(bins, edges)
>>> plt.show()

This object is a context manager which closes any open files when the queries or targets were given as a filename.

The public attributes are:

NxN: bool

True for a symmetric histogram, otherwise False.

closed: bool

True if close() has been called, otherwise False.

edges: array.array("d")

The edge locations for the bins. If there are B bins then there are B+1 edges, with values [0.0, 1/B, 2/B, ..., 1.0].
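
As a rough illustration of the edge layout described above, the following sketch builds the B+1 evenly spaced edges for B bins. The helper name `make_edges` is hypothetical and not part of chemfp; it only reproduces the documented values and makes no claim about how chemfp computes them.

```python
from array import array


def make_edges(num_bins):
    """Hypothetical helper: return the num_bins+1 evenly spaced edges from 0.0 to 1.0."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    # Divide each index by num_bins so the final edge is exactly 1.0.
    return array("d", (i / num_bins for i in range(num_bins + 1)))


edges = make_edges(4)
print(list(edges))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```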

bins: array.array("Q")

The bin counts.

num_identical: int

The number of evaluated pairs with a score of 1.0.

num_processed: int

The number of elements processed. It will be num_samples for a sample histogram and total_size for a full histogram.

num_samples: int

The number of samples for a sample histogram, or 0.

sampled: bool

True if this is a sample histogram, otherwise False.

seed: int

The seed used for a sample histogram, otherwise 0. If the simhistogram seed was -1 then this attribute contains the value from Python’s random.randrange(2**32).
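
The seed rule above can be sketched as follows. The function name `resolve_seed` is an assumption made for illustration and is not a chemfp function; only the use of `random.randrange(2**32)` for a seed of -1 comes from the description.

```python
import random


def resolve_seed(seed):
    """Hypothetical helper: replace a seed of -1 with a random 32-bit value."""
    if seed == -1:
        # Matches the documented fallback: a value in the range [0, 2**32).
        return random.randrange(2**32)
    return seed


print(resolve_seed(12345))  # an explicit seed is passed through unchanged
```

Recording the resolved value, rather than -1, is what makes it possible to repeat a sampled histogram later with the same seed.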

times: dict[str, float | None]

A dictionary with elapsed times for different parts of the histogram generation, mapping string labels to elapsed time in seconds, or None if not relevant. The labels are:

  • load_queries - the time to load the queries

  • load_targets - the time to load the targets

  • init - the time to initialize the underlying processor

  • process - the time to compute the histogram counts

  • total - the total elapsed time
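
Because any entry may be None, code that reports these timings needs to skip the entries that are not relevant. A minimal sketch, using made-up values in a plain dictionary in place of a real `times` attribute; the `format_times` helper is hypothetical:

```python
# Made-up values for illustration; a real result supplies its own dictionary.
times = {
    "load_queries": None,  # assumed not relevant in this example
    "load_targets": 0.12,
    "init": 0.01,
    "process": 1.5,
    "total": 1.63,
}


def format_times(times):
    """Hypothetical helper: format the relevant timings, skipping None entries."""
    labels = ("load_queries", "load_targets", "init", "process", "total")
    parts = []
    for label in labels:
        elapsed = times.get(label)
        if elapsed is None:
            continue  # this step did not apply
        parts.append(f"{label}: {elapsed:.2f} s")
    return ", ".join(parts)


print(format_times(times))
```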

total_size: int

The total number of possible pairs. For a symmetric search this is the size of the upper triangle (without the diagonal), which is N*(N-1)/2. For an NxM search this is N*M, the product of the query and target sizes.
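
The two cases can be worked through with a short sketch. The helper `total_pairs` is hypothetical and only restates the arithmetic above.

```python
def total_pairs(num_targets, num_queries=None):
    """Hypothetical helper: the number of possible pairs.

    With targets only (symmetric, NxN), count the upper triangle
    without the diagonal. With queries as well (NxM), count every
    query/target pair.
    """
    if num_queries is None:
        # N*(N-1) is always even, so integer division is exact.
        return num_targets * (num_targets - 1) // 2
    return num_queries * num_targets


print(total_pairs(5))     # 10 pairs among 5 fingerprints
print(total_pairs(4, 3))  # 12 pairs for 3 queries against 4 targets
```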

close() → None

Close any files which may be open and set the processor to None.

If queries or targets is a memory-mapped FPB file then the respective arena keeps an open file handle so fingerprint and identifier lookups continue to work.

Call this close() to close them explicitly, or use this object as a context manager to close them when exiting the context.

The close() method also sets the processor to None because its query and target arenas may refer to those open files.

The close() method may be called multiple times.
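
The close behavior described above (drop the processor, allow repeated calls, close on leaving a context) follows a common Python pattern. The sketch below uses stand-in classes, `ClosableResult` and `FakeProcessor`, which are hypothetical and not chemfp classes; whether the real processor has its own close() is an assumption made here for illustration.

```python
class FakeProcessor:
    """Hypothetical stand-in for a processor holding open files."""

    def __init__(self):
        self.close_count = 0

    def close(self):
        self.close_count += 1


class ClosableResult:
    """Hypothetical stand-in: idempotent close() plus context-manager support."""

    def __init__(self, processor):
        self._processor = processor
        self.closed = False

    def close(self):
        if self.closed:
            return  # later calls are harmless no-ops
        processor = self._processor
        # Drop the reference so nothing keeps the open files alive.
        self._processor = None
        self.closed = True
        if processor is not None:
            processor.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions


processor = FakeProcessor()
with ClosableResult(processor) as result:
    print(result.closed)  # False inside the context
result.close()  # a second close after the context exits does nothing
print(result.closed, processor.close_count)  # True 1
```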

get_description(include_times: bool = True) → str

Return a human-readable description of the histogram generation.

Parameters:

include_times (bool) – if True (the default), include the histogram generation time and the full time.

Returns:

str

get_times_description() → str

Return a string containing a human-readable description of the timing details.

property queries

The query arena, if present.

Returns None if the SimHistogramResult is closed.

property targets

The target arena.

This is also the arena used in NxN generation.

Returns None if the SimHistogramResult is closed.