chemfp.fps_io module

I/O routines for the FPS format.

This is an internal chemfp module. It should not be imported by programs which use the public API. (Let me know if anything else should be part of the public API.)

This module contains class definitions for a few objects which are returned as part of the public API. The chemfp.open() function returns an FPSReader which reads from an FPS file, and the chemfp.open_fingerprint_writer() function returns an FPSWriter which writes to an FPS file.

class chemfp.fps_io.FPSReader(infile, close, metadata, location, block_reader)

Bases: FingerprintReader

FPS file reader

This class implements the chemfp.FingerprintReader API. It is also its own context manager, which automatically closes the file when the manager exits.
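As an illustration of the reader behavior described above, here is a minimal pure-Python sketch. It is not chemfp's implementation: the class name MiniFPSReader is hypothetical, and it assumes a simplified FPS layout of '#'-prefixed header lines followed by data lines made of a hex-encoded fingerprint, a tab, and an identifier. It shows iteration over (id, fp) pairs, line-number tracking, and a `with` block that closes the file on exit.

```python
import io
from typing import TextIO, Tuple


class MiniFPSReader:
    """Hypothetical, simplified FPS reader; not chemfp's FPSReader."""

    def __init__(self, infile: TextIO) -> None:
        self._infile = infile
        self.closed = False
        self.lineno = 0  # number of lines read so far

    def __iter__(self) -> "MiniFPSReader":
        return self

    def __next__(self) -> Tuple[str, bytes]:
        # The file object is its own iterator, so each call resumes
        # where the previous call stopped.
        for line in self._infile:
            self.lineno += 1
            if line.startswith("#") or not line.strip():
                continue  # skip header lines such as "#FPS1" and blank lines
            # Assumption: exactly two tab-separated fields, hex fingerprint then id.
            hex_fp, fp_id = line.rstrip("\n").split("\t")
            return fp_id, bytes.fromhex(hex_fp)
        raise StopIteration

    def close(self) -> None:
        if not self.closed:
            self._infile.close()
            self.closed = True

    def __enter__(self) -> "MiniFPSReader":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()  # close the file whether or not an exception occurred


data = "#FPS1\n#num_bits=16\n0f00\tmol1\nff01\tmol2\n"
with MiniFPSReader(io.StringIO(data)) as reader:
    pairs = list(reader)
print(pairs)  # [('mol1', b'\x0f\x00'), ('mol2', b'\xff\x01')]
print(reader.lineno, reader.closed)  # 4 True
```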

The public attributes are:

metadata: Metadata

A chemfp.Metadata instance with information about the fingerprint type.

location: Location

A chemfp.io.Location instance with parser location and state information.

The FPSReader.location only tracks the “lineno” property and (if possible) the “position”, “end_position”, and “position_units” properties.

closed: bool

False when the file is open, else True.

close()

Close the file

count_tanimoto_hits_arena(queries: _typing.FingerprintArena, threshold: float = 0.7)

Count the fingerprints which are sufficiently similar to each query fingerprint

Returns a list containing a count for each query fingerprint in the queries arena. The count is the number of fingerprints in the reader which are at least threshold similar to the query fingerprint.

The order of results is the same as the order of the queries.

Parameters:
  • queries (a FingerprintArena) – query fingerprints

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)

Returns:

list of integer counts, one for each query
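The counting rule can be sketched in pure Python, with plain lists of byte strings standing in for the queries arena and the reader. This is an illustrative sketch, not chemfp's implementation; the helper names tanimoto and count_hits_per_query are hypothetical.

```python
from typing import List, Sequence


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """|A & B| / |A | B| over the bits of two equal-length fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # this sketch's convention for two all-zero fingerprints
    return bin(a & b).count("1") / union


def count_hits_per_query(queries: Sequence[bytes], targets: Sequence[bytes],
                         threshold: float = 0.7) -> List[int]:
    # One count per query, in query order; a target is a hit when its
    # score is greater than or equal to the threshold.
    return [sum(1 for t in targets if tanimoto(q, t) >= threshold)
            for q in queries]


targets = [b"\x0f", b"\x07", b"\xf0"]
queries = [b"\x0f", b"\xf0"]
print(count_hits_per_query(queries, targets))  # [2, 1]
```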

count_tanimoto_hits_fp(query_fp: bytes, threshold: float = 0.7)

Count the fingerprints which are sufficiently similar to the query fingerprint

Return the number of fingerprints in the reader which are at least threshold similar to the query fingerprint query_fp.

Parameters:
  • query_fp (byte string) – query fingerprint

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)

Returns:

integer count
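Because the comparison is "at least threshold", a target whose score equals the threshold is counted. The sketch below (hypothetical helper names, not chemfp code) makes this concrete: the query 0x0f shares 3 of its 4 set bits with 0x07, a Tanimoto score of exactly 0.75.

```python
from typing import Sequence


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def count_hits(query_fp: bytes, targets: Sequence[bytes],
               threshold: float = 0.7) -> int:
    # Inclusive comparison: a score equal to the threshold counts as a hit.
    return sum(1 for t in targets if tanimoto(query_fp, t) >= threshold)


targets = [b"\x0f", b"\x07", b"\xf0"]
print(count_hits(b"\x0f", targets, threshold=0.75))  # 2 (scores 1.0 and 0.75)
print(count_hits(b"\x0f", targets, threshold=0.76))  # 1 (score 1.0 only)
```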

count_tversky_hits_fp(query_fp: bytes, threshold: float = 0.7, alpha: float = 1.0, beta: float = 1.0)

Count the fingerprints which are sufficiently similar to the query fingerprint

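Return the number of fingerprints in the reader which are at least threshold similar to the query fingerprint query_fp, using the Tversky similarity with weights alpha and beta. When alpha and beta are both 1.0, the Tversky similarity is the same as the Tanimoto similarity.

Parameters:
  • query_fp (byte string) – query fingerprint

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)

  • alpha (float) – Tversky alpha weight (default: 1.0)

  • beta (float) – Tversky beta weight (default: 1.0)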

Returns:

integer count
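The Tversky index generalizes Tanimoto. With n_common bits set in both fingerprints, n_query_only bits set only in the query, and n_target_only bits set only in the target, it is n_common / (n_common + alpha*n_query_only + beta*n_target_only), which reduces to Tanimoto when alpha and beta are both 1.0. The sketch below assumes that alpha weights the query-only bits and beta the target-only bits; check the chemfp documentation for its exact convention. The helper names are hypothetical and this is not chemfp's implementation.

```python
from typing import Sequence


def tversky(query_fp: bytes, target_fp: bytes,
            alpha: float = 1.0, beta: float = 1.0) -> float:
    q = int.from_bytes(query_fp, "big")
    t = int.from_bytes(target_fp, "big")
    common = bin(q & t).count("1")
    query_only = bin(q & ~t).count("1")   # bits set only in the query (assumed alpha)
    target_only = bin(t & ~q).count("1")  # bits set only in the target (assumed beta)
    denom = common + alpha * query_only + beta * target_only
    return common / denom if denom else 0.0


def count_tversky_hits(query_fp: bytes, targets: Sequence[bytes],
                       threshold: float = 0.7,
                       alpha: float = 1.0, beta: float = 1.0) -> int:
    return sum(1 for t in targets
               if tversky(query_fp, t, alpha, beta) >= threshold)


targets = [b"\x0f", b"\x07", b"\xff"]
query = b"\x07"
print(count_tversky_hits(query, targets))                       # 2, same as Tanimoto
print(count_tversky_hits(query, targets, alpha=1.0, beta=0.0))  # 3
```

In this sketch, setting beta to 0.0 ignores target-only bits, so every target that contains all of the query's bits scores 1.0.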

iter_blocks()

This is not part of the public API

iter_rows()

This is not part of the public API

knearest_tanimoto_search_arena(queries: _typing.FingerprintArena, k: int = 3, threshold: float = 0.0)

Find the k-nearest fingerprints which are sufficiently similar to each of the query fingerprints

For each fingerprint in the queries arena, find the fingerprints in this reader which are at least threshold similar to the query fingerprint, and of those, select the top k hits. The hits are returned as a SearchResults, where the hits in each SearchResult are sorted from highest score to lowest.

Parameters:
  • queries (a FingerprintArena) – query fingerprints

  • k (positive integer) – number of nearest neighbors to find (default: 3)

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.0)

Returns:

a SearchResults
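The filter-then-select behavior can be sketched in pure Python, with a list of lists of (id, score) pairs standing in for SearchResults. The names are hypothetical and this is not chemfp's implementation. Note that a query can get fewer than k hits when the threshold removes candidates.

```python
import heapq
from typing import List, Sequence, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def knearest_per_query(queries: Sequence[bytes],
                       targets: Sequence[Tuple[str, bytes]],
                       k: int = 3,
                       threshold: float = 0.0) -> List[List[Tuple[str, float]]]:
    results = []
    for query_fp in queries:
        # Keep only targets at or above the threshold...
        hits = [(fp_id, tanimoto(query_fp, fp)) for fp_id, fp in targets]
        hits = [hit for hit in hits if hit[1] >= threshold]
        # ...then take the k best, sorted from highest score to lowest.
        results.append(heapq.nlargest(k, hits, key=lambda hit: hit[1]))
    return results


targets = [("a", b"\x0f"), ("b", b"\x07"), ("c", b"\x03"), ("d", b"\xf0")]
results = knearest_per_query([b"\x0f", b"\x03"], targets, k=3, threshold=0.6)
print(results[0])  # [('a', 1.0), ('b', 0.75)]: only two targets pass 0.6
```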

knearest_tanimoto_search_fp(query_fp: bytes, k: int = 3, threshold: float = 0.0)

Find the k-nearest fingerprints which are sufficiently similar to the query fingerprint

Find all of the fingerprints in this reader which are at least threshold similar to the query fingerprint, and of those, select the top k hits. The hits are returned as a SearchResult, sorted from highest score to lowest.

Parameters:
  • queries (a FingerprintArena) – query fingerprints

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.0)

Returns:

a SearchResult

knearest_tversky_search_fp(query_fp: bytes, k: int = 3, threshold: float = 0.0, alpha: float = 1.0, beta: float = 1.0)

Find the k-nearest fingerprints which are sufficiently similar to the query fingerprint

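Find all of the fingerprints in this reader which are at least threshold similar to the query fingerprint, using the Tversky similarity with weights alpha and beta, and of those, select the top k hits. The hits are returned as a SearchResult, sorted from highest score to lowest.

Parameters:
  • query_fp (byte string) – query fingerprint

  • k (positive integer) – number of nearest neighbors to find (default: 3)

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.0)

  • alpha (float) – Tversky alpha weight (default: 1.0)

  • beta (float) – Tversky beta weight (default: 1.0)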

Returns:

a SearchResult

next()

Return the next (id, fp) pair

threshold_tanimoto_search_arena(queries: _typing.FingerprintArena, threshold: float = 0.7)

Find the fingerprints which are sufficiently similar to each of the query fingerprints

For each fingerprint in the queries arena, find all of the fingerprints in this reader which are at least threshold similar. The hits are returned as a SearchResults, where the hits in each SearchResult are in arbitrary order.

Parameters:
  • queries (a FingerprintArena) – query fingerprints

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)

Returns:

a SearchResults

threshold_tanimoto_search_fp(query_fp: bytes, threshold: float = 0.7)

Find the fingerprints which are sufficiently similar to the query fingerprint

Find all of the fingerprints in this reader which are at least threshold similar to the query fingerprint query_fp. The hits are returned as a SearchResult, in arbitrary order.

Parameters:
  • query_fp (byte string) – query fingerprint

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)

Returns:

a SearchResult
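Since the hits come back in arbitrary order, code that needs a ranking should sort them itself. The pure-Python sketch below (hypothetical names, not chemfp's implementation) collects every target at or above the threshold as (id, score) pairs and then sorts by score.

```python
from typing import List, Sequence, Tuple


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def threshold_search(query_fp: bytes, targets: Sequence[Tuple[str, bytes]],
                     threshold: float = 0.7) -> List[Tuple[str, float]]:
    # Every target at or above the threshold; callers should not rely on the order.
    hits = []
    for fp_id, fp in targets:
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            hits.append((fp_id, score))
    return hits


targets = [("a", b"\x0f"), ("b", b"\x07"), ("c", b"\x03"), ("d", b"\xf0")]
hits = threshold_search(b"\x0f", targets, threshold=0.5)
# Sort explicitly when a ranked list is needed.
ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
print(ranked)  # [('a', 1.0), ('b', 0.75), ('c', 0.5)]
```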

threshold_tversky_search_fp(query_fp: bytes, threshold: float = 0.7, alpha: float = 1.0, beta: float = 1.0)

Find the fingerprints which are sufficiently similar to the query fingerprint

Find all of the fingerprints in this reader which are at least threshold similar to the query fingerprint query_fp, using the Tversky similarity with weights alpha and beta. The hits are returned as a SearchResult, in arbitrary order.

Parameters:
  • query_fp (byte string) – query fingerprint

  • threshold (float between 0.0 and 1.0, inclusive) – minimum similarity threshold (default: 0.7)

  • alpha (float) – Tversky alpha weight (default: 1.0)

  • beta (float) – Tversky beta weight (default: 1.0)

Returns:

a SearchResult

class chemfp.fps_io.FPSWriter(output, writer, metadata, location=None)

Bases: FingerprintWriter

Write fingerprints in FPS format.

This is a subclass of chemfp.FingerprintWriter.

An FPSWriter is its own context manager, and will close the output file on context exit.

The public attributes are:

metadata: Metadata

A chemfp.Metadata instance describing the fingerprints being written.

format: str

The string ‘fps’.

closed: bool

False when the file is open, else True.

location: Location

A chemfp.io.Location instance which supports the “recno”, “output_recno”, and “lineno” properties.

close()

Close the writer

This will set self.closed to True.

write_fingerprint(id: str, fp: bytes)

Write a single fingerprint record with the given id and fp

Parameters:
  • id (string) – the record identifier

  • fp (bytes) – the fingerprint

write_fingerprints(id_fp_pairs: Iterable[Tuple[str, bytes]])

Write a sequence of fingerprint records

Parameters:

id_fp_pairs – An iterable of (id, fingerprint) pairs.
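The writer side can be sketched the same way. The class MiniFPSWriter below is hypothetical, not chemfp's FPSWriter. It assumes a simplified header (a #FPS1 line and an optional #num_bits line), writes one hex-fingerprint, tab, id line per record, counts records, and closes its output on context exit, after which closed is True. Here write_fingerprints simply loops over write_fingerprint.

```python
import io
from typing import Iterable, Optional, TextIO, Tuple


class MiniFPSWriter:
    """Hypothetical, simplified FPS writer; not chemfp's FPSWriter."""

    def __init__(self, output: TextIO, num_bits: Optional[int] = None) -> None:
        self._output = output
        self.closed = False
        self.recno = 0  # number of records written so far
        output.write("#FPS1\n")  # simplified header (assumption)
        if num_bits is not None:
            output.write(f"#num_bits={num_bits}\n")

    def write_fingerprint(self, id: str, fp: bytes) -> None:
        if self.closed:
            raise ValueError("I/O operation on closed writer")
        self._output.write(f"{fp.hex()}\t{id}\n")
        self.recno += 1

    def write_fingerprints(self, id_fp_pairs: Iterable[Tuple[str, bytes]]) -> None:
        for id, fp in id_fp_pairs:
            self.write_fingerprint(id, fp)

    def close(self) -> None:
        if not self.closed:
            self._output.close()
            self.closed = True

    def __enter__(self) -> "MiniFPSWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


buf = io.StringIO()
with MiniFPSWriter(buf, num_bits=8) as writer:
    writer.write_fingerprint("mol1", b"\x0f")
    writer.write_fingerprints([("mol2", b"\xf0"), ("mol3", b"\x03")])
    text = buf.getvalue()  # read before exit; closing a StringIO discards its contents
print(text)
print(writer.recno, writer.closed)  # 3 True
```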