chemfp.fps_search module
FPS file similarity search and search result implementations.
Chemfp implements similarity search methods which work directly on FPS files. This can be useful in a streaming environment, where the FPS data is generated on the fly and not saved, and where you have at most a handful of queries. In that case an FPS search is faster than an arena-based search: the FPS parsing overhead is about the same for both, but the FPS search does not have the arena creation or memory overhead that an in-memory search has.
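The following is a minimal pure-Python sketch of the streaming idea, not chemfp's implementation. It assumes the usual FPS record layout (a hex-encoded fingerprint, a tab, then the id, with '#' lines holding header metadata) and scores each target against the query as the line is read, so no arena is ever built. The helpers iter_fps_records, tanimoto and stream_threshold_search are hypothetical names.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def iter_fps_records(infile: TextIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (id, fingerprint) pairs from FPS-formatted text, one line at a time."""
    for line in infile:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#FPS1' / '#key=value' header lines
        hex_fp, fp_id = line.split("\t")[:2]  # ignore any extra tab-separated fields
        yield fp_id, bytes.fromhex(hex_fp)


def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto = |A & B| / |A | B| on the bits of two equal-length fingerprints."""
    a = int.from_bytes(fp1, "big")
    b = int.from_bytes(fp2, "big")
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # this sketch's convention when both fingerprints are empty
    return bin(a & b).count("1") / union


def stream_threshold_search(
    query_fp: bytes, infile: TextIO, threshold: float = 0.7
) -> List[Tuple[str, float]]:
    """Return (id, score) for every target at least `threshold` similar to the query."""
    hits = []
    for fp_id, target_fp in iter_fps_records(infile):
        score = tanimoto(query_fp, target_fp)
        if score >= threshold:
            hits.append((fp_id, score))
    return hits


fps_text = "#FPS1\n#num_bits=8\nff\tA\n0f\tB\n01\tC\n"
print(stream_threshold_search(bytes.fromhex("0f"), io.StringIO(fps_text), threshold=0.5))
# [('A', 0.5), ('B', 1.0)]
```

Because each record is scored and then discarded, the same loop works on a pipe or any other one-shot source of lines; the trade-off is that a single pass can only serve the queries that are available when the pass starts.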
- class chemfp.fps_search.FPSSearchResult(ids, scores, query_id=None)
Bases: object
Search results for a query fingerprint against a target FPS reader.
The result contains a list of hits. Each hit contains a target id and a score. The hits can be reordered by id or by score.
- __getitem__(item)
Return the (id, score) pair for the given index, or pairs if item is a slice
- __iter__()
Iterate through the pairs of (target id, score) using the current ordering
- __len__()
Return the number of hits.
- get_ids()
The list of target identifiers in the current ordering.
This returns the same list each time.
- get_ids_and_scores()
The list of (target identifier, target score) pairs, in the current ordering
- get_scores()
The list of target scores, in the current ordering.
This returns the same list each time.
- query_id
The id of the query fingerprint, if available, otherwise None.
- reorder(order='decreasing-score')
Reorder the hits based on the requested ordering.
The available orderings are:
increasing-score - sort by increasing score
decreasing-score - sort by decreasing score
increasing-score-plus - sort by increasing score, break ties by increasing index
decreasing-score-plus - sort by decreasing score, break ties by increasing index
increasing-id - sort by increasing target id
decreasing-id - sort by decreasing target id
move-closest-first - move the hit with the highest score to the first position
reverse - reverse the current ordering
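To make the orderings concrete, here is a small pure-Python model that works on a plain list of (target id, score) pairs. It is illustrative only: reorder_hits is a hypothetical name, it returns a new list rather than modifying anything in place, and it omits the '-plus' variants because the tie-breaking index they use is not defined in this section.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (target id, score)


def reorder_hits(hits: List[Hit], order: str = "decreasing-score") -> List[Hit]:
    """Return a reordered copy of `hits`; the input list is left unchanged."""
    # Python's sort is stable, even with reverse=True, so tied hits keep
    # their current relative order.
    if order == "increasing-score":
        return sorted(hits, key=lambda hit: hit[1])
    if order == "decreasing-score":
        return sorted(hits, key=lambda hit: hit[1], reverse=True)
    if order == "increasing-id":
        return sorted(hits, key=lambda hit: hit[0])
    if order == "decreasing-id":
        return sorted(hits, key=lambda hit: hit[0], reverse=True)
    if order == "reverse":
        return hits[::-1]
    if order == "move-closest-first":
        if not hits:
            return []
        best = max(range(len(hits)), key=lambda i: hits[i][1])
        # Only the first position is specified; here the other hits keep their order.
        return [hits[best]] + hits[:best] + hits[best + 1:]
    raise ValueError(f"unsupported order: {order!r}")


hits = [("b", 0.5), ("a", 0.9), ("c", 0.5)]
print(reorder_hits(hits, "decreasing-score"))  # [('a', 0.9), ('b', 0.5), ('c', 0.5)]
print(reorder_hits(hits, "reverse"))           # [('c', 0.5), ('a', 0.9), ('b', 0.5)]
```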
- scores
The similarity scores for the hits.
- to_pandas(*, columns=['target_id', 'score'])
Return a pandas DataFrame with the target ids and scores
The first column contains the ids and the second column contains the scores. The default column headers are “target_id” and “score”. Use columns to specify different headers.
- Parameters:
columns (a list of two strings) – column names for the returned DataFrame
- Returns:
a pandas DataFrame
- class chemfp.fps_search.FPSSearchResults(query_ids, results)
Bases: object
Search results for a query arena against a target FPS reader.
- __getitem__(i)
Return an FPSSearchResult by index
- __iter__()
Iterate through the search results
- __len__()
The number of search results in this collection
- iter_ids()
For each search result, yield the list of target identifiers
- iter_ids_and_scores()
For each search result, yield the list of target (id, score) tuples
- iter_scores()
For each search result, yield the list of target scores
- query_ids
A list of query ids, one for each result. This comes from the query arena’s ids.
- reorder_all(order='decreasing-score')
Reorder the hits for all of the rows based on the requested order.
The available orderings are:
increasing-score - sort by increasing score
decreasing-score - sort by decreasing score
increasing-id - sort by increasing target id
decreasing-id - sort by decreasing target id
move-closest-first - move the hit with the highest score to the first position
reverse - reverse the current ordering
- to_pandas(*, columns=['query_id', 'target_id', 'score'], empty=('*', None))
Return a pandas DataFrame with query_id, target_id and score columns.
Each query has zero or more hits. Each hit becomes a row in the output table, with the query id in the first column, the hit target id in the second, and the hit score in the third.
If a query has no hits then by default a row is added with the query id, ‘*’ as the target id, and None as the score (which pandas will treat as a NA value).
Use empty to specify different behavior for queries with no hits. If empty is None then no row is added to the table. If empty is a 2-element tuple the first element is used as the target id and the second is used as the score.
- Parameters:
columns (a list of three strings) – column names for the returned DataFrame
empty (a 2-element tuple, or None) – the target id and score used for queries with no hits, or None to not include a row for that case
- Returns:
a pandas DataFrame
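The row layout, including the handling of queries with no hits, can be modeled without pandas. In this sketch, result_rows is a hypothetical helper (not part of chemfp) that takes the query ids and a list of (target id, score) hits per query: each query contributes one row per hit, and a query with no hits contributes a single filler row built from empty, unless empty is None.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, object, object]  # (query_id, target_id, score)


def result_rows(
    query_ids: Sequence[str],
    hit_lists: Sequence[Sequence[Tuple[str, float]]],
    empty: Optional[Tuple[object, object]] = ("*", None),
) -> List[Row]:
    """Flatten per-query hit lists into (query_id, target_id, score) rows."""
    rows: List[Row] = []
    for query_id, hits in zip(query_ids, hit_lists):
        if hits:
            for target_id, score in hits:
                rows.append((query_id, target_id, score))
        elif empty is not None:
            filler_id, filler_score = empty  # placeholder row for a query with no hits
            rows.append((query_id, filler_id, filler_score))
    return rows


rows = result_rows(["q1", "q2"], [[("t1", 0.9), ("t2", 0.8)], []])
print(rows)  # [('q1', 't1', 0.9), ('q1', 't2', 0.8), ('q2', '*', None)]
```

If pandas is installed, pandas.DataFrame(rows, columns=["query_id", "target_id", "score"]) turns these rows into a table with the same default column names.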
- chemfp.fps_search.count_tanimoto_hits_fp(query_fp, target_reader, threshold=0.7)
Count the number of hits in target_reader at least threshold similar to the query_fp
This uses Tanimoto similarity.
- chemfp.fps_search.count_tanimoto_hits_arena(query_arena, target_reader, threshold=0.7)
For each fingerprint in query_arena, count the number of hits in target_reader at least threshold similar to it
This uses Tanimoto similarity.
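Since FPS data may arrive as a stream that can be read only once, one way to count hits for several queries is a single pass over the targets, comparing each target with every query. The sketch below assumes fingerprints are plain Python integers used as bit sets and targets are an iterable of (id, fingerprint) pairs; count_hits_single_pass is a hypothetical name, and this is not a description of chemfp's internals.

```python
from typing import Iterable, List, Tuple


def tanimoto(q: int, t: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(q | t).count("1")
    return bin(q & t).count("1") / union if union else 0.0  # 0/0 -> 0.0 in this sketch


def count_hits_single_pass(
    query_fps: List[int], targets: Iterable[Tuple[str, int]], threshold: float = 0.7
) -> List[int]:
    """Return one hit count per query, reading the targets exactly once."""
    counts = [0] * len(query_fps)
    for _target_id, target_fp in targets:         # each target is read once...
        for i, query_fp in enumerate(query_fps):  # ...and compared with every query
            if tanimoto(query_fp, target_fp) >= threshold:
                counts[i] += 1
    return counts


targets = iter([("A", 0b1111), ("B", 0b1110), ("C", 0b0001)])  # a one-shot stream
print(count_hits_single_pass([0b1111, 0b0001], targets))  # [2, 1]
```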
- chemfp.fps_search.threshold_tanimoto_search_fp(query_fp, target_reader, threshold=0.7)
Find matches in the target reader which are at least threshold similar to the query fingerprint
- Returns:
an FPSSearchResult instance containing the result.
- chemfp.fps_search.threshold_tanimoto_search_arena(query_arena, target_reader, threshold)
Find matches in the target reader which are at least threshold similar to the query arena fingerprints
- Returns:
an FPSSearchResults instance containing a list of query results.
- chemfp.fps_search.knearest_tanimoto_search_fp(query_fp, target_reader, k=3, threshold=0.0)
Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint
This uses Tanimoto similarity.
- Returns:
an FPSSearchResult instance containing the result.
- chemfp.fps_search.knearest_tanimoto_search_arena(query_arena, target_reader, k=3, threshold=0.0)
Find the nearest k matches in the target reader which are at least threshold similar to the query arena fingerprints
This uses Tanimoto similarity.
- Returns:
an FPSSearchResults instance containing a list of query results.
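A k-nearest search over a stream does not need to keep every score. In this sketch, heapq.nlargest from the Python standard library keeps at most k candidates while consuming the targets, after the threshold filter has been applied. Fingerprints are again modeled as integer bit sets, knearest_search is a hypothetical name, and chemfp's ordering of hits with equal scores may differ from this sketch, where ties keep their input order.

```python
import heapq
from typing import Iterable, List, Tuple


def tanimoto(q: int, t: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(q | t).count("1")
    return bin(q & t).count("1") / union if union else 0.0


def knearest_search(
    query_fp: int, targets: Iterable[Tuple[str, int]], k: int = 3, threshold: float = 0.0
) -> List[Tuple[str, float]]:
    """Return up to k (id, score) pairs, highest score first, each scoring >= threshold."""
    scored = ((target_id, tanimoto(query_fp, fp)) for target_id, fp in targets)
    passing = (hit for hit in scored if hit[1] >= threshold)
    # nlargest holds at most k candidates in a heap while it consumes the stream.
    return heapq.nlargest(k, passing, key=lambda hit: hit[1])


targets = [("A", 0b1111), ("B", 0b1110), ("C", 0b0001), ("D", 0b0011)]
print(knearest_search(0b1111, targets, k=2))  # [('A', 1.0), ('B', 0.75)]
```

Note that the threshold can return fewer than k hits: with threshold=0.6 only A and B qualify, even if k is 3.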
- chemfp.fps_search.count_tversky_hits_fp(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)
Count the number of hits in target_reader at least threshold similar to the query_fp
This uses Tversky similarity with the specified values of alpha and beta.
- chemfp.fps_search.count_tversky_hits_arena(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)
For each fingerprint in query_arena, count the number of hits in target_reader at least threshold similar to it
This uses Tversky similarity with the specified values of alpha and beta.
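For reference, the usual definition of Tversky similarity, with c the number of bits set in both query and target and a and b the number of bits set in the query and the target, is S = c / (alpha*(a - c) + beta*(b - c) + c). This sketch assumes the conventional assignment, with alpha weighting the query-only bits and beta the target-only bits; with alpha = beta = 1 it reduces to Tanimoto, and with alpha = beta = 0.5 to Dice. The functions tversky and count_tversky_hits are hypothetical and operate on integer bit sets.

```python
from typing import Iterable, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tversky(q: int, t: int, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky similarity of two integer bit-set fingerprints."""
    c = popcount(q & t)  # bits set in both
    a = popcount(q)      # bits set in the query
    b = popcount(t)      # bits set in the target
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0  # 0/0 -> 0.0 is this sketch's convention


def count_tversky_hits(
    query_fp: int,
    targets: Iterable[Tuple[str, int]],
    threshold: float,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> int:
    """Count targets whose Tversky score against the query is at least threshold."""
    return sum(1 for _id, fp in targets if tversky(query_fp, fp, alpha, beta) >= threshold)


q, t = 0b1111, 0b1110  # c = 3, a = 4, b = 3
print(tversky(q, t))            # 0.75, the Tanimoto score
print(tversky(q, t, 0.5, 0.5))  # 6/7, the Dice score
```

Asymmetric weights are where Tversky differs from Tanimoto: with alpha=0 and beta=1 the score is the fraction of the target's bits that are also in the query, so every target whose bits are a subset of the query's scores 1.0.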
- chemfp.fps_search.threshold_tversky_search_fp(query_fp, target_reader, threshold, alpha=1.0, beta=1.0)
Find matches in the target reader which are at least threshold similar to the query fingerprint
This uses Tversky similarity with the specified values of alpha and beta.
- Returns:
an FPSSearchResult instance containing the result.
- chemfp.fps_search.threshold_tversky_search_arena(query_arena, target_reader, threshold, alpha=1.0, beta=1.0)
Find matches in the target reader which are at least threshold similar to the query arena fingerprints
This uses Tversky similarity with the specified values of alpha and beta.
- Returns:
an FPSSearchResults instance containing a list of query results.
- chemfp.fps_search.knearest_tversky_search_fp(query_fp, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)
Find the nearest k matches in the target reader which are at least threshold similar to the query fingerprint
This uses Tversky similarity with the specified values of alpha and beta.
- Returns:
an FPSSearchResult instance containing the result.
- chemfp.fps_search.knearest_tversky_search_arena(query_arena, target_reader, k=3, threshold=0.0, alpha=1.0, beta=1.0)
Find the nearest k matches in the target reader which are at least threshold similar to the query arena fingerprints
This uses Tversky similarity with the specified values of alpha and beta.
- Returns:
an FPSSearchResults instance containing a list of query results.