chemfp.bitops module
Low-level fingerprint functions and global configuration.
The bitops module contains functions that work on byte-encoded and/or hex-encoded fingerprints, for example to compute the Tanimoto similarity between two byte fingerprints or to count the number of bits set in a fingerprint.
It also contains functions to change the internal configuration of how chemfp does its bit-wise operations, and to report that configuration. Currently only two of those are part of the public API.
- chemfp.bitops.byte_contains(sub_fp, super_fp)
Return 1 if the on bits of sub_fp are also 1 bits in super_fp, that is, if super_fp contains sub_fp.
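The containment test can be illustrated with a minimal pure-Python sketch. The helper name `contains` is hypothetical and this is not chemfp's implementation; it only shows the bit logic the description implies, returning a bool rather than 1 or 0.

```python
def contains(sub_fp: bytes, super_fp: bytes) -> bool:
    """Return True if every on bit of sub_fp is also on in super_fp."""
    if len(sub_fp) != len(super_fp):
        raise ValueError("fingerprints must have the same length")
    # A byte of sub_fp is covered when ANDing it with the matching
    # byte of super_fp leaves it unchanged.
    return all((a & b) == a for a, b in zip(sub_fp, super_fp))


print(contains(b"\x01\x00", b"\x03\x00"))  # bit 0 is present in the superset
print(contains(b"\x04", b"\x03"))          # bit 2 is missing from the superset
```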
- chemfp.bitops.byte_contains_bit(fp, bit_index)
Return True if the given bit position is on, otherwise False
- chemfp.bitops.byte_from_bitlist(fp[, num_bits=1024])
Convert a list of bit positions into a byte fingerprint, including modulo folding
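As a rough illustration of modulo folding, the sketch below builds a byte fingerprint from bit positions and converts it back. The helper names `from_bitlist` and `to_bitlist` are hypothetical, and the bit order (bit 0 is the least significant bit of the first byte) is an assumption made for this example rather than something stated in this section.

```python
def from_bitlist(bit_positions, num_bits=1024):
    """Set the given bit positions, folding positions >= num_bits with modulo."""
    fp = bytearray((num_bits + 7) // 8)
    for pos in bit_positions:
        pos %= num_bits                 # modulo folding
        fp[pos // 8] |= 1 << (pos % 8)  # assumed order: least significant bit first
    return bytes(fp)


def to_bitlist(fp):
    """Return the sorted on-bit positions, using the same assumed bit order."""
    return [8 * i + j for i, byte in enumerate(fp) for j in range(8) if (byte >> j) & 1]


fp = from_bitlist([0, 9, 16], num_bits=16)  # position 16 folds onto position 0
print(fp, to_bitlist(fp))
```

Folding means that two different input positions can set the same bit, so the round trip through `to_bitlist` may return fewer positions than were passed in.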
- chemfp.bitops.byte_hex_tanimoto(fp1, fp2)
Compute the Tanimoto similarity between the byte fingerprint fp1 and the hex fingerprint fp2. Return a float between 0.0 and 1.0, or raise a ValueError if fp2 is not a hex fingerprint
- chemfp.bitops.byte_hex_tversky(fp1, fp2, alpha=1.0, beta=1.0)
Compute the Tversky index between the byte fingerprint fp1 and the hex fingerprint fp2. Return a float between 0.0 and 1.0, or raise a ValueError if fp2 is not a hex fingerprint
- chemfp.bitops.byte_intersect(fp1, fp2)
Return the intersection of the two byte strings, fp1 & fp2
- chemfp.bitops.byte_intersect_popcount(fp1, fp2)
Return the number of bits set in the intersection of the two byte fingerprints fp1 and fp2
- chemfp.bitops.byte_popcount(fp)
Return the number of bits set in the byte fingerprint fp
- chemfp.bitops.byte_tanimoto(fp1, fp2)
Compute the Tanimoto similarity between the two byte fingerprints fp1 and fp2
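For readers who want to see the arithmetic, here is a minimal pure-Python sketch of the Tanimoto similarity on byte strings: the popcount of the intersection divided by the popcount of the union. The helper name `tanimoto` is hypothetical, this is not chemfp's implementation, and returning 0.0 when both fingerprints are empty is an assumption made for the example.

```python
def tanimoto(fp1: bytes, fp2: bytes) -> float:
    """Tanimoto similarity: popcount(fp1 & fp2) / popcount(fp1 | fp2)."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    both = sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))
    either = sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))
    if either == 0:
        return 0.0  # assumption: two empty fingerprints score 0.0
    return both / either


# 0x0f = 00001111 and 0x3c = 00111100 share 2 bits out of 6 in the union.
print(tanimoto(b"\x0f", b"\x3c"))
```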
- chemfp.bitops.byte_to_bitlist(bitlist)
Return a sorted list of the on-bit positions in the byte fingerprint
- chemfp.bitops.byte_tversky(fp1, fp2, alpha=1.0, beta=1.0)
Compute the Tversky index between the two byte fingerprints fp1 and fp2
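The sketch below uses the common set-based definition of the Tversky index, c / (alpha * (a - c) + beta * (b - c) + c), where a and b are the popcounts of the two fingerprints and c is the popcount of their intersection. Whether chemfp uses exactly this parameter convention is an assumption; it is at least consistent with the defaults alpha=1.0 and beta=1.0 reducing to the Tanimoto similarity. The helper names are hypothetical, and the 0.0 result for a zero denominator is also an assumption.

```python
def popcount(fp: bytes) -> int:
    """Number of on bits in a byte string."""
    return sum(bin(byte).count("1") for byte in fp)


def tversky(fp1: bytes, fp2: bytes, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Tversky index using the common set-based definition."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    c = popcount(bytes(a & b for a, b in zip(fp1, fp2)))
    only1 = popcount(fp1) - c  # bits set only in fp1
    only2 = popcount(fp2) - c  # bits set only in fp2
    denominator = alpha * only1 + beta * only2 + c
    if denominator == 0:
        return 0.0  # assumption: undefined cases score 0.0
    return c / denominator


print(tversky(b"\x0f", b"\x3c"))            # defaults give the Tanimoto value
print(tversky(b"\x0f", b"\x0c", 1.0, 0.0))  # penalize only bits unique to fp1
print(tversky(b"\x0f", b"\x0c", 0.0, 1.0))  # penalize only bits unique to fp2
```

With unequal alpha and beta the index is asymmetric, which is why swapping the weights in the last two calls gives different scores for the same pair.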
- chemfp.bitops.byte_union(fp1, fp2)
Return the union of the two byte strings, fp1 | fp2
- chemfp.bitops.byte_union_popcount(fp1, fp2)
Return the number of bits set in the union of the two byte fingerprints fp1 and fp2
- chemfp.bitops.byte_xor(fp1, fp2)
Return the xor (absolute difference) between the two byte strings, fp1 ^ fp2
- chemfp.bitops.byte_xor_popcount(fp1, fp2)
Return the number of bits set in the xor of the two byte fingerprints fp1 and fp2 (also called the Hamming, Manhattan, or taxicab distance)
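The relationship between the xor, union, and intersection popcounts can be checked with a small pure-Python sketch. The helper names are hypothetical and this is not chemfp's implementation; it only shows that the Hamming distance equals the union popcount minus the intersection popcount.

```python
def xor_popcount(fp1: bytes, fp2: bytes) -> int:
    """Hamming distance: the number of bit positions where the two differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(fp1, fp2))


def union_popcount(fp1: bytes, fp2: bytes) -> int:
    return sum(bin(a | b).count("1") for a, b in zip(fp1, fp2))


def intersect_popcount(fp1: bytes, fp2: bytes) -> int:
    return sum(bin(a & b).count("1") for a, b in zip(fp1, fp2))


fp1, fp2 = b"\x0f", b"\x3c"
# Bits in the union but not in the intersection are exactly the differing bits.
print(xor_popcount(fp1, fp2), union_popcount(fp1, fp2) - intersect_popcount(fp1, fp2))
```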
- chemfp.bitops.get_tanimoto_precision(num_bits: int) -> int
Return the minimum precision needed to distinguish all Tanimoto values with num_bits bits
Given two Tanimoto values from fingerprints of length num_bits, stored as a Python 64-bit float, how many decimal digits are needed to ensure they are distinct?
For example, for 2048-bit fingerprints you need at least 7 digits:
>>> "'%.6f' and '%.6f'" % (1/1023, 1/1022)
"'0.000978' and '0.000978'"
>>> "'%.7f' and '%.7f'" % (1/1023, 1/1022)
"'0.0009775' and '0.0009785'"
This function returns the minimum number of required decimal digits, given 1 <= num_bits <= 2**18. For example:
>>> bitops.get_tanimoto_precision(2048)
7
This might be used as ('%.7f' % score) or f'{score:.7f}'.
- Parameters:
num_bits (integer between 1 and 2**18) – The number of bits in the fingerprint
- Returns:
the precision, as an integer
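To make the idea concrete, the following brute-force sketch enumerates every possible Tanimoto value c/u with 0 <= c <= u <= num_bits and finds the smallest number of decimal digits at which all formatted values are distinct. The helper names are hypothetical, the approach is only practical for small num_bits, and it is not claimed to be how chemfp computes its answer or to return the same value in every case.

```python
from fractions import Fraction


def all_tanimoto_values(num_bits):
    """Every distinct ratio c/u with 0 <= c <= u and 1 <= u <= num_bits."""
    return sorted({Fraction(c, u) for u in range(1, num_bits + 1) for c in range(u + 1)})


def min_digits(num_bits, max_digits=17):
    """Smallest number of decimal digits for which all '%.Nf' strings are distinct."""
    values = [float(v) for v in all_tanimoto_values(num_bits)]
    for digits in range(max_digits + 1):
        formatted = {"%.*f" % (digits, v) for v in values}
        if len(formatted) == len(values):
            return digits
    raise ValueError("no precision up to max_digits separates all values")


print(min_digits(32))
```

The closest pair of values for a given num_bits differs by roughly 1/num_bits**2, which is why the required precision grows slowly as the fingerprint gets longer.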
- chemfp.bitops.hex_contains(sub_fp, super_fp)
Return 1 if the on bits of sub_fp are also on bits in super_fp, otherwise 0. Return -1 if either string is not a hex fingerprint
- chemfp.bitops.hex_contains_bit(fp, bit_index)
Return True if the given bit position is on, otherwise False.
This function does not validate that the hex fingerprint is actually in hex.
- chemfp.bitops.hex_decode(s)
Decode the hex-encoded value to a byte string
- chemfp.bitops.hex_encode(s)
Encode the byte string or ASCII string to hex. Returns a text string.
- chemfp.bitops.hex_encode_as_bytes(s)
Encode the byte string or ASCII string to hex. Returns a byte string.
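The difference between the text-returning and bytes-returning encoders mirrors a distinction in the Python standard library, shown in the sketch below. It uses only standard-library calls rather than chemfp, and it assumes lowercase hex digits, which this section does not specify for chemfp's output.

```python
import binascii

fp = b"\x01\xab\xff"

hex_text = fp.hex()               # str, comparable to a text-returning encoder
hex_bytes = binascii.hexlify(fp)  # bytes, comparable to a bytes-returning encoder

# Decoding either form gives back the original byte string.
decoded_from_text = bytes.fromhex(hex_text)
decoded_from_bytes = binascii.unhexlify(hex_bytes)

print(hex_text, hex_bytes, decoded_from_text == fp, decoded_from_bytes == fp)
```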
- chemfp.bitops.hex_from_bitlist(fp[, num_bits=1024])
Convert a list of bit positions into a hex fingerprint, including modulo folding
- chemfp.bitops.hex_intersect(fp1, fp2)
Return the intersection of the two hex strings, fp1 & fp2. Raises a ValueError for non-hex fingerprints.
- chemfp.bitops.hex_intersect_popcount(fp1, fp2)
Return the number of bits set in the intersection of the two hex fingerprints fp1 and fp2, or raise a ValueError if either string is a non-hex string
- chemfp.bitops.hex_isvalid(s)
Return 1 if the string s is a valid hex fingerprint, otherwise 0
- chemfp.bitops.hex_popcount(fp)
Return the number of bits set in a hex fingerprint fp, or -1 for non-hex strings
- chemfp.bitops.hex_tanimoto(fp1, fp2)
Compute the Tanimoto similarity between two hex fingerprints. Return a float between 0.0 and 1.0, or raise a ValueError if either string is not a hex fingerprint
- chemfp.bitops.hex_to_bitlist(bitlist)
Return a sorted list of the on-bit positions in the hex fingerprint
- chemfp.bitops.hex_tversky(fp1, fp2, alpha=1.0, beta=1.0)
Compute the Tversky index between two hex fingerprints. Return a float between 0.0 and 1.0, or raise a ValueError if either string is not a hex fingerprint
- chemfp.bitops.hex_union(fp1, fp2)
Return the union of the two hex strings, fp1 | fp2. Raises a ValueError for non-hex fingerprints.
- chemfp.bitops.hex_xor(fp1, fp2)
Return the xor (absolute difference) between the two hex strings, fp1 ^ fp2. Raises a ValueError for non-hex fingerprints.
- chemfp.bitops.print_report(out=sys.stdout) -> None
Print the configuration report to the given file (default: stdout)
- chemfp.bitops.use_environment_variables(environ=None, outfile=sys.stderr) -> None
Set the chemfp configuration using environment variables or a dictionary.
By default, process os.environ to find chemfp environment variables (which all start with “CHEMFP_”) and use them to configure chemfp internals.
This is meant to be used by any program which wants the same configuration system as the core chemfp components.
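The prefix-based lookup described above can be sketched without chemfp. The helper name `chemfp_settings` and the variable name `CHEMFP_HYPOTHETICAL_OPTION` are invented for illustration only and do not correspond to a documented chemfp setting; the sketch just shows how a mapping is filtered for names starting with "CHEMFP_", falling back to os.environ when no dictionary is given.

```python
import os


def chemfp_settings(environ=None):
    """Collect the entries whose names start with 'CHEMFP_'."""
    if environ is None:
        environ = os.environ  # default: the process environment
    return {name: value for name, value in environ.items() if name.startswith("CHEMFP_")}


# A plain dictionary can stand in for the environment, e.g. in tests.
example_env = {"CHEMFP_HYPOTHETICAL_OPTION": "1", "PATH": "/usr/bin"}
print(chemfp_settings(example_env))
```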