sdf2fps
The “sdf2fps” command-line tool (also available as the “chemfp sdf2fps” subcommand) extracts the id and fingerprint from the title line and/or data items of each record in an SDF and outputs them in a fingerprint file format.
The chemfp.sdf2fps() function implements similar functionality in the Python API.
The rest of this chapter contains the output from sdf2fps --help.
sdf2fps command-line options
The following comes from sdf2fps --help:
Usage: sdf2fps [OPTIONS] [FILENAMES]...
Extract fingerprints from an SDF tag.
Options:
--id-tag TAG Get the record id from the tag TAG instead
of the first line of the record.
--fp-tag TEXT Get the fingerprint from tag TAG (required)
--in FORMAT The input format (one of "sdf", "sdf.gz", or
"sdf.zst")
--num-bits INT Use the first INT bits of the input. Use
only when the last 1-7 bits of the last byte
are not part of the fingerprint. Unexpected
errors will occur if these bits are not all
zero. [x>=1]
--errors [strict|report|ignore]
How should structure parse errors be
handled? (default=strict)
--software TEXT Use TEXT as the software description
--type TEXT Use TEXT as the fingerprint type description
--binary Encoded with the characters '0' and '1'. Bit
#0 comes first. Example: 00100000 encodes
the value 4
--binary-msb Encoded with the characters '0' and '1'. Bit
#0 comes last. Example: 00000100 encodes the
value 4
--hex Hex encoded. Bit #0 is the first bit (1<<0)
of the first byte. Example: 01f2 encodes the
value \x01\xf2 = 498
--hex-lsb Hex encoded. Bit #0 is the eighth bit (1<<7)
of the first byte. Example: 804f encodes the
value \x01\xf2 = 498
--hex-msb Hex encoded. Bit #0 is the first bit (1<<0)
of the last byte. Example: f201 encodes the
value \x01\xf2 = 498
--base64 Base-64 encoded. Bit #0 is first bit (1<<0)
of first byte. Example: AfI= encodes value
\x01\xf2 = 498
--cactvs CACTVS encoding, based on base64 and
includes a version and bit length
--daylight Daylight encoding, which is a base64 variant
--decoder DECODER Import and use the DECODER function to
decode the fingerprint
--pubchem decode CACTVS substructure keys used in
PubChem. Same as --software=CACTVS/unknown
--type 'CACTVS-E_SCREEN/1.0 extended=2'
--fp-tag=PUBCHEM_CACTVS_SUBSKEYS --cactvs
-o, --output FILENAME Save the fingerprints to FILENAME
(default=stdout)
--out FORMAT Output format, one of 'fps', 'fps.gz',
'fps.zst', 'fpb', or 'flush' (default
guesses from output filename, or is 'fps')
--include-metadata / --no-metadata
With --no-metadata, do not include the
header metadata for FPS output.
--no-date Do not include the 'date' metadata in the
output header
--date STR An ISO 8601 date (like
'2025-02-07T11:10:15') to use for the 'date'
metadata in the output header
--progress / --no-progress Show a progress bar (default: show unless
the output is a terminal)
--license-check Check the license and report results to
stdout.
--license-file FILENAME Specify a chemfp license file
--traceback Print the traceback on KeyboardInterrupt
--version Show the version and exit.
--help Show this message and exit.
Examples:
1) Process the PubChem file Compound_016000001_016500000.sdf.gz to extract
the PubChem/CACTVS fingerprints, with the title as the id:
sdf2fps --pubchem Compound_016000001_016500000.sdf.gz
2) Process stdin to extract a hex-encoded fingerprint in the "CIRCULAR" tag
and get the id from the "SMILES" tag. Save the results to
"circular.fpb":
sdf2fps --hex --id-tag SMILES --fp-tag CIRCULAR -o circular.fpb
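The encoding options above differ only in bit and byte order, which is easier to see in code. The following is a minimal sketch in standard-library Python that reproduces the examples from the help text. It is not chemfp's implementation. The function names are hypothetical, and the per-byte bit reversal used for --hex-lsb is an assumption inferred from the "804f" example. The last function illustrates the --num-bits requirement that any bits past the stated length are zero.

```python
import base64


def decode_binary(text):
    """--binary: '0'/'1' characters with bit #0 first."""
    out = bytearray((len(text) + 7) // 8)
    for i, ch in enumerate(text):
        if ch == "1":
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def decode_binary_msb(text):
    """--binary-msb: '0'/'1' characters with bit #0 last."""
    return decode_binary(text[::-1])


def decode_hex(text):
    """--hex: bit #0 is 1<<0 of the first byte."""
    return bytes.fromhex(text)


def decode_hex_lsb(text):
    """--hex-lsb: bit #0 is 1<<7 of the first byte.

    Assumption: the bits inside each byte are reversed and the byte
    order is unchanged, which matches the help text example.
    """
    return bytes(int(f"{b:08b}"[::-1], 2) for b in bytes.fromhex(text))


def decode_hex_msb(text):
    """--hex-msb: bit #0 is 1<<0 of the last byte, so reverse the bytes."""
    return bytes.fromhex(text)[::-1]


def decode_base64(text):
    """--base64: standard base64, bit #0 is 1<<0 of the first byte."""
    return base64.b64decode(text)


def padding_bits_are_zero(fp, num_bits):
    """Check that every bit at position >= num_bits is zero.

    With bit #0 stored as 1<<0 of the first byte, a little-endian
    integer places fingerprint bit i at integer bit i.
    """
    return int.from_bytes(fp, "little") >> num_bits == 0


if __name__ == "__main__":
    # Examples taken from the help text.
    print(decode_binary("00100000"))
    print(decode_binary_msb("00000100"))
    for decoder, text in [(decode_hex, "01f2"), (decode_hex_lsb, "804f"),
                          (decode_hex_msb, "f201"), (decode_base64, "AfI=")]:
        print(decoder.__name__, text, decoder(text))
```

Every hex and base64 example decodes to the same two bytes, \x01\xf2, and both binary examples decode to the single byte \x04. This is a quick way to check which option matches the encoding found in a given SDF tag.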