ob2fps
The “ob2fps” command (also available as the “chemfp ob2fps” subcommand) uses the Open Babel toolkit to generate Open Babel fingerprints from structure files.
This functionality is also available from Python using the high-level chemfp.ob2fps() function, following chemfp’s “*2fps” API.
The rest of this chapter contains the output from ob2fps --help and ob2fps --help-formats.
ob2fps command-line options
The following comes from ob2fps --help:
Usage: ob2fps [OPTIONS] [FILENAMES]...
Generate fingerprints from a structure file using Open Babel.
If specified, process the filenames, otherwise read from stdin.
Fingerprint types:
--FP2 Linear fragments up to 7 atoms (default)
--FP3 SMARTS patterns specified in the file
patterns.txt
--FP4 SMARTS patterns specified in the file
SMARTS_InteLigand.txt
--MACCS, --maccs, --maccs166 Open Babel's implementation of the MACCS 166
keys
--ECFP0 ECFP (circular) fingerprints with diameter 0
--ECFP2 ECFP (circular) fingerprints with diameter 2
--ECFP4 ECFP (circular) fingerprints with diameter 4
--ECFP6 ECFP (circular) fingerprints with diameter 6
--ECFP8 ECFP (circular) fingerprints with diameter 8
--ECFP10 ECFP (circular) fingerprints with diameter 10
--substruct chemfp's PubChem-like substructure
fingerprints
--rdmaccs, --rdmaccs/2 chemfp's MACCS fingerprints, version 2.
--rdmaccs/1 chemfp's MACCS fingerprints, version 1
--type TYPE_STR Specify a chemfp type string
--using FILENAME Get the fingerprint type from the metadata of
a fingerprint file
Fingerprint options:
--nBits INT number of bits in the fingerprint (default=4096) [ECFP]
Options:
--id-tag TAG Tag name containing the record id (SD files
only)
--in FORMAT Input structure format (default guesses from
filename)
-o, --output FILENAME Save the fingerprints to FILENAME
(default=stdout)
--out FORMAT Output structure format (default guesses
from output filename, or is 'fps')
--include-metadata / --no-metadata
With --no-metadata, do not include the
header metadata for FPS output.
--no-date Do not include the 'date' metadata in the
output header
--date STR An ISO 8601 date (like
'2025-02-07T11:10:15') to use for the 'date'
metadata in the output header
--delimiter VALUE Delimiter style for SMILES and InChI files.
Forces '-R delimiter=VALUE'.
--has-header Skip the first line of a SMILES or InChI
file. Forces '-R has_header=1'.
-R NAME=VALUE Specify a reader argument
--cxsmiles / --no-cxsmiles Use --no-cxsmiles to disable the default
support for CXSMILES extensions. Forces '-R
cxsmiles=1' or '-R cxsmiles=0'.
--errors [strict|report|ignore]
How should structure parse errors be
handled? (default=ignore)
--progress / --no-progress Show a progress bar (default: show unless
the output is a terminal)
--help-formats List the available formats and reader
arguments
--version Show the version and exit.
--license-check Check the license and report results to
stdout.
--help Show this message and exit.
By default the Open Babel structure reader determines the file format and
compression type based on the filename extension. Unknown filename
extensions are treated as uncompressed SMILES files.
If the data comes from stdin, or the guess based on extension name is wrong,
then use the "--in FORMAT" option to change the default input format. For
example:
--in smi --in sdf.gz
Use `-R` to specify format-specific reader arguments.
Use `--help-formats` for a list of available formats and reader arguments.
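As an illustration of how the options above combine, here is a minimal Python sketch that assembles an ob2fps command line. It only builds and prints the argument list; it does not run ob2fps. The helper name `build_ob2fps_command` and the file names `library.smi` and `library.fps` are hypothetical. Every flag comes from the help text above.

```python
import shlex


def build_ob2fps_command(filenames, fp_flag="--FP2", in_format=None,
                         output=None, reader_args=None, errors=None):
    """Assemble an argv list for ob2fps (hypothetical helper, not part of chemfp)."""
    argv = ["ob2fps", fp_flag]
    if in_format is not None:
        argv += ["--in", in_format]
    # Each reader argument becomes its own "-R NAME=VALUE" pair.
    for name, value in (reader_args or {}).items():
        argv += ["-R", f"{name}={value}"]
    if errors is not None:
        argv += ["--errors", errors]
    if output is not None:
        argv += ["-o", output]
    argv += list(filenames)
    return argv


argv = build_ob2fps_command(
    ["library.smi"],                 # hypothetical input file
    fp_flag="--MACCS",
    reader_args={"delimiter": "whitespace", "has_header": "true"},
    errors="report",
    output="library.fps",            # hypothetical output file
)
print(shlex.join(argv))
```

On a machine where ob2fps is installed, the same list could be passed to `subprocess.run(argv, check=True)`.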
Supported ob2fps formats
The following comes from ob2fps --help-formats:
These are the structure file formats that chemfp can read when using the Open
Babel toolkit.
chemfp has special support for the SMILES, InChI, and SDF formats when using
the Open Babel toolkit.
For these formats, by default, chemfp uses the filename extension to determine
the format type. If the filename ends with ".gz" or ".zst" then it is
interpreted as a gzip or Zstandard compressed file, and the second-to-last
extension is used to determine the format type. Unknown or unsupported
extensions are then tested against Open Babel format names (see below), and if
still unknown, interpreted as a SMILES file.
You will need to use "-R implementation=chemfp" to enable zst support for the
SDF format.
You may instead specify the file format by name (see below), which is
especially important when reading from stdin, which has no associated filename
extension.
These specially supported filename extensions are:
File Type   Extension(s)
==========  =============
SMILES      can, ism, isosmi, smi, usm
SDF         sdf
InChI       inchi
The format can also be specified by name using the '--in' option:
File Type   Format name (append .gz or .zst if compressed)
==========  ===========
SMILES      smi, can, usm
SDF         sdf
InChI       inchi
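The extension-based guessing described above can be sketched in a few lines of Python. This is an illustration of the documented rules, not chemfp's implementation. The function name `guess_format` and the `openbabel_formats` parameter are assumptions made for the example, and the sketch does not model the restriction that Zstandard needs the chemfp SDF implementation.

```python
import os

# Specially supported extensions, copied from the table above.
SPECIAL_EXTENSIONS = {
    "can": "SMILES", "ism": "SMILES", "isosmi": "SMILES",
    "smi": "SMILES", "usm": "SMILES",
    "sdf": "SDF",
    "inchi": "InChI",
}
COMPRESSION_SUFFIXES = {".gz": "gzip", ".zst": "zstd"}


def guess_format(filename, openbabel_formats=()):
    """Return (file_type, compression) guessed from the filename alone."""
    name = filename
    compression = None
    # A trailing .gz or .zst marks compression; the format is then
    # taken from the second-to-last extension.
    for suffix, kind in COMPRESSION_SUFFIXES.items():
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            compression = kind
            break
    ext = os.path.splitext(name)[1].lstrip(".")
    if ext in SPECIAL_EXTENSIONS:
        return SPECIAL_EXTENSIONS[ext], compression
    # Next, test the extension against the Open Babel format names.
    if ext in openbabel_formats:
        return ext, compression
    # Anything still unknown is interpreted as SMILES.
    return "SMILES", compression


print(guess_format("chembl.sdf.gz"))
print(guess_format("ligand.mol2", openbabel_formats={"mol2", "pdb"}))
print(guess_format("mystery.dat2"))
```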
The input format parsers can be configured with the "-R" option. For example,
the following reader arguments tell the SMILES readers that the fields are
whitespace delimited and the first line is a header.
-R delimiter=whitespace -R has_header=true
All of the readers support the 'options' reader argument, which is a string
passed directly to OBConversion(). This is a compact way to encode all of the
Open Babel parameters used in the conversion. For example, 'ab"text"', would
set option 'a' to True, and option 'b' to the string "text".
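To make the compact 'options' encoding concrete, here is a small Python sketch that decodes such a string into a dictionary. It assumes only what the example above shows: each option is a single character, optionally followed by a double-quoted value. The function name `parse_options` is hypothetical, and the real parsing is done by chemfp and Open Babel, which may accept more than this sketch does.

```python
def parse_options(options):
    """Decode a compact option string such as 'ab"text"' into a dict."""
    result = {}
    i = 0
    while i < len(options):
        name = options[i]
        i += 1
        if i < len(options) and options[i] == '"':
            # A quoted value follows; str.index raises ValueError
            # if the closing quote is missing.
            end = options.index('"', i + 1)
            result[name] = options[i + 1:end]
            i = end + 1
        else:
            # No value: the option is a simple flag.
            result[name] = True
    return result


print(parse_options('ab"text"'))
```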
The SMILES format parsers use three additional reader arguments:
* 'delimiter' specifies the delimiter type. The default is 'to-eol'.
The other values are 'tab', 'whitespace', 'space' and 'native'.
Use "-R delimiter=native" to match Open Babel's native delimiter
style, which is 'to-eol'.
* 'has_header', if true will skip the first line of the SMILES
file (because it is a header line).
* 'cxsmiles' describes how to handle CXSMILES extensions. Open
Babel does not handle CXSMILES. The default (true) will remove
the extension before processing. If false any extension will
be treated as part of the identifier.
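The help text names the delimiter styles without spelling out how each one splits a line. The Python sketch below shows one plausible reading, stated here as an assumption: with 'to-eol' the identifier is everything after the first run of whitespace, while 'tab', 'space' and 'whitespace' split the line into fields and use the second field as the identifier. The function name `split_smiles_line` is hypothetical, and CXSMILES handling is left out.

```python
def split_smiles_line(line, delimiter="to-eol"):
    """Split one SMILES record into (smiles, identifier).

    Illustrative only; the per-style behavior is an assumption.
    """
    line = line.rstrip("\r\n")
    if delimiter in ("to-eol", "native"):
        # The help text says the native style is 'to-eol'.
        fields = line.split(None, 1)
    elif delimiter == "tab":
        fields = line.split("\t")
    elif delimiter == "space":
        fields = line.split(" ")
    elif delimiter == "whitespace":
        fields = line.split()
    else:
        raise ValueError(f"unknown delimiter style: {delimiter!r}")
    if not fields or not fields[0]:
        raise ValueError("missing SMILES field")
    identifier = fields[1] if len(fields) > 1 else None
    return fields[0], identifier


print(split_smiles_line("CCO ethyl alcohol\n"))
print(split_smiles_line("CCO ethyl alcohol\n", delimiter="whitespace"))
```

Under this reading, the choice of style matters mostly for identifiers that contain spaces.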
The SDF format parser supports one additional reader argument:
* 'implementation': if "openbabel" or "native", use Open Babel's
native SDF parser. If "chemfp" use chemfp's own implementation
to find SDF records, which are then passed to Open Babel for
parsing. This gives more fine-grained error reporting and
supports zst compression, with similar performance.
(Note: Open Babel supports additional options.)
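The idea behind the "chemfp" SDF implementation, finding record boundaries first and handing each record to the toolkit afterwards, can be illustrated with a simplified record splitter. This sketch is not chemfp's code. It assumes only the standard SD file convention that each record ends with a line containing `$$$$`, and the name `iter_sdf_records` is hypothetical.

```python
import io


def iter_sdf_records(lines):
    """Yield each SD record as one string, split on '$$$$' terminator lines."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(record)
            record = []
    # Yield trailing text that lacks a terminator, unless it is blank.
    if any(text.strip() for text in record):
        yield "".join(record)


sdf_text = (
    "mol1\n  header\n\nM  END\n$$$$\n"
    "mol2\n  header\n\nM  END\n$$$$\n"
)
records = list(iter_sdf_records(io.StringIO(sdf_text)))
print(len(records))
```

Because each record is isolated before parsing, a failure can be attributed to one specific record, which is the kind of fine-grained error reporting the help text mentions.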
The InChI format parser supports one additional reader argument:
* 'delimiter' works the same as it does for the SMILES formats
In addition, you may specify an Open Babel format, either by one of the
following format names, or by reading a filename ending with one of the format
names, optionally with a .gz suffix. Zstandard compression is not supported by
the native Open Babel reader.
Format Description and options
========= ==========================
CONFIG DL-POLY CONFIG
CONTCAR VASP format
s Output single bonds only
b Disable bonding entirely
CONTFF MDFF format
HISTORY DL-POLY HISTORY
MDFF MDFF format
POSCAR VASP format
s Output single bonds only
b Disable bonding entirely
POSFF MDFF format
VASP VASP format
s Output single bonds only
b Disable bonding entirely
abinit ABINIT Output Format
s Output single bonds only
b Disable bonding entirely
acesout ACES output format
s Output single bonds only
b Disable bonding entirely
acr ACR format
adfband ADF Band output format
adfdftb ADF DFTB output format
adfout ADF output format
s Output single bonds only
b Disable bonding entirely
alc Alchemy format
aoforce Turbomole AOFORCE output format
arc Accelrys/MSI Biosym/Insight II CAR format
s Output single bonds only
b Disable bonding entirely
axsf XCrySDen Structure Format
s Output single bonds only
b Disable bonding entirely
bgf MSI BGF format
box Dock 3.5 Box format
bs Ball and Stick format
c09out Crystal 09 output format
s Consider single bonds only
c3d1 Chem3D Cartesian 1 format
c3d2 Chem3D Cartesian 2 format
caccrt Cacao Cartesian format
s Output single bonds only
b Disable bonding entirely
car Accelrys/MSI Biosym/Insight II CAR format
s Output single bonds only
b Disable bonding entirely
castep CASTEP format
ccc CCC format
cdjson ChemDoodle JSON
c <num> coordinate multiplier (default: 20)
cdx ChemDraw binary format
m read molecules only; no reactions
d output CDX tree to OBText object
cdxml ChemDraw CDXML format
cif Crystallographic Information File
s Output single bonds only
b Disable bonding entirely
B Use bonds listed in CIF file from _geom_bond_etc records (overrides option b)
ck ChemKin format
f <file> File with standard thermo data: default therm.dat
z Use standard thermo only
L Reactions have labels (Usually optional)
cml Chemical Markup Language
2 read 2D rather than 3D coordinates if both provided
cmlr CML Reaction format
cof Culgi object file format
crk2d Chemical Resource Kit diagram(2D)
crk3d Chemical Resource Kit 3D format
ct ChemDraw Connection Table format
cub Gaussian cube format
b no bonds
s no multiple bonds
cube Gaussian cube format
b no bonds
s no multiple bonds
dallog DALTON output format
s Output single bonds only
dalmol DALTON input format
s Output single bonds only
b Disable bonding entirely
dat Generic Output file format
s Output single bonds only
b Disable bonding entirely
dmol DMol3 coordinates format
s Output single bonds only
b Disable bonding entirely
dx OpenDX cube format for APBS
ent Protein Data Bank format
s Output single bonds only
b Disable bonding entirely
c Ignore CONECT records
exyz Extended XYZ cartesian coordinates format
s Output single bonds only
b Disable bonding entirely
fa FASTA format
1 Output single-stranded DNA
t <turns> Use the specified number of base pairs per turn (e.g., 10)
s Output single bonds only
b Disable bonding entirely
fasta FASTA format
1 Output single-stranded DNA
t <turns> Use the specified number of base pairs per turn (e.g., 10)
s Output single bonds only
b Disable bonding entirely
fch Gaussian formatted checkpoint file format
fchk Gaussian formatted checkpoint file format
fck Gaussian formatted checkpoint file format
feat Feature format
s Output single bonds only
b Disable bonding entirely
fhiaims FHIaims XYZ format
s Output single bonds only
b Disable bonding entirely
fract Free Form Fractional format
s Output single bonds only
b Disable bonding entirely
fs Fastsearch format
t # Do similarity search:#mols or # as min Tanimoto
a Add Tanimoto coeff to title in similarity search
l # Maximum number of candidates. Default<4000>
e Exact match
Alternative to using exact in ``-s`` parameter, see above
n No further SMARTS filtering after fingerprint phase
fsa FASTA format
1 Output single-stranded DNA
t <turns> Use the specified number of base pairs per turn (e.g., 10)
s Output single bonds only
b Disable bonding entirely
g03 Gaussian Output
s Output single bonds only
b Disable bonding entirely
g09 Gaussian Output
s Output single bonds only
b Disable bonding entirely
g16 Gaussian Output
s Output single bonds only
b Disable bonding entirely
g92 Gaussian Output
s Output single bonds only
b Disable bonding entirely
g94 Gaussian Output
s Output single bonds only
b Disable bonding entirely
g98 Gaussian Output
s Output single bonds only
b Disable bonding entirely
gal Gaussian Output
s Output single bonds only
b Disable bonding entirely
gam GAMESS Output
s Output single bonds only
b Disable bonding entirely
c Read multiple conformers
gamess GAMESS Output
s Output single bonds only
b Disable bonding entirely
c Read multiple conformers
gamin GAMESS Input
gamout GAMESS Output
s Output single bonds only
b Disable bonding entirely
c Read multiple conformers
got GULP format
gpr Ghemical format
gro GRO format
s Consider single bonds only
gukin GAMESS-UK Input
gukout GAMESS-UK Output
gzmat Gaussian Z-Matrix Input
s Output single bonds only
b Disable bonding entirely
hin HyperChem HIN format
inp GAMESS Input
ins ShelX format
s Output single bonds only
b Disable bonding entirely
jin Jaguar input format
s Output single bonds only
b Disable bonding entirely
jout Jaguar output format
s Output single bonds only
b Disable bonding entirely
log Generic Output file format
s Output single bonds only
b Disable bonding entirely
lpmd LPMD format
s Output single bonds only
b Disable bonding entirely
mae Maestro format
maegz Maestro format
mcdl MCDL format
mcif Macromolecular Crystallographic Info
mdl MDL MOL format
s determine chirality from atom parity flags
The default setting for 2D and 3D is to ignore atom parity and
work out the chirality based on the bond
stereochemistry (2D) or coordinates (3D).
For 0D the default is already to determine the chirality
from the atom parity.
S do not read stereochemistry from 0D MOL files
Open Babel supports reading and writing cis/trans
and tetrahedral stereochemistry to 0D MOL files.
This is an extension to the standard which you can
turn off using this option.
T read title only
P read title and properties only
When filtering an sdf file on title or properties
only, avoid lengthy chemical interpretation by
using the ``T`` or ``P`` option together with the
:ref:`copy format <Copy_raw_text>`.
ml2 Sybyl Mol2 format
c Read UCSF Dock scores saved in comments preceding molecules
mmcif Macromolecular Crystallographic Info
mmd MacroModel format
mmod MacroModel format
mol MDL MOL format
s determine chirality from atom parity flags
The default setting for 2D and 3D is to ignore atom parity and
work out the chirality based on the bond
stereochemistry (2D) or coordinates (3D).
For 0D the default is already to determine the chirality
from the atom parity.
S do not read stereochemistry from 0D MOL files
Open Babel supports reading and writing cis/trans
and tetrahedral stereochemistry to 0D MOL files.
This is an extension to the standard which you can
turn off using this option.
T read title only
P read title and properties only
When filtering an sdf file on title or properties
only, avoid lengthy chemical interpretation by
using the ``T`` or ``P`` option together with the
:ref:`copy format <Copy_raw_text>`.
mol2 Sybyl Mol2 format
c Read UCSF Dock scores saved in comments preceding molecules
mold Molden format
b no bonds
s no multiple bonds
molden Molden format
b no bonds
s no multiple bonds
molf Molden format
b no bonds
s no multiple bonds
moo MOPAC Output format
s Output single bonds only
b Disable bonding entirely
mop MOPAC Cartesian format
s Output single bonds only
b Disable bonding entirely
mopcrt MOPAC Cartesian format
s Output single bonds only
b Disable bonding entirely
mopin MOPAC Internal
mopout MOPAC Output format
s Output single bonds only
b Disable bonding entirely
mpc MOPAC Cartesian format
s Output single bonds only
b Disable bonding entirely
mpo Molpro output format
s Output single bonds only
b Disable bonding entirely
mpqc MPQC output format
s Output single bonds only
b Disable bonding entirely
mrv Chemical Markup Language
2 read 2D rather than 3D coordinates if both provided
msi Accelrys/MSI Cerius II MSI format
nwo NWChem output format
s Output single bonds only
f Overwrite molecule if more than one
calculation with different molecules
is present in the output file
(last calculation will be preferred)
b Disable bonding entirely
orca ORCA output format
s Output single bonds only
b Disable bonding entirely
out Generic Output file format
s Output single bonds only
b Disable bonding entirely
outmol DMol3 coordinates format
s Output single bonds only
b Disable bonding entirely
output Generic Output file format
s Output single bonds only
b Disable bonding entirely
pc PubChem format
pcjson PubChem JSON
s disable stereo perception and just read stereo information from input
pcm PCModel Format
pdb Protein Data Bank format
s Output single bonds only
b Disable bonding entirely
c Ignore CONECT records
pdbqt AutoDock PDBQT format
b Disable automatic bonding
d Input file is in dlg (AutoDock docking log) format
png PNG 2D depiction
y <additional chunk ID> Look also in chunks with specified ID
pos POS cartesian coordinates format
s Output single bonds only
b Disable bonding entirely
pqr PQR format
s Output single bonds only
b Disable bonding entirely
pqs Parallel Quantum Solutions format
prep Amber Prep format
pwscf PWscf format
qcout Q-Chem output format
s Output single bonds only
b Disable bonding entirely
res ShelX format
s Output single bonds only
b Disable bonding entirely
rsmi Reaction SMILES format
rxn MDL RXN format
sd MDL MOL format
s determine chirality from atom parity flags
The default setting for 2D and 3D is to ignore atom parity and
work out the chirality based on the bond
stereochemistry (2D) or coordinates (3D).
For 0D the default is already to determine the chirality
from the atom parity.
S do not read stereochemistry from 0D MOL files
Open Babel supports reading and writing cis/trans
and tetrahedral stereochemistry to 0D MOL files.
This is an extension to the standard which you can
turn off using this option.
T read title only
P read title and properties only
When filtering an sdf file on title or properties
only, avoid lengthy chemical interpretation by
using the ``T`` or ``P`` option together with the
:ref:`copy format <Copy_raw_text>`.
siesta SIESTA format
smiles SMILES format
a Preserve aromaticity present in the SMILES
This option should only be used if reading aromatic SMILES
generated by the same version of Open Babel. Any other
use will lead to undefined behavior. The advantage of this
option is that it avoids aromaticity perception, thus speeding
up reading SMILES.
S Clean stereochemistry
By default, stereochemistry is accepted as given. If you wish
to clean up stereochemistry (e.g. by removing tetrahedral
stereochemistry where two of the substituents are identical)
then specifying this option will reperceive stereocenters.
smy SMILES format using Smiley parser
sy2 Sybyl Mol2 format
c Read UCSF Dock scores saved in comments preceding molecules
t41 ADF TAPE41 format
s Output single bonds only
b Disable bonding entirely
tdd Thermo format
e Terminate on "END"
text Read and write raw text
therm Thermo format
e Terminate on "END"
tmol TurboMole Coordinate format
s Output single bonds only
b Disable bonding entirely
a Input in Angstroms
txt Title format
txyz Tinker XYZ format
s Generate single bonds only
unixyz UniChem XYZ format
s Output single bonds only
b Disable bonding entirely
vmol ViewMol format
s Output single bonds only
b Disable bonding entirely
wln Wiswesser Line Notation
xml General XML format
n Read objects of first namespace only
xsf XCrySDen Structure Format
s Output single bonds only
b Disable bonding entirely
xyz XYZ cartesian coordinates format
s Output single bonds only
b Disable bonding entirely
yob YASARA.org YOB format
You will need to consult the Open Babel documentation (see
https://openbabel.org/wiki/List_of_extensions ) and implementation for full
details about how these options work.