Structures of molecules can be determined experimentally or predicted computationally. For small volatile molecules, such as carbon monoxide and water, analysis of rotational lines in the microwave or infrared spectra of gaseous samples provides very accurate geometries. For larger volatile molecules, gas electron diffraction data are commonly used in structure determination, typically in conjunction with analysis of the rotational spectrum. As a result of such efforts, bond lengths of nearly every stable diatomic and triatomic molecule are known with accuracy better than 0.005 angstroms. Experimental structures of many tetra-atomic and larger molecules are also known with similarly good accuracy. Structure determination of larger molecules in the gas phase becomes challenging because the spectra become more crowded and, owing to the low vapor pressure of larger molecules, weaker. The analysis is further complicated by the presence of multiple conformations in flexible molecules. However, structures of most organic molecules and of many biological macromolecules can be determined accurately in the solid state using X-ray or neutron diffraction.
Computational chemistry could, in principle, give accurate molecular structures for molecules of any size. In practice, however, accuracy comes at a steep price in computational time, and quantum chemical computations that provide bond lengths with accuracy better than 0.005 angstroms are feasible only for small molecules. Such rigorous methods are not routinely applicable to larger molecules, such as nucleotides, peptides, and many drug molecules. In such cases, one can use approximate computations that may yield structures with useful accuracy. Unfortunately, the approximate methods may also produce erroneous results. For example, a rather popular computational scheme designated as B3LYP/6-311G* overestimates the bond length of diatomic chlorine by 0.07 angstroms.
Frequently, chemists and biochemists are interested in the structure of molecules in the liquid state or in solution. For example, organic chemists might want to know the solution structure of a catalyst in order to predict its usefulness for a particular reaction, or biochemists might ask how the structure of DNA changes when a transcription factor binds to it. In many cases, the effects of the medium on the structure are small; experimental crystal structures or computationally predicted gas phase structures can then be used to describe the molecule in solution. For example, the structures of gaseous methylcyclohexane and of methylcyclohexane in chloroform solution are not too different: in both cases the molecule adopts a chair conformation with the methyl group in the equatorial position. Similarly, it is likely that the experimental crystal structure of a transcription factor-DNA complex accurately reflects the structural changes that occur in solution upon binding. However, solvation can change the structure of molecules significantly; for example, the structure of dehydrated DNA would differ significantly from the structure of well-hydrated DNA. In such cases, advanced computational methods that account for the effects of the environment can be applied.
Molecular structures can be stored in computer files. At a minimum, one needs to specify the type and the Cartesian coordinates of each atom to describe a molecule's structure uniquely. However, largely for historical reasons, there are currently several different file formats that represent the same structural data slightly differently. Some file formats that you may encounter are the PDB format, the MOL2 format, Tinker's XYZ format, Gaussian's Z-Matrix format, and GAMESS's XYZ format. In the past, practitioners of computational chemistry had to learn how to hand-craft input files in a suitable format. Today, various model building programs help with this task, and programs such as Babel can convert between different formats.
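To make the idea of "atom type plus Cartesian coordinates" concrete, here is a minimal Python sketch that builds a water molecule and writes it in the generic XYZ layout (atom count, a comment line, then one line per atom). The bond length and angle are rounded, illustrative values, and the function names are hypothetical. The layout is the plain XYZ convention, which is an assumption here; it is not the Tinker or GAMESS variant, both of which carry additional information.

```python
import math


def water_atoms(r_oh=0.96, angle_deg=104.5):
    """Return (element, x, y, z) tuples for water, with O at the origin.

    r_oh (angstroms) and angle_deg are rounded, illustrative values.
    """
    half = math.radians(angle_deg) / 2.0
    x = r_oh * math.sin(half)
    y = r_oh * math.cos(half)
    return [("O", 0.0, 0.0, 0.0), ("H", x, y, 0.0), ("H", -x, y, 0.0)]


def to_xyz(atoms, comment=""):
    """Format atoms as generic XYZ text: count, comment, one line per atom."""
    lines = [str(len(atoms)), comment]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


def distance(a, b):
    """Distance in angstroms between two (element, x, y, z) tuples."""
    return math.dist(a[1:], b[1:])


atoms = water_atoms()
xyz_text = to_xyz(atoms, "water, illustrative geometry")
print(xyz_text)
print(f"O-H distance: {distance(atoms[0], atoms[1]):.3f} angstroms")
```

Because the file stores only element symbols and positions, quantities such as bond lengths are not written explicitly; they are recomputed from the coordinates, as the `distance` helper does.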
Despite the emergence of large chemical information resources such as PubChem, finding molecular structures that are suitable starting points for computational modeling remains a challenge. For example, you could search PubChem for cubane and then save its coordinates in the SDF format, but closer inspection (e.g., opening the molecule in PyMOL and rotating it) reveals serious problems. Databases such as Klotho provide model structures for some common small molecules. Chemicals with Pharmaceutical Activity from the University of Oxford offers access to many drug models via a Jmol plugin, from which the mol file can be saved. The ZINC database at UCSF provides MOL2 structure files for millions of compounds. The Protein Data Bank offers access to experimentally determined structures of macromolecules and macromolecular complexes.
Several web sites generate 3D molecular structures from a SMILES string using the program CORINA. CORINA uses built-in tables of standard bond lengths and angles to create a reasonable model for small or rigid molecules. However, the model geometry for larger and flexible molecules is likely to be quite different from the most prevalent geometry in aqueous solution. One such site that generates 3D model structures is the Online SMILES Translator from the National Institutes of Health. Practice creating a molecule of ethanol using this service. Start the Structure Editor and sketch an ethanol molecule as two connected bonds with an oxygen at one end; do not worry about the hydrogens on the carbons. Hit Submit Molecule and notice that the SMILES string for ethanol has been generated. On the right-side panel, select PDB and 3D and hit Translate. Right-click on the link to download the molecule and save it as ethanol.pdb in your directory. Examine the file with a text editor.
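When you examine a PDB file, note that it is a fixed-column format: the atom name, the x, y, and z coordinates, and the element symbol each occupy specific character positions on every ATOM or HETATM line. The Python sketch below reads those columns. The sample records are generated in the code as a stand-in for a downloaded file; the heavy-atom coordinates are rough, hand-made values and the residue name UNL is an assumption, so none of this is the actual output of the NIH service. The function names are hypothetical.

```python
def format_hetatm(serial, name, element, x, y, z, resname="UNL", resseq=1):
    """Build one 78-character HETATM record using the PDB column layout."""
    return (
        f"HETATM{serial:5d} {name:<4s} {resname:3s} {'':1s}{resseq:4d}{'':4s}"
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
        f"{'':10s}{element:>2s}"
    )


def parse_pdb_atoms(text):
    """Return (name, element, x, y, z) for every ATOM/HETATM record."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip END, CONECT, REMARK, and other records
        name = line[12:16].strip()       # columns 13-16
        x = float(line[30:38])           # columns 31-38
        y = float(line[38:46])           # columns 39-46
        z = float(line[46:54])           # columns 47-54
        # Columns 77-78 hold the element; fall back crudely to the
        # first letter of the atom name if that field is blank.
        element = line[76:78].strip() or name[:1]
        atoms.append((name, element, x, y, z))
    return atoms


# Illustrative heavy atoms of ethanol (hydrogens omitted for brevity).
sample_text = "\n".join([
    format_hetatm(1, "C1", "C", 0.000, 0.000, 0.000),
    format_hetatm(2, "C2", "C", 1.520, 0.000, 0.000),
    format_hetatm(3, "O1", "O", 1.997, 1.348, 0.000),
    "END",
]) + "\n"

parsed_atoms = parse_pdb_atoms(sample_text)
for atom in parsed_atoms:
    print(atom)
```

To try this on your own file, replace `sample_text` with the contents of ethanol.pdb, for example `open("ethanol.pdb").read()`. Slicing by column rather than splitting on whitespace matters because adjacent PDB fields may touch when coordinates are large or negative.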
Most chemists are familiar with drawing 2D molecular structures, and several programs make it easy to draw 2D representations of three-dimensional molecules. Two of the most popular 2D chemical diagram editors for Windows and Mac OS systems are ChemDraw from CambridgeSoft and MDL Draw from Elsevier MDL. Students can download a fully functional free chemical drawing program, MDL Isis/Draw, from the Elsevier MDL website after registration. Some chemical drawing tools can generate and export 3D coordinates of the drawn molecule. The JME Molecular Editor lets you sketch simple molecules online and export them as SMILES strings.