This page describes how to use DENSS most effectively to reconstruct your particle from your solution scattering data. For additional instructions for visualizing and assessing the results, visit the Tips page.
To understand how to run DENSS and how to use all of its options most appropriately, it helps to understand how DENSS works. DENSS is an algorithm used for calculating ab initio electron density maps directly from solution scattering data. DENSS implements a novel iterative structure factor retrieval algorithm to cycle between real space density and reciprocal space structure factors, applying appropriate restraints in each domain to obtain a set of structure factors whose intensities are consistent with experimental data and whose electron density is consistent with expected real space properties of particles.
DENSS utilizes the NumPy Fast Fourier Transform for moving between real and reciprocal space domains. Each domain is represented by a Cartesian grid of N x N x N points. N (the number of samples in one dimension) is determined by the size of the system and the desired resolution. The real space size of the box is determined by the maximum dimension of the particle, D, and the desired sampling ratio. A larger sampling ratio results in a larger real space box and therefore finer sampling in reciprocal space (i.e. a smaller distance between data points in q). A smaller voxel size in real space corresponds to higher spatial resolution and therefore to a larger maximum q value in reciprocal space.
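The relationships above can be sketched in a few lines. This is an illustrative calculation, not DENSS's actual code (the function name is ours), using the qmax relation stated later on this page (2π/voxel):

```python
import numpy as np

def grid_parameters(dmax, oversampling=3.0, nsamples=64):
    # Illustrative sketch of the box/grid relationships described above.
    side = dmax * oversampling        # real space box edge length (Angstroms)
    voxel = side / nsamples           # real space voxel size
    dq = 2 * np.pi / side             # spacing between q values in reciprocal space
    qmax = 2 * np.pi / voxel          # per the text: smaller voxel -> larger qmax
    return side, voxel, dq, qmax

# 50 A particle, default oversampling of 3, SLOW-mode N = 64:
side, voxel, dq, qmax = grid_parameters(50.0)
print(side, voxel)   # 150.0 2.34375
```

Note how increasing the oversampling ratio enlarges the box (and shrinks dq), while shrinking the voxel raises qmax.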
The reciprocal space restraints are straightforward: the data. The real space restraints are imposed by defining a “support”, i.e. the region of space (which voxels) allowed to have density. Outside of this support the density is set to zero; inside the support the density is required to be positive and real valued. The selection of which voxels are part of the support is regularly updated throughout the reconstruction and is determined by the shrink-wrap algorithm.
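A minimal 1D toy of this cycle may help make it concrete. This is an illustration of the general iterative scheme described above, not DENSS's actual implementation: alternate between reciprocal space, where the measured magnitudes are imposed on the current structure factors, and real space, where the support and positivity restraints are imposed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D "particle": nonzero only inside a known support
n = 64
target = np.zeros(n)
target[28:36] = [1, 2, 3, 4, 4, 3, 2, 1]
amplitudes = np.abs(np.fft.fft(target))     # the "data": magnitudes only, no phases

support = np.zeros(n, dtype=bool)
support[28:36] = True

rho = rng.random(n)                         # start from random density
err0 = np.linalg.norm(np.abs(np.fft.fft(rho)) - amplitudes)
for _ in range(200):
    F = np.fft.fft(rho)
    F = amplitudes * np.exp(1j * np.angle(F))   # impose measured magnitudes, keep phases
    rho = np.real(np.fft.ifft(F))
    rho[~support] = 0                           # support restraint
    rho[rho < 0] = 0                            # positivity restraint
err = np.linalg.norm(np.abs(np.fft.fft(rho)) - amplitudes)
```

After the loop, the density is zero outside the support, non-negative inside it, and its Fourier magnitudes fit the "data" far better than the random start did.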
There are two primary files needed to run DENSS: denss.py and saxstats.py. denss.py is the python script that you invoke on the command line to actually run DENSS. saxstats.py is a python module containing the core functions for reconstructing the density.
Another file in the denss directory that is useful is a bash script called superdenss, which runs the entire pipeline for reconstructing electron density maps, including running multiple (>20) individual runs of DENSS, followed by alignment and averaging with EMAN2.
Solution scattering data, in particular small angle X-ray scattering (SAXS) data, are typically highly oversampled, meaning that there are far more data points measured in an experiment than there are actual unique pieces of information available. SAXS data are often 20 to 50-fold oversampled, depending on the size of your particle and the geometry of your experiment.
However, running DENSS at such high oversampling ratios is computationally costly, as the size of the array grows as N³. As DENSS typically runs at an oversampling ratio of 3 to 5 (see below for details), we need to first reduce the highly oversampled SAXS data to something more manageable, while still retaining all the useful precision brought about by the experimental oversampling.
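A rough back-of-the-envelope check of the oversampling in a dataset can be made with the standard Shannon sampling argument (the numbers here are illustrative, not from the text):

```python
import numpy as np

# Shannon sampling: a particle of maximum dimension D carries only about
# qmax*D/pi independent pieces of information ("Shannon channels") out to qmax.
D = 100.0                      # particle maximum dimension, Angstroms (illustrative)
qmax = 0.30                    # maximum measured q, 1/Angstrom
n_measured = 500               # points in a typical experimental profile
n_shannon = qmax * D / np.pi
print(round(n_shannon, 1), round(n_measured / n_shannon, 1))   # 9.5 52.4
```

So a 500-point profile of a 100 Å particle measured to q = 0.3 Å⁻¹ is roughly 50-fold oversampled, in line with the range quoted above.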
To do this, we utilize well-known procedures for performing an indirect Fourier transform of the SAXS data. Usually this is used for extracting pair distribution functions; here we need it to provide a smooth fit to the experimental scattering data. One of the most popular software routines for performing this calculation is called GNOM, from the ATSAS suite of SAXS programs.
DENSS can natively accept GNOM output files (with a .out extension) as input to the denss.py script. An example of such a file, 6lyz.out, is provided along with the DENSS download. This file is a GNOM formatted file calculated from simulated SAXS data from PDB entry 6LYZ (hen egg white lysozyme). The small, 14 kDa globular particle provides for convenient and quick testing.
However, any valid procedure you like may be used for calculating the smooth fit to the scattering data. These smoothed curves can be given to DENSS as simple ASCII text files with three columns: q (= 4π sin(θ)/λ, in Å⁻¹), intensity, and error on the intensity. DENSS will interpolate the given scattering profile to the necessary q values used in the reconstruction.
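Reading such a 3-column file and interpolating it onto a new q-grid is straightforward with NumPy. A minimal sketch (the profile here is inlined as a string rather than read from disk, and the values are made up):

```python
import io
import numpy as np

# A minimal 3-column profile (q, I, error), as described above;
# inlined here instead of a file on disk.
text = """\
0.01 1000.0 10.0
0.02  950.0 10.0
0.03  860.0  9.0
0.04  740.0  9.0
"""
q, I, err = np.loadtxt(io.StringIO(text), unpack=True)

# Interpolate onto the q values a reconstruction would use
q_calc = np.array([0.015, 0.025, 0.035])
I_calc = np.interp(q_calc, q, I)
print(I_calc)   # [975. 905. 800.]
```

With a real file, replace `io.StringIO(text)` with the filename.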
It is not recommended to use the automated version of GNOM, called datgnom (or autognom). This version truncates the data to low resolution (something like q < 7/Rg) due to limitations of bead modeling algorithms (such as DAMMIF) that cannot utilize such high resolution data. DENSS, however, is fully capable of using high resolution data (though due to the limited information content it may not improve the resolution of the final map). It is highly recommended to run GNOM manually (for example from within the Primus interface) and ensure that all high resolution data points are used. Even in cases where the high resolution data may not be of the greatest quality, it is likely to be better than the simple Porod extrapolation that DENSS performs when the voxel sizes are smaller than the corresponding resolution of the data defined by qmax.
Here we will use the provided 6lyz.out file for this tutorial, created by GNOM. We will also show how to use the 6lyz.dat file (in case a GNOM formatted .out file is not available for your data).
First, to simply run DENSS with default parameters (suitable for many cases), type at the command prompt:
$ denss.py -f 6lyz.out
When using the GNOM .out file as input, the maximum particle dimension (Dmax) is extracted from the file and used for the -d option. The -d option is required to determine the appropriate real space box size to reconstruct the particle (along with the oversampling ratio option, -os). When using a standard .dat file containing the smoothed profile in a 3-column ASCII text format, the Dmax must be given explicitly using the -d option as follows:

$ denss.py -f 6lyz.dat -d 50
This tells DENSS to create a real space box with an edge length of 150 Å, since the default oversampling ratio is 3. If you use a GNOM .out file and would like to force DENSS to use a different Dmax, invoke the -d option to override the GNOM Dmax.
FAST and SLOW Modes
DENSS now (v1.0.0 or later) has convenient options for selecting some very good defaults for the vast majority of cases, even complex shapes. Two “modes” are now included as options to select these defaults, FAST mode and SLOW mode. The default mode is SLOW, since this is suitable for most cases and takes about 20 minutes or so on a modern processor. If you have a relatively simple globular shape you can select FAST mode using the -m option of denss.py, which will take about 30 to 60 seconds.
SLOW mode will set the box size to be Dmax * oversampling, and will set the number of samples (N) to 64 by adjusting the voxel size. FAST mode will use 32 samples instead. Additionally, SLOW mode will not enable shrink-wrap (described below) until step 5000, whereas FAST mode will start shrink-wrap at step 1000. Other options affected by these modes can be found in the Advanced Options section below. All of the options can still be overridden when using these modes by explicitly setting them.
Since N must be an integer, the voxel size is adjusted to the nearest value that yields an integer N, even if you set an explicit voxel size. You can check the .log file for the line saying “Real space voxel size” to find the actual voxel size used, as distinguished from the line saying “Requested real space voxel size”, which records the value passed to the program with the -v option.
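The adjustment can be sketched as follows. This is an illustrative reimplementation of the behavior described above, not DENSS's own code:

```python
def adjust_voxel(dmax, oversampling, requested_voxel):
    # N must be an integer, so the actual voxel size can differ slightly
    # from the requested one; odd N is bumped to the next even number.
    side = dmax * oversampling
    n = int(round(side / requested_voxel))
    if n % 2 == 1:
        n += 1
    actual_voxel = side / n
    return n, actual_voxel

n, v = adjust_voxel(45.0, 3.0, 4.0)   # side = 135 A; 135/4 = 33.75
print(n, round(v, 4))                 # 34 3.9706
```

Here a requested 4.0 Å voxel becomes 3.97 Å, the nearest value giving an (even) integer N.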
Other commonly used options include:
-v VOXEL, --voxel VOXEL               Set desired voxel size, setting resolution of map
-os, --oversampling OVERSAMPLING      Sampling ratio (default 3.0)
-n NSAMPLES, --nsamples NSAMPLES      Number of samples along a single dimension (sets voxel size, overridden by --voxel)
--ne NE                               Number of electrons in object (for scaling final density map)
-s STEPS, --steps STEPS               Maximum number of steps (iterations)
-o OUTPUT, --output OUTPUT            Output map filename (default basename of input file)
-m MODE, --mode MODE                  Mode. F(AST) sets default options to run quickly for simple particle shapes. S(LOW) is useful for more complex molecules. (default SLOW)
An important option for complex cases is -os, the oversampling ratio. For scattering profiles with a lot of features, higher oversampling in reciprocal space helps to ensure that the final density has a scattering profile that matches the data. A handy option, often invoked with -os, is the -n option to set the number of samples. This option will set the voxel size to a value that results in the requested N. The FFT procedure works most efficiently with N as a power of 2, so good numbers are 32, 64, and 128; however, any number will work (if an odd number is given, it is increased to the next even number). More samples do not necessarily increase the final resolution of the reconstruction, due to the lack of information in the SAXS profile, but in complex cases may be required for ensuring accurate comparison of the calculated scattering profile to the experimental scattering profile in reciprocal space (see the Tips page for how to assess this). In such cases try more samples with an appropriately increased -os. This will take longer, but run times remain manageable. In extreme cases 128 may be used, but rarely is more than that necessary or useful. Remember that the total array size grows as N³.
The speed of the program is almost entirely dictated by N. The algorithm converges in fewer steps in some cases than in others, so run times do vary somewhat from case to case. For a typical case where N = 32, a single run of denss.py takes about 30 seconds on a modern processor (e.g. 2.5 GHz, Intel i5). For more complex cases where N = 64, it still only takes about 20 minutes; for N = 128 it takes over an hour. However, as noted below, running multiple reconstructions (20 for simple cases, maybe 100 for complex cases) is required, and the averaging process tends to take longer than the reconstructions themselves.
As the program runs, the current status will be printed to the screen like so:
$ denss.py -f 6lyz.out
Step  Chi2      Rg      Support Volume
----- --------- ------- --------------
  349 8.35e+04  34.76   3375000
These values update in place as the program progresses. “Step” is the current iteration. “Chi2” (or rather χ²) is the goodness of fit of the calculated intensity versus the interpolated experimental intensity. “Rg” is the radius of gyration calculated directly from the electron density map. “Support Volume” is the total volume in Å³ of the support, i.e. the voxels allowed to contain the particle, as determined by shrink-wrap (see below).
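Rg can be computed directly from a density grid as the density-weighted second moment about the center of mass. A sketch (our own illustrative implementation, checked against a solid sphere, for which Rg = sqrt(3/5)·R):

```python
import numpy as np

def rg_from_density(rho, voxel):
    # Density-weighted radius of gyration about the center of mass.
    n = rho.shape[0]
    x = (np.arange(n) - n / 2) * voxel
    X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
    w = rho / rho.sum()
    com = np.array([(w * X).sum(), (w * Y).sum(), (w * Z).sum()])
    r2 = (X - com[0]) ** 2 + (Y - com[1]) ** 2 + (Z - com[2]) ** 2
    return np.sqrt((w * r2).sum())

# Check against a voxelized solid sphere of radius 12 A
n, voxel, R = 64, 1.0, 12.0
x = (np.arange(n) - n / 2) * voxel
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
rho = (X ** 2 + Y ** 2 + Z ** 2 <= R ** 2).astype(float)
rg = rg_from_density(rho, voxel)
print(round(rg, 2))   # close to sqrt(3/5)*12 = 9.295
```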
Additionally, when the --enforce_connectivity restraint is imposed (at step 6000 in SLOW mode, 2000 in FAST mode), an additional integer will be printed to the right of the Support Volume, and the status line will move down one line. This number is the number of “features” that the --enforce_connectivity option counted, i.e. the number of separate blobs. If this number is 1, shrink-wrap has already removed all of the other disconnected blobs of density and the restraint won’t have much effect. This happens more often for simple globular particles. However, for less globular particle shapes there are often several disconnected features at this early stage of the reconstruction, so the --enforce_connectivity restraint eliminates all of the minor features, retaining only the feature with the greatest density. It will look something like this:
$ denss.py -f 6lyz.out
Step  Chi2      Rg      Support Volume
----- --------- ------- --------------
 5999 5.62e+01  16.36   52234 1
 8259 1.31e+00  14.34   42135
Some notes about these values:
In most cases, the χ² value reported should only be used as a relative indicator of whether or not convergence is occurring. χ² is very sensitive to accurate error estimates and can easily be orders of magnitude off in scale (particularly here as we are interpolating values and using smooth curves which do not adjust errors to account for oversampling). However, as the reconstruction progresses, you should notice a steady decline in the reported value. Do not be concerned if the value fluctuates up and down (particularly around the step where the --enforce_connectivity option is invoked), as this is often part of the convergence process.
Towards the beginning of the reconstruction, Rg values may appear drastically off and may even be negative. This is simply because the Rg calculation does not understand the concept of periodic boundaries in the FFT, and when multiple separate blobs are present the calculation is inaccurate. However, typically after the --enforce_connectivity option removes extra blobs in the support and the density gets recentered, the Rg should change (often drastically) to more reasonable values and proceed to converge until completion. Due to the random nature of the starting seed of the algorithm, multiple reconstructions often vary in final Rg by a few (maybe 5 or so) percent.
The “Support Volume” is only the volume of the support region and should not be confused with the actual volume of the particle. The “Support Volume” will always be larger than the volume of the particle, since it is the volume of the voxels allowed to contain the particle. However, it should give you some idea of an upper bound on the particle volume. To actually estimate particle volume from the density, you can open the .mrc file in Chimera and open the “Measure Volume and Area” tool, which calculates the volume of the particle at a given (user-defined) density threshold.
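The support volume itself is just the number of support voxels times the voxel volume. A sketch with illustrative numbers (a 150 Å box sampled with N = 64):

```python
import numpy as np

voxel = 150.0 / 64                       # Angstroms per voxel edge
support = np.zeros((64, 64, 64), dtype=bool)
support[20:44, 20:44, 20:44] = True      # a 24^3-voxel support region
volume = support.sum() * voxel ** 3      # support volume in cubic Angstroms
print(round(volume, 1))   # 177978.5
```

Any particle inside that support necessarily has a smaller volume, which is why the reported value is only an upper bound.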
Electron density maps are written in CCP4/MRC format (credit Andrew Bruno) and optionally as Xplor ASCII text format (with the --write_xplor option enabled). These files can be opened directly in some visualization programs such as Chimera and PyMOL. In particular, the PyMOL “volume” function is well suited for displaying these maps with density information displayed as varying color and opacity. Maps can be converted to other formats using tools such as the Situs map2map tool.
Output files include:
output.mrc                 electron density map (MRC format)
output_support.mrc         final support volume formatted as a unitary electron density map
output_stats_by_step.dat   statistics as a function of step number; three columns: chi^2, Rg, support volume
output_map.fit             the fit of the calculated scattering profile to the experimental data. Experimental data have been interpolated to the q values used for scaling intensities, and I(0) has been scaled to the square of the number of electrons in the particle. Columns are: q(data), I(data), error(data), q(calc), I(calc)
output_*.png               if plotting is enabled, these are plots of the results
output.log                 a log file containing parameters for the calculation and a summary of the results
Averaging with EMAN2
DENSS reconstructs particles in an iterative fashion which begins by filling the entire grid of points with random values. As a result of this, each run of DENSS with identical input parameters yields a slightly different result. These results, while different at high resolution, should be similar at low resolution. Therefore, to determine the low-resolution density, one must run DENSS multiple times and average the results. This process can be performed easily with EMAN2. EMAN2 is primarily a package for reconstructing 3D volumes from electron microscopy data, however it is also just a general grayscale image processing suite geared for scientific purposes. This makes EMAN2 a high-quality, easy-to-use, and well-supported solution for aligning and averaging multiple DENSS reconstructions.
For convenience, provided with the DENSS download is a bash script that can be used to run the entire pipeline of multiple DENSS reconstructions and perform averaging with EMAN2. This script is called superdenss. By default superdenss runs everything in parallel on a multicore machine (the number of cores defaults to one less than the number available on the system, adjustable with the -j option). To run the individual reconstructions in parallel, superdenss uses GNU parallel. If you don’t have GNU parallel installed, superdenss will run the individual reconstructions in a loop, but will still run EMAN2 in parallel.
To run superdenss with default parameters, type:

$ superdenss -f 6lyz.out
If this is the first time you have run superdenss using this input file, this will create a folder called “6lyz” containing the results. If you have run this command before with the same input file, or that directory name already exists, superdenss will create a new directory with an incrementing number appended to the directory name so that your previous results are not overwritten. If you would like to change the output filename prefix, set the -o option. For example, to change the output name to “lysozyme”, type:

$ superdenss -f 6lyz.out -o lysozyme
If you would like to enter additional options for running the individual DENSS reconstructions, you can invoke them using the -i option, surrounding all the options you would like to give to denss.py with one set of double quotes. For example, to force DENSS to use a different Dmax than in the GNOM file, type:

$ superdenss -f 6lyz.out -i "-d 55"
Options for superdenss should be given outside of the quotes, while options for denss.py should be given inside the quotes after the -i option. For example, to run denss.py with greater oversampling and more samples, type:

$ superdenss -f 6lyz.out -i "-os 5 -n 64"
However, if you wanted to also run more reconstructions than 20 (the default for superdenss), you can invoke the -n superdenss option as follows:

$ superdenss -f 6lyz.out -n 100 -i "-d 55 -os 5 -n 64"
This will run 100 reconstructions with denss.py while setting the Dmax to 55, the oversampling ratio to 5, and the number of samples to 64.
This will create a folder named “lysozyme” with all the output files for each of the individual runs of DENSS (20 by default), a file called “lysozyme_avg.mrc” containing the final averaged density, and a file called “lysozyme_fsc.txt” containing the Fourier Shell Correlation used to estimate resolution. Additionally, the resolution estimated from the FSC curve will be printed to the screen.
Enantiomer Generation and Selection
The latest updates to DENSS (version 1.0.2 or later) include a new option in the superdenss script to deal with possible enantiomer ambiguity. Enantiomers (i.e. mirror images of particles) are ambiguous in solution scattering and yield identical scattering profiles. It is possible for denss.py to create different enantiomers based on the random seed that it starts with. Because of this, different enantiomers may get averaged together. For some particle shapes this is not much of an issue, because the possible enantiomers may only be distinguishable at resolutions higher than can be reconstructed. However, for more complex particle shapes, different enantiomers may be clearly distinguishable and yield incorrect results if averaged together. Previously, you were required to manually sort or classify these particles into different sets for each enantiomer, and subsequently perform independent averaging. However, there is now a new option in superdenss (the -e option) which will generate each of the eight enantiomers (each of the three axes either reflected or not, i.e. 2³ = 8 combinations) and compare each enantiomer to the reference volume to select which enantiomer agrees best. Then the averaging procedure will continue as normal using only the best enantiomers. Note that this process takes significantly longer (>8x) than without the -e option invoked, which is why it is disabled by default.
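Generating the eight mirror variants and scoring them against a reference can be sketched as below. This is an illustration of the idea only (the real procedure also aligns each candidate to the reference before scoring; correlation on unaligned grids is used here just to keep the sketch short):

```python
import numpy as np
from itertools import product

def enantiomers(rho):
    # All eight flip combinations along the three axes (2**3 = 8).
    return [rho[::fx, ::fy, ::fz].copy()
            for fx, fy, fz in product((1, -1), repeat=3)]

def best_enantiomer(rho, reference):
    # Pick the variant that correlates best with the reference map.
    candidates = enantiomers(rho)
    scores = [np.corrcoef(c.ravel(), reference.ravel())[0, 1]
              for c in candidates]
    return candidates[int(np.argmax(scores))]

rng = np.random.default_rng(1)
ref = rng.random((8, 8, 8))
flipped = ref[::-1, :, :]            # a "wrong-handed" reconstruction
best = best_enantiomer(flipped, ref)
```

Flipping the wrong-handed map back along the same axis recovers the reference exactly, so that variant wins.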
To run superdenss in slow mode (suitable for most complex particle shapes) with enantiomer generation and selection enabled, simply add the -e option to superdenss. Here we will take a look at a particularly complex particle shape, SusF (PDB ID 4FE9). This protein was selected from the PDB as exhibiting the single highest ambiguity score as estimated by AMBIMETER (a score of 3.019). The complexity of the shape makes incorrect enantiomer selection a serious problem. It is also a good demonstration of the ability of DENSS to reconstruct complex shapes. Note the image below on the left shows a single reconstruction of denss.py that created the correct enantiomer. Shown in the middle is an example of a single reconstruction that selected the incorrect enantiomer. Shown on the right is the final reconstruction after generating and selecting for the correct enantiomers.
To have superdenss create each of the eight enantiomers and select the best one for averaging for each of the 20 reconstructions, add the -e option to superdenss as follows:

$ superdenss -f 4FE9.out -e
This will create a folder named “4FE9” and at the end will create a new folder inside named spt_avg_01 containing the final_avg_ali2ref.hdf file with the fully averaged reconstruction, taking into account the best enantiomers. This runs denss.py in the default “slow” mode. This shows that for even quite complex cases, the default slow mode should work quite well in combination with enantiomer selection.
It should be noted that this procedure does not ensure that the final averaged reconstruction represents the actual correct enantiomer; it just ensures that all of the reconstructions used in the averaging procedure have the same handedness. There is no way to select the correct enantiomer without some ancillary data, since enantiomers are ambiguous in solution scattering.
The averaging procedure works by dividing the 20 individual reconstructions into two halves, even and odd. It then proceeds to perform independent alignment and averaging for each set of 10 reconstructions. Then, it aligns the two halves and calculates the Fourier Shell Correlation between them and saves it in a file named “spt_avg_01/fsc_0.txt” that gets copied to the main folder and renamed “output_fsc.txt”, where “output” is replaced with your output name. This plot can be used to estimate resolution. Plot this in your favorite plotting program and find where the FSC curve falls below 0.5. Take the reciprocal of that x-axis value, and that is your estimated resolution in Å. For convenience this resolution is estimated and printed to the screen for you when running superdenss (v1.0.3). If the python module matplotlib is available, a plot of the FSC and estimated resolution will be saved to a png file. To estimate resolution by comparing with a known structure, see the Tips page.
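Reading off the resolution from an FSC curve can be done with a few lines of NumPy. A sketch with made-up FSC values (real curves come from the output_fsc.txt file described above):

```python
import numpy as np

# Find where FSC first drops below 0.5 and take the reciprocal of that
# spatial frequency to estimate resolution (illustrative data).
x = np.array([0.00, 0.02, 0.04, 0.06, 0.08, 0.10])   # spatial frequency, 1/Angstrom
fsc = np.array([1.00, 0.98, 0.90, 0.70, 0.40, 0.20])

idx = np.argmax(fsc < 0.5)                 # first bin below 0.5
# linear interpolation between the bracketing bins for the crossing point
x0, x1 = x[idx - 1], x[idx]
f0, f1 = fsc[idx - 1], fsc[idx]
x_half = x0 + (0.5 - f0) * (x1 - x0) / (f1 - f0)
print(round(1.0 / x_half, 2))   # 13.64 (Angstroms)
```

Here the curve crosses 0.5 at about 0.073 Å⁻¹, giving an estimated resolution of about 13.6 Å.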
For most circumstances, the above options are all you need. However, if you would really like to dig into the algorithm, there are several advanced options that can be invoked. Most of these options adjust real space restraints, such as shrink-wrap. We will go through each in turn.
Recenter (Default: On)
The first option we will start with is recentering the particle. During the early part of the reconstruction, separate blobs of density often appear, causing issues with accurate recentering. Thus, by default, recentering does not begin until after the --enforce_connectivity restraint is invoked. Recentering then occurs a few times afterwards just to make sure the particle stays close to the center while it converges. The particle is recentered by calculating the center of mass of the density and shifting the particle such that it aligns with the center of the grid. However, this shift simply rolls the array values so as to avoid excessive interpolation (which can cause artifacts), so the particle is not exactly in the center, but just close to it. You can change during which steps the particle gets recentered by giving a space separated list of steps to the --recenter_steps option, or disable it entirely with the --recenter_off option.
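The roll-based recentering described above can be sketched as follows (our own illustrative version, not DENSS's code):

```python
import numpy as np

# Compute the center of mass, then roll the array by whole voxels
# (no interpolation), so the particle ends up close to, but not exactly
# at, the grid center.
n = 32
rho = np.zeros((n, n, n))
rho[2:6, 2:6, 2:6] = 1.0                  # an off-center blob of density

idx = np.indices(rho.shape)
com = np.array([(rho * idx[i]).sum() / rho.sum() for i in range(3)])
shift = np.rint(n / 2 - com).astype(int)
rho = np.roll(rho, tuple(shift), axis=(0, 1, 2))

new_com = np.array([(rho * idx[i]).sum() / rho.sum() for i in range(3)])
print(new_com)   # [15.5 15.5 15.5]
```

Because the shift is rounded to whole voxels, the final center of mass (15.5) is near, but not exactly at, the grid center (16).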
Positivity (Default: On)
Positivity ensures that there are no negative density values throughout the box by making every negative value zero. You can disable this with the --positivity_off option.
Extrapolate (Default: On)
Oftentimes the experimental data do not extend to the high resolution limit defined by the voxel size. For example, the default voxel size is 5 Å, corresponding to a maximum q value of 1.26 Å⁻¹ (2π/voxel). SAXS data don’t typically extend to such high q values, though WAXS data often do. To ensure realistic intensity values in reciprocal space at q values higher than those provided by the experimental data, the profile is extrapolated using Porod’s law (I(q) ∝ q⁻⁴). You may disable this option with --extrapolate_off.
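A Porod extrapolation of this kind can be sketched as below (illustrative profile, not real data; the q⁻⁴ tail is anchored at the last measured point so the curve is continuous):

```python
import numpy as np

q_data = np.linspace(0.01, 0.30, 30)
I_data = 1000.0 * np.exp(-q_data ** 2 * 100)      # fake smooth profile

q_grid = np.linspace(0.01, 0.60, 60)              # grid extends beyond the data
I_grid = np.interp(q_grid, q_data, I_data)
beyond = q_grid > q_data[-1]
# Porod's law: I(q) = I(q_max) * (q / q_max)**-4 past the measured range
I_grid[beyond] = I_data[-1] * (q_grid[beyond] / q_data[-1]) ** -4
```

Doubling q beyond the data boundary therefore drops the extrapolated intensity by a factor of 2⁴ = 16.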
Limit Dmax (Default: Off)
This option removes any grid points outside of a sphere of radius Dmax/2 (actually 0.6*Dmax for some wiggle room) from the center of the grid from the support, effectively setting all voxels outside of this boundary to be zero. Provided your particle is properly centered inside the box, and does not extend beyond this boundary, this helps to prevent noise far out in the solvent region. However, this option is discouraged as it may chop off parts of the particle during the early stages, biasing further convergence (which is what the –dmax_start_step option is for). Hence it is disabled by default. If you really want to use it, but are concerned your particle may get cut off because your Dmax is too small, increase Dmax with the -d option. But if for some reason you would like to keep the same box size, decrease the oversampling with the -os option (which accepts non-integer values).
But, just don’t… Shrink-wrap coupled with --enforce_connectivity will do the job and do it much better. This is really just here for legacy reasons.
Shrink-wrap (Default: On)
Shrink-wrap is one of the most important features of DENSS and is crucial for determining which voxels belong to the support. Every 20 steps shrink-wrap makes a copy of the density* and convolves it with a Gaussian of width sigma (i.e. it “blurs” the density like an image in Photoshop). It then defines the support as all voxels whose blurred density is greater than some minimum threshold (by default set to 20% of the maximum of the blurred density). Towards the start of the reconstruction, most of the density is still approximately uniformly distributed, so the support is pretty much the entire grid. However as the reciprocal space restraints are imposed (i.e. the data), the density becomes more blob-like and globular, causing shrink-wrap to remove more and more voxels from the support, which in turn causes the density to become even more globular, and on and on. Eventually this process converges where the support changes very little when updating it with shrink-wrap.
*NB: shrink-wrap does not change the actual density. It just uses it to determine the support.
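One shrink-wrap update can be sketched as below. This is an illustration of the procedure described above, not DENSS's code (it uses `scipy.ndimage.gaussian_filter` for the blurring):

```python
import numpy as np
from scipy import ndimage

def shrinkwrap(rho, sigma, threshold_fraction=0.20):
    # Blur a copy of the density, then keep voxels whose blurred value
    # exceeds a fraction of the blurred maximum. The density itself is
    # not modified, matching the note above.
    blurred = ndimage.gaussian_filter(rho, sigma)
    return blurred >= threshold_fraction * blurred.max()

rng = np.random.default_rng(2)
n = 32
rho = 0.05 * rng.random((n, n, n))        # weak noise everywhere
rho[12:20, 12:20, 12:20] += 1.0           # plus a dense blob

support = shrinkwrap(rho, sigma=3.0)
print(support.sum(), support.size)        # the support is a small subset of the grid
```

The blurred noise stays far below 20% of the blurred maximum, so the support collapses onto the region around the blob.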
Shrink-wrap Sigma (Defaults: start=3.0; end=1.5; decay=0.99)

This option determines how much blurring occurs when running shrink-wrap. The value, sigma, is given in voxels; a larger number means greater blurring. At the beginning we don’t have much of a particle yet, so we want the support to change very gradually while we let the data make the density more globular. To do this we use a lot of blurring, since that makes voxel values more similar to their neighbors, limiting how many voxels get removed from the support (--shrinkwrap_sigma_start = 3 by default). As the particle becomes more globular, sigma gradually decreases to a minimum value (--shrinkwrap_sigma_end = 1.5 by default). The rate of decrease is determined by --shrinkwrap_sigma_decay (default = 0.99). If you want shrink-wrap to be less restrictive to start, increase the sigma_start value. If you want shrink-wrap to converge more slowly, increase the sigma_decay value (maximum of 1). Editing these values will likely result in a slower convergence process requiring more steps, so be sure to increase the -s option to allow enough time for the reconstruction to converge. If you would like to remove the -s option altogether and allow the algorithm to stop only once it has converged (determined by the --chi_end_fraction option), simply give a very large number to the -s option (like 1e8), but be prepared to wait. Since sigma is multiplied by sigma_decay at each support update (every shrinkwrap_iter steps), the total number of steps spent updating the support using shrink-wrap (and thus roughly the minimum number of steps for convergence) can be estimated as minstep + iter × log(sigma_end/sigma_start)/log(sigma_decay); with the SLOW-mode defaults this is about 5000 + 20 × log(1.5/3.0)/log(0.99) ≈ 6400 steps.
Shrink-wrap Threshold Fraction (Default: 0.20)
This option sets the minimum threshold, as a fraction of the maximum of the blurred density, that a voxel density must exceed to be a part of the support. If you think shrink-wrap might be chopping off parts of your particle, decrease this value. However, decreasing this value will likely result in noisier maps. If you decrease this value, you may want to add some steps to the --enforce_connectivity_steps option to make sure that you only end up with one connected region of the support. You will also want to increase the maximum number of total steps (-s) allowed for the reconstruction, as decreasing this value may cause the reconstruction to converge more slowly.
Shrink-wrap Iter (Default: 20)
This is the number of steps between updating the support with shrink-wrap. The density should converge a little within this time frame, and then the support should be updated again to more closely reflect the new density. You will likely see some spikes at regular intervals in your final plots. The interval between these spikes is most likely equal to shrinkwrap_iter.
Shrink-wrap Minstep (Defaults: SLOW=5000; FAST=1000)
This option tells shrink-wrap when to start. Even if this were set to zero, shrink-wrap would not actually start until step 20, since that is the first possible iteration. This is intentional, as it gives the density a little time to begin converging based solely on the data and the positivity restraint and to become a bit more globular before shrink-wrap starts removing voxels from the support. If you feel that shrink-wrap may be a bit too aggressive to start with, try increasing this number to a later step to give it more time to converge just with the positivity restraint without removing any voxels from the support (thus performing no solvent flattening). If you do that, you may want to increase the --enforce_connectivity_steps option also, to make sure the initial blobs of density have time to form before it cuts them off. That in turn may affect your --recenter_steps option, so you might want to change that too.
Enforce Connectivity (Default: On)
As the reconstruction progresses, the density becomes more globular. While shrink-wrap does a pretty good job of eliminating the noise in the solvent, occasionally multiple blobs of density will appear in the final reconstruction. This occurs because the blobs are typically very far apart and only contribute scattering at very low angles, and thus do not affect the fit of the scattering profile substantially (each individual blob typically looks similar to the actual particle). However, this can obviously cause unwanted artifacts in the density, since the combination of blobs is now contributing to the scattering profile. To mitigate this problem, the --enforce_connectivity_on option is enabled by default. This option, after enough iterations have progressed to create particle-like globular densities, evaluates the support to define “features”, i.e. disconnected regions of the support, and eliminates all of the features except the one with the greatest density, simply removing those voxels from the support. This often significantly improves the final reconstruction, particularly for more complex particle shapes.
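The feature-counting and pruning step can be sketched with `scipy.ndimage.label` (an illustrative reimplementation of the behavior described above, not DENSS's code):

```python
import numpy as np
from scipy import ndimage

def enforce_connectivity(rho, support):
    # Label disconnected regions of the support and keep only the
    # feature containing the greatest total density.
    labels, n_features = ndimage.label(support)
    if n_features <= 1:
        return support, n_features
    sums = ndimage.sum(rho, labels, index=np.arange(1, n_features + 1))
    best = 1 + int(np.argmax(sums))
    return labels == best, n_features

rho = np.zeros((24, 24, 24))
support = np.zeros_like(rho, dtype=bool)
rho[2:6, 2:6, 2:6] = 0.5                  # minor blob
support[2:6, 2:6, 2:6] = True
rho[12:20, 12:20, 12:20] = 1.0            # main blob
support[12:20, 12:20, 12:20] = True

new_support, n_features = enforce_connectivity(rho, support)
print(n_features, new_support.sum())      # 2 512
```

Two features are found; only the denser 8³-voxel blob survives in the support.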
Enforce Connectivity Steps (Defaults: SLOW=6000; FAST=2000)
This option defines which steps initiate the enforce_connectivity routine. Make sure you give the algorithm enough time to reconstruct useful globular density. If you still find you have multiple blobs at the end of your reconstruction, add more steps to this option as a space separated list of integers. When doing so, you will probably also want to adjust --recenter_steps accordingly, to make sure the remaining large blob of density gets moved to the center.
If you have any questions about any of these options or more generally how to run DENSS, please send an email using the contact form below.