Monday, September 9, 2019

Comparing NUS sampling schedules

Non-uniform sampling (NUS) allows multi-dimensional NMR data to be collected in a quarter or less of the traditional time. This is done by collecting only a fraction of the usual data and predicting the missing points. The data that is collected is defined by a sampling schedule, and many different ways of creating these schedules have been published. Several papers have reported that not all sampling schedules, even those created in the same way, produce spectra of the same quality. Here, the spectra produced by different sampling schedules are compared and methods for assessing the quality of NUS spectra are trialed.

Many different sampling schemes have been proposed, but the sinusoidal-weighted Poisson gap method [1] now seems to be generally accepted as the most robust. This is the default method implemented in the latest versions of Bruker's TopSpin software. The algorithm uses a random seed to generate a list of increments to sample. The "gaps" between the increments that are actually collected are weighted so that more data is collected for smaller values of the incremented delay. This biases data collection to where the signal is stronger. The figure below shows ten different Poisson gap sampling schedules. Each column of points represents the 32 rows or increments (25% of 128) to be sampled. Note how the sampled rows are concentrated towards the bottom.
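The idea can be sketched in a few lines of Python. This is only an illustration based on the published description by Hyberts et al., not the code used by TopSpin or NUSscore. The function name, the starting value of the gap scale and the 2% adjustment step are assumptions made for this example.

```python
import math
import numpy as np

def poisson_gap_schedule(n_total=128, n_sampled=32, seed=0, max_tries=10000):
    """Illustrative sine-weighted Poisson-gap schedule (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumed starting guess for the average gap size.
    lam = 2.0 * (n_total / n_sampled - 1.0)
    for _ in range(max_tries):
        schedule = []
        i = 0
        while i < n_total:
            schedule.append(i)
            i += 1
            # The weight grows from ~0 to ~1 along the indirect dimension,
            # so gaps are short early on and longer at late increments.
            weight = math.sin((i + 0.5) / (n_total + 1) * math.pi / 2)
            i += int(rng.poisson(lam * weight))
        if len(schedule) == n_sampled:
            return schedule
        # Too many points: widen the gaps. Too few: narrow them. Then retry.
        lam *= 1.02 if len(schedule) > n_sampled else 1 / 1.02
    raise RuntimeError("no schedule with the requested number of points found")

schedule = poisson_gap_schedule(128, 32, seed=1)
print(schedule)
```

Changing the seed gives a different schedule with the same number of sampled increments, which is why many schedules exist for any one sampling level.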


For a given amount of sampling, more than one sampling schedule can be generated. It has been reported that not all sampling schedules generated in the same way and using the same amount of sampling perform equally [2], and so methods for scoring sampling schedules have been published. One of these methods, NUSscore [3], uses the Poisson gap method to generate sampling schedules and calculates a score for their quality.

The NUSscore software was used to generate 1000 sampling schedules at each of four sampling levels (50, 25, 12.5 and 6.25% of 128 points) for a multiplicity-edited HSQC experiment recorded on a sample of strychnine in CDCl3. A normal, fully sampled (i.e. 100%) spectrum was recorded and the sampling schedules were used to extract data from it, giving four sets of 1000 different NUS spectra. As a reference, a synthetic spectrum with no noise was generated using the peak positions and intensities in the fully sampled spectrum. Subtracting the NUS spectra from the synthetic spectrum gave difference spectra from which the impact of the sampling could be assessed.
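The extraction and subtraction steps can be sketched as follows, assuming the data are held as NumPy arrays with one row per increment. The function names are hypothetical. The reconstruction of the missing rows, which sits between the two steps, is not shown here.

```python
import numpy as np

def apply_schedule(full_data, schedule):
    """Keep only the scheduled increments (rows); unsampled rows are zeroed."""
    full_data = np.asarray(full_data)
    mask = np.zeros(full_data.shape[0], dtype=bool)
    mask[list(schedule)] = True
    nus_data = np.zeros_like(full_data)
    nus_data[mask] = full_data[mask]
    return nus_data

def difference_spectrum(reference, spectrum):
    """Reference (e.g. the noise-free synthetic spectrum) minus the NUS spectrum."""
    return np.asarray(reference, dtype=float) - np.asarray(spectrum, dtype=float)

# Toy example: 4 increments of 3 points each, sampling rows 0 and 2.
full = np.arange(12.0).reshape(4, 3)
nus = apply_schedule(full, [0, 2])
print(nus)
```

Extracting the rows from a single fully sampled data set means every NUS spectrum comes from the same measurement, so differences between spectra come from the schedules alone.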


Assessing the spectra requires choosing a metric by which to judge quality. Initially the root-mean-square deviation (rmsd) of the difference spectra was used, as this should give an average over the entire spectrum of any changes in peak position or intensity and of any artifacts or additional noise introduced. In the figure below the graph on the left shows the rmsd of the difference spectra plotted against the sampling schedule score. Each of the sampling levels clusters into a tight group, with higher sampling corresponding to lower (better) scores, as expected. However, contrary to what was expected, the rmsd increases with higher levels of sampling. Examining individual spectra revealed that as the sampling is reduced the noise is reduced as well. In other words, the 6.25% sampled NUS spectra were less noisy than the 50% NUS spectra. I assume this is because the reconstruction algorithm produces data with higher precision than the experimental data.
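For reference, the rmsd of a difference spectrum can be computed as below. This is a minimal sketch assuming the difference spectrum is a real-valued NumPy array. The function name is made up for this example.

```python
import numpy as np

def rmsd(difference):
    """Root-mean-square deviation over every point of a difference spectrum."""
    d = np.asarray(difference, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

print(rmsd([[1.0, -1.0], [1.0, -1.0]]))  # every point deviates by 1, so rmsd is 1.0
```

Because the reference spectrum is noise-free, any noise in the NUS spectrum adds to the rmsd alongside genuine reconstruction errors, which would be consistent with the noisier 50% spectra giving larger values.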



The graph on the right of the figure above shows an expansion of the 50% sampling cluster. The scatter of the data points indicates that different sampling schedules do produce spectra of different quality. Despite the scatter, the rmsd is correlated with the score; however, the correlation is the opposite of what was expected. The author of the NUSscore software was contacted but could not explain this correlation.
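One way to put a number on such a trend is the Pearson correlation coefficient between score and rmsd within a single sampling level. The sketch below is an assumption about how this could be done, not a record of how the graph was analysed, and the values in the example are invented.

```python
import numpy as np

def pearson_r(scores, rmsds):
    """Pearson correlation coefficient between schedule scores and rmsd values."""
    return float(np.corrcoef(scores, rmsds)[0, 1])

# Invented numbers: rmsd falls as the score rises, so r comes out negative.
example_scores = [0.10, 0.12, 0.15, 0.18]
example_rmsds = [5.2, 5.0, 4.7, 4.5]
print(pearson_r(example_scores, example_rmsds))
```

A value near +1 or -1 indicates a strong linear trend, and the sign gives its direction.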

The sampling schedule score shows the expected discrimination between sampling rates, but not between schedules with the same amount of sampling. The rmsd of the difference spectrum does not show the expected behaviour, and neither does the noise (data not shown). As an alternative metric, the difference between the maximum and minimum intensity in the difference spectrum (the total range in intensity) was investigated. The intensity range does behave as expected. The box and whisker plot below graphs the intensity range against the amount of sampling. The pairs of boxes correspond to the two inner quartiles of the distribution of values, the whiskers extend to the most distant point within 1.5 times the interquartile range, and black circles represent outliers. The horizontal dotted red line marks the intensity range obtained by subtracting the fully sampled spectrum from the noise-less spectrum.
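Both the metric and the box plot statistics are simple to compute. The sketch below assumes real-valued NumPy arrays and uses NumPy's default (linear interpolation) percentiles, which may differ slightly from the plotting software used for the figure. The function names are hypothetical.

```python
import numpy as np

def intensity_range(difference):
    """Maximum minus minimum intensity of a difference spectrum."""
    d = np.asarray(difference, dtype=float)
    return float(d.max() - d.min())

def box_stats(values):
    """Quartiles, 1.5 x IQR whisker ends and outliers for a set of values."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    inside = (v >= q1 - 1.5 * iqr) & (v <= q3 + 1.5 * iqr)
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        # Whiskers end at the most distant points still inside the fences.
        "whisker_low": float(v[inside].min()),
        "whisker_high": float(v[inside].max()),
        "outliers": sorted(float(x) for x in v[~inside]),
    }

print(intensity_range([[-2.0, 1.0], [3.0, 0.0]]))
print(box_stats([1, 2, 3, 4, 5, 100]))
```

Unlike the rmsd, the intensity range is set by the two most extreme points of the difference spectrum, so it responds to the largest peak errors or artifacts rather than to the average noise level.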


Here the data seems to make sense. As the amount of sampling is reduced, the intensity range, or difference from the noise-less spectrum, increases. For 50 and 25% sampling the variation in the intensity range is small, indicating that there is little difference in the quality of the sampling schedules at these levels. As the amount of sampling is reduced further, the schedules show more variability in reproducing the full data. The 50 and 25% sampling schedules seem to be quite accurate, as they produce results very close to those from 100% sampling.

Having developed this method for validating the effectiveness of NUS sampling schedules, and shown that not all schedules are the same, it is now possible to identify the best schedules and use them to acquire the best possible spectra.

References
1. Hyberts SG, Takeuchi K, Wagner G
Poisson-gap sampling and forward maximum entropy reconstruction for enhancing the resolution and sensitivity of protein NMR data.
J Am Chem Soc. 2010 Feb 24;132(7):2145-7

2. Sidebottom PJ
A new approach to the optimisation of non‐uniform sampling schedules for use in the rapid acquisition of 2D NMR spectra of small molecules
Magn. Reson. Chem. 2016 May 9;54:689–694

3. Aoto PC, Fenwick RB, Kroon GJ, Wright PE
Accurate scoring of non-uniform sampling schemes for quantitative NMR.
J Magn Reson. 2014 Sep;246:31-5.
