New implementation of RooFit data types. The implementation of data stored in RooDataSet and RooDataHist was historically handled by ROOT TTrees (through class RooTreeDataStore). The default storage type has now been changed to class RooVectorDataStore, which stores the information in STL vectors. Existing datasets based on trees can be read in transparently and are converted to vector form in the persistent-to-transient conversion (the data file is not modified in this operation).
The vector store has two important advantages: 1) faster data access (raw data access times are 70 times faster than for TTrees), and 2) the ability to rewrite columns on the fly. The first advantage is important for the existing constant-term precalculation optimization in RooFit likelihoods, as the precalculated values are now also stored in vectors rather than trees. Because vector access is faster, the constant-term optimization inside likelihoods now yields a larger speed increase. This is particularly noticeable in pdfs with many constant expressions built from pdfs that were moderately fast to begin with (e.g. RooHistPdf). The second advantage allows new types of algorithmic likelihood optimization in RooFit, detailed below.
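To make the "rewrite columns on the fly" idea concrete, here is a minimal sketch of a column-oriented store in plain C++. It is a toy under stated assumptions and not the RooVectorDataStore interface: the class `ToyVectorStore` and its methods `addColumn`, `get` and `rewriteColumn` are hypothetical names invented for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy column-oriented store: one std::vector<double> per named column.
// Hypothetical illustration only; this is NOT the RooVectorDataStore API.
class ToyVectorStore {
public:
    // Add a column; all columns must have the same number of entries.
    void addColumn(const std::string& name, std::vector<double> values) {
        if (!columns_.empty() && values.size() != numEntries())
            throw std::invalid_argument("column length mismatch");
        columns_[name] = std::move(values);
    }

    std::size_t numEntries() const {
        return columns_.empty() ? 0 : columns_.begin()->second.size();
    }

    // Read entry i of a column (throws if the column or entry is missing).
    double get(const std::string& name, std::size_t i) const {
        return columns_.at(name).at(i);
    }

    // Overwrite every entry of 'target' with f(source entry), in place.
    // Only the target column is touched; other columns are left as they are.
    template <class F>
    void rewriteColumn(const std::string& target, const std::string& source, F f) {
        const std::vector<double>& in = columns_.at(source);
        std::vector<double>& out = columns_.at(target);
        for (std::size_t i = 0; i < in.size(); ++i)
            out[i] = f(in[i]);
    }

private:
    std::map<std::string, std::vector<double>> columns_;
};
```

With this layout, a cached column (say `"cache"`) can be refreshed from an observable column (say `"x"`) by a single call such as `store.rewriteColumn("cache", "x", someFunction)`, without rewriting any other column.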
New algorithmic optimization in the caching of pdfs. So far, two classes of objects have been identified in the likelihood: those that change with every event (i.e. the pdf) and those that change only with the parameters (typically pdf normalization integrals). Pdfs are always recalculated for every event, whereas integrals are only evaluated when needed. The exception to the first class is pdfs that depend only on constant parameters (or on no parameters): these are identified at the beginning and precalculated once, to avoid recalculating an expression with the same outcome in every iteration of the likelihood calculation.
For composite pdfs a further optimization has been included: for a pdf M(x,a,b) = f*F(x,a) + (1-f)*G(x,b), for example, it is not necessary to recalculate G(x,b) if only parameter a has changed with respect to the previous likelihood calculation. This optimization is now implemented by extending the value caching originally designed for constant terms so that it can also be used for non-constant terms, with a check executed at the beginning of each likelihood evaluation to determine whether selected columns need to be updated because parameters have changed. The speed gain of this optimization depends strongly on the structure of the pdf: in models with many free parameters, most of the likelihood evaluations are executed when MINUIT calculates numerical likelihood derivatives, which vary one parameter at a time, and the speedup is potentially larger. In models with few free parameters the effect will be smaller.
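The check-then-update logic described above can be sketched in a few lines of standalone C++. This is only an illustration of the idea, not RooFit's implementation: `CachedComponent`, `ToyLikelihood`, `toyF` and `toyG` are hypothetical names, the two component shapes are arbitrary and unnormalized, and each component is assumed to depend on a single parameter.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One cached column of per-event values for a component with one parameter.
// Hypothetical sketch of "cache-and-track"; not RooFit code.
struct CachedComponent {
    std::vector<double> values;  // cached value of the component for each event
    double lastParam = 0.0;      // parameter value the cache was filled with
    bool valid = false;          // false until the cache has been filled once
    int recalcCount = 0;         // number of times the column was recomputed

    // Recompute the column only if the parameter changed since the last fill.
    template <class Func>
    void update(const std::vector<double>& xs, double param, Func func) {
        if (valid && param == lastParam) return;  // cache is still up to date
        values.resize(xs.size());
        for (std::size_t i = 0; i < xs.size(); ++i)
            values[i] = func(xs[i], param);
        lastParam = param;
        valid = true;
        ++recalcCount;
    }
};

// Arbitrary positive toy shapes standing in for F(x,a) and G(x,b).
inline double toyF(double x, double a) { return std::exp(-a * x); }
inline double toyG(double x, double b) { return 1.0 / (1.0 + b * x * x); }

// Negative log-likelihood of M = f*F(x,a) + (1-f)*G(x,b) over all events.
struct ToyLikelihood {
    std::vector<double> xs;   // observable value for each event
    CachedComponent cacheF;   // tracks parameter a
    CachedComponent cacheG;   // tracks parameter b

    double nll(double f, double a, double b) {
        // Check at the start of each evaluation which columns are stale.
        cacheF.update(xs, a, toyF);
        cacheG.update(xs, b, toyG);
        double sum = 0.0;
        for (std::size_t i = 0; i < xs.size(); ++i)
            sum -= std::log(f * cacheF.values[i] + (1.0 - f) * cacheG.values[i]);
        return sum;
    }
};
```

In this sketch, calling `nll(f, a, b)` and then `nll(f, a + delta, b)`, as a numerical derivative with respect to a would, recomputes only the F column; the G column is reused, and a change of f alone recomputes neither.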
The new per-component caching strategy is enabled by default for all pdfs that are a component of a RooAddPdf or a RooRealSumPdf, unless that component is a RooProdPdf or a RooProduct; in that case the components of the product are cached instead of the product itself. You can disable this new optimization by adding Optimize(1) to the argument list of RooAbsPdf::fitTo() (0 = no caching, 1 = cache constant terms only, 2 = also cache variable terms according to the strategy described above (DEFAULT)).
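The default selection rule stated above (cache each component of a sum, but for a product component cache its factors instead) can be written down as a small tree walk. The sketch below uses a hypothetical `Node` type and a hypothetical `selectCacheTargets` function rather than RooFit classes, requires C++17, and covers only the one level of nesting described in the text; how deeper nesting is treated is not addressed here.

```cpp
#include <string>
#include <vector>

// Minimal stand-in for an expression tree. Hypothetical; not RooFit classes.
// Sum     ~ RooAddPdf / RooRealSumPdf
// Product ~ RooProdPdf / RooProduct
// Leaf    ~ any other pdf or function
enum class NodeKind { Sum, Product, Leaf };

struct Node {
    std::string name;
    NodeKind kind;
    std::vector<Node> components;  // empty for leaves
};

// Return the names of the nodes that would be cached for a given sum node:
// every direct component, except that a product component is replaced by
// its own components.
std::vector<std::string> selectCacheTargets(const Node& sum) {
    std::vector<std::string> targets;
    if (sum.kind != NodeKind::Sum) return targets;  // rule applies to sums only
    for (const Node& component : sum.components) {
        if (component.kind == NodeKind::Product) {
            // Cache the factors of the product, not the product itself.
            for (const Node& factor : component.components)
                targets.push_back(factor.name);
        } else {
            targets.push_back(component.name);
        }
    }
    return targets;
}
```

For a sum with a leaf component `sig` and a product component `bkg` built from `bkg_x` and `bkg_y`, this selects `sig`, `bkg_x` and `bkg_y`, and leaves `bkg` itself uncached.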
It is also possible to tune this 'cache-and-track' optimization to perform a more fine-grained caching of components than Optimize(2) implements: to do so, call arg->setAttribute("CacheAndTrack") on each pdf component that you'd like to be cache-and-tracked individually.