RooFit

New tutorial macros available

A set of seventeen new tutorial macros has been added to $ROOTSYS/tutorials/roofit

Update of class documentation

The documentation in the code itself that is extracted by THtml to construct the online class documentation has been updated for all classes. Now all classes have (again) a short class description, as well as a (short) description of each member function and most data members. An update to the users manual is foreseen shortly after the 5.20 release.

RooWorkspace

A new feature has been added that allows to persist source code of RooFit classes that are not in ROOT distribution inside a RooWorkspace to facilitate sharing of custom code with others. To import code of custom classes call
         RooWorkspace::importClassCode()
after importing the objects themselves into the workspace. For all classes that are compiled with ACliC RooWorkspace can automatically find the source code using the ROOT TClass interface. For custom classes that are compiled externally and loaded into ROOT as shared library it might be necessary to provide the location of the source files manually using the static RooWorkspace member functions addClassDeclImportDir() and addClassImplImportDir().

When a TFile with a RooWorkspace containing source code is opened in a ROOT session that does not have the class code already loaded for the classes contained in the workspace, the code in the workspace is written to file, compiled and loaded into the ROOT session on the fly.

The code repository of RooWorkspace is designed to handle classes that have either their own implementation and header file, or are part of a group of classes that share a common header and implementation file. More complicated structuring of source code into files is not supported.

Also new accessors have been added for discrete-valued functions catfunc() and stored category functions are now also printed under their own heading in Print()

Parameterized ranges

It is now possible to use RooAbsReal derived functions as range definition for variables to construct ranges that vary as function of another variable. For example
         RooRealVar x("x","x",-10,10) ; // variable with fixed range [-10,10] 
         RooRealVar y("y","y",0,20) ; // variable with fixed range [-10,10] 
   
         RooFormulaVar x_lo("x_lo","y-20",y) ;      
         RooFormulaVar x_hi("x_hi","sin(y)*5",y) ;      
         x.setRange(x_lo,x_hi) ;  // Change x to have variable range depending on y
It is also possible to define parameterized named ranges in the same way
         x.setRange("signalRegion",x_lo,x_hi) ;
There are no fundamental limits to the complexity of the parameterized ranges that can be defined as long as the problem is uniquely defined. For example, given three observables x, y and z, one can define a parameterized named range 'R' of x in terms of y and of y in terms of z and ask to calculate the three dimensional integral of any function or p.d.f in terms of (x,y,z) over that range 'R' and it will be calculated correctly, taking recursive range dependencies into account. A definition of a range 'R' on the other hand where the bounds of x depend on y and the bounds of y depend on x is not allowed, and an error message will be printed to complain about the ambiguity of the problem definition. Integrals over non-rectangular regions are created the same way as integrals over rectangular regions using the RooAbsReal::createIntegral() function, the chosen mode of operation depends on the shape of the requestion integration range.

Note that in general integration over non (hyper)rectangular regions will be more computationally intensive as only a subset of the observables can be integrated analytically (all of those that do not have parameterized ranges plus those that have parameterized ranges but are not involved in the parameterization of others (e.g. x and y in the example above)

Running integrals and Cumulative distribution functions

It is now possible to create running integrals from any RooAbsReal function and to create cumulative distribution functions from any RooAbsPdf using the following methods:
        // Create int[xlo,x] f(x') dx' from f(x)
        RooAbsReal* runInt = func.createRunningIntegral(x) ;

        // Create int[xlo,x] f(x') dx' from p.d.f f(x) normalized over x
        RooAbsReal* cdf = pdf.createCdf(x) ;

        // Create int[xlo,x] f(x',y) dx' from p.d.f f(x,y) normalized over (x,y)
        RooAbsReal* cdf = pdf.createCdf(x,y) ;
As with the similarly styled function createIntegral running integrals and c.d.f. can be created over any number of observables, e.g createCdf(RooArgSet(x,y,z)) will create a three-dimensional cumulative distribution function. C.d.f and running integrals that are calculated from p.d.fs that have support for analytical integration are constructed from an appropriately reconnected RooRealIntegral. If numeric integration is required, the c.d.f or running integral is calculated by a dedicated class RooRunningIntegral that precalculates results for all observable values, which is more efficient in most use cases. Cumulative distributions functions that are calculated numerically are handled slightly differently that standard running integrals: their values is constructed to converge to exactly zero at the lower bound and exactly 1 at the upper bound so that algorithms that make use of that property of c.d.f can do so reliably.

Constraints management

New tools have been added to simplify studies with fits involving (external) constraints on parameters. The general philosophy is that constraints on parameters can be represented as probability density functions and can thus be modeled by RooAbsPdf classes (e.g. a RooGaussian for a simple Gaussian constraint on a parameter). There are two modes of operation: you can add parameter constraints to your problem definition by multiplying the constraint p.d.f.s with your 'master' p.d.f. or you specify them externally in each operation. The first mode of operation keeps all information in your master p.d.f and may make the logistics of non-trivial fitting problems easier. It works as follows: first you define your regular p.d.f, then you define your constraint p.d.f and you multiply them with RooProdPdf.
        // Construct constraint
        RooGaussian fconstraint("fconstraint","fconstraint",f,RooConst(0.8),RooConst(0.1)) ;

        // Multiply constraint with p.d.f
        RooProdPdf pdfc("pdfc","p.d.f with constraint",RooArgSet(p.d.f,fconstraint)) ;
If your top level p.d.f is already a RooProdPdf it also fine to multiply all terms together in one go. Constraints do not need to be specified a the top-level RooProdPdf, constraint p.d.f.s in any component RooProdPdf lower in the expression tree are used as well. Constraints are not used by default in fitting if present in a p.d.f. To activate the use of a constraint in fitting, use the Constrain() argument in fitTo()
        // Fit with internal constraint
        RooFitResult* r2 = pdfc.fitTo(*d,Constrain(f)) ;
This will instruct RooAbsPdf::fitTo() to included any constraint p.d.f on parameter f in the definition of the likelihood. It is possible to add multiple constraints on the same parameter to the 'master' p.d.f. If so, all constraints on a given parameter will be added to the likelihood.

The RooMCStudy class has been extended to accept the Constrain() argument as well in its constructor. If specified it will do two things: 1) it will pass the constrain argument to the fitting pass of the toy study and 2) it will modify the generation step into a two-step procedure: for each toy in the study it will first sample a value of each constrained parameter from the joint constraints p.d.f and it will then generate the observables for that experiment with the thus obtained parameter values. In this mode of operation the parameter values for each toy may thus be different. The actual parameter for each toy can be obtained with the newly added RooMCStudy::genParDataSet() member function. The calculation of the pull values for each parameter has been modified accordingly.

Alternatively, it is possible to specify constraints to both RooAbsPdf::fitTo() and the RooMCStudy constructor using the ExternalConstraint() named argument to supply constraint p.d.f.s that are not part of the 'master' p.d.f but rather an ad-hoc supplied external constraint. The argument supplied to ExternalConstraint() should be (a set of) constraint p.d.f(s), rather than (a set of) parameters for which internal constraint p.d.f.s should be picked up.

New operator class RooLinearMorph

A new numeric operator class RooLinearMorph has been added that provides a continuous transformation between two p.d.f.s shapes in terms of a linear parameter alpha. The algorithm for histograms is described in the paper by Alex Read in NUM A 425 (1999) 357-369 'Linear interpolation of histograms'. The implementation in RooLinearMorph is for continuous functions.
        // Observable and sampling binning to be used by RooLinearMorph ("cache")
        RooRealVar x("x","x",-20,20) ;
        x.setBins(1000,"cache") ;

        // End point shapes : a gaussian on one end, a polynomial on the other
        RooGaussian f1("f1","f1",x,RooConst(-10),RooConst(2)) ;
        RooPolynomial f2("f2","f2",x,RooArgSet(RooConst(-0.03),RooConst(-0.001))) ;

        // Interpolation parameter: rlm=f1 at alpha=0, rlm=f2 at alpha=1
        RooRealVar alpha("alpha","alpha",0,1.0) ;
        RooLinearMorph rlm("rlm","rlm",g1,g2,x,alpha) ;

        // Plot halfway shape
        alpha=0.5
        RooPlot* frame = x.frame() ;
        rlm.plotOn(frame) ;
In short the algorithm works as follows: for both f1(x) and f2(x), the cumulative distribution functions F1(x) and F2(x) are calculated. One finds takes a value 'y' of both c.d.fs and determines the corresponding x values x1,x2 at which F1(x1)=F2(x2)=y. The value of the interpolated p.d.f fbar(x) is then calculated as fbar(alpha*x1+(1-alpha)*x2) = f1(x1)*f2(x2) / ( alpha*f2(x2) + (1-alpha)*f1(x1) ). Given that it is not easily possible to calculate the value of RooLinearMorph at a given value of x, the value for all values of x are calculated in one by (through a scan over y) and stored in a cache. NB: The range of the interpolation parameter does not need to be [0,1], it can be anything.

New workspace tool RooSimWSTool

A new tool to clone and customize p.d.f.s into a RooSimultaneous p.d.f has been added. This new tool succeeds the original RooSimPdfBuilder tool which had a similar functionality but has a much cleaner interface, partly thanks to its use of the RooWorkspace class for both input of prototype p.d.fs and output of built p.d.f.s

The simplest use case to to take a workspace p.d.f as prototype and 'split' a parameter of that p.d.f into two specialized parameters depending on a category in the dataset. For example, given a Gaussian p.d.f G(x,m,s) we want to construct a G_a(x,m_a,s) and a G_b(x,m_b,s) with different mean parameters to be fit to a dataset with observables (x,c) where c is a category with states 'a' and 'b'. Using RooSimWSTool one can create a simultaneous p.d.f from G_a and G_b from G with the following command
        RooSimWSTool wst(wspace) ;
        wst.build("G_sim","G",SplitParam("m","c")) ;
From this simple example one can go to builds of arbitrary complexity by specifying multiple SplitParam arguments on multiple parameters involving multiple splitting categories. Splits can also be performed in the product multiple categories, e.g.
        SplitParam("m","c,d")) ;
splits parameter m in the product of states of c and d. Another possibility is the 'constrained' split which clones the parameter for all but one state and insert a formula specialization in a chosen state that evaluates to 1 - sum_i(a_i) where a_i are all other specializations. For example, given a category c with state "A","B","C","D" the specification
     SplitParamConstrained("m","c","D")
will result in parameters m_A,m_B,m_C and a formula expression m_D that evaluates to (1-(m_A+m_B+m_C)). Constrained split can also be specified in product of categories. In that case the name of the remainder state follows the syntax {State1;State2} where State1 and State2 are the state names of the two spitting categories. Additional functionality exists to work with multiple prototype p.d.f.s simultaneously.

Improved infrastructure for caching p.d.f and functions

The infrastructure that exists for caching p.d.f.s, i.e. p.d.f that precalculate their value for all observable values at one and cache those in a histogram that is returned as p.d.f shape (with optional interpolation), has been expanded. This infrastructure comprises RooAbsCached the base class for all caching p.d.fs, RooAbsSelfCachedPdf a base class for end-user caching p.d.f implementations that simply cache the result of evaluate() and RooCachedPdf that can wrap and cache any input p.d.f specified in its constructor.

By default a p.d.f is sampled and cached in all observables in any given use context, with no need to specify what those are in advance. The internal code has also been changed such that all cache histograms now store pre-normalized p.d.f, which is more efficient than 'raw' p.d.f histograms that are explicitly post-normalized through integration. Multiple different use cases (e.g. definitions of what are observables vs parameters) can be cached simultaneously. Now it is also possible to specify that p.d.f.s should be sampled and cached in one or more parameter dimensionsal in addition to the automatically determined set of observables. as well.

Also a complete new line of classes with similar functionality has been added inheriting from RooAbsReal. These are RooAbsCachedReal,RooAbsSelfCachedReal and RooCachedReal. A newly added class RooHistFunc presents these shapes and is capable of handling negative entries.

New PDF error handling structure

New infrastructure has been put into place to propagate and process p.d.f evaluation errors during fitting. Previously evaluation errors were marked with a zero p.d.f value and propagated as a special condition in RooAddPdf, RooProdPdf etc to result in a zero top-level p.d.f value that was caught by the RooFit minuit interface as a special condition. Summary information on the value of the parameters and the observables was printed for the first 10 occurrences of such conditions.

Now, each p.d.f component that generates an error in its evaluation logs the error into a separate facility during fitting and the the RooFit minuit interface polls this error logging facility for problems. This allows much more detailed and accurate warning messages during the minimization phase. The level of verbosity of this new error facility can be controlled with a new
        PrintEvalErrors(Int_t code)
argument to fitTo().

The new-style error logging is active whenever MINUIT is operating on such a p.d.f. The default value for N is 3. Outside the MINUIT context the evaluation error each evualuation error will generate a separate message through RooMsgService

Other new features