PROOF
- New functionality
- PROOF-Lite
- 2-tier
realization of PROOF intended for multi-core machines; the client
starts directly the workers; no daemon is required. To start a session
just use TProof::Open("") or TProof::Open("lite"). From there on
everything should be as in normal PROOF, though some functionality may
not have been ported yet. To start a standard PROOF
session (i.e. via daemons) on the localhost use
TProof::Open("localhost").
- XrdProofd plug-in
- Possibility to define the list worker directly in the
xrootd config file (new directive xpd.worker, see Wiki reference pages)
- Support for automatic reconnections in the case xrootd
is restarted
- Dedicated admin area (under <xrd.admin>/.xproofd.<port>) to
keep information about active and terminated sessions, and active
clients. This is used to reguraly check the client and session
activity, to cleanup orphalin sessions and to shutdown inactive client
connections.
- domain + level control of printout message
- Dynamic "per-query" scheduling
- Dynamic worker startup. It can be enabled by the cluster
administrator with the 'xpd.putrc Proof.DynamicStartup 1' directive
in the config file. The effect is that a session starts only on
the master. When a query is submitted (call to TProof::Process),
the session master contacts the scheduler.
In response it receives a list of workers and starts the worker
processes. The environment is copied from the master to the workers.
It consist of: the include and library paths, the set of enabled
packages as well as the macros loaded by the user.
- Flexible and fault-tolerant workers
- A packet resubmitting mechanism. When a worker dies all the
packets that it processed are resubmitted.
- Added the possibility to handle dynamically removed workers and partly processed
packets (when a worker is stopped while processing a packet it finishes
the current event and the rest of the packet is reassigned to another workers).
It's done by a new method TPacketizerAdaptive::AddProcessed(TSlave *sl,
TProofProgressStatus *st, TList **) and TPacketizerAdaptive::ReassignPacket.
- Memory control
- Add
the possibility to display the memory footprint on workers and master as a
function of the entry processed (workers) or of the merging step
(master). A new button has been added to the PROOF dialog box to
retrieve and display the memory usage. On the workers about 100
measurements are recorded by default; this number can be changed with 'proof->SetParameter("PROOF_MemLogFreq", memlogfreq)';
- Add
the possibility to set upper limits on the virtual memory used by
processes; the session gets firts a warning when it reaches 80% of
the limit, and then processing is stopped whenit exceeds 95% of the
limit, sending back the results. Also, the memory footprint is notified
when the session is terminated. The limit in MBs is set by the
environment variable "ROOTPROOFASSOFT". An hard limit can be set via the
env "ROOTPROOFASHARD" (also in MBs): the process is automatically
killed by the system if it reaches this limit. Envs variables for the
PROOF processes can be set using the directive 'xpd.putenv' in the
xrootd config file.
- Input data
- Introduce the
concept of 'input data': these are objects that are distributed in
optimal way to the workers, which are available via the input list, but
which are not saved in the TQueryResult object. These are meant for big
objects whic can create a big overload when distributed via the
standard input list (which should mostly be used for job control
parameters). To add an input-data object just use
TProof::AddInputData(TObject *); if the input-data objects are in a
file you can use TProof::SetInputDataFile(const char *file); the final
set of input-data objects is assembled from the objects added via
AddInputData and those found in the file defined bySetInputDataFile.
- Improvements:
- More
complete set of tests in test/stressProof . To run with PROOF-Lite pass
the argument 'lite' as master URL, e.g. './stressProof lite'.
- Possibility
to control on the client via rc variable the location of the sandbox,
package directory, cache and dataset directory (the latters two only
for PROOF-Lite); the variable names are 'Proof.Sandbox',
'Proof.PackageDir', 'Proof.CacheDir' and 'Proof.DataSetDir'. The default location of the sandbox has been changed from "~/proof" to "~/.proof" to avoid interferences with possible users' working areas.
- XrdProofd plug-in
- Overall refactorization for easier
maintainance and improved solidity
- Improved format of printout messages: all information
messages contain now the tag 'xpd-I' and all error messages the
tag 'xpd-E', so that they can easily be grepped out from the
log file.
- Log sending
-
Implement selective sending of logs from workers to master to avoid duplicating
too many text lines on the master log. Logs are now sent only after Exec, Print
requests and in case an error (level >= kError) occured. Of course, the full
logs can always be retrieved via TProofMgr::GetSessionLogs
- Log retrieval:
- for 'grep' operations, use the system 'grep' command
via 'popen'
instead of a handmade filtering; this implies that the full grep
functionality is now available
- set the default number of displayed lines to 100
instead of 10
- Improve diagnostic in case of worker death: clients will
now
receive a message containing the low level reason for the failure and a
hint for getting more information
- In
TProofOutputFile, support the "<user>" and "<group>"
placeholders in the output file name to automatically re-direct the
output to an area specific to the logged user.
- Addition of a new class TProofProgressStatus, which is used to keep
the query progress stauts in all the TProofPlayer objects and in the
TPacketizerAdaptive. It is also send in kPROOF_GETPACKET and
kPROOF_STOPPROCESS messages.
- The class TPacketizerProgressive is removed.
- Fixes
- Enable
the max number of sessions ('mxsess' parameter in the xpd.schedparam
directive); users are just refused to start a session if this limit is
reached.
- Make sure to collect consistently input messages when running in asynchronous mode
- Fix
a few problems with TProof::SendFile (used by UploadPackage, Load)
appearing when a rapid sequence of these commands was submitted
- Invalidate the TProofMgr when the physical connection is
closed; avoids
crashing when trying to get the logs after a failure.
- Fix a memory leak in log retrieval (the TProofLog object
was never
deleted)
- Add protections for the cases the manager cannot be
initialized
- Fix a race condition possibly affecting the handling of
workers death
- Avoid duplicating worker logs in the master log file
unless
when explicitly needed by the request (Exec(...), Print(...)) or when
an error occured
- Fix
problem with the determination and transmission of the name of the
object to be processed. The problem appeared when processing files
containing >1 trees in changing order.
- Fix problem with TProof::Load loading the macro to one worker only per machine
- Fix wrong return code preventing the correct propagation of the full ClearPackage to workers
- Fix a problem causing the whole query to stop even in the case a worker was terminated gently with SIGTERM.
-
Fix a problem triggering full re-build of a package upon change of a
single file; the version info file was wrongly reset; this should
happen only after a re-build.
- Make sure that in case multiple TProofOutputFile are present, each get merged correctly
- Fix problem in TProofServLogHandler::Notify due to bad usage of Form(...).