This file contains information about using GPTL. For information on building
and installing GPTL, see the file INSTALL.

GPTL is the "General Purpose Timing Library". It can be used to manually
instrument application codes with an arbitrary set of "regions" (or "timers")
over which statistics such as wallclock time and CPU time are gathered and
subsequently printed.

If the target application is built with the GNU compilers (gcc or gfortran) or
Pathscale (pathcc or pathf95), GPTL can also be used to automatically
instrument regions which are defined by function entry and exit points. This
is an easy way to generate a dynamic call tree. See Auto-instrumentation below
for a description of how to use this feature.

If the PAPI library is installed (http://icl.cs.utk.edu/papi), GPTL also
provides a convenient mechanism to access all available PAPI events. In
addition to PAPI preset and native events, GPTL defines derived events which
are based on PAPI counters. See gptl.h for a list of available derived events.
These events can only be enabled if the PAPI counters they require are
available on the target architecture.

Using GPTL
----------

C codes making GPTL library calls should #include <gptl.h>. Fortran codes
should #include <gptl.inc> or Fortran include 'gptl.inc'. The C and Fortran
interfaces are identical, except that the C interface uses mixed case. All
user-accessible functions return either 0 (success) or -1 (failure). Example
codes that use the library can be found in subdirectories ctests/ and ftests/.

Code instrumentation to utilize GPTL involves zero or more calls to
GPTLsetoption(), then a single call to GPTLinitialize(), then an arbitrary
sequence of calls to GPTLstart() and GPTLstop(), and finally a call to
GPTLpr() or GPTLpr_file(). See "Example" below for a sample calling sequence.
Calls to GPTLstart() and GPTLstop() are thread-safe, with per-thread
statistics printed by GPTLpr() or GPTLpr_file().
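
Because every user-accessible function returns 0 or -1, the return codes can
be funnelled through one small helper instead of being discarded. The
following is only a minimal sketch and is not part of GPTL: the helper name
gptl_check() is hypothetical, and the GPTL call appears only in a comment so
that the fragment compiles without the library.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of GPTL): report a failed call on stderr
 * and pass the return code through unchanged, so it can wrap any function
 * that follows the "0 on success, -1 on failure" convention.
 */
static int gptl_check (int ret, const char *what)
{
  if (ret != 0)
    fprintf (stderr, "GPTL call failed: %s returned %d\n", what, ret);
  return ret;
}

/*
 * Intended use, assuming gptl.h has been included:
 *
 *   if (gptl_check (GPTLinitialize (), "GPTLinitialize") != 0)
 *     return -1;
 */
```

Passing the code through unchanged leaves the caller free to decide whether a
failure is fatal.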

The purpose of GPTLsetoption() is to enable or disable various library
options. For example, to enable the PAPI counter for total cycles, do this:

    ret = GPTLsetoption (PAPI_TOT_CYC, 1);

The "1" says "enable". Use "0" for "disable". See the man pages for complete
documentation on function usage and arguments. The list of available GPTL
options is contained in gptl.h, and a complete list of available PAPI-based
events can be found by running "ctests/avail".

GPTLinitialize() initializes the GPTL library.

There can be an arbitrary number of start/stop pairs before GPTLpr() or
GPTLpr_file() is called to print the results. Regions may also be nested to an
arbitrary depth. The printed results are indented to indicate the level of
nesting for each region.

GPTLpr() prints the results to a file named timing.<n>, where the single
argument <n> is an integer. For MPI jobs, it is most convenient to use the MPI
rank of the invoking task for <n>. Equivalently, function GPTLpr_file() can be
called. Its input argument is a character string indicating the output file
name to be written. It is up to the user to ensure that these print functions
write to uniquely-named files, in order to avoid name-space collisions.

GPTLfinalize() can be called to clean up the GPTL environment. All space
malloc'ed by the GPTL library will be freed by this call.

Example
-------

From "man GPTLstart", a simple example calling sequence to time a couple of
code regions and print the results is:

    (void) GPTLsetoption (GPTLcpu, 1);       /* enable cpu timings */
    (void) GPTLsetoption (GPTLwall, 0);      /* disable wallclock timings */
    (void) GPTLsetoption (PAPI_TOT_CYC, 1);  /* enable counting of total cycles */
    ...
    (void) GPTLinitialize();                 /* initialize the GPTL library */
    (void) GPTLstart ("total");              /* start a timer */
    ...
    (void) GPTLstart ("do_work");            /* start another timer */
    do_work();                               /* do some work */
    (void) GPTLstop ("do_work");             /* stop a timer */
    (void) GPTLstop ("total");               /* stop a timer */
    ...
    (void) GPTLpr (mympitaskid);             /* print the results to timing.<mympitaskid> */

Auto-instrumentation
--------------------

If the regions to be timed are defined by function entry and exit points, and
the application to be profiled is built with either the GNU or Pathscale
compilers, you might find it convenient to use the auto-instrumentation
feature of GPTL. Here's how:

1) Add the flag -finstrument-functions when compiling the routines you'd like
   to profile.
2) Add calls to GPTLsetoption() (if desired), and GPTLinitialize() to the main
   program before any other routines are invoked.
3) Add a call to GPTLpr() or GPTLpr_file() wherever appropriate prior to where
   the code terminates.
4) Link with -lgptl (and -lpapi if PAPI is enabled).
5) Run the code.
6) Run "hex2name.pl <executable> <timing_file> | less", where <executable> is
   the name of the executable, and <timing_file> is the name of the timing
   file to be converted.

The result should be a dynamic call tree with timings and (if enabled) PAPI
counts and derived event statistics for each region, where regions are defined
by function entry and exit points.

Here's what's happening under the covers: The -finstrument-functions flag
tells the compiler to insert calls to

    __cyg_profile_func_enter (void *this_fn, void *call_site)

at function start, and

    __cyg_profile_func_exit (void *this_fn, void *call_site)

at function exit. GPTL defines these functions as calls to (effectively)
GPTLstart() and GPTLstop(), where the address of the function is used as the
input sentinel to these routines. Running hex2name.pl converts the function
addresses back to human-readable function names. It uses the UNIX "nm" utility
to do this.

Multi-processor instrumented codes
----------------------------------

For instrumented codes which make use of threading and/or MPI, a
post-processing script is provided to analyze GPTL output files and gather
max/min/average stats on a per-region basis. The script is parsegptlout.pl.
It might be invoked as, for example:

    parsegptlout.pl sub1

The script will look through all files in the current directory named
timing.* for regions named "sub1", then gather and print various statistics.
Numerous options are available. See "man parsegptlout" for more in-depth
information.
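
Since parsegptlout.pl is described as searching for files named timing.*,
output written through GPTLpr_file() presumably needs a name of that form to
be picked up. The following is a minimal sketch of building such a per-rank
name. The helper name make_timing_name() is hypothetical and not part of GPTL,
and the MPI and GPTL calls appear only in a comment so that the fragment
compiles on its own.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of GPTL): write "timing.<rank>" into buf.
 * Returns 0 on success, -1 if rank is negative or buf is too small.
 */
static int make_timing_name (char *buf, size_t len, int rank)
{
  int n;

  if (buf == NULL || rank < 0)
    return -1;
  n = snprintf (buf, len, "timing.%d", rank);
  /* snprintf reports the length it needed; treat truncation as failure */
  return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/*
 * Intended use, assuming gptl.h and mpi.h have been included and that
 * GPTLpr_file() takes the output file name as its single argument:
 *
 *   char name[32];
 *   int rank;
 *   MPI_Comm_rank (MPI_COMM_WORLD, &rank);
 *   if (make_timing_name (name, sizeof name, rank) == 0)
 *     (void) GPTLpr_file (name);
 */
```

Using the MPI rank in the name gives each task its own file, which is one way
to avoid the name-space collisions mentioned earlier.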