10.4. Performance Tools
- Support for Performance Co-Pilot;
- SystemTap support for (DynInst-based) instrumentation that runs entirely in unprivileged user space, as well as efficient (Byteman-based) pinpoint probing of Java applications;
- Valgrind support for hardware transactional memory and improvements in modeling vector instructions.
10.4.1. Performance Co-Pilot
/usr/share/doc/pcp-doc/* directory, which also includes the Performance Co-Pilot User's and Administrator's Guide as well as the Performance Co-Pilot Programmer's Guide.
- Using the dyninst binary-editing library, SystemTap can now execute some scripts purely at user-space level; no kernel or root privileges are used. This mode, selected using the stap --dyninst command, enables only those types of probes or operations that affect only the user's own processes. Note that this mode is incompatible with programs that throw C++ exceptions;
- A new way of injecting probes into Java applications is supported in conjunction with the byteman tool. New SystemTap probe types, java("com.app").class("class_name").method("name(signature)").*, enable probing of individual method exit events in an application, without system-wide tracing;
- A new facility has been added to the SystemTap driver tooling to enable remote execution on a libvirt-managed KVM instance running on a server. It enables automated and secure transfer of a compiled SystemTap script to a virtual machine guest across a dedicated secure virtio-serial link. A new guest-side daemon loads the scripts and transfers their output back to the host. This method is faster and does not require an IP-level network connection between the host and the guest. To test this function, run the following command:
- In addition, a number of improvements have been made to SystemTap's diagnostic messages:
- Many error messages now contain cross-references to the related manual pages. These pages explain the errors and suggest corrections;
- If a script input is suspected to contain typographic errors, a sorted suggestion list is offered to the user. This suggestion facility is used in a number of contexts when user-specified names may mismatch acceptable names, such as probed function names, markers, variables, files, aliases, and others;
- Diagnostic duplicate-elimination has been improved;
- ANSI coloring has been added to make messages easier to understand.
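The unprivileged dyninst mode described above can be tried with a one-line script. The following is a minimal sketch rather than a definitive recipe: the target program ./myapp and the probe on its main() function are hypothetical placeholders, and resolving the function name may require debugging symbols for the target. If stap or the target is not available, the script only prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical target program; point APP at a binary you own.
APP="${APP:-./myapp}"

# One-line SystemTap script: report each entry into the target's main().
SCRIPT='probe process.function("main") { println("entered main") }'

if command -v stap >/dev/null 2>&1 && [ -x "$APP" ]; then
    # --dyninst selects the user-space runtime, so no root privileges are
    # needed; -c runs the target under the probe, so only this user's own
    # process is instrumented.
    stap --dyninst -e "$SCRIPT" -c "$APP"
else
    echo "skipped: stap --dyninst -e '$SCRIPT' -c $APP"
fi
```

Because this mode only covers probes that affect the user's own processes, the target is started with -c here instead of probing system-wide.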
- Support for IBM System z Decimal Floating Point instructions on hosts that have the DFP facility installed;
- Support for IBM POWER8 (Power ISA 2.07) instructions;
- Support for Intel AVX2 instructions. Note that this is available only on 64-bit architectures;
- Initial support for Intel Transactional Synchronization Extensions, both Restricted Transactional Memory (RTM) and Hardware Lock Elision (HLE);
- Initial support for Hardware Transactional Memory on IBM PowerPC;
- The default size of the translation cache has been increased to 16 sectors, reflecting the fact that large applications require instrumentation and storage of huge amounts of code. For similar reasons, the number of memory mapped segments that can be tracked has been increased by a factor of 6. The maximum number of sectors in the translation cache can be controlled by the new flag
- Valgrind no longer temporarily creates a mapping of the entire object to read from it. Instead, reading is done through a small, fixed-size buffer. This avoids virtual memory spikes when Valgrind reads debugging information from large shared objects;
- The list of used suppressions (displayed when the -v option is specified) now shows, for each used suppression, the file name and line number where the suppression is defined;
- A new flag, --sigill-diagnostics, can now be used to control whether a diagnostic message is printed when the just-in-time (JIT) compiler encounters an instruction it cannot translate. The actual behavior, delivery of the SIGILL signal to the application, is unchanged.
- The Memcheck tool has been improved with the following features:
- Improvements in handling of vector code, leading to significantly fewer false error reports. Use the --partial-loads-ok=yes flag to get the benefits of these changes;
- Better control over the leak checker. It is now possible to specify which kinds of leaks (definite, indirect, possible, and reachable) should be displayed, which should be regarded as errors, and which should be suppressed by a given leak suppression. The kinds regarded as errors are selected with the --errors-for-leak-kinds=kind1,kind2,.. option, and the kinds matched by a suppression are selected with an optional match-leak-kinds: line in the suppression entry. Note that generated leak suppressions contain this new line and are therefore more specific than in previous releases. To get the same behavior as previous releases, remove the match-leak-kinds: line from generated suppressions before using them;
- Fewer possible leak reports from the leak checker through the use of better heuristics. The available heuristics provide detection of valid interior pointers to std::string, to arrays allocated with new[] whose elements have destructors, and to interior pointers pointing to an inner part of a C++ object using multiple inheritance. They can be selected individually using the
- Better control of stack trace acquisition for heap-allocated blocks. Using the --keep-stacktraces option, it is possible to control independently whether a stack trace is acquired for each allocation and deallocation. This can be used to create better "use after free" errors or to decrease Valgrind's resource consumption by recording less information;
- Better reporting of leak suppression usage. The list of suppressions used (shown when the -v option is specified) now shows, for each leak suppression, how many blocks and bytes it suppressed during the last leak search.
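The Memcheck options above can be combined in a single run. The sketch below is illustrative only: the target ./myapp, the suppression name, and the init_cache call chain are hypothetical. The --show-leak-kinds option is not named in the text above; it is included on the assumption that it is the Memcheck option controlling which leak kinds are displayed. The alloc-and-free value for --keep-stacktraces is likewise one possible setting rather than a recommendation.

```shell
#!/bin/sh
# Hypothetical target program and suppression file location.
APP="${APP:-./myapp}"
SUPP="${TMPDIR:-/tmp}/memcheck-example.supp"

# Suppression entry that matches only "possible" leaks on a hypothetical
# call chain (malloc called from init_cache).
cat > "$SUPP" <<'EOF'
{
   example_possible_leak_in_init_cache
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   fun:init_cache
}
EOF

# Collect the Memcheck options as positional parameters so that each one
# is passed to valgrind as a separate, correctly quoted argument.
set -- --leak-check=full \
       --show-leak-kinds=definite,possible \
       --errors-for-leak-kinds=definite \
       --keep-stacktraces=alloc-and-free \
       --partial-loads-ok=yes \
       --suppressions="$SUPP" \
       -v

if command -v valgrind >/dev/null 2>&1 && [ -x "$APP" ]; then
    valgrind "$@" "$APP"
else
    echo "skipped: valgrind $* $APP"
fi
```

With these settings only definite leaks would count as errors, while possible leaks on the listed call chain are suppressed; the -v output should then report how many blocks and bytes that suppression covered.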
- The Valgrind GDB server integration has been improved with the following monitoring commands:
- A new monitor command, v.info open_fds, that gives the list of open file descriptors and additional details;
- A new monitor command, v.info execontext, that shows information about the stack traces recorded by Valgrind;
- A new monitor command, v.do expensive_sanity_check_general, to run certain internal consistency checks.
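The monitor commands listed above can be sent from a shell with the vgdb utility as well as from within GDB. The following sketch assumes a hypothetical long-running target ./myapp and uses a fixed two-second delay as a crude way to wait for Valgrind to start; both are placeholders. It also assumes that v.info open_fds only reports descriptors when Valgrind is started with --track-fds=yes.

```shell
#!/bin/sh
# Hypothetical long-running target program.
APP="${APP:-./myapp}"

# The three monitor commands described above, one per line.
MONITOR_CMDS='v.info open_fds
v.info execontext
v.do expensive_sanity_check_general'

if command -v valgrind >/dev/null 2>&1 &&
   command -v vgdb >/dev/null 2>&1 && [ -x "$APP" ]; then
    # Start the target under Valgrind with the gdbserver enabled and
    # file descriptor tracking switched on.
    valgrind --vgdb=yes --track-fds=yes "$APP" &
    VG_PID=$!
    sleep 2   # crude wait for Valgrind to finish starting up

    # Send each monitor command to the running Valgrind process.
    # $cmd is left unquoted on purpose so its words become separate
    # arguments; the target then keeps running until it exits by itself.
    printf '%s\n' "$MONITOR_CMDS" | while IFS= read -r cmd; do
        vgdb --pid="$VG_PID" $cmd </dev/null
    done
else
    printf '%s\n' "$MONITOR_CMDS" | while IFS= read -r cmd; do
        echo "skipped: vgdb $cmd"
    done
fi
```

From an attached GDB session the same commands would be issued with the monitor prefix, for example monitor v.info open_fds.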