originally written by Brendan Gregg in 2005
- Is it "DTrace", "Dtrace", "dTrace" or "DTRACE"?
- What is DTrace?
- What are the risks of DTrace?
- What Operating Systems have DTrace?
- Why is DTrace different from other performance tools?
- What is DTrace used for?
- Who can use DTrace?
- Do I need to know kernel internals to use DTrace?
- Is there an easier way to use the power of DTrace?
- What are some DTrace success stories?
- Wasn't this invented 20 years ago on mainframes?
- Is the source code available?
- Are there books for DTrace?
- Will DTrace be released for Solaris 9?
- D Language
- How do I DTrace ...?
- Error Messages
- DTrace requires additional privileges
- drops on CPU #
- invalid address (0x...) in action
- failed to create probe ... Not enough space
- See Also
DTrace is a performance analysis and troubleshooting tool that provides a comprehensive view of operating system and application behaviour. It has functionality similar to many other performance tools combined, bundled into a single scriptable tool that can examine both userland activity and the kernel. DTrace was designed to be safe for use on live production servers, and to operate with minimum performance overhead.
Safety was a key design tenet in the development of DTrace. Since its release in 2005, DTrace has proven safe for production use, as it was designed to be. In some production environments DTrace is running continually 24x7 without harm or even serious performance degradation.
DTrace was developed as key Solaris 10 technology by former Sun Microsystems engineers Bryan Cantrill, Mike Shapiro and Adam Leventhal, and was released in March 2005 with the first release of Solaris 10. It's now available for Illumos, SmartOS, FreeBSD and Mac OS X, and two different ports for Linux are in early stages.
- greater observability
- production safe
- realtime data
For example, DTrace can fetch latency metrics from functions in the kernel and applications at the same time, summarizing the data in a low cost manner, and passing the summaries every second to user-land for real-time visualization.
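As a rough illustration of that pattern, the following sketch writes a small D script that times read(2) system calls, summarizes the latencies in the kernel as a power-of-two histogram, and prints the summaries once per second. The file name readlat.d and the choice of read(2) are assumptions made for this example only.

```shell
# Write a hypothetical script, readlat.d (the name is an assumption).
cat > readlat.d <<'EOF'
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Record a start timestamp (nanoseconds) for this thread. */
syscall::read:entry
{
    self->ts = timestamp;
}

/* On return, add the latency to a per-process histogram. */
syscall::read:return
/self->ts/
{
    @lat[execname] = quantize(timestamp - self->ts);
    self->ts = 0;
}

/* Once per second, print the summaries and reset them. */
profile:::tick-1sec
{
    printa(@lat);
    trunc(@lat);
}
EOF
chmod +x readlat.d
```

Run it as root (or with suitable DTrace privileges) as ./readlat.d, and stop it with Ctrl-C.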
DTrace is used for performance analysis, observability, troubleshooting and debugging. Examples include watching disk I/O details live, and timing userland functions to determine hotspots.
Firstly, you need to be root or have one of the DTrace privileges to be able to invoke DTrace.
- Sysadmins can use DTrace to understand the behaviour of the operating system and applications.
- Application Programmers can use DTrace to fetch timing and argument details from the functions that they wrote, both in development and from live customer production environments.
- Kernel and Device Driver Engineers can use DTrace to debug a live running kernel and all its modules, without needing to run drivers in debug mode.
No, you do not need to know kernel internals to use DTrace, although it can help. The following points should explain:
- You can get value from DTrace by using the many pre-written and documented scripts that are available, such as those in the DTraceToolkit.
- However, it's useful to learn to write your own custom scripts, to solve your specific issues.
- There are many high level "providers" carefully designed to provide a succinct, stable and documented abstraction of the kernel (see the DTrace Guide, eg: proc, io, sched, sysinfo, vminfo), which make tracing the kernel much easier than it may sound.
- No kernel knowledge is required to study user-level application code only. Application developers can study the functions that they wrote, and that they are already familiar with.
- Understanding the OS kernel is necessary for writing advanced DTrace scripts for which there is currently no high level provider; for example, to examine TCP and IP activity in detail. Solaris Internals 2nd Edition is highly recommended.
- If you're a customer of the Joyent Public Cloud (including no.de), you can use Joyent's powerful Cloud Analytics to observe the performance of your Joyent SmartMachine.
- With Joyent's SmartDataCenter, you can apply the power of Cloud Analytics across all nodes, to see what's going on throughout your cloud datacenter.
- You can also use Joyent's Cloud Analytics API to create custom instrumentations and visualizations for any or all of your nodes in the Joyent Public Cloud, or your entire SmartDataCenter.
DTrace has had countless wins; see the blogs on dtrace.org for some examples.
No, this was not invented 20 years ago on mainframes. DTrace can dynamically trace every function entry and return in the live kernel (around 36,000 probes), plus every function in user-level application code and libraries (for example, mozilla and its libraries expose over 100,000 probes), and individual user-level instructions (over 200,000 probes for the Bourne shell alone).
Yes, the source code is available. Sun Microsystems released it in January 2005, and it was the first major component of the Solaris source to be open-sourced.
- The DTrace Guide is a superb reference for DTrace which covers the language and the providers, and is packed with examples. It was written by the DTrace engineers, and is the authoritative reference. This entire book is available online in both HTML and PDF format, at no charge. A hardcopy is available to purchase from iUniverse.
- Solaris Performance and Tools demonstrates using DTrace in practical ways for performance observability and debugging. It was written by Richard McDougall and Jim Mauro (who also wrote Solaris Internals), and Brendan Gregg (DTraceToolkit).
- The DTrace book by Brendan Gregg and Jim Mauro, published in early 2011, is a comprehensive "cookbook" on all aspects of DTrace. A sample chapter is available online.
There are also lots of videos about DTrace; searching YouTube is a good place to start.
No, DTrace will not be released for Solaris 9. (This used to be FAQ #1 back in '05.)
The D programming language is based on C, and so any background in C programming will help. D is arguably far easier than C, as you only need to know a small number of functions and variable types to be able to write powerful scripts.
D programs are similar in form to awk programs: they are not top-down programs, but action-based. Each clause names the probes it applies to and the actions to run when one of those probes fires.
A probe is an instrumentation point that can be traced by DTrace. For example, the probe "syscall::read:entry" fires when a read(2) syscall is called, and "syscall::read:return" fires when a read(2) syscall completes. There are four components to the probe name: provider:module:function:name. Provider is the most significant; the roles of the other components are explained in the DTrace guide.
A provider is a collection of related probes, much like a library is a collection of functions. For example, the "syscall" provider provides probes for the entry and return of all system calls. The DTrace guide covers each provider in a separate chapter.
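To make the shape of a D program concrete, here is a minimal sketch of a single clause: a probe description, an optional predicate between slashes, and an action block. The script name readwatch.d and the "bash" filter are assumptions chosen purely for illustration.

```shell
# Write a hypothetical script, readwatch.d (the name is an assumption).
cat > readwatch.d <<'EOF'
#!/usr/sbin/dtrace -s

#pragma D option quiet

/*
 * Probe description: provider:module:function:name.
 * The module field is left blank, so it matches any module.
 */
syscall::read:entry
/execname == "bash"/    /* predicate: "bash" is only an example filter */
{
    /* action: arg0 of read(2) is the file descriptor */
    printf("%s (pid %d) read(2) on fd %d\n", execname, pid, arg0);
}
EOF
chmod +x readwatch.d
```

The clause runs its action only when the probe fires and the predicate is true, which is the awk-like, action-based form described above. The probes a provider offers can be listed with dtrace -l -P syscall.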
System calls can be easily traced using the syscall provider, which provides a probe for both the entry and the return of the syscall, and variables for the entry arguments and the return code. As the midway point between user-land and the kernel, the syscall interface often reflects application behaviour well. Each syscall is also well documented in section 2 of the man pages. The following are some example DTrace one-liners.
Files opened by process name
Syscall count by process name
Syscall count by syscall
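As a sketch, the three one-liners named above could be written as follows. They are wrapped in shell functions (the function names are assumptions) so that sourcing the snippet does not start tracing; the dtrace command inside each function is the one-liner itself. The first assumes the platform exposes an open(2) syscall probe whose first argument is the pathname.

```shell
# Files opened by process name.
# Assumes an open(2) syscall probe whose arg0 is a user-land path string.
opens_by_process() {
    dtrace -n 'syscall::open:entry { printf("%s %s", execname, copyinstr(arg0)); }'
}

# Syscall count by process name (Ctrl-C prints the aggregation).
syscalls_by_process() {
    dtrace -n 'syscall:::entry { @num[execname] = count(); }'
}

# Syscall count by syscall (Ctrl-C prints the aggregation).
syscalls_by_name() {
    dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
}
```

Each function needs root or the appropriate DTrace privileges when called, for example syscalls_by_process.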
It can be particularly valuable to measure both the elapsed time and the on-CPU time of system calls, to explain response time and CPU load respectively. The procsystime tool from the DTraceToolkit does this using the -e and -o flags.
Disk events can be traced using the io provider, which provides probes for the request and completion of both disk and client NFS I/O. Each probe provides extensive details of the I/O through the args array, as documented in the DTrace guide. The disk related probes can be listed with dtrace -l -P io.
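The idea of measuring both times can be sketched directly in D. This is not the procsystime source, only a minimal illustration under the assumption that summing per syscall is what is wanted; the file name systime.d is hypothetical. It uses timestamp for elapsed time and vtimestamp for on-CPU time.

```shell
# Write a hypothetical script, systime.d (the name is an assumption).
cat > systime.d <<'EOF'
#!/usr/sbin/dtrace -s

/* Save wall-clock and on-CPU timestamps for this thread. */
syscall:::entry
{
    self->start = timestamp;
    self->vstart = vtimestamp;
}

/* On return, sum the deltas (nanoseconds) by syscall name. */
syscall:::return
/self->start/
{
    @elapsed[probefunc] = sum(timestamp - self->start);
    @oncpu[probefunc] = sum(vtimestamp - self->vstart);
    self->start = 0;
    self->vstart = 0;
}
EOF
chmod +x systime.d
```

Run ./systime.d as root and press Ctrl-C; the two aggregations are printed when the script exits.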
Points to bear in mind when using the io provider for tracing disk activity:
- The io provider traces actual disk I/O requests. Your application may be doing lots of I/O that is absorbed by the file system cache and never reaches the disk.
- I/O completions (io:::done) are asynchronous, so pid and execname will not identify the responsible process.
- Disk write requests (io:::start) often occur asynchronously to the responsible process, as the file system caches the write and flushes it to storage at a later time.
- io events don't necessarily mean that disk heads are moving somewhere - many disks have buffers to cache I/O activity, especially storage arrays.
The following are some example one-liners.
Disk size by process ID
Disk size aggregation
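A sketch of how the two one-liners named above might look, using the io:::start probe and the b_bcount field of its first argument for the I/O size in bytes. The shell function names are assumptions; the functions only exist so that sourcing the snippet does not start tracing.

```shell
# Disk size by process ID: print pid, process name and I/O size per request.
disk_size_by_pid() {
    dtrace -n 'io:::start { printf("%d %s %d", pid, execname, args[0]->b_bcount); }'
}

# Disk size aggregation: power-of-two histogram of I/O sizes per process name.
disk_size_histogram() {
    dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'
}
```

Bear in mind the caveats listed earlier: for asynchronous writes, pid and execname in io:::start may not identify the process that originally issued the I/O.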
The DTraceToolkit contains many tools for analysing disk I/O, including:
- iosnoop - snoop I/O events as they occur
- iotop - display top disk I/O events by process
- bitesize.d - print disk event size report
- iofile.d - I/O wait time by filename and process
- iopattern - print disk I/O pattern
- seeksize.d - print disk seek size report
You must either be root or have additional privileges to be able to use DTrace. Those privileges are:
- dtrace_user - allows the use of profile, syscall and fasttrap providers, on processes that the user owns.
- dtrace_proc - allows the use of the pid provider on processes that the user owns.
- dtrace_kernel - allows most providers to probe everything, in read only mode.
Privileges can be added to a process (such as a user's shell) temporarily by using the ppriv(1) command. For example, to add dtrace_user to PID 1851: ppriv -s A+dtrace_user 1851
usermod can be used to make this a permanent change to a user account. For example: usermod -K defaultpriv=basic,dtrace_user brendan
dtrace: 2179050 drops on CPU 0
dtrace: 1343451 drops on CPU 0
The DTrace kernel buffer is overflowing due to output being generated too quickly for /usr/sbin/dtrace to read. This usually happens when your script would output hundreds of screens of text per second. Some remedies:
- Increase the switchrate of /usr/sbin/dtrace, so that it reads the buffer more often than the default of 1 Hertz. At the command line this can be -x switchrate=10hz.
- Increase the size of the DTrace primary buffer. By default this is usually 4 Mbytes per CPU. At the command line it can be increased, eg -b 8m.
- Do you really want that much data to be output? Try to probe fewer events. Also, aggregations can be used so that DTrace summarises the data and outputs only the final report, avoiding an output buffer overflow.
This error is caused when DTrace attempts to dereference a memory address which isn't mapped. For example, if a script applies stringof to the arg0 variable of the open(2) syscall, it fails because arg0 refers to a user-land address, whereas DTrace executes in the kernel address space; that case can be fixed by changing stringof to copyinstr. Remedies:
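The remedies above can be sketched as follows. The option values are the ones quoted in the list; the probe and the shell function names are assumptions used only for illustration.

```shell
# Remedies 1 and 2: read the buffer 10 times per second,
# and use an 8 Mbyte primary buffer per CPU.
trace_syscalls_tuned() {
    dtrace -x switchrate=10hz -b 8m -n 'syscall:::entry { trace(execname); }'
}

# Remedy 3: summarise in the kernel with an aggregation
# instead of printing a line for every event.
count_syscalls() {
    dtrace -n 'syscall:::entry { @num[execname] = count(); }'
}
```

The aggregated form is usually the better choice, since only the final summary crosses from the kernel to /usr/sbin/dtrace.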
- Use either copyin() or copyinstr() to copy the data from user-land into the kernel.
- Attempt to dereference on the return of a function, not the entry. On the entry, an address may be valid but not faulted in.
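The first remedy can be sketched with the open(2) case described above. The shell function names are assumptions, and the example assumes the platform exposes an open(2) syscall probe whose arg0 is the pathname.

```shell
# Fails with "invalid address": stringof() treats arg0 as a kernel address,
# but arg0 of open(2) points at a string in user-land.
open_paths_broken() {
    dtrace -n 'syscall::open:entry { trace(stringof(arg0)); }'
}

# Works: copyinstr() copies the string from user-land into the kernel first.
open_paths_fixed() {
    dtrace -n 'syscall::open:entry { trace(copyinstr(arg0)); }'
}
```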
DTrace ran out of RAM when trying to create probes. This can happen if you attempt to probe far too many events. For example, a probe description such as pid$target::: leaves fields blank (wildcards), and so it will attempt to match every instruction from every function of the target process; for mozilla that would be millions of probes.
In this case, perhaps we meant to probe just function entries - pid$target:::entry, or perhaps instructions from just one library - pid$target:libaio::.
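The two narrower probe descriptions could be used like this. This is a sketch: the shell function names and the aggregation actions are assumptions, and each function takes the target process ID as its first argument.

```shell
# Function entries only, for the process whose PID is passed as $1.
count_function_entries() {
    dtrace -n 'pid$target:::entry { @calls[probefunc] = count(); }' -p "$1"
}

# Every instruction, but only within one library (libaio, as in the text).
count_libaio_instructions() {
    dtrace -n 'pid$target:libaio:: { @hits[probefunc] = count(); }' -p "$1"
}
```

For example, count_function_entries 1851 would count function calls in PID 1851 until Ctrl-C is pressed.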