Originally written by [Brendan Gregg|] in 2005.

h2. General

h3. Is it "DTrace", "Dtrace", "dTrace" or "DTRACE"?

It's "DTrace".

h3. What is DTrace?

DTrace is a performance analysis and troubleshooting tool that provides a comprehensive view of operating system and application behaviour. It has functionality similar to many other performance tools combined, bundled into a single scriptable tool that can examine both userland activity and the kernel. DTrace was designed to be safe for use on live production servers, and to operate with minimum performance overhead.

h3. What are the risks of DTrace?

Safety was a [key design tenet|] in the development of DTrace. Since its release in 2005, DTrace has proven safe for production use, as it was designed to be. In some production environments DTrace is running continually 24x7 without harm or even serious performance degradation.

h3. What Operating Systems have DTrace?

DTrace was developed as a key Solaris 10 technology by former Sun Microsystems engineers [Bryan Cantrill|], [Mike Shapiro|] and [Adam Leventhal|], and was released in March 2005 with the first release of Solaris 10. It's now available for [Illumos|], SmartOS, [FreeBSD|] and [Mac OS X|], and two different ports for Linux are in early stages.

h3. Why is DTrace different from other performance tools?

* greater observability
* production safe
* realtime data

For example, DTrace can fetch latency metrics from functions in the kernel and in applications at the same time, summarize the data in the kernel at low cost, and pass the summaries to user-land every second for real-time visualization.

h3. What is DTrace used for?

Performance analysis, observability, troubleshooting, debugging. Examples include watching disk I/O details live, and timing userland functions to determine hotspots.

h3. Who can use DTrace?

Firstly, you need to be root or have one of the DTrace privileges to be able to invoke DTrace.
* Sysadmins can use DTrace to understand the behaviour of the operating system and applications.
* Application Programmers can use DTrace to fetch timing and argument details from the functions that they wrote, both in development and from live customer production environments.
* Kernel and Device Driver Engineers can use DTrace to debug a live running kernel and all its modules, without needing to run drivers in debug mode.

h3. Do I need to know kernel internals to use DTrace?

No, although it can help. The following points should explain:
* You can get value from DTrace by using the many pre-written and documented scripts available from:
** the [DTraceToolkit|] \- many of which are [included in Mac OS X|]
** the scripts and one-liners documented in [Solaris Performance and Tools|] (2005)
** the scripts and one-liners documented in the [DTrace book|] (2011)
* However, it's useful to learn to write your own custom scripts, to solve your specific issues.
* There are many high level "providers" carefully designed to provide a succinct, stable and documented abstraction of the kernel (see the [DTrace Guide|], eg: proc, io, sched, sysinfo, vminfo), which make tracing the kernel much easier than it may sound.
* No kernel knowledge is required to study user-level application code only. Application developers can study the functions that they wrote, and that they are already familiar with.
* Understanding the OS kernel is necessary for writing advanced DTrace scripts for which there is currently no high level provider; for example, to examine TCP and IP activity in detail. [Solaris Internals 2nd Edition|] is highly recommended.

h3. Is there an easier way to use the power of DTrace?

The power of DTrace underlies several observability tools, including a [GUI for NetBeans|], [Mac OS X Instruments|], and Analytics for the Oracle ZFS Storage product family.

And we've been tapping Joyent's internal DTrace expertise ([Bryan Cantrill|], [Brendan Gregg|], [Dave Pacheco|], [Robert Mustacchi|]) to make the power of DTrace ever more accessible:

* If you're a customer of the Joyent Public Cloud (including [|]), you can use Joyent's powerful [Cloud Analytics|] to observe the performance of your Joyent SmartMachine. 
* With Joyent's SmartDataCenter, you can [apply the power of Cloud Analytics across all nodes|], to see what's going on throughout your cloud datacenter.
* You can also use Joyent's [Cloud Analytics API|] to create custom instrumentations and visualizations for any or all of your nodes in the Joyent Public Cloud, or your entire SmartDataCenter.

h3. What are some DTrace success stories?

DTrace has had countless wins; see the blogs on [|] for some examples.

h3. Wasn't this invented 20 years ago on mainframes?

No\! DTrace can dynamically trace every function entry and return in the live kernel (around 36,000 probes); plus every function in user-level application code and libraries (for example, mozilla + libraries is over 100,000 probes); and user-level instructions (over 200,000 probes - just for the Bourne shell). 

h3. Is the source code available?

Yes, Sun Microsystems released it in January, 2005, and it was the first major component of the Solaris source to be open-sourced.

h3. Are there books for DTrace?

* The [DTrace Guide|] is a superb reference for DTrace which covers the language and the providers, and is packed with examples. It was written by the DTrace engineers, and is the authoritative reference. The entire book is available online in both HTML and PDF format, at no charge. A hardcopy is available to [purchase|] from iUniverse.
* [Solaris Performance and Tools|,1144,0131568191,00.html] demonstrates using DTrace in practical ways for performance observability and debugging. It was written by Richard McDougall and Jim Mauro (who also wrote [Solaris Internals|]), and [Brendan Gregg|] (DTraceToolkit).
* The [DTrace book|] by Brendan Gregg and Jim Mauro, published in early 2011, is a comprehensive "cookbook" on all aspects of DTrace. Sample chapter [here|].

There are also lots of videos about DTrace; search YouTube or start [here|].

h3. Will DTrace be released for Solaris 9?

No. (This used to be FAQ #1 back in '05.)

h2. D Language

h3. What language is D most like?

The D programming language is based on C, and so any background in C programming will help. D is arguably far easier than C, as you only need to know a small number of functions and variable types to be able to write powerful scripts.

D programs are similar in form to awk programs: they are not top-down programs, but action-based. Each clause names the events of interest and the action to take when one occurs.
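As a rough illustration of that action-based form, the sketch below has three clauses, each made up of a probe description, an optional predicate between slashes, and an action block in braces. The choice of read(2) and the aggregation name @reads are assumptions made for this example only. It must be run as root or with the DTrace privileges.

```shell
# Count read(2) calls per process name until Ctrl-C is pressed.
dtrace -qn '
dtrace:::BEGIN
{
    printf("Tracing read(2)... Hit Ctrl-C to end.\n");
}

syscall::read:entry
/execname != "dtrace"/
{
    /* runs each time any process other than dtrace enters read(2) */
    @reads[execname] = count();
}

dtrace:::END
{
    printa("%-16s %@d\n", @reads);
}'
```

The clauses are not executed in order from top to bottom; each one runs only when its probe fires and its predicate is true, much like the pattern and action pairs of awk.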

h3. What are Probes and Providers?

A *probe* is an instrumentation point that can be traced by DTrace. For example, the probe "syscall::read:entry" fires when a read(2) syscall is called, and "syscall::read:return" fires when a read(2) syscall completes. There are four components to the probe name: provider:module:function:name. Provider is the most significant; the roles of the other fields are explained in the [DTrace guide|].

A *provider* is a collection of related probes, much like a library is a collection of functions. For example, the "syscall" provider provides probes for the entry and return of all system calls. The DTrace guide lists the providers as separate chapters.
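One way to get a feel for probe names is to list them without enabling anything, using the -l option of dtrace(1M). The commands below are a small sketch; the probes and counts listed will differ between operating systems and releases.

```shell
# List the entry and return probes for read(2): provider "syscall",
# module left blank (matches any), function "read", name left blank.
dtrace -ln 'syscall::read:'

# List every probe published by the syscall provider.
dtrace -l -P syscall

# Blank fields and * act as wildcards: all syscall entry probes whose
# function name begins with "open".
dtrace -ln 'syscall::open*:entry'
```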

h2. How do I DTrace ...?

h3. Syscalls

System calls can be easily traced using the syscall provider, which provides a probe for both the entry and the return of the syscall, and variables for the entry arguments and the return code. As the midway point between user-land and the kernel, the syscall interface often reflects application behaviour well. Each syscall is also well documented in section 2 of the man pages. The following are some example DTrace one-liners.

Files opened by process name
# dtrace -n 'syscall::open*:entry { printf("%s %s",execname,copyinstr(arg0)); }'
dtrace: description 'syscall::open*:entry ' matched 2 probes
0 6329 open:entry df /var/ld/ld.config
0 6329 open:entry df /usr/lib/
0 6329 open:entry df /usr/lib/
0 6329 open:entry df /etc/mnttab

Syscall count by process name
# dtrace -n 'syscall:::entry { @num[execname] = count(); }'
dtrace: description 'syscall:::entry ' matched 228 probes
svc.startd 1
mozilla-bin 26
sshd 58
bash 88
dtrace 95
df 108

Syscall count by syscall
# dtrace -n 'syscall:::entry { @num[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 228 probes
lwp_self 1
write 33
sigaction 33
lwp_sigmask 53
ioctl 95

It can be particularly valuable to measure both the elapsed time and the on-CPU time of system calls, to explain response time and CPU load respectively. The procsystime tool from the [DTraceToolkit|] does this using the -e and -o flags.
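The same two measurements can be sketched in a few lines of D. This is not the procsystime source; it is a minimal example that assumes the built-in variables timestamp (wall-clock nanoseconds) and vtimestamp (nanoseconds the current thread has spent on-CPU), and the variable and aggregation names are chosen for illustration.

```shell
# Sum elapsed and on-CPU nanoseconds per syscall, system-wide, until Ctrl-C.
dtrace -n '
syscall:::entry
{
    /* thread-local variables pair each return with its own entry */
    self->start = timestamp;
    self->vstart = vtimestamp;
}

syscall:::return
/self->start/
{
    @elapsed[probefunc] = sum(timestamp - self->start);
    @oncpu[probefunc] = sum(vtimestamp - self->vstart);
    self->start = 0;
    self->vstart = 0;
}'
```

A syscall with a large elapsed total but a small on-CPU total spent most of its time blocked, for example waiting on I/O.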

h3. Disk I/O

Disk events can be traced using the io provider, which provides probes for the request and completion of both disk and client NFS I/O. Each probe provides extensive details of the I/O through the args\[\] array, as documented in the DTrace guide. The following lists the disk related probes.

# dtrace -ln 'io:genunix::'
9571 io genunix biodone done
9572 io genunix biowait wait-done
9573 io genunix biowait wait-start
9582 io genunix default_physio start
9583 io genunix bdev_strategy start
9584 io genunix aphysio start

Points to bear in mind when using the io provider for tracing disk activity:
* These are actual disk I/O requests. Your application may be doing loads of I/O which is being absorbed by the file system cache.
* I/O completions (io:::done) are asynchronous, so pid and execname will not identify the responsible process.
* Disk write requests (io:::start) often occur asynchronously to the responsible process, as the file system caches the write and flushes it to storage at a later time.
* io events don't necessarily mean that disk heads are moving somewhere - many disks have buffers to cache I/O activity, especially storage arrays.
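Because pid and execname are unreliable at completion time, a common approach is to match each io:::done to its io:::start using the device and block number rather than the thread. The sketch below follows that pattern and assumes the b_edev and b_blkno members of args\[0\] and the dev_statname member of args\[1\] as documented for the io provider; the array name start_time is illustrative.

```shell
# Distribution of disk I/O latency in nanoseconds, by device, until Ctrl-C.
dtrace -n '
io:::start
{
    /* key on device and block number, not on the issuing thread */
    start_time[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

io:::done
/start_time[args[0]->b_edev, args[0]->b_blkno]/
{
    @[args[1]->dev_statname] =
        quantize(timestamp - start_time[args[0]->b_edev, args[0]->b_blkno]);
    start_time[args[0]->b_edev, args[0]->b_blkno] = 0;
}'
```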

The following are some example one-liners.

Disk I/O size by process ID

# dtrace -n 'io:::start { printf("%d %s %d",pid,execname,args[0]->b_bcount); }'
dtrace: description 'io:::start ' matched 6 probes
0 9583 bdev_strategy:start 8238 tar 1024
0 9583 bdev_strategy:start 8238 tar 4096
0 9583 bdev_strategy:start 8238 tar 4096
0 9583 bdev_strategy:start 8238 tar 1024
0 9583 bdev_strategy:start 8238 tar 1024
0 9583 bdev_strategy:start 8238 tar 2048

Disk I/O size aggregation

# dtrace -n 'io:::start { @size[execname] = quantize(args[0]->b_bcount); }'
dtrace: description 'io:::start ' matched 6 probes
value ------------- Distribution ------------- count
512 | 0
1024 |@@ 37
2048 |@@@@@@@ 114
4096 |@@@@@@@ 116
8192 |@@@@@@@@@@@@@@@@@ 286
16384 |@@ 33
32768 |@@@@@ 87
65536 | 0

The [DTraceToolkit|] contains many tools for analysing disk I/O, including:
* iosnoop - snoop I/O events as they occur
* iotop - display top disk I/O events by process
* bitesize.d - print disk event size report
* iofile.d - I/O wait time by filename and process
* iopattern - print disk I/O pattern
* seeksize.d - print disk seek size report

h2. Error Messages

h3. DTrace requires additional privileges

You must either be root or have additional privileges to be able to use DTrace. Those privileges are:
* dtrace_user - allows the use of profile, syscall and fasttrap providers, on processes that the user owns.
* dtrace_proc - allows the use of the pid provider on processes that the user owns.
* dtrace_kernel - allows most providers to probe everything, in read only mode.

Privileges can be added to a process (such as a user's shell) temporarily by using the ppriv(1) command. For example, to add dtrace_user to PID 1851: ppriv \-s A+dtrace_user 1851
usermod can be used to make this a permanent change to a user account. For example: usermod \-K defaultpriv=basic,dtrace_user brendan

h3. drops on CPU #

dtrace: 864476 drops on CPU 0
dtrace: 2179050 drops on CPU 0
dtrace: 1343451 drops on CPU 0

The DTrace kernel buffer is overflowing due to output being generated too quickly for /usr/sbin/dtrace to read. This usually happens when your script would output hundreds of screens of text per second. Some remedies:
* Increase the switchrate of /usr/sbin/dtrace, so that it reads the buffer more often than the default of 1 Hertz. At the command line this can be -x switchrate=10hz.
* Increase the size of the DTrace primary buffer. By default this is usually 4 Mbytes per CPU. At the command line it can be increased, eg -b 8m.
* Do you really want that much data to be output? Try to probe fewer events. Also, aggregations can be used so that DTrace summarises the data and outputs only the final report, avoiding an output buffer overflow.
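The remedies above might look like the following on the command line. The one-liners themselves are illustrative assumptions; only the -x switchrate and -b options come from the list above.

```shell
# Option 1: keep the per-event output, but read the buffer ten times a
# second and raise the per-CPU primary buffer to 8 Mbytes.
dtrace -x switchrate=10hz -b 8m \
    -n 'syscall:::entry { printf("%s %s\n", execname, probefunc); }'

# Option 2: summarise in the kernel with an aggregation and print one
# report at Ctrl-C, so per-event records never fill the primary buffer.
dtrace -n 'syscall:::entry { @[execname, probefunc] = count(); }'
```

If the question being asked is "who is making which syscalls", the second form usually answers it with far less output than the first.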

h3. invalid address (0x...) in action

# dtrace -n 'syscall::open:entry { trace(stringof(arg0)); }'

dtrace: description 'syscall::open:entry ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 6329: syscall::open:entry):
invalid address (0xd27f7a24) in action #1
dtrace: error on enabled probe ID 1 (ID 6329: syscall::open:entry):
invalid address (0xd27fbf38) in action #1

This error occurs when DTrace attempts to dereference a memory address that isn't mapped. In the above example, the arg0 variable for the open(2) syscall refers to a user-land address, but DTrace executes in the kernel address space; the example can be fixed by changing stringof to copyinstr. Remedies:
* Use either copyin() or copyinstr() to copy the data from user-land into the kernel.
* Attempt to dereference on the return of a function, not the entry. On the entry, an address may be valid but not faulted in.
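Applied to the failing one-liner above, the two remedies could be sketched as follows. The thread-local variable name self->path is an assumption for illustration.

```shell
# Remedy 1: copy the pathname string in from user-land.
dtrace -n 'syscall::open:entry { trace(copyinstr(arg0)); }'

# Remedy 2: save the user-land pointer on entry and copy the string on
# return, by which time the kernel has read the pathname itself.
dtrace -n '
syscall::open:entry
{
    self->path = arg0;
}

syscall::open:return
/self->path/
{
    trace(copyinstr(self->path));
    self->path = 0;
}'
```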

h3. failed to create probe ... Not enough space

DTrace ran out of RAM when trying to create probes. This can happen if you attempt to probe far too many events. For example, here we leave fields blank in our probe description (wildcards), and so our probe description will attempt to match every instruction from every function of mozilla (which would be millions of probes).

# dtrace -ln 'pid$target:::' -p `pgrep mozilla-bin`
dtrace: invalid probe specifier pid$target:::: failed to create probe in process 7424:
Not enough space

In this case, perhaps we meant to probe just function entries \- pid$target:::entry, or perhaps instructions from just one library \- pid$target:libaio::.
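A narrower probe description along those lines might look like the sketch below. The library name libc and the use of pgrep \-n to pick a single, most recently started mozilla-bin process are assumptions for this example.

```shell
# Count calls into one library, by function name, for a single process.
# Only function entry probes in libc are created, not every instruction.
dtrace -n 'pid$target:libc::entry { @[probefunc] = count(); }' \
    -p "$(pgrep -n mozilla-bin)"
```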

h2. See Also

* [DTrace Tips, Tricks and Gotchas|]