Java Performance Basics and Tools
Java Performance Monitoring, Analysis, and Diagnosis
- The jmap utility can be used to create a heap dump of a running process. It is recommended to use the newer jcmd utility instead of jmap for enhanced diagnostics and reduced performance overhead.
- The command in Example 1 creates a heap dump of a running process using jcmd and produces results similar to the jmap command in Example 2.
- For additional heap dump collection examples, see the heap dump collection instructions. Heap dumps of running Java processes often contain many java.lang.ref.Finalizer objects; to minimize these objects, follow the instructions below.
- Regardless of how the JVM was started, the jmap tool in the preceding example produces a heap dump snapshot in a file named snapshot.jmap.
- If the "java.io.IOException: well-known file is not secure" error occurs, it is usually due to a mismatch between the user executing the command and the user who started the JVM process. In that case, use the following steps.
- If you specify the -XX:+HeapDumpOnOutOfMemoryError command-line option while running your application, then when an OutOfMemoryError exception is thrown, the JVM will generate a heap dump.
- Reference Link: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks004.html#CIHJIIBA
- Example 1: Create a Heap Dump using jcmd
- $> $JAVA_HOME/bin/jcmd <process id/main class> GC.heap_dump <filename>
- Example 2: Create a Heap Dump using jmap (DO NOT use "-F" with jmap on a well-running process; it will cause a STUCK thread)
- $> $JAVA_HOME/bin/jmap -dump:format=b,file=<filename> <pid>
- Run this command to check current heap usage: /u01/jdk/bin/jstat -gc <pid>
- In the jstat output, the column of most interest is "Old space utilization in KB (OU)".
- Request execution of finalizers: /u01/jdk/bin/jcmd <pid> GC.run_finalization
- Wait for 30 seconds: sleep 30
- Check current heap usage again: /u01/jdk/bin/jstat -gc <pid>
- Collect the heap dump: /u01/jdk/bin/jcmd <pid> GC.heap_dump <filename>
- Compress the file: tar cvfz <filename>.tar.gz <filename>
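The finalizer-flush-then-dump sequence above can be combined into one script. This is a sketch: the JDK path /u01/jdk/bin comes from the examples above, while the output path under /tmp and the positional PID argument are assumptions to adjust for your environment.

```shell
#!/bin/sh
# Sketch of the steps above: check heap, flush finalizers, dump, compress.
# Assumes JDK tools under /u01/jdk/bin; takes the target JVM PID as $1.
JDK_BIN=/u01/jdk/bin
PID=$1
DUMP=/tmp/heap_${PID}.hprof

"$JDK_BIN"/jstat -gc "$PID"                    # baseline heap usage; watch the OU column
"$JDK_BIN"/jcmd "$PID" GC.run_finalization     # request execution of pending finalizers
sleep 30                                       # give the finalizer thread time to drain
"$JDK_BIN"/jstat -gc "$PID"                    # confirm old-space usage (OU) dropped
"$JDK_BIN"/jcmd "$PID" GC.heap_dump "$DUMP"    # collect the heap dump
tar cvfz "${DUMP}.tar.gz" "$DUMP"              # compress before upload
```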
- Log in to the VM as the appropriate user, e.g. paasusr.
- Check the user and group of the JVM process: ps -o uid,gid,args -p <pid>
- Check your current user id and group: id
- If your current user's group does not match that of the process, switch to the correct group with newgrp <group>. This changes the group of the current shell only.
- Execute the jcmd/JFR command again and verify that it succeeds.
- Using JConsole
- Attach JConsole to the process.
- In the MBeans tab, select the HotSpotDiagnostic MBean, open the Operations display, and invoke the dumpHeap operation. (A heap dump is also generated automatically on OOM when -XX:+HeapDumpOnOutOfMemoryError is set.)
- Open the Firefox browser.
- Select Tools → Web Developer → Network, or press <F12> while the browser is open.
- Select the Clear button (it looks like a circle with a line through it). Make sure "All" is selected so that all types of calls are captured.
- Perform the transaction or activity that is slow in the UI, or load the slow page in the browser.
- The tool captures all calls made to the server in the window below. Wait for the page to load and the responses to complete.
- Right-click the calls captured in the Network tab and select "Save all as HAR with contents".
- Save the HAR file with a relevant name on your machine.
- Use the Clear option before starting the next capture.
- Clear the old capture first by clicking the clear button in the Network tab (next to the red recording button); this does not clear cookies or the browser cache.
- Open the IE/Edge browser.
- Select Developer Tools → Network, or press <F12> while the browser is open.
- Select the Clear button (it looks like lines with a red X in the top left corner). Make sure all types are selected so that all types of calls are captured.
- Perform the transaction or activity that is slow in the UI, or load the slow page in the browser.
- The tool captures all calls made to the server in the window below. Wait for the page to load and the responses to complete.
- Right-click the calls captured in the Network tab and select "Save all as HAR with contents".
- Save the HAR file with a relevant name on your machine.
- Use the Clear option before starting the next capture.
- Clear the old capture first by clicking the clear button in the Network tab (next to the red recording button); this does not clear cookies or the browser cache.
- Open the Chrome browser.
- Select More tools → Developer tools → Network, or press <F12> while the browser is open.
- Select the Clear button (it looks like a circle with a line through it). Make sure "All" is selected so that all types of calls are captured.
- Perform the transaction or activity that is slow in the UI, or load the slow page in the browser.
- The tool captures all calls made to the server in the window below. Wait for the page to load and the responses to complete.
- Right-click the calls captured in the Network tab and select "Save all as HAR with contents".
- Save the HAR file with a relevant name on your machine.
- Use the Clear option before starting the next capture.
- Clear the old capture first by clicking the clear button in the Network tab (next to the red recording button); this does not clear cookies or the browser cache.
- Open Chrome → press <F12> → click the Network tab → drag and drop the .har file.
- Make The Web Fast - The HAR Show: Capturing and Analyzing performance data with HTTP Archive format
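As a quick alternative to a graphical viewer, a saved HAR file can be summarized on the command line. This sketch assumes the jq utility is installed; capture.har is a placeholder filename.

```shell
# List the ten slowest requests in a HAR capture:
# total time (ms), HTTP status, and URL, slowest first.
jq -r '.log.entries[]
       | "\(.time | floor)ms \(.response.status) \(.request.url)"' capture.har \
  | sort -rn | head -10
```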
- $> tcpdump -i eth0 -w path/filename.cap
- $> tcpdump -nn -i eth0 host <IP/hostname> -w /path/filename.cap &
- $> netstat -s
- To see packets received (rx) and transmitted (tx) throughput:
- $> sar -n DEV 2 10000 (every 2 seconds, run 10000 times)
- Thread dumps for any stuck threads or deadlocks
- $> $JAVA_HOME/bin/jstack <PID> > <path>/threaddump.txt
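Thread dumps are usually collected several times a few seconds apart (see the interval guidance in the generic steps below). A minimal loop sketch; the PID and output directory are placeholders.

```shell
# Capture 6 thread dumps, 10 seconds apart (a 60-second window).
PID=$1
OUT=/tmp/threaddumps
mkdir -p "$OUT"
for i in 1 2 3 4 5 6; do
    "$JAVA_HOME"/bin/jstack "$PID" > "$OUT/td_${i}_$(date +%H%M%S).txt"
    sleep 10
done
```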
- If heap analysis is needed, copy the JFR template file to a location of your choice on the VM.
- To collect JFRs, the flags "-XX:+UnlockCommercialFeatures -XX:+FlightRecorder" must be present in the JVM arguments.
- $> $JAVA_HOME/bin/jcmd <PID> JFR.start settings=<template.jfc location> duration=<time in sec>s filename=<Path/java_JfrRecordingFile.jfr>
- If the "java.io.IOException: well-known file is not secure" error occurs, it is usually due to a mismatch between the user executing the command and the user who started the JVM process; follow the same user/group steps described above.
- Execute the JFR command again and verify that it succeeds.
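The JFR commands above fit into a short lifecycle. This sketch assumes an Oracle JDK jcmd that supports the JFR.start/JFR.check/JFR.dump subcommands; the PID, recording name, and paths are placeholders.

```shell
# Start a named, time-boxed recording using a settings template,
# then verify it is actually running.
PID=$1
"$JAVA_HOME"/bin/jcmd "$PID" JFR.start name=perf settings=profile \
    duration=120s filename=/tmp/java_JfrRecordingFile.jfr
"$JAVA_HOME"/bin/jcmd "$PID" JFR.check
# For a long recording, snapshot it before it finishes:
# "$JAVA_HOME"/bin/jcmd "$PID" JFR.dump name=perf filename=/tmp/partial.jfr
```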
- HAR is useful for diagnosing Cloud web application (e.g. HTTP) or JavaScript performance issues. When Cloud end-users report slow web performance or UI navigation flows that may be due to a client-side issue, ask the customer to submit a HAR file to Oracle Support for further analysis.
- An HTTP Archive (HAR) viewer is a visualization tool for the information in HAR files. A HAR file (http://www.softwareishard.com/blog/har-12-spec/) is a JSON document containing all the metadata needed to reconstruct the network traffic (waterfall) of a web session. Most modern browsers (Chrome, Firefox, IE/Edge) include a "developer tools" feature. Using developer tools, one can record and examine a web session's network traffic. The capture can be saved as a HAR file and viewed later in a HAR viewer.
- Capturing HAR files is useful for investigating web performance as experienced by a client. The HAR file captures the timing of each HTTP request, response codes, and files (images, stylesheets, scripts, etc.). It also includes the details of the HTTP request and response headers (initiator, ECIDs, cookies, etc.).
- $> file <core file name>
- Example:
- $> gdb <executable path of the process responsible for core file creation> <core file name>
- Example:
- At the (gdb) prompt, type "bt" for a backtrace.
- Whenever a Java or C/C++ process crashes, a core file may be generated. Core files need to be analyzed to identify the stack responsible for the crash.
- Identify the process which created the core file using the command:
- $> file core.23120
- core.23120: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/bi/app/fmw/bi/bifoundation/server/bin/nqsserver', real uid: 501, effective uid: 501, real gid: 502, effective gid: 502, execfn: '/bi/app/fmw/bi/bifoundation/server/bin/nqsserver', platform: 'x86_64'
- Identify the call stack/thread responsible for the crash by running the command below:
- $> gdb /bi/app/fmw/bi/bifoundation/server/bin/nqsserver core.23120
- or, if the core is from a Java process:
- $> gdb /bi/app/jdk/bin/java <core file name>
- Output the thread responsible for the crash.
- gdb /bi/app/fmw/bi/bifoundation/server/bin/nqsserver core.23120
- #0 0x00007f216a8619ef in __strlen_sse42 () from /lib64/libc.so.6
- #1 0x00007f216d1d986b in MultiThreadedUtilityPortableErrorMessageManager::ThrowErrorMessage(int, char const*, char const*) () at server/Utility/Generic/ErrorMsg/SUGErrRegister.cpp:112
- #2 0x00007f2177c4ab33 in sup::transcodeimpl::TranscoderCache::getTranscoder(int) () at server/Utility/Generic/Portable/Src/transcodercache.cpp:98
- #3 0x00007f2177c36414 in WideCharToMultiByteHandleShortDestBuf(wchar_t const*, int, char*, int, unsigned int) () at server/Utility/Generic/Portable/Src/SUPTranscode.cpp:425
- #24 0x00007f216a816c4d in clone () from /lib64/libc.so.6
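The interactive steps above can also be run non-interactively, which is convenient when collecting backtraces on a remote VM; the binary and core file names are from the example above.

```shell
# Write all thread backtraces from the core straight to gdb.txt
# without entering the interactive (gdb) prompt.
gdb -batch \
    -ex "set logging file gdb.txt" \
    -ex "set logging on" \
    -ex "thread apply all bt" \
    /bi/app/fmw/bi/bifoundation/server/bin/nqsserver core.23120
```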
- To output all threads from core file use the "set logging on" and "thread apply all bt" gdb commands to send all thread info to gdb.txt.
- $> gdb /bi/app/fmw/bi/bifoundation/server/bin/nqsserver core.23120
- (gdb) set logging on
- (gdb) thread apply all bt
- Strace to trace system calls and signals:
- $> strace -o /<dir>/strace.txt -T -q -s 0 -f $(/sbin/pidof httpd.worker | sed 's/\([0-9]*\)/-p \1/g') &
- If and when the customer reports a performance issue again, ask for the exact time period in which they experienced the slowness, and raise a new bug (if not already done) or update the existing bug.
- Even a rough timestamp is helpful in every case.
- Get the PIDs of the WLS processes from all the hosts
- Get the Java PID: ps -ef | grep <ics_server1/2/AdminServer> | grep -v grep | awk '{print $2}'
- In case of high CPU or high memory consumption, collect the output of the following Linux commands (at 5/10/15-second intervals for 60 seconds) from all WLS server hosts:
- top
- free -m & vmstat <5/10/15>
- When the application slowness is reproduced, collect thread dumps on all the WLS servers at 5/10/15 seconds interval for 60 seconds (subjective - per case basis).
- In case of consistently high CPU consumed just by the WLS Java process, use the specific Java PID and execute the thread-level top command (top -H -p <pid>) at the exact time the thread dump is captured.
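To map a hot thread from top back to a thread in the dump: top prints thread ids in decimal, while thread dumps print them as hex (nid=0x...). A sketch, with the thread id and dump file path as placeholders.

```shell
# Convert the busy LWP id reported by `top -H -p <pid>` to the
# hex nid used in the thread dump, then find that thread's stack.
TID=12345                       # decimal thread id from top -H
NID=$(printf '0x%x' "$TID")     # 12345 -> 0x3039
grep -A 5 "nid=$NID" /tmp/threaddump.txt
```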
- In case of continuous memory/heap pressure, capture a heap dump.
- As an immediate step, please collect the JFR files present in sub directories of “/tmp/”.
- Each sub-directory of interest is named <date_timestamp>, and the JFR files inside are named in the same fashion, e.g. /tmp/2018_04_23_06_52_46_102052/2018_04_26_16_02_52_102052_0.jfr
- However, do not go by the date in the sub-directory name; it is the service restart date/time. Check for the JFR files actually created during the time frame of the performance issue and upload those to the bug. There may be multiple such JFR files; upload all of them.
- In case of OOMs, upload the generated heap dump and the GC logs from the <DOMAIN HOME> to the bug.
- Upload all the logs (NOT just the latest ones) from the as1, ms1, and ms2 WLS servers to the bug.
- In case of slowness in UI navigation flows, capture HAR per page from browser while navigating through the flow.
- You can always refer to the steps for capturing thread dumps, JFRs, heap dumps, etc. at this link: https://confluenc.ddddcorp.com/confluence/display/dddd/OPC+Diagnostic+Process
- Sample GC log commands from the command prompt:
- Command to collect GC details at runtime:
- /u01/jdk/bin/jstat -gc <PID> | tail -1 | awk '{printf("Used: %.1f GB, Allocated: %.1f GB \n", ($3+$4+$6+$8)/(1024*1024), ($1+$2+$5+$7)/(1024*1024)) }'
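The one-liner above can be looped to watch heap usage over time. The PID and the /u01/jdk path are placeholders; the arithmetic follows the jstat -gc column layout, where fields $3+$4+$6+$8 are the used survivor/eden/old space in KB and $1+$2+$5+$7 the allocated capacities.

```shell
# Print used vs. allocated heap in GB every 5 seconds.
PID=$1
while true; do
    /u01/jdk/bin/jstat -gc "$PID" | tail -1 \
      | awk '{printf("Used: %.1f GB, Allocated: %.1f GB\n",
                     ($3+$4+$6+$8)/(1024*1024), ($1+$2+$5+$7)/(1024*1024))}'
    sleep 5
done
```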
- [oddd@wdcdevqsic-wls-1 sessions]$ find . -mindepth 1 -maxdepth 1 -type d -mmin +15 | wc -l
- 0
- awk '{print $9}' access.log | sort| uniq -c | sort -r
- 96639 "200"
- 90 "503"
- grep 'Full GC' /u01/data/domains/*/GC*.log* | awk -F':' '{print $2" "$3" "$4" "$5" "$6;}'|sort -k1,2 -r |head -10
Tool Description:
- Heap dumps for any OOM/memory leak/memory consumption/GC issues, using the command-line tools jcmd/jmap
- HAR Viewer
- TCP DUMP
- THREAD DUMPS
- JFRs for all WLS/JVM service performance issues
- HAR for slowness in UI navigation flows
- CORE DUMP
- Core File Analysis
- Generic steps to follow in cases of performance issues like slowness, stuck threads, OOMs, consistently high CPU or memory utilization, etc.
