Basics of CS Nodes
If a Riak CS node is running out of memory, most of the time this is not actionable: the node crashes, restarts automatically, and usually returns to normal operation without intervention. If a node runs out of memory consistently, you will most likely need to either allocate more memory (usually by creating a new node with more memory) or add more swap space (which can affect performance).
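If you choose the swap route, a swap file is one common way to add swap on a Linux node. The sketch below is a generic Linux example rather than anything Riak-specific: the default path /swapfile and the 2G size are hypothetical placeholders, the run and add_swap helper names are made up for illustration, and it must be run as root. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: add a swap file on a Linux node. Must be run as root.
# The default path and size below are placeholders, not recommendations.
# Set DRY_RUN=1 to print each command instead of executing it.

# Run a command, or just echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# add_swap [PATH] [SIZE]: create, format and enable a swap file.
add_swap() {
  local swapfile="${1:-/swapfile}"
  local size="${2:-2G}"
  run fallocate -l "$size" "$swapfile" || return 1  # reserve the space
  run chmod 600 "$swapfile" || return 1             # swap must not be readable by other users
  run mkswap "$swapfile" || return 1                # write the swap signature
  run swapon "$swapfile" || return 1                # enable it right away
  run free -m                                       # confirm the new Swap total
}
```

For example, DRY_RUN=1 add_swap /swapfile 4G previews the steps, and add_swap /swapfile 4G run as root applies them. The swap file only survives a reboot if it is also listed in /etc/fstab (for example with a line such as /swapfile none swap sw 0 0), and on some filesystems a file created with fallocate is not accepted by swapon, in which case dd is the usual fallback.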
Sometimes most of the memory is taken up by the page cache, so clear the cache before moving on. In the example below, free shows 7.2G used, but 6.9G of that is cache; after running sync and writing 3 to /proc/sys/vm/drop_caches, used memory drops to 337M.

    :~$ free -mh
                 total       used       free     shared    buffers     cached
    Mem:          7.5G       7.2G       311M       104M        97M       6.9G
    -/+ buffers/cache:       236M       7.3G
    Swap:           0B         0B         0B
    ssm-user:~$ sudo su -
    ~# sync; echo 3 > /proc/sys/vm/drop_caches
    ~# free -mh
                 total       used       free     shared    buffers     cached
    Mem:          7.5G       337M       7.2G       104M       2.1M       139M
    -/+ buffers/cache:       195M       7.3G
    Swap:           0B         0B         0B

After clearing the cache, memory usage may spike again, and the stats may once more show most of it as cache, while top attributes only a small fraction (for example 2%) to any process. To find the process that is using the most memory, sort processes by resident set size (RSS), then look up the largest PID with ps -ef:

    root:/var/log/riak# ps -e -o pid,rss | sort -nk2 -r | head -5
     2615 196596
      335  69504
      298  18424
      303  15972
    # ps -ef | grep 2615
    riak      2615  2613 18 22:08 pts/1    00:06:34 /usr/lib/riak/erts-5.10.3/bin/beam.smp -scl false -sfwi 500 -P 256000 -e 256000 -Q 262144 -A 64 -K true -W w -zdbbl 32768 -- -root /usr/lib/riak -progname riak -- -home /var/lib/riak -- -boot /usr/lib/riak/releases/2.2.3/riak -config /var/lib/riak/generated.configs/app.2020.03.27.22.08.16.config -setcookie riak -name riak@ip-172-26-82-142.ap-southeast-2.compute.internal -smp enable -vm_args /var/lib/riak/generated.configs/vm.2020.03.27.22.08.16.args -pa /usr/lib/riak/lib/basho-patches -- console
    riak      2854  2615  0 22:08 ?        00:00:00 sh -s disksup
    riak      2952  2615  0 22:08 ?        00:00:00 inet_gethost 4
    root      5850 30679  0 22:43 pts/0    00:00:00 grep 2612

Here PID 2615, with about 192 MiB resident, is beam.smp, the Erlang VM that runs Riak.
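The RSS listing can be made easier to read by converting KiB to MiB and printing the command name next to each PID, which saves the separate ps -ef | grep step. A minimal sketch, assuming a Linux procps ps that accepts the pid=,rss=,comm= format (the trailing = suppresses the header line); top_rss is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show the processes with the largest resident set size (RSS) in MiB.
# Expects "pid rss comm" lines on stdin, for example from:
#   ps -e -o pid=,rss=,comm=
# ps reports RSS in KiB, so the awk step divides by 1024 to get MiB.

# top_rss [N]: print the N largest processes (default 5) as "PID MiB COMMAND".
top_rss() {
  local n="${1:-5}"
  sort -k2,2nr |
    head -n "$n" |
    awk '{ printf "%s %.1f %s\n", $1, $2 / 1024, $3 }'
}
```

On the node, ps -e -o pid=,rss=,comm= | top_rss 5 lists the five largest processes; for the node above, the top entry would be PID 2615 (beam.smp) at about 192 MiB.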
RSS is the Resident Set Size: the amount of physical memory (RAM) a process currently occupies. VSZ is the Virtual Memory Size: the address space allocated in the process's memory map, which is not necessarily all backed by actual memory right now. ps reports both in kilobytes. The difference between the two covers pages that have been swapped out, file mappings that have not been read in, and allocations that have never been touched, so VSZ is usually much larger than RSS. Note that on virtual machines, which are now commonplace, memory that looks physical from the guest's point of view may not be actual physical memory on the host.
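To see both numbers for a single process, such as the beam.smp PID found earlier, Linux exposes them in /proc/<pid>/status as VmRSS (corresponding to RSS) and VmSize (corresponding to VSZ), both in kB. The sketch below reads that file format on stdin and also prints what fraction of the virtual size is resident; rss_vsz is a hypothetical helper name, and the approach is Linux-specific.

```shell
#!/usr/bin/env bash
# Sketch: compare a process's RSS and VSZ on Linux.
# Reads /proc/<pid>/status-style text on stdin. VmRSS corresponds to ps's RSS
# and VmSize to ps's VSZ; the kernel reports both in kB.

# rss_vsz: print "RSS <n> kB, VSZ <n> kB, resident <pct>%".
# Exits non-zero if either field is missing (e.g. for kernel threads).
rss_vsz() {
  awk '
    $1 == "VmRSS:"  { rss = $2 }
    $1 == "VmSize:" { vsz = $2 }
    END {
      if (rss == "" || vsz == "" || vsz == 0) {
        print "VmRSS/VmSize not found" > "/dev/stderr"
        exit 1
      }
      printf "RSS %d kB, VSZ %d kB, resident %.1f%%\n", rss, vsz, 100 * rss / vsz
    }'
}
```

For example, rss_vsz < /proc/2615/status. Kernel threads have no VmRSS or VmSize lines, so the function reports an error for them instead of printing misleading zeros.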
If you still see a "CRASH REPORT" in console.log after taking all of the steps above, and it points at a hashtree, remember that hashtrees are used for anti-entropy exchanges. If a hashtree file is corrupt, it is usually best to delete the files for that tree and let them rebuild: stop Riak on that node, delete the contents of its anti_entropy directory, and start Riak again. The node rebuilds the hashtrees by itself.
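The stop, clear, start sequence can be scripted so that the delete only runs if Riak stopped cleanly. This is a sketch under assumptions: the directory defaults to /var/lib/riak/anti_entropy, the usual packaged value of anti_entropy.data_dir, but riak.conf on your node may point elsewhere; run and rebuild_hashtrees are hypothetical helper names; and it must run as a user allowed to control Riak. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: force a node's AAE hashtrees to rebuild by deleting them while
# Riak is stopped. The default directory is an assumption (the usual
# packaged value of anti_entropy.data_dir); check riak.conf first.
# Set DRY_RUN=1 to print each command instead of executing it.

# Run a command, or just echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# rebuild_hashtrees [DIR]: stop Riak, empty DIR, start Riak.
rebuild_hashtrees() {
  local aae_dir="${1:-/var/lib/riak/anti_entropy}"
  # Guard against emptying the wrong directory by mistake.
  case "$aae_dir" in
    */anti_entropy) ;;
    *) echo "refusing to clear '$aae_dir': not an anti_entropy directory" >&2
       return 1 ;;
  esac
  run riak stop || return 1                            # never delete while Riak is running
  run find "$aae_dir" -mindepth 1 -delete || return 1  # remove contents, keep the directory
  run riak start                                       # Riak rebuilds the trees after startup
}
```

Run it on one node at a time. Once Riak is back up, riak-admin aae-status shows when each partition's hashtree was last built, which is one way to follow the rebuild.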
