Basics Of CS nodes


A Riak CS node can run out of memory. Most of the time this is non-actionable: the node crashes, restarts automatically, and operation usually returns to normal without intervention. If a node consistently runs out of memory, it likely needs either more memory (usually provided by creating a new node with more memory) or more swap space (which can affect performance).

:~$ free -mh
             total       used       free     shared    buffers     cached
Mem:          7.5G       7.2G       311M       104M        97M       6.9G
-/+ buffers/cache:       236M       7.3G
Swap:           0B         0B         0B
ssm-user:~$ sudo su -
~# sync; echo 3 > /proc/sys/vm/drop_caches
~# free -mh
             total       used       free     shared    buffers     cached
Mem:          7.5G       337M       7.2G       104M       2.1M       139M
-/+ buffers/cache:       195M       7.3G
Swap:           0B         0B         0B
root:/var/log/riak# ps -e -o pid,rss | sort -nk2 -r | head -5
 2615 196596
  335 69504
  298 18424
  303 15972

# ps -ef | grep 2615
riak      2615  2613 18 22:08 pts/1    00:06:34 /usr/lib/riak/erts-5.10.3/bin/beam.smp -scl false -sfwi 500 -P 256000 -e 256000 -Q 262144 -A 64 -K true -W w -zdbbl 32768 -- -root /usr/lib/riak -progname riak -- -home /var/lib/riak -- -boot /usr/lib/riak/releases/2.2.3/riak -config /var/lib/riak/generated.configs/app.2020.03.27.22.08.16.config -setcookie riak -name riak@ip-172-26-82-142.ap-southeast-2.compute.internal -smp enable -vm_args /var/lib/riak/generated.configs/vm.2020.03.27.22.08.16.args -pa /usr/lib/riak/lib/basho-patches -- console
riak      2854  2615  0 22:08 ?        00:00:00 sh -s disksup
riak      2952  2615  0 22:08 ?        00:00:00 inet_gethost 4
root      5850 30679  0 22:43 pts/0    00:00:00 grep 2615
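Sometimes most of the memory is held by the page cache, as in the first free output above (6.9G cached out of 7.5G total). Clear the cache with the sync and drop_caches command shown above before investigating further.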
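To see how much memory is cache and how much is held by processes before dropping anything, the same arithmetic that free prints on its "-/+ buffers/cache" line can be done from /proc/meminfo. The following is a minimal sketch, not part of the original procedure; the function name mem_summary is made up for illustration, and it assumes a Linux /proc/meminfo with MemTotal, MemFree, Buffers and Cached fields in kB.

```shell
# Hypothetical helper: summarize memory from a /proc/meminfo-style file.
# "apps" mirrors the used value on free's "-/+ buffers/cache" line:
# total - free - buffers - cached.
mem_summary() {
  awk '
    /^MemTotal:/ { t = $2 }
    /^MemFree:/  { f = $2 }
    /^Buffers:/  { b = $2 }
    /^Cached:/   { c = $2 }   # anchored, so SwapCached is not counted
    END {
      printf "total=%dMiB free=%dMiB cache=%dMiB apps=%dMiB\n",
             t / 1024, f / 1024, (b + c) / 1024, (t - f - b - c) / 1024
    }
  ' "${1:-/proc/meminfo}"
}

# Report on the live system when /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
  mem_summary
fi
```

If the apps figure is small while cache is large, as in the transcript above, the memory pressure is mostly page cache rather than a single runaway process.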
After the cache is cleared, memory use on the node may climb again, and the stats may again show most of it held by cache. The top command might attribute only a small fraction of memory, such as 2%, to any one process. To find the processes using the most memory, sort the ps output by RSS, as in the ps -e -o pid,rss | sort -nk2 -r | head -5 command shown above.
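A slightly friendlier variant of the ps listing adds the command name and converts RSS to MiB, so the PID does not have to be looked up separately. This is a sketch under assumptions: top_rss is a hypothetical helper name, and it relies on ps accepting the rss and comm output fields, as the procps ps on Linux does.

```shell
# Hypothetical helper: read "PID RSS_KiB COMMAND" lines on stdin and
# print the N largest by RSS (default 5), with RSS converted to MiB.
top_rss() {
  sort -k2,2nr | head -n "${1:-5}" |
    awk '{ printf "%s %.1f MiB %s\n", $1, $2 / 1024, $3 }'
}

# An empty header ("pid=") suppresses the header line, so only data
# rows reach the sort.
if command -v ps >/dev/null 2>&1; then
  ps -e -o pid= -o rss= -o comm= | top_rss 5
fi
```

Applied to the listing above, PID 2615 with an RSS of 196596 would print as 192.0 MiB.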
RSS (Resident Set Size) is the part of a process's memory that currently occupies physical RAM. VSZ (Virtual Memory Size) is the address space allocated to the process: the addresses exist in the process's memory map, but there is not necessarily any physical memory behind all of them at a given moment. On a virtual machine, what the guest sees as physical memory may not be actual physical memory on the host. ps reports RSS in kilobytes, so the 196596 shown for PID 2615 above is about 192 MB. The part of a process's memory that is not resident may be in swap, not yet loaded from its backing file, or allocated but never touched, so VSZ is normally larger than RSS.
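The gap between the two figures can be checked for a single process by reading the VmSize and VmRSS fields from its /proc status file. This is an illustrative sketch only; rss_vs_vsz is a hypothetical name, and it assumes a Linux /proc/PID/status file that reports both fields in kB.

```shell
# Hypothetical helper: print virtual size and resident size, in MiB,
# from a /proc/<pid>/status-style file given as the first argument.
rss_vs_vsz() {
  awk '
    /^VmSize:/ { v = $2 }   # address space allocated (VSZ)
    /^VmRSS:/  { r = $2 }   # memory currently resident in RAM (RSS)
    END { printf "vsz=%.1fMiB rss=%.1fMiB\n", v / 1024, r / 1024 }
  ' "$1"
}

# Example: inspect the current shell; substitute a PID such as the
# beam.smp process to inspect Riak instead.
if [ -r "/proc/$$/status" ]; then
  rss_vs_vsz "/proc/$$/status"
fi
```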
If you still see a "CRASH REPORT" entry in console.log after taking the steps above, remember that hashtrees are used for anti-entropy exchanges. If a file in a hashtree is corrupt, it is usually best to delete the files for that tree and let them rebuild. Stop the riak process on that node, delete the contents of the anti_entropy directory, and start the riak process again. The node rebuilds the hashtree data by itself.
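The stop, clear, start sequence could be scripted roughly as follows. This is a sketch under stated assumptions, not a verified procedure: the path /var/lib/riak/anti_entropy is assumed from the -home /var/lib/riak setting in the process listing above and should be checked against the node's configured data directory, and the helper names clear_dir_contents and rebuild_hashtrees are hypothetical.

```shell
# Assumption: anti_entropy lives under the Riak data directory.
# Override AE_DIR if the node uses a different data path.
AE_DIR="${AE_DIR:-/var/lib/riak/anti_entropy}"

# Hypothetical helper: delete everything inside a directory but keep
# the directory itself. Refuses an empty path, "/", or a missing dir.
clear_dir_contents() {
  dir="$1"
  if [ -z "$dir" ] || [ "$dir" = "/" ] || [ ! -d "$dir" ]; then
    echo "refusing to clear '$dir'" >&2
    return 1
  fi
  find "$dir" -mindepth 1 -delete
}

# Hypothetical wrapper: defined here but not invoked. Run it by hand
# on the affected node only. Each step runs only if the previous one
# succeeded.
rebuild_hashtrees() {
  riak stop && clear_dir_contents "$AE_DIR" && riak start
}
```

Keeping the directory and removing only its contents preserves its ownership and permissions for the riak user when the node starts again.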