The Empire did strike back, and this time really hard, with an imposing challenge: *25 upstream timed out (110: Connection timed out) while reading response header from upstream, client: xx.yy.zz.aa, server: _, request: "POST /xyz/abc/add HTTP/1.1", upstream:

A deep root-cause analysis finally led to tuning parameters across the whole environment, from the data integration layer through nginx, Tomcat and MySQL down to Linux.

(A) Data Integration Layer

1. Combine the streaming documents into one mega-document.
2. Split the mega-document into multiple sets.
3. Send each set to the HTTP server endpoint in chunks as large as the server can handle, so that the available bandwidth is fully used.
4. Note that the HTTP header and packet size limits need to be increased on the web server layer.
5. Send all the sets in parallel (with as many threads as the processor can support).
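The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the original integration code: the function names (split_into_sets, post_set, send_all), the JSON encoding and the Content-Type header are all assumptions made for the example, and the size limit is only approximate because it ignores the brackets and separators of the enclosing list.

```python
import json
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def split_into_sets(documents, max_bytes):
    """Group documents into sets of roughly max_bytes of JSON each.

    A single document larger than max_bytes still gets a set of its own.
    """
    sets, current, current_size = [], [], 0
    for doc in documents:
        size = len(json.dumps(doc).encode("utf-8"))
        if current and current_size + size > max_bytes:
            sets.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    if current:
        sets.append(current)
    return sets


def post_set(url, doc_set, timeout=60):
    """POST one set as a JSON array (hypothetical endpoint and format)."""
    body = json.dumps(doc_set).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def send_all(doc_sets, send, workers=None):
    """Send every set in parallel; results come back in input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, doc_sets))
```

A caller would pass something like `lambda s: post_set(url, s)` as the `send` argument. Keeping the sender injectable makes it easy to try different set sizes against a stand-in function before pointing the code at a real server.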

(A) Tuning Nginx

1. The proxy_read_timeout directive tells nginx how long to wait for a response from the upstream (proxied) server before it gives up. The timeout applies between two successive read operations, not to the whole response. The error above is exactly this timeout firing, so the first step is to increase it.

2. Once the timeout was resolved, a bigger problem was waiting to hit back: the dreaded data-size issues! So the buffer sizes have to be increased as well. This is a trial-and-error game.

3. Increase the size of the incoming payload, tune the following parameters, and observe the CPU, memory, I/O and network load. Then pick the right combination of 'allowable payload size', 'allocated resources' and 'parameter values'.

$ service nginx stop

Under the http section:

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    #keepalive_timeout 0;
    keepalive_timeout 600;
    client_header_timeout 60m;
    client_body_timeout 60m;
    send_timeout 60m;
    proxy_buffer_size 2048k;
    proxy_buffers 4 2048k;
    proxy_busy_buffers_size 2048k;
    #proxy_temp_file_write_size 8M;
    proxy_read_timeout 1800;
    proxy_send_timeout 1800;
    connection_pool_size 512;
    client_header_buffer_size 2048k;
    large_client_header_buffers 4 2048k;
    client_max_body_size 100M;
    client_body_buffer_size 1024k;
    request_pool_size 1024k;
    output_buffers 4 1024k;
    postpone_output 1460;

Under section => server { 9443 }:

    ssl_session_timeout 30m;

Under section => location:

    proxy_buffer_size 2048k;
    proxy_buffers 4 2048k;
    proxy_busy_buffers_size 2048k;
    proxy_temp_file_write_size 8M;
    proxy_read_timeout 1800;
    proxy_send_timeout 1800;
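The trial-and-error loop over payload sizes can be driven by a small harness like the one below. It is only a sketch under stated assumptions: sweep_payload_sizes and its send callback are hypothetical names, the payload is dummy filler bytes, and the CPU, memory, I/O and network load still have to be watched separately on the server while it runs.

```python
import time


def sweep_payload_sizes(send, sizes_bytes):
    """Call send(payload) once per size and record timing or the failure.

    `send` is any callable that pushes the payload through nginx and
    raises an exception on failure (for example on a timeout or a 413).
    """
    results = []
    for size in sizes_bytes:
        payload = b"x" * size  # dummy filler, not real documents
        start = time.perf_counter()
        try:
            send(payload)
            error = None
        except Exception as exc:  # keep sweeping after a failed size
            error = repr(exc)
        elapsed = time.perf_counter() - start
        results.append({"size": size, "seconds": elapsed, "error": error})
    return results
```

Running the sweep before and after each configuration change gives a simple record of which payload sizes pass and how long they take, which helps when comparing parameter combinations.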


(B) Tuning MySQL (my.cnf)

As root:

$ service mysql stop

    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    user=mysql

    # INCREASE THIS VALUE
    max_allowed_packet=128M

    innodb_file_per_table = 1
    low_priority_updates=1
    concurrent_insert=ALWAYS
    thread_cache_size = 32
    query_cache_size = 32M
    query_cache_limit = 4M
    long_query_time = 10
    table_open_cache = 1000
    #join_buffer_size=1G
    log-bin=mysql-bin
    server-id=270407128
    expire_logs_days=2
    key_buffer_size = 512M
    #max_heap_table_size=1G
    preload_buffer_size=64M
    read_buffer_size=64M
    read_rnd_buffer_size=128M
    #sort_buffer_size=1024M
    log_bin_trust_function_creators = 1
    innodb_additional_mem_pool_size = 512M

    # 80% of Main Memory
    innodb_buffer_pool_size = 5G
    # Percona Server option
    innodb_buffer_pool_restore_at_startup = 60
    innodb_data_file_path = ibdata1:10M:autoextend

    # Do not set a concurrency limit (0 = no limit)
    innodb_thread_concurrency = 0
    innodb_buffer_pool_instances = 2
    innodb_flush_log_at_trx_commit = 2
    innodb_io_capacity=20000
    group_concat_max_len=8M
    innodb_table_locks=0

    # INCREASE TIME OUT
    wait_timeout = 3600
    interactive_timeout = 3600

    innodb_write_io_threads = 4
    innodb_doublewrite=0
    # Repeats the setting above; the last occurrence in the file takes effect
    innodb_buffer_pool_instances=1
    innodb_adaptive_flushing=1
    # ALL_O_DIRECT is a Percona Server value
    innodb_flush_method = ALL_O_DIRECT
    max_connections = 300


(C) Tuning Tomcat

$ kill -9 <tomcat7 pid>
$ vi startup.sh

Add the following lines after the line EXECUTABLE=catalina.sh:

    CATALINA_OPTS="$CATALINA_OPTS -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
      -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC \
      -Xloggc:/tmp/gc.log -verbose:gc -Djava.awt.headless=true \
      -Xms1024m -Xmx2048m -XX:MaxPermSize=512m \
      -Dsun.rmi.dgc.client.gcInterval=3600000 \
      -Dsun.rmi.dgc.server.gcInterval=3600000"
    JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$INCIDENTS_DIR"

$ vi server.xml

    <Connector port="9080" protocol="org.apache.coyote.http11.Http11NioProtocol" connectionTimeout="1200000" executor="tomcatThreadPool" />

Ref: http://tomcat.apache.org/tomcat-7.0-doc/jndi-datasource-examples-howto.html

(D) Tuning JDBC Pool

timeBetweenEvictionRunsMillis = "30000"
datasource.conntion.pool.MinEvictableIdleTimeMillis=30000
Ref: http://people.apache.org/~fhanik/jdbc-pool/jdbc-pool.html

(E) Tuning Linux

Configure Linux to accept larger packets over the network. Ref: http://www.cyberciti.biz/faq/linux-tcp-tuning/

“tune Linux network stack by increasing network buffers size for high-speed networks that connect server systems to handle more network packets.”

tcpdump -ni <nw_name>

Parameters to review: tcp_mem, rmem_default, rmem_max, wmem_default, wmem_max, optmem_max, tcp_window_scaling, net.core.netdev_max_backlog ..
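Before changing any of these values it helps to record what the kernel currently uses. The sketch below reads them from /proc/sys. The fully qualified names in PARAMS are an expansion of the short names in this post (net.core.* for the socket buffer settings, net.ipv4.* for the TCP ones), and sysctl_path and read_params are helper names invented for this example.

```python
from pathlib import Path

# Fully qualified sysctl names for the parameters mentioned in this post.
PARAMS = [
    "net.ipv4.tcp_mem",
    "net.core.rmem_default",
    "net.core.rmem_max",
    "net.core.wmem_default",
    "net.core.wmem_max",
    "net.core.optmem_max",
    "net.ipv4.tcp_window_scaling",
    "net.core.netdev_max_backlog",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root) / name.replace(".", "/")


def read_params(names=PARAMS, root="/proc/sys"):
    """Return {name: list of value strings}, or None if unreadable."""
    values = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().split()
        except OSError:
            values[name] = None  # not present on this kernel or platform
    return values


if __name__ == "__main__":
    for name, value in read_params().items():
        print(name, "=", value)
```

Saving this output before and after a tuning session makes it easy to see which values changed and to roll back if throughput gets worse.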

(F) Further References

Nginx Tuning >>
– http://www.cyberciti.biz/tips/linux-unix-bsd-nginx-webserver-security.html
– http://blog.lowkey.net.my/2011/07/directadmin-nginxreverse-proxy/

MySQL Tuning >>
– http://www.techrepublic.com/blog/opensource/10-mysql-variables-that-you-should-monitor/56
– http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_connect_timeout
– http://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_wait_timeout

– SELECT variable_value FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE variable_name = 'threads_connected';
– SHOW FULL PROCESSLIST;
– SELECT FORMAT(A.num * 100.0 / B.num, 2) BufferPoolFullPct FROM (SELECT variable_value num FROM information_schema.global_status WHERE variable_name = 'Innodb_buffer_pool_pages_data') A, (SELECT variable_value num FROM information_schema.global_status WHERE variable_name = 'Innodb_buffer_pool_pages_total') B;