nginx + php5-fpm: upstream timed out (110: Connection timed out) while connecting to upstream

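We have a web server running an nginx + php5-fpm + APC setup. Recently, however, we have been seeing upstream connection timeout errors and slow page rendering. A quick php5-fpm restart fixed the problem, but we can't find the cause.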

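We have another web server running Apache2 under a different subdomain. It connects to the same database and does the same work, but the slowdowns only happen on the nginx + php5-fpm server. I suspect php5-fpm or APC may be causing the problem.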

The nginx logs show various connection timeouts:

upstream timed out (110: Connection timed out) while connecting to upstream bla bla bla

The php5-fpm log doesn't show anything unusual, just children starting and exiting:

    Apr 07 22:37:27.562177 [NOTICE] [pool www] child 29122 started
    Apr 07 22:41:47.962883 [NOTICE] [pool www] child 28346 exited with code 0 after 2132.076556 seconds from start
    Apr 07 22:41:47.963408 [NOTICE] [pool www] child 29172 started
    Apr 07 22:43:57.235164 [NOTICE] [pool www] child 28372 exited with code 0 after 2129.135717 seconds from start
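Every exit above reports code 0, which suggests the children are being recycled normally rather than crashing. The hypothetical helper below pulls child lifetimes out of such lines so they can be compared over a longer period. It is a sketch that assumes the exact NOTICE wording shown above, and the function name is illustrative only.

```python
import re

# Matches the php-fpm NOTICE wording from the log above, e.g.
# "child 28346 exited with code 0 after 2132.076556 seconds from start"
EXIT_RE = re.compile(
    r"child (?P<pid>\d+) exited with code (?P<code>\d+) "
    r"after (?P<secs>[\d.]+) seconds from start"
)


def child_lifetimes(log_text):
    """Return (pid, exit_code, lifetime_seconds) for every exited child."""
    return [
        (int(m["pid"]), int(m["code"]), float(m["secs"]))
        for m in EXIT_RE.finditer(log_text)
    ]


# The four log lines from the question.
sample = "\n".join([
    "Apr 07 22:37:27.562177 [NOTICE] [pool www] child 29122 started",
    "Apr 07 22:41:47.962883 [NOTICE] [pool www] child 28346 exited with code 0 "
    "after 2132.076556 seconds from start",
    "Apr 07 22:41:47.963408 [NOTICE] [pool www] child 29172 started",
    "Apr 07 22:43:57.235164 [NOTICE] [pool www] child 28372 exited with code 0 "
    "after 2129.135717 seconds from start",
])

for pid, code, secs in child_lifetimes(sample):
    print(pid, code, f"{secs / 60:.1f} min")
```

A non-zero code, or lifetimes that suddenly become very short, would be worth correlating with the times of the nginx timeouts.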

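When the errors occur, the server is not under heavy load. The load average is only 2 (2 CPUs, 16 cores), and the php5-fpm processes seem to be working normally.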

nginx conf:

    user www-data;
    worker_processes 14;
    pid /var/run/nginx.pid;

    # set open fd limit to 30000
    worker_rlimit_nofile 30000;

    events {
        worker_connections 768;
        # multi_accept on;
    }

    http {
        ##
        # Basic Settings
        ##

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        # server_tokens off;

        # server_names_hash_bucket_size 64;
        # server_name_in_redirect off;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;
        gzip_disable "msie6";

        # gzip_vary on;
        # gzip_proxied any;
        # gzip_comp_level 6;
        # gzip_buffers 16 8k;
        # gzip_http_version 1.1;
        # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

        ##
        # Virtual Host Configs
        ##

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;
    }
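The nginx documentation notes that worker_connections counts all connections, including those to proxied servers, and not just client connections. A PHP request passed to FastCGI therefore occupies roughly two slots. The function below does that rough arithmetic for the values above. It is an illustrative upper bound, not a measurement, and the name is hypothetical.

```python
def max_concurrent_clients(worker_processes, worker_connections, proxied=True):
    """Rough ceiling on simultaneous clients nginx can keep open.

    worker_connections is a per-worker limit on *all* connections, so a
    request forwarded to an upstream (FastCGI) uses one client connection
    plus one upstream connection.
    """
    total = worker_processes * worker_connections
    return total // 2 if proxied else total


print(max_concurrent_clients(14, 768))         # PHP requests proxied to FastCGI
print(max_concurrent_clients(14, 768, False))  # static files served directly
```

Both figures are far above what a 500-child FPM pool can serve at once. This suggests nginx's own connection limit is probably not what is timing out here.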

nginx site conf (sites-enabled):

        location ~* \.php$ {
            fastcgi_split_path_info ^(.+\.php)(.*)$;
            fastcgi_pass backend;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
            fastcgi_param QUERY_STRING $query_string;
            fastcgi_param REQUEST_METHOD $request_method;
            fastcgi_param CONTENT_TYPE $content_type;
            fastcgi_param CONTENT_LENGTH $content_length;
            fastcgi_intercept_errors off;
            fastcgi_ignore_client_abort off;
            fastcgi_connect_timeout 20;
            fastcgi_send_timeout 20;
            fastcgi_read_timeout 180;
            fastcgi_buffer_size 128k;
            fastcgi_buffers 4 256k;
            fastcgi_busy_buffers_size 256k;
            fastcgi_temp_file_write_size 256k;
        }

        ## Disable viewing .htaccess & .htpassword
        location ~ /\.ht {
            deny all;
        }
    }

    upstream backend {
        server 127.0.0.1:9000;
    }
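Errno 110 is ETIMEDOUT on Linux. The error means nginx could not even establish the TCP connection to 127.0.0.1:9000 within fastcgi_connect_timeout (20 s); PHP never got as far as being too slow to respond. The hypothetical probe below lets you test the same thing independently of nginx while the problem is happening. It is a sketch that assumes the FPM listener is the TCP address from the upstream block above.

```python
import socket
import time


def probe_upstream(host="127.0.0.1", port=9000, timeout=20.0):
    """Try to open a TCP connection to the FastCGI listener.

    Returns the time taken to connect in seconds, or None if no connection
    was established within `timeout` (mirroring fastcgi_connect_timeout 20).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or socket.timeout (an OSError subclass)
        return None
```

Calling probe_upstream() in a loop and logging the results gives a timeline to compare with the nginx error log. Connect times that creep up toward the timeout while CPU load stays low would point at the FPM side not accepting connections fast enough.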

fpm conf:

    pm.max_children = 500
    pm.start_servers = 100
    pm.min_spare_servers = 50
    pm.max_spare_servers = 100
    pm.max_requests = 10000
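With pm = dynamic, php-fpm rejects pool settings that are inconsistent with each other. The checks below approximate those rules; they are my paraphrase, not php-fpm's source. The second function gives a worst-case memory figure for max_children. The 30 MB per child is a made-up illustrative value, so measure the real resident size of your children. Memory shared between children, such as the APC cache, is counted once per child in this estimate, so treat the result as an upper bound.

```python
def check_dynamic_pm(max_children, start_servers, min_spare, max_spare):
    """Return a list of problems with 'pm = dynamic' settings (empty if OK)."""
    problems = []
    if not (0 < min_spare <= max_spare):
        problems.append("need 0 < min_spare_servers <= max_spare_servers")
    if max_spare > max_children:
        problems.append("max_spare_servers exceeds max_children")
    if not (min_spare <= start_servers <= max_spare):
        problems.append("start_servers must lie between min and max spare servers")
    return problems


def worst_case_memory_gb(max_children, mb_per_child):
    """Upper bound if every child grows to mb_per_child megabytes."""
    return max_children * mb_per_child / 1024


# The pool from the question; 30 MB per child is a hypothetical figure.
print(check_dynamic_pm(500, 100, 50, 100))
print(f"{worst_case_memory_gb(500, 30):.1f} GB if all 500 children are busy")
```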

The fpm conf file also has emergency restart settings. I don't know whether they would help us fix this problem:

 emergency_restart_interval = 0

First, reduce PHP-FPM's pm.max_requests to 100. You want each PHP worker to be restarted much sooner than after 10,000 requests.
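To get a feel for what this change means in wall-clock time, the question's log can be used for some rough arithmetic. The log does not say why the children exited. The sketch below *assumes* they exited because they reached max_requests = 10000 after about 2132 seconds, and estimates how often a child would be recycled at max_requests = 100.

```python
def requests_per_second(max_requests, lifetime_s):
    """Per-child request rate, assuming the child exited on hitting max_requests."""
    return max_requests / lifetime_s


def recycle_interval(max_requests, rate):
    """Seconds between restarts of one child at a given per-child request rate."""
    return max_requests / rate


# Lifetime taken from the question's log ("exited ... after 2132.076556 seconds").
rate = requests_per_second(10000, 2132.076556)
print(f"about {rate:.2f} req/s per child")
print(f"max_requests=100 -> each child restarts every ~{recycle_interval(100, rate):.0f} s")
```

Under that assumption, each child would be replaced roughly every 20 seconds instead of every 35 minutes. This limits how long a leaking or stuck worker can stay in the pool.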

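Second, you have only one PHP (master) process running a lot of children. That's fine for development, but in production you want more PHP processes, each with fewer children. That way, if one process fails for some reason, the others can pick up the slack. So instead of your current ratio of 1:50, go for something like 10:5. This will be much more stable.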
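Both layouts in the ratio above have 50 workers in total. The difference is how much capacity disappears when a single master process dies. The illustrative arithmetic below assumes that a master's failure takes all of its children down with it.

```python
def capacity_left_after_one_failure(masters, children_per_master):
    """Fraction of PHP workers still serving if one master process dies."""
    total = masters * children_per_master
    return (total - children_per_master) / total


print(capacity_left_after_one_failure(1, 50))   # one master with 50 children
print(capacity_left_after_one_failure(10, 5))   # ten masters with 5 children each
```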

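To do this, you might want to manage your PHP processes with something like Supervisor (supervisord). We use it in production, and it has really helped increase uptime and reduce the time we spend managing and monitoring our servers. Here is an example of our configuration: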

/etc/php5/php-fpm.conf:

    [global]
    daemonize = no

    [www]
    listen = /tmp/php.socket

/etc/supervisor.d/php-fpm.conf:

    [program:php]
    user=root
    command=/usr/sbin/php-fpm -c /etc/php5/php.ini -y /etc/php5/php-fpm.conf
    numprocs=10
    ; supervisord requires %(process_num)s in process_name when numprocs > 1
    process_name=%(program_name)s_%(process_num)02d

/etc/nginx/conf/php.backend:

    upstream backend {
        server unix:/tmp/php.socket;
    }

Edit:

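As with any server setup, don't rely on guesswork to track down where the problem lies. I recommend installing Munin along with the various PHP(-FPM) and Nginx plugins. They help you collect hard statistics on requests, response times, memory usage, disk access, thread/process counts, and so on, all of which are vital when tracking down where a problem occurs.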
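php-fpm itself can also expose counters through its status page, which is enabled with pm.status_path in the pool config and a matching nginx location. The field names used below ("listen queue", "max children reached") are what I recall from php-fpm's plain-text status output, and they may differ or be missing in your version. The URL is hypothetical. This is a sketch, not a Munin plugin.

```python
from urllib.request import urlopen


def parse_fpm_status(text):
    """Parse php-fpm's plain-text status page into a dict of field -> value."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            value = value.strip()
            status[key.strip()] = int(value) if value.isdigit() else value
    return status


def pool_warnings(status):
    """Flag counters that suggest the pool cannot keep up with nginx."""
    found = []
    if status.get("listen queue", 0) > 0:
        found.append("connections are waiting in the listen queue")
    if status.get("max children reached", 0) > 0:
        found.append("the pool has hit pm.max_children")
    return found


def fetch_status(url="http://127.0.0.1/status"):  # hypothetical status URL
    with urlopen(url, timeout=5) as resp:
        return parse_fpm_status(resp.read().decode())
```

Graphing the listen queue over time is one way to check whether the nginx connect timeouts line up with connections piling up in front of FPM.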

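Also, as I mentioned in the comments below, adding server-side and client-side caching at the appropriate levels can help give users a better experience. You can use nginx's native caching support or something more specialized like varnishd. Even the most dynamic sites and applications have many static elements that can be kept in memory and served much faster. Serving these from cache reduces overall load and ensures that the parts which truly need to be dynamic have all the resources they need when they need them.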