Build a serious WordPress server for a high-traffic website with nginx

November 10, 2011
Filed under: design, linux, server 

Deploy multiple nginx instances on one server for load balancing

When running WordPress for a high-traffic website, we reconfigure and restart the web server frequently for better SEO and a better user experience, but we don't want to interrupt visitors.

As we become more familiar with SEO, the link and site structure may change, and the nginx configuration has to be updated, which requires restarting nginx. When a new nginx version is released to fix a dangerous security issue, we also need to upgrade nginx.

So we need at least two nginx instances to balance the load, plus another nginx instance acting as the proxy server. Let's run all three instances on a single server from the same nginx installation (or use two installations when nginx needs upgrading). Two of the instances serve WordPress from the same WordPress installation and the same database.

The architecture and network layout:

 

[Figure: nginx load balance architecture on a single machine]

The following steps describe how to do it on Debian.

1. Deploy a single WordPress server on nginx

Make sure your system is up to date and the PHP prerequisites are installed. Then download and build nginx from source:

cd /opt

wget http://nginx.org/download/nginx-1.0.9.tar.gz

tar -zxvf nginx-1.0.9.tar.gz
cd /opt/nginx-1.0.9/
./configure --prefix=/opt/nginx --user=nginx --group=nginx --with-http_ssl_module --with-ipv6

make
make install

Create an init script to manage nginx:

wget -O init-deb.sh http://library.linode.com/assets/682-init-deb.sh
mv init-deb.sh /etc/init.d/nginx
chmod +x /etc/init.d/nginx
/usr/sbin/update-rc.d -f nginx defaults
/etc/init.d/nginx start

2. Duplicate nginx configurations

cd /opt/nginx  # change to the directory where nginx is installed

cp conf/nginx.conf  conf/nginx.81.conf

cp conf/nginx.conf  conf/nginx.82.conf

mv conf/nginx.conf conf/nginx.backup.conf

Then change the listen port from 80 to 81 in nginx.81.conf and to 82 in nginx.82.conf. Also remember to uncomment the pid line and give each instance its own pid file, so that each instance can be stopped cleanly:

pid        logs/nginx.81.pid;  ### in file nginx.81.conf

pid        logs/nginx.82.pid;  ### in file nginx.82.conf
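The copy-and-edit step above can also be scripted. The following is only a sketch: the function name make_instance_conf is made up for illustration, and it assumes the stock nginx.conf contains a "listen 80;" line and a commented "#pid logs/nginx.pid;" line, which may not hold for a customized file.

```shell
#!/bin/sh
# Sketch: derive a per-port config from a stock nginx.conf.
# Assumption: the source file has "listen 80;" and a (possibly commented) pid line.
make_instance_conf() {
    src=$1    # source config, e.g. conf/nginx.backup.conf
    port=$2   # backend port, e.g. 81
    dst=$3    # output file, e.g. conf/nginx.81.conf
    # First expression: rewrite the listen port, keeping the indentation.
    # Second expression: uncomment the pid line and point it at a per-port pid file.
    sed -e "s/^\([[:space:]]*\)listen[[:space:]]*80;/\1listen       ${port};/" \
        -e "s|^#*pid[[:space:]].*|pid        logs/nginx.${port}.pid;|" \
        "$src" > "$dst"
}
```

From /opt/nginx this would be called as make_instance_conf conf/nginx.backup.conf 81 conf/nginx.81.conf, and again with 82. It is still worth opening the generated files to check the result.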

3. Configure and deploy the proxy nginx server

Edit conf/nginx.conf (for example with vi conf/nginx.conf) and add the following content inside the http block. The proxy listens on port 80 and balances requests across the two nginx instances on ports 81 and 82.

#deploy multiple nginx instances for load balancing on one server

upstream main {
    server 106.187.45.82:81;
    server 106.187.45.82:82;
}

server {
    listen 106.187.45.82:80;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://main;
    }
}

4. Stop the old server and start the new instances with load balancing support

sbin/nginx -s stop, or kill -QUIT pid

Then start the new instances:

cd /opt/nginx
sbin/nginx -c conf/nginx.81.conf
sbin/nginx -c conf/nginx.82.conf
sbin/nginx -c conf/nginx.conf

To stop the instances, run one or all of these commands:

sbin/nginx -s stop -c conf/nginx.81.conf
sbin/nginx -s stop -c conf/nginx.82.conf
sbin/nginx -s stop -c conf/nginx.conf

You can also stop an instance with this command: kill -QUIT pid-of-the-nginx
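With two backends behind the proxy, one instance can be restarted while the other keeps serving. The following is only a sketch of that idea, meant to be run from /opt/nginx: restart_instance, NGINX_BIN and PID_DIR are names made up for illustration, and it assumes the per-port config and pid file names used above. It uses nginx's -t option to test the configuration and -s quit for a graceful shutdown.

```shell
#!/bin/sh
# Sketch: restart one backend instance while the other keeps serving.
# NGINX_BIN and PID_DIR are illustrative variables, not part of the setup above.
NGINX_BIN=${NGINX_BIN:-sbin/nginx}
PID_DIR=${PID_DIR:-logs}

restart_instance() {
    port=$1                                   # 81 or 82
    conf="conf/nginx.${port}.conf"
    pidfile="${PID_DIR}/nginx.${port}.pid"

    # Refuse to restart if the configuration does not pass nginx's own test.
    "$NGINX_BIN" -t -c "$conf" || return 1

    # If an instance is running, ask it to shut down gracefully.
    if [ -f "$pidfile" ]; then
        "$NGINX_BIN" -s quit -c "$conf"
        # Wait up to about 10 seconds for the pid file to disappear.
        i=0
        while [ -f "$pidfile" ] && [ "$i" -lt 10 ]; do
            sleep 1
            i=$((i + 1))
        done
    fi

    # Start the instance again with the same configuration.
    "$NGINX_BIN" -c "$conf"
}
```

Calling restart_instance 81 and, once it is back, restart_instance 82 restarts both backends one after the other. While one is down, the proxy should pass requests to the other, as described in step 6.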

5. Debug and maintain one of the nginx instances

When you debug one of the nginx nodes and visit http://106.187.45.82:81, WordPress automatically redirects you to the default WordPress site URL (for example http://www.beyondlinux.com/), so you can no longer tell which instance you are visiting.

Why does WordPress redirect URLs automatically? The home page of a WordPress site can typically be reached through several different URLs:

http://beyondlinux.com/,

http://www.beyondlinux.com/,

http://www.beyondlinux.com:81/

Allowing all of these URLs to serve a single page can hurt the website's overall search engine optimization (SEO), because search engines may index duplicate copies. WordPress addresses this with automatic redirects known as Canonical URL Redirection, which leaves only one URL per page.

While debugging and testing new functions, you don't want this redirection. To disable it, add the following line to your theme's functions.php file:

remove_filter('template_redirect','redirect_canonical');

If you prefer not to edit the file, install and activate the plugin "Permalink Fix & Disable Canonical Redirects Pack", which disables the redirection.

Once debugging and testing are finished, deactivate the plugin (or remove the line) to re-enable the redirection for better SEO.

6. Redirect requests to the next server on error or timeout

In load balancing mode, nginx by default passes a request to the next server when an error or timeout occurs on the current one. To handle more failure cases, use the proxy_next_upstream and fastcgi_next_upstream directives.

syntax: proxy_next_upstream [error|timeout|invalid_header|http_500|http_502|http_503|http_504|http_404|off];

default: proxy_next_upstream error timeout;

context: http, server, location

The directive determines in which cases the request is passed to the next server. Here is an example configuration:

proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504 http_404;  

fastcgi_next_upstream error timeout invalid_header http_500 http_503 http_404;


Why does load balancing not work when a Hessian C# client calls a Hessian service?

November 9, 2011
Filed under: design, java, troubleshooting

While we were migrating our application from C# to Java, the Java service moved ahead of the client application, which is still in C#. The service is exported through Hessian, so a C# Hessian client calls the Java Hessian service.

After the new application was deployed, we ran into a serious load balancing problem: requests from the C# Hessian client were never balanced. The service is invoked through an F5 load balancer.

After digging into the C# Hessian code, I found the cause: the C# Hessian client uses HttpWebRequest with default properties to call the Java Hessian service, and the default value of HttpWebRequest's KeepAlive property is true. Once the C# client has connected through the load balancer, it keeps reusing the same connection, so its requests always reach the same back-end service and are never routed to the others.

The solution is to change the default KeepAlive property in the file CHessianMethodCaller.cs:

HttpWebRequest req = webRequest as HttpWebRequest;

req.KeepAlive = false;  // newly added line to assure load balance work

 


QCon Hangzhou 2011 slides in English

October 24, 2011
Filed under: design, tech watch

I got the English slides from QCon Hangzhou 2011 via weibo.com and am sharing them here, in the hope that they help English readers around the world.

Building Innovative Data Products-JohnWang

Driving Agile transformation by encouraging right behaviors(Evelyn)

Enabling a Real Time Enterprise Through Event Driven Architecture(Richard)

Everything I ever learned about JVM performance tuning @twitter(Attila Szegedi)

Integrating The Clouds-Pattern for Success(Richard)

JVM Customization @taobao

Java EE 7 Platform:Reaching the Cloud(Tyler Jewell)

Oracle Public Cloud(Tyler Jewell)

Realtime Analysis:for Big data-Lessons Learned from Facebook(Uri)

Schooner-Scale Smart(John R. Busch)

eBay Architecture(Tony Ng)

eBay M2M Email Analysis on Hadoop(Forest Su, Daniel Zhang)


Notable Hacker News posts I read (2011/06)

July 1, 2011
Filed under: design, tech watch

Posts for startups:

Where to find a co-founder

Why startup needs blog

Posts for tech:

Ultra KSM: an improvement upon Linux's memory merging support enabling transparent full system scan at ultra speed

Hacking the system: How to land meetings with anyone you want

understanding v8

Speed up your eclipse as a super fast IDE

write your first mapreduce program in 20 minutes

good machine learning blogs

Reflections and advice on life as a mid-stage Ph.D. student

What are your favorite Vim tricks?

http://research.microsoft.com/en-us/people/simonpj/

http://projecteuler.net/ (Programming Challenge)

Posts for learning, attitude and lifestyle:

Things ics students should do before graduating

A new billionaires 10 rules for success

Harvard Classics (Bookshelf)


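Two important performance differences between programming languages

June 5, 2011
Filed under: c++, design, java

Besides the efficiency gap between compiled and interpreted execution, programming languages differ in performance in two other important ways: the cost of method calls, and the efficiency of object creation and reclamation.

A while ago I wrote a small program in C++ (using VC 2010) to search local files. After extracting a piece of code into a method (with parameters passed by reference to avoid the cost of copying objects), I found that performance dropped very noticeably. This shows that the cost of method calls in C++ (more precisely, in VC++) is quite high.

When we use Java for heavy computation and I/O, large numbers of objects are created and destroyed, which often causes frequent Full GCs. During a Full GC the system's service almost comes to a halt, which hurts performance badly. Java's memory reclamation is therefore very inefficient. On the other hand, in Java the depth of the method call hierarchy has almost no effect on execution performance, which shows that Java's method calls are very fast.

The Perl community once tried to port Lucene to pure Perl and then to improve the performance of the Perl version through repeated profiling and tuning. In the end the effort did not succeed, because method calls in Perl are too expensive and the frequent creation and destruction of objects led to poor performance. Reference:
http://wiki.apache.org/lucy/MinimizingObjectOverhead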


Batch downloading files with wget explained: how to download all slides from QCon, Tcon 2011 and OSCON 2009/2010

April 16, 2011
Filed under: design

1. Download all PDF files from QCon Beijing 2011

wget `curl -s http://www.qconbeijing.com/schedule.html | sed 's/<\/a>/\n/g' | sed 's/.*href="\([^"]*\)".*$/\1/' | grep download | sed 's/download/http:\/\/www.qconbeijing.com\/download/g'`

How the command works:
curl downloads schedule.html and writes its content to stdout.
The first sed replaces each closing link tag </a> with a newline, so that each line contains at most one link.
The second sed extracts the content between the quotes of href="" and prints it.
grep download keeps only the links of the form download/xxxx.pdf.
The last sed replaces download with the full URL prefix. For example, the link download/panxiaoliang.pdf in the page becomes http://www.qconbeijing.com/download/panxiaoliang.pdf

For example, schedule.html contains a line like the following (link text translated), where the address in the second href is the one to extract and prefix with the base URL:
<td><p align="center"><a href="ShowNews.aspx?id=35">Building a high-performance microblog system: Sina Weibo architecture revisited</a><a target="_blank" href="download/yangweihua.pdf">(slides download)</a><a href="ShowNews.aspx?id=37"></a><br />
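To see each stage at work without touching the live site, the same pipeline can be run on a single sample line. This is a sketch: extract_links is a made-up helper name, the link text in the sample is shortened, and GNU sed is assumed because it accepts \n in the replacement text.

```shell
#!/bin/sh
# Sketch: run the extraction pipeline on one sample line instead of the live page.
# Assumes GNU sed, which accepts \n in the replacement text.
extract_links() {
    sed 's/<\/a>/\n/g' \
        | sed 's/.*href="\([^"]*\)".*$/\1/' \
        | grep download \
        | sed 's/download/http:\/\/www.qconbeijing.com\/download/g'
}

# Sample line modeled on schedule.html, with shortened link text.
sample='<td><p align="center"><a href="ShowNews.aspx?id=35">Talk title</a><a target="_blank" href="download/yangweihua.pdf">(slides)</a><a href="ShowNews.aspx?id=37"></a><br />'

printf '%s\n' "$sample" | extract_links
# prints: http://www.qconbeijing.com/download/yangweihua.pdf
```

Only the second link survives, because the two ShowNews.aspx links are dropped by grep download.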

2. Download all slides from QCon San Francisco 2008-2011

wget `curl http://qconsf.com/sf2008/schedule/wednesday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2008#http://qconsf.com/sf2008#'`

wget `curl http://qconsf.com/sf2008/schedule/thursday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2008#http://qconsf.com/sf2008#'`

wget `curl http://qconsf.com/sf2008/schedule/friday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2008#http://qconsf.com/sf2008#'`

Rename the downloaded files to the name that follows %2F in the file name:
ls | awk -F%2F '{print "mv " "\""$0"\"", "\""$5"\""}' > ../a.sh
source ../a.sh

wget -c `curl http://qconsf.com/sf2009/schedule/wednesday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2009#http://qconsf.com/sf2009#'`

wget -c `curl http://qconsf.com/sf2009/schedule/thursday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2009#http://qconsf.com/sf2009#'`

wget -c `curl http://qconsf.com/sf2009/schedule/friday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2009#http://qconsf.com/sf2009#'`

Rename the downloaded files to the name that follows %2F in the file name:
ls | awk -F%2F '{print "mv " "\""$0"\"", "\""$4"\""}' > ../a.sh
source ../a.sh

wget -c `curl http://qconsf.com/sf2010/schedule/wednesday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2010#http://qconsf.com/sf2010#'`

wget -c `curl http://qconsf.com/sf2010/schedule/thursday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2010#http://qconsf.com/sf2010#'`

wget -c `curl http://qconsf.com/sf2010/schedule/friday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/sf2010#http://qconsf.com/sf2010#'`

Rename the downloaded files to the name that follows %2F in the file name:
ls | awk -F%2F '{print "mv " "\""$0"\"", "\""$4"\""}' > ../a.sh
source ../a.sh
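The awk commands above depend on a fixed field number ($5 or $4), which changes with the number of %2F separators in the file name. As an alternative sketch, not the method used above, shell parameter expansion can keep whatever follows the last %2F. The function name rename_after_last_2f is made up for illustration.

```shell
#!/bin/sh
# Sketch: rename every file in a directory to the part of its name after the last "%2F".
rename_after_last_2f() {
    dir=$1
    for f in "$dir"/*%2F*; do
        [ -e "$f" ] || continue           # no matching files: the glob stays literal
        name=${f##*/}                     # strip the directory part
        mv -- "$f" "$dir/${name##*%2F}"   # keep only what follows the last %2F
    done
}
```

Running rename_after_last_2f . in the download directory would rename a file such as a%2Fb%2Ftalk.pdf to talk.pdf. Files whose names end up identical would overwrite each other, so check the listing first.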

3. Download slides from QCon London 2010-2011

wget `curl http://qconlondon.com/london-2010/schedule/wednesday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/london-2010#http://qconlondon.com/london-2010#'`

wget `curl http://qconlondon.com/london-2011/schedule/wednesday.jsp -s | grep pdf | sed 's"<a href=""' | sed 's#"##' | sed 's#">##' | sed 's#/london-2011#http://qconlondon.com/london-2011#'`

The download commands for the thursday and friday pages of 2010 and 2011 are not listed above. Simply replace wednesday in the commands above with thursday or friday, for example:
http://qconlondon.com/london-2011/schedule/thursday.jsp
http://qconlondon.com/london-2011/schedule/friday.jsp

Rename the downloaded files to the name that follows %2F in the file name:

ls | awk -F%2F '{print "mv " $0, $4}' > ../a.sh
source ../a.sh

Update 2011/07:

4. Download all slides from Taobao Carnival 2011 (Tcon 2011)

wget http://developerclub.taobao.com/schedule/ -O tcon2011.txt
wget `grep ppts tcon2011.txt | sed 's/.*href="\([^"]*\)".*$/\1/' | sed 's#/ppts#http://developerclub.taobao.com/ppts#g'`

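The first wget saves the content of the schedule page to tcon2011.txt.

grep ppts finds every line that contains a download link and writes it to standard output.

The first sed extracts the (relative) link address from each href, for example /ppts/魏子均More_Weapons_More_Power.pdf.

The second sed turns that relative path into an absolute URL, for example:

http://developerclub.taobao.com/ppts/魏子均More_Weapons_More_Power.pdf

The outer wget, the one around the backquotes, then downloads all of these links.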

5. Download all slides from OSCON 2009/2010

Download the OSCON 2009 slides:

wget http://www.oscon.com/oscon2009/public/schedule/proceedings  -O oscon2009.txt
grep "Presentation File:" -A 6  oscon2009.txt | grep "a href" | sed 's/.*href="\([^"]*\)".*$/wget "\1"/' > download-oscon2009.sh
source download-oscon2009.sh

The first grep finds each line that contains Presentation File: together with the 6 lines after it, and writes them to standard output.

The second grep keeps only the lines that contain an href link.

The sed command produces one wget command per link, for example:

wget "http://assets.en.oreilly.com/1/event/27/Django in the Real World Presentation.pdf"

The output is written to download-oscon2009.sh, which is then executed.
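The grep/grep/sed chain can be tried on a small sample before running it against the full proceedings page. This is a sketch: the function name make_wget_commands and the HTML markup in the sample (including the /detail/1234 path) are made up for illustration, and only the PDF URL is taken from the example above. The -A option is assumed to be available, as it is in GNU and BSD grep.

```shell
#!/bin/sh
# Sketch: turn proceedings HTML read from stdin into one wget command per presentation file.
make_wget_commands() {
    grep -A 6 "Presentation File:" \
        | grep "a href" \
        | sed 's/.*href="\([^"]*\)".*$/wget "\1"/'
}

# Hypothetical sample markup; the real page layout may differ.
make_wget_commands <<'EOF'
<a href="/oscon2009/public/schedule/detail/1234">Session page</a>
<dt>Presentation File:</dt>
<dd>
<a href="http://assets.en.oreilly.com/1/event/27/Django in the Real World Presentation.pdf">slides</a>
</dd>
EOF
# prints: wget "http://assets.en.oreilly.com/1/event/27/Django in the Real World Presentation.pdf"
```

The session page link is ignored because it sits before the Presentation File: line and so falls outside the window selected by grep -A 6. Quoting the URL in the generated command matters here because the file name contains spaces.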

Download the OSCON 2010 slides:

wget http://www.oscon.com/oscon2010/public/schedule/proceedings -O oscon2010.txt
grep "Presentation:" -A 6 oscon2010.txt | grep "a href" | sed 's/.*href="\([^"]*\)".*$/wget "\1"/' > download-oscon2010.sh
source download-oscon2010.sh

The options work the same way as in the 2009 commands above.

