Tag Archives: design
Batch downloading files with wget, explained: how to download all slides from QCon, TCon 2011, and OSCON 2009/2010
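1. Download all PDF files from QCon Beijing 2011

wget `curl -s http://www.qconbeijing.com/schedule.html | sed 's/<\/a>/\n/g' | sed 's/.*href="\([^"]*\)".*$/\1/' | grep download | sed 's/download/http:\/\/www.qconbeijing.com\/download/g'`

How the command works: curl fetches schedule.html and writes its content to stdout. The first sed replaces every closing link tag </a> with a newline, so that each line contains at most one link. The second sed extracts the value inside href="" on each line and prints it. grep download keeps only the links of the form download/xxxx.pdf. The last sed replaces download with the full path of the file, so a link such as download/panxiaoliang.pdf in the page becomes http://www.qconbeijing.com/download/panxiaoliang.pdf.

For example, schedule.html contains a line like the following, where the address in the second href is the one that must be extracted and prefixed with the base URL: <td><p align="center"><a … Continue reading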
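The four pipeline stages described above can be tried offline on a single line of HTML. This is only a sketch: the `sample` line below is hypothetical (the real markup of schedule.html is truncated in this excerpt), `extract_urls` is a helper name introduced here for illustration, and GNU sed is assumed, since `\n` in the replacement text is not portable to every sed.

```shell
#!/bin/sh
# Hypothetical input line with two links; only the second points at a PDF.
sample='<td><p align="center"><a href="speaker.html">Speaker</a> <a href="download/panxiaoliang.pdf">PDF</a></p></td>'

# Same four stages as the pipeline in the post (GNU sed assumed).
extract_urls() {
  # 1. break the line after every closing </a> so each line holds one link
  sed 's/<\/a>/\n/g' |
  # 2. keep only the value of the href attribute on each line
  sed 's/.*href="\([^"]*\)".*$/\1/' |
  # 3. keep only the relative download/... links
  grep download |
  # 4. turn the relative link into an absolute URL
  sed 's/download/http:\/\/www.qconbeijing.com\/download/g'
}

urls=$(printf '%s\n' "$sample" | extract_urls)
printf '%s\n' "$urls"
```

With GNU sed this should print the single absolute URL for panxiaoliang.pdf, which is the kind of argument list that the backtick substitution then hands to wget.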