海运的博客

高性能libcurl配合epoll的curl_multi_socket_action方法使用

发布时间:February 3, 2015 // 分类: // No Comments

libcurl对大量请求连接提供了管理socket的方法,用户可使用select/poll/epoll事件管理器监控socket事件,可读写时通知libcurl读写数据,libcurl读写完成后再通知用户程序改变监听socket状态。
两个重要的设置:
curl_multi_setopt(g.multi, CURLMOPT_SOCKETFUNCTION, sock_cb);
curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, multi_timer_cb);
1.当使用curl_multi_add_handle(g->multi, conn->easy)添加请求时会回调multi_timer_cb,然后调用curl_multi_socket_action(g->multi, CURL_SOCKET_TIMEOUT, 0, &g->still_running)初始化请求并得到一个socket(fd)。
2.调用sock_cb回调函数,传入新建的sockfd,根据传入的what状态添加到相应的事件管理器,如封装epoll的libev或libevent。
3.当事件管理器发现socket状态改变时通过curl_multi_socket_action(g->multi, fd, action, &g->still_running)通知libcurl读写数据,然后再调用sock_cb通知事件管理器,如此反复。

libcurl官网提供的基于libevlibevent事件管理示例:
http://curl.haxx.se/libcurl/c/hiperfifo.html
http://curl.haxx.se/libcurl/c/evhiperfifo.html

CentOS编译安装libcurl/curl添加异步DNS解析c-ares

发布时间:January 30, 2015 // 分类: // No Comments

在使用curl异步并发请求时如果有大量域名解析会长时间阻塞程序IO,可以编译升级libcurl以支持异步DNS解析。
Centos7自带libcurl已支持异步DNS支持,不过是--enable-threaded-resolver,可以使用curl-config --configure查看curl编译参数。
查看Libcurl是否已支持异步DNS解析,包含AsynchDNS为支持:

1
2
3
4
curl --version
curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.40.0 OpenSSL/1.0.1e zlib/1.2.3
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile NTLM SSL libz

首先安装异步DNS解析库c-ares:

1
yum install c-ares-devel openssl-devel

编译libcurl库:

1
2
3
4
5
wget http://curl.haxx.se/download/curl-7.40.0.tar.gz
tar zxvf curl-7.40.0.tar.gz
cd curl-7.40.0/
./configure --enable-ares --prefix=/usr/local/curl --with-ssl
make && make install

查看编译安装的curl信息,已经支持了异步DNS解析库c-ares:

1
2
3
4
/usr/local/curl/bin/curl --version
curl 7.40.0 (x86_64-unknown-linux-gnu) libcurl/7.40.0 OpenSSL/1.0.1e zlib/1.2.7 c-ares/1.10.0
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IPv6 Largefile NTLM NTLM_WB SSL libz UnixSockets

将libcurl动态库添加到动态链接库:

1
2
echo '/usr/local/curl/lib' > /etc/ld.so.conf.d/libcurl.conf
ldconfig

使用libcurl库看是否支持c-ares:

1
2
3
4
5
6
7
8
9
10
11
#include <curl/curl.h>
int main()
{
 curl_version_info_data*info=curl_version_info(CURLVERSION_NOW);
 if (info->features&CURL_VERSION_ASYNCHDNS) {
   printf( "ares enabled\n");
 } else {
   printf( "ares NOT enabled\n");
 }
 return 0;
}

又一PHP libcurl封装异步并发HTTP客户端

发布时间:January 27, 2015 // 分类:PHP // No Comments

PHP标准库内置curl扩展,不过实现不完整,如multi_socket_action接口,无意中发现pecl http库同样基于libcurl封装,支持更多的libcurl特性,更新也比较快,底层通过libevent(epoll)实现multi_socket_action接口,不过pecl http版本1和版本2 api完全不兼容,使用过程中稳定性及性能并不如PHP内置的curl,好像还有内存泄露,以下为示例代码,基于pecl_http 2.20:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
<?php
   function push($client, $url) {
      $req = new http\Client\Request("GET", $url, ["User-Agent"=>"My Client/0.1"]);
      $req->setOptions(array('connecttimeout'=>1, 'timeout'=>1));
      $client->enqueue($req, function($response) use ($client, $req, $url) {
         printf("%s returned '%s' (%d)\n", $response->getTransferInfo("effective_url"), $response->getInfo(), $response->getResponseCode());
         echo $client->count().PHP_EOL;
         global $urls;
         if ($urls) {
            while ($client->count() < 20) {
               $url = array_shift($urls);
               push($client, $url);
            }
            return true; // dequeue
         }
      });
   }
 
   $client = new http\Client;
   $client->enablePipelining(true);
   $client->enableEvents(true);
 
   for ($i = 0; $i < 10000; ++$i) {
      $urls[] = "http://192.168.1.3/";
   }
   for ($i = 0; $i < 20; ++$i) {
      $url = array_shift($urls);
      push($client, $url);
   }
   /*
   try{
      var_dump($client->send());
   }
   catch(http\Exception\RuntimeException  $e)
   {
      echo 'Message: ' .$e->getMessage().PHP_EOL;
   }
   */
 
   while ($client->once()) {
      $client->wait();
   }

Python CURL异步并发HTTP客户端

发布时间:January 7, 2015 // 分类:Python // No Comments

Select模式,类似于php multi curl异步并发,连接数不能太多:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
#!/usr/bin/python
# -*- coding: utf-8 -*-
 
import sys
import pycurl
import cStringIO
 
#最大连接数
num_conn = 20
 
queue = []
urls = ['https://www.haiyun.me/'] * 10000
for url in urls:
  queue.append(url)
 
num_urls = len(queue)
num_conn = min(num_conn, num_urls)
print ('----- Getting', num_urls, 'Max conn', num_conn,
       'connections -----')
 
m = pycurl.CurlMulti()
#初始化handle,可复用
m.handles = []
for i in range(num_conn):
  c = pycurl.Curl()
  c.body = cStringIO.StringIO()
  c.setopt(pycurl.FOLLOWLOCATION, 1)
  c.setopt(pycurl.MAXREDIRS, 5)
  c.setopt(pycurl.CONNECTTIMEOUT, 30)
  c.setopt(pycurl.TIMEOUT, 300)
  c.setopt(pycurl.NOSIGNAL, 1)
  m.handles.append(c)
 
 
freelist = m.handles[:]
num_processed = 0
#主循环开始
while num_processed < num_urls:
 
    #添加请求URL
    while queue and freelist:
      url = queue.pop()
      c = freelist.pop()
      c.setopt(pycurl.URL, url)
      c.setopt(pycurl.WRITEFUNCTION, c.body.write)
      m.add_handle(c)
      c.url = url
      #print url
 
    #执行请求
    while 1:
      (ret, num_handles) = m.perform()
      if ret != pycurl.E_CALL_MULTI_PERFORM:
        break
 
    #阻塞一会直到有连接完成
    m.select(1.0)
 
    #读取完成的连接
    while 1:
      (num_q, ok_list, err_list) = m.info_read()
      for c in ok_list:
        m.remove_handle(c)
        #print c.body.getvalue()
        freelist.append(c)
 
      for (c, errno, errmsg) in err_list:
        m.remove_handle(c)
        print ('Failed: ', c.url, errno, errmsg)
        freelist.append(c)
      num_processed = num_processed + len(ok_list) + len(err_list)
      if num_q == 0:
        break
 
for c in m.handles:
  c.fp = None
  c.close()
m.close()

epoll模式,php mult curl不支持此模式,tornado基于pycurl multi_socket_action封装的异步http client,每个client实例维护一个ioloop:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop
count = 10000
done = 0
def handle_request(response):
  global done
  done += 1
  if (done == count):
    #结束循环
    IOLoop.instance().stop()
 
  if response.error:
    print "Error:", response.error
  #else:
    #print response.body
#默认client是基于ioloop实现的,配置使用Pycurl
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient",max_clients=20)
http_client = AsyncHTTPClient()
for i in range(count):
  http_client.fetch("https://www.haiyun.me/", handle_request)
#死循环
IOLoop.instance().start()

基于epoll的multi curl在lan环境下效果不如select,因为所有Socket都在活跃状态,所有的callback都被唤醒,会导致资源的竞争。既然都是要处理所有的Socket,直接遍历是最简单最有效的方式.
为更好的性能建议libcurl/pycurl开启异步DNS解析

自用完美PHP异步并发multi curl

发布时间:December 7, 2014 // 分类:PHP // No Comments

修改自https://code.google.com/p/rolling-curl/

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
<?php
   /*
   Authored by Josh Fraser (www.joshfraser.com)
   Released under Apache License 2.0
 
   Maintained by Alexander Makarov, http://rmcreative.ru/
 
   $Id$
   */
 
   /**
   * Class that represent a single curl request
   */
   class RollingCurlRequest {
      public $url = false;
      public $method = 'GET';
      public $post_data = null;
      public $headers = null;
      public $options = null;
      public $info = null;
      public $callback;
      public $recursion = false;
 
      /**
      * @param string $url
      * @param string $method
      * @param  $post_data
      * @param  $headers
      * @param  $options
      * @return void
      */
      function __construct($url, $options = null, $info = null, $method = "GET", $post_data = null, $headers = null  ) {
         $this->url = $url;
         $this->method = $method;
         $this->post_data = $post_data;
         $this->headers = $headers;
         $this->options = $options;
         $this->info = $info;
      }
 
      /**
      * @return void
      */
      public function __destruct() {
         unset($this->url, $this->method, $this->post_data, $this->headers, $this->options);
      }
   }
 
   /**
   * RollingCurl custom exception
   */
   class RollingCurlException extends Exception {
   }
 
   /**
   * Class that holds a rolling queue of curl requests.
   *
   * @throws RollingCurlException
   */
   class RollingCurl {
      /**
      * @var int
      *
      * Window size is the max number of simultaneous connections allowed.
      *
      * REMEMBER TO RESPECT THE SERVERS:
      * Sending too many requests at one time can easily be perceived
      * as a DOS attack. Increase this window_size if you are making requests
      * to multiple servers or have permission from the receving server admins.
      */
      private $window_size = 5;
 
      //private $master = 'NULL';
      //保存连接数量
      public $current_size =0;
      /**
      * @var float
      *
      * Timeout is the timeout used for curl_multi_select.
      */
      private $timeout = 10;
 
      /**
      * @var array
      *
      * Set your base options that you want to be used with EVERY request.
      */
      protected $options = array(
         CURLOPT_SSL_VERIFYPEER => 0,
         CURLOPT_RETURNTRANSFER => 1,
         CURLOPT_VERBOSE => 0,
         CURLOPT_TIMEOUT => 20,
         CURLOPT_DNS_CACHE_TIMEOUT => 3600,
         CURLOPT_CONNECTTIMEOUT => 10,
         CURLOPT_ENCODING => 'gzip,deflate',
         CURLOPT_FOLLOWLOCATION => 1,
         CURLOPT_MAXREDIRS => 2,
         CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
         //CURLOPT_HEADER => 1
      );
 
      /**
      * @var array
      */
      private $headers = array(
 
         'Connection: Keep-Alive',
         'Keep-Alive: 300',
         'Expect:'
      );
 
      /**
      * @var Request[]
      *
      * The request queue
      */
      private $requests = array();
 
      /**
      * @var RequestMap[]
      *
      * Maps handles to request indexes
      */
      private $requestMap = array();
 
      /**
      * @param  $callback
      * Callback function to be applied to each result.
      *
      * Can be specified as 'my_callback_function'
      * or array($object, 'my_callback_method').
      *
      * Function should take three parameters: $response, $info, $request.
      * $response is response body, $info is additional curl info.
      * $request is the original request
      *
      * @return void
      */
      function __construct($callback = null) {
         $this->callback = $callback;
      }
 
      /**
      * @param string $name
      * @return mixed
      */
      public function __get($name) {
         return (isset($this->{$name})) ? $this->{$name} : null;
      }
 
      /**
      * @param string $name
      * @param mixed $value
      * @return bool
      */
      public function __set($name, $value) {
         // append the base options & headers
         if ($name == "options" || $name == "headers") {
            $this->{$name} = $value + $this->{$name};
         } else {
            $this->{$name} = $value;
         }
         return true;
      }
 
      /**
      * Add a request to the request queue
      *
      * @param Request $request
      * @return bool
      */
      public function add($request) {
         $this->requests[] = $request;
         return true;
      }
 
      /**
      * Create new Request and add it to the request queue
      *
      * @param string $url
      * @param string $method
      * @param  $post_data
      * @param  $headers
      * @param  $options
      * @return bool
      */
      public function request($url, $method = "GET", $post_data = null, $headers = null, $options = null) {
         $this->requests[] = new RollingCurlRequest($url, $method, $post_data, $headers, $options);
         return true;
      }
 
      /**
      * Perform GET request
      *
      * @param string $url
      * @param  $headers
      * @param  $options
      * @return bool
      */
      public function get($url, $headers = null, $options = null) {
         return $this->request($url, "GET", null, $headers, $options);
      }
 
      /**
      * Perform POST request
      *
      * @param string $url
      * @param  $post_data
      * @param  $headers
      * @param  $options
      * @return bool
      */
      public function post($url, $post_data = null, $headers = null, $options = null) {
         return $this->request($url, "POST", $post_data, $headers, $options);
      }
 
      /**
      * Execute processing
      *
      * @param int $window_size Max number of simultaneous connections
      * @return string|bool
      */
      public function execute($window_size = null) {
         // rolling curl window must always be greater than 1
         if (sizeof($this->requests) == 1) {
            return $this->single_curl();
         } else {
            // start the rolling curl. window_size is the max number of simultaneous connections
            return $this->rolling_curl($window_size);
         }
      }
 
      /**
      * Performs a single curl request
      *
      * @access private
      * @return string
      */
      private function single_curl() {
         $ch = curl_init();
         $request = array_shift($this->requests);
         //获取选项及header
         $options = $this->get_options($request);
         curl_setopt_array($ch, $options);
         $output = curl_exec($ch);
         $info = curl_getinfo($ch);
         //处理错误
         if (curl_error($ch))
         $info['error'] = curl_error($ch);
 
         // it's not neccesary to set a callback for one-off requests
         if ($request->callback) {
            $callback = $request->callback;
            if (is_callable($callback)) {
               call_user_func($callback, $output, $info, $request);
            }
         }
         else
         return $output;
         return true;
      }
 
      /**
      * Performs multiple curl requests
      *
      * @access private
      * @throws RollingCurlException
      * @param int $window_size Max number of simultaneous connections
      * @return bool
      */
      private function rolling_curl($window_size = null) {
         if ($window_size)
         $this->window_size = $window_size;
 
         // make sure the rolling window isn't greater than the # of urls
         if (sizeof($this->requests) < $this->window_size)
         $this->window_size = sizeof($this->requests);
 
         if ($this->window_size < 2) {
            throw new RollingCurlException("Window size must be greater than 1");
         }
 
         $master = curl_multi_init();
 
         //首次执行填满请求
         for ($i = 0; $i < $this->window_size; $i++) {
            $ch = curl_init();
 
            $options = $this->get_options($this->requests[$i]);
 
            curl_setopt_array($ch, $options);
            curl_multi_add_handle($master, $ch);
 
            $key = (int) $ch;
            //ch重用队列
            $chs[$key] = $ch;
            //请求map,后续根据返回信息的ch获取原始请求信息
            $this->requestMap[$key] = $i;
            $this->current_size++;
         }
 
         do {
            //执行句柄内所有连接,包括后来新加入的连接
            do {
               //running变量返回正在处理的curl数量,0表示当前没有正在执行的curl
               $execrun = curl_multi_exec($master, $running);
            } while ($execrun == CURLM_CALL_MULTI_PERFORM); // 7.20.0后弃用
 
            if ($execrun != CURLM_OK)
            echo "ERROR!\n " . curl_multi_strerror($execrun);
 
            //阻塞一会等待有数据可读,返回可读数量,失败为-1,避免一直循环占用CPU
            if ($running)
            curl_multi_select($master, $this->timeout);
 
            //读取返回的连接,并加入新的连接
            while ($done = curl_multi_info_read($master)) {
 
               //获取完成的句柄
               $ch = $done['handle'];
               //获取返回的请求信息
               $info = curl_getinfo($ch);
               //获取返回内容
               $output = curl_multi_getcontent($ch);
               //处理错误信息
               //if (curl_error($ch)) 
               if ($done['result'] != CURLE_OK)
               $info['error'] = curl_error($ch);
 
               //根据请求映射是哪个请求返回的信息,即请求数组中第i个请求
               $key = (int) $ch;
               $request = $this->requests[$this->requestMap[$key]];
               //发送返回信息到回调函数
               $callback = $request->callback;
               if (is_callable($callback)) {
                  //移除请求信息和请求映射
                  unset($this->requests[$this->requestMap[$key]]);
                  unset($this->requestMap[$key]);
                  $this->current_size--;
                  //回调函数
                  call_user_func($callback, $output, $info, $request);
               }
               //删除完成的句柄
               curl_multi_remove_handle($master, $done['handle']);
 
               //判断队列内的连接是否用完
               if (isset($this->requests[$i])) {
                  //重用之前完成的ch
                  $ch = $chs[$key];
                  //var_dump($ch);
                  $options = $this->get_options($this->requests[$i]);
                  curl_setopt_array($ch, $options);
                  //增加新的连接
                  curl_multi_add_handle($master, $ch);
 
                  //添加到request Maps,用于返回信息时根据handle找到相应连接
                  $key = (int) $ch;
                  $this->requestMap[$key] = $i;
                  $this->current_size++;
                  $i++;
               }
            }
 
         } while ($this->current_size) ;
         curl_multi_close($master);
         return true;
      }
 
      //返回是否还有活动连接
      public function state() {
         return curl_multi_select($this->master, $this->timeout);
      }
 
      /**
      * Helper function to set up a new request by setting the appropriate options
      *
      * @access private
      * @param Request $request
      * @return array
      */
      private function get_options($request) {
         //获取类内选项设置
         $options = $this->__get('options');
         if (ini_get('safe_mode') == 'Off' || !ini_get('safe_mode')) {
            $options[CURLOPT_FOLLOWLOCATION] = 1;
            $options[CURLOPT_MAXREDIRS] = 5;
         }
 
         //附加类内设置到请求选项中
         if ($request->options) {
            $options = $request->options + $options;
         }
 
         //获取类内head设置
         $headers = $this->__get('headers');
 
         //附加header
         if ($request->headers) {
            $headers = $request->headers + $headers;
         }
 
         // set the request URL
         $options[CURLOPT_URL] = $request->url;
 
         // posting data w/ this request?
         if ($request->post_data) {
            $options[CURLOPT_POST] = 1;
            $options[CURLOPT_POSTFIELDS] = $request->post_data;
         }
         if ($headers) {
            $options[CURLOPT_HEADER] = 0;
            $options[CURLOPT_HTTPHEADER] = $headers;
         }
 
         return $options;
      }
 
      /**
      * @return void
      */
      public function __destruct() {
         unset($this->window_size, $this->callback, $this->options, $this->headers, $this->requests);
      }
 
      Function test() {
 
         var_dump($this->requests);
      }
   }

使用方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
<?php
   require("class/RollingCurl.php");
   function callback($response, $info, $request) {
      print_r($response);
      print_r($info);
      print_r($request);
   }
   $rc = new RollingCurl();
   $rc->window_size = 2;
   for ($i = 1; $i < 1000; $i++) {
      $url = "http://www.baidu.com/";
      $request = new RollingCurlRequest($url);
      $request->options = array(CURLOPT_COOKIEJAR => '/tmp/ck.cookie', CURLOPT_COOKIEFILE => '/tmp/ck.cookie');
      $request->headers = array('Referer: https://www.haiyun.me');
      $request->callback = 'callback';
      $rc->add($request);
   }
   $res = $rc->execute();
分类
最新文章
最近回复
  • opnfense: 谢谢博主!!!解决问题了!!!我之前一直以为内置的odhcp6就是唯一管理ipv6的方式
  • liyk: 这个方法获取的IPv6大概20分钟之后就会失效,默认路由先消失,然后Global IPV6再消失
  • 海运: 不好意思,没有。
  • zongboa: 您好,請問一下有immortalwrt設定guest Wi-Fi的GUI教學嗎?感謝您。
  • 海运: 恩山有很多。
  • swsend: 大佬可以分享一下固件吗,谢谢。
  • Jimmy: 方法一 nghtp3步骤需要改成如下才能编译成功: git clone https://git...
  • 海运: 地址格式和udpxy一样,udpxy和msd_lite能用这个就能用。
  • 1: 怎么用 编译后的程序在家里路由器内任意一台设备上运行就可以吗?比如笔记本电脑 m参数是笔记本的...
  • 孤狼: ups_status_set: seems that UPS [BK650M2-CH] is ...
Statcounter