Asked: 2016/7/27 19:52:42
We use proxy IPs to scrape data from a single site. The proxies are purchased from a third party, and their quality is uneven.

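We currently run two kinds of Spider. One returns large responses, averaging roughly 50K per page; the other returns small responses, averaging 2.5K per page. Both are deployed on Linux, and the Spider that returns small responses can never saturate the total bandwidth (the link is 10M, but only about 3.5M is used).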
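To put those figures in perspective, here is a rough back-of-the-envelope sketch of how many pages per second each Spider would have to complete to fill the link. It assumes that "10M" means 10 Mbit/s and that "K" means 1024 bytes, neither of which is stated above, and it ignores HTTP headers and TCP overhead. The function name is only illustrative.

```python
# Assumptions (not confirmed by the post): the link is 10 Mbit/s,
# 1K = 1024 bytes, and protocol overhead is ignored.
LINK_BITS_PER_SEC = 10_000_000
USED_BITS_PER_SEC = 3_500_000
LARGE_PAGE_BYTES = 50 * 1024
SMALL_PAGE_BYTES = 2.5 * 1024


def pages_per_second(bits_per_sec: float, page_bytes: float) -> float:
    """Pages that must complete each second to carry the given bit rate."""
    return bits_per_sec / (page_bytes * 8)


if __name__ == "__main__":
    for label, size in (("large", LARGE_PAGE_BYTES), ("small", SMALL_PAGE_BYTES)):
        needed = pages_per_second(LINK_BITS_PER_SEC, size)
        print(f"{label} pages: about {needed:.0f} pages/s to fill the link")
    current = pages_per_second(USED_BITS_PER_SEC, SMALL_PAGE_BYTES)
    print(f"small pages at 3.5 Mbit/s: about {current:.0f} pages/s today")
```

Under these assumptions the small-page Spider has to finish about twenty times as many requests per second as the large-page one to move the same number of bytes, so per-request cost (connection setup, proxy latency) matters far more for it.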

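We have ruled out the various problems in our own program and tuned assorted Linux configuration parameters, but the bandwidth still cannot be saturated.

What should we do?

The current system configuration is as follows: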
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 8192 436600 268435456
net.ipv4.tcp_rmem = 32768 436600 268435456
net.ipv4.tcp_max_syn_backlog = 65536
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_mem = 94500000 915000000 927000000
net.ipv4.tcp_max_orphans = 3276800
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_orphan_retries = 3
net.ipv4.tcp_reordering = 5
net.ipv4.tcp_low_latency = 1
net.ipv4.ip_local_port_range = 1024 85000
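One value in the listing can be checked mechanically: TCP port numbers are 16-bit, so the upper bound of 85000 in `net.ipv4.ip_local_port_range` is above the maximum of 65535. Whether the kernel accepted or rejected that line is not something the post tells us. The sketch below parses `key = value` lines in the style of the listing and flags an out-of-range port bound; the helper names are hypothetical and it does not query a live system.

```python
MAX_PORT = 65535  # largest valid TCP/UDP port number (16-bit)


def parse_sysctl(text: str) -> dict[str, str]:
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def port_range_problems(settings: dict[str, str]) -> list[str]:
    """Return human-readable problems with ip_local_port_range, if any."""
    raw = settings.get("net.ipv4.ip_local_port_range")
    if raw is None:
        return []
    low, high = (int(part) for part in raw.split())
    problems = []
    if high > MAX_PORT:
        problems.append(f"upper bound {high} exceeds {MAX_PORT}")
    if low > high:
        problems.append(f"lower bound {low} is above upper bound {high}")
    return problems


if __name__ == "__main__":
    sample = "net.ipv4.ip_local_port_range = 1024 85000\n"
    print(port_range_problems(parse_sysctl(sample)))
```

The usable ephemeral port range caps how many outbound connections can be open to one destination address and port at a time, which is relevant when many short requests go through a small number of proxies.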

Reply 1 (anonymous user)

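Does the spider fetch pages concurrently? With TCP you generally need seven or more connections working in parallel before the bandwidth can easily be saturated.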
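The figure of seven is the replier's rule of thumb. How many connections are actually needed depends on how long each request takes, which the thread does not say. A minimal sketch of the estimate, using Little's law (requests in flight = completion rate × time per request) and assuming a 10 Mbit/s link, 1K = 1024 bytes, and a purely hypothetical 0.5 s per request through a proxy:

```python
import math


def connections_needed(link_bits_per_sec: float,
                       page_bytes: float,
                       seconds_per_request: float) -> int:
    """Concurrent requests needed to keep the link full (Little's law)."""
    pages_per_sec = link_bits_per_sec / (page_bytes * 8)
    return math.ceil(pages_per_sec * seconds_per_request)


if __name__ == "__main__":
    link = 10_000_000      # assumed: "10M" means 10 Mbit/s
    latency = 0.5          # hypothetical seconds per request via a proxy
    for label, size in (("50K pages", 50 * 1024), ("2.5K pages", 2.5 * 1024)):
        n = connections_needed(link, size, latency)
        print(f"{label}: about {n} concurrent requests")
```

With these assumed numbers the large-page Spider needs on the order of a dozen requests in flight, while the small-page Spider needs a few hundred, so a concurrency setting that fills the link with 50K pages can leave it mostly idle with 2.5K pages. Measuring the real per-request time would replace the 0.5 s guess.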