Awesome Open Source

python3-concurrency-mzitu

Python aiohttp BeautifulSoup4 requests pymongo progressbar2

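1. Progress bar

2. Screenshots

Analysis of the crawling process:

3. Crawler series

4. Usage

This repository contains only the slow, synchronous download version. For the fast coroutine-based version, see: https://madmalls.com/blog/post/python3-concurrency-pics-02/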

4.1 Download the code

[root@CentOS ~]# git clone https://github.com/wangy8961/python3-concurrency-pics-02.git
[root@CentOS ~]# cd python3-concurrency-pics-02/

4.2 Prepare the virtual environment

If your operating system is Linux:

[root@CentOS python3-concurrency-pics-02]# python3 -m venv venv3
[root@CentOS python3-concurrency-pics-02]# source venv3/bin/activate

On Windows, the command to activate the virtual environment is: venv3\Scripts\activate

4.3 Install the dependencies

If your operating system is Linux:

(venv3) [root@CentOS python3-concurrency-pics-02]# pip install -r requirements-linux.txt

If your operating system is Windows (uvloop will not be used):

(venv3) C:\Users\wangy> pip install -r requirements-win32.txt
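
The split between the two requirements files suggests that uvloop is an optional speed-up rather than a hard dependency. The following is a minimal sketch of how a script could enable uvloop where it is available and fall back to the default asyncio event loop elsewhere. The function name `install_fast_event_loop` is hypothetical and is not taken from the repository.

```python
import asyncio
import sys


def install_fast_event_loop() -> bool:
    """Try to switch asyncio to uvloop and return True on success.

    uvloop does not support Windows, so the default loop is kept there.
    """
    if sys.platform == "win32":
        return False
    try:
        import uvloop  # optional dependency; assumed to be installed on Linux only
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


async def main() -> str:
    # Placeholder coroutine standing in for the real download entry point.
    await asyncio.sleep(0)
    return "done"


if __name__ == "__main__":
    print("uvloop enabled:", install_fast_event_loop())
    print(asyncio.run(main()))
```

With this pattern the same script runs unchanged on both platforms; only the event loop implementation differs.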

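4.4 Testing

The site hosts more than 160,000 images, so for testing you can limit each run to 100 albums and compare the efficiency of sequential, multithreaded, and asynchronous downloading. To do so, set TEST_NUM = 100 in each of the three scripts listed below.

It is recommended to delete the related directories after each test run:

(venv3) [root@CentOS python3-concurrency-pics-02]# rm -rf downloads/ logs/ __pycache__/

Delete the database records:

(venv3) [root@CentOS python3-concurrency-pics-02]# mongo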
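
As an illustration of what a TEST_NUM cap can look like, here is a minimal sketch that truncates the list of albums before downloading. Only the name TEST_NUM comes from the text above; the helper `limit_albums`, the `None` convention for "no limit", and the example.com URLs are assumptions made for this sketch, and the repository's scripts may apply the limit differently.

```python
from typing import List, Optional

# Assumed convention for this sketch: None means "download everything".
TEST_NUM: Optional[int] = 100


def limit_albums(album_urls: List[str], test_num: Optional[int] = TEST_NUM) -> List[str]:
    """Return only the first test_num albums, or all of them if test_num is None."""
    if test_num is None:
        return list(album_urls)
    return album_urls[:test_num]


# Placeholder URLs standing in for the album list the crawler would collect.
all_albums = [f"https://example.com/album/{i}" for i in range(250)]
albums_to_fetch = limit_albums(all_albums)
print(len(albums_to_fetch))
```

Using the same cap in all three scripts keeps the comparison fair, since each strategy then downloads the same set of albums.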
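
The rm -rf command is not available in the Windows command prompt, so a small Python helper may be more convenient there. This is a minimal sketch using only the standard library; the function name `clean_outputs` is hypothetical, and the directory names are the ones from the command above.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def clean_outputs(
    base_dir: Path,
    names: Iterable[str] = ("downloads", "logs", "__pycache__"),
) -> List[str]:
    """Delete the named directories under base_dir and return the names removed."""
    removed = []
    for name in names:
        target = base_dir / name
        if target.is_dir():
            shutil.rmtree(target)  # removes the directory and everything inside it
            removed.append(name)
    return removed
```

Calling `clean_outputs(Path("."))` from the repository root would have the same effect as the shell command; directories that do not exist are simply skipped.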
MongoDB shell version v3.6.6
connecting to: mongodb://127.0.0.1:27017
...
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
mzitu   0.036GB
> use mzitu
switched to db mzitu
> db.dropDatabase()
{ "dropped" : "mzitu", "ok" : 1 }
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
> 

(1) Sequential download

(venv3) [root@CentOS python3-concurrency-pics-02]# python sequential.py

(2) Multithreaded download

(venv3) [root@CentOS python3-concurrency-pics-02]# python threadpool.py

(3) Asynchronous download

(venv3) [root@CentOS python3-concurrency-pics-02]# python asynchronous.py
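
To show how the three approaches differ in structure, here is a minimal, self-contained sketch that replaces real HTTP requests with a short sleep. It is not the code from sequential.py, threadpool.py, or asynchronous.py; all function names, the `DELAY` value, and the example.com URLs are assumptions chosen for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

DELAY = 0.05  # simulated network latency per image, in seconds


def fetch(url: str) -> str:
    """Blocking stand-in for an HTTP download."""
    time.sleep(DELAY)
    return f"saved {url}"


async def fetch_async(url: str) -> str:
    """Non-blocking stand-in for an HTTP download."""
    await asyncio.sleep(DELAY)
    return f"saved {url}"


def download_sequential(urls: List[str]) -> List[str]:
    # One download at a time: total time grows linearly with len(urls).
    return [fetch(u) for u in urls]


def download_threaded(urls: List[str], workers: int = 8) -> List[str]:
    # Blocking calls overlap across worker threads; map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


async def download_async(urls: List[str]) -> List[str]:
    # All coroutines wait concurrently on one thread; gather preserves order.
    return await asyncio.gather(*(fetch_async(u) for u in urls))


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, and return the result."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


urls = [f"https://example.com/img/{i}.jpg" for i in range(8)]

if __name__ == "__main__":
    timed("sequential", download_sequential, urls)
    timed("threadpool", download_threaded, urls)
    timed("asyncio", lambda u: asyncio.run(download_async(u)), urls)
```

In this simulation the sequential version waits for each download in turn, while the thread pool and the coroutine version overlap the waiting time. Real results depend on network conditions, the server, and disk speed.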
