Awesome Open Source

python3-concurrency-mzitu

Python aiohttp BeautifulSoup4 requests pymongo progressbar2

1. Progress bar

2. Screenshots

Analysis of the crawling process:

3. Crawler series

4. Usage

This repository contains only the slow synchronous download version. For the fast coroutine-based version, see: https://madmalls.com/blog/post/python3-concurrency-pics-02/

4.1 Download the code

# git clone https://github.com/wangy8961/python3-concurrency-pics-02.git
# cd python3-concurrency-pics-02/

4.2 Set up a virtual environment

If your operating system is Linux:

# python3 -m venv venv3
# source venv3/bin/activate

On Windows, activate the virtual environment with: venv3\Scripts\activate

4.3 Install dependencies

If your operating system is Linux:

(venv3) # pip install -r requirements-linux.txt

If your operating system is Windows (uvloop will not be used):

(venv3) C:\Users\wangy> pip install -r requirements-win32.txt
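
Because uvloop is not installed on Windows, asynchronous code that wants to run on both platforms needs a fallback to the standard asyncio event loop. The following is a minimal sketch of one way to do that, not the repository's actual code; `pick_loop_factory` and `answer` are hypothetical names introduced here for illustration.

```python
import asyncio
import sys


def pick_loop_factory():
    """Return uvloop's loop constructor when usable, else asyncio's default."""
    if sys.platform != "win32":
        try:
            import uvloop  # optional dependency, absent on Windows
            return uvloop.new_event_loop
        except ImportError:
            pass
    return asyncio.new_event_loop


async def answer() -> int:
    """Trivial coroutine used only to show that the chosen loop works."""
    await asyncio.sleep(0)
    return 42


if __name__ == "__main__":
    loop = pick_loop_factory()()
    try:
        print(type(loop).__module__, loop.run_until_complete(answer()))
    finally:
        loop.close()
```

The same script then runs unchanged on both platforms; only the event loop implementation differs.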

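4.4 Testing

Because there are more than 160,000 images, you can limit a test run to 100 albums to compare the efficiency of sequential, multithreaded, and asynchronous downloading. To do so, set TEST_NUM = 100 in each of the three scripts below.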
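
How the scripts apply TEST_NUM is not shown here. As a rough illustration only, a limit like this is often applied by truncating the list of album URLs before downloading. In the sketch below, `limit_albums` and the rule that a falsy value means "no limit" are assumptions, not the repository's behavior.

```python
from itertools import islice

TEST_NUM = 100  # number of albums to process in a test run


def limit_albums(album_urls, test_num=TEST_NUM):
    """Return at most test_num albums; a falsy test_num means no limit (assumed)."""
    if not test_num:
        return list(album_urls)
    return list(islice(album_urls, test_num))


if __name__ == "__main__":
    # Placeholder URLs on a reserved, non-resolving domain.
    albums = [f"https://example.invalid/album/{i}" for i in range(1000)]
    print(len(limit_albums(albums)))
```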

After each test run, it is recommended to delete the related directories:

(venv3) # rm -rf downloads/ logs/ __pycache__/
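
The rm command is not available in the Windows command prompt. A portable alternative is to remove the same three directories from Python. This is a small sketch; `clean_outputs` is a hypothetical helper and is not part of the repository.

```python
import shutil
from pathlib import Path

# Directory names taken from the rm command above.
GENERATED_DIRS = ("downloads", "logs", "__pycache__")


def clean_outputs(base=".", names=GENERATED_DIRS):
    """Delete the named directories under base; return the names removed."""
    removed = []
    for name in names:
        target = Path(base) / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed

# Usage, from the project root:
#   clean_outputs()
```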

Delete the database records:

(venv3) # mongo
MongoDB shell version v3.6.6
connecting to: mongodb://127.0.0.1:27017
...
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
mzitu   0.036GB
> use mzitu
switched to db mzitu
> db.dropDatabase()
{ "dropped" : "mzitu", "ok" : 1 }
> show dbs
admin   0.000GB
config  0.000GB
local   0.000GB
> 
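
The same reset can be scripted with pymongo, which the project already depends on. The sketch below assumes a MongoDB server on the default local address shown in the session above; `drop_test_database` is a hypothetical helper name.

```python
def drop_test_database(client, name="mzitu"):
    """Drop database `name` and report whether it is gone afterwards.

    `client` is expected to be a pymongo.MongoClient (or any object with
    the same drop_database / list_database_names methods).
    """
    client.drop_database(name)
    return name not in client.list_database_names()

# Usage (needs a running MongoDB server and the pymongo package):
#   from pymongo import MongoClient
#   drop_test_database(MongoClient("mongodb://127.0.0.1:27017"))
```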

(1) Sequential download

(venv3) # python sequential.py
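
For orientation, sequential downloading means each request finishes before the next one starts, so total time grows linearly with the number of images. The sketch below illustrates the pattern only and is not the contents of sequential.py; `fake_download` is a stand-in that sleeps instead of making an HTTP request.

```python
import time


def fake_download(url: str, delay: float = 0.01) -> int:
    """Stand-in for an HTTP GET: sleeps, then returns a fake byte count."""
    time.sleep(delay)
    return len(url)


def download_sequentially(urls):
    """Fetch one URL at a time, in order; returns {url: size}."""
    results = {}
    for url in urls:
        results[url] = fake_download(url)
    return results


if __name__ == "__main__":
    urls = [f"https://example.invalid/img/{i}.jpg" for i in range(20)]
    start = time.perf_counter()
    done = download_sequentially(urls)
    print(f"{len(done)} downloads in {time.perf_counter() - start:.2f}s")
```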

(2) Multithreaded download

(venv3) # python threadpool.py
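
A thread pool overlaps the waiting time of many requests, which is where the speedup over sequential downloading comes from for I/O-bound work. The following is a sketch of the general pattern using the standard library's concurrent.futures, not the contents of threadpool.py; `fake_download` and the worker count of 8 are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(url: str, delay: float = 0.05) -> int:
    """Stand-in for an HTTP GET: sleeps, then returns a fake byte count."""
    time.sleep(delay)
    return len(url)


def download_with_threads(urls, max_workers=8):
    """Fetch URLs concurrently on a pool of worker threads; returns {url: size}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be labeled.
        futures = {pool.submit(fake_download, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    urls = [f"https://example.invalid/img/{i}.jpg" for i in range(16)]
    start = time.perf_counter()
    done = download_with_threads(urls)
    print(f"{len(done)} downloads in {time.perf_counter() - start:.2f}s")
```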

(3) Asynchronous download

(venv3) # python asynchronous.py
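
Asynchronous downloading runs many requests as coroutines on a single thread, usually with a cap on how many are in flight at once. The sketch below shows that pattern with asyncio only and is not the contents of asynchronous.py. `fake_download` replaces the real HTTP call (which in this project would presumably go through aiohttp), and the limit of 10 is an assumption.

```python
import asyncio


async def fake_download(url: str, delay: float = 0.05) -> int:
    """Stand-in for an async HTTP GET: sleeps, then returns a fake byte count."""
    await asyncio.sleep(delay)
    return len(url)


async def download_all(urls, limit=10):
    """Fetch URLs concurrently, with at most `limit` in flight; returns {url: size}."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        async with semaphore:
            return url, await fake_download(url)

    pairs = await asyncio.gather(*(worker(url) for url in urls))
    return dict(pairs)


if __name__ == "__main__":
    urls = [f"https://example.invalid/img/{i}.jpg" for i in range(50)]
    done = asyncio.run(download_all(urls))
    print(len(done), "downloads finished")
```

The semaphore matters in practice: launching all requests at once against a single site tends to trigger rate limiting or connection errors.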
