
Requests

Setting request headers

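Purpose: some websites block crawlers. Setting the headers makes the request look like it comes from a browser, which gets past simple anti-scraping checks and lets you access the site.
Getting the headers: right-click -> Inspect -> Network -> Doc -> the HTML file
Press Fn+F5 to refresh the page so that its requests show up.
The most commonly used headers are User-Agent and Host.
Follow the steps shown in the figure.
Figure: getting the headers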
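
As a minimal sketch of how the copied headers might be used, the snippet below attaches a User-Agent to a request. The User-Agent string and the httpbin.org URL are placeholder assumptions; substitute the values copied from your own browser. The request is only prepared, not sent, so the outgoing headers can be inspected offline. Host is left out because it is normally derived from the URL.

```python
import requests

# Hypothetical value: copy the real one from your browser's Network tab.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
}

# Build the request without sending it, then look at what would go out.
prepared = requests.Request(
    "GET", "https://httpbin.org/headers", headers=headers
).prepare()

print(prepared.headers["User-Agent"])
```

To actually send the request, pass the same dictionary to requests.get(url, headers=headers).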

Posting Data

Reference
Content-Type in Headers:
Form Data:

POST
Content-Type: application/x-www-form-urlencoded

user=me@example.com

JSON Payload

POST
Content-Type: application/json

{"user":"me@example.com"}
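
The two wire formats above can be compared directly in requests. This is a sketch rather than part of the original notes: it prepares the same dictionary once with the data= parameter and once with the json= parameter, without sending anything, and prints the Content-Type that requests picks for each.

```python
import json
import requests

url = "https://httpbin.org/post"
data = {"user": "me@example.com"}

# data=<dict> is form-encoded; json=<dict> is serialized as a JSON body.
form_req = requests.Request("POST", url, data=data).prepare()
json_req = requests.Request("POST", url, json=data).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # user=me%40example.com
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'user': 'me@example.com'}
```

Note that the form body percent-encodes the @ sign as %40.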

Form Data

import requests
url = 'https://httpbin.org/post'
data = {'user':'me@example.com'}

response = requests.post(url, data=data)

print(response) # <Response [200]>

Notice that our response variable is a Response object. To be able to use this data, we need to apply a method or a property.

Text property - String

result = response.text
print(type(result)) # <class 'str'>
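
Besides text, a Response also exposes status_code and a json() method. The sketch below shows all three side by side. So that it runs offline, the Response is built by hand with a made-up body; setting the private _content attribute is an illustration trick only, and in real code the object comes from requests.post or requests.get.

```python
import requests

# Stand-in for a real reply, built by hand so the sketch runs offline.
# The body below is a made-up example, not actual httpbin output.
response = requests.Response()
response.status_code = 200
response.encoding = "utf-8"
response._content = b'{"form": {"user": "me@example.com"}}'

text = response.text      # body decoded to a str
parsed = response.json()  # body parsed from JSON into a dict

print(response.status_code)    # 200
print(type(text))              # <class 'str'>
print(parsed["form"]["user"])  # me@example.com
```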

JSON Payload

import requests
import json
from pprint import pprint

url = 'https://httpbin.org/post'
data = {'user':'me@example.com'}

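# as payload; requests sets no Content-Type for a string body, so declare it explicitly
response = requests.post(url, data=json.dumps(data), headers={'Content-Type': 'application/json'})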

result = response.json()
pprint(result)

By using the json.dumps method, we can convert the dictionary into a JSON-formatted string to post as a payload.
We used pprint to pretty-print our dictionary data.
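
To make the conversion concrete, here is a small standard-library sketch of what json.dumps returns for the dictionary used above, and how json.loads reverses it.

```python
import json

data = {"user": "me@example.com"}

payload = json.dumps(data)      # dict -> JSON-formatted string
restored = json.loads(payload)  # JSON string -> dict again

print(type(payload))  # <class 'str'>
print(payload)        # {"user": "me@example.com"}
print(restored == data)  # True
```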

Session

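A Session maintains state across requests, persisting certain parameters (such as cookies and headers) from one request to the next.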
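
A minimal sketch of that persistence, assuming a made-up User-Agent value and the httpbin.org URLs used earlier: a header set once on the session is merged into every request prepared through it. Nothing is sent here; prepare_request is used only so the merged headers can be inspected.

```python
import requests

session = requests.Session()

# Hypothetical User-Agent, set once on the session.
session.headers.update({"User-Agent": "my-crawler/0.1"})

# Each request prepared through the session inherits the session headers.
first = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/get")
)
second = session.prepare_request(
    requests.Request("POST", "https://httpbin.org/post",
                     data={"user": "me@example.com"})
)

print(first.headers["User-Agent"])   # my-crawler/0.1
print(second.headers["User-Agent"])  # my-crawler/0.1

session.close()
```

In everyday use you would call session.get(...) and session.post(...) directly; cookies returned by the server are then stored on the session and sent back on later requests.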

Reading and writing files

Reference

read

>>> f = open('/Users/michael/test.txt', 'r')
>>> f.read()
'Hello world!'
>>> f.close()

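Python returns the content it reads as a str object.
A file must be closed once you are done with it, because file objects consume operating-system resources and the number of files the OS can have open at one time is limited. If the file does not exist, open() raises FileNotFoundError (a subclass of OSError, which IOError is an alias for in Python 3). An error can also occur while reading or writing, in which case the program stops and the f.close() that follows is never called. The with statement solves this problem by closing the file automatically.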

with open('/path/to/file', 'r') as f:
    print(f.read())

This is equivalent to:

f = None  # defined up front so the finally block is safe if open() fails
try:
    f = open('/path/to/file', 'r')
    print(f.read())
finally:
    if f:
        f.close()

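In addition, calling read() reads the entire file at once, so a very large file can exhaust memory. To be safe, call read(size) repeatedly; each call reads at most size characters (or size bytes in binary mode).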
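
The repeated read(size) pattern can be sketched as a small generator. The helper name read_in_chunks and the temporary file are assumptions made for this illustration; read() returning an empty string is what signals the end of the file.

```python
import os
import tempfile


def read_in_chunks(path, size):
    """Yield successive pieces of the file, each at most `size` characters."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(size)
            if not chunk:  # empty string means end of file
                break
            yield chunk


# Demo on a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello, world!")
    chunks = list(read_in_chunks(path, 5))

print(chunks)  # ['Hello', ', wor', 'ld!']
```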

write

with open('/Users/michael/test.txt', 'w') as f:
    f.write('Hello, world!')

Pass the mode 'w' to write a text file, or 'wb' to write a binary file.
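
A short sketch of the difference between the two modes, using a temporary directory and made-up file names: 'w' expects str and 'wb' expects bytes, and the matching read modes 'r' and 'rb' return the same types.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "test.txt")  # hypothetical file names
    bin_path = os.path.join(tmp, "test.bin")

    # Text mode: write a str.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write("Hello, world!")

    # Binary mode: write bytes.
    with open(bin_path, "wb") as f:
        f.write(b"\x00\x01\x02")

    # Read both back with the matching modes.
    with open(text_path, "r", encoding="utf-8") as f:
        text = f.read()
    with open(bin_path, "rb") as f:
        raw = f.read()

print(type(text), text)  # <class 'str'> Hello, world!
print(type(raw), raw)    # <class 'bytes'> b'\x00\x01\x02'
```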