Basics of the python3 requests library
import requests ## import the requests library
response = requests.get('http://www.baidu.com') ## fetch the site's data
print(type(response)) ## type of the return value
print(response.status_code) ## HTTP status code returned by the site
print(type(response.text)) ## type of the page content
print(response.text) ## the page's actual HTML content
print(response.cookies) ## the site's cookies
A cookie (sometimes in the plural, cookies) is a "small text file": data that certain websites store on the user's local machine in order to identify the user and track the session (usually encrypted), kept temporarily or permanently by the client computer.
Output:
[root@controller ~]# python3 requerst.py
<class 'requests.models.Response'>
200
<class 'str'>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>©2017 Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a> <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号 <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
There are other request methods as well:
requests.post('http://httpbin.org/post')
requests.delete('http://httpbin.org/delete')
requests.put('http://httpbin.org/put')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
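The verbs above can be smoke-tested against httpbin.org, which exposes a matching echo endpoint for each method (HEAD and OPTIONS are sent to /get, since httpbin has no dedicated endpoints for them) — a minimal sketch:

```python
import requests

# Each call should come back with a 200 status code.
post_resp = requests.post('http://httpbin.org/post')
delete_resp = requests.delete('http://httpbin.org/delete')
put_resp = requests.put('http://httpbin.org/put')
head_resp = requests.head('http://httpbin.org/get')       # headers only, no body
options_resp = requests.options('http://httpbin.org/get')  # returns an Allow header

print(post_resp.status_code, delete_resp.status_code, put_resp.status_code,
      head_resp.status_code, options_resp.status_code)
```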
GET requests
httpbin.org is a service for testing HTTP requests.
[root@controller ~]# cat requerst.py
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
[root@controller ~]# python3 requerst.py
{
"args": {}, ## query parameters
"headers": { ## request headers
"Accept": "*/*", ## accepted content types
"Accept-Encoding": "gzip, deflate", ## accepted encodings
"Host": "httpbin.org", ## host
"User-Agent": "python-requests/2.24.0", ## user agent
"X-Amzn-Trace-Id": "Root=1-60c94fac-29fd348604dc2a1120e5ccee" ## trace id
},
"origin": "58.20.91.20", ## source IP
"url": "http://httpbin.org/get"
}
GET with parameters
[root@controller ~]# cat requerst.py
import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22') ## append parameters
print(response.text)
Output:
[root@controller ~]# python3 requerst.py
{
"args": {
"age": "22",
"name": "germey"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.24.0",
"X-Amzn-Trace-Id": "Root=1-60c9525d-6386601736ddb9e06987a0b7"
},
"origin": "58.20.91.22",
"url": "http://httpbin.org/get?name=germey&age=22"
}
Parameters can also be passed as a dict:
[root@controller ~]# cat requerst.py
import requests
data = {
'name':'ttt', ### httpbin echoes these back with the keys sorted alphabetically
'age':'112'
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)
Output:
[root@controller ~]# python3 requerst.py
{
"args": {
"age": "112",
"name": "ttt"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.24.0",
"X-Amzn-Trace-Id": "Root=1-60c95565-06933f5864afcc277effbd91"
},
"origin": "58.20.91.21",
"url": "http://httpbin.org/get?name=ttt&age=112"
}
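The params dict is URL-encoded automatically. One way to see the final URL without sending anything over the network is to prepare the request by hand — a small sketch using requests.Request:

```python
import requests

# Building a Request and calling .prepare() produces the final URL,
# including the encoded query string, without actually sending it.
data = {'name': 'ttt', 'age': '112'}
req = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(req.url)  # http://httpbin.org/get?name=ttt&age=112
```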
Parsing JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. Based on a subset of ECMAScript (the JS standard defined by the European Computer Manufacturers Association), it stores and represents data in a text format that is completely language-independent. Its concise, clear hierarchical structure makes JSON an ideal data-interchange language: easy for humans to read and write, easy for machines to parse and generate, and efficient to transmit over the network.
import requests
import json
response = requests.get('http://httpbin.org/get') ## send the request
print(type(response.text)) ## type of the response text
print(response.json()) ## the response parsed as JSON
print(json.loads(response.text)) ## same result via json.loads
print(response.text) ## the raw response text
print(type(response.json())) ## type of the parsed JSON
#### json.loads and response.json() produce identical results
Output:
[root@controller ~]# python3 requerst.py
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-60c99b3d-75c3adbb05cd9d5c362e9e27'}, 'origin': '111.8.55.226', 'url': 'http://httpbin.org/get'}
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-60c99b3d-75c3adbb05cd9d5c362e9e27'}, 'origin': '111.8.55.226', 'url': 'http://httpbin.org/get'}
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.24.0",
"X-Amzn-Trace-Id": "Root=1-60c99b3d-75c3adbb05cd9d5c362e9e27"
},
"origin": "111.8.55.226",
"url": "http://httpbin.org/get"
}
<class 'dict'>
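As the comments above note, response.json() is essentially json.loads(response.text). The equivalence can be shown offline, with a literal JSON string standing in for response.text:

```python
import json

# A sample JSON body, as response.text would contain it.
body = '{"args": {}, "url": "http://httpbin.org/get"}'

# json.loads turns the string into a Python dict -- the same thing
# response.json() does internally with the response body.
parsed = json.loads(body)
print(type(parsed), parsed['url'])
```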
Fetching binary data
When downloading images or videos, the usual approach is to fetch the file's binary data.
import requests
response = requests.get('https://dss1.bdstatic.com/70cFvXSh_Q1YnxGkpoWK1HF6hhy/it/u=3700536893,2599774500&fm=26&gp=0.jpg')
print(type(response.text),type(response.content))
print(response.text)
print(response.content)
Running this prints the binary data.
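In practice the bytes in response.content are written straight to disk rather than printed. A sketch using httpbin's sample PNG endpoint as a stand-in for any image URL:

```python
import requests

# Fetch a sample PNG; response.content holds the raw bytes.
response = requests.get('http://httpbin.org/image/png')

# Open the output file in binary mode ('wb') and write the bytes.
with open('sample.png', 'wb') as f:
    f.write(response.content)

print(response.status_code, len(response.content))
```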
Adding headers
When writing a crawler, if you omit the headers some sites will identify you as a bot and refuse access; supplying headers helps avoid this.
headers = {
'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)', ## used by servers to decide whether the request comes from a browser
'Host': 'httpbin.org'
}
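Passing the dict via the headers= parameter, httpbin's /get endpoint echoes the request headers back, so you can confirm the custom User-Agent was actually sent:

```python
import requests

headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
}
response = requests.get('http://httpbin.org/get', headers=headers)

# httpbin echoes the headers it received under the 'headers' key.
echoed = response.json()['headers']['User-Agent']
print(echoed)
```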
POST requests
The biggest difference between POST and GET is that POST submits a form; with requests you can build the form as a dict and pass it in.
[root@controller ~]# cat requerst-post.py
import requests
data= {
'name':'baixie',
'age':'22'
}
headers = {
'User-Agent':'Mozilla/5.0',
'Host':'httpbin.org'
}
url = 'http://httpbin.org/post'
response = requests.post(url, data=data, headers=headers)
print(response.json())
print(response.text)
Output:
[root@controller ~]# python3 requerst-post.py
{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'baixie'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-60c9a6e8-3e453e4345ef1785145916f5'}, 'json': None, 'origin': '111.8.55.226', 'url': 'http://httpbin.org/post'}
{
"args": {},
"data": "",
"files": {},
"form": {
"age": "22",
"name": "baixie"
},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "18",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.24.0",
"X-Amzn-Trace-Id": "Root=1-60c9a6e8-3e453e4345ef1785145916f5"
},
"json": null,
"origin": "111.8.55.226",
"url": "http://httpbin.org/post"
}
Response
response attributes
[root@controller ~]# cat requerst-xiangying.py
import requests
response = requests.get('http://httpbin.org/get')
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)
print(type(response.text))
print(response.text)
Output:
[root@controller ~]# python3 requerst-xiangying.py
<class 'int'> 200 ## status code
<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Wed, 16 Jun 2021 07:53:20 GMT', 'Content-Type': 'application/json', 'Content-Length': '305', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'} ## type and value of headers
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]> ## type and value of cookies
<class 'str'> http://httpbin.org/get ## type and value of url
<class 'list'> [] ## type and value of history
<class 'str'> ## type of text
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.24.0",
"X-Amzn-Trace-Id": "Root=1-60c9adf0-066c3a060261c68e026817fe"
},
"origin": "111.8.55.226",
"url": "http://httpbin.org/get"
}
Checking the status code
import requests
response = requests.get('http://httpbin.org/get')
exit() if not response.status_code == requests.codes.ok else print('request succeeded')
Output:
[root@controller ~]# python3 requerst-xiangying.py
request succeeded
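requests.codes maps status-code names to their numbers, which is what lets the comparison above read as words rather than magic numbers. This works entirely offline:

```python
import requests

# requests.codes is a lookup object mapping names to status codes.
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
```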
Advanced usage
File upload
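A minimal upload sketch: pass an open file object through the files= parameter and requests builds the multipart/form-data body. The name favicon.ico is arbitrary; a tiny dummy file is created first so the example is self-contained, and httpbin's /post endpoint echoes the uploaded content back under 'files'.

```python
import requests

# Create a small dummy file to stand in for a real upload.
with open('favicon.ico', 'wb') as f:
    f.write(b'test-bytes')

# Pass the open file via files=; requests handles the multipart body.
with open('favicon.ico', 'rb') as f:
    response = requests.post('http://httpbin.org/post', files={'file': f})

print(response.json()['files'])
```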
Getting cookies
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies)
for key,value in response.cookies.items():
print(key+"="+value)
.items() returns the keys and values in order.
Output:
[root@controller ~]# python3 requerst-cookies.py
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
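The key/value loop works because RequestsCookieJar supports dict-style iteration; building a jar by hand (no network needed) shows the same behavior:

```python
import requests

# Build a cookie jar manually with the same cookie baidu sets.
jar = requests.cookies.RequestsCookieJar()
jar.set('BDORZ', '27315', domain='.baidu.com', path='/')

# .items() yields (name, value) pairs, just as in the loop above.
pairs = {key: value for key, value in jar.items()}
print(pairs)
```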
Session persistence
Once you have the cookies, you can simulate a login.
In requests, calling get(), post(), etc. directly does let you mimic page requests, but each call effectively runs in a separate session.
import requests
s=requests.Session()
s.get('http://httpbin.org/cookies/set/123/12345')
r = s.get('http://httpbin.org/cookies')
print(r.text)
A Session keeps the current cookies, so the server treats the calls as coming from a single browser, and the cookie can then be printed.
Requests issued through requests.Session() mimic a browser talking to the server and maintain the login session.
Output:
[root@controller ~]# python3 requerst-Session.py
{
"cookies": {
"123": "12345"
}
}
Certificate verification
When requests accesses an HTTPS site, it checks whether the certificate is valid and raises an error if it is not.
To avoid the error, set the verify parameter of requests to False.
Test:
import requests
response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)
Output:
200
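One caveat: with verify=False each call still triggers urllib3's InsecureRequestWarning. It can be silenced explicitly; the sketch below uses https://httpbin.org rather than 12306.cn so it runs anywhere:

```python
import requests
import urllib3

# Suppress the InsecureRequestWarning that verify=False provokes.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

response = requests.get('https://httpbin.org/get', verify=False)
print(response.status_code)
```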
Proxy settings
import requests
proxies = {
'http': 'http://192.168.137.123', ### replace these with your own proxy addresses
'https': 'https://192.168.137.123'
}
response = requests.get("http://www.baidu.com", proxies=proxies)
print(response.status_code)
Output:
[root@controller ~]# python3 requerst-proxy.py
200
Timeout settings
import requests
response = requests.get('http://www.baidu.com', timeout=1)
print(response.status_code)
The timeout parameter sets the maximum wait time; if the request takes longer than this, a requests.exceptions.Timeout is raised.
执行
[root@controller ~]# python3 requerst-time.py
200
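To handle the error instead of crashing, catch requests.exceptions.Timeout; an unreachably small timeout forces that path:

```python
import requests

# A timeout this small cannot be met, so the Timeout branch runs.
try:
    requests.get('http://httpbin.org/get', timeout=0.0001)
    timed_out = False
except requests.exceptions.Timeout:
    timed_out = True

print(timed_out)
```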
Authentication
Some sites require a username and password.
import requests
from requests.auth import HTTPBasicAuth
response = requests.get('URL requiring authentication', auth=HTTPBasicAuth('username', 'password'))
print(response.status_code)
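requests also accepts a plain (user, password) tuple as shorthand for HTTPBasicAuth. httpbin's /basic-auth/{user}/{passwd} endpoint returns 200 only when matching credentials are supplied, which makes it handy for testing:

```python
import requests

# The tuple form is equivalent to auth=HTTPBasicAuth('user', 'passwd').
response = requests.get('http://httpbin.org/basic-auth/user/passwd',
                        auth=('user', 'passwd'))
print(response.status_code)
```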