Basics of the python3 requests library

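import requests ## import the requests library
response = requests.get('http://www.baidu.com') ## request the page and download its data
print(type(response)) ## type of the returned object
print(response.status_code) ## HTTP status code returned by the site
print(type(response.text)) ## type of the page content
print(response.text) ## the page content itself (HTML source)
print(response.cookies)  ## cookies set by the site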

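A cookie (often used in the plural, cookies) is a small piece of text data that some websites store on the user's local device to identify the user and track the session (usually encrypted). The client keeps it either temporarily or permanently.

Output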

[root@controller ~]# python3 requerst.py 
<class 'requests.models.Response'>
200
<class 'str'>
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç¾åº¦ä¸ä¸ï¼ä½ å°±ç¥é</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=ç¾åº¦ä¸ä¸ class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>æ°é»</a> <a href=http://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>å°å¾</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>è§é¢</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>è´´å§</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>ç»å½</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" 
: "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">ç»å½</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">æ´å¤äº§å</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>å³äºç¾åº¦</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使ç¨ç¾åº¦åå¿è¯»</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>æè§åé¦</a>&nbsp;京ICPè¯030173å·&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
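The garbled characters in the HTML above (for example in the title tag) look like UTF-8 bytes decoded as ISO-8859-1. That is the encoding requests falls back to for text responses whose Content-Type header names no charset. The sketch below reproduces the effect with the standard library only. The title string is an assumption about what the page actually contains.

```python
# Assumed original page title (UTF-8 encoded on the wire).
title = '百度一下,你就知道'
raw_bytes = title.encode('utf-8')

# Decoding UTF-8 bytes as ISO-8859-1 produces garbled text like the output above.
garbled = raw_bytes.decode('iso-8859-1')

# Decoding with the matching codec recovers the title.
recovered = raw_bytes.decode('utf-8')
print(recovered == title, garbled == title)
```

With requests itself, the usual remedy is to set response.encoding = 'utf-8' (or response.encoding = response.apparent_encoding) before reading response.text.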
Other request methods
requests.post('http://httpbin.org/post')
requests.delete('http://httpbin.org/delete')
requests.put('http://httpbin.org/put')
requests.head('http://httpbin.org/head')
requests.options('http://httpbin.org/get')

GET requests

httpbin.org is a service for testing HTTP requests.

[root@controller ~]# cat requerst.py 
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)

[root@controller ~]# python3 requerst.py 
{
  "args": {},                           ## query arguments
  "headers": {                          ## request headers
    "Accept": "*/*",                    ## accepted content types
    "Accept-Encoding": "gzip, deflate",  ## accepted encodings
    "Host": "httpbin.org",              ## host
    "User-Agent": "python-requests/2.24.0",  ## user agent
    "X-Amzn-Trace-Id": "Root=1-60c94fac-29fd348604dc2a1120e5ccee" ## trace ID
  }, 
  "origin": "58.20.91.20",  ## source IP
  "url": "http://httpbin.org/get"
}
GET with parameters
[root@controller ~]# cat requerst.py 
import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22') ## parameters appended to the URL
print(response.text)

Run

[root@controller ~]# python3 requerst.py 
{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.24.0", 
    "X-Amzn-Trace-Id": "Root=1-60c9525d-6386601736ddb9e06987a0b7"
  }, 
  "origin": "58.20.91.22", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}
Parameters can also be passed as a dictionary
[root@controller ~]# cat requerst.py 
import requests
data = {
'name':'ttt',   ### httpbin lists the returned args sorted by key
'age':'112'
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)

Run

[root@controller ~]# python3 requerst.py 
{
  "args": {
    "age": "112", 
    "name": "ttt"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.24.0", 
    "X-Amzn-Trace-Id": "Root=1-60c95565-06933f5864afcc277effbd91"
  }, 
  "origin": "58.20.91.21", 
  "url": "http://httpbin.org/get?name=ttt&age=112"
}
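To see how the params dictionary ends up in the URL without contacting the server, the request can be prepared but not sent. This is a minimal sketch using requests.Request and its prepare() method, with the same dictionary as above.

```python
import requests

# Same dictionary as in the example above.
data = {'name': 'ttt', 'age': '112'}

# Build the request without sending it, so the final URL can be inspected.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)
```

The printed URL matches the "url" field in the httpbin output above: the dictionary is encoded into the query string in insertion order.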
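Parsing JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is based on a subset of ECMAScript (the JavaScript specification standardized by Ecma International) and uses a text format that is completely independent of any programming language to store and represent data. Its concise, clear hierarchy makes it well suited to data exchange: it is easy for people to read and write, easy for machines to parse and generate, and efficient to transmit over the network.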

import requests
import json

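response = requests.get('http://httpbin.org/get') ## send the request
print(type(response.text))  ## type of the response text (str)
print(response.json())      ## response body parsed as JSON
print(json.loads(response.text)) ## same result, parsed with json.loads
print(response.text)    ## raw response text
print(type(response.json()))    ## type of the parsed result (dict)
#### json.loads(response.text) and response.json() produce the same result

Run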

[root@controller ~]# python3 requerst.py 
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-60c99b3d-75c3adbb05cd9d5c362e9e27'}, 'origin': '111.8.55.226', 'url': 'http://httpbin.org/get'}
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-60c99b3d-75c3adbb05cd9d5c362e9e27'}, 'origin': '111.8.55.226', 'url': 'http://httpbin.org/get'}
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.24.0", 
    "X-Amzn-Trace-Id": "Root=1-60c99b3d-75c3adbb05cd9d5c362e9e27"
  }, 
  "origin": "111.8.55.226", 
  "url": "http://httpbin.org/get"
}

<class 'dict'>
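The str-to-dict conversion shown above can be tried without a network connection. The sketch below uses only the standard json module on a trimmed copy of the httpbin response text. The trimming is for brevity and is not part of the original output.

```python
import json

# A trimmed copy of the httpbin response text shown above.
text = '{"args": {}, "origin": "111.8.55.226", "url": "http://httpbin.org/get"}'

parsed = json.loads(text)            # str -> dict
print(type(parsed))
print(parsed['url'])                 # values are reached by key
print(json.dumps(parsed, indent=2))  # dict -> pretty-printed str
```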
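Fetching binary data

When downloading images or videos, the usual approach is to fetch the file's binary data.

import requests

response = requests.get('https://dss1.bdstatic.com/70cFvXSh_Q1YnxGkpoWK1HF6hhy/it/u=3700536893,2599774500&fm=26&gp=0.jpg')
print(type(response.text),type(response.content)) ## str and bytes
print(response.text) ## the bytes decoded as text (unreadable for an image)
print(response.content) ## the raw bytes

Running this prints the raw bytes of the image.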
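Printing the bytes is rarely the goal; usually they are written to a file. The sketch below shows that step in isolation. The helper name save_bytes, the file name, and the stand-in payload are all hypothetical; in practice response.content would be passed as data.

```python
import os
import tempfile

def save_bytes(path, data):
    """Write raw bytes to path and return the number of bytes written."""
    with open(path, 'wb') as f:   # 'wb': binary mode, no text decoding
        return f.write(data)

# Stand-in for response.content: a JPEG-style header followed by padding.
fake_content = b'\xff\xd8\xff\xe0' + b'\x00' * 12

path = os.path.join(tempfile.mkdtemp(), 'image.jpg')
written = save_bytes(path, fake_content)
print(written)
```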

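Adding headers

When writing a crawler, a request sent without browser-like headers may be identified as a crawler and refused by the site; sending suitable headers helps avoid this.

headers = {
'User-Agent': 'Mozilla/4.0(compatible; MSIE 5.5; Windows NT)', ## used by the server to judge whether the request comes from a browser
'Host': 'httpbin.org' ## host the request is addressed to
}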
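The dictionary is handed to requests through the headers keyword. The sketch below prepares a GET request with these headers without sending it, so the result can be inspected offline.

```python
import requests

headers = {
    'User-Agent': 'Mozilla/4.0(compatible; MSIE 5.5; Windows NT)',
    'Host': 'httpbin.org',
}

# Prepare (but do not send) a GET request carrying the custom headers.
prepared = requests.Request('GET', 'http://httpbin.org/get', headers=headers).prepare()
print(prepared.headers['User-Agent'])
```

To actually send the request, the same dictionary would be passed as requests.get('http://httpbin.org/get', headers=headers).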

POST requests

The main difference from GET is that a POST request carries form data in its body. With requests, the form can be built as a dictionary and passed in.

[root@controller ~]# cat requerst-post.py 


import requests
data= {
'name':'baixie',
'age':'22'
}

headers = {
'User-Agent':'Mozilla/5.0',
'Host':'httpbin.org'
}
url = 'http://httpbin.org/post'
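response = requests.post(url,data=data,headers=headers) ## pass headers by keyword; the third positional argument of requests.post is json, not headers
print(response.json())
print(response.text)

Run

Note: the output below was captured with the headers passed positionally, as requests.post(url,data,headers), so they were not applied and the default python-requests User-Agent appears.

[root@controller ~]# python3 requerst-post.py 

{'args': {}, 'data': '', 'files': {}, 'form': {'age': '22', 'name': 'baixie'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '18', 'Content-Type': 'application/x-www-form-urlencoded', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.24.0', 'X-Amzn-Trace-Id': 'Root=1-60c9a6e8-3e453e4345ef1785145916f5'}, 'json': None, 'origin': '111.8.55.226', 'url': 'http://httpbin.org/post'}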

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "22", 
    "name": "baixie"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.24.0", 
    "X-Amzn-Trace-Id": "Root=1-60c9a6e8-3e453e4345ef1785145916f5"
  }, 
  "json": null, 
  "origin": "111.8.55.226", 
  "url": "http://httpbin.org/post"
}

Responses

Response attributes
[root@controller ~]# cat requerst-xiangying.py 
import requests 

response = requests.get('http://httpbin.org/get')
print(type(response.status_code),response.status_code)
print(type(response.headers),response.headers)
print(type(response.cookies),response.cookies)
print(type(response.url),response.url)
print(type(response.history),response.history)
print(type(response.text))
print(response.text)

Run

[root@controller ~]# python3 requerst-xiangying.py

<class 'int'> 200 ## type and value of the status code

<class 'requests.structures.CaseInsensitiveDict'> {'Date': 'Wed, 16 Jun 2021 07:53:20 GMT', 'Content-Type': 'application/json', 'Content-Length': '305', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'} ## type and value of headers

<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]> ## type and value of cookies

<class 'str'> http://httpbin.org/get ## type and value of url

<class 'list'> [] ## type and value of history

<class 'str'> ## type of text

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.24.0", 
    "X-Amzn-Trace-Id": "Root=1-60c9adf0-066c3a060261c68e026817fe"
  }, 
  "origin": "111.8.55.226", 
  "url": "http://httpbin.org/get"
}
Checking the status code
import requests 

response = requests.get('http://httpbin.org/get')
exit() if response.status_code != requests.codes.ok else print('Request succeeded')

Run

[root@controller ~]# python3 requerst-xiangying.py 
Request succeeded
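requests.codes is a lookup object that maps readable names to numeric status codes, which is why the comparison above works. A minimal offline sketch follows; the helper is_success is a hypothetical name introduced here.

```python
import requests

# requests.codes maps readable names to numeric HTTP status codes.
print(requests.codes.ok)          # 200
print(requests.codes.not_found)   # 404

def is_success(status_code):
    """Return True only for the plain 200 OK status."""
    return status_code == requests.codes.ok

print(is_success(200), is_success(404))
```

An alternative is to call response.raise_for_status(), which raises an exception for 4xx and 5xx responses instead of requiring a manual comparison.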

Advanced usage

File upload
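The notes give no example under this heading. As a minimal sketch, assuming the standard files parameter of requests, an upload can be prepared and inspected without sending it. The field name, file name, and file contents below are made up for illustration.

```python
import requests

# 'file' is the form field name; the tuple gives a filename and the file contents.
# Both are hypothetical values chosen for this sketch.
files = {'file': ('hello.txt', b'hello world')}

# Prepare (but do not send) the upload so the encoded request can be inspected.
prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()
print(prepared.headers['Content-Type'].split(';')[0])
```

To actually upload a file from disk, an open binary file object would typically be passed instead, for example requests.post(url, files={'file': open(path, 'rb')}).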
Getting cookies

A cookie, as noted at the start, is a small piece of text data that a website stores on the client to identify the user and track the session.

import requests 


response = requests.get('http://www.baidu.com')
print(response.cookies)
for key,value in response.cookies.items():
    print(key+"="+value)

.items() returns the cookie names and values as key/value pairs, in order.

Run

[root@controller ~]# python3 requerst-cookies.py 
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
BDORZ=27315
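Session persistence

Once the cookies have been obtained, a logged-in session can be simulated.

In requests, calling get(), post() and similar functions directly does simulate web requests, but each call is effectively a separate session, so cookies set by one call are not sent with the next.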

import requests

s=requests.Session()
s.get('http://httpbin.org/cookies/set/123/12345')
r = s.get('http://httpbin.org/cookies')
print(r.text)

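The Session keeps the current cookies, so the server sees both requests as coming from the same browser and the cookie can be printed.

Issuing requests through requests.Session() simulates a browser talking to the server and keeps the login session alive.

Run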

[root@controller ~]# python3 requerst-Session.py 
{
  "cookies": {
    "123": "12345"
  }
}
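The difference between a session and standalone calls can also be seen offline by preparing requests without sending them. In this sketch the cookie is placed in the session's cookie jar by hand rather than received from the server, which is an assumption made so that the example needs no network.

```python
import requests

url = 'http://httpbin.org/cookies'

# A session keeps a cookie jar; here the cookie is placed in it by hand
# instead of being received from the server.
s = requests.Session()
s.cookies.set('123', '12345')

# Requests prepared through the session carry the stored cookie ...
with_session = s.prepare_request(requests.Request('GET', url))
print(with_session.headers.get('Cookie'))

# ... while a standalone request has no cookie to send.
standalone = requests.Request('GET', url).prepare()
print(standalone.headers.get('Cookie'))
```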

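Certificate verification

When requests accesses an HTTPS site, it checks whether the site's certificate is valid and raises an error if the check fails.

To avoid the error, set the verify parameter of the request to False. This skips the check entirely, so use it only for sites you trust.

Test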

import requests

response = requests.get('https://www.12306.cn',verify=False)
print(response.status_code)

Run

200
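Proxy settings
import requests

proxies = {
'http':'http://192.168.137.123/dashboard',  ### replace with the address of your own proxy
'https':'https://192.168.137.123'
}

response = requests.get("http://www.baidu.com",proxies=proxies) ## must be passed by keyword; the second positional argument of requests.get is params
print(response.status_code)

Run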

[root@controller ~]# python3 requerst-proxy.py 
200
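The keys of the proxies dictionary are URL schemes, and the proxy is chosen by the scheme of the URL being requested. The sketch below illustrates this with select_proxy, a helper in requests.utils that requests uses internally; treat it as an inspection aid rather than a stable public API. The proxy address and port are hypothetical.

```python
from requests.utils import select_proxy

# Hypothetical proxy address and port; replace with a real proxy.
proxies = {
    'http': 'http://192.168.137.123:8080',
    'https': 'http://192.168.137.123:8080',
}

# The proxy is chosen by the scheme of the URL being requested.
print(select_proxy('http://www.baidu.com', proxies))
print(select_proxy('https://www.baidu.com', proxies))
```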
Timeout settings
import requests
response = requests.get('http://www.baidu.com',timeout = 0.0000001)
print(response.status_code)    

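The timeout parameter sets a time limit in seconds; if the site does not respond within that time, an exception is raised. With a limit as small as the one above, the request would normally fail with a timeout error, so the 200 shown below presumably comes from a run with a larger value.

Run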

[root@controller ~]# python3 requerst-time.py 
200
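In practice a timeout is usually paired with exception handling. The sketch below catches requests.exceptions.Timeout, the base class of both connect and read timeouts. The names fetch_status and always_times_out are hypothetical, and the get argument exists only so the example can run without a network.

```python
import requests

def fetch_status(url, timeout, get=requests.get):
    """Return the status code, or None if the request times out.

    `get` is injectable so the function can be exercised without a network.
    """
    try:
        return get(url, timeout=timeout).status_code
    except requests.exceptions.Timeout:
        return None

def always_times_out(url, timeout):
    # Hypothetical stand-in for requests.get that simulates a slow server.
    raise requests.exceptions.ConnectTimeout('simulated timeout')

print(fetch_status('http://www.baidu.com', 0.0000001, get=always_times_out))
```

Called without the get argument, fetch_status uses the real requests.get and returns the status code when the server answers in time.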
Authentication settings

Some sites require a username and password.

import requests

from requests.auth import HTTPBasicAuth

response = requests.get('URL to access',auth = HTTPBasicAuth('username','password'))
print(response.status_code)
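What HTTPBasicAuth adds to a request can be inspected offline by preparing the request without sending it. The URL and credentials below are hypothetical placeholders chosen for this sketch.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

# Hypothetical URL and credentials, used only to inspect the generated header.
url = 'http://httpbin.org/basic-auth/user/passwd'
prepared = requests.Request('GET', url, auth=HTTPBasicAuth('user', 'passwd')).prepare()

# Basic auth sends "username:password" Base64-encoded in the Authorization header.
expected = 'Basic ' + base64.b64encode(b'user:passwd').decode('ascii')
print(prepared.headers['Authorization'] == expected)
```

requests also accepts a plain tuple, auth=('user', 'passwd'), as shorthand for HTTPBasicAuth. Because Base64 is an encoding rather than encryption, basic auth is best used over HTTPS.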
