This article, collected and organized by 脚本宝典, introduces how to scrape data from child pages. We found it quite useful and share it here for your reference.
import requests
import re
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40"
}
url = "https://dytt89.com"
# verify=False skips SSL certificate verification
resp = requests.get(url, headers=headers, verify=False)
# the site is served in GBK, so set the character set explicitly
resp.encoding = 'gbk'
print(resp.text)
resp.close()
import requests
import re
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40"
}
url = "https://dytt89.com"
resp = requests.get(url, headers=headers, verify=False)
resp.encoding = 'gbk'
# regex matching the tag that wraps the child-page links
obj1 = re.compile(r'2021必看热片.*?<ul>(?P<ul>.*?)</ul>', re.S)
# regex matching each child-page link
obj2 = re.compile(r"<a href='(?P<href>.*?)'", re.S)
result1 = obj1.finditer(resp.text)
for i in result1:
    ul = i.group('ul')
    # extract the child-page links
    result2 = obj2.finditer(ul)
    for j in result2:
        print(j.group('href'))
resp.close()
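The hrefs captured by obj2 are site-relative paths, so the full child-page URL is built by joining them with the base URL. Plain string concatenation works only when the base and the path agree about the slash between them; urllib.parse.urljoin handles both cases. A minimal sketch (the sample path is made up for illustration):

```python
from urllib.parse import urljoin

base = "https://dytt89.com"
# urljoin produces the same absolute URL whether or not the base
# ends with a trailing slash, unlike plain concatenation.
child_url = urljoin(base, "/i/12345.html")
print(child_url)  # → https://dytt89.com/i/12345.html
```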
import requests
import re
headers = {
"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40"
}
url = "https://dytt89.com"
resp = requests.get(url,headers=headers,verify=False)
resp.encoding = 'gbk'
obj1 = re.compile(r'2021必看热片.*?<ul>(?P<ul>.*?)</ul>',re.S)
obj2 = re.compile(r"<a href='(?P<href>.*?)'",re.S)
obj3 = re.compile(
    r'◎片 名(?P<movie>.*?)<br />.*?'
    r'<td style="WORD-WRAP: break-word" bgcolor="#fdfddf"><a href="(?P<download>.*?)&tr',
    re.S)
result1 = obj1.finditer(resp.text)
# list that collects the child-page links
child_href_list = []
for i in result1:
    ul = i.group('ul')
    result2 = obj2.finditer(ul)
    for j in result2:
        # build the full child-page URL
        child_href = url + j.group('href')
        # add it to the list
        child_href_list.append(child_href)
for k in child_href_list:
    child_resp = requests.get(k, headers=headers, verify=False)
    child_resp.encoding = 'gbk'
    # extract the movie title and the download link
    result3 = obj3.search(child_resp.text)
    print(result3.group('movie').strip())
    print(result3.group('download'))
    child_resp.close()
resp.close()
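One fragile spot in the loop above: if a detail page does not contain the fields obj3 expects, search() returns None and the following .group() call raises AttributeError, crashing the whole run. A minimal guard looks like this (the HTML fragment below is invented purely to exercise the pattern):

```python
import re

obj3 = re.compile(
    r'◎片 名(?P<movie>.*?)<br />.*?'
    r'<td style="WORD-WRAP: break-word" bgcolor="#fdfddf"><a href="(?P<download>.*?)&tr',
    re.S)

# A made-up fragment in the detail-page shape, just for illustration.
sample = ('◎片 名 Example Movie<br />other fields...'
          '<td style="WORD-WRAP: break-word" bgcolor="#fdfddf">'
          '<a href="magnet:?xt=urn:btih:abc123&tr')

match = obj3.search(sample)
if match:  # search() returns None when a page does not match
    print(match.group('movie').strip())
    print(match.group('download'))
else:
    print('pattern not found on this page')
```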