This article, collected and organized by Script House (脚本宝典), introduces saving scraped data with an object-oriented approach in Python; it is shared here as a reference.
Saving the scraped data with an object-oriented design.
The code:
```python
"""
Douban Top 250: four ways to save the data (豆瓣top250四种保存方式)
"""
import csv
import random
import time

import parsel
import requests


class DoubanSpider:
    # url = 'https://movie.douban.com/top250'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/96.0.4664.110 Safari/537.36',
        'Cookie': 'cookie',  # replace with your own cookie if needed
    }

    def __init__(self, url, headers=headers):
        self.url = url
        self.headers = headers

    def getHtml(self):
        response = requests.get(url=self.url, headers=self.headers)
        response.encoding = 'utf-8'
        return response.text

    def parseHtmlByXpath(self):
        movieListDatas = []
        movieDictDatas = []
        selector = parsel.Selector(self.getHtml())
        results = selector.xpath('//div/ol/li')
        for item in results:
            title = item.xpath('.//div[@class="hd"]/a/span[1]/text()').get()
            movieInfo = item.xpath('.//div[@class="bd"]/p/text()').getall()
            # first line looks like "导演: X\xa0\xa0\xa0主演: Y"
            # (the \xa0 escapes were lost in the original transcription)
            director = movieInfo[0].split('\xa0\xa0\xa0')[0].strip()
            try:
                actors = movieInfo[0].split('\xa0\xa0\xa0')[1].strip()
            except IndexError:
                actors = '请从详情页获取!'
            # second line looks like "1994\xa0/\xa0国家\xa0/\xa0类型"
            releaseYear = movieInfo[1].split('\xa0/\xa0')[0].strip()
            country = movieInfo[1].split('\xa0/\xa0')[1].strip()
            movieType = movieInfo[1].split('\xa0/\xa0')[2].strip()
            movieStar = item.xpath('.//div[@class="star"]/span[@class="rating_num"]/text()').get()
            reviewCount = item.xpath('.//div[@class="star"]/span[last()]/text()').get()
            # .get() already returns None when the quote is missing
            oneWordDes = item.xpath('.//p[@class="quote"]/span/text()').get()
            movieListDatas.append([title, director, actors, releaseYear, country,
                                   movieType, movieStar, reviewCount, oneWordDes])
            dit = {
                '电影名称': title,
                '导演': director,
                '演员': actors,
                '年份': releaseYear,
                '国家': country,
                '类型': movieType,
                '评分': movieStar,
                '评论总数': reviewCount,
                '一句话描述': oneWordDes,
            }
            print(dit)
            movieDictDatas.append(dit)
        return movieListDatas, movieDictDatas

    def saveToCsv(self):
        # csv.DictWriter expects each row as a dict keyed by the fieldnames
        f = open('20211229豆瓣top250.csv', mode='a', encoding='utf-8-sig', newline='')
        csvWriter = csv.DictWriter(f, fieldnames=[
            '电影名称', '导演', '演员', '年份', '国家',
            '类型', '评分', '评论总数', '一句话描述',
        ])
        csvWriter.writeheader()  # write the header row
        _, dictDatas = self.parseHtmlByXpath()
        for data in dictDatas:
            csvWriter.writerow(data)
        f.close()

    def saveToCsv2(self):
        # csv.writer expects each row as a plain list; the header is written by hand
        f = open('20211229豆瓣250.csv', mode='a', encoding='utf-8', newline='')
        lis = ['电影名称', '导演', '演员', '年份', '国家',
               '类型', '评分', '评论总数', '一句话描述']
        csvWriter = csv.writer(f)
        csvWriter.writerow(lis)
        listDatas, _ = self.parseHtmlByXpath()
        for data in listDatas:
            csvWriter.writerow(data)
        f.close()

    def run(self):
        self.saveToCsv2()


if __name__ == "__main__":
    for start in range(0, 250, 25):
        print(f'************************正在爬取{start // 25 + 1}页内容************************')
        time.sleep(random.uniform(2, 5))
        url = f'https://movie.douban.com/top250?start={start}&filter='
        app = DoubanSpider(url=url)
        app.run()
        break  # remove this break to crawl all 10 pages
```

Note that the original `parseHtmlByXpath` returned only the list rows, so the `csv.DictWriter` path in `saveToCsv` could not work; the version above returns both the list rows and the dict rows, and each save method unpacks the one it needs.
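The two CSV methods differ only in the row format the `csv` module expects: `csv.DictWriter` consumes dict rows keyed by `fieldnames` (like `movieDictDatas`), while `csv.writer` consumes plain lists (like `movieListDatas`). A minimal standalone sketch illustrating that both produce identical output; the sample row here is made up for illustration and is not from the scraper:

```python
import csv
import io

fieldnames = ['电影名称', '评分']
dict_row = {'电影名称': '霸王别姬', '评分': '9.6'}  # shape of movieDictDatas rows
list_row = ['霸王别姬', '9.6']                      # shape of movieListDatas rows

# csv.DictWriter: rows are dicts, column order comes from fieldnames
buf1 = io.StringIO()
w1 = csv.DictWriter(buf1, fieldnames=fieldnames)
w1.writeheader()
w1.writerow(dict_row)

# csv.writer: rows are plain sequences, the header row is written by hand
buf2 = io.StringIO()
w2 = csv.writer(buf2)
w2.writerow(fieldnames)
w2.writerow(list_row)

# both writers emit the same CSV text
print(buf1.getvalue() == buf2.getvalue())  # -> True
```

`DictWriter` is the safer choice when columns may be added or reordered later, since each value is matched to its column by key rather than by position.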