Scraping a Web Novel

《1/13密室杀人》

  • Author: 鸡丁

  • Subtitle: 鸡丁密室推理短篇集 (a collection of locked-room mystery short stories by 鸡丁)

  • Published: 2013-9

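鸡丁 was born in Shanghai and took this pen name from a fondness for kung pao chicken (宫保鸡丁). A mystery writer born in the 1980s, a Scorpio, and something of a homebody, the author has been devoted to mystery fiction since high school and especially admires Edogawa Ranpo and John Dickson Carr. Since 2008 the author has written short mystery stories and puzzles, favoring the orthodox (honkaku) style and locked-room and impossible-crime themes. Locked-room stories published in 《岁月·推理》 and 《推理世界》 include 《斩首缆车》, 《神的密室》, 《憎恶之锤》, and 《雪祭》. 鸡丁 is a contract author for 《推理世界》.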

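import requests
from bs4 import BeautifulSoup

# Table-of-contents page for the novel
url = 'https://www.bixiadu.com/bxd-3501/'
page = requests.get(url)
page.encoding = 'utf-8'
soup = BeautifulSoup(page.text, 'html.parser')

# Chapter links sit inside <dd> elements; the set drops duplicate links
# and the `if dd.a` guard skips any <dd> that has no link
hrefs = {dd.a['href'] for dd in soup.find_all('dd') if dd.a}
# The hrefs are relative file names, so prepend the index URL
urls = [url + href for href in sorted(hrefs)]
urls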
['https://www.bixiadu.com/bxd-3501/2065645.html',
 'https://www.bixiadu.com/bxd-3501/2065646.html',
 'https://www.bixiadu.com/bxd-3501/2065647.html',
 'https://www.bixiadu.com/bxd-3501/2065648.html',
 'https://www.bixiadu.com/bxd-3501/2065649.html',
 'https://www.bixiadu.com/bxd-3501/2065650.html',
 'https://www.bixiadu.com/bxd-3501/2065651.html',
 'https://www.bixiadu.com/bxd-3501/2065652.html',
 'https://www.bixiadu.com/bxd-3501/2065653.html',
 'https://www.bixiadu.com/bxd-3501/2065654.html',
 'https://www.bixiadu.com/bxd-3501/2065655.html',
 'https://www.bixiadu.com/bxd-3501/2065656.html',
 'https://www.bixiadu.com/bxd-3501/2065657.html',
 'https://www.bixiadu.com/bxd-3501/2065658.html',
 'https://www.bixiadu.com/bxd-3501/2065659.html',
 'https://www.bixiadu.com/bxd-3501/2065660.html',
 'https://www.bixiadu.com/bxd-3501/2065661.html',
 'https://www.bixiadu.com/bxd-3501/2065662.html',
 'https://www.bixiadu.com/bxd-3501/2065663.html',
 'https://www.bixiadu.com/bxd-3501/2065664.html',
 'https://www.bixiadu.com/bxd-3501/2065665.html',
 'https://www.bixiadu.com/bxd-3501/2065666.html',
 'https://www.bixiadu.com/bxd-3501/2065667.html',
 'https://www.bixiadu.com/bxd-3501/2065668.html',
 'https://www.bixiadu.com/bxd-3501/2065669.html',
 'https://www.bixiadu.com/bxd-3501/2065670.html']
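The list above is built by concatenating the index URL with each href, which works here because the hrefs are bare file names. As a hedged sketch of a more forgiving alternative, the standard library's `urllib.parse.urljoin` also resolves root-relative hrefs. The helper name `absolute_chapter_urls` and the sample hrefs are made up for illustration and do not come from the site.

```python
from urllib.parse import urljoin

def absolute_chapter_urls(base_url, hrefs):
    """Resolve each href against base_url, drop duplicates, and sort."""
    return sorted({urljoin(base_url, h) for h in hrefs})

base = 'https://www.bixiadu.com/bxd-3501/'
# Hypothetical hrefs: a bare file name, a root-relative path, and a duplicate
sample_hrefs = ['2065646.html', '/bxd-3501/2065645.html', '2065646.html']
chapter_urls = absolute_chapter_urls(base, sample_hrefs)
print(chapter_urls)
```

Note that sorting the URLs as strings matches chapter order only while every chapter id has the same number of digits, as it does in the output above.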
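# Open the output file once; append mode means a rerun adds the chapters again
with open('../data/13chapters.txt', 'a', encoding='utf-8') as f:
    for k, i in enumerate(urls):
        print(k, i)
        page = requests.get(i)
        page.encoding = 'utf-8'
        soup = BeautifulSoup(page.text, 'html.parser')
        # The chapter title is the <h1> inside the .bookname block
        title = soup.select_one('.bookname h1').text
        body = soup.select_one('#content').text
        # Paragraphs start with two ideographic spaces plus four non-breaking spaces
        body = body.replace('\u3000\u3000\xa0\xa0\xa0\xa0', '\n')
        # End each chapter with a newline so the next title starts on its own line
        f.write(title + '\n' + body + '\n')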
0 https://www.bixiadu.com/bxd-3501/2065645.html
1 https://www.bixiadu.com/bxd-3501/2065646.html
2 https://www.bixiadu.com/bxd-3501/2065647.html
3 https://www.bixiadu.com/bxd-3501/2065648.html
4 https://www.bixiadu.com/bxd-3501/2065649.html
5 https://www.bixiadu.com/bxd-3501/2065650.html
6 https://www.bixiadu.com/bxd-3501/2065651.html
7 https://www.bixiadu.com/bxd-3501/2065652.html
8 https://www.bixiadu.com/bxd-3501/2065653.html
9 https://www.bixiadu.com/bxd-3501/2065654.html
10 https://www.bixiadu.com/bxd-3501/2065655.html
11 https://www.bixiadu.com/bxd-3501/2065656.html
12 https://www.bixiadu.com/bxd-3501/2065657.html
13 https://www.bixiadu.com/bxd-3501/2065658.html
14 https://www.bixiadu.com/bxd-3501/2065659.html
15 https://www.bixiadu.com/bxd-3501/2065660.html
16 https://www.bixiadu.com/bxd-3501/2065661.html
17 https://www.bixiadu.com/bxd-3501/2065662.html
18 https://www.bixiadu.com/bxd-3501/2065663.html
19 https://www.bixiadu.com/bxd-3501/2065664.html
20 https://www.bixiadu.com/bxd-3501/2065665.html
21 https://www.bixiadu.com/bxd-3501/2065666.html
22 https://www.bixiadu.com/bxd-3501/2065667.html
23 https://www.bixiadu.com/bxd-3501/2065668.html
24 https://www.bixiadu.com/bxd-3501/2065669.html
25 https://www.bixiadu.com/bxd-3501/2065670.html
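The loop above splits paragraphs by replacing one exact run of whitespace characters, which stops matching if a page uses a slightly different indent. A minimal sketch of a more tolerant cleanup, assuming that any run of ideographic spaces (U+3000) and non-breaking spaces (U+00A0) marks a paragraph break, could look like this. The function name `clean_body` and the sample text are hypothetical.

```python
import re

# Any run of ideographic spaces (U+3000) and non-breaking spaces (U+00A0)
_INDENT = re.compile(r'[\u3000\xa0]+')

def clean_body(raw):
    """Split on indent runs, trim each piece, and join non-empty pieces with newlines."""
    parts = _INDENT.split(raw)
    return '\n'.join(p.strip() for p in parts if p.strip())

# Made-up sample text using the same indent pattern as the scraped pages
sample = ('\u3000\u3000\xa0\xa0\xa0\xa0First paragraph.'
          '\u3000\u3000\xa0\xa0\xa0\xa0Second paragraph.')
cleaned = clean_body(sample)
print(cleaned)
```

Under that assumption, `clean_body(body)` could stand in for the `body.replace(...)` line in the loop.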