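Sigh, I promised to post new words every day, so how did it already stop on day two???

The "zhēn xiāng" law strikes again (famous last words).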

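Well, the main reason is that yesterday I was busy building myself a tailor-made word crawler: just give it a list of words and it scrapes the related information (clever me) (totally obsessed, right?)

It's mostly for my own convenience and doesn't have much practical value beyond that.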

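So what did I learn?

If the foundations aren't solid, the whole building shakes 😅

I made plenty of mistakes along the way. The one that cost the most time: I didn't realize that the readline() function keeps the trailing newline character.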
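A minimal sketch of that pitfall, using io.StringIO as a stand-in for WordList.txt (a real text file returns lines the same way): the newline returned by readline() ends up glued to the URL unless it is stripped first.

```python
import io

# Stand-in for WordList.txt; a real file object behaves the same way
fake_file = io.StringIO("fabulous\nhello\ndog\n")

line = fake_file.readline()
print(repr(line))  # 'fabulous\n': the newline is kept

url = "https://www.vocabulary.com/dictionary/" + line
print(repr(url))   # the URL now ends with '\n'

# strip('\n') removes it before the URL is built
clean_url = "https://www.vocabulary.com/dictionary/" + line.strip("\n")
print(clean_url)

# readlines() keeps the newlines too; splitlines() on the whole text does not
fake_file.seek(0)
with_newlines = fake_file.readlines()
fake_file.seek(0)
without_newlines = fake_file.read().splitlines()
print(with_newlines)     # ['fabulous\n', 'hello\n', 'dog\n']
print(without_newlines)  # ['fabulous', 'hello', 'dog']
```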

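```python
import time

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below must be installed, but needs no import

# Browser-like User-Agent; optional, the sites also respond without it
kv = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36 Edg/83.0.478.56'}
# Local date for the file name (gmtime() gives the UTC date, which can lag a day behind)
t = time.localtime()

all_file = open("WordList.txt", "r", encoding="utf-8")

# DateTemp.txt stores the day counter: read it, then save counter + 1 for the next run
Date_file = open("DateTemp.txt", "r+", encoding="utf-8")
times = int(Date_file.read().strip())  # int() instead of eval(): never execute file contents
Date_file.seek(0)
Date_file.write(str(times + 1))
Date_file.truncate()  # drop any leftover characters after the new value
Date_file.close()

new_file = open("EveryDayWord-{}.md".format(time.strftime("%Y.%m.%d", t)), "w", encoding="utf-8")


def initial():
    # Front matter and the "day N" banner; write() adds no newlines, so each "\n" is explicit
    new_file.write("---\n")
    new_file.write("title: EveryDayWord-" + time.strftime("%Y.%m.%d", t) + "\n")
    new_file.write("tag: [English, Daily Must-Do]\n")
    new_file.write("---\n\n")
    new_file.write("<br>\n\n")
    new_file.write("<center style=\"font-size:200% \"><b>Keep going! This is day<span style=\"color:#B5D46E\"> {} </span>of the streak</b></center>\n\n".format(times))
    new_file.write("<br>\n")


class Word:
    def __init__(self, value):
        # Lines from readlines() end with '\n'; strip it or it ends up in the URLs
        self.value = value.strip()

    def OnVocabulary(self):
        # English definition: the short blurb from Vocabulary.com
        url = "https://www.vocabulary.com/dictionary/" + self.value
        content = requests.get(url, headers=kv)
        soup = BeautifulSoup(content.text, 'lxml')
        txt = str(soup.select('.short')[0])  # IndexError if the word isn't found
        new_file.write(">\n")
        new_file.write(">" + txt + "\n")
        new_file.write("\n" + "<br>" + "\n")
        new_file.write("\n")

    def OnNetEase(self):
        # Chinese definitions from Youdao (NetEase) Dictionary
        url = "http://dict.youdao.com/w/" + self.value
        content = requests.get(url, headers=kv)
        soup = BeautifulSoup(content.text, 'lxml')
        txt_list = soup.select('.trans-container')[0].ul.contents
        for i in txt_list:
            if i != '\n':  # .contents also holds the '\n' text nodes between the <li> tags
                pos, _, meaning = i.string.partition('.')  # split at the first dot only
                new_file.write(">\n")
                new_file.write("><span style=\"color:red\"><b>" + pos + ".&nbsp;</b></span>" + meaning + "\n")
                new_file.write(">\n")


# Write the header of the post
print("Initializing the post structure")
initial()


# Process the words
new_file.write("\n")
new_file.write("## Some Words\n\n")
All_Words = all_file.readlines()
for i in All_Words:
    word = i.strip()
    if not word:  # an empty line marks the end of the list
        break
    print("Processing word: " + word)
    new_file.write(">**" + word + "**\n")
    a = Word(word)
    print("Fetching from Youdao Dictionary (Chinese definitions)")
    try:
        a.OnNetEase()
    except Exception:
        print("Could not find this word")
    print("Fetching from Vocabulary.com")
    try:
        a.OnVocabulary()
    except Exception:
        print("Could not find this word on Vocabulary.com")
    # print("Fetching from Youdao Dictionary (example sentences)")
    # a.OnNetEase()

new_file.write("<br>\n")
new_file.write("<center><i>Definitions, example sentences, etc. may draw on tools including but not limited to Youdao Translate and Baidu Translate</i></center>\n")
new_file.write("<br>\n")
new_file.write("<center><i>Highly recommended: <a href=\"http://www.vocabulary.com\">Vocabulary.com</a>. The English definitions in this post come from that site</i></center>\n")

all_file.close()
new_file.close()
```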
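A note on the Youdao parsing: cutting each definition with a plain split('.') splits at every dot, so anything after a second dot gets lost. Here is a small sketch, assuming the entries look like "pos. meaning" (the sample entry below is made up, and split_entry is a hypothetical helper name), that splits at the first dot only using str.partition:

```python
def split_entry(entry):
    """Split a 'pos. meaning' definition line at the FIRST dot only (hypothetical helper)."""
    pos, sep, meaning = entry.partition(".")
    if not sep:
        # No dot at all: treat the whole line as the meaning
        return "", entry.strip()
    return pos.strip(), meaning.strip()


entry = "n. Mr. Smith's dog"  # made-up entry containing a second dot
print(entry.split("."))       # ['n', ' Mr', " Smith's dog"]: txt[1] would only be ' Mr'
print(split_entry(entry))     # ('n', "Mr. Smith's dog")
```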

Here is a sample DateTemp:

3

A sample WordList:

fabulous
hello
dog
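The loop in the script stops at the first blank line. If you would rather skip stray blank lines in the word list, a small hypothetical load_words helper could look like this (io.StringIO again stands in for the real WordList.txt):

```python
import io


def load_words(f):
    """Return the non-empty, stripped lines of a word-list file object."""
    words = []
    for line in f:
        word = line.strip()
        if word:  # skip blank lines instead of stopping at them
            words.append(word)
    return words


# Stand-in for WordList.txt, with a stray blank line in the middle
sample = io.StringIO("fabulous\n\nhello\ndog\n")
print(load_words(sample))  # ['fabulous', 'hello', 'dog']
```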
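  • To show the "day N" number automatically, I used a DateTemp file as a counter (after all, I can't guarantee an update every day, so counting calendar days wouldn't work)
  • Actually, the crawler gets the data even without disguising the request headers; I just couldn't keep my hands off it
  • The connection to vocabulary.com is really slow…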
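The counter logic can also be pulled into a function. This is a sketch with a hypothetical bump_counter name: it parses the number with int() rather than eval() (which would execute whatever the file contains) and calls truncate() so nothing stale is left after the new value. The demo runs on a throwaway temp file instead of the real DateTemp.txt:

```python
import os
import tempfile


def bump_counter(path):
    """Read the day count from `path`, store count + 1, and return the old count."""
    with open(path, "r+", encoding="utf-8") as f:
        count = int(f.read().strip())
        f.seek(0)
        f.write(str(count + 1))
        f.truncate()  # cut off anything left after the new number
    return count


# Demo on a temporary file holding the sample value 3
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="utf-8") as f:
    f.write("3\n")

day = bump_counter(path)
with open(path, encoding="utf-8") as f:
    stored = f.read()
os.remove(path)
print(day, stored)  # 3 4
```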
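  1. Now I know that the write() function doesn't add a newline
  2. The read functions (read(), readline(), readlines()) keep the newline characters
  3. In the selector part, the .contents list contains '\n' entries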
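A quick check of point 1, writing into an in-memory io.StringIO buffer instead of a real file: write() outputs exactly the string it is given, while print() appends end="\n" by default.

```python
import io

buf = io.StringIO()
buf.write("hello")      # write() outputs exactly this, no newline added
buf.write("world\n")    # the newline has to be written explicitly
print("dog", file=buf)  # print() appends end="\n" by default
result = buf.getvalue()
print(repr(result))     # 'helloworld\ndog\n'
```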