BeautifulSoup raises an exception when parsing HTML tags

I am extracting some information from HTML files, but some files do not contain the tag <p class="p p1"> date </p>, and I get:

AttributeError: 'NoneType' object has no attribute 'strip'

Also, in some files the date is not inside that tag. One example I found is:

<time content="2005-11-11T19:09:08Z" itemprop="datePublished">
 Nov. 11, 2005  2:09 PM ET
</time>

How can I solve these two problems?

My code:

month_list = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October','November', 'December', 'Jan', 'Feb', 'Aug', 'Oct', 'Dec']


def first_date_p():

    for p in soup.find_all('p', {"class": "p p1"}):
        for month in month_list:
            if month in p.get_text():
                first_date_p = p.get_text()
                date_start = first_date_p.index(month)
                date_text = first_date_p[date_start:]
                return date_text
            else:
            #if the tag exist, but do not have date.
                month = 'No Date/Error'
                return month.strip()

1 Answer

If you want to make sure the selected 'p' tags always contain some text, you can set the text argument to True, i.e.:
```
soup.find_all('p', {"class": "p p1"}, text=True)
```
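A minimal sketch of how that filter behaves, using a made-up HTML snippet (the markup and variable names are illustrative, not taken from the question's files). In Beautiful Soup 4.4.0 and later the argument is called `string`; `text` is the older name for it, so the sketch uses `string`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one empty <p> and one that holds a date.
html = '<p class="p p1"></p><p class="p p1"> Nov. 11, 2005 </p>'
soup = BeautifulSoup(html, "html.parser")

# string=True keeps only tags whose .string is not None,
# so the empty paragraph is skipped.
with_text = soup.find_all("p", {"class": "p p1"}, string=True)
dates = [p.get_text().strip() for p in with_text]
```

If a file has no matching paragraph at all, `with_text` is simply an empty list, which the calling code still has to handle.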
Otherwise, if you want to get every 'p' tag, even those that contain no text, you can convert None to a string, e.g.:
```
str(p.get_text()).strip()
```
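One caveat worth keeping in mind: `str(None)` produces the literal string `'None'`, and `find()` returns `None` when no tag matches. A possible alternative is to guard against the missing tag and fall back to a default. The sketch below assumes a hypothetical helper named `text_or_default` and reuses the question's `'No Date/Error'` placeholder.

```python
from bs4 import BeautifulSoup


def text_or_default(tag, default="No Date/Error"):
    """Return the tag's stripped text, or `default` if the tag is missing or empty."""
    if tag is None:
        return default
    return tag.get_text().strip() or default


# Hypothetical markup without the expected paragraph.
soup = BeautifulSoup("<div>no paragraph here</div>", "html.parser")
missing = soup.find("p", {"class": "p p1"})  # None, because no such tag exists
result = text_or_default(missing)
```

This way the caller never calls `.strip()` on `None`, which is what triggers the `AttributeError` shown in the question.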
As for your second question, you can read the 'content' attribute of the 'time' tag, e.g.:
```
soup.find('time').get('content')
```
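As a sketch of how this could be applied to the `<time>` element from the question, the snippet below reads the attribute and, as an optional extra step not mentioned in the original answer, parses it with the standard library. The variable names and the `None` guard are assumptions added for illustration.

```python
from datetime import datetime, timezone

from bs4 import BeautifulSoup

html = '''<time content="2005-11-11T19:09:08Z" itemprop="datePublished">
 Nov. 11, 2005  2:09 PM ET
</time>'''
soup = BeautifulSoup(html, "html.parser")

time_tag = soup.find("time")
# find() returns None when there is no <time> tag, so guard before calling .get().
content = time_tag.get("content") if time_tag is not None else None

published = None
if content:
    # The trailing "Z" marks UTC; strptime leaves the result naive,
    # so the timezone is attached explicitly.
    published = datetime.strptime(content, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
```

Reading the machine-readable `content` attribute avoids having to parse the human-readable text such as "Nov. 11, 2005  2:09 PM ET".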
Answered 2024-05-19T05:15:13+08:00
