Background
The error log is as follows:
2017-07-12 21:26:48 [scrapy.pipelines.files] WARNING: File (code: 403): Error downloading file from <GET http://t10.baidu.com/it/u=1495155540,1076493806&fm=55&s=BF904F831EEF3E8C6781B5210300E0F1&w=121&h=81&img.JPEG> referred in <None>
2017-07-12 21:26:48 [scrapy.core.scraper] WARNING: Dropped: Item contains no images
The part highlighted in red comes from code I wrote myself:
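    def item_completed(self, results, item, info):
        # Only NewsImagesItem carries images; pass other item types through.
        if item.__class__.__name__ != 'NewsImagesItem':
            return item
        # Keep the stored paths of the downloads that succeeded.
        image_path = [x['path'] for ok, x in results if ok]
        if not image_path:
            raise DropItem('Item contains no images')
        # Hand the item on to the next pipeline stage.
        return item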
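To make the link between the two warnings concrete, here is a minimal sketch of the filtering step on its own, runnable without Scrapy. It assumes `results` is a list of `(ok, value)` pairs where `value` is a dict with a `path` key on success; the `DropItem` class and the `collect_image_paths` helper below are stand-ins I made up for the demonstration, not Scrapy's own code.

```python
class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem so the sketch runs without Scrapy."""


def collect_image_paths(results):
    """Return the stored paths of successful downloads, or raise DropItem."""
    # Each entry is (ok, value): a dict with a 'path' key on success,
    # or a failure object when the download failed (for example HTTP 403).
    image_paths = [value['path'] for ok, value in results if ok]
    if not image_paths:
        raise DropItem('Item contains no images')
    return image_paths


# An item whose only image was rejected with 403: nothing survives the filter.
failed = [(False, Exception('403 Forbidden'))]
# An item with mixed outcomes: the successful path is kept.
mixed = [(True, {'path': 'full/abc.jpg'}), (False, Exception('403 Forbidden'))]

try:
    collect_image_paths(failed)
    dropped = False
except DropItem:
    dropped = True
```

So when every image of an item gets a 403, the list comes back empty and the item is dropped, which would match the "Dropped: Item contains no images" line right after the 403 warning.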
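The ImagesPipeline configuration is correct, and the log shows that the image downloads do start.
The image URLs are correct too, and some of the images download successfully.
Also: when I open the image URL in a browser repeatedly, it sometimes returns 403 Forbidden there as well.
This is probably the site's anti-crawling policy. How can I work around it?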
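One thing worth trying, given that the log says `referred in <None>`: send a Referer and a browser-like User-Agent with each image request. This is only a guess at the cause, since the browser also gets 403 now and then, so request throttling (for example Scrapy's `DOWNLOAD_DELAY` setting) may matter as well. The sketch below only builds the headers; `build_image_headers`, `BROWSER_UA`, and the fallback Referer rule are my own assumptions, and the `page_url` value is a placeholder.

```python
from urllib.parse import urlsplit

# Hypothetical browser-like User-Agent; any current browser string would do.
BROWSER_UA = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0 Safari/537.36'
)


def build_image_headers(image_url, page_url=None):
    """Build request headers for an image download.

    If the page that embedded the image is unknown, fall back to the
    image host's root URL as the Referer (an assumption, not a known rule).
    """
    if page_url is None:
        parts = urlsplit(image_url)
        page_url = f'{parts.scheme}://{parts.netloc}/'
    return {'Referer': page_url, 'User-Agent': BROWSER_UA}


# Placeholder page URL, used only to show the call.
headers = build_image_headers(
    'http://t10.baidu.com/it/u=1495155540,1076493806&fm=55'
    '&s=BF904F831EEF3E8C6781B5210300E0F1&w=121&h=81&img.JPEG',
    page_url='http://example.com/news/page.html',
)
```

In the pipeline, a dict like this could be passed as `headers=` to the `scrapy.Request` objects yielded from an overridden `get_media_requests(self, item, info)`.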