Scrapy Chinese-English bilingual documentation by scoful

This documentation contains everything you need to know about Scrapy.

Getting help

Having trouble? We’d like to help!

Try the FAQ; it has answers to some common questions.
Looking for specific information? Try the index or the module index.
Ask or search questions on StackOverflow using the scrapy tag.
Search for information in the archives of the scrapy-users mailing list, or post a question.
Ask a question in the #scrapy IRC channel.
Report bugs with Scrapy in our issue tracker.

First steps

Scrapy at a glance

Understand what Scrapy is and how it can help you.

Installation guide

Get Scrapy installed on your computer.

Scrapy Tutorial

Write your first Scrapy project.

Examples

Learn more by playing with a pre-made Scrapy project.

Basic concepts

Command line tool

Learn about the command-line tool used to manage your Scrapy project.

Spiders

Write the rules to crawl your websites.

Selectors

Extract the data from web pages using XPath.

Scrapy shell

Test your extraction code in an interactive environment.

Items

Define the data you want to scrape.

Item Loaders

Populate your items with the extracted data.

Item Pipeline

Post-process and store your scraped data.

Feed exports

Output your scraped data using different formats and storages.

Requests and Responses

Understand the classes used to represent HTTP requests and responses.

Link Extractors

Convenient classes to extract links to follow from pages.

Settings

Learn how to configure Scrapy and see all available settings.

Exceptions

See all available exceptions and their meaning.

Built-in services

Logging

Learn how to use Python’s built-in logging with Scrapy.

Stats Collection

Collect statistics about your scraping crawler.

Sending e-mail

Send email notifications when certain events occur.

Telnet Console

Inspect a running crawler using a built-in Python console.

Web Service

Monitor and control a crawler using a web service.

Solving specific problems

Frequently Asked Questions

Get answers to most frequently asked questions.

Debugging Spiders

Learn how to debug common problems of your Scrapy spider.

Spiders Contracts

Learn how to use contracts for testing your spiders.

Common Practices

Get familiar with some Scrapy common practices.

Broad Crawls

Tune Scrapy for crawling a lot of domains in parallel.

Using Firefox for scraping

Learn how to scrape with Firefox and some useful add-ons.

Using Firebug for scraping

Learn how to scrape efficiently using Firebug.

Debugging memory leaks

Learn how to find and get rid of memory leaks in your crawler.

Downloading and processing files and images

Download files and/or images associated with your scraped items.

Deploying Spiders

Deploy your Scrapy spiders and run them on a remote server.

AutoThrottle extension

Adjust crawl rate dynamically based on load.

Benchmarking

Check how Scrapy performs on your hardware.

Jobs: pausing and resuming crawls

Learn how to pause and resume crawls for large spiders.

Extending Scrapy

Architecture overview

Understand the Scrapy architecture.

Downloader Middleware

Customize how pages get requested and downloaded.

Spider Middleware

Customize the input and output of your spiders.

Extensions

Extend Scrapy with your custom functionality.

Core API

Use it on extensions and middlewares to extend Scrapy functionality.

Signals

See all available signals and how to work with them.

Item Exporters

Quickly export your scraped items to a file (XML, CSV, etc).

All the rest

Release notes

See what has changed in recent Scrapy versions.

Contributing to Scrapy

Learn how to contribute to the Scrapy project.

Versioning and API Stability

Understand Scrapy versioning and API stability.
