Scrapy Documentation, from the Chinese-English Bilingual Edition by scoful

This documentation contains everything you need to know about Scrapy.

Getting help

Having trouble? We’d like to help!

First steps

Scrapy at a glance
Understand what Scrapy is and how it can help you.
Installation guide
Get Scrapy installed on your computer.
Scrapy Tutorial
Write your first Scrapy project.
Examples
Learn more by playing with a pre-made Scrapy project.

Basic concepts

Command line tool
Learn about the command-line tool used to manage your Scrapy project.
Spiders
Write the rules to crawl your websites.
Selectors
Extract the data from web pages using XPath.
Scrapy shell
Test your extraction code in an interactive environment.
Items
Define the data you want to scrape.
Item Loaders
Populate your items with the extracted data.
Item Pipeline
Post-process and store your scraped data.
Feed exports
Output your scraped data in different formats and to different storage backends.
Requests and Responses
Understand the classes used to represent HTTP requests and responses.
Link Extractors
Use convenient classes to extract the links to follow from pages.
Settings
Learn how to configure Scrapy and see all available settings.
Exceptions
See all available exceptions and their meaning.
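
As an illustration of the Item Pipeline entry above, here is a minimal sketch of a pipeline, assuming items offer dict-style access and carry hypothetical `url` and `price` fields. The class name `CleanAndDedupePipeline` is invented for this example. The Scrapy-defined parts are the `process_item(item, spider)` method and the `DropItem` exception, for which a stand-in is defined when Scrapy is not installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:  # stand-in so the sketch also runs without Scrapy installed
    class DropItem(Exception):
        """Fallback for scrapy.exceptions.DropItem."""


class CleanAndDedupePipeline:
    """Hypothetical pipeline: normalizes `price` and drops repeated `url`s."""

    def __init__(self):
        self.seen_urls = set()

    def process_item(self, item, spider):
        url = item.get("url")
        if url in self.seen_urls:
            # Raising DropItem stops further processing of this item.
            raise DropItem(f"duplicate item: {url}")
        self.seen_urls.add(url)
        # Turn a scraped string such as "$1,299.00" into a float.
        raw_price = str(item["price"])
        item["price"] = float(raw_price.replace("$", "").replace(",", ""))
        return item
```

A pipeline like this would typically be switched on through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.CleanAndDedupePipeline": 300}`, where `myproject` is a placeholder module path.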

Built-in services

Logging
Learn how to use Python’s built-in logging with Scrapy.
Stats Collection
Collect statistics about your scraping crawler.
Sending e-mail
Send email notifications when certain events occur.
Telnet Console
Inspect a running crawler using a built-in Python console.
Web Service
Monitor and control a crawler using a web service.

Solving specific problems

Frequently Asked Questions
Get answers to the most frequently asked questions.
Debugging Spiders
Learn how to debug common problems in your Scrapy spider.
Spiders Contracts
Learn how to use contracts for testing your spiders.
Common Practices
Get familiar with some common Scrapy practices.
Broad Crawls
Tune Scrapy for crawling a lot of domains in parallel.
Using Firefox for scraping
Learn how to scrape with Firefox and some useful add-ons.
Using Firebug for scraping
Learn how to scrape efficiently using Firebug.
Debugging memory leaks
Learn how to find and get rid of memory leaks in your crawler.
Downloading and processing files and images
Download files and/or images associated with your scraped items.
Deploying Spiders
Deploy your Scrapy spiders and run them on a remote server.
AutoThrottle extension
Adjust crawl rate dynamically based on load.
Benchmarking
Check how Scrapy performs on your hardware.
Jobs: pausing and resuming crawls
Learn how to pause and resume crawls for large spiders.
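
The AutoThrottle and Jobs entries above are both driven by project settings. The fragment below is a sketch of what the relevant part of a `settings.py` might look like. The values are illustrative rather than recommendations, and the `JOBDIR` path is a hypothetical directory name.

```python
# settings.py fragment (illustrative values only)

# AutoThrottle: adjust the download delay dynamically based on load.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0   # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0    # upper bound on the delay when latency is high
AUTOTHROTTLE_DEBUG = False       # set to True to log throttling stats per response

# Jobs: persist crawl state on disk so a crawl can be paused and resumed.
JOBDIR = "crawls/somespider-1"   # hypothetical directory name
```

The job directory can also be passed for a single run on the command line, as in `scrapy crawl somespider -s JOBDIR=crawls/somespider-1`, where `somespider` stands for your spider's name.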

Extending Scrapy

Architecture overview
Understand the Scrapy architecture.
Downloader Middleware
Customize how pages get requested and downloaded.
Spider Middleware
Customize the input and output of your spiders.
Extensions
Extend Scrapy with your custom functionality.
Core API
Use it in extensions and middlewares to extend Scrapy functionality.
Signals
See all available signals and how to work with them.
Item Exporters
Quickly export your scraped items to a file (XML, CSV, etc.).
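
To give a feel for the Downloader Middleware entry above, here is a minimal sketch of a middleware that adds a default header to outgoing requests. The class name `DefaultHeaderMiddleware` and the header it sets are assumptions made for this example. The Scrapy-defined part is the `process_request(request, spider)` hook, where returning `None` lets processing of the request continue.

```python
class DefaultHeaderMiddleware:
    """Hypothetical downloader middleware: adds a header when it is missing."""

    header_name = "Accept-Language"
    header_value = "en"

    def process_request(self, request, spider):
        # Request headers are dict-like, so setdefault keeps any value
        # that the spider already set on this request.
        request.headers.setdefault(self.header_name, self.header_value)
        # Returning None tells Scrapy to continue processing this request.
        return None
```

A middleware like this would typically be enabled through the `DOWNLOADER_MIDDLEWARES` setting, for example `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.DefaultHeaderMiddleware": 543}`, where `myproject` is a placeholder module path.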

All the rest

Release notes
See what has changed in recent Scrapy versions.
Contributing to Scrapy
Learn how to contribute to the Scrapy project.
Versioning and API Stability
Understand Scrapy versioning and API stability.