Protego parse! :zap:


Hey! This is my third blog post for GSoC 2019, covering weeks 3 and 4.

The first part of my project, the interface for robots.txt parsers, is almost complete. I have started working on a pure-Python robots.txt parser, which I have named Protego. The name is borrowed from the Harry Potter universe, where Protego is a charm that creates a shield to protect the caster. The end goal for Protego is to support all of the popular directives, wildcard matching, and a good number of less popular directives. We also aim to make Protego 100% compatible with Google’s robots.txt parser, and we intend for it to become the default robots.txt parser in Scrapy.

I have implemented support for all major directives in Protego, as well as wildcard matching. I used pytest and tox to automate testing Protego on every version of Python, and I set up Travis to run the tests automatically on every code push and pull request. I borrowed tests from other parsers to check Protego against. Protego currently passes all the tests borrowed from reppy, rep-cpp and robotexclusionrulesparser.
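To give a rough idea of what wildcard matching involves, here is a minimal sketch of one common approach: translating a robots.txt path pattern into a regular expression. This is not Protego's actual code. The function names `pattern_to_regex` and `path_matches` are made up for illustration, and the sketch assumes the usual conventions where `*` matches any run of characters and a trailing `$` anchors the end of the path.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Hypothetical helper: '*' matches any run of characters and a
    trailing '$' anchors the pattern to the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern, path):
    """Return True if the rule pattern applies to the given URL path."""
    # Rules are prefix matches, so match from the start of the path.
    return pattern_to_regex(pattern).match(path) is not None
```

With this sketch, `path_matches("/*.php$", "/index.php")` holds while `path_matches("/*.php$", "/index.php?x=1")` does not, because the `$` forces the path to end right after `.php`. A real parser also has to handle things like percent-encoding and rule precedence, which are left out here.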

Anubhav Patel


Life doesn’t happen to you, it happens for you. 💯
