Need for Speed
Hey! This is my fifth blog post for GSoC 2019, covering weeks 7 and 8.
Most of week 7 was spent making Protego compatible with Google’s parser. I also worked on the documentation. Since Protego’s codebase is small, proper comments and a good README were sufficient. I uploaded Protego to PyPI, so pip install Protego is all it takes to install it.
Week 8 was quite interesting. For Protego to become the default in Scrapy, it must not throw any kind of error while parsing robots.txt files. To verify this, I decided to download robots.txt from the top 10,000 websites. I added tests to see if Protego throws any exceptions while parsing the downloaded robots.txt files. I also benchmarked Protego, and the results were quite disappointing. You can see the result here.
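To give an idea of what such a check could look like, here is a minimal sketch. It is not Protego's actual test suite: the function name check_robots_files, the directory layout (one downloaded robots.txt per file), and the parse argument are all assumptions made for illustration. It runs a parser over every file, records any exception, and sums the time spent parsing.

```python
import time
from pathlib import Path


def check_robots_files(directory, parse):
    """Run `parse` on every file in `directory`.

    Returns (failures, seconds), where failures is a list of
    (file name, exception repr) pairs and seconds is the total
    time spent inside `parse`.
    Hypothetical helper: name and layout are assumptions.
    """
    failures = []
    elapsed = 0.0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Real-world robots.txt files are not always valid UTF-8,
        # so undecodable bytes are replaced instead of raising.
        content = path.read_text(encoding="utf-8", errors="replace")
        start = time.perf_counter()
        try:
            parse(content)
        except Exception as exc:  # any exception counts as a failure
            failures.append((path.name, repr(exc)))
        elapsed += time.perf_counter() - start
    return failures, elapsed
```

With Protego installed, parse would presumably be Protego.parse, and an empty failures list would mean no file made the parser throw. Passing the parser in as an argument also makes it easy to time another parser on the same files for comparison.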
We decided to spend the next week improving Protego’s performance. I am going to try profiling and heuristics to see how much it can be improved.
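As a rough idea of what the profiling step might involve, here is a small sketch using Python's standard cProfile and pstats modules. The helper name profile_call is made up for this example, and nothing here reflects results from Protego itself. It profiles a single call and returns a text report of the most expensive functions by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Profile func(*args); return (result, report).

    `report` lists the `top` entries sorted by cumulative time.
    Hypothetical helper, for illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Write the statistics to a string instead of stdout.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Calling it with the parser and the contents of one large robots.txt file would show which internal functions dominate the parsing time, which is a reasonable place to start looking for speedups.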