refactor: migrate crawler to Scrapy framework (!1) · h-da / CERT-Bund Security Advisory Crawler

Refactored the CERT-Bund security advisory crawler from a basic Python script to a full Scrapy-based implementation for improved scalability and maintainability.

Changes:

Replaced simple requests-based crawler with Scrapy spider
Implemented proper Scrapy architecture (spiders, items, pipelines)
Added comprehensive .gitignore for Python, Scrapy, IDE, and OS files
Moved database operations to dedicated pipeline class
Updated README with Scrapy-specific usage and configuration
Added configurable fetch size and database path via Scrapy settings
Maintained all existing features (change tracking, database structure)
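
To illustrate how the database pipeline and change tracking listed above might fit together, here is a minimal sketch. It is not the code from this merge request: the class name `SQLitePipeline`, the item fields, the table layout, and the `DATABASE_PATH` setting name are all assumptions made for the example. The method names (`from_crawler`, `open_spider`, `close_spider`, `process_item`) are the hooks Scrapy calls on an item pipeline.

```python
import sqlite3
from dataclasses import dataclass, asdict


@dataclass
class AdvisoryItem:
    # Hypothetical fields; the real item definition is not shown in this MR.
    advisory_id: str
    title: str
    severity: str


class SQLitePipeline:
    """Hypothetical pipeline: stores advisories and counts new or changed rows."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.changed = 0

    @classmethod
    def from_crawler(cls, crawler):
        # DATABASE_PATH is an assumed setting name, with an assumed default.
        return cls(crawler.settings.get("DATABASE_PATH", "advisories.db"))

    def open_spider(self, spider):
        # Open the database once per crawl and make sure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS advisories ("
            "advisory_id TEXT PRIMARY KEY, title TEXT, severity TEXT)"
        )

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

    def process_item(self, item, spider):
        row = asdict(item)
        old = self.conn.execute(
            "SELECT title, severity FROM advisories WHERE advisory_id = ?",
            (row["advisory_id"],),
        ).fetchone()
        new = (row["title"], row["severity"])
        # Write only when the advisory is new or its stored values differ.
        if old != new:
            self.changed += 1
            self.conn.execute(
                "INSERT OR REPLACE INTO advisories VALUES (?, ?, ?)",
                (row["advisory_id"], *new),
            )
        return item
```

Because the pipeline only relies on `crawler.settings.get`, it can be exercised outside a crawl by passing any object whose `settings` attribute supports `get`, which keeps the database logic testable independently of the spider.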

Benefits:

More maintainable and scalable architecture
Better separation of concerns (spider, items, pipelines)
Leverages Scrapy's built-in features for crawling and data handling
Easier to extend with additional spiders or export formats
Improved logging and error handling through Scrapy framework
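
As a rough illustration of the configuration and extension points mentioned above, a `settings.py` fragment could look like the following. `ITEM_PIPELINES`, `FEEDS`, and `LOG_LEVEL` are standard Scrapy settings; the module path `certbund.pipelines.SQLitePipeline`, the custom setting names `FETCH_SIZE` and `DATABASE_PATH`, and all values are assumptions for this sketch, not taken from the merge request.

```python
# settings.py (sketch; custom names and values are hypothetical)

# Register the database pipeline; the number sets its order among pipelines.
ITEM_PIPELINES = {
    "certbund.pipelines.SQLitePipeline": 300,  # assumed module path
}

# Assumed custom settings for fetch size and database location.
FETCH_SIZE = 100
DATABASE_PATH = "advisories.db"

# An additional export format can be enabled through Scrapy's feed exports.
FEEDS = {
    "advisories.json": {"format": "json"},
}

# Logging verbosity is controlled by the framework.
LOG_LEVEL = "INFO"
```

Custom settings like these can typically be overridden per run from the command line with `-s NAME=value`, without touching the code.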