scaleway/serverless-scraping-tutorial
Language: HCL
Stars: 0
Forks: 0
Open issues: 0
Created: 2023-12-11T15:30:24Z
Pushed: 2024-01-15T15:59:26Z
Default branch: main
Fork: no
Archived: no
README:
Create a serverless scraping architecture
This is the code for the tutorial Create a serverless scraping architecture, with Scaleway Messaging and Queuing SQS, Serverless Functions and Managed Database.
In this tutorial, we show how to set up a simple application that reads Hacker News and asynchronously processes the articles it finds there. To do so, we use Scaleway serverless products and deploy two functions:
- A producer function, activated by a recurring cron trigger, that scrapes Hacker News for articles published in the last 15 minutes and pushes the title and URL of each article to an SQS queue created with Scaleway Messaging and Queuing.
- A consumer function, triggered by each new message on the SQS queue, that scrapes some data from the linked article and then writes that data to a Scaleway Managed Database.
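The data flow between the two functions can be sketched as follows. This is a minimal illustration, not the code from the repository: the function names (`select_recent`, `to_message_body`, `consume`), the article dictionary shape, and the `articles` table are all hypothetical, the queue is replaced by a direct function call, and an in-memory SQLite database stands in for the Managed Database so the sketch runs without any Scaleway resources.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone

# The README says the producer looks back 15 minutes on each run.
SCRAPE_WINDOW = timedelta(minutes=15)


def select_recent(articles, now, window=SCRAPE_WINDOW):
    """Producer side: keep only articles published within `window` before `now`."""
    cutoff = now - window
    return [a for a in articles if cutoff <= a["published_at"] <= now]


def to_message_body(article):
    """Serialize the title and URL, the two fields the README says are queued."""
    return json.dumps({"title": article["title"], "url": article["url"]})


def consume(message_body, conn, fetch_page):
    """Consumer side: decode one message, scrape the page, store one row.

    `fetch_page` is injected (an assumption of this sketch) so the example
    runs without network access. The page length is a placeholder for
    whatever data the real consumer extracts.
    """
    payload = json.loads(message_body)
    html = fetch_page(payload["url"])
    conn.execute(
        "INSERT INTO articles (title, url, page_length) VALUES (?, ?, ?)",
        (payload["title"], payload["url"], len(html)),
    )
    conn.commit()


def demo():
    """Run producer and consumer back to back on two fake articles."""
    now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    articles = [
        {"title": "Fresh", "url": "https://example.com/a",
         "published_at": now - timedelta(minutes=5)},
        {"title": "Stale", "url": "https://example.com/b",
         "published_at": now - timedelta(hours=2)},
    ]
    # SQLite stands in for the Managed Database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (title TEXT, url TEXT, page_length INTEGER)"
    )
    for article in select_recent(articles, now):
        # In the deployed setup the SQS queue sits between these two calls.
        consume(to_message_body(article), conn,
                fetch_page=lambda url: "<html></html>")
    return conn.execute(
        "SELECT title, url, page_length FROM articles"
    ).fetchall()
```

Calling `demo()` stores only the article published five minutes before `now`; the two-hour-old one falls outside the 15-minute window and is never queued.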
Requirements
This example assumes you are familiar with how serverless functions work. If needed, see the official Scaleway documentation.
You will also need Python and Terraform.
Running
    cd scraper
    pip install -r requirements.txt --target ./package
    zip -r functions.zip handlers/ package/
    cd ../consumer
    pip install -r requirements.txt --target ./package
    zip -r functions.zip handlers/ package/
    cd ../terraform
    terraform init
    terraform apply
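Before running `terraform apply`, it can be useful to confirm that each `functions.zip` contains both directories that the `zip` commands above add. The helper below is a hypothetical sanity check, not part of the repository; it assumes only that the archive should have `handlers` and `package` as top-level entries.

```python
import zipfile


def missing_top_level_dirs(zip_path, required=("handlers", "package")):
    """Return the required top-level directories absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # The first path component of every entry is its top-level directory.
        present = {name.split("/", 1)[0] for name in archive.namelist()}
    return [d for d in required if d not in present]
```

For example, `missing_top_level_dirs("scraper/functions.zip")` returns an empty list when both directories were packaged.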