Repo: Scaleway, published Dec 11, 2023

scaleway/serverless-scraping-tutorial

HCL

Language: HCL

Stars: 0

Forks: 0

Open issues: 0

Created: 2023-12-11T15:30:24Z

Pushed: 2024-01-15T15:59:26Z

Default branch: main

Fork: no

Archived: no

README:

Create a serverless scraping architecture

This is the code for the tutorial "Create a serverless scraping architecture", which uses Scaleway Messaging and Queuing (SQS), Serverless Functions, and Managed Database.

In this tutorial we show how to set up a simple application which reads Hacker News and processes the articles it finds there asynchronously. To do so, we use Scaleway serverless products and deploy two functions:

  • A producer function, activated by a recurring cron trigger, that scrapes Hacker News for articles published in the last 15 minutes and pushes the title and URL of each article to an SQS queue created with Scaleway Messaging and Queuing.
  • A consumer function, triggered by each new message on the SQS queue, that reads the message, scrapes some data from the linked article, and writes that data into a Scaleway Managed Database.
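The data flow between the two functions can be sketched in plain Python. This is a minimal illustration, not the tutorial's actual handlers: the article shape (`title`, `url`, `published`), the JSON message format, and the function names `recent_articles`, `to_message_body`, and `parse_message_body` are all assumptions made for the example. The scraping, queue, and database calls are left out so the sketch runs on its own.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed article shape: {"title": str, "url": str, "published": datetime}
WINDOW = timedelta(minutes=15)


def recent_articles(articles, now, window=WINDOW):
    """Producer side: keep articles published within `window` before `now`."""
    cutoff = now - window
    return [a for a in articles if cutoff <= a["published"] <= now]


def to_message_body(article):
    """Producer side: serialize the title and URL as the queue message body."""
    return json.dumps({"title": article["title"], "url": article["url"]})


def parse_message_body(body):
    """Consumer side: recover the title and URL from a queue message body."""
    payload = json.loads(body)
    return payload["title"], payload["url"]


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
    sample = [
        {"title": "Fresh", "url": "https://example.com/fresh",
         "published": now - timedelta(minutes=5)},
        {"title": "Stale", "url": "https://example.com/stale",
         "published": now - timedelta(minutes=40)},
    ]
    # Only the article inside the 15-minute window is forwarded.
    for article in recent_articles(sample, now):
        print(parse_message_body(to_message_body(article)))
```

In a deployed setup, the producer would send each message body to the SQS queue and the consumer would receive it through the queue trigger. Keeping the filtering and serialization in small pure functions like these makes them easy to test locally before packaging.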

Requirements

This example assumes you are familiar with how serverless functions work. If needed, you can check the official Scaleway documentation.

You will also need Python and Terraform.

Running

cd scraper
pip install -r requirements.txt --target ./package
zip -r functions.zip handlers/ package/
cd ../consumer
pip install -r requirements.txt --target ./package
zip -r functions.zip handlers/ package/
cd ../terraform
terraform init
terraform apply