Find Bright Data on their YouTube channel: @BrightData
Learn web scraping from scratch with this complete course. The course uses Python, but the theory behind scraping and bypassing blocks applies to any language that supports scraping.
----------------------------------------------------------
PREREQUISITES:
----------------------------------------------------------
Script sources
Create your VPS on Infomaniak
My complete Python training on Udemy (+60h of training)
Subscribe to Docstring
Join us on the Discord server
----------------------------------------------------------
===== CHAPTERS =====
00:00:00 Introduction
00:03:13 The training program
00:07:58 Definition of scraping
00:08:56 The prerequisites
00:11:06 The obstacles (and the solution)
00:13:20 PART 1: the basics of scraping
00:18:26 Retrieve the content of a page with requests
00:24:35 Analyze the content of a page with BeautifulSoup
00:33:41 Retrieve information with BeautifulSoup
00:43:03 Analyze the homepage of books
00:54:56 Your turn!
01:04:32 Simple exercises: Introduction
01:06:08 Retrieve categories with a single book
01:08:40 Solution
01:32:01 Retrieve books rated 1 star
01:35:44 Solution
02:08:18 Advanced exercise: Introduction
02:09:08 Exercise statement
02:10:23 Presentation of Selectolax and Loguru
02:18:04 Preparation of a specifications document
02:28:32 Creation of the script body
02:47:46 Retrieve the price of a book
03:12:41 Retrieve all URLs on a page
03:24:48 Retrieve the URL of the next page
03:30:54 Retrieve all URLs from the bookstore
03:38:44 Retrieve the total value of the bookstore
03:46:51 Optimize our script with sessions
03:53:09 Conclusion
03:53:59 PART 2: bypassing obstacles
03:55:57 What the law says
03:56:38 The T&Cs
03:59:25 The GDPR
04:00:49 The entreparticuliers.com vs. Leboncoin case
04:01:58 Examples of lawful and unlawful scraping
04:04:59 The robots.txt file
04:09:10 Interview with Rony SHALIT
04:46:29 Technical blocks
04:50:43 Voluntary blocks
04:52:04 Blocking by rate limiting
04:59:18 Blocking with the user-agent
05:04:55 Presentation of Playwright
05:10:46 Use Playwright to render JavaScript
05:20:14 Interact with the DOM
05:26:22 Essential methods to know
05:37:45 The Bright Data solution
05:38:43 Overview of the platform
05:45:04 Create your account on Bright Data
05:48:28 Use the residential proxy network
05:57:59 Use the web unlocker
06:02:12 Use the scraping browser
06:09:47 PART 3: Retrieve data on Airbnb
06:11:01 Preparation for ethical scraping
06:15:04 Analyze the site to prepare for scraping
06:20:44 Create the project and install the libraries
06:24:21 Simple scraping with requests
06:29:15 Save the HTML to disk
06:34:57 Retrieve the HTML from disk
06:42:39 Retrieve price data
07:03:49 Run the script from the command line
07:06:11 Advanced scraping with Playwright
07:15:46 Go through all the pages
07:25:09 Use Bright Data's scraping browser
07:33:44 Automate opening the debugger
07:39:11 Minimize bandwidth
07:43:20 Navigate to the search page
07:52:09 Move to the next month
08:09:57 Scroll through the months
08:22:14 Retrieve the price and finalize the script
08:34:01 PART 4: E-commerce alert system
08:35:16 The tools used
08:38:01 Preparation for ethical scraping
08:39:55 Retrieve the HTML with requests
08:52:47 Add environment variables
08:54:57 Use the Web Unlocker
09:00:09 Keep a history of values on disk
09:04:45 Compare the current value with the previous one
09:08:17 Add the alert function with Pushover
09:11:27 Add the logger
09:17:44 Complete the main function
09:28:02 Send files to the VPS
09:32:41 Create a Cron Job
09:39:17 Remove the warning with urllib
09:40:45 Add Sentry alerts
09:50:22 Outro