N06 - Scraping Web Forms (RPA Basics)
# coding=utf-8
import os.path
import requests
from lxml import etree
import time

base_url = 'https://spiderbuf.cn/web-scraping-practice/scraping-form-rpa'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/...
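The excerpt above is cut off before any parsing happens, so the page's real markup is not shown. As a rough sketch of what scraping a form usually involves, the snippet below reads field values out of `<input>` tags using only the standard library. The sample markup, the field names, and the `FormFieldParser` class are all hypothetical and are not taken from the original post.

```python
from html.parser import HTMLParser

# Hypothetical form markup; the real page's fields are not shown in the excerpt.
SAMPLE_FORM = """
<form>
  <input type="text" name="username" value="admin">
  <input type="password" name="password" value="secret">
  <input type="checkbox" name="remember" checked>
</form>
"""


class FormFieldParser(HTMLParser):
    """Collect the name and current value of every <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name is None:
            return
        if attr.get("type") == "checkbox":
            # A checkbox carries its state in the bare `checked` attribute.
            self.fields[name] = "checked" in attr
        else:
            self.fields[name] = attr.get("value", "")


parser = FormFieldParser()
parser.feed(SAMPLE_FORM)
fields = parser.fields
print(fields)
```

With a real page, the HTML string would come from `requests.get(base_url, headers=myheaders).text` instead of the hard-coded sample.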
N05 - CSS Sprites Anti-Scraping
# coding=utf-8
import os.path
import requests
from lxml import etree
import time

base_url = 'https://spiderbuf.cn/web-scraping-practice/css-sprites'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4...
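For readers new to the idea, here is a generic sketch of how sprite-based obfuscation is commonly undone. Each digit is drawn as a slice of one image selected by a CSS `background-position`, so the scraper maps offsets back to characters. The class names, offsets, slice width, and both helper functions are hypothetical; the truncated excerpt above does not show how the actual page is built.

```python
import re

# Hypothetical stylesheet: each class selects one 10px-wide slice of the sprite,
# and the slices are assumed to hold the digits 0-9 from left to right.
SAMPLE_CSS = """
.sprite { background-image: url(sprites.png); width: 10px; }
.abcdef { background-position: 0px 0px; }
.ghijkl { background-position: -10px 0px; }
.mnopqr { background-position: -20px 0px; }
"""

# Hypothetical markup: the number is rendered with sprites, not with text.
SAMPLE_HTML = (
    '<span class="sprite ghijkl"></span>'
    '<span class="sprite abcdef"></span>'
    '<span class="sprite mnopqr"></span>'
)


def build_class_map(css, slice_width=10):
    """Map each class name to the digit its horizontal offset selects."""
    pattern = r"\.(\w+)\s*\{\s*background-position:\s*(-?\d+)px"
    return {
        name: str(abs(int(x)) // slice_width)
        for name, x in re.findall(pattern, css)
    }


def decode_sprites(html, class_map):
    """Replace every sprite class found in the markup with its digit."""
    digits = []
    for classes in re.findall(r'class="([^"]+)"', html):
        for cls in classes.split():
            if cls in class_map:
                digits.append(class_map[cls])
    return "".join(digits)


class_map = build_class_map(SAMPLE_CSS)
print(decode_sprites(SAMPLE_HTML, class_map))
```

On a real page the offset-to-character table has to be worked out by looking at the sprite image itself; the left-to-right digit layout here is only an assumption.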
N04 - CSS Pseudo-Element Anti-Scraping
# coding=utf-8
import os.path
import requests
from lxml import etree
import time

base_url = 'https://spiderbuf.cn/web-scraping-practice/css-pseudo-elements'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrom...
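A generic sketch of the technique named in the title: when part of the visible text is injected through `::before` or `::after` rules, it never appears in the HTML, so the scraper has to read the `content` values from the stylesheet and stitch them back in. The class names, sample markup, and helper functions below are hypothetical, since the excerpt is truncated before the page structure is shown.

```python
import re

# Hypothetical stylesheet: the digits live in CSS, not in the document text.
SAMPLE_CSS = """
.abcdef::before { content: "7"; }
.ghijkl::after { content: "5"; }
"""

# Hypothetical markup: only the decimal point is present as real text.
SAMPLE_HTML = '<span class="abcdef">.</span><span class="ghijkl"></span>'


def parse_pseudo_rules(css):
    """Return {(class_name, 'before' | 'after'): content} for each rule."""
    pattern = r'\.(\w+)::?(before|after)\s*\{\s*content:\s*"([^"]*)"'
    return {(name, pos): text for name, pos, text in re.findall(pattern, css)}


def reveal_text(html, rules):
    """Rebuild the text a browser would display for each <span>."""
    parts = []
    for cls, text in re.findall(r'<span class="(\w+)">([^<]*)</span>', html):
        before = rules.get((cls, "before"), "")
        after = rules.get((cls, "after"), "")
        parts.append(before + text + after)
    return "".join(parts)


rules = parse_pseudo_rules(SAMPLE_CSS)
print(reveal_text(SAMPLE_HTML, rules))
```

Regular expressions keep the sketch dependency-free; for real pages a proper HTML parser such as `lxml` (already imported in the excerpt) is the more robust choice for the markup side.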
H06 - A First Look at Browser Fingerprinting: How Selenium Gets Caught by Anti-Scraping
# coding=utf-8
import base64
import hashlib
import time
import requests
from lxml import etree
from selenium import webdriver

base_url = 'https://spiderbuf.cn/web-scraping-practice/selenium-fingerprint-anti-scraper'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10....
H05 - Reverse-Engineering JavaScript to Beat Timestamp-Based Anti-Scraping
# coding=utf-8
import base64
import hashlib
import time
import requests
from lxml import etree
from selenium import webdriver

base_url = 'https://spiderbuf.cn/web-scraping-practice/javascript-reverse-timestamp'
myheaders = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Wi...
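The excerpt stops before the signing logic, so the site's real scheme is unknown. The imports (`base64`, `hashlib`, `time`) do hint at the usual shape of this kind of check: the page's JavaScript derives a token from the current timestamp, and the scraper has to reproduce it. The sketch below is purely illustrative. The token layout (timestamp, comma, MD5 hex digest, all Base64-encoded) and the `make_token` name are assumptions, not the scheme from the post.

```python
import base64
import hashlib
import time


def make_token(timestamp):
    """Build a hypothetical token: base64("<timestamp>,<md5 hex of timestamp>")."""
    ts = str(int(timestamp))
    digest = hashlib.md5(ts.encode("utf-8")).hexdigest()
    raw = f"{ts},{digest}"
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


# A token for "now"; a server would typically reject one that is too old.
token = make_token(time.time())
print(token)
```

In practice the exact recipe (hash function, separator, encoding, parameter name) has to be read out of the page's JavaScript before it can be rebuilt in Python.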