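Tutorial: Using Playwright, a High-Performance Python Crawler Framework
Playwright is a new-generation automated testing tool that Microsoft open-sourced in early 2020. Like Selenium and Pyppeteer, it drives a browser to carry out all kinds of automated operations. It supports the mainstream browsers on the market, and its API is both concise and powerful. Although it arrived relatively late, it is now developing rapidly.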
Official website: click to visit
GitHub: click to visit
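It can of course be used for web scraping as well. Here are its advantages:
★ Relatively good performance - it performs better than comparable tools.
★ No manual driver installation - there is no need to visit each browser vendor's site and download a driver that matches your browser version.
★ Very simple to use - once the dependency is installed, a demo runs in a few lines, which greatly lowers the barrier to development.
★ Broad browser coverage - Playwright supports all current mainstream browsers, including Chrome and Edge (based on Chromium), Firefox, and Safari (based on WebKit), and provides a complete automation API for each.
★ Mobile page testing - Playwright uses device emulation, so responsive web applications can be tested as they would appear in a mobile web browser.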
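To make the device emulation point concrete, here is a minimal sketch that opens a page in a browser context configured from Playwright's built-in device registry. The helper name open_mobile_page, the device name "iPhone 13", and the target URL are assumptions chosen for illustration, and nothing is launched until main() is called.

```python
def open_mobile_page(playwright, device_name, url):
    """Open `url` in a Chromium context that emulates `device_name`."""
    # playwright.devices maps device names to context options
    # such as viewport, user agent and touch support.
    device = playwright.devices[device_name]
    browser = playwright.chromium.launch(headless=True)
    context = browser.new_context(**device)
    page = context.new_page()
    page.goto(url)
    return browser, page


def main():
    # Imported here so the helper above can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser, page = open_mobile_page(p, "iPhone 13", "https://www.baidu.com")
        print(page.viewport_size)  # viewport taken from the device descriptor
        browser.close()
```

Calling main() requires Playwright and its browsers to be installed, as described in the installation section.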
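1. Installation
Requirements: Python 3.7 or later at the time of writing; newer Playwright releases require a more recent Python version.
To install, run the two commands below and wait for them to finish. The first installs the Python package and the second downloads the browsers.
pip3 install playwright
playwright install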
2. Basic Usage
Playwright can be started through either a synchronous or an asynchronous API. The synchronous form looks like this:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # launch the Chromium browser
    page = browser.new_page()  # open a new tab
    page.goto("https://www.baidu.com")  # navigate to Baidu
    print(page.title())  # print the current page title
    browser.close()  # close the browser
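The synchronous API can also be used without a with block by calling start() and stop() explicitly. (The asynchronous API lives in playwright.async_api and is used with async_playwright and await.)
from playwright.sync_api import sync_playwright

playwright = sync_playwright().start()  # start Playwright manually
browser = playwright.chromium.launch(headless=False)
page = browser.new_page()
page.goto("https://www.baidu.com/")
browser.close()
playwright.stop()  # release Playwright's resources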
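For the asynchronous mode, the calls mirror the synchronous ones, but they come from playwright.async_api and each one is awaited. The following is a minimal sketch under that assumption; the helper name fetch_title and the URL are illustrative, not part of Playwright.

```python
import asyncio


async def fetch_title(playwright, url):
    """Open `url` with the async API and return the page title."""
    browser = await playwright.chromium.launch(headless=True)
    try:
        page = await browser.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        # Close the browser even if navigation fails.
        await browser.close()


async def main():
    # Imported here so fetch_title can be exercised without a browser.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        print(await fetch_title(p, "https://www.baidu.com"))


# To run against a real browser: asyncio.run(main())
```

The async form is mainly useful when several pages need to be driven concurrently from one event loop.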
3. Executing JavaScript
Use the evaluate method to run JS inside the page.
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://bk.yyge.net")
    # Run JS in the page; the message is written to the browser console
    page.evaluate("console.log('hello playwright')")
    browser.close()
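evaluate also returns the value of the expression to Python and can pass one argument into the page, which is often what a crawler needs. Below is a small sketch of both; the helper name page_stats and the "a" selector are assumptions made for illustration.

```python
def page_stats(page):
    """Collect a few values from the page by evaluating JavaScript."""
    # The value of the evaluated expression is returned to Python.
    title = page.evaluate("document.title")
    # A JS function receives the optional second argument of evaluate().
    link_count = page.evaluate(
        "selector => document.querySelectorAll(selector).length", "a"
    )
    return {"title": title, "links": link_count}


def main():
    # Imported here so page_stats can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://bk.yyge.net")
        print(page_stats(page))
        browser.close()
```

Values returned this way must be serializable, so DOM nodes themselves are better read through their properties, as done here.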
4. Listening for Events
Playwright can watch the resources a page loads and decide, request by request, whether to let each one through or block it.
This is set up with the route method.
from playwright.sync_api import sync_playwright

def intercept_request(route, request):
    if request.url.startswith("https://www.baidu.com"):
        route.abort()  # abort the request
    else:
        route.continue_()  # let the request through

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Send every request through the handler before it leaves the browser
    page.route("**/*", intercept_request)
    page.goto("https://bk.yyge.net")
    browser.close()
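A common variation for crawlers is to filter by resource type rather than by URL, for example to skip images and fonts. The sketch below assumes that is the goal; the names BLOCKED_TYPES and block_heavy_resources are illustrative.

```python
BLOCKED_TYPES = {"image", "media", "font"}


def block_heavy_resources(route, request):
    """Abort requests whose resource type is in BLOCKED_TYPES."""
    if request.resource_type in BLOCKED_TYPES:
        route.abort()
    else:
        route.continue_()


def main():
    # Imported here so the handler can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.route("**/*", block_heavy_resources)
        page.goto("https://bk.yyge.net")
        print(page.title())
        browser.close()
```

Skipping these resources usually shortens page loads, at the cost of pages that no longer render as a user would see them.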
5. Dynamic Interaction Between JS and Playwright
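from playwright.sync_api import sync_playwright

def click_fun(dialog):
    print(dialog.message)  # the text passed to alert()
    dialog.accept()  # close the dialog; the page stays blocked until it is handled

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://bk.yyge.net")
    # Listen for dialogs opened by the page
    page.on("dialog", click_fun)
    # Use JS to listen for clicks; each click opens an alert showing the clicked element's HTML
    page.evaluate('window.addEventListener("click",(e)=>{alert(e.target.innerHTML)})')
    page.wait_for_timeout(5000)  # keep the page open so it can be clicked
    browser.close()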
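Routing the data through alert() works, but each click blocks the page until the dialog is handled. As an alternative sketch, page.expose_function lets page JS call a Python function directly. The names install_click_reporter and reportClick are hypothetical and chosen only for this example.

```python
def install_click_reporter(page, on_click):
    """Call on_click(tag, text) in Python whenever the page is clicked."""
    # Make the Python callable available in the page as window.reportClick.
    page.expose_function("reportClick", on_click)
    # Forward the tag name and text of every clicked element to Python.
    page.evaluate(
        """() => window.addEventListener("click", (e) => {
            window.reportClick(e.target.tagName, e.target.innerText);
        })"""
    )


def main():
    # Imported here so the helper can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://bk.yyge.net")
        install_click_reporter(page, lambda tag, text: print(tag, text))
        page.wait_for_timeout(10000)  # leave time to click around
        browser.close()
```

The click listener is attached to the current document only, so it has to be installed again after a navigation, while the exposed function itself remains available.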
Related links
Performance comparison: click to visit
Basic usage: click to visit
Using route: click to visit
A Simple Demo
from playwright.sync_api import sync_playwright

def click_fun(dialog):
    print(dialog.message)
    dialog.accept()  # without this, the alert() call below never returns

def intercept_request(route, request):
    if request.url.startswith("https://www.baidu.com"):
        # Use route.abort() here instead to block these requests
        route.continue_()
    else:
        route.continue_()

with sync_playwright() as playwright:
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Intercept every request
    page.route("**/*", intercept_request)
    page.goto("https://bk.yyge.net")
    page.evaluate('window.addEventListener("click",(e)=>{alert(e.target.innerHTML)})')
    page.on("dialog", click_fun)
    page.evaluate('alert("Hello, Wang Dachui")')
    page.wait_for_timeout(5000)  # fixed wait so the page can be clicked