Chrome’s Headless Browser Puppteer All In One

发布时间 2023-09-04 16:41:13作者: xgqfrms

Chrome’s Headless Browser Puppteer All In One

Chrome 无头浏览器 Puppeteer

Chrome’s Headless mode gets an upgrade: introducing --headless=new

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({
  headless: 'new',
  // `headless: true` (default) enables old Headless;
  // `headless: 'new'` enables new Headless;
  // `headless: false` enables “headful” mode.
});

const page = await browser.newPage();
await page.goto('https://developer.chrome.com/');

// …
await browser.close();

image

https://developer.chrome.com/articles/new-headless/

errors

TimeoutError: Waiting for selector .group-container failed: Waiting failed: 30000ms exceeded ❌

// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.tesla.cn/modely/design#overview');
  await page.setViewport({width: 1920, height: 1080});
  // ❌ TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
  const searchResultSelector = '.group-container';
  const box = await page.waitForSelector(searchResultSelector);
  console.log(`box`, box)
  const text = await box?.evaluate(el => el.textContent);
  console.log(`text`, text)
  await browser.close();
})();

image

$ node ./pp.js

  /*
    Puppeteer old Headless deprecation warning:
    In the near future `headless: true` will default to the new Headless mode
    for Chrome instead of the old Headless implementation. For more
    information, please see https://developer.chrome.com/articles/new-headless/.
    Consider opting in early by passing `headless: "new"` to `puppeteer.launch()`
    If you encounter any bugs, please report them to https://github.com/puppeteer/puppeteer/issues/new/choose.
  */
/Users/xgqfrms-mm/Documents/github/cyclic-express-server/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:59
                void this.terminate(new Errors_js_1.TimeoutError(`Waiting failed: ${options.timeout}ms exceeded`));
                                    ^

TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
    at Timeout.<anonymous> (/Users/xgqfrms-mm/Documents/github/cyclic-express-server/node_modules/puppeteer-core/lib/cjs/puppeteer/common/WaitTask.js:59:37)
    at listOnTimeout (node:internal/timers:564:17)
    at process.processTimers (node:internal/timers:507:7)

Node.js v18.12.0

solutions

  1. headless: "new"
  const browser = await puppeteer.launch({
    headless: "new",
    // headless new ✅?
  });
  1. headless: false
  const browser = await puppeteer.launch({
    headless: false,
    // disabled headless  ✅?
  });
  1. page.setUserAgent
navigator.userAgent;
// 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36'

// navigator.userAgent, UA ✅
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36')

demos

// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: "new",
    // headless new ? 
  });
  const page = await browser.newPage();
  await page.goto('https://www.tesla.cn/modely/design#overview');
  await page.setViewport({width: 1920, height: 1080});
  // ❌ TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
  const searchResultSelector = '.group-container';
  const box = await page.waitForSelector(searchResultSelector);
  console.log(`box`, box)
  const text = await box?.evaluate(el => el.textContent);
  console.log(`text`, text)
  await browser.close();
})();

// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    // close headless , open chromium  ?
  });
  const page = await browser.newPage();
  await page.goto('https://www.tesla.cn/modely/design#overview');
  await page.setViewport({width: 1920, height: 1080});
  // ❌ TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
  const searchResultSelector = '.group-container';
  const box = await page.waitForSelector(searchResultSelector);
  console.log(`box`, box)
  const text = await box?.evaluate(el => el.textContent);
  console.log(`text`, text)
  await browser.close();
})();

// import puppeteer from 'puppeteer';
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // navigator.userAgent, UA ✅
  await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36')
  await page.goto('https://www.tesla.cn/modely/design#overview');
  await page.setViewport({width: 1920, height: 1080});
  // ❌ TimeoutError: Waiting for selector `.group-container` failed: Waiting failed: 30000ms exceeded
  const searchResultSelector = '.group-container';
  const box = await page.waitForSelector(searchResultSelector);
  console.log(`box`, box)
  const text = await box?.evaluate(el => el.textContent);
  console.log(`text`, text)
  await browser.close();
})();

(? 反爬虫测试!打击盗版⚠️)如果你看到这个信息, 说明这是一篇剽窃的文章,请访问 https://www.cnblogs.com/xgqfrms/ 查看原创文章!

Node.js 如何延迟爬取 SSR 动态渲染的页面内容

Node.js 如何延迟爬取动态渲染的页面内容

如何爬取动态渲染的页面内容 puppeteer ✅, waitUntil

https://www.cnblogs.com/xgqfrms/p/17673122.html#5207654

refs

https://github.com/puppeteer/puppeteer/issues/8166#issuecomment-1704367760

https://www.cnblogs.com/xgqfrms/p/17673122.html#5207658



©xgqfrms 2012-2021

www.cnblogs.com/xgqfrms 发布文章使用:只允许注册用户才可以访问!

原创文章,版权所有©️xgqfrms, 禁止转载 ?️,侵权必究⚠️!