
Capturing Web Pages and Extracting Data with Headless Browsers

이영훈닷컴 2025. 2. 11. 14:52

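Today I learned several ways to automate HTML (DOM) dumps, screenshot capture, and PDF output of web pages from the command line, using browsers that support headless mode (Opera, Microsoft Edge, Google Chrome). Below is a summary of the commands I analyzed and what each option does.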

Main options and their roles

--headless Runs the browser in headless mode, with no GUI.
Example: useful for browser testing and scraping on servers or in CI environments

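--disable-gpu Disables GPU acceleration.
Example: avoids GPU-related errors in headless environments (mainly a workaround for older builds; recent Chromium versions generally run headless without it)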

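--screenshot Captures a screenshot of the web page.
The result is saved as an image file, by default screenshot.png in the current working directory, or at a path given as --screenshot=<file path>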

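--dump-dom Prints the page's DOM (HTML structure) to the console (standard output).
The output can be redirected to a file with the > operator

--print-to-pdf=<file path> Saves the web page as a PDF file.
Example: --print-to-pdf="aa.pdf", or specify a full path

--enable-logging Enables log output for debugging.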

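Additional options (Chrome-related)
--v=<number>: Sets the logging verbosity level (for example, --v=1 or --v=2).
--ignore-certificate-errors-spki-list=<hashes>: Ignores certificate errors only for certificates whose public key (SPKI) hash appears in the given comma-separated list of Base64-encoded SHA-256 hashes. It expects a value, so passing it bare, as some commands in this post do, most likely has no effect. The flag that ignores SSL certificate errors for every site is the separate --ignore-certificate-errors.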
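The options above can be assembled programmatically instead of being typed by hand. The following is a minimal Python sketch, not part of the original notes: build_headless_args is a hypothetical helper name, and it only builds the argument list (it does not start a browser unless you pass the list to something like subprocess.run).

```python
from typing import List, Optional


def build_headless_args(
    browser: str,
    url: str,
    dump_dom: bool = False,
    screenshot: Optional[str] = None,
    pdf: Optional[str] = None,
    disable_gpu: bool = False,
) -> List[str]:
    """Return the argument list for one headless run (hypothetical helper)."""
    args = [browser, "--headless"]
    if disable_gpu:
        args.append("--disable-gpu")
    if screenshot is not None:
        # An empty string means "let the browser pick its default file name".
        args.append(f"--screenshot={screenshot}" if screenshot else "--screenshot")
    if dump_dom:
        args.append("--dump-dom")
    if pdf is not None:
        args.append(f"--print-to-pdf={pdf}" if pdf else "--print-to-pdf")
    # The URL goes last, as in the commands in this post.
    args.append(url)
    return args


if __name__ == "__main__":
    opera = r"C:\Users\app\AppData\Local\Programs\Opera\launcher.exe"
    print(build_headless_args(opera, "https://www.website.com",
                              dump_dom=True, screenshot="", pdf="aa.pdf",
                              disable_gpu=True))
```

Passing a list to subprocess.run avoids the quoting problems that paths with spaces cause in a plain command prompt.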

Dumping the DOM with Opera and saving it as an HTML file

C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --dump-dom "https://www.website.com" > "aa.html"
Analysis: Runs Opera in headless mode, prints the DOM of the given URL to the console, and saves that output to aa.html.
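The same redirection can be done from a script. This is a rough Python sketch under a few assumptions: dump_dom_to_file is a hypothetical name, the browser writes the DOM to standard output (as the > redirection above implies), and the run parameter exists only so the function can be tried without a browser installed.

```python
import subprocess
from pathlib import Path
from typing import Callable


def dump_dom_to_file(browser: str, url: str, out_path: str,
                     run: Callable = subprocess.run) -> int:
    """Run the browser with --headless --dump-dom and save stdout to out_path.

    Returns the number of bytes written. `run` defaults to subprocess.run.
    """
    result = run([browser, "--headless", "--dump-dom", url],
                 capture_output=True, check=True)
    # stdout is bytes; write it unchanged, as the > redirection would.
    Path(out_path).write_bytes(result.stdout)
    return len(result.stdout)


if __name__ == "__main__":
    opera = r"C:\Users\app\AppData\Local\Programs\Opera\launcher.exe"
    size = dump_dom_to_file(opera, "https://www.website.com", "aa.html")
    print(f"wrote {size} bytes to aa.html")
```

With check=True, a non-zero exit code raises subprocess.CalledProcessError, so a failed run does not silently leave an empty aa.html behind.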

Capturing a screenshot, dumping the DOM, and printing to PDF with Opera (file name specified)

C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom --print-to-pdf="aa.pdf" https://www.website.com
Analysis:
--disable-gpu turns off GPU acceleration,
--screenshot captures a screenshot,
--dump-dom prints the DOM, and
--print-to-pdf="aa.pdf" generates a PDF file.

Specifying a full path for the PDF in Opera

C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com
Analysis: Saves the PDF output to aa.pdf on the desktop.
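When the command is generated by a script, a relative file name can be turned into a full path before it is handed to the browser, so the PDF location does not depend on the directory the browser happens to start in. A small Python sketch (pdf_flag is a hypothetical helper, not something the browsers provide):

```python
import os


def pdf_flag(output: str) -> str:
    """Return a --print-to-pdf flag whose value is an absolute path."""
    # abspath resolves the name against the current working directory
    # and works even if the file does not exist yet.
    return f"--print-to-pdf={os.path.abspath(output)}"


if __name__ == "__main__":
    print(pdf_flag("aa.pdf"))
```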

Adding a logging option to Opera

C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com --enable-logging
Analysis: Adding --enable-logging to the previous command lets you see the log messages produced during the run.

Similar tasks with Microsoft Edge

"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com --enable-logging
Analysis: Edge accepts the same options as Opera, so it can likewise capture a web page, extract the DOM, and save a PDF in headless mode. The executable path contains spaces, so it must be wrapped in double quotes.
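Since the same flags work across these browsers, a script only needs to know which executable is installed. The sketch below is one possible approach in Python, assuming the install paths used in this post; find_browser and CANDIDATES are hypothetical names, and the paths will differ on other machines.

```python
import os
from typing import Callable, Iterable, Optional

# Install paths taken from the commands in this post; adjust for your machine.
CANDIDATES = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe",
    r"C:\Users\app\AppData\Local\Programs\Opera\launcher.exe",
]


def find_browser(candidates: Iterable[str] = CANDIDATES,
                 exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    print(find_browser() or "no known browser found")
```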
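
Running with Google Chrome (with an extra logging option)

"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com --enable-logging --v=2
Analysis: Chrome can also run in headless mode, and the --v=2 option produces more detailed logs.
Note: some of the commands I tried also pass --ignore-certificate-errors-spki-list. That option ignores certificate errors only for the public key hashes given as its value; the flag that ignores SSL certificate errors in general is --ignore-certificate-errors.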

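Today's takeaways

Using headless browsers:
A page can be loaded automatically without a GUI and saved as HTML, a screenshot, or a PDF, which makes headless mode useful for test automation, web scraping, and archiving.

Output methods:
Results can be saved either by redirecting console output to a file (>) or by using the browser's built-in options (--print-to-pdf, --screenshot).

Common features across browsers:
Opera, Edge, and Chrome are all Chromium-based and accept largely the same options, so a command tested in one browser usually carries over to the others with only the executable path changed.

Debugging and logging:
--enable-logging and the verbosity option (--v) make it easier to track down problems that occur during a headless run.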
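In a script, the logging options can be appended to an existing argument list. A minimal Python sketch (with_logging is a hypothetical helper; it places the flags after the URL, as the commands in this post do):

```python
from typing import List, Optional, Sequence


def with_logging(args: Sequence[str], verbosity: Optional[int] = None) -> List[str]:
    """Return a copy of args with --enable-logging (and optionally --v=N) appended."""
    extra = ["--enable-logging"]
    if verbosity is not None:
        extra.append(f"--v={verbosity}")
    # Build a new list so the caller's arguments are left untouched.
    return list(args) + extra


if __name__ == "__main__":
    base = ["chrome.exe", "--headless", "--dump-dom", "https://www.website.com"]
    print(with_logging(base, verbosity=2))
```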

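Through today's study I came to understand the main command-line options of headless browsers and how to use them, and I can now apply them to extracting data from web pages and to automation work. Next, I plan to use these features to set up a test environment or to build a scraping automation project.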

 

C:\Users\app\AppData\Local\Programs\Opera\launcher.exe https://www.website.com
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --dump-dom https://www.website.com
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --dump-dom https://www.website.com > aa.html
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --dump-dom "https://www.website.com" > "aa.html"
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom "https://www.website.com" > "aa.html"
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom https://www.website.com
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom --print-to-pdf="aa.pdf" https://www.website.com
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com
C:\Users\app\AppData\Local\Programs\Opera\launcher.exe --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com --enable-logging
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com --enable-logging
"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --screenshot --dump-dom --print-to-pdf=C:\Users\app\Desktop\aa.pdf https://www.website.com --enable-logging
"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --screenshot --dump-dom https://www.website.com --enable-logging --ignore-certificate-errors-spki-list
"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --screenshot --dump-dom https://www.website.com --enable-logging --v=1
"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --screenshot --dump-dom https://www.website.com --enable-logging --v=2
"C:\Program Files\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --screenshot --dump-dom https://www.website.com --enable-logging