[Nodriver] CDP DownloadWillBegin event not working #2060

Open
abhash-rai opened this issue Oct 30, 2024 · 7 comments

@abhash-rai

Hello. I'm trying to write a nodriver script that detects downloads and reports their status. I want it to print "Download started..." whenever a download begins in a tab, but the message never appears, which leads me to believe the event handler in my script is not set up correctly. I would also be grateful if anyone could show how to detect download completion once a download has started. Thanks in advance!

import asyncio
import time
import nodriver as uc
from nodriver import cdp

download_status = False

def listen_download(page):
    async def handler(evt):
        global download_status
        download_status = True
        print("Download started...")  # Add logging

    page.add_handler(cdp.browser.DownloadWillBegin, handler)

async def crawl():
    global download_status

    browser = await uc.start(headless=False)

    # Use the main tab
    tab = await browser.get('about:blank')

    listen_download(tab)

    # Navigate to the PDF URL
    pdf_url = "https://www.python.org/ftp/python/3.13.0/python-3.13.0-amd64.exe"
    print(f"Navigating to {pdf_url}...")
    await tab.get(pdf_url)


    # Keep the script running to monitor the download
    print("Monitoring downloads... Press Ctrl+C to exit.")

    while True:
        if download_status:
            print('Hurray')
        await asyncio.sleep(0.2)  # Keep the event loop alive

if __name__ == '__main__':
    uc.loop().run_until_complete(crawl())
@ultrafunkamsterdam
Owner

You need handlers for both cdp.page.DownloadWillBegin and DownloadProgress.
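A minimal sketch of what handling both events could look like, including the download-completion detection asked about above. The `DownloadTracker` class and the `attach` helper are hypothetical names introduced for illustration. The event field names (`guid`, `suggested_filename`, `state`) and the state strings (`inProgress`, `completed`, `canceled`) follow the CDP `Page.downloadWillBegin` / `Page.downloadProgress` definitions and are assumed to be exposed unchanged by nodriver's generated classes; check them against your installed version.

```python
from dataclasses import dataclass, field


@dataclass
class DownloadTracker:
    """Hypothetical helper: remembers the latest state of each download by GUID."""

    states: dict = field(default_factory=dict)     # guid -> "started" | CDP state string
    filenames: dict = field(default_factory=dict)  # guid -> suggested filename

    def on_begin(self, event):
        # Intended for DownloadWillBegin events.
        self.states[event.guid] = "started"
        self.filenames[event.guid] = event.suggested_filename
        print(f"Download started: {event.suggested_filename} ({event.guid})")

    def on_progress(self, event):
        # Intended for DownloadProgress events; `state` is assumed to be one of
        # "inProgress", "completed" or "canceled", as in the CDP definition.
        self.states[event.guid] = event.state
        name = self.filenames.get(event.guid, event.guid)
        if event.state == "completed":
            print(f"Download completed: {name}")
        elif event.state == "canceled":
            print(f"Download canceled: {name}")

    def completed(self):
        """Return the GUIDs of all downloads that reported completion."""
        return [guid for guid, state in self.states.items() if state == "completed"]


def attach(tab, tracker):
    """Register both handlers on a nodriver tab."""
    # Imported here so the tracker itself can be used without nodriver installed.
    from nodriver import cdp

    tab.add_handler(cdp.page.DownloadWillBegin, tracker.on_begin)
    tab.add_handler(cdp.page.DownloadProgress, tracker.on_progress)
```

Calling `attach(tab, tracker)` right after the tab is obtained and before navigating to the download URL should let the script poll `tracker.completed()` in its main loop instead of a global flag.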

@abhash-rai
Author

abhash-rai commented Nov 1, 2024

Thanks, this worked:

import asyncio
import nodriver as uc
from nodriver import cdp

binded_tabs = []
async def bind_handlers(browser):
    global binded_tabs
    while True:
        await asyncio.sleep(0.01)
        for tab in browser.tabs:
            if tab not in binded_tabs:
                tab.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))        
                binded_tabs.append(tab)

async def crawl():    

    browser = await uc.start(headless=False)

    asyncio.create_task(bind_handlers(browser))

    await browser.get("https://www.python.org/ftp/python/3.13.0/python-3.13.0-amd64.exe")
    await browser.get("https://code.visualstudio.com/sha/download?build=stable&os=win32-x64-user", new_tab=True)

    while True:
        await asyncio.sleep(0.2)  # Keep the event loop alive

if __name__ == '__main__':
    uc.loop().run_until_complete(crawl())

However, clicking a download button sometimes opens an entirely new tab, and the download begins from there. In this case the download is not detected.

I want to be able to add handlers to every tab, whether already open or opened later. How can I do this? Is there a CDP event for this as well? I checked cdp.browser and it has a DownloadWillBegin event class, but when I use cdp.browser.DownloadWillBegin in the code above, the handler (in this case a basic lambda that logs the event) is never called.

My aim is to detect downloads at the browser level across every tab, whether currently open or opened in the future.

@ultrafunkamsterdam
Owner

Yes, there are 2 "domains" where these download events can be set. Afaik they are just redundant, but you could try setting it on the browser (for nodriver you then set it with browser.connection.add_handler).

@abhash-rai
Author

I haven't tried the method you suggested (browser.connection.add_handler with cdp.browser.DownloadWillBegin), but tab.add_handler with cdp.page.DownloadWillBegin seems to work as I expected, as confirmed by the test below:

import time
import asyncio
import nodriver as uc
from nodriver import cdp

binded_tabs = []
async def bind_handlers(browser):
    global binded_tabs
    while True:
        await asyncio.sleep(0.01)
        for tab in browser.tabs:
            if tab not in binded_tabs:
                tab.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))        
                binded_tabs.append(tab)

async def crawl():    

    browser = await uc.start(headless=False)

    asyncio.create_task(bind_handlers(browser))

    tab1 = await browser.get("https://code.visualstudio.com/sha/download?build=stable&os=win32-x64-user")

    tab2 = await browser.get("https://journals.lww.com/anesthesia-analgesia/fulltext/2024/05000/special_communication__response_to__ensuring_a.2.aspx", new_tab=True)
    
    await asyncio.sleep(5)  # non-blocking wait; time.sleep() would stall the event loop
    
    pdf_button = await tab2.find("//button[contains(., 'PDF')]")
    await pdf_button.click()

    while True:
        await asyncio.sleep(0.2)  # Keep the event loop alive

if __name__ == '__main__':
    uc.loop().run_until_complete(crawl())

@JoaoSobhie

@abhash-rai I tried this and it didn't work. Oddly, when I start the browser with uc.start, change the download path with set_download_path and cdp.browser.set_download_behavior, and then click the download elements by hand, the file is saved to the path I set. But if I do the same through automation, I get the modal asking where to save. Any idea, @ultrafunkamsterdam?

@abhash-rai abhash-rai reopened this Nov 20, 2024
@abhash-rai
Author

Yes, there are 2 "domains" where these download events can be set. Afaik they are just redundant, but you could try setting it on the browser (for nodriver you then set it with browser.connection.add_handler).

Although tab.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid)) worked in most cases, it fails to detect the download start in cases where clicking an anchor element opens a new tab that starts the download and then quickly closes.

@ultrafunkamsterdam I tried to follow your suggestion of setting it on the browser (for nodriver, with browser.connection.add_handler).

I tried these:

browser.connection.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))
browser.connection.add_handler(cdp.browser.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))        

but it doesn't work. Here's my full code:

import time
import asyncio
import nodriver as uc
from nodriver import cdp

async def crawl():    

    browser = await uc.start(headless=False)
    await asyncio.sleep(2)  # non-blocking wait; time.sleep() would stall the event loop

    browser.connection.add_handler(cdp.page.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))
    browser.connection.add_handler(cdp.browser.DownloadWillBegin, lambda event: print('Download event => %s' % event.guid))        

    await browser.get("https://code.visualstudio.com/sha/download?build=stable&os=win32-x64-user")

    while True:
        await asyncio.sleep(0.2)  # Keep the event loop alive

if __name__ == '__main__':
    uc.loop().run_until_complete(crawl())      

I have been stuck with this problem for weeks. I want to be able to detect download start across all tabs opened currently or in the future. I would really appreciate help on this.
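One possible explanation, offered as an untested assumption rather than a confirmed fix: in the CDP definition, `Browser.setDownloadBehavior` takes an `eventsEnabled` flag that defaults to false, and the Browser-domain download events are only emitted when it is set. The sketch below enables the flag on the browser-level connection before registering handlers. `watch_browser_downloads` is a hypothetical helper name, and the `events_enabled` keyword is assumed to be how nodriver's generated `cdp.browser.set_download_behavior` exposes that flag.

```python
from pathlib import Path


async def watch_browser_downloads(browser, download_dir, on_begin, on_progress=None):
    """Ask Chrome to emit Browser-domain download events, then register handlers."""
    # Imported here so the function can be exercised with stand-in objects.
    from nodriver import cdp

    # Without events_enabled=True (assumed keyword for CDP's eventsEnabled),
    # Browser.downloadWillBegin / Browser.downloadProgress are not emitted.
    await browser.connection.send(
        cdp.browser.set_download_behavior(
            behavior="allow",
            download_path=str(Path(download_dir).resolve()),
            events_enabled=True,
        )
    )

    # Register on the browser-level connection, as suggested earlier in the thread.
    browser.connection.add_handler(cdp.browser.DownloadWillBegin, on_begin)
    if on_progress is not None:
        browser.connection.add_handler(cdp.browser.DownloadProgress, on_progress)
```

In the script above this would be called as `await watch_browser_downloads(browser, "downloads", handler)` right after `uc.start(...)`, replacing the two `browser.connection.add_handler` lines. Because the events come from the browser rather than a single tab, this should in principle also cover tabs that open and close quickly.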

@abhash-rai
Author

abhash-rai commented Nov 20, 2024

@abhash-rai I tried this and it didn't work. Oddly, when I start the browser with uc.start, change the download path with set_download_path and cdp.browser.set_download_behavior, and then click the download elements by hand, the file is saved to the path I set. But if I do the same through automation, I get the modal asking where to save. Any idea, @ultrafunkamsterdam?

@JoaoSobhie I think you should disable 'Ask where to save each file before downloading' in Chrome. Go to 'chrome://settings/?search=download' and you'll see the option towards the end. Disable it.
