
soup.find fails to find Tableau data #58

Open
stepa8 opened this issue Mar 18, 2022 · 3 comments

@stepa8

stepa8 commented Mar 18, 2022

I ran this on WSL (Ubuntu) on Windows 10.

from tableauscraper import TableauScraper as TS

url = "https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DailyHighlights/DailyHighlights"
ts = TS()
ts.loads(url)

Then we see this error:

python scrape_tableau.py
Traceback (most recent call last):
  File "scrape_tableau.py", line 9, in <module>
    ts.loads(url)
  File "/mnt/c/Users/stepa8/Projects/tableau-scraping/tab-env/lib/python3.8/site-packages/tableauscraper/TableauScraper.py", line 80, in loads
    soup.find("textarea", {"id": "tsConfigContainer"}).text
AttributeError: 'NoneType' object has no attribute 'text'

It appears soup.find("textarea", {"id": "tsConfigContainer"}) returns None, meaning the fetched page contains no such element.

Is there a workaround?
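
One way to confirm the diagnosis before calling ts.loads is to check whether the fetched HTML contains the textarea at all. The sketch below is an illustration only: it uses the standard library's HTMLParser instead of BeautifulSoup so that it runs without extra packages, and the names TextareaFinder and find_ts_config are hypothetical, not part of tableauscraper.

```python
from html.parser import HTMLParser


class TextareaFinder(HTMLParser):
    """Collects the text of the first <textarea> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False     # True once the target textarea has been seen
        self._inside = False   # True while parsing inside the target textarea
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if (tag == "textarea" and not self.found
                and dict(attrs).get("id") == self.target_id):
            self.found = True
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def find_ts_config(html):
    """Return the tsConfigContainer text, or None if the element is absent."""
    finder = TextareaFinder("tsConfigContainer")
    finder.feed(html)
    finder.close()
    return "".join(finder.chunks) if finder.found else None
```

If find_ts_config(response_text) returns None for the HTML behind a given URL, that URL is not serving the embedded viz page the scraper expects, which matches the AttributeError above.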

@xplreitr

I was running into a similar problem and this issue sent me in the right direction.

#30

It seems there is a URL other than the public-facing one. Open Chrome DevTools, go to the Network tab, and find the URL that starts with https://public.tableau.com/views....

I tried looking up the one you were interested in and couldn't find that exact Tableau worksheet. The only one published by epidemiology.immunization.services.branch was this one: https://public.tableau.com/app/profile/epidemiology.immunization.services.branch/viz/COVID-19DemographicsTEST_16498711218660/DailyCounts

In the Network tab, this URL showed up while the page was loading:

https://public.tableau.com/views/COVID-19DemographicsTEST_16498711218660/DailyCounts

I ran a quick test and this URL seems to work. Someone more knowledgeable might be able to explain the difference between the two URLs. It might also help to note in the documentation that the public-facing URL is not exactly the URL needed to make this work.
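
The two URLs in this comment differ only in their path, so the rewrite can be sketched as a small helper. This is a guess generalized from the single example above, not documented Tableau behavior; the function name to_views_url is hypothetical, and checking the Network tab remains the reliable way to find the right URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_views_url(public_url):
    """Rewrite /app/profile/<profile>/viz/<workbook>/<sheet> to /views/<workbook>/<sheet>.

    Assumption: the path shape is inferred from one example in this thread.
    """
    parts = urlsplit(public_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected shape: ["app", "profile", <profile>, "viz", <workbook>, <sheet>]
    if (len(segments) != 6 or segments[:2] != ["app", "profile"]
            or segments[3] != "viz"):
        raise ValueError(f"not a public profile URL: {public_url}")
    workbook, sheet = segments[4], segments[5]
    # Drop any query string or fragment; keep scheme and host unchanged.
    return urlunsplit((parts.scheme, parts.netloc,
                       f"/views/{workbook}/{sheet}", "", ""))
```

Passing the profile URL from this comment to to_views_url yields the /views/ URL that showed up in the Network tab; any URL that does not match the expected shape raises ValueError instead of being silently rewritten.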

@martinolmos

Hello, thank you for this amazing library.

I am facing a similar issue. I found the public.tableau.com/views URL, but it returns an empty DataFrame.
Here is the URL: 'https://public.tableau.com/views/DB_FISCA_01/Fisca_DS_RankingPeliculas'

@martinolmos

I went through the source code, and the problem is that data['secondaryInfo'] is empty.

Here is my code, which I took from here:

import json
import re

import requests
from bs4 import BeautifulSoup

url = "https://public.tableau.com/views/DB_FISCA_01/Fisca_DS_RankingPeliculas"

# Request the embedded viz page. The original dict listed ":embed" twice
# ("true", then "y"); only the last value takes effect, so it is kept once.
r = requests.get(
    url,
    params={
        ":display_static_image": "y",
        ":bootstrapWhenNotified": "true",
        ":language": "es-ES",
        ":embed": "y",
        ":showVizHome": "n",
        ":apiID": "host0",
    },
)

# The session configuration is embedded as JSON in a <textarea>.
soup = BeautifulSoup(r.text, "html.parser")
tableauData = json.loads(soup.find("textarea", {"id": "tsConfigContainer"}).text)

dataUrl = f'https://public.tableau.com{tableauData["vizql_root"]}/bootstrapSession/sessions/{tableauData["sessionid"]}'

# Bootstrap the session to fetch the worksheet data.
r = requests.post(dataUrl, data={
    "sheet_id": tableauData["sheetId"],
})

# The response holds two JSON chunks, each preceded by "<number>;".
# A raw string avoids invalid-escape warnings for \d.
dataReg = re.search(r'\d+;({.*})\d+;({.*})', r.text, re.MULTILINE)
info = json.loads(dataReg.group(1))
data = json.loads(dataReg.group(2))

Then print(data) outputs {'secondaryInfo': {}}
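
To make this failure easier to spot, the parsing step can be wrapped so that a non-matching response or an empty secondaryInfo is reported explicitly. The sketch below reuses the regex from the snippet above; the helper names and the sample string are made up for illustration, and the real response format is only assumed to be what that regex expects.

```python
import json
import re

# Same pattern as in the snippet above: two JSON chunks, each preceded by "<number>;".
BOOTSTRAP_PATTERN = re.compile(r"\d+;({.*})\d+;({.*})", re.MULTILINE)


def split_bootstrap_response(text):
    """Return (info, data) parsed from a bootstrapSession response body."""
    match = BOOTSTRAP_PATTERN.search(text)
    if match is None:
        raise ValueError("response does not contain two '<number>;{...}' chunks")
    return json.loads(match.group(1)), json.loads(match.group(2))


def has_secondary_info(data):
    """True only if data['secondaryInfo'] exists and is non-empty."""
    return bool(data.get("secondaryInfo"))


# Synthetic sample mimicking the empty result reported above (not real Tableau output).
sample = '8;{"a": 1}21;{"secondaryInfo": {}}'
info, data = split_bootstrap_response(sample)
print(has_secondary_info(data))  # prints False
```

This does not explain why secondaryInfo comes back empty for this workbook; it only turns the silent empty DataFrame into a check that can be acted on.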
