Scrape 'Research Tags' for professor profiles #11

Open
wants to merge 4 commits into base: develop

Conversation

iniyanijoseph

Capture each professor's research topics, when they exist, along with UTD's page about each research subject (as pictured below), as part of each profile.
image
I am attaching the profile data produced by the scrape.
Profiles.json

@jpahm jpahm self-requested a review October 22, 2023 00:18
@iniyanijoseph
Author

A slight addition: the education scraping works pretty well, but it slows the scraper down significantly because the content on the website is formatted pretty terribly. I had to use the dom module to request the raw HTML under that node and then regex out the relevant information. From what I can tell, it's the most straightforward way to get the data, but it's not very fast.
Profiles.json
As before, I have attached the JSON file, so the scraper doesn't need to be run again.
image
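
For reviewers who want a feel for the "raw HTML plus regex" step described above, here is a minimal sketch in Python. It is not the code in this PR: the sample markup, the `ENTRY_RE` pattern, the `parse_education` helper, and the output field names are all hypothetical, since the actual page structure and scraper code are not shown in this thread. It only illustrates pulling degree entries out of a raw HTML string with a precompiled regular expression.

```python
import re
from html import unescape

# Hypothetical markup: the real profile page's structure is not shown in
# this thread, so this sample only stands in for "raw HTML under a node".
SAMPLE_HTML = """
<div id="education">
  <p>Ph.D., Computer Science<br>Example University, 2005</p>
  <p>B.S., Mathematics<br>Sample College, 1999</p>
</div>
"""

# Compiled once at import time so parsing many profiles does not
# recompile the pattern for each one.
ENTRY_RE = re.compile(
    r"<p>\s*(?P<degree>[^<]+?)\s*<br\s*/?>\s*"
    r"(?P<school>[^<,]+?),\s*(?P<year>\d{4})\s*</p>",
    re.IGNORECASE,
)


def parse_education(raw_html):
    """Return one dict per education entry found in the raw HTML string."""
    return [
        {
            "degree": unescape(m["degree"]),
            "school": unescape(m["school"]),
            "year": int(m["year"]),
        }
        for m in ENTRY_RE.finditer(raw_html)
    ]


entries = parse_education(SAMPLE_HTML)
```

In a sketch like this the regex pass itself is cheap; if the slowdown mentioned above comes from fetching the raw HTML node by node, that would be the part to measure first, though that is an assumption rather than something established in this thread.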

@jpahm
Contributor

jpahm commented Oct 26, 2023

@iniyanijoseph This looks good. Just do me a favor and make a PR on nebula-api that adds the proper doc and schema changes for these new professor properties.

@jpahm
Contributor

jpahm commented Oct 28, 2023

Edit -- This comment was meant to be on a different PR, my bad!


2 participants