Linky is yet another LinkedIn scraper, inspired by vysecurity's LinkedInt project.
Currently, this method of extracting data from LinkedIn is limited to 1,000 users at a time. To help work within that limit, Linky's HTML output includes a small table at the bottom of the page listing the five most common occupations. If the company uses an unusual naming scheme for developers, Linky should be able to spot it and report it back. With these new data points, the --keyword flag can be used to narrow the output.
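As a rough illustration of how a top-5 table like this can be computed, the sketch below counts job titles with Python's collections.Counter. It assumes the titles have already been scraped into a list of strings; the function name top_occupations is hypothetical, and this is not Linky's actual code.

```python
from collections import Counter


def top_occupations(titles, n=5):
    """Return the n most common job titles as (title, count) pairs.

    Hypothetical helper: `titles` is assumed to be a list of job-title
    strings already extracted from scraped profiles.
    """
    # Collapse repeated whitespace and drop empty titles so near-duplicates merge.
    cleaned = [" ".join(t.split()) for t in titles if t and t.strip()]
    # Counter.most_common sorts by count, highest first.
    return Counter(cleaned).most_common(n)


if __name__ == "__main__":
    sample = [
        "Software Developer", "Systems Developer", "Software Developer",
        "Project Manager", "Software Developer", "Systems Developer",
        "Code Ninja", "Code Ninja", "Code Ninja", "Code Ninja",
    ]
    for title, count in top_occupations(sample):
        print(f"{count:>4}  {title}")
```

In the sample, an odd in-house title such as "Code Ninja" rises to the top of the table, which is exactly the kind of naming quirk the table is meant to surface.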
This project is no longer maintained. As far as I know, the O365 validation method has been patched. I also removed the blog post detailing this a while ago, so note that the cookie.txt referenced in this README contains the li_at cookie from LinkedIn.
pip3 install -r requirements.txt
usage: linky.py [-h] [-c] [-i] [-k] [-d] [-o] [-f] [-v] [-a] [-t]
[--valid-emails-only] [--verbose] [--debug]
[--list-email-schemes | --version]
Yet another LinkedIn scraper.
optional arguments:
-h, --help show this help message and exit
-c , --cookie Cookie to authenticate to LinkedIn with [li_at]
-i , --company-id Company ID number
-k , --keyword Keyword for searches
-d , --domain Company domain name
-o , --output File to output to: Writes CSV, JSON and HTML.
-f , --format Format for email addresses
-v , --validate Validate email addresses: O365/Hunter API
-a , --api API Key for Hunter API
-t , --threads Amount of threads to use [default 5]
--valid-emails-only When you literally only want a txt of valid emails.
--verbose Verbosity of the output
--debug Enable debugging, will spam.
--list-email-schemes List available email schemes
--version Print current version
Example: python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname'
python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname'
python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer
Run linky.py --list-email-schemes to see all currently supported formats:
firstname.surname:john.doe
firstnamesurname:johndoe
f.surname:j.doe
fsurname:jdoe
surname.firstname:doe.john
surnamefirstname:doejohn
s.firstname:d.john
sfirstname:djohn
firstname.msurname:john.jdoe
Any of these can be passed to --format, for example:
f.surname: --format f.surname
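A minimal sketch of how scheme names like these can map to addresses, assuming a plain first name and surname. SCHEMES and build_email are hypothetical names, not Linky's internals. firstname.msurname is deliberately left out, because its example (john.jdoe) implies middle-name handling that this sketch does not attempt.

```python
# Hypothetical mapping from scheme name to a formatter taking
# (firstname, surname); mirrors the examples listed above, not Linky's code.
SCHEMES = {
    "firstname.surname": lambda f, s: f"{f}.{s}",
    "firstnamesurname": lambda f, s: f"{f}{s}",
    "f.surname": lambda f, s: f"{f[0]}.{s}",
    "fsurname": lambda f, s: f"{f[0]}{s}",
    "surname.firstname": lambda f, s: f"{s}.{f}",
    "surnamefirstname": lambda f, s: f"{s}{f}",
    "s.firstname": lambda f, s: f"{s[0]}.{f}",
    "sfirstname": lambda f, s: f"{s[0]}{f}",
}


def build_email(firstname, surname, scheme, domain):
    """Build one address for `scheme`; raises ValueError on bad input."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    first, last = firstname.strip().lower(), surname.strip().lower()
    if not first or not last:
        # Initial-based schemes index [0], so both names must be non-empty.
        raise ValueError("firstname and surname must be non-empty")
    return f"{SCHEMES[scheme](first, last)}@{domain}"


if __name__ == "__main__":
    print(build_email("John", "Doe", "f.surname", "example.com"))
```

Keeping the schemes in a dictionary means the list shown by --list-email-schemes and the accepted --format values can come from a single source.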
By default, Linky counts the occurrences of each job role and writes the counts to the HTML output. It also writes them to a standard JSON file, job_role_count.json, with the following structure:
{
"Software Developer": 24,
"Systems Developer": 14,
"Senior Software Developer": 11,
"Project Manager": 10,
"System Developer": 9,
"Cyber Security Consultant": 7,
"Project Developer": 7,
"Programme Manager": 6,
"Software Architect": 6,
"Development Manager": 6
}
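Since the JSON maps each role to a count, summing those counts per word is one quick way to choose a value for --keyword. The sketch below is a hypothetical helper, not part of Linky; it assumes the file is named job_role_count.json and holds a flat role-to-count object like the one above.

```python
import json
from collections import Counter


def load_role_counts(path="job_role_count.json"):
    """Load the role -> count mapping written by Linky (assumed flat JSON)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def keyword_weights(role_counts):
    """Sum role counts per lower-cased word appearing in the role titles."""
    weights = Counter()
    for role, count in role_counts.items():
        # set() so a word repeated within one title is only counted once.
        for word in set(role.lower().split()):
            weights[word] += count
    return weights
```

Run against the sample above, keyword_weights(...).most_common(3) gives developer (65), software (41) and manager (22), which points at developer as the keyword used in the example commands.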
- Run once to gather the initial data:
python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname'
- Find the job role occurrences:
cat job_role_count.json | jq
- With the roles identified, use the keyword feature:
python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer
The --valid-emails-only flag performs the same level of enumeration, but only writes validated emails to a .txt file. It also assumes O365 validation.
python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer --valid-emails-only
This command creates a .txt file containing only the emails that were found to be valid via O365.
This is basically the TL;DR version of Linky.
Happy Stalking.