
harshsinghmp/linky


Linky


Yet another LinkedIn Scraper...

Linky is yet another LinkedIn scraper, inspired by vysecurity's LinkedInt project.

Currently, this method of extracting data from LinkedIn is limited to 1000 users at a time. Because of this limit, Linky's HTML output includes a small table at the bottom of the page listing the five most common job titles. If the company uses an unusual naming scheme for its developers, this table should surface it. With these data points, the --keyword flag can then be used to try to narrow the output.
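Finding the most common titles is a plain frequency count. A minimal Python sketch, assuming the scraped job titles are already available as a list of strings; the function name top_occupations is hypothetical and not taken from Linky's code:

```python
from collections import Counter


def top_occupations(titles, n=5):
    """Return the n most common job titles as (title, count) pairs."""
    # Collapse runs of whitespace and drop empty titles so near-duplicates merge.
    cleaned = [" ".join(t.split()) for t in titles if t and t.strip()]
    # most_common() sorts by count, highest first.
    return Counter(cleaned).most_common(n)
```

For example, top_occupations(["Software Developer", "Project Manager", "Software Developer"]) returns [("Software Developer", 2), ("Project Manager", 1)].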


Note

This project is no longer maintained. As far as I know, the O365 validation method has been patched. I also removed the blog post detailing this a while ago, so for reference: the cookie.txt file used throughout this README contains the value of LinkedIn's li_at session cookie.


Installing

pip3 install -r requirements.txt

Help Page

usage: linky.py [-h] [-c] [-i] [-k] [-d] [-o] [-f] [-v] [-a] [-t]
                [--valid-emails-only] [--verbose] [--debug]
                [--list-email-schemes | --version]

Yet another LinkedIn scraper.

optional arguments:
  -h, --help            show this help message and exit
  -c , --cookie         Cookie to authenticate to LinkedIn with [li_at]
  -i , --company-id     Company ID number
  -k , --keyword        Keyword for searches
  -d , --domain         Company domain name
  -o , --output         File to output to: Writes CSV, JSON and HTML.
  -f , --format         Format for email addresses
  -v , --validate       Validate email addresses: O365/Hunter API
  -a , --api            API Key for Hunter API
  -t , --threads        Amount of threads to use [default 5]
  --valid-emails-only   When you literally only want a txt of valid emails.
  --verbose             Verbosity of the output
  --debug               Enable debugging, will spam.
  --list-email-schemes  List available email schemes
  --version             Print current version

Example: python3 linky.py --cookie cookie.txt --company-id 1441 --domain
google.com --output google_employees --format 'firstname.surname'

Usage

Get Employees

python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname'

Get Employees with keyword

python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer

Supported email formats

Run python3 linky.py --list-email-schemes to see all currently supported formats:

firstname.surname:john.doe
firstnamesurname:johndoe
f.surname:j.doe
fsurname:jdoe
surname.firstname:doe.john
surnamefirstname:doejohn
s.firstname:d.john
sfirstname:djohn
firstname.msurname:john.jdoe

Any of these can be passed to --format, e.g.:

f.surname: --format f.surname
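Each scheme is simply a way of combining the first name and surname. The Python sketch below reproduces the examples above through a hypothetical build_email helper; it is an illustration, not Linky's implementation. It leaves out firstname.msurname, because the john.jdoe example alone does not make clear where the extra initial comes from, and it assumes simple single-word names (hyphens, accents and multiple surnames would need extra normalisation).

```python
# Hypothetical mapping from the scheme names above to formatting functions.
SCHEMES = {
    "firstname.surname": lambda f, s: f"{f}.{s}",
    "firstnamesurname": lambda f, s: f"{f}{s}",
    "f.surname": lambda f, s: f"{f[0]}.{s}",
    "fsurname": lambda f, s: f"{f[0]}{s}",
    "surname.firstname": lambda f, s: f"{s}.{f}",
    "surnamefirstname": lambda f, s: f"{s}{f}",
    "s.firstname": lambda f, s: f"{s[0]}.{f}",
    "sfirstname": lambda f, s: f"{s[0]}{f}",
}


def build_email(firstname: str, surname: str, scheme: str, domain: str) -> str:
    """Build an address such as j.doe@example.com from a name and a scheme."""
    try:
        fmt = SCHEMES[scheme]
    except KeyError:
        raise ValueError(f"unknown scheme: {scheme!r}") from None
    # Lower-case and trim both name parts before applying the scheme.
    local = fmt(firstname.strip().lower(), surname.strip().lower())
    return f"{local}@{domain}"
```

For example, build_email("John", "Doe", "f.surname", "google.com") returns "j.doe@google.com".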

Job Role Count

By default, Linky counts the occurrences of each job role and writes them to the HTML report. It also writes them to a standard JSON file, structured as follows:

{
  "Software Developer": 24,
  "Systems Developer": 14,
  "Senior Software Developer": 11,
  "Project Manager": 10,
  "System Developer": 9,
  "Cyber Security Consultant": 7,
  "Project Developer": 7,
  "Programme Manager": 6,
  "Software Architect": 6,
  "Development Manager": 6
}
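The file is a JSON object mapping each job title to its count, ordered from most to least common. Assuming the titles are held in a list, a minimal Python sketch of producing this layout could look like the following; job_role_json and its optional limit parameter are hypothetical, not part of Linky:

```python
import json
from collections import Counter


def job_role_json(titles, limit=None):
    """Serialise job-title counts as a JSON object, most common first."""
    # most_common(None) returns every title; dict() preserves that order.
    counts = dict(Counter(titles).most_common(limit))
    # indent=2 gives the two-space layout shown above.
    return json.dumps(counts, indent=2)
```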

Efficient usage

  1. Run once to gather the initial data:

python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname'

  2. Find the most common job roles:

    cat job_role_count.json | jq

  3. With the roles identified, use the keyword feature:

python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer
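Picking the keyword can also be scripted. A rough Python sketch, assuming job_role_count.json has the structure shown earlier: it adds up the counts of every title containing a given word, which hints at a good --keyword value. load_role_counts and keyword_weights are hypothetical helpers, not part of Linky.

```python
import json
from collections import Counter


def load_role_counts(path="job_role_count.json"):
    """Read the {title: count} mapping from the job role JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def keyword_weights(role_counts):
    """Sum the counts of every title that contains each (lower-cased) word."""
    weights = Counter()
    for title, count in role_counts.items():
        # set() so a word repeated within one title is only counted once.
        for word in set(title.lower().split()):
            weights[word] += count
    return weights
```

Applied to the example file above, keyword_weights(load_role_counts()).most_common(1) gives [("developer", 65)], which matches the --keyword developer choice in step 3.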

Only print a list of validated email addresses

The --valid-emails-only flag performs the same enumeration, but writes only the validated email addresses to a .txt file. It assumes O365 validation.

python3 linky.py --cookie cookie.txt --company-id 1441 --domain google.com --output google_employees --format 'firstname.surname' --keyword developer --valid-emails-only

This command creates a .txt file containing only the email addresses that were found to be valid via O365.

This is basically the TL;DR version of Linky.

Happy Stalking.
