Automated tool for scraping job postings into a CSV file.
JobFunnel requires Python 3.8 or later.
pip install git+https://github.com/PaulMcInnis/JobFunnel.git
By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets.
You can search for jobs with YAML configuration files or by passing command arguments.
Download the demo settings.yaml by running the command below:
wget https://git.io/JUWeP -O my_settings.yaml
It is recommended to provide as few search keywords as possible (i.e.
JobFunnel currently supports
Run funnel with your settings YAML to populate your master CSV file with jobs from available providers:
funnel load -s my_settings.yaml
Open the master CSV file and update the per-job
offer to reflect your progression on the job.
delete to remove a job from this search. You can review 'blocked' jobs within your
Writing your own Scrapers
If you have a job website you'd like to write a scraper for, you are welcome to implement it. Review the Base Scraper for implementation details.
Bypass a frustrating user experience looking for remote work by setting the search parameter
remoteness to match your desired level, i.e.
Adding Support for X Language / Job Website
JobFunnel supports scraping jobs from the same job website across locales & domains. If you are interested in adding support, you may only need to define session headers and domain strings. Review the Base Scraper for further implementation details.
Filter undesired companies by adding them to your
company_block_list in your YAML or pass them by command line as
Job Age Filter
You can configure the maximum age of scraped listings (in days) by configuring
Reviewing Jobs in Terminal
You can review the job list in the command line:
column -s, -t < master_list.csv | less -#2 -N -S
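Note that column -s, splits on every comma, including commas inside quoted fields such as job titles or locations. As an alternative, here is a minimal Python sketch that parses the file with the standard csv module and prints numbered, aligned rows. The function name format_table, the max_width parameter, and the sample columns are illustrative assumptions, not part of JobFunnel.

```python
import csv
import io


def format_table(csv_text: str, max_width: int = 30) -> str:
    """Render CSV text as aligned, numbered rows, truncating wide cells."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    # Truncate long cells (e.g. descriptions) to keep lines readable.
    rows = [[cell[:max_width] for cell in row] for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    lines = []
    for number, row in enumerate(rows, start=1):
        cells = "  ".join(cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(f"{number:>4}  {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample rows; a real master CSV will have its own columns.
    sample = 'status,title,company\nnew,"Engineer, Data",Example Corp\n'
    print(format_table(sample))
```

To use it on a real file, read the file contents (for example with open("master_list.csv", newline="").read()) and pipe the printed result into less -S.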
Respectfully scrape your job posts with our built-in delaying algorithms.
To better understand how to configure delaying, check out this Jupyter Notebook which breaks down the algorithm step by step with code and visualizations.
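For intuition only, the sketch below shows one generic way to space out requests: delays ramp linearly from a minimum to a maximum across a batch, with optional random jitter. This is not JobFunnel's actual algorithm, and the names linear_delays, min_delay, max_delay, and jitter are assumptions made for illustration; refer to the notebook for the real behaviour and its settings.

```python
import random
from typing import List, Optional


def linear_delays(
    n_requests: int,
    min_delay: float = 1.0,
    max_delay: float = 5.0,
    jitter: float = 0.0,
    seed: Optional[int] = None,
) -> List[float]:
    """Return one delay (in seconds) per request, ramping from min to max."""
    if n_requests <= 0:
        return []
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("require 0 <= min_delay <= max_delay")
    rng = random.Random(seed)
    delays = []
    for i in range(n_requests):
        # Position in the batch: 0.0 for the first request, 1.0 for the last.
        frac = i / (n_requests - 1) if n_requests > 1 else 0.0
        base = min_delay + frac * (max_delay - min_delay)
        # Optional random offset so requests are not evenly spaced.
        offset = rng.uniform(-jitter, jitter) if jitter else 0.0
        delays.append(max(0.0, base + offset))
    return delays


if __name__ == "__main__":
    for delay in linear_delays(5, min_delay=1.0, max_delay=3.0):
        print(f"wait {delay:.1f}s before the next request")
```

A scraper would call time.sleep(delay) before each request; keeping the delay computation separate from the sleeping makes the schedule easy to inspect or plot beforehand.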
Recovering Lost Data
JobFunnel can re-build your master CSV from your
cache_folder where all the historic scrape data is located:
Running by CLI
You can run JobFunnel from the CLI alone. Review the command structure via:
funnel inline -h
JobFunnel does not solve CAPTCHAs. If, while scraping, you receive an
Unable to extract jobs from initial search result page:\ error,
open that URL in your browser and solve the CAPTCHA manually.