How Elvis Works (At a Glance)
Procedure RunElvis()
Begin
  Read seed URLs from srv/urls.txt
  For each URL:
    Fetch job listings
    Extract company and location using sed/awk
  End For
  Deduplicate and validate results
  Write output to home/calllist.txt
  If --append-history is set:
    Append new companies to history
  End If
End
End Procedure
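The extract and deduplicate steps can be illustrated with sed and awk alone. The sketch below is hypothetical. It assumes a simplified listing markup with one span per line (a company span followed by a location span) and a "Company | Location" output row. Neither is taken from Elvis or from Seek's real pages, and the function name extract_calllist is illustrative.

```sh
#!/bin/sh
# Hypothetical sketch of the "extract, then deduplicate" steps.
# ASSUMPTION: listings use simplified markup with one span per line, e.g.
#   <span class="company">Acme Pty Ltd</span>
#   <span class="location">Sydney NSW</span>
# Real job boards use different, changing markup.

extract_calllist() {
    # sed: keep only company/location spans, tagging each value with C: or L:.
    sed -n \
        -e 's|.*<span class="company">\([^<]*\)</span>.*|C:\1|p' \
        -e 's|.*<span class="location">\([^<]*\)</span>.*|L:\1|p' |
    awk '
        # Remember the latest company and pair it with the next location.
        /^C:/ { company = substr($0, 3); next }
        /^L:/ && company != "" {
            row = company " | " substr($0, 3)
            # Print each row once, comparing case-insensitively.
            key = tolower(row)
            if (!(key in seen)) { seen[key] = 1; print row }
            company = ""
        }
    '
}
```

Duplicates are tracked inside a single awk process, so the output for all seed URLs should go through one call. For example: `while read -r url; do curl -s "$url"; done < srv/urls.txt | extract_calllist > home/calllist.txt`.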
flowchart TD
A[Start] --> B[Read seed URLs]
B --> C[Fetch job listings]
C --> D[Extract company/location]
D --> E[Deduplicate & validate]
E --> F[Write calllist.txt]
F --> G{Append history?}
G -- Yes --> H[Update company_history.txt]
G -- No --> I[Done]
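The Yes branch only needs to add companies that are not already recorded. This is a minimal sketch. It assumes call-list rows of the form "Company | Location" and a company_history.txt with one company name per line. append_history is a hypothetical helper name, and comparing companies case-insensitively is a design choice of this sketch.

```sh
#!/bin/sh
# Hypothetical sketch of the "Append history?" branch.
# ASSUMPTIONS: call-list rows look like "Company | Location", and the
# history file holds one company name per line.
append_history() {
    ah_calllist=$1
    ah_history=$2
    # Create an empty history file on the first run.
    [ -f "$ah_history" ] || : > "$ah_history" || return 1
    # FILENAME (not NR == FNR) tells the two files apart, so an empty
    # history file is handled correctly.
    awk -v hist="$ah_history" -F ' [|] ' '
        FILENAME == hist { known[tolower($0)] = 1; next }
        {
            key = tolower($1)
            if (key != "" && !(key in known)) { known[key] = 1; print $1 }
        }
    ' "$ah_history" "$ah_calllist" > "$ah_history.new" &&
        cat "$ah_history.new" >> "$ah_history"
    ah_status=$?
    rm -f "$ah_history.new"
    return "$ah_status"
}
```

Matching on FILENAME matters here. On a first run the history file is empty, and the common NR == FNR idiom would then also be true for the call list, so nothing would be appended.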
Pseudocode: Validating Output
Procedure ValidateCallList()
Begin
  If home/calllist.txt does not exist or is empty then
    Log error and exit
  End If
  For each row in home/calllist.txt:
    Check format and required fields
    If invalid, log error
  End For
  If all rows valid then
    Print "Validation successful"
  Else
    Print "Validation failed"
  End If
End
End Procedure
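The same procedure can be written in POSIX sh. This is a sketch under an assumed row format ("Company | Location", both fields non-empty) and is not the contents of lib/validate_calllist.sh. validate_calllist is a hypothetical function name. Unlike the pseudocode, it also returns a non-zero status on failure so a caller can stop the pipeline.

```sh
#!/bin/sh
# Hypothetical POSIX sh version of ValidateCallList.
# ASSUMPTION: each row must look like "Company | Location" with both
# fields non-empty; adjust the awk checks to the real call-list format.
validate_calllist() {
    vc_file=${1:-home/calllist.txt}

    # Missing or empty file: log an error and stop.
    if [ ! -s "$vc_file" ]; then
        echo "ERROR: $vc_file does not exist or is empty" >&2
        return 1
    fi

    # Check format and required fields on every row; awk logs each
    # invalid row to stderr and exits 1 if any row failed.
    if awk -F '[|]' '
        {
            company = $1; location = $2
            gsub(/^ +| +$/, "", company)
            gsub(/^ +| +$/, "", location)
            if (NF != 2 || company == "" || location == "") {
                print "ERROR: line " NR " is invalid: " $0 | "cat 1>&2"
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$vc_file"; then
        echo "Validation successful"
    else
        echo "Validation failed"
        return 1
    fi
}
```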
Mermaid: Elvis System Architecture (C4 Container Diagram)
C4Container
Person(user, "User", "Runs Elvis and reviews call lists")
System(elvis, "Elvis", "POSIX shell web scraper")
Container(bin, "bin/elvis.sh", "Shell Script", "Entrypoint orchestrator")
Container(dataInput, "lib/data_input.sh", "Shell Script", "Fetches and extracts job data")
Container(processor, "lib/processor.sh", "Shell Script", "Normalizes and deduplicates")
Container(validator, "lib/validate_calllist.sh", "Shell Script", "Validates output")
ContainerDb(output, "home/calllist.txt", "Text File", "Final call list output")
Rel(user, elvis, "Runs")
Rel(elvis, bin, "Orchestrates")
Rel(bin, dataInput, "Invokes")
Rel(dataInput, processor, "Sends extracted data")
Rel(processor, validator, "Sends processed data")
Rel(validator, output, "Writes validated call list")
Elvis is a POSIX shell-based web scraper that generates daily call lists of Australian companies from job boards (e.g., Seek). It is designed to be reliable, transparent, and easy to customize, and it relies only on POSIX utilities.
Onboarding: Choose Your Path
Start here! Use the flowchart below to find the onboarding path that fits your needs.
flowchart TD
A[Start Here] --> B{What do you want to do?}
B --> C[Just use Elvis to get call lists]
B --> D[Understand how Elvis works]
B --> E[Contribute code or docs]
C --> F[Non-Technical Onboarding]
D --> G[Technical Onboarding]
E --> H[Contributor Onboarding]
- Non-Technical Onboarding: Quick start for using Elvis.
- Technical Onboarding: Learn the architecture and internals.
- Contributor Onboarding: Start contributing code or docs.
See the Onboarding Guide for step-by-step help.
Glossary (Quick Reference)
Elvis Project Concepts (Mindmap)
mindmap
  root((Elvis))
    Usage
      "Call List"
      "Seed URL"
      "User Agent"
    Architecture
      "POSIX Shell"
      "Modular Scripts"
      "Config in etc/elvisrc"
    Compliance
      "robots.txt"
      "Ethical scraping"
    Processing
      "Deduplication"
      "Validation"
      "Parser"
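The "Config in etc/elvisrc" node suggests settings kept in a shell-readable file. What follows is a minimal sketch of loading such a file with defaults. It assumes elvisrc contains plain shell assignments. The names load_config, SEED_FILE, OUTPUT_FILE and USER_AGENT, and the default agent string, are hypothetical. The default paths are the ones this README mentions.

```sh
#!/bin/sh
# Hypothetical sketch: read settings from etc/elvisrc, then apply defaults.
# ASSUMPTIONS: elvisrc is a file of plain shell assignments, and the
# variable names SEED_FILE, OUTPUT_FILE and USER_AGENT are illustrative.
load_config() {
    lc_rc=${1:-etc/elvisrc}
    if [ -r "$lc_rc" ]; then
        # "." runs the file in the current shell, so it must be trusted.
        # Keep a "/" in the path: without one, "." searches PATH instead.
        . "$lc_rc"
    fi
    # Fill in defaults only for settings the rc file left unset or empty.
    : "${SEED_FILE:=srv/urls.txt}"
    : "${OUTPUT_FILE:=home/calllist.txt}"
    : "${USER_AGENT:=elvis}"
}
```

A value set in elvisrc (for example OUTPUT_FILE=/tmp/calllist.txt) takes precedence, and anything the file omits falls back to its default.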
- Call List: The output file with extracted job leads.
- Seed URL: A starting web address for scraping.
- Parser: A script that extracts information from web pages.
- Deduplication: Removing duplicate entries from results.
- POSIX Shell: The standard sh command language defined by POSIX and available on Unix-like systems.
- User Agent: A string that identifies the tool to websites.
- robots.txt: A file that tells crawlers and scrapers which parts of a site they may access.
- Compliance: Following legal and ethical scraping rules.
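To illustrate the robots.txt and Compliance entries, here is a deliberately simplified check. It is not Elvis's implementation and not a full parser for RFC 9309 (the Robots Exclusion Protocol). It reads only "User-agent: *" groups, treats Disallow values as plain path prefixes, and ignores Allow rules and wildcards. The function name robots_allows is hypothetical.

```sh
#!/bin/sh
# Simplified robots.txt check; NOT a complete RFC 9309 implementation.
# Simplifications: only "User-agent: *" groups are read, Disallow values
# are plain path prefixes, and Allow rules and * / $ wildcards are ignored.
# Returns 0 if the path looks allowed, 1 if a Disallow rule matches it.
robots_allows() {
    ra_robots=$1   # local copy of the site's robots.txt
    ra_path=$2     # URL path to check, e.g. /jobs
    awk -v url_path="$ra_path" '
        {
            sub(/\r$/, ""); sub(/#.*/, "")        # drop CR and comments
            colon = index($0, ":")
            if (colon == 0) next                  # skip blank or odd lines
            field = tolower(substr($0, 1, colon - 1))
            value = substr($0, colon + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", field)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
        }
        field == "user-agent" {
            # A User-agent line that follows rules starts a new group.
            if (in_rules) { applies = 0; in_rules = 0 }
            if (value == "*") applies = 1
            next
        }
        field == "allow" || field == "disallow" { in_rules = 1 }
        applies && field == "disallow" && value != "" &&
            index(url_path, value) == 1 { blocked = 1 }
        END { exit (blocked ? 1 : 0) }
    ' "$ra_robots"
}
```

A scraper could skip any seed URL whose path fails this check. It should also identify itself with a descriptive User Agent (with curl, the -A option sets it).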
See the full Glossary in the Wiki.
Add a screenshot or animated GIF at assets/demo.png showing a typical run or a sample of home/calllist.txt. Keep images small for mobile readability.