Sphider
a PHP spider and search engine
Sphider's roots go back to 2005, when Ando Saabas made the first release. Releases continued into 2009. Since then, apart from a single security fix in 2013, the original Sphider has gone unsupported. There have been a number of forks of the original code since that time. Sphider-Pro and Sphider-Plus are the most notable examples, but both are paid versions. In 2015, worldspaceflight.com began making improvements to the original Sphider and making the newer versions available without charge. The current version is 5.5.0 (Lite 2.6.0).
Sphider is made available without any warranty, although support is provided as best we can via the forum. Other users are invited to leave tips, make suggestions, or lend aid as they see fit.
Features
Operating system support
- Linux
- Windows
- Mac
PHP Support
- PHP 7+
PHP Extensions
- curl - Required
- iconv - Required
- imagick - Recommended (for full version)
- mbstring - Required
- mysqli - Required
- mysqlnd - Required
Database support
- MySQL (MySQLi/MySQLnd)
- MariaDB (MySQLi/MySQLnd)
Spidering and indexing
- Performs full text indexing
- Can index both static and dynamic pages
- Finds links in href, frame, area and meta tags, and can also follow links given in JavaScript as strings via window.location and window.open
- Respects the robots.txt protocol, as well as nofollow and noindex tags
- Follows server-side redirections
- Allows spidering to be limited by depth (i.e. maximum number of clicks from the starting page), by (sub)domain or by directory
- Allows spidering only the URLs matching (or not matching) certain keywords or regular expressions
- Supports indexing of pdf, doc, docx, odt, ppt, and xls files (using external binaries for file conversion)
- Ability to exclude common words from being indexed (multiple languages supported)
- Word stemming for English and selected other languages (searching for "run" finds "running", "runs", etc.)
- Can create sitemaps
- Can index images (except SphiderLite)
- Can index RSS feeds (except SphiderLite)
- Can create page link reports (except SphiderLite)
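
As an illustration of how the link-following rules above fit together, here is a minimal sketch in Python. It is not Sphider's own code (Sphider is written in PHP), and the names `LinkExtractor` and `allowed_links` are hypothetical. It assumes a spider that collects links from href and frame attributes, stops at a maximum depth, stays on the starting host, and honours robots.txt and the nofollow meta tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect link targets from href (a, area) and src (frame, iframe)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # A robots meta tag containing "nofollow" blocks all links on the page.
            if "nofollow" in (attrs.get("content") or "").lower():
                self.nofollow = True
        elif tag in ("a", "area") and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.links.append(urljoin(self.base_url, attrs["src"]))


def allowed_links(html, page_url, depth, max_depth, robots):
    """Return the links worth queueing from one page (hypothetical helper)."""
    if depth >= max_depth:
        return []  # depth limit reached: do not follow anything further
    parser = LinkExtractor(page_url)
    parser.feed(html)
    if parser.nofollow:
        return []
    host = urlparse(page_url).netloc
    return [
        link for link in parser.links
        if urlparse(link).netloc == host and robots.can_fetch("*", link)
    ]


# robots.txt rules supplied inline so the example needs no network access.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

page = (
    '<a href="/about.html">About</a>'
    '<a href="/private/x.html">Private</a>'
    '<a href="http://other.example/">Elsewhere</a>'
)
print(allowed_links(page, "http://example.com/index.html", 0, 2, robots))
# prints ['http://example.com/about.html']
```

A real spider would also fetch the pages, follow redirects, and apply the keyword or regular-expression URL filters; those steps are left out here to keep the sketch short.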
Searching
- Supports AND, OR and phrase searches
- Supports excluding words (by putting a '-' in front of a word, any page including the word will be omitted from the results)
- Supports wildcard (*) searches
- Option to add and group sites into categories
- Possibility to limit searching to a given category and its subcategories
- Possibility of searching in a specified domain only
- "Did you mean" search suggestion on mistyped queries
- Context-sensitive auto-completion on search terms (a la Google Suggest)
- Can search for images (except SphiderLite)
- Can search RSS feeds (except SphiderLite)
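
The query features listed above (AND/OR matching, quoted phrases, '-' exclusion and '*' wildcards) can be sketched as follows. This is an illustrative Python sketch, not Sphider's implementation. The function names `parse_query`, `term_matches` and `matches`, the `mode` parameter, and the use of double quotes to mark a phrase are assumptions made for the example.

```python
import re

# A token is either a "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def parse_query(query):
    """Split a query into phrases, plain or wildcard terms, and excluded terms."""
    parsed = {"phrases": [], "terms": [], "excluded": []}
    for phrase, word in TOKEN.findall(query.lower()):
        if phrase:
            parsed["phrases"].append(phrase)
        elif word.startswith("-") and len(word) > 1:
            parsed["excluded"].append(word[1:])
        else:
            parsed["terms"].append(word)
    return parsed


def term_matches(term, words):
    """True if the term, where '*' stands for any characters, matches a word."""
    pattern = re.compile(".*".join(re.escape(part) for part in term.split("*")))
    return any(pattern.fullmatch(word) for word in words)


def matches(text, query, mode="and"):
    """Decide whether one page's text satisfies the query."""
    text = text.lower()
    words = re.findall(r"\w+", text)
    parsed = parse_query(query)
    # Any excluded word removes the page from the results outright.
    if any(term_matches(term, words) for term in parsed["excluded"]):
        return False
    # Phrases are checked as plain substrings of the page text.
    checks = [phrase in text for phrase in parsed["phrases"]]
    checks += [term_matches(term, words) for term in parsed["terms"]]
    if not checks:
        return False
    return all(checks) if mode == "and" else any(checks)


page = "The rover landed on Mars"
print(matches(page, "mars rov*"))             # wildcard term
print(matches(page, "mars -rover"))           # excluded word
print(matches(page, '"landed on" venus', mode="or"))
```

A production search engine would run such a query against a keyword index in the database rather than scanning page text, so this only shows the matching rules, not how results are retrieved or ranked.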
Administering
- Includes a sophisticated web based administration interface
- Supports indexing via a web interface as well as from the command line - easy to set up cron jobs
- Comprehensive site and search statistics
- Simple template system - easy to integrate into a site