Milestone Plan

This is, as always, subject to change.

Milestone          Release date     Description
0.1.0 Cypher       March 3, 2010    First running release of the ex-crawler server
0.1.5 Agent Brown  May 12, 2010     Image crawler, plugin framework, multiple databases, scheduler
0.2.0 Agent Fox    Q3 2010          Crawling of PDF, Word and other office documents; re-crawling of indexed pages and images by a worker on a time scheduler
0.3.0 Morpheus     Q4 2010          First beta release! FTP and some other protocols; first release intended for production use
0.4.0 Tank         Q1 2011          Database and server load balancing and coordination; BigTable, more MapReduce
1.0.0 Trinity      2011             First stable release!
1.1.0 Oracle       2011             Own web server for configuration and monitoring


Detailed roadmap:

Ex-crawler server 0.1.7

# focus on server and server protocol
# server security
+ working (basic!) interface for distributed crawling based on JADIF
+ server plugin interface

Ex-crawler server 0.1.8

# focus on re-crawling archived pages and images
(depending on configuration settings, per-website settings, and page priority/relevance)
# basic ExRank based on ~50 criteria, such as link count, domain length, text quality and uniqueness

Ex-crawler server 0.1.9

# focus on DOC, DOCX, XLS and XML
+ CSS crawler and downloader (basic)
+ CSS crawling plugin interface

