Welcome to DEiXTo!
DEiXTo (or ΔEiXTo) is a powerful web data extraction tool that is based on the W3C Document Object Model (DOM). It allows users to create highly accurate "extraction rules" (wrappers) that describe what pieces of data to scrape from a website. DEiXTo consists of three separate components:
- GUI DEiXTo, an MS Windows™ application implementing a friendly graphical user interface that is used to manage extraction rules (build, test, fine-tune, save and modify).
- DEiXToBot, a Perl module implementing a flexible and efficient Mechanize agent (essentially a browser emulator) capable of extracting data of interest using patterns generated with GUI DEiXTo. It builds on best-of-breed Perl technology and allows extensive customization, and thus facilitates tailor-made solutions.
- Command Line Executor, a stand-alone, DEiXToBot-based, cross-platform utility that can apply an extraction rule in bulk to multiple target pages and produce structured output in a variety of formats.
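To make the idea of a DOM-based extraction rule applied to several pages more concrete, here is a minimal sketch in Python. It is not DEiXTo's wrapper format and does not use DEiXToBot: the `RULE` dictionary, the `apply_rule` and `run` helpers and the sample pages are all hypothetical, and the sketch assumes well-formed (XHTML-like) markup so that the standard library's `xml.etree.ElementTree` can parse it.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical rule, NOT DEiXTo's actual wrapper format: one path that
# selects each record node, plus one relative path per field.
RULE = {
    "record": ".//div[@class='item']",
    "fields": {"title": "h2", "price": "span[@class='price']"},
}

# Hypothetical, well-formed sample pages standing in for downloaded targets.
PAGES = [
    "<html><body><div class='item'><h2>Book A</h2>"
    "<span class='price'>10</span></div></body></html>",
    "<html><body><div class='item'><h2>Book B</h2></div>"
    "<div class='ad'><h2>Ad</h2></div></body></html>",
]


def apply_rule(rule, markup):
    """Return one dict per DOM node matched by the rule's record path."""
    root = ET.fromstring(markup)
    records = []
    for node in root.findall(rule["record"]):
        record = {}
        for name, path in rule["fields"].items():
            match = node.find(path)
            has_text = match is not None and match.text
            # A field missing from the page is reported as None.
            record[name] = match.text.strip() if has_text else None
        records.append(record)
    return records


def run(rule, pages):
    """Apply the same rule to every page and collect all records."""
    return [rec for page in pages for rec in apply_rule(rule, page)]


if __name__ == "__main__":
    # Structured output; JSON is just one possible format.
    print(json.dumps(run(RULE, PAGES), indent=2))
```

The rule is kept as plain data, separate from the code that applies it, which mirrors the split described above between building a rule in one component and executing it in another.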
DEiXTo can cope with a wide range of websites with high precision and recall. It provides the user with an arsenal of features aimed at the construction of well-engineered extraction rules. Wrappers built with GUI DEiXTo can be scheduled to run automatically, providing automated access to resources of interest and saving users a lot of time, energy and repetitive effort.
We recently launched DEiXTo's Blog to keep you apprised of its progress and to discuss interesting use cases and topics around web scraping. Moreover, the DEiXTo Wiki is now available on Wikispaces, and you are very welcome to join our brand new DEiXTo Discussion Area! Please feel free to post any questions and comments and help us build a collaborative help documentation system!
You can check out some happy DEiXTo users here. To our knowledge, DEiXTo is used by many individuals as well as some companies and organizations all over the world. As an indication, in the last 12 months DEiXTo's website had about 7,300 visits from 108 different countries!
DEiXTo is developed by Kostas Ntonas and Fotis Kokkoras.
Latest News
- We are glad to host a Greek Stemmer
- DEiXTo powers a federated, open source code search engine called OCEAN.
- Spatial data collected with DEiXTo and displayed on Google Maps. Nice app, don't you think? Check the details here.
- Still confused with DEiXTo components? Try this: DEiXTo Components Clarified!
- DEiXTo feeds smartphones with legacy data!
- Cooperating DEiXTo agents.
- MOpiS - Multiple Opinion Summarizer Demo
- DEiXTo was cited at the Symposium "Europeana in Greece", which took place on 19 October 2010 in Athens, Greece, as well as at the 19th Hellenic Academic Libraries Conference (3-5 November 2010, Athens). The reason is that we have helped some important Greek digital libraries (by scraping their websites and repurposing their data) to add their cultural content to the Hellenic Aggregator and the European Digital Library!
ΔEiXTo is an acronym for Data Extraction Tool.
First of all, Δ is the equivalent of D in Greek. Now, you are probably wondering what this “i” character is all about. Well, in Greek “ΔEIXTO” (pron. dechto) is the imperative form of “point at”, which is what the DEiXTo user does inside a browser window when starting to build a DEiXTo extraction rule.
Now you know... ;-)
