Friday, July 06, 2007

Talend - another open source ETL tool

Bloor's Philip Howard writes at IT-Director about Talend Open Studio - unusual among ETL solutions in that it is a code generator (of Java, SQL, Perl) rather than an "engine" type of product.

As well as the usual drag-n-drop transformation GUI, this apparently supports business process modelling - which gives Talend a feature that many "real" ETL/EAI tools don't have. There's also support for using a server grid to parallelise processing, and there's an "on demand" SaaS offering.

Version 2.0 is now available, as well as a 2.1 release candidate which is said to add features including:

  • further optimizations for performance increase
  • support of new databases (including bulk load)
  • transaction management (connection sharing, commit and rollback)
  • Slowly Changing Dimensions support
  • MOM (Message Oriented Middleware) support for real-time integration jobs
  • fuzzy logic data matching (using the Levenshtein and metaphone algorithms)
  • normalization, denormalization and flow merge
  • support of SSH remote connections
  • support of PGP file decryption (through GPG binary)
  • reinforced support of the XML standard: DTD validation, XSD, XSLT transformation, significant improvements in hierarchical XML file generation, support of the XMLRPC Web Services protocol…
  • improvements to the tMap component, to support input filters and new join types (such as the cartesian product, first match, last match…)
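For anyone wondering what the Levenshtein-based fuzzy matching in that list amounts to, here is a minimal sketch in Python. It is not Talend's implementation or generated code, just the textbook edit-distance algorithm; the `best_match` helper, its `max_distance` threshold and the sample names are my own assumptions for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def best_match(value, candidates, max_distance=2):
    """Hypothetical helper: return the closest candidate (case-insensitive),
    or None if nothing is within max_distance edits."""
    if not candidates:
        return None
    distance, candidate = min(
        (levenshtein(value.lower(), c.lower()), c) for c in candidates
    )
    return candidate if distance <= max_distance else None


reference = ["John Smith", "Jane Smyth"]
print(best_match("Jonh Smith", reference))
print(best_match("Zebra", reference))
```

The threshold is the part that needs tuning in practice: too low and typos slip through unmatched, too high and unrelated records get merged.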

Sounds like it might be worth a closer look...

1 comment:

Garrett said...

Talend is a great tool!

My company uses it for our new ETL. It makes the simple things simple...

It can be daunting to learn how to use it; I think it took me a week or three to really get the hang of things.

It's not the best tool in the box for every ETL problem, but we've found that with a mix of "pure" Perl and Talend jobs we can get things done much faster than if they were all done by hand.