
Topic: datacleaner plugin

Topic by
tech2sh

2011-01-05
01:28

Hi,
Is there any DataCleaner plugin available for Pentaho Kettle PDI?

-Thx

Reply by
kasper

2011-01-05
10:11
Hi tech2sh,

Currently no, there isn't a plugin. But we have recently been in contact with Pentaho and have been discussing the possibility of making one.

So let me answer your question with a question: Which features would you like to see in a DataCleaner plugin for Kettle?

Reply by
tech2sh

2011-01-05
17:48
Thanks Kasper,
The flow I was thinking of:
1) Apply a DataCleaner data profile to the input source, from within Kettle.
2) Based on the data profile output, apply the appropriate conditions/data rules in a Kettle transformation.
3) At the end, once the data has been transformed successfully, apply the DataCleaner validate function to make sure the data is clean and the data rules from step 2 were applied successfully.
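
For illustration, the three steps could be sketched roughly as follows in plain Python. This does not use any DataCleaner or Kettle API; the functions `profile`, `apply_rules` and `validate`, the blank-value check, and the `UNKNOWN` default are all made up for the example.

```python
from collections import Counter


def is_blank(value):
    """Treat None and whitespace-only strings as blank (an assumption for this sketch)."""
    return value is None or not str(value).strip()


def profile(rows, column):
    """Step 1: a toy profile that counts blank vs. filled values in one column."""
    counts = Counter()
    for row in rows:
        counts["blank" if is_blank(row.get(column)) else "filled"] += 1
    return counts


def apply_rules(rows, column, profile_result, default="UNKNOWN"):
    """Step 2: choose a rule from the profile: fill blanks only if the profile found any."""
    if profile_result["blank"] == 0:
        return [dict(row) for row in rows]
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        if is_blank(new_row.get(column)):
            new_row[column] = default
        cleaned.append(new_row)
    return cleaned


def validate(rows, column):
    """Step 3: re-profile the output and confirm that no blanks remain."""
    return profile(rows, column)["blank"] == 0


rows = [{"name": "John Smith"}, {"name": ""}, {"name": None}]
before = profile(rows, "name")
cleaned = apply_rules(rows, "name", before)
ok = validate(cleaned, "name")
```

The point is only that step 2 reads the result of step 1, and step 3 re-runs the same profile on the transformed rows.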

These are just my thoughts.

-Thx

Reply by
kasper

2011-01-06
08:11
Well, it sounds interesting and I like your suggestions. My only concern is that I have a hard time figuring out how to chain something together after profiling in a batch environment like Kettle (or basically any ETL tool). Maybe I'm just being stupid, so please tell me what's right :)

Here's an example:

I want to apply the Pattern Finder to my column which contains names. Let's say that after a profile my results are like this:

Pattern          | Count
Aaaaa Aaaaaaa    | 3210
Aaaa aaa Aaaaaaa | 1234
AAAAAA AAAAAAAAA | 32
Aaaaaaaaaa       | 6
??????           | 2

How should this information then be used in Kettle? Should Kettle have a "match pattern(s)" transformation and use it to filter out certain rows? That is my best guess. But it would still be a two-step process, because you need to run the profiling first before you have the patterns that you want to apply.
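
To make the two-step idea concrete, here is a minimal sketch in plain Python. It is not DataCleaner's Pattern Finder and not a Kettle step: `to_pattern` is a simplified per-character stand-in (uppercase letter to `A`, lowercase letter to `a`, digit to `9`), and `profile_patterns`, `accepted_patterns`, `split_rows` and the `min_count` threshold are hypothetical names chosen for the example.

```python
from collections import Counter


def to_pattern(value):
    """Map each character to a pattern symbol; other characters are kept as-is."""
    symbols = []
    for ch in value:
        if ch.isalpha():
            symbols.append("A" if ch.isupper() else "a")
        elif ch.isdigit():
            symbols.append("9")
        else:
            symbols.append(ch)
    return "".join(symbols)


def profile_patterns(values):
    """Pass 1: count how often each pattern occurs (the profiling run)."""
    return Counter(to_pattern(v) for v in values)


def accepted_patterns(profile, min_count):
    """Between the passes: keep only patterns seen at least min_count times."""
    return {pattern for pattern, count in profile.items() if count >= min_count}


def split_rows(values, accepted):
    """Pass 2: route each value by whether its pattern is in the accepted set."""
    matched, rejected = [], []
    for v in values:
        (matched if to_pattern(v) in accepted else rejected).append(v)
    return matched, rejected


names = ["John Smith", "Mary Jones", "JOHN SMITH", "Madonna"]
profile = profile_patterns(names)
accepted = accepted_patterns(profile, min_count=2)
matched, rejected = split_rows(names, accepted)
```

The second pass cannot start until the first has seen all rows, which is exactly the two-step issue: the accepted pattern set would have to be produced by a separate profiling run (or chosen by a person) and then handed to the filtering step as configuration.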
