Topic: datacleaner plugin
Hi,
Is there any DataCleaner plugin available for Pentaho Kettle PDI?
-Thx
Hi tech2sh,
Currently no, there isn't any plugin. BUT we have actually recently been in contact with Pentaho and have been discussing the possibility of making a plugin.
So let me answer your question with a question: Which features would you like to see in a DataCleaner plugin for Kettle?
Thanks Kasper,
The flow I was thinking of:
1) Apply a DataCleaner data profile to the input source from within Kettle.
2) Based on the data profile output, apply appropriate conditions/data rules in a Kettle transformation.
3) At the end, once the data has been transformed successfully, apply a DataCleaner validate function to make sure the data is clean and the data rules from step 2 were applied successfully.
These are just my thoughts.
-Thx
Well, it sounds interesting and I like your suggestions. My only concern is that I have a hard time figuring out how to chain something after profiling in a batch environment like Kettle (or basically any ETL tool). Maybe I'm just being stupid, so please tell me what's right :)
Here's an example:
I want to apply the Pattern Finder to my column which contains names. Let's say that after a profile my results are like this:
Pattern | Count
Aaaaa Aaaaaaa | 3210
Aaaa aaa Aaaaaaa | 1234
AAAAAA AAAAAAAAA | 32
Aaaaaaaaaa | 6
?????? | 2
How should this information then be used in Kettle? Should Kettle have a "match pattern(s)" transformation and use it to filter out certain rows? That is my best bet. But this would still be a two-step process, because you need to perform the profiling first before you have the patterns that you want to apply.
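To make the two-step idea concrete, here is a rough sketch in plain Python. It is not DataCleaner's Pattern Finder and not a Kettle step; the function names (`to_pattern`, `profile_patterns`, `accepted_patterns`, `split_rows`) and the `min_share` threshold are made up for illustration. As a simplifying assumption, runs of the same symbol are collapsed, so "John Smith" and "Jane Doe" both become "Aa Aa", which differs from the length-aware patterns in the table above.

```python
from collections import Counter
from itertools import groupby


def to_pattern(value: str) -> str:
    """Map characters to class symbols (A upper, a lower, 9 digit, space, ? anything else)
    and collapse runs of the same symbol. Simplified assumption, not DataCleaner's algorithm."""
    symbols = []
    for ch in value:
        if ch.isupper():
            symbols.append("A")
        elif ch.islower():
            symbols.append("a")
        elif ch.isdigit():
            symbols.append("9")
        elif ch.isspace():
            symbols.append(" ")
        else:
            symbols.append("?")
    # groupby yields one key per run of identical symbols
    return "".join(symbol for symbol, _ in groupby(symbols))


def profile_patterns(values):
    """Step 1 (profiling): count how often each pattern occurs."""
    return Counter(to_pattern(v) for v in values)


def accepted_patterns(profile, min_share):
    """Keep the patterns that cover at least min_share of all profiled values."""
    total = sum(profile.values())
    return {p for p, count in profile.items() if count / total >= min_share}


def split_rows(values, accepted):
    """Step 2 (the hypothetical 'match pattern(s)' filter): route rows by pattern."""
    passed, rejected = [], []
    for v in values:
        (passed if to_pattern(v) in accepted else rejected).append(v)
    return passed, rejected


if __name__ == "__main__":
    names = ["John Smith", "Jane Doe", "Ann Lee", "BOB JONES", "??"]
    profile = profile_patterns(names)
    accepted = accepted_patterns(profile, min_share=0.5)
    passed, rejected = split_rows(names, accepted)
    print(profile, passed, rejected)
```

The point of the sketch is that the set of accepted patterns is an output of the first pass and an input to the second, so in a batch tool the profile result would have to be stored (or reviewed by a person) between the two runs.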


