Topic: Automatically formatting/transforming invalid entries

Topic by
amdc

2011-08-03
05:23

Hello Community :), I have 2 questions

1. Is there any way to edit invalid/false entries in the "Preview of transformation" window?

2. Can DataCleaner transform entries into their correct format (in any way possible) after they have been detected as false by a string pattern? How? :)

Reply by
kasper

2011-08-03
05:24
Hi amdc,

To your questions:

1) no
2) yes

More specifically, DC in its current form only works with batches of data and it treats the source systems as read-only, so direct editing is not possible.

But what you can do is this: You can set up filters that identify your invalid cases (e.g. a string pattern match filter) and then apply transformers only to the failing records by binding your transformer to that filter outcome (use the "no filter outcome" button in the transformer tab).

I hope that it makes sense. You might also wanna check out the webcasts on the media page. Specifically the first webcast 'Introduction to analyzing, cleansing and filtering data' shows a somewhat similar scenario where transformations are applied to certain filter conditions.
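
The filter-plus-transformer setup described in this reply can be pictured roughly as follows. This is only a conceptual sketch in plain JavaScript, not DataCleaner code: the function names `patternFilter`, `cleanTransformer` and `processBatch`, the four-digit pattern and the "strip non-digits" repair are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the flow, not DataCleaner internals.
// A "filter" labels each record VALID or INVALID; the "transformer"
// is bound to the INVALID outcome and only runs on those records.

// Assumed pattern for illustration: exactly four digits.
const FOUR_DIGITS = /^\d{4}$/;

function patternFilter(value) {
  const ok = typeof value === "string" && FOUR_DIGITS.test(value);
  return ok ? "VALID" : "INVALID";
}

function cleanTransformer(value) {
  // Example repair (an assumption): drop every non-digit character.
  return String(value == null ? "" : value).replace(/\D/g, "");
}

function processBatch(values) {
  // map() builds a new array, so the source data is left untouched,
  // mirroring the read-only treatment of the source system.
  return values.map((v) =>
    patternFilter(v) === "INVALID" ? cleanTransformer(v) : v
  );
}

console.log(processBatch(["1234", "43-21", " 3214", "5553"]));
```

Records that already match the pattern pass through unchanged; only the failing ones are rewritten.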

Reply by
amdc

2011-08-03
05:23
Hi Kasper,

I have a follow-up question regarding this scenario. I am currently working on an xls file, and I'm wondering why, after cleaning a certain column and writing the result to CSV or a datastore, the values are no longer in their original places. For example, let's say Column1 originally has data in this order (row1-5): 1234, 4321, 3214, 5553; now changes made to column2 (row1-5) became 1234, 5553, 4321, 3214. I can't tell if it's just random, but only the first value stays the same. Any ideas? If possible, I want the output to show the data in the order it was originally in.



Reply by
amdc

2011-08-03
05:24
typo "now changes made to column2"

should be "now changes made to column1"

Reply by
kasper

2011-08-04
05:20
Hi amdc,

It is true that DC does not guarantee sort order based on original input. This is because by dropping this requirement we can parallelize a lot of work in the DC engine, making it quite a lot faster than anything out there :) But of course it comes at a price - the result is not ordered as the input.

Internally at HI we have some mechanisms to sort the end-result, but it's not released yet. Instead you can apply a workaround, but that will unfortunately be at the cost of performance (don't know if that's an issue - depends on how big your data is)...

In the file conf.xml you can switch parallel execution off, by replacing this element:
<multithreaded-taskrunner max-threads="30" />
with this element:
<singlethreaded-taskrunner />
Enjoy!
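
To illustrate why parallel processing loses the input order, and one way to get it back outside DataCleaner, here is a small sketch in plain JavaScript. It is not a DataCleaner feature and not the unreleased sorting mechanism mentioned above. It assumes you can carry a row number with each record yourself; `tagWithIndex`, `simulateParallel` and `restoreOrder` are hypothetical helpers, and the two "workers" are simulated deterministically.

```javascript
// Hypothetical sketch: keep the original row number with each record
// so that the order can be restored after unordered processing.

function tagWithIndex(rows) {
  return rows.map((value, index) => ({ index, value }));
}

// Stand-in for a parallel engine. Rows are dealt alternately to two
// workers and the workers' outputs are simply concatenated, so the
// combined result no longer follows the input order.
function simulateParallel(tagged, transform) {
  const workerA = [];
  const workerB = [];
  tagged.forEach((row, i) => {
    const out = { index: row.index, value: transform(row.value) };
    (i % 2 === 0 ? workerA : workerB).push(out);
  });
  return workerA.concat(workerB);
}

// Sort a copy by the carried row number and drop the tag again.
function restoreOrder(results) {
  return results
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((r) => r.value);
}

const input = ["1234", "4321", "3214", "5553"];
const unordered = simulateParallel(tagWithIndex(input), (v) => v.trim());
console.log(unordered.map((r) => r.value)); // order differs from input
console.log(restoreOrder(unordered));       // original order again
```

Sorting afterwards costs an extra pass over the data, but it keeps the processing step itself free to run in parallel.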

Reply by
amdc

2011-08-04
05:20
Hi Kasper,

I really appreciate the replies :) Pardon me, because here's another one...

Can you point me to a website tutorial for learning some simple JavaScript to use in the DC "JavaScript" transformer?

I really have no knowledge of JavaScript and I just need to learn it fast to get some results.

With regards to your previous advice about binding a transformer to the filter outcome, this is what I'm trying to work on right now :) but unfortunately I can't quite get it right.

Here is a more descriptive overview of the issue:

I want to replace the invalid or null entries with a simple text line like "Incomplete, pending bla bla...". Currently I have a String pattern matcher as a filter to give me my invalid entries, and now I am thinking that the right transformer for producing a custom text line is the "JavaScript" transformer?

Reply by
kasper

2011-08-04
05:20
Hi amdc,

Actually I have it on my TODO list to make a short webcast demo (and some documentation) of the JS transformer, but ... yea ... it's on the list :)

For now, I would recommend w3schools:
http://www.w3schools.com/jsref/default.asp

The pages in the menu under "JavaScript objects" are all relevant I guess, depending on your data types. The browser and DOM objects are irrelevant for DataCleaner.
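
For the use case asked about above (replacing invalid or null entries with a fixed text line), the core logic might look something like the sketch below. It is plain JavaScript and makes several assumptions: the placeholder text, the four-digit pattern and the function name `replaceInvalid` are made up for illustration, and how the transformer hands the column value to the script should be checked against the DataCleaner documentation.

```javascript
// Assumed values for illustration only.
const PLACEHOLDER = "Incomplete, pending review";
const VALID_PATTERN = /^\d{4}$/; // e.g. exactly four digits

// Return the trimmed value if it matches the pattern, otherwise the
// placeholder. Null and undefined are treated as invalid.
function replaceInvalid(value) {
  if (value === null || value === undefined) {
    return PLACEHOLDER;
  }
  const text = String(value).trim();
  return VALID_PATTERN.test(text) ? text : PLACEHOLDER;
}

console.log(["1234", null, "abc", " 5553 "].map(replaceInvalid));
```

If the transformer is already bound to the "invalid" outcome of a filter, the pattern check inside the script becomes redundant and the script can simply return the placeholder text.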

Reply by
amdc

2011-08-04
05:20
Thanks Kasper for the quick reply :) Btw, just a quick follow-up: is it only the JavaScript transformer that can produce these custom text lines? TY TY

Reply by
kasper

2011-08-04
05:20
There are also a few options in the "String manipulation" category of transformers, but those are limited to single operations like concatenation.

And then there is also the EL transformer, which can be used for concatenations like this:

Dear #{FIRST_NAME} #{LAST_NAME}
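
To show what such an expression is expected to produce, here is a rough JavaScript imitation of the `#{...}` substitution. It is not how the EL transformer is implemented; `fillTemplate` and the sample record are hypothetical, and the sketch only handles plain column names, not full expressions.

```javascript
// Replace each #{NAME} with the matching field of the record.
// Unknown names are left in place so that mistakes stay visible.
function fillTemplate(template, record) {
  return template.replace(/#\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(record, name)
      ? String(record[name])
      : match
  );
}

const record = { FIRST_NAME: "Jane", LAST_NAME: "Doe" };
console.log(fillTemplate("Dear #{FIRST_NAME} #{LAST_NAME}", record));
// prints "Dear Jane Doe"
```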
