Topic: Uber noob needs help with a data quality application

Topic by
amdc

2011-08-01
05:25

Hi all,

I am currently testing DataCleaner 2.2 with an .xls file. Unfortunately I am a newbie at this, and even after reviewing the tutorials for the previous version, DataCleaner 1.5, I'm still somewhat at a loss. Would it be OK to discuss what I'm trying to work out in particular? Or, if someone could point me to more documentation on the full functionality and how-tos of DataCleaner 2.2, I would really appreciate it.

Reply by
kasper

2011-08-02
05:27
Hi amdc,

You don't need to ask permission to discuss what you want to do; just write away :) This forum is here for exactly that.

Please describe what issues you are facing and what you would like to do, and let's see who catches the bait :)

Reply by
amdc

2011-08-02
05:35
Thank you for the kind words. I know DataCleaner has the ability to transform data; I just don't know how to use it properly.

So here it is: I want to change my dates from one format to another, and I can see that there is a transformer, "Matching and Standardization -> Date Mask Matcher". I tried choosing a particular date mask, but when I preview the transformation, the resulting table gives me "False" values, and obviously none of the dates were transformed to another date mask/format.

Am I doing it all wrong? :(

Reply by
kasper

2011-08-02
05:27
Hi amdc,

The "Date Mask Matcher" is intended as a way to check whether certain date mask(s) are being used; it does not actually change the value. As you can see in the output columns of this transformer, the outputs are boolean values! You can use the transformer in combination with the Boolean analyzer to get a matrix of which date masks occur.
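To make the distinction concrete, here is a minimal plain-JavaScript sketch of what a mask matcher does conceptually: for each value it reports true or false per mask and never alters the value, and tallying those booleans gives roughly the kind of matrix the Boolean analyzer shows. The mask syntax (only yyyy, MM and dd), the regular expressions and the function names `maskToRegex`, `matchMasks` and `tallyMasks` are hypothetical illustrations, not DataCleaner's API; this sketch also only checks the shape of a value, not whether the day and month are in range.

```javascript
// Hypothetical sketch: translate a simple date mask into a regular expression.
// Only the tokens yyyy, MM and dd are supported; this is NOT DataCleaner's parser.
function maskToRegex(mask) {
  // Escape regex metacharacters in the literal parts (e.g. ".").
  const escaped = mask.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace("yyyy", "\\d{4}")
    .replace("MM", "\\d{2}")
    .replace("dd", "\\d{2}");
  return new RegExp("^" + pattern + "$");
}

// Return one boolean per mask, like the matcher's output columns.
// The input value itself is never changed.
function matchMasks(value, masks) {
  return masks.map((mask) => maskToRegex(mask).test(value));
}

// Count how many values match each mask (a simple stand-in for the
// matrix you would get from the Boolean analyzer).
function tallyMasks(values, masks) {
  const counts = Object.fromEntries(masks.map((m) => [m, 0]));
  for (const value of values) {
    matchMasks(value, masks).forEach((matched, i) => {
      if (matched) counts[masks[i]] += 1;
    });
  }
  return counts;
}

const masks = ["yyyy-MM-dd", "dd/MM/yyyy"];
const values = ["2011-08-01", "01/08/2011", "Aug 1 2011"];
console.log(matchMasks("01/08/2011", masks)); // [ false, true ]
console.log(tallyMasks(values, masks));       // { 'yyyy-MM-dd': 1, 'dd/MM/yyyy': 1 }
```

A value like "Aug 1 2011" matches no mask at all, which is the same situation as getting only "False" values in the preview.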

If you want to transform to a new date mask, you need to start by converting to "real" dates (i.e. not just formatted strings). Use the "Convert to date" transformer for this. Then you can extract certain parts of the date, either with the "Extract date part" transformer or by referencing the date in the JavaScript transformer, like this:

// "my_date" stands for the date column created by the "Convert to date" transformer
return my_date.year + "-" + my_date.month;
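Outside DataCleaner, the same two-step idea (first parse the string into a real date, then build the new format from its parts) can be sketched in plain JavaScript. The input mask dd/MM/yyyy, the output format yyyy-MM and the function names `parseDayMonthYear` and `toYearMonth` are assumptions for illustration only. Note that in plain JavaScript `getUTCMonth()` is zero-based and needs `+ 1`; this sketch has not been checked against how DataCleaner's JavaScript transformer exposes date columns.

```javascript
// Hypothetical helper: parse a "dd/MM/yyyy" string into a real Date object.
// Returns null when the string does not fit the mask or is not a valid date.
function parseDayMonthYear(text) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!m) return null;
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  // Use UTC so the result does not depend on the machine's time zone.
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject impossible dates such as 31/02/2011, which Date would roll over.
  if (
    date.getUTCFullYear() !== year ||
    date.getUTCMonth() !== month - 1 ||
    date.getUTCDate() !== day
  ) {
    return null;
  }
  return date;
}

// Build the new representation from the date's parts, zero-padding the month.
function toYearMonth(date) {
  const month = String(date.getUTCMonth() + 1).padStart(2, "0"); // months are 0-based
  return date.getUTCFullYear() + "-" + month;
}

for (const raw of ["01/08/2011", "24/01/2012", "31/02/2011"]) {
  const parsed = parseDayMonthYear(raw);
  console.log(raw, "->", parsed ? toYearMonth(parsed) : "(not a valid dd/MM/yyyy date)");
}
```

Keeping parsing and formatting as separate steps mirrors the "Convert to date" then "Extract date part" split: once a value is a real date, any output format can be built from it.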

Reply by
amdc

2011-08-02
22:05
Good God, that makes sense. I can't say anything less :) Thanks!

Reply by
cygnus

2012-01-24
17:29
Hi,

Since this is also a rather newbie question, I hope you agree this is the best place to post this query:

I'm trying to merge two customer data sets and clean them for our small family business (we have over 100,000 records).

The first task I want to complete is address standardisation. I've bought the PAF file, and I want to use it to replace the address fields with the data from this file, based on a postcode match.

How can I do this with DC? I'd hoped to add the PAF file as a dictionary in DC, but I don't know how to then get DC to replace the Street, Post Town and County fields with the matched data from the PAF dictionary.

I'd be very grateful for some help, or for any suggestions on a better way to clean and standardise 100,000+ address records.

Reply by
kasper

2012-01-24
17:40
I don't know what a PAF file is, so it depends a bit on its format. But assuming it is a regular CSV file, Excel spreadsheet or similar, you could register it as a datastore and then use that datastore to perform a table lookup based on your records. Or the easier way: use EasyDQ address cleansing, which is fully integrated and requires considerably less work to set up. But of course your PAF file is not used there.
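The table-lookup approach amounts to a join on the postcode. Below is a minimal plain-JavaScript sketch of that logic, not of DataCleaner's lookup transformer itself, and reading the CSV files is left out. The field names (`postcode`, `street`, `postTown`, `county`), the normalisation rule (uppercase, whitespace removed) and keeping only the first reference row per postcode are all assumptions. A UK postcode usually covers several addresses, so check that the fields being overwritten really are shared across a postcode before relying on this.

```javascript
// Hypothetical normalisation: uppercase and drop all whitespace,
// so "ab1 2cd" and "AB12CD" produce the same key.
function normalisePostcode(postcode) {
  return String(postcode || "").toUpperCase().replace(/\s+/g, "");
}

// Build a lookup table keyed by normalised postcode from the reference rows.
// Keeps the first row seen for each postcode (an assumption; see lead-in).
function buildPostcodeIndex(referenceRows) {
  const index = new Map();
  for (const row of referenceRows) {
    const key = normalisePostcode(row.postcode);
    if (key && !index.has(key)) index.set(key, row);
  }
  return index;
}

// Replace street, post town and county with the reference values on a match;
// records with no match are returned unchanged in a separate list.
function standardiseAddresses(records, index) {
  const matched = [];
  const unmatched = [];
  for (const record of records) {
    const ref = index.get(normalisePostcode(record.postcode));
    if (ref) {
      matched.push({
        ...record,
        street: ref.street,
        postTown: ref.postTown,
        county: ref.county,
        postcode: ref.postcode, // also take the reference spelling of the postcode
      });
    } else {
      unmatched.push(record);
    }
  }
  return { matched, unmatched };
}

// Tiny made-up example data; real rows would come from the CSV files.
const reference = [
  { postcode: "AB1 2CD", street: "High Street", postTown: "EXAMPLETON", county: "Exampleshire" },
];
const customers = [
  { name: "Customer A", street: "high st", postTown: "exampleton", county: "", postcode: "ab12cd" },
  { name: "Customer B", street: "Somewhere Rd", postTown: "Elsewhere", county: "", postcode: "ZZ9 9ZZ" },
];
const result = standardiseAddresses(customers, buildPostcodeIndex(reference));
console.log(result.matched[0].street); // High Street
console.log(result.unmatched.length);  // 1
```

Building the index once and then probing it per record keeps the work to a single pass over each file, and the unmatched list gives the records that need separate handling.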

PS: I don't consider this a newbie question, nor do I think all newbie questions should be bundled together, so don't be shy about creating a new topic ;-)

Reply by
cygnus

2012-01-24
19:01
Hi Kasper,

Thanks for that. Sorry, the PAF file is in .csv format; it's a perfectly formatted list of all UK addresses, bought from the Post Office.

Looking at EasyDQ, it would be a lot easier; however, for our number of records it would be extremely expensive. So I'm going to give your suggestion a go, and then, if necessary, maybe use EasyDQ to clean up the really difficult or international addresses we have.

Thanks!
