
Topic: Phonetic Similarity finder - some things to wish for

Topic by
franklin

2011-11-24
04:24


Hi all,

I have been looking at some of the webcasts that show how to use the Phonetic Similarity Finder extension. It is pretty smart, but I have found an annoying bug (or is it a known limitation?): it only finds similar records based on a single field. What if I have, e.g., both a Product name and a Product type, and I want to find similar-sounding products using both? That is currently not possible, or is it?

Reply by
kasper

2011-11-25
04:21
Hi franklin,

Agreed, the Phonetic Similarity Finder is far from perfect. In fact, that is why it is on the extensionswap and not in the main distribution of DC. It's there as an "invitation to help improve it".

That said, you can apply a little hack that will at least improve your results: concatenate your fields before doing the similarity finding. If you have fields A, B and C, simply concatenate them (using the Concatenator transformer) and run the similarity check on the resulting transformed column.
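To illustrate the idea outside of DC, here is a minimal Python sketch of "concatenate first, then compare phonetically". It is not how the Phonetic Similarity Finder works internally: it assumes plain Soundex as the phonetic algorithm, and the helper names (`soundex`, `phonetic_key`, `group_similar`) and the sample records are made up for this example.

```python
from collections import defaultdict

# Standard Soundex letter-to-digit table.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def phonetic_key(record, fields):
    """Concatenate the chosen fields, then encode each word phonetically."""
    concatenated = " ".join(str(record[f]) for f in fields)
    codes = (soundex(w) for w in concatenated.split())
    return " ".join(c for c in codes if c)


def group_similar(records, fields):
    """Group records whose concatenated fields share the same phonetic key."""
    groups = defaultdict(list)
    for record in records:
        groups[phonetic_key(record, fields)].append(record)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical sample data: similar-sounding names, but one differs in type.
products = [
    {"name": "Smyth Hammer", "type": "Tool"},
    {"name": "Smith Hamer", "type": "Tool"},
    {"name": "Smith Hammer", "type": "Toy"},
]
similar = group_similar(products, ["name", "type"])
```

Matching on `name` alone would put all three sample records in one group; adding `type` to the concatenation keeps the "Toy" record out. Each word is encoded separately here because a single Soundex code only covers the start of a string, so encoding the whole concatenated value at once would be dominated by the first field.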

Hope it works out.

If not, the next version of DC (2.4) will include a much nicer deduplication analyzer! Hurray!
