Topic: Phonetic similarity finder Analyser
Hi,
I have been exploring this tool and came across the Phonetic similarity finder analyser. According to the explanation provided, it displays all similar-sounding values in a dataset, working at column level.
I tested it on a GUID column and got this output:
similar 2("8a5ef40a-ac9..)
The detailed result for this column is 8a5ef40a-ac92-452b-9b7f-657c5186f064 and
96ffa8c3-6345-4b0e-9e83-1b416fb1f26a
which are in no way phonetically similar. Could you please explain what this analyser actually does? It is confusing at the moment.
Thanks & Regards
Rsegan
You should apply this analyzer to columns that are actually phonetically comparable, typically things like names or sentences.
Since this is an extension to DC, I propose that you post your evaluation and questions at the extension page for it: http://datacleaner.eobjects.org/extension/Phonetic-similarity-finder/
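To make the "phonetically comparable" advice concrete, here is a minimal sketch of a pre-check you could run on a column before pointing a phonetic matcher at it. It is not part of DataCleaner or the extension; the function names and the 0.8 threshold are assumptions made up for illustration. The idea is simply that values made mostly of digits and punctuation, such as GUIDs, give a phonetic encoder very little to work with.

```python
def letter_ratio(value: str) -> float:
    """Return the fraction of characters in value that are alphabetic."""
    if not value:
        return 0.0
    return sum(ch.isalpha() for ch in value) / len(value)


def looks_phonetic(values, threshold=0.8):
    """Hypothetical heuristic: True if the column is mostly letters on average.

    The threshold is an arbitrary assumption, not a DataCleaner setting.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    average = sum(letter_ratio(v) for v in non_empty) / len(non_empty)
    return average >= threshold


names = ["Kasper", "Casper", "John Smith"]
guids = [
    "8a5ef40a-ac92-452b-9b7f-657c5186f064",
    "96ffa8c3-6345-4b0e-9e83-1b416fb1f26a",
]

print(looks_phonetic(names))  # name-like column
print(looks_phonetic(guids))  # GUID column, roughly one third letters
```

A check like this only says whether the values look like words; it says nothing about whether the matches you get back will be good ones.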
Thanks a lot, Kasper.
One last thing: does the Phonetic analyser compare only about the first 11 or 12 characters rather than the complete string? Please let me know if my assumption is wrong.
I applied the logic to a file name column in which the first 16 characters of each name were the same and the rest differed slightly, e.g.
Myreport-2011-ras-games
Myreport-2011-ras-studies
and so on.
In this case it treats all of them as phonetically the same. Kindly advise.
Thanks in advance
Well, phonetic matching is obviously not the same as exact matching. In your case I guess you're seeing some false positives, which means that the criteria are too loose.
To be honest - this is one of the key reasons that the Phonetic Similarity Finder is not included in the main distribution of DataCleaner - because it is still at the draft stage and because it is also limited to comparing only values in a single column. Several companies, including Human Inference, provide commercial matching engines that are much more advanced, scalable and configurable, and that provide better results.
Oh, and I should add that one of the reasons for making the Phonetic Similarity Finder available is to invite people to help improve a feature that started out as a funny little side-project. So you're invited to improve it, because it is actually a pretty simple piece of code (apart from the actual matching algorithms, but those are provided separately by Apache commons-codec).
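For anyone wondering why the two file names above can end up in the same group, here is a rough sketch of classic Soundex written in plain Python. Soundex is one of the encoders that commons-codec ships, but the thread does not say which encoder the extension actually uses, so treating it as Soundex is an assumption for illustration only. This is also a simplified re-implementation, not the commons-codec code.

```python
# Classic Soundex consonant groups; vowels, y, h and w have no code.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(value: str) -> str:
    """Return a 4-character Soundex code, ignoring anything that is not a-z."""
    letters = [ch for ch in value.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break  # everything after the third coded consonant is ignored
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return code.ljust(4, "0")


print(soundex("Myreport-2011-ras-games"))    # M616
print(soundex("Myreport-2011-ras-studies"))  # M616
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Under this assumption the code is fixed after "Myrepor", so digits, hyphens and everything later in the file name never influence the result. That would look a lot like "only the first few characters are compared", even though the cut-off depends on consonants rather than on a character count.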


