Topic: [SOLVED] Profiling Fails
First of all, thanks for this impressive software! I'm really interested in using DataCleaner to profile a large number of databases, but I'm running into trouble. When I query large tables (about 1 million rows), the profiling fails with a JDBC error. Is this error related to the large number of rows, or is there some parameter to be tuned in JDBC, as there is in ODBC?
Hi MFontana.
It's hard to say from just reading your description what goes wrong... Can you perhaps paste in some more details of the error? Perhaps the whole stack-trace? And furthermore, can you tell me if the error occurs in the console or in an error dialog?
/Kasper
There will be some performance improvements in the 1.1 release of DataCleaner, so stay tuned. In the meantime, if what you're experiencing is an OutOfMemoryError (which is the only type of error I can think of that matches your description), try increasing the size of your heap space. This is done by opening your start-datacleaner.cmd/.sh file and changing the argument "-Xmx1024m" (equivalent to 1 GB of heap space) to something bigger. If you have plenty of memory available, I'd recommend raising it to a couple of gigabytes if you want to do really large profiling jobs.
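To confirm that a changed "-Xmx" value was actually picked up, the JVM can report its own heap limit. The following is a minimal sketch, not part of DataCleaner; the class name HeapCheck and the method maxHeapMegabytes are hypothetical and chosen only for illustration.

```java
// Hypothetical helper (not part of DataCleaner): reports the heap limit
// that the running JVM was started with, e.g. via -Xmx1024m.
final class HeapCheck {

    // Returns the maximum heap size the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        long maxBytes = Runtime.getRuntime().maxMemory();
        return maxBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it with, for example, "java -Xmx2048m HeapCheck" should print a value close to 2048; the exact number can be slightly lower, depending on the JVM and garbage collector.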
Thank you, I've tried increasing heap space and everything is working fine.
Great, I'm really glad to hear that! :)
Even though this fixes your problem for now I've gone ahead and made an issue about this...
Although the JDBC spec has specific ways of keeping memory consumption low (streaming data and setting a fetch size that limits how many rows are kept in memory), I've experienced that some database vendors do not live up to this part of the specification. So we might try to solve the problem by other means, for example by rewriting the queries to retrieve several smaller result sets instead of one big one.
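Both ideas can be sketched in plain JDBC. This is only an illustration under stated assumptions, not the approach DataCleaner ended up using: the class ChunkedProfiling and its methods are hypothetical, and the LIMIT/OFFSET syntax is an assumption that holds for databases such as MySQL and PostgreSQL but not for every vendor.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not DataCleaner code) of two ways to bound memory use.
final class ChunkedProfiling {

    // Option 1: ask the driver to stream rows in batches of fetchSize.
    // The fetch size is only a hint; some drivers ignore it.
    static long countRowsStreaming(Connection connection, String sql, int fetchSize)
            throws SQLException {
        try (Statement statement = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            statement.setFetchSize(fetchSize);
            long rows = 0;
            try (ResultSet resultSet = statement.executeQuery(sql)) {
                while (resultSet.next()) {
                    rows++; // a real profiler would inspect the column values here
                }
            }
            return rows;
        }
    }

    // Option 2: split one big query into several smaller ones.
    // Assumes LIMIT/OFFSET support and trusted table/column names
    // (they are concatenated into the SQL text, not bound as parameters).
    static List<String> pagedQueries(String table, String orderColumn,
                                     long totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < totalRows; offset += pageSize) {
            queries.add("SELECT * FROM " + table
                    + " ORDER BY " + orderColumn
                    + " LIMIT " + pageSize
                    + " OFFSET " + offset);
        }
        return queries;
    }
}
```

The ORDER BY clause keeps the pages stable between queries; without a deterministic ordering, rows could be skipped or read twice.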
You can follow the issue progress if you're interested, here: #134
If we come up with a good solution you can be certain that I'll also write a lot about it on my blog ;-)


