Topic: DQ monitoring server?
Dear DC users and developers,
At Human Inference we've been thinking of ways to build upon the foundation of DataCleaner to make it a more complete data quality solution. An idea has been taking shape, and we would like to hear your opinions on it and use them to decide whether or not we're going to build it!
The idea is to have a server-side counterpart for the DataCleaner application. The purpose of the server app would be to schedule jobs, gather and persist results, and show trends over time. I would call this functionality "DQ monitoring". The current DC app would be extended with a way to upload jobs to the server, so that you can still work with your jobs in the regular DataCleaner application, but for enterprise deployment you would probably run them in batches on the server.
In terms of reporting we have in mind that you should of course be able to see the results for a single run, but you should ALSO be able to see the evolution of your profiling metrics. For example you might be interested in seeing trends in the patterns found or in the metrics available in the various analyzers.
Another possible feature would be built-in email bursting: if you have set a threshold value for a particular metric, you could receive email alerts when that metric no longer lives up to your goals.
What is your opinion on such a DQ monitoring application? Do you think it would fit in nicely with DataCleaner? Or would it not add a lot of value?
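To make the alerting idea a bit more concrete, here is a minimal sketch in Python of what a threshold check over a metric's history could look like. This is only an illustration of the idea, not DataCleaner code or any planned API: the names `MetricReading`, `Threshold` and `check_threshold`, the "alert when below a minimum" rule and the sample values are all assumptions made up for the example, and actually sending the email is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricReading:
    """One persisted value of a profiling metric (hypothetical structure)."""
    run_label: str  # identifies the job run that produced the value
    value: float    # the metric value recorded for that run


@dataclass
class Threshold:
    """A user-defined goal for a metric (hypothetical structure)."""
    metric_name: str
    minimum: float  # assumption: an alert is due when the value drops below this


def check_threshold(threshold: Threshold,
                    history: List[MetricReading]) -> Optional[str]:
    """Return an alert message if the latest reading misses the goal, else None."""
    if not history:
        return None  # nothing has been recorded yet, so there is nothing to alert on
    latest = history[-1]  # history is assumed to be ordered oldest to newest
    if latest.value < threshold.minimum:
        return (f"{threshold.metric_name} was {latest.value} in {latest.run_label}, "
                f"below the goal of {threshold.minimum}")
    return None


# Made-up sample data: two runs of the same job.
history = [MetricReading("run 1", 98.0), MetricReading("run 2", 91.5)]
alert = check_threshold(Threshold("completeness_pct", 95.0), history)
if alert is not None:
    print(alert)  # a server would hand this message to its email component
```

Keeping the full history rather than only the latest value means the same data could also feed the trend views mentioned above.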
Kasper,
Not sure if you guys have made a decision on this, but I would definitely love to have something like this in my tool list.
-tach4
I also think this would be an excellent addition to DataCleaner.
Sounds reasonable, but I hope the client will still be available alongside these server enhancements.
The evolution and/or trend reporting should also be available in the client version. This would be very useful.
Christian
Automated monitoring of data quality is necessary, because in some cases the information can change at any moment and, depending on the customer's needs, the quality of the data has to be checked.
On a related note, there's a discussion on the DataCleaner-dev mailing list about this ... Take a look here, and feel free to join the conversation:
http://groups.google.com/group/datacleaner-dev/browse_thread/thread/fcb16c4f86f482d2
Happy to say that this work is now going on :) You can find it in the 3.0 branch of DataCleaner's source:
http://eobjects.org/svn/DataCleaner/branches/3.0-monitor/
The current situation is that we support a timeline view, manual building of the repository (but there is an example), and drill-to-details from the timeline view, which takes you to a single (historic) profiling result. Pretty neat! Will try and blog about it soon.