
Sunday, July 19, 2009

Business Intelligence and Statistical Error



















Surface Response Model: From a Bull/Cow Fertility Test
(in my Advanced Design of Experiments Class)

Statisticians distinguish two kinds of error in hypothesis testing: a Type I error is rejecting the null hypothesis when it is in fact true, and a Type II error is failing to reject the null hypothesis when it is false. Beyond these, systematic error can occur whenever wrong or inaccurate information is entered into a database and then propagated to other systems through replication and reporting. While the most common source of data error is undeniably human error, decision support and business intelligence systems can produce erroneous decisions or reports for other reasons as well. Common causes include inappropriate forecasting strategies, inappropriate categorization, and choosing the wrong sources or resources for data verification and validation. As the well-known saying “garbage in, garbage out” suggests, poor input drives quality toward inconsistency and, in many instances, toward wrong derived data.

In a data quality project for a calibration procedure that I worked on, a procedural error was introduced not by the code but by the data files being read in random order, which caused the procedure to produce inappropriate measurements based on the timestamps associated with the data files. After I found why the algorithm, in spite of being accurate, would introduce an error, the lead researcher refused to accept an improved algorithm that would have eliminated the error, simply because doing so would have meant redoing the entire experiment. As a result, the cell calibration test introduced a small percentage of inconsistent data, which did not invalidate the regression, correlation, and factorial design models used to validate the test. The entire data set went to an Oracle database, where it was analyzed, mined, and presented for the calibration test of the CDMA wireless protocol. Clearly, time management was a key factor in this decision.
Having also worked directly with statisticians, biostatisticians, operations research analysts, and managers in roles such as DBA, data analyst, software engineer, software developer, or simply consultant, I believe that such a decision might have been contested by others, or at least carefully reviewed.
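The Type I and Type II error definitions at the start of this post can be made concrete with a small calculation. The following is a minimal sketch, not taken from the calibration project: it assumes a one-sided z-test with known standard deviation, and the function name `z_test_error_rates` and all numeric values are hypothetical choices for illustration.

```python
from statistics import NormalDist


def z_test_error_rates(mu0, mu1, sigma, n, alpha):
    """Error rates of a one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.

    Assumes normally distributed data with known sigma (an illustrative assumption).
    """
    std_error = sigma / n ** 0.5
    under_h0 = NormalDist(mu0, std_error)  # sampling distribution of the mean if H0 is true
    under_h1 = NormalDist(mu1, std_error)  # sampling distribution of the mean if H1 is true

    # Reject H0 when the sample mean exceeds this threshold.
    critical_mean = under_h0.inv_cdf(1 - alpha)

    type_i = 1 - under_h0.cdf(critical_mean)  # P(reject H0 | H0 true)
    type_ii = under_h1.cdf(critical_mean)     # P(fail to reject H0 | H1 true)
    return critical_mean, type_i, type_ii


# Hypothetical numbers chosen only for illustration.
critical_mean, type_i, type_ii = z_test_error_rates(
    mu0=0.0, mu1=0.5, sigma=1.0, n=25, alpha=0.05
)
print(f"reject H0 when the sample mean exceeds {critical_mean:.3f}")
print(f"Type I error rate:  {type_i:.3f}")
print(f"Type II error rate: {type_ii:.3f}")
```

Lowering `alpha` moves the threshold up and raises the Type II rate, which is the usual trade-off between the two kinds of error for a fixed sample size.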

There are other scenarios where the DBA role may be the underlying cause, though perhaps not so directly. For instance, suppose a company adopts a corporate DBA standard that sets the CURSOR_SHARING parameter to SIMILAR. This may perform consistently in most databases, but it could become an issue in many OLTP scenarios, in particular for queries that rely heavily on bind variables, because once a statement has been parsed, SQL bind peeking is not much concerned with the actual values of the bind variables in later executions. In those scenarios, setting CURSOR_SHARING to EXACT is more practical, since it avoids the typical situation where the cached statement becomes the “SQL bind peeking bible” for every matching query and the statement is never actually reparsed. If these queries were used in BI environments, there is clearly a potential for erroneous reporting to occur at some point. A large amount of BI effort, such as applications and tool-driven implementations, could therefore reach a significant scale while some rows are being omitted from display. In some cases I have tested, rendering a result set after re-indexing tables on additional columns can trigger this misbehavior of SQL bind peeking and statement parsing, which occurs mostly in OLTP databases.
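The plan-reuse effect described above can be illustrated with a toy model. This is only a sketch in Python, not Oracle's actual optimizer or cursor cache: the class `ToyCursorCache`, the `orders` table, the row counts, and the 5% selectivity threshold are all hypothetical. It models one idea only, namely that when literals are replaced by a bind placeholder, the plan chosen for the first value seen is reused for every later value.

```python
import re

# Hypothetical skewed column: row counts per 'status' value.
ROW_COUNTS = {"OPEN": 50, "CLOSED": 9950}
TOTAL_ROWS = sum(ROW_COUNTS.values())


def choose_plan(value):
    """Pick an access path from the selectivity of one concrete value."""
    selectivity = ROW_COUNTS.get(value, 0) / TOTAL_ROWS
    return "INDEX RANGE SCAN" if selectivity < 0.05 else "FULL TABLE SCAN"


class ToyCursorCache:
    """Caches one plan per statement text (toy stand-in for a shared cursor cache)."""

    def __init__(self, replace_literals):
        # False: statements match only if their text is identical.
        # True: literals are replaced by a placeholder before matching.
        self.replace_literals = replace_literals
        self.plans = {}

    def execute(self, sql):
        literal = re.search(r"'([^']*)'", sql).group(1)
        key = re.sub(r"'[^']*'", ":b1", sql) if self.replace_literals else sql
        if key not in self.plans:
            # First parse of this text: peek at the value once and keep the plan.
            self.plans[key] = choose_plan(literal)
        return self.plans[key]


q_open = "SELECT * FROM orders WHERE status = 'OPEN'"
q_closed = "SELECT * FROM orders WHERE status = 'CLOSED'"

exact = ToyCursorCache(replace_literals=False)
shared = ToyCursorCache(replace_literals=True)

plans_exact = [exact.execute(q_open), exact.execute(q_closed)]
plans_shared = [shared.execute(q_open), shared.execute(q_closed)]

print("exact text matching:", plans_exact)
print("literals replaced:  ", plans_shared)
```

With exact text matching each literal gets its own plan, while with literal replacement the second query inherits the plan peeked for the first. The toy only shows plan reuse; whether that affects the rows a report displays depends on the actual database behavior and is not modeled here.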

While error and error propagation of various types are always possible, data quality and integrity can only be measured in relation to their impact. For instance, in a research environment an error may have great or no significance, but it could be devastating in mission-critical environments such as medical and financial systems. Error tracking then becomes a proactive objective for attaining data integrity, consistency, and quality, to be handled not only through manual human control and resilient data consistency but through quality corporate auditing controls as well. In most instances, internal controls are more valuable than external controls such as outsourced data verification and validation, simply because internal resources usually know more about the applications, data domains, and data contents overall.

In some instances, database-driven data mining and BI tools, such as Discoverer, can be complemented not only by their own visualization capabilities but also by other methodologies such as surface response models and artificial intelligence mining. While visualization is unlikely to eliminate or resolve an existing database error, i.e., a data quality or consistency problem, it can reveal data trends and help confirm or reject the results that are to be verified or validated. Visualization models are therefore a helpful complement to regression, correlation, and statistical tests, because they allow a check of data quality in a single view. While BI models include functions such as CUBE and ROLLUP, which can further aggregate and categorize data, visualization techniques are powerful tools for decision making in that they provide a real-life validation of data trends. Both database and statistical tools and models can expand their data mining capabilities with measures of central tendency such as the mean and median, measures of distribution shape such as kurtosis and skewness, measures of variability such as mean deviation, variance, and standard deviation, and further studies based on analysis of variance. In the end, though, it pays to have chosen a visualization tool that best meets expectations overall.
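The descriptive measures named above are straightforward to compute before any chart is drawn. The following is a minimal sketch using population (not sample) formulas; the function name `describe` and the sample values are hypothetical and serve only as an illustration.

```python
import statistics


def describe(values):
    """Population descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)

    # Central moments about the mean.
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    std_dev = m2 ** 0.5
    return {
        "mean": mean,
        "median": median,
        "mean_deviation": sum(abs(x - mean) for x in values) / n,
        "variance": m2,
        "std_dev": std_dev,
        "skewness": m3 / std_dev ** 3,        # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # 0 for a normal distribution
    }


# Hypothetical measurements chosen only for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>16}: {value:.5f}")
```

A positive skewness here reflects the longer right tail of the sample, which is the kind of feature a histogram would also make visible at a glance.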
