5 challenges to implementing QA strategy in data and analytics projects

3 years ago

Developing a QA strategy for unstructured data and analytics can be a trying and elusive process, but there are several things we've learned that can improve the accuracy of results.


Image: iStock/HAKINMHAN

In a traditional application development process, quality assurance occurs at the unit-test level, the integration-test level and, finally, in a staging area where a new application is trialed in an environment similar to the one it will run in during production. While it's not uncommon for less-than-perfect data to be used in early stages of application testing, the confidence in data accuracy for transactional systems is high. By the time an application gets to final staging tests, the data that it processes is seldom in question.


With analytics, which uses a different development process and a mix of structured and unstructured data, testing and quality assurance for data aren't as straightforward.

Here are the challenges:

1. Data quality

Unstructured data coming into analytics must be correctly parsed into digestible pieces of information to be of high quality. Before parsing occurs, the data must be prepped so it is compatible with the data formats of the many different systems it must interact with. Data also must be pre-edited so that as much needless noise as possible (such as connection "handshakes" between appliances in Internet of Things data) is eliminated. With so many different sources for data, each with its own set of issues, data quality can be difficult to obtain.
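As a rough illustration of the prep steps described above, here is a minimal Python sketch that parses incoming IoT messages, drops handshake-style noise and normalizes field formats. The message layout (JSON lines with "type", "device_id" and "temp" fields) and the list of noise types are assumptions made for the example, not a real device protocol.

```python
import json

# Hypothetical message types treated as noise; real device protocols will differ.
NOISE_TYPES = {"handshake", "heartbeat", "ack"}


def clean_iot_messages(raw_lines):
    """Parse raw JSON lines, drop noise, and normalize field formats.

    Returns (cleaned_records, rejected_count).
    """
    cleaned = []
    rejected = 0
    for line in raw_lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1  # unparseable input is counted, not silently lost
            continue
        if not isinstance(msg, dict):
            rejected += 1
            continue
        msg_type = msg.get("type")
        if isinstance(msg_type, str) and msg_type in NOISE_TYPES:
            continue  # connection chatter carries no analytic value
        try:
            cleaned.append({
                # Normalize identifiers and units so downstream systems agree.
                "device_id": str(msg["device_id"]).strip().lower(),
                "temperature_c": float(msg["temp"]),
            })
        except (KeyError, TypeError, ValueError):
            rejected += 1  # missing or malformed fields
    return cleaned, rejected
```

Tracking the rejected count alongside the cleaned records gives a simple per-source quality signal: a source whose rejection rate climbs is a candidate for closer review.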


2. Data drift

In analytics, data can begin to drift as new data sources are added and new queries alter the direction of the analytics. Data and analytics drift can be a healthy response to changing business conditions, but it can also pull companies away from the original business use case that the data and analytics were intended for.
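One way to notice this kind of drift early is to compare a current batch of data against a baseline snapshot. The sketch below is a simple heuristic, not a full statistical test: it reports fields that appeared or disappeared and flags numeric fields whose mean has moved by more than a chosen number of baseline standard deviations. The function name, the dictionary layout (field name mapped to a list of numbers) and the default threshold of 3 are all assumptions for illustration.

```python
from statistics import mean, stdev


def drift_report(baseline, current, max_shift=3.0):
    """Compare two {field: [numeric values]} snapshots.

    Flags fields that were added or dropped, and fields whose current mean
    moved more than max_shift baseline standard deviations.
    """
    report = {
        "added_fields": sorted(set(current) - set(baseline)),
        "missing_fields": sorted(set(baseline) - set(current)),
        "shifted_fields": [],
    }
    for field in sorted(set(baseline) & set(current)):
        base_values, new_values = baseline[field], current[field]
        if len(base_values) < 2 or not new_values:
            continue  # not enough data to estimate spread
        spread = stdev(base_values)
        difference = abs(mean(new_values) - mean(base_values))
        if spread == 0:
            shifted = difference != 0  # baseline was constant; any move counts
        else:
            shifted = difference / spread > max_shift
        if shifted:
            report["shifted_fields"].append(field)
    return report
```

A report with new fields or shifted values is not necessarily a problem. It is a prompt to check whether the change reflects a deliberate response to business conditions or an unintended move away from the original use case.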


3. Business use case drift

Use case drift is closely related to drift in data and analytics queries. There is nothing wrong with business use case drift if the original use case has been resolved or is no longer important. However, if the need to satisfy the original business use case remains, it is incumbent on IT and the end business to maintain the integrity of the data needed for that use case and to create a new data repository and analytics for emerging use cases.


4. Eliminating the right data

In one case, a biomedical team studying a particular molecule wanted to accumulate every piece of data it could find about this molecule from a worldwide collection of experiments, papers and research. The amount of data that artificial intelligence and machine learning had to review to collect this molecule-specific data was enormous, so the team made a decision up front to bypass any data that was not directly related to this molecule. The risk was that they might miss some tangential data that could be important, but it was not a large enough risk to prevent them from slimming down their data to ensure that only the highest-quality, most relevant data was collected.


Data science and IT teams can use this approach as well. By narrowing the funnel of data that comes into an analytics data repository, data quality can be improved.
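A narrowed funnel can be sketched as a relevance filter applied before data reaches the repository. The example below uses plain keyword matching as a stand-in for whatever AI or machine learning relevance model a team actually uses; the function name and the sample molecule name are hypothetical.

```python
def narrow_funnel(documents, required_terms):
    """Split documents into those mentioning any required term and the rest.

    Keyword matching here is only a placeholder for a real relevance model.
    """
    terms = [term.lower() for term in required_terms]
    kept, bypassed = [], []
    for doc in documents:
        text = doc.lower()
        if any(term in text for term in terms):
            kept.append(doc)
        else:
            bypassed.append(doc)
    return kept, bypassed


# Hypothetical usage: keep only documents that mention the target molecule.
docs = [
    "Binding study of Caffeine in vitro",
    "Survey of unrelated soil samples",
]
kept, bypassed = narrow_funnel(docs, ["caffeine"])
```

Returning the bypassed documents rather than discarding them lets a team spot-check what was left out, which is one way to keep an eye on the risk of missing tangential but important data.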

5. Deciding your data QA standards

How perfect does your data need to be in order to perform value-added analytics for your company? The standard for analytics results is that they must come within 95% accuracy of what subject matter experts would have determined for any one query. If data quality lags, it won't be possible to meet the 95% accuracy threshold.
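One possible way to check against this threshold is to have subject matter experts answer a sample of queries and measure how often the analytics output agrees with them. The sketch below takes that interpretation, which is an assumption: it treats accuracy as the share of expert-reviewed queries where the two answers match exactly. The function name and the dictionary layout (query mapped to answer) are illustrative.

```python
def meets_accuracy_standard(analytics_answers, expert_answers, threshold=0.95):
    """Return (accuracy, passed) for analytics output versus expert answers.

    Both arguments map a query identifier to an answer. Queries the
    analytics did not answer count as misses.
    """
    if not expert_answers:
        raise ValueError("need at least one expert-reviewed query")
    matches = sum(
        1
        for query, expected in expert_answers.items()
        if analytics_answers.get(query) == expected
    )
    accuracy = matches / len(expert_answers)
    return accuracy, accuracy >= threshold
```

Exact matching is the simplest rule. For numeric results, a team would likely substitute a tolerance-based comparison that reflects how close is close enough for the business question.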


However, there are instances when an organization can begin to use data that is less than perfect and still derive value from it. One example is general trends analysis, such as gauging increases in traffic over a road system or increases in temperatures over time for a fruit crop. The caveat is: If you're using less-than-perfect data for general guidance, never make this mission-critical analytics.
