It is crucial today to link your company’s business needs with data science, and artificial intelligence is one of the most practical ways to marry the two. Machine learning is currently the most common route to artificial intelligence. Specifically, machine learning is a set of tools and approaches that let a computer find patterns in large datasets. In the typical (supervised) setup, the computer is given a large set of “training” data that includes an “answer key”, and it uses that set to learn how to reproduce the answers from the inputs. The resulting model is then tested against a different dataset, one it has not seen, to check its accuracy.
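The train-then-test loop described above can be sketched in a few lines of Python. This is only a toy illustration: the data, the single "overtime hours" input, and the one-threshold "model" are all invented here, and a real project would use a proper library and far more data.

```python
# Toy labeled data: (overtime_hours_per_week, quit_within_a_year).
# Every value is invented for illustration; the label is the "answer key".
TRAIN = [(2, 0), (4, 0), (5, 0), (9, 1), (11, 1), (12, 1), (3, 0), (10, 1)]
TEST = [(1, 0), (6, 0), (8, 1), (13, 1)]  # held out, never used for fitting


def accuracy(threshold, rows):
    """Fraction of rows where the rule 'x >= threshold' matches the label."""
    hits = sum(1 for x, label in rows if int(x >= threshold) == label)
    return hits / len(rows)


def fit_threshold(rows):
    """Pick the candidate threshold with the best accuracy on the given rows."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))


# Step 1: learn from the training data and its answer key.
threshold = fit_threshold(TRAIN)

# Step 2: check the learned rule against data it has not seen.
train_acc = accuracy(threshold, TRAIN)
test_acc = accuracy(threshold, TEST)
print(f"threshold={threshold}, train={train_acc:.2f}, test={test_acc:.2f}")
```

On this toy data the rule scores perfectly on the training set but lower on the held-out set, which is exactly why the second check matters.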
Before bringing machine learning onboard, you must decide whether it’s the right tool for the job. According to Professor Yaser Abu-Mostafa of Caltech, three criteria determine whether a problem is a good match for machine learning. The first is that there is enough data: the system needs a large amount of data to train on, and in general more training data makes it more effective. The second is that there must be a relationship between the data inputs and the results you’re predicting. The third is that the pattern you’re observing cannot be fully explained in plain English; the data should contain nuances and hidden variables. If you can describe everything in plain English, you can build a simpler model, which makes machine learning unnecessary.
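As a rough first look at the second criterion, you could measure how strongly an input moves with the outcome. The sketch below uses the Pearson correlation coefficient, which is our choice for illustration and not something the criteria prescribe; the column names and values are invented. Pearson only captures straight-line relationships, so a value near zero does not prove that no relationship exists.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented example columns: one plausibly related input, one unrelated input.
tenure_years = [1, 2, 3, 4, 5, 6]
shoe_size = [9, 7, 8, 8, 7, 9]
quit_flag = [1, 1, 1, 0, 0, 0]  # outcome we want to predict

r_tenure = pearson(tenure_years, quit_flag)
r_shoe = pearson(shoe_size, quit_flag)
print(f"tenure vs quit: {r_tenure:.2f}, shoe size vs quit: {r_shoe:.2f}")
```

In this made-up data, tenure is strongly (negatively) associated with quitting while shoe size is not, so only the first would be a promising input.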
When you do bring machine learning onboard, you should view it from an ROI perspective. The main criterion for the value of a machine learning model is its accuracy, weighed by what each kind of prediction is worth. Ask yourself whether the value of the system’s correct predictions outweighs the cost of its incorrect ones. For example, correctly predicting that an employee will quit could be worth a few thousand dollars, while an incorrect prediction of a resignation could cost very little. Another important consideration is the stability of your model: will small changes in the data affect the predicted outcome? In the employee resignation example, suppose we track an employee’s probability of quitting over time and their age is recorded as a decimal, e.g. 25.7, that ticks up every month. If such a small change swings the prediction, the model is unstable and we risk inaccuracy.
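Both checks can be sketched with simple arithmetic. Everything below is a hypothetical illustration: the dollar amounts, the prediction counts, and the scoring function with its coefficients are made up, and the value calculation ignores missed resignations for brevity.

```python
from math import exp


def net_value(correct_alerts, false_alerts, value_per_correct, cost_per_false):
    """Value of correct predictions minus the cost of incorrect ones."""
    return correct_alerts * value_per_correct - false_alerts * cost_per_false


# Assumed figures: a correct resignation warning is worth $3,000,
# a false alarm costs $100.
value = net_value(correct_alerts=20, false_alerts=60,
                  value_per_correct=3000, cost_per_false=100)
print(f"net value: ${value}")


def quit_probability(age, tenure_years):
    """Hypothetical logistic scoring function with invented coefficients."""
    score = 1.5 - 0.04 * age - 0.3 * tenure_years
    return 1 / (1 + exp(-score))


def prediction_shift(age, tenure_years, age_delta):
    """How much the prediction moves when age changes by a small amount."""
    before = quit_probability(age, tenure_years)
    after = quit_probability(age + age_delta, tenure_years)
    return abs(after - before)


# Stability check: nudge age from 25.7 to 25.8 and see how far the output moves.
shift = prediction_shift(age=25.7, tenure_years=2.0, age_delta=0.1)
print(f"shift in predicted probability: {shift:.4f}")
```

With these assumed numbers, even three false alarms per correct warning leave the model clearly profitable, and a one-decimal change in age barely moves the prediction, which is what a stable model should look like.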
One final question is whether, given your business needs, you need to know the relationship between the input fields and the outputs, or whether you can use “black box” methods. Neural networks are black box models: they don’t reveal how they arrived at their outputs or which inputs mattered most in producing them. Generally, neural networks are more accurate but less transparent. Other models, such as linear regression or decision trees, reveal which inputs contributed to which outputs, though often at some cost in accuracy; with a black box you keep the accuracy but lose the ability to see how the model works. You will have to decide which method to use, or to what degree you want to mix the methods; it comes down to what best suits your business.
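To make the transparency of a linear model concrete, here is a minimal sketch that breaks one prediction into per-input contributions. The coefficients, intercept, and field names are hypothetical stand-ins for what a fitted linear regression might produce; nothing here is fitted from real data.

```python
# Hypothetical coefficients of an already-fitted linear model
# that outputs an attrition risk score.
COEFFICIENTS = {"overtime_hours": 0.05, "tenure_years": -0.08, "commute_km": 0.01}
INTERCEPT = 0.2


def explain(row):
    """Return the prediction and how much each input contributed to it."""
    contributions = {name: COEFFICIENTS[name] * row[name] for name in COEFFICIENTS}
    prediction = INTERCEPT + sum(contributions.values())
    return prediction, contributions


employee = {"overtime_hours": 10, "tenure_years": 3, "commute_km": 20}
prediction, contributions = explain(employee)

# The input with the largest absolute contribution drove this prediction most.
top_input = max(contributions, key=lambda name: abs(contributions[name]))

print(f"risk score: {prediction:.2f}")
for name, amount in contributions.items():
    print(f"  {name}: {amount:+.2f}")
print(f"most influential input: {top_input}")
```

A neural network offers no such direct breakdown, which is the trade-off to weigh against its usually higher accuracy.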