Nicole Ibarra, August 15, 2019
The Next Wave of Machine Learning
Machine learning (ML) is reshaping how users interact with organizations online. It has had a major impact on the anti-fraud industry and strengthened existing security solutions. Learn what Appgate's machine learning experts are seeing now and predicting for the future of machine learning, and how the technology can be leveraged by both attackers and security teams.
Adversarial Machine Learning
Adversarial machine learning is a technique in which algorithms are fed deliberately crafted malicious input in an attempt to fool them into making mistakes. Criminals looking to exploit algorithms can already use this technique to a limited extent, and Appgate's experts expect it to become more prevalent in future fraud attacks.
An example of such an attack involves tricking the facial recognition algorithms used to identify users for biometric authentication. Fraudsters may inject nearly undetectable noise, either electronically or through physical stickers or printed items, into the images used for facial recognition, causing the algorithm to misbehave. Fraudsters can then convince the algorithm to misclassify their face, allowing them to successfully impersonate a user and hack their account.
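The effect of a small, targeted perturbation can be illustrated with a toy model. The sketch below is not a real facial recognition system: it assumes a hypothetical linear "matcher" with made-up weights and a made-up four-number input, and nudges each input value by a fixed budget in the direction that raises the score (the idea behind the fast gradient sign method). In real images the same effect is spread across thousands of pixels, so the per-pixel change can be far smaller.

```python
# Toy linear "face matcher": a positive score means "accept as the enrolled user".
# The weights, bias, and input values are made-up numbers for illustration only.

def score(weights, bias, features):
    return sum(w * x for w, x in zip(weights, features)) + bias

def sign(value):
    return (value > 0) - (value < 0)

def perturb(weights, features, epsilon):
    # For a linear model, the gradient of the score with respect to the input
    # is the weight vector, so stepping along sign(weights) raises the score
    # as much as possible for a per-feature budget of epsilon.
    return [x + epsilon * sign(w) for w, x in zip(weights, features)]

weights = [0.9, -0.6, 0.4, -0.8]
bias = -0.1
impostor = [0.2, 0.5, 0.3, 0.4]   # rejected by the toy model as-is

adversarial = perturb(weights, impostor, epsilon=0.2)

original_score = score(weights, bias, impostor)
adversarial_score = score(weights, bias, adversarial)
print("original score:   ", round(original_score, 2))
print("adversarial score:", round(adversarial_score, 2))
```

The original input scores below zero and is rejected, while the perturbed input scores above zero and is accepted, even though no single value moved by more than the chosen budget.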
Nefarious actors can also use adversarial machine learning to directly alter banking website addresses. Many banking URLs are very long, and most end users do not take the time to check the address bar of their browsers when accessing a site they routinely use. Fraudsters take advantage of this and make alterations to URLs that are nearly undetectable, such as manipulating the characters in the address to create noise. These subtle changes are able to trick not only users, but also most machine-learning-based phishing detection algorithms into thinking that the addresses are legitimate.
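To make the kind of alteration described above concrete, here is a minimal sketch of a simple, non-ML check that flags hostnames differing from a trusted one by only a character or two. It is an illustration under stated assumptions, not a description of any production phishing detector: the domain `secure.examplebank.com`, the function names, and the distance threshold are all hypothetical.

```python
from urllib.parse import urlsplit

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def looks_like_spoof(url, trusted_hosts, max_distance=2):
    # Flag a URL whose hostname is close to, but not exactly, a trusted one.
    host = urlsplit(url).hostname or ""
    if host in trusted_hosts:
        return False
    return any(edit_distance(host, trusted) <= max_distance
               for trusted in trusted_hosts)

TRUSTED = ["secure.examplebank.com"]  # hypothetical bank hostname

print(looks_like_spoof("https://secure.examplebank.com/login", TRUSTED))  # False
print(looks_like_spoof("https://secure.examp1ebank.com/login", TRUSTED))  # True
```

The second URL swaps a single "l" for a "1", which is easy to miss in an address bar but is one edit away from the trusted hostname.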
Privacy-Preserving Machine Learning
Authentication through facial recognition is widely considered a secure way to log in to sensitive accounts, and a lot of trust is placed in it. However, data that is valuable to fraudsters, such as recordings and images of users’ faces, is commonly stored unencrypted in the cloud, where it is vulnerable to hacks and data breaches.
This is where machine learning can step in and provide extra security – to an extent. To continue with the facial recognition example, current ML algorithms work best with raw data, or non-anonymized information about each user. This means that the stored data is vulnerable to attackers who want to gain sensitive information about the users’ faces. Our experts speculate that future algorithms will be able to work with anonymized or encrypted data, providing strong privacy protection while maintaining high-performance machine learning solutions.
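The article does not name a specific privacy-preserving technique, so the following is only one possible illustration: additive secret sharing, in which each private value is split into random-looking shares held by different parties. Because a linear score is a weighted sum, each party can compute its part on its own shares, and only the combined result reveals the score. The weights, feature values, and two-party setup are assumptions made up for this sketch, and it handles only small non-negative integers.

```python
import secrets

PRIME = 2_147_483_647  # modulus for all arithmetic (2**31 - 1)

def split(value, parties=2):
    # Split an integer into random shares that sum to it modulo PRIME.
    shares = [secrets.randbelow(PRIME) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def partial_score(weights, feature_shares):
    # Each party runs this on its own shares only.
    return sum(w * s for w, s in zip(weights, feature_shares)) % PRIME

weights = [3, 1, 4]       # public integer model weights (made up)
features = [10, 20, 30]   # private user features (made up)

shared = [split(x) for x in features]    # one pair of shares per feature
party_a = [pair[0] for pair in shared]
party_b = [pair[1] for pair in shared]

combined = (partial_score(weights, party_a)
            + partial_score(weights, party_b)) % PRIME
plain = sum(w * x for w, x in zip(weights, features))

print(combined == plain)  # True
```

Neither party's shares reveal the original features on their own, yet the combined score matches the one computed on the raw data.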
End-to-End Machine Learning Platforms
Machine learning models play an essential part in the future of cybersecurity. Due to their complexity, the creation, training, and implementation of new models can be particularly time-consuming, an issue exacerbated by the number of teams involved in deploying machine learning platforms.
End-to-end machine learning enables developers and data scientists to connect and seamlessly work together throughout the stages of data preparation, algorithm selection, model development, implementation adjustment and optimization, and finally, launch. Every stage can be easily uploaded to the cloud with the touch of a single button. This results in faster deployment of new algorithms which, in security, translates to faster and more accurate responses to new phishing, malware, and other attacks.
Experts here at Appgate predict that the cybersecurity industry will soon be able to leverage end-to-end machine learning to deliver quicker and more consistent advancements. Algorithm development times, for example, could be reduced from months to days, allowing institutions to respond more rapidly to new attacks with improved analysis and countermeasures.
Active Learning
A problem that plagues nearly every machine learning developer is the need for labeled data. Labeling is the process through which tags or categories are assigned to data points. For example, when training a Random Forest algorithm to classify animals into groups, each piece of training data, in this case information about each individual animal, must be labeled in order to accurately train the model.
Tagging individual data points is time-consuming and expensive, and the cost can be prohibitive when starting new projects.
Active learning algorithms use machine learning during training to identify the data points that would be most informative to label, leaving only that small portion of the work to a human annotator while the rest is labeled automatically. This saves large amounts of precious time and money for teams that often work with massive amounts of data.
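One common way to decide which items go to the human annotator is uncertainty sampling: the items the current model is least sure about are sent for manual labeling, and the model's own predictions are used for the rest. The sketch below shows a single round of that idea with hypothetical fraud probabilities; the function names and the budget of two items are assumptions, and a real active learning loop would retrain the model after each round.

```python
def select_for_annotation(probabilities, budget):
    # Rank unlabeled items by how close the predicted probability is to 0.5
    # (most uncertain first) and return the indices to send to a human.
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return sorted(ranked[:budget])

def auto_label(probabilities, human_indices, threshold=0.5):
    # Label everything the human will not see with the model's own prediction.
    skip = set(human_indices)
    return {i: int(p >= threshold)
            for i, p in enumerate(probabilities) if i not in skip}

# Hypothetical model outputs: probability that each transaction is fraudulent.
pool = [0.02, 0.97, 0.48, 0.91, 0.55, 0.10]

to_human = select_for_annotation(pool, budget=2)
machine_labels = auto_label(pool, to_human)

print(to_human)        # [2, 4]
print(machine_labels)  # {0: 0, 1: 1, 3: 1, 5: 0}
```

Only the two borderline transactions reach the annotator; the four confident predictions are labeled automatically.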
For now, active learning algorithms have been used mostly for research and academic purposes. Appgate experts predict that the anti-fraud industry will quickly begin to use this technology in its research efforts, and Appgate is currently developing algorithms for security applications.
Few-Shot Learning
Machine learning algorithms thrive on data: they must consistently learn in order to perform well and adapt. However, the security industry presents some unique challenges in which algorithms must learn from very limited data.
Machine learning, without sufficient information to draw conclusions from, cannot accurately recognize patterns that it has not seen before. If the system encounters something it has not been taught, it may label it incorrectly.
For example, let’s look at fictional Bank Y. Bank Y has 100 users who have been banking with it for more than 12 months. One day, 20 brand new users create accounts. Is it possible to train Bank Y’s transaction anomaly detection algorithm to recognize the new users? The answer is yes, but it’s difficult, as there is not enough training data to draw conclusions about the new population.
Few-shot learning is a way of adapting certain algorithms, such as neural networks, to enhance their performance when classifying populations for which data is scarce. The use of few-shot learning is currently limited to research applications, but the experts at Appgate anticipate that, in the future, algorithms used in the anti-fraud industry can be trained with small amounts of data.
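One family of few-shot methods classifies a new item by comparing it with a "prototype" built from the handful of labeled examples available for each class. The sketch below applies that idea directly to raw feature vectors; in practice the comparison is usually made on embeddings produced by a neural network. The two features (transaction amount and transactions per day), the class names, and all values are hypothetical.

```python
import math

def centroid(points):
    # Mean of a list of equal-length feature vectors.
    return [sum(column) / len(points) for column in zip(*points)]

def build_prototypes(support_set):
    # One prototype per class, averaged from its few labeled examples.
    return {label: centroid(examples) for label, examples in support_set.items()}

def classify(prototypes, query):
    # Assign the query to the class with the nearest prototype.
    return min(prototypes, key=lambda label: math.dist(prototypes[label], query))

# Three labeled examples per class: [amount, transactions per day] (made up).
support = {
    "normal":    [[20.0, 1.0], [35.0, 2.0], [25.0, 1.0]],
    "anomalous": [[900.0, 9.0], [750.0, 7.0], [820.0, 8.0]],
}

prototypes = build_prototypes(support)
print(classify(prototypes, [30.0, 1.0]))   # normal
print(classify(prototypes, [800.0, 6.0]))  # anomalous
```

With only three examples per class, new transactions can still be assigned to the closer group, which is the situation Bank Y faces with its 20 new users.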
It is undeniable that machine learning will continue to flourish in the security industry, where categorizing and predicting new information is central to the product. However, as with any good thing that comes to fruition, there are those who want to use it with malicious intent. By staying on top of the latest developments in machine learning, your institution will have a better understanding of the future of fraud.