Abstract
Detecting abusive and fraudulent claims is one of the key challenges in online food delivery. The problem is aggravated by the fact that, unlike in e-commerce, reverse logistics is not practical for food. This makes the already hard problem of harvesting fraud labels even harder, because we cannot confirm whether a claim was legitimate by inspecting the item(s). Manually analyzing transactions to generate labels is often expensive and time-consuming. On the other hand, there is typically a wealth of ‘noisy’ information about what constitutes fraud, in the form of customer service interactions, weak and hard rules derived from data analytics, business intuition, and domain understanding.
In this paper, we present a novel end-to-end framework for detecting fraudulent transactions based on large-scale label generation using weak supervision. We use Stanford AI Lab’s (SAIL) Snorkel together with tree-based methods for manual and automated discovery of labeling functions, which generate the weak labels. We follow this with a method based on auto-encoder reconstruction error to reduce label noise. The final step is a discriminator model, which is an ensemble of an MLP and an LSTM. In addition to cross-sectional and longitudinal features covering customer history and transactions, we also harvest customer embeddings from a Graph Convolutional Network (GCN) on a customer-customer relationship graph to capture collusive behavior. The final score is thresholded and used in decision making.
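To make the weak-labeling step concrete, the following is a minimal sketch of how several noisy labeling functions can be combined into one weak label per claim. It is an illustration under stated assumptions, not the framework's implementation: the feature names (`claims_last_30d`, `orders_last_30d`, `tenure_days`, `agent_flag`), the thresholds, and the three labeling functions are all hypothetical, and a plain majority vote stands in for Snorkel's label model, which instead learns how much to trust each labeling function.

```python
from typing import Callable, Dict, List

# Label values follow the usual weak-supervision convention.
ABSTAIN, LEGIT, FRAUD = -1, 0, 1

Claim = Dict[str, float]
LabelingFunction = Callable[[Claim], int]


# Hypothetical labeling functions; names and thresholds are illustrative only.
def lf_high_claim_rate(claim: Claim) -> int:
    """Vote FRAUD when more than half of recent orders led to a claim."""
    rate = claim["claims_last_30d"] / max(claim["orders_last_30d"], 1)
    return FRAUD if rate > 0.5 else ABSTAIN


def lf_long_tenure_few_claims(claim: Claim) -> int:
    """Vote LEGIT for long-standing customers who rarely claim."""
    if claim["tenure_days"] > 365 and claim["claims_last_30d"] <= 1:
        return LEGIT
    return ABSTAIN


def lf_agent_flagged(claim: Claim) -> int:
    """Vote FRAUD when a customer service agent flagged the interaction."""
    return FRAUD if claim["agent_flag"] == 1 else ABSTAIN


def weak_label(claim: Claim, lfs: List[LabelingFunction]) -> int:
    """Combine votes by simple majority; ties and all-abstain give ABSTAIN."""
    votes = [lf(claim) for lf in lfs]
    fraud_votes = votes.count(FRAUD)
    legit_votes = votes.count(LEGIT)
    if fraud_votes == legit_votes:
        return ABSTAIN
    return FRAUD if fraud_votes > legit_votes else LEGIT


LFS: List[LabelingFunction] = [
    lf_high_claim_rate,
    lf_long_tenure_few_claims,
    lf_agent_flagged,
]
```

Claims that receive `ABSTAIN` would simply be left out of the weakly labeled training set in this sketch; the remaining labels are still noisy, which is why a separate noise-reduction step follows in the pipeline described above.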
This solution is currently deployed for real-time serving and has yielded a 16 percentage-point improvement in recall at a given precision level. This improvement is measured against a baseline MLP model built on manually labeled data and is highly significant at our scale. Our approach can easily scale to additional fraud scenarios or to use cases where ‘strong’ labels are hard to get but weak labels are prevalent.
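For readers unfamiliar with the metric, recall at a given precision level can be computed as sketched below. This is an assumed, generic formulation rather than the evaluation code behind the reported figure: the function name `recall_at_precision` and its arguments are hypothetical. It scans every score threshold and reports the highest recall among thresholds whose precision meets the required floor.

```python
from typing import Sequence


def recall_at_precision(
    scores: Sequence[float],
    labels: Sequence[int],
    min_precision: float,
) -> float:
    """Best recall over all score thresholds with precision >= min_precision.

    labels are 1 for fraud and 0 for legitimate; higher scores mean
    the model considers the transaction more likely to be fraud.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0

    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best_recall = 0.0
    true_positives = 0
    for i, (score, label) in enumerate(ranked):
        true_positives += label
        # Tied scores fall on the same side of any threshold,
        # so only evaluate after the last item of a tie group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = true_positives / (i + 1)
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall
```

Comparing two models then amounts to evaluating both at the same `min_precision` on the same labeled hold-out set and taking the difference in recall, which is expressed in percentage points.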
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mathew, J., Negi, M., Vijjali, R., Sathyanarayana, J. (2021). DeFraudNet: An End-to-End Weak Supervision Framework to Detect Fraud in Online Food Delivery. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12978. Springer, Cham. https://doi.org/10.1007/978-3-030-86514-6_6
DOI: https://doi.org/10.1007/978-3-030-86514-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86513-9
Online ISBN: 978-3-030-86514-6
eBook Packages: Computer Science, Computer Science (R0)