To address recognition failures in the scene analysis task caused by small object scale, interactions among multiple objects (occlusion), and strong concealment, an object-region-enhanced network based on deep learning is proposed. The network integrates two core modules designed for the task: an object region enhancement strategy and a black-hole-filling strategy. The former maps each object region detected with high semantic confidence directly onto the corresponding local region of that category's channel in the convolutional feature map; the resulting weighted features strengthen contextual relationships and allow difficult object regions to be identified. The latter masks the additional background class so that difficult regions are not mistakenly assigned to it. The results show that the modular design improves the overall parsing performance of the model, since the convolution or detection modules can be replaced, and that both strategies can be applied to other existing scene parsing networks. Together they form a unified framework for scene parsing tasks. Object region enhancement recalls objects that a standard segmentation network fails to recognize, and black-hole filling resolves pixels that are incorrectly assigned to an additional background class that does not exist. The contextual semantic fusion strategies therefore have reference value for computer vision at the theoretical level and, more importantly, for the design and development of robust, practical application systems.
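The two strategies described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the additive weighting rule, the detection tuple layout `(class_id, confidence, y0, x0, y1, x1)`, the function names, and the choice of channel 0 as the additional background class are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def enhance_object_regions(scores, detections, weight=1.0):
    """Boost class score maps inside confidently detected object regions.

    scores:     array of shape (C, H, W), one score map per class.
    detections: iterable of (class_id, confidence, y0, x0, y1, x1) tuples;
                this layout is hypothetical, not taken from the paper.
    weight:     assumed scaling factor for the additive boost.
    """
    out = scores.astype(float).copy()
    for class_id, confidence, y0, x0, y1, x1 in detections:
        # Add the detection confidence only to the matching class channel,
        # and only inside the detected box.
        out[class_id, y0:y1, x0:x1] += weight * confidence
    return out


def fill_black_holes(scores, background_id=0):
    """Label each pixel with its best class, ignoring the extra background.

    Masking the assumed background channel before the argmax forces every
    pixel to take one of the real classes.
    """
    masked = scores.astype(float).copy()
    masked[background_id] = -np.inf
    return masked.argmax(axis=0)


# Toy example: 3 channels (0 = extra background), 2x2 image.
scores = np.zeros((3, 2, 2))
scores[0] = 1.0  # background wins everywhere before any correction
scores[1] = 0.5
scores[2] = 0.4

enhanced = enhance_object_regions(scores, [(2, 0.9, 0, 0, 1, 1)])
labels = fill_black_holes(enhanced)
print(labels)
```

In this toy case, a plain argmax over `scores` would label every pixel as background. After masking, every pixel takes a real class, and the pixel covered by the hypothetical class-2 detection switches to class 2 because its boosted score exceeds that of class 1.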
Keywords: Deep learning; Hierarchical semantics; Image recognition; Image retrieval; Scene analysis; Convolutional neural network; Spatial pyramid pooling
Compliance with ethical standards
Conflict of interest
All authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.