1 Introduction

With the rapid growth of the web, searching for data on it is becoming a difficult task. Searching is the main activity on the World Wide Web, and search engines such as DuckDuckGo, Yahoo, Bing, and Google are commonly used to find relevant results in the massive space of the web. Search engine algorithms are extremely secretive: nobody outside knows precisely what each search engine weighs or what importance it attaches to each factor in its formula, and each search engine employs its own filters to eliminate spam. To get information, a user enters a query into a search engine and expects the best possible results. The search engine returns a list of web pages that contain the anticipated information. When more than one search engine is used to perform the same task, the diversity of results increases, which can raise the satisfaction level of users. Because of this diversity, meta-search engines are becoming more popular. Researchers have reported that merging search results in a meta-search engine substantially improves the search result. Combining the results is therefore an important phase in meta-search engines. When a person enters a query into a meta-search engine, it selects a suitable set of web search engines based on the query, executes the query on each of them, extracts the returned data, and combines the extracted data into a single search result list. Combining the search results into a single list is difficult because of the diversity of the results. The main contribution of this paper is a new approach to optimizing meta-search results using a combination of linear search and semantic search.

Related research work is discussed in Sect. 2. The proposed solution is explained in Sect. 3. The algorithm and architecture design are discussed in Sect. 4. Experimental results are shown in Sect. 5, and conclusions are presented in Sect. 6.

2 Related Work

The study in [1] observes that no earlier research had used mathematical optimization theory for optimizing search results. It uses linear programming as a merging technique to combine the search results obtained from the member search engines of a meta-search engine. In that approach, pages are ranked based on the position of each document in the member search engines, and linear programming is then used to combine the results and find the page with the highest rank. This approach is one of the best existing techniques for optimizing meta-search engine results.

A novel search result integration algorithm for meta-search engines is introduced in [2], and the efficiency of the merging algorithm is evaluated. The algorithm consists of a position merge algorithm and a title-and-snippet merging process. The idea of the position merging technique is to take advantage of the actual position of a document in the individual search engines. If the same search query is passed to several search engines, there is a chance that a page will occur in the result lists of different search engines, and its position in those lists need not be the same. The title-and-snippet technique calculates the similarity between the query and the documents. The paper concludes that the improved merging algorithm can increase the quality of searching.

The paper [3] describes a meta-search engine ranking method based on a user vote system. A user model is constructed from personal behavior and user preferences, which lets users vote on the search results obtained from the meta-search engine, and those votes repeatedly change the result list. Based on the user model, the ranking of the meta-search result list is calculated. The model is updated dynamically and automatically.

The study in [4] notes that no single search engine can index the whole Internet and proposes a new method for vertical search engines that uses the PageRank algorithm and latent semantic indexing. The system selects web pages that have the same rank and are semantically related to a relevant query for the crawling process. Sajeev and Ramya [5] propose a system that accepts time information, logical information, or semantic information, combines it with the navigational mode, and sorts users according to their interest and nature. The work in [6] describes a dictionary-based synonym technique that helps to find semantically similar words. The dictionary consists of the most commonly used English words; when a word appears, the learning model generates synonyms for it.

In [7], a comparative study of various search engine techniques is carried out using factors such as n-gram indices, keyword, form, and flash optimization. The study concludes that crawling and fetching are the best methods for fast and efficient data retrieval. The paper [8] also describes different search engine optimization techniques. Rather than giving more importance to search engine ranking, it focuses on the website owner, for whom the result is better user interaction with the website they own.

3 Proposed Solution

In this section, we give a brief explanation of linear search and semantic search. The linear search method is used to re-rank and optimize the meta-search engine result. In the proposed method, the user first enters the search query. The meta-search engine then replicates the query and passes it to several existing search engines, each of which returns its own results. Because a document or page can have a different rank in each search engine, the linear method is used to merge the ranks from the individual search engines and re-rank the documents.

For example, suppose a meta-search engine uses four separate search engines, and the first page of each search engine contains 10 documents, D1, D2, D3, D4, D5, ..., D10, some of which may be common to several engines. Each document is assigned an integer value according to its position in a search engine's result. If D1 (Document 1) is in the first position, it is assigned a rank of 10; if it is in the second position, it is assigned a rank of 9; and if it is in the last position, it is assigned a rank of 1. The same procedure is repeated for all documents. A document can appear at a different position in the results of different search engines. The rank of D1 in the meta-search engine is the sum of its position-based ranks on all four search engines, for example \(10+1+9+5=25\). Similarly, if D2 is in the first position on the first search engine, the second position on the second, the third position on the third, and is not present on the fourth, its rank is \(10+9+8+0=27\).
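The arithmetic of this example can be reproduced with a minimal Python sketch. The function names `position_score` and `linear_rank` and the parameter `page_size` are illustrative assumptions, not names from the paper; the scoring rule (page size minus position plus one, and zero for a missing document) is read off the worked example above.

```python
def position_score(position, page_size=10):
    """Score for one engine: position 1 scores page_size, the last
    position scores 1, and a missing document (None) scores 0."""
    if position is None:
        return 0
    return page_size - position + 1


def linear_rank(positions, page_size=10):
    """Sum the position scores of one document over all search engines."""
    return sum(position_score(p, page_size) for p in positions)


# D1 scores 10, 1, 9 and 5, i.e. it sits at positions 1, 10, 2 and 6.
d1_rank = linear_rank([1, 10, 2, 6])
# D2 sits at positions 1, 2 and 3 and is absent from the fourth engine.
d2_rank = linear_rank([1, 2, 3, None])
print(d1_rank, d2_rank)  # 25 27
```

With this rule, a document that ranks moderately well on many engines can outscore one that ranks first on a single engine only.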

Once the result is obtained from the linear search, the rank of each search result is recalculated by semantic search. The semantic search uses WordNet, a widely used lexical database that groups words into sets of synonyms, from which the semantic relation between words can be obtained. In this method, a semantic search is performed on each document obtained from the linear search. If a semantically similar word is present in the document, the rank of the document is updated.
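One possible reading of this step is sketched below in Python. The paper does not state how much the rank changes per matching word, so counting one point per distinct synonym found is an assumption. The small `SYNONYMS` table is a hypothetical stand-in for a WordNet lookup, and `semantic_score` is an illustrative name.

```python
import re

# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {
    "engineering": {"engineering", "technology", "applied science"},
}


def semantic_score(keyword, text, synonyms=SYNONYMS):
    """Count how many distinct synonyms of `keyword` occur in `text`.

    Assumption: each distinct synonym found adds one point; the paper
    does not specify the exact increment.
    """
    terms = synonyms.get(keyword.lower(), {keyword.lower()})
    lowered = text.lower()
    return sum(
        1
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


score = semantic_score("engineering", "Applied science and technology news")
print(score)  # 2
```

Matching on word boundaries avoids counting a synonym that only appears inside a longer, unrelated word.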

The linear search rank of each document is calculated by Eq. (1).

$$\begin{aligned} LP_{W_i} = \sum _{i=1}^{n} \text {Pos} * \text {Rank}(P_i) \end{aligned}$$
(1)

where \(LP_{W_i}\) denotes the linear rank of each document, Pos denotes the position of the document in a search result, and n denotes the number of search engines. The final rank of each document is calculated by Eq. (2).

$$\begin{aligned} P_{W_i} = LP_{W_i} + SP_{W_i} \end{aligned}$$
(2)

where \(SP_{W_i}\) denotes the semantic rank of each document and \( P_{W_i}\) denotes the combined rank, which is used as the final rank of each document.

4 Architecture of Proposed System

4.1 Architecture of Proposed System

In this section, we present a meta-search engine architecture based on the new result ranking approach, shown in Fig. 1. The main objective of our system is to help users receive more relevant search results from the Web. This is accomplished by querying multiple search engines at the same time, retrieving the results, combining those results using the proposed ranking method, and presenting them to the user in a single ranked list.

Fig. 1 Architecture of proposed system

The proposed system has six main modules: Interface Module, Query Processing Module, Search Result Retrieving Module, Position Ranking Module, Semantic Search Module, and Re-ranking Module. To better understand the proposed system, we describe the functioning of each module.

The interface module lets users type their queries in a simple and natural way, using their own keywords or search terms, without any search-engine-specific representation or restriction. It receives the query as input and sends it to the query processing module. The query processing module receives the query from the user, converts it into a query specific to each search engine, and encapsulates it in that search engine's URL. After each URL has been executed one by one, the search result retrieving module receives the search results from the different search engines as input, extracts the specified number of top URLs from each search engine, and sends each URL set to the next module. The position ranking module receives the URL sets as input, performs the proposed linear search algorithm, and calculates the rank of each individual URL. The semantic search module receives the documents as input, finds the synonyms of the keyword entered by the user, searches for those synonyms in each document, and calculates the rank of each document. Finally, the re-ranking module calculates a new rank for each document based on the results obtained from the linear search and the semantic search.
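As an illustration of the query processing module, the following Python sketch encapsulates a user query in one URL per search engine. The URL templates, the query parameter names, and the function name `build_query_urls` are assumptions made for this example; the paper does not list the URLs its prototype uses.

```python
from urllib.parse import urlencode

# Assumed URL templates: (base URL, name of the query parameter).
SEARCH_ENGINES = {
    "Google": ("https://www.google.com/search", "q"),
    "Bing": ("https://www.bing.com/search", "q"),
    "DuckDuckGo": ("https://duckduckgo.com/", "q"),
    "Yahoo": ("https://search.yahoo.com/search", "p"),
}


def build_query_urls(query, engines=SEARCH_ENGINES):
    """Return one engine-specific search URL per search engine."""
    return {
        name: f"{base}?{urlencode({param: query})}"
        for name, (base, param) in engines.items()
    }


urls = build_query_urls("meta search")
print(urls["Yahoo"])  # https://search.yahoo.com/search?p=meta+search
```

`urlencode` takes care of escaping spaces and special characters, so the interface module can pass the user's text through unchanged.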

4.2 Meta-Search Engine Optimization Algorithm

Figure a Meta-search engine optimization algorithm

Assume \(n\) search engines denoted by \(SE_1,SE_2,\ldots ,SE_n\), and let \(D_1,D_2,\ldots , D_n\) denote the search results, where \(D_i\) is the result from the ith search engine. A search query \(q\) containing some keywords is sent to the search engines, and each search engine returns its first \(m\) results. The set of \(m\) URLs from search engine \(SE_i\) is stored in \(D_i\). \(P_i\) denotes the ith web page in a search engine result, and \(P_{W_i}\) denotes the position rank of that web page. \(list\) contains the set of synonyms for the keywords contained in the search query.

The proposed algorithm combines the results of the meta-search engine by finding the best list of m documents among the search results obtained from n search engines for a given search query q.
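The overall procedure can be sketched in Python as follows. This is a reconstruction from the prose description and Eq. (2), not the authors' listing: the function name `merge_results`, the `page_texts` mapping (page text assumed to be fetched already), the one-point-per-synonym increment, and the short page identifiers are all assumptions.

```python
def merge_results(result_lists, page_texts, synonyms, m=10):
    """Return the top-m pages ranked by position score plus semantic score.

    result_lists: one list of URLs per search engine (D_1 .. D_n)
    page_texts:   mapping URL -> page text (assumed already fetched)
    synonyms:     synonym list for the query keyword
    """
    combined = {}
    # Linear part: position 1 scores m, position m scores 1, absent scores 0.
    for results in result_lists:
        for position, url in enumerate(results[:m], start=1):
            combined[url] = combined.get(url, 0) + (m - position + 1)
    # Semantic part (assumed rule): one point per synonym found in the text,
    # using a simple case-insensitive substring match.
    for url in combined:
        text = page_texts.get(url, "").lower()
        combined[url] += sum(1 for term in synonyms if term.lower() in text)
    # Highest combined rank first; ties are broken by URL for determinism.
    ordered = sorted(combined, key=lambda u: (-combined[u], u))
    return ordered[:m]


# Hypothetical page identifiers and texts, three engines, m = 3.
lists = [["a", "b", "c"], ["b", "a", "d"], ["c", "b", "e"]]
texts = {"c": "technology and applied science", "d": "technology"}
top = merge_results(lists, texts, ["engineering", "technology", "applied science"], m=3)
print(top)  # ['b', 'c', 'a']
```

In this toy input, page `c` has a lower position score than page `a` (4 versus 5) but overtakes it after the semantic step adds two points.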

5 Experimental Result

To test the efficiency of our new meta-search result ranking algorithm, we implemented a prototype of the proposed system. Our meta-search engine supports querying several search engines; as a first step, we fixed the number of search engines at 4 and selected four of the most popular and widely used ones: Google, DuckDuckGo, Bing, and Yahoo. Instead of testing the system on a document collection, we decided to test it directly on the World Wide Web. The graph in Fig. 2 is built from the first 10 results of the 4 search engines for the keyword "engineering". The numbers represent the presence of a particular page: if the number is 3, the page is present in the results of three search engines, and if the number is 1, the page is present in only one. Different search engines may return the same page at different positions. The graph illustrates the diversity of the results: although we extracted only the top 10 pages from each search engine, we obtained 27 distinct pages in total.
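The presence numbers described here could be computed along the following lines. This is a minimal Python sketch; the function name `engine_presence` and the sample lists are illustrative and do not reproduce the experimental data.

```python
from collections import Counter


def engine_presence(result_lists, top_k=10):
    """Map each page to the number of engines whose top_k results contain it."""
    presence = Counter()
    for results in result_lists:
        # A set ensures a page is counted at most once per engine.
        presence.update(set(results[:top_k]))
    return presence


# Hypothetical result lists from three engines.
sample = [["a", "b"], ["b", "c"], ["b", "a"]]
presence = engine_presence(sample)
print(presence["b"], presence["a"], presence["c"], len(presence))  # 3 2 1 3
```

The number of keys in the counter is the number of distinct pages; with 4 engines and 10 results each it lies between 10 and 40, and fewer shared pages means a larger count.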

The proposed system shown in Fig. 1 can be used to optimize the results of the meta-search engine. For this purpose, a table consisting of four search engines and five documents from each search engine is given below (Fig. 3; Table 1).

Fig. 2 Meta-search result for the keyword engineering

Fig. 3 Search engines participation in the top 10 of the search result

Performing the linear search on the meta-search result selects the best documents among all search results. From those best documents, the semantic search selects the documents that contain more relevant information. Table 2 shows that the final optimized result is almost the same as the Google search result, and that the final result also contains documents from the other search engines, DuckDuckGo and Yahoo.

Table 3 shows the similarity score of each search engine with the meta-search engine result. The similarity between the search engines' results increases when the search query is more popular; for example, the table lists some sample keywords and how well each search engine's result matches the final optimized result. For keywords in information technology, the results of all four search engines are almost the same.

Table 1 Search result for the keyword engineering
Table 2 Document position for keyword engineering of proposed approach
Table 3 Result obtained for different keywords
Fig. 4 Comparison of the ranked list results

As we can see in Fig. 4, Google has the highest participation in the search engine result, because the final meta-search engine result contains 30% of the Google search result. DuckDuckGo and Yahoo have almost equal proportions of participation. Bing has the lowest participation of all, because the Bing search result does not match the result of any other search engine. Figure 4 also shows the ranking of each search engine: the Google web page ranking order is 60% similar to the final meta-search engine result, followed by DuckDuckGo and Yahoo. From this experiment, we can rank these search engines: first Google, second DuckDuckGo, and third Yahoo.

The experimental results shown in this paper are averages over different search queries. It is highly likely that the most significant document with respect to the user query is located at the top of the meta-search engine result.

6 Conclusion

In this paper, we presented a novel meta-search engine result optimization algorithm that combines two techniques, linear search and semantic search. According to the experimental results, the proposed system produces a well-optimized search result list. In future work, we plan to integrate multi-threading into our system to speed up processing and save time. Multi-threading can move time-consuming processing into the background, so the semantic search can be performed on multiple web pages at the same time, which will help the system run faster and deliver the best search results.