1 Introduction

With the rapid development of science and technology, the explosive growth of information has become a defining feature of today’s society. At the same time, modern tennis training technology has been widely disseminated, and computer technology, modern communication technology, and artificial intelligence have been widely used in tennis training [1]. This has greatly promoted the theory and practice of tennis training and its optimization. Tennis training should take full advantage of modern training techniques, according to the characteristics of the sport, to improve both the quality of training and the ability of athletes. The serve is one of the key techniques in tennis and also one of the most difficult to master. Traditional tennis training generally follows a fixed pattern: the coach explains and demonstrates, the athletes practice on court, and the coach gives individual guidance and corrects common mistakes before the athletes practice again. Because athletes have no correct standard for comparison, it is difficult for them to understand their own mistakes fundamentally [2]. It is also difficult for them to form a mental representation of the movement, which hinders their mastery of the serve.

Video image analysis technology is an intelligent system that uses image data synthesis to analyze tennis technique and tactics in real time and to output the results simultaneously. The system can analyze real-time images of match or training scenes, and its functions include compiling technical and tactical statistics, linking motion data with images, and playing back the motion process. It combines computer image analysis with competitive sports. Video image analysis, one of the notable achievements in tennis training of this century, has become a main component of tennis training science and of modern tennis training techniques [3]. It is of great significance for breaking through the traditional concept of tennis training. In forming motor skills, athletes need a great deal of observation, imitation, feedback, and correction. This requires sensory information other than proprioception, especially audiovisual information. These processes are difficult to implement in traditional training, whereas video image analysis can present practical content broadly, completely, vividly, and in detail, and can highlight the key points and difficulties of the training material. These details, presented clearly to athletes through text, sound, images, and animation, provide effective help to coaches, and the lively presentation is more attractive to athletes.

2 Related work

Science and technology play an indispensable role in today’s society, and their importance is even more prominent in sports. In day-to-day tennis teaching and training, coaching that relies only on visual observation has long had a limited effect on athletes’ competitive level, while the leading role of sports science and technology in training has become increasingly evident. Therefore, how to introduce sports science and technology into training, and how to keep improving the effectiveness and scientific rigor of training, has become an important subject. On this basis, this paper proposes applying video technology to the analysis of tennis serve speed and success rate. The human eye is the external part of the visual system and an important organ for obtaining information from the outside world. Through the visual system, people obtain direct external information that leaves the most intuitive impression. This usually relies on static images or dynamic video [4].

For a long time, the path toward scientific training in China’s competitive sports has been difficult, and there is a clear gap compared with countries with more developed science and technology. After long-term research and practice, experts and coaches at home and abroad have concluded that recording an athlete’s movements with a camera, importing the videos into computer software, and processing them makes the whole action available in greater detail. The images or data produced by video processing are the most faithful record of the athlete during training [5]. Free of other interfering factors, they allow the athlete’s technical movements to be assessed directly and the technical indices to be evaluated, so that the athlete’s physical characteristics and strengths can be identified, deficiencies in existing training can be found and addressed, sports technique can be improved, and athletes’ interest in learning can be stimulated. At the same time, technical video in sports not only supports the analysis of athletes’ skills but also provides a basis for resolving disputed calls in competitions [6].

Video analysis systems have already been introduced in tennis matches, notably Hawkeye technology and speedometers. Hawkeye records and plays back disputed balls between player and referee and provides evidence for the ruling. The speedometer calculates ball speed from the distance the ball travels per unit of time. Both rely on video image analysis [7]. Video analysis is also a highly advantageous teaching tool in tennis. It can be applied to skill acquisition, the consolidation of technical movements, visualization, injury prevention, and the training of coaches. When athletes see their own technical movements or performances on video, the result is often very different from what they imagined: there is a gap between the ideal and the reality, and video is an excellent aid for correcting this misperception of one’s own motion [8]. For athletes, it is very important to understand what a correct and reasonable technical movement is, and video analysis can help players implement new technical movements. This study applies a sports training video analysis system to the teaching of tennis special classes at the China University of Mining and Technology. It uses the system’s convenient and timely feedback, contrastive video playback, overlay of videos with similar backgrounds, frame-by-frame motion decomposition, and other functions to analyze and count the students’ technical movements in each phase of the stroke: preparation, shoulder turn, knee bend, racket acceleration, ball contact, follow-through, and finish. The aim is to grasp the technical movements of the special class as a whole, to carry out intensive training to correct mistakes, and to provide a reference for technical movements [9].

Since the tennis ball is truly at rest only at the serve, serving is the only action that the player can fully control. At the same time, the serve is considered the most difficult stroke, because it requires a series of very complex actions performed at the right time. The rotation of the upper arm before racket-ball impact contributes up to 54% of the linear swing speed at impact. Athletes’ movement data from training and competition help in understanding their physical condition and competitive skills, so these data are very important for coaches and sports practitioners. Video has been used to study the high-speed serve. In the study, two synchronized high-speed cameras collected data at 200 Hz. During the entire serve, the two cameras filmed the athlete from the front and from the side, respectively, and the field of view covered the athlete’s whole range of movement. In the same study, the inertial gyroscope method and the marker-based virtual gyroscope (MBVG) method were used to monitor upper arm rotation during the serve, and the results were compared with those obtained from video. The MBVG method can also be used to confirm the peaks in the gyroscope data, for which only the rough pattern of change is needed rather than precise values.

Girard et al. studied the effect of knee motion on the tennis serve by limiting knee extension during the serve. Thirty athletes were divided into three groups: beginner, intermediate, and elite. A splint was placed on the thigh to limit knee extension to an angle of 10° (0° indicates full knee extension), and each athlete performed 30 flat serves, 15 under normal conditions (without knee restraint) and 15 with the knee restricted. The results show that knee motion is an important factor in serve efficiency, regardless of the athlete’s level.

3 Methods

3.1 Background difference method

In this study, young elite tennis players (second-level athletes and above) were selected, and serve data from actual training and competition were used as the research object. In addition, 20 male students from the tennis special class of a sports college were selected and randomly divided into an experimental group and a control group.

As can be seen from Table 1, the average height of students in the experimental group was 177.2 cm, the average age was 21.4 years, the average arm length was 73.16 cm, and the average weight was 65.63 kg. The average age of the control group was 21.9 years, the average height was 176.8 cm, the average body weight was 64.7 kg, and the average arm length was 72.6 cm. The average age of excellent tennis players is 15.5 years old, the average height is 175.1 cm, the average weight is 64.4 kg, and the average arm length is 73.2 cm. There was no significant difference in the average height, average arm length, and average weight of experimental group students, control group students, and high-level athletes [10, 11].

Table 1 Comparison of the average physical condition of experimental group students, control group students, and high-level athletes

The background difference method is a relatively common method for detecting moving objects. For example, most video surveillance and intelligent traffic systems are based on background difference techniques. The current frame is compared with the background model: the part where the two are similar is taken as the background, and the part with the larger difference is defined as the foreground [2, 12]. The basic principle is as follows: extract the static background from the video sequence and then use the difference between the current frame and the background to obtain the moving foreground. In a static scene, the background model may be captured in advance, without foreground moving objects or noise [13, 14]. The current frame is differenced against the background reference model, the moving target area is determined from the statistical change in the histogram or from the change in grayscale features, and finally the difference image is thresholded. In this way, the size, position, shape, and other relevant information of the moving foreground target can be obtained. Figure 1 shows a schematic of the background difference method.

Fig. 1 Background difference method schematic

The background difference method can effectively extract the target of the motion. Given the background reference model, the background difference method is an efficient detection method for moving targets. The most common way to initialize the background model is to extract a certain frame image directly from the image sequence or to calculate the average value of the multi-frame image.

$$ B\left(x,y,t\right)=\frac{1}{N}\sum \limits_{k=t-N}^{t-1}I\left(x,y,k\right) $$
(1)

where B(x,y,t) denotes the background pixel value at position (x,y) at time t, I(x,y,k) denotes the pixel value at (x,y) in the kth frame, and the background is computed from the previous N frames. This method is simple and can extract complete image features, but it is susceptible to occlusion, shadows, and changes in light. A Markov random field can mitigate occlusion, but for pronounced occlusion the effect is still relatively poor. To reduce the impact of dynamic scene changes on the extraction of moving objects, the simple and effective time-averaged image can be used to create the background model [15, 16]. For example, Haritaoglu models every pixel of the scene statistically, using its minimum and maximum gray values and its largest inter-frame difference over the image sequence, to create a background model that can adapt to changes in climate and light [17, 18].
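The averaging in Eq. (1) can be illustrated with a short NumPy sketch. The function name `mean_background` and the toy 2 × 2 frames are assumptions made for illustration only; they are not part of the system described in this paper.

```python
import numpy as np

def mean_background(frames, t, n):
    """Eq. (1): average the n frames that precede time t (k = t-n .. t-1)."""
    if n < 1 or t - n < 0 or t > len(frames):
        raise ValueError("need n >= 1 frames available before index t")
    window = np.asarray(frames[t - n:t], dtype=np.float64)
    return window.mean(axis=0)

# Hypothetical toy sequence: a constant background of 10 with a bright
# "object" that covers pixel (0, 0) in frame 2 only.
frames = np.full((5, 2, 2), 10.0)
frames[2, 0, 0] = 110.0

background = mean_background(frames, t=5, n=5)
print(background)
```

In this toy sequence the object leaves a residue in the averaged background at pixel (0, 0), which shows why a plain time average is sensitive to occlusion when N is small.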

The background difference algorithm process is to store the background image first, and then use the difference image between the current image and the background image to perform motion detection on the target. fk(x, y) is the current frame, Bk(x, y) is the background frame of the current frame, and Dk(x, y) is the result of the difference between the current frame and the background frame. The principle is as shown in the following formula.

$$ {D}_k\left(x,y\right)=\left|{f}_k\left(x,y\right)-{B}_k\Big(x,y\Big)\right| $$
(2)
$$ {T}_k\left(x,y\right)=\left\{\begin{array}{l}1,{D}_k\left(x,y\right)\ge T\\ {}0,{D}_k\left(x,y\right)<T\end{array}\right. $$
(3)

When Tk(x, y) is 0, the point belongs to the background of the image; when Tk(x, y) is 1, it belongs to the moving target area. T is the threshold value. When the value of a point in the difference image is greater than or equal to T, the point is regarded as a point on the moving target; otherwise, it is regarded as a background point.
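Equations (2) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration under assumed values: the function name `background_difference`, the 3 × 3 frames, and the threshold of 30 are hypothetical and chosen only to make the behavior visible.

```python
import numpy as np

def background_difference(frame, background, threshold):
    """Return the difference image D_k (Eq. 2) and the binary mask T_k (Eq. 3)."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    mask = (diff >= threshold).astype(np.uint8)
    return diff, mask

# Hypothetical 3x3 grayscale example.
background = np.full((3, 3), 50, dtype=np.uint8)
frame = background.copy()
frame[1, 1] = 200  # a pixel covered by the moving target
frame[0, 0] = 55   # a small fluctuation caused by noise

diff, mask = background_difference(frame, background, threshold=30)
print(mask)
```

Only the target pixel exceeds the threshold; the small fluctuation at (0, 0) is classified as background, which is the role of T in Eq. (3).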

Since the target detection needs to perform differential processing on the image to be detected in each frame of the video and the background image model, the modeling method of the background model is very important for the accuracy of this method, and the accuracy of the model directly affects the target motion detection result [19]. The background difference method usually requires that the background model does not produce drastic changes. However, the background model is not absolutely immutable. Sometimes it is necessary to update the background model in time to ensure the correctness of the moving target detection. For example, video noise, changes in illumination, and the transition between objects’ motions will require the background model to be updated in time [20].

3.2 Optical flow method

Gibson first proposed the concept of optical flow in 1950. Optical flow refers to the apparent motion of the image brightness pattern in an image sequence. A moving object will leave behind a series of constantly changing images on the retina of the naked eye, so objects in motion will be discovered by human eyes. The basic idea of the optical flow detection method is that according to the image information of the current frame and its subsequent frames, each pixel in the image is assigned a velocity vector to establish a two-dimensional motion field. When there are moving objects in the image, the optical flow vectors generated by the pixel points of the target area and the neighboring background in the image must be different. The target detection can be achieved by determining the optical flow vector of the pixel in the image.

Since a two-dimensional image is a projection of a three-dimensional object motion on a camera, we can use a two-dimensional image sequence to record three-dimensional motion information of an object in a real space. When there is a relative movement between the camera and the object, a corresponding change will occur. From the change between the images, the mutual motion between the object and the camera can be known.

In Fig. 2, Z is the object distance between the center of the camera lens and the moving object, f is the focal length, ri is the distance between the image point and the center of the lens, and ro is the distance between the object point and the center of the lens. The basic equation of the optical flow method rests on the assumption of constant image brightness, that is, the gray value of a given object point remains unchanged between two adjacent frames of the video sequence [12]. In 1981, Horn and Schunck derived the basic equation of optical flow from this assumption. Let the gray value of the pixel (x, y) at time t be I(x, y, t). At time t + δt, the pixel has moved to (x + δx, y + δy) and its gray value is I(x + δx, y + δy, t + δt). According to the assumption above:

$$ I\left(x+\delta x,y+\delta y,t+\delta t\right)=I\left(x,y,t\right) $$
(4)

Fig. 2 Stable performance of the improved algorithm before and after the signal

Expanding the left-hand side of Eq. (4) in a Taylor series, neglecting higher-order terms, dividing by δt, and taking the limit δt → 0 gives:

$$ \frac{\partial I}{\partial x}\frac{dx}{dt}+\frac{\partial I}{\partial y}\frac{dy}{dt}+\frac{\partial I}{\partial t}=0 $$
(5)

Let \( u=\frac{dx}{dt},v=\frac{dy}{dt},{I}_x=\frac{\partial I}{\partial x},{I}_y=\frac{\partial I}{\partial y},{I}_t=\frac{\partial I}{\partial t} \); then Eq. (5) becomes:

$$ {I}_xu+{I}_yv+{I}_t=0 $$
(6)

Equation (6) is called the basic equation of the optical flow method. Here Ix, Iy, and It can be computed directly from the images. Because Eq. (6) is a single equation in the two unknowns u and v, it cannot be solved on its own (the aperture problem); optical flow algorithms such as the Horn-Schunck algorithm and the Lucas-Kanade algorithm are obtained by adding further constraints.
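As an illustration of how an added constraint makes Eq. (6) solvable, the sketch below follows the Lucas-Kanade idea of assuming that (u, v) is constant over a window and solving the stacked equations by least squares. It is a simplified single-window version, not the implementation used in this paper; the function name `lucas_kanade_window`, the synthetic pattern, and the displacement values are assumptions for illustration.

```python
import numpy as np

def lucas_kanade_window(frame1, frame2):
    """Estimate one flow vector (u, v) for a whole window from Eq. (6)."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # np.gradient returns derivatives along axis 0 (rows, y) and axis 1 (columns, x).
    Iy, Ix = np.gradient(f1)
    It = f2 - f1  # temporal derivative with a frame interval of 1
    # One row of Eq. (6) per pixel: Ix * u + Iy * v = -It
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic smooth pattern shifted by a known sub-pixel displacement.
ys, xs = np.mgrid[0:40, 0:40].astype(np.float64)

def pattern(x, y):
    return np.sin(x / 5.0) + np.cos(y / 7.0)

true_u, true_v = 0.3, -0.2
frame1 = pattern(xs, ys)
frame2 = pattern(xs - true_u, ys - true_v)

u, v = lucas_kanade_window(frame1, frame2)
print(u, v)  # close to (0.3, -0.2), up to discretization error
```

The estimate is only approximate because Eq. (6) keeps the first-order Taylor terms, so the sketch applies to small displacements between frames.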

3.3 Inter-frame difference method

Like the background difference method, the inter-frame difference method is one of the most commonly used algorithms for moving object detection. Its principle is as follows: assuming the gray level of the static parts of the image sequence changes only slightly, a difference operation is performed on corresponding pixels of two or three consecutive frames. If the value at a point of the difference image is above the threshold, the point is considered to belong to a region changed by the motion of the target; if it is below the threshold, the point is considered background. The motion areas marked in this way are used to locate the target in the video. Used directly or indirectly, the inter-frame difference method removes the information that does not change between frames and thereby isolates the changing target [13].

The two-frame difference method performs differential operations on two successive frames of an image sequence. Then, a binarization threshold decision is made on the difference result image, the static background is eliminated, and the moving target region is selected, thereby marking the moving target. The principle of this method is shown in Fig. 3.

Fig. 3 Schematic diagram of the difference between frames

To detect a moving target effectively, the two-frame difference method requires the following conditions: the target is moving, the background is static with little change in gray value, interference noise is small, and the gray value of the target changes relatively strongly. Noise and changes in background brightness degrade the result to varying degrees. The algorithm proceeds as follows:

Let fk − 1(x, y) and fk(x, y) be the gray values at point (x, y) in frame k − 1 and frame k, respectively. The two frames are differenced using Eq. (7), where Dk(x, y) is the resulting difference image.

$$ {D}_k\left(x,y\right)=\left|{f}_k\left(x,y\right)-{f}_{k-1}\Big(x,y\Big)\right| $$
(7)

Using Eq. (8), Dk(x, y) is thresholded to separate the background from the moving target.

$$ {T}_k\left(x,y\right)=\left\{\begin{array}{l}1,{D}_k\left(x,y\right)\ge T\\ {}0,{D}_k\left(x,y\right)<T\end{array}\right. $$
(8)

where T is the threshold. A point at which Tk(x, y) is 1 belongs to the motion area of the target. The accuracy with which the target is located therefore depends on the choice of threshold.
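Equations (7) and (8) can be sketched as follows. The function name `two_frame_difference`, the one-row frames, and the threshold are hypothetical values chosen so that the output can be checked by hand.

```python
import numpy as np

def two_frame_difference(prev_frame, curr_frame, threshold):
    """Binary motion mask T_k from frames k-1 and k (Eqs. 7 and 8)."""
    diff = np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))
    return (diff >= threshold).astype(np.uint8)

# Hypothetical one-row frames: a bright object three pixels wide moves
# one pixel to the right between frame k-1 and frame k.
prev_frame = np.zeros((1, 8), dtype=np.uint8)
curr_frame = np.zeros((1, 8), dtype=np.uint8)
prev_frame[0, 2:5] = 200
curr_frame[0, 3:6] = 200

mask = two_frame_difference(prev_frame, curr_frame, threshold=50)
print(mask.tolist())  # [[0, 0, 1, 0, 0, 1, 0, 0]]
```

Only the trailing and leading edges of the object are flagged. The interior pixels, which are bright in both frames, produce no difference, so a uniformly colored target is detected only at its boundaries.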

Like the background difference method, the two-frame difference method is simple to implement, has low complexity, and is robust. With a dynamic background, it is more adaptive than other algorithms. Unlike the background difference method, it does not build a background model, which saves a great deal of computation and avoids the errors introduced by the model. However, the algorithm also has disadvantages. First, when the target moves too fast, it is easily missed. Second, usually only part of the moving target is detected, so the interior of the target appears hollow, which impairs the connectivity of the moving target in the image. Finally, a slight gray-level change in the background may be misjudged as a change in the target, producing noise points in the detection result.

In summary, the two-frame difference method easily misses targets that move too fast, leaves holes inside the target that impair connectivity, and is disturbed by noise from small motions in the background; the background difference method has difficulty detecting targets in dynamic scenes; and the optical flow method is complex and computationally expensive, so it is difficult to implement. The three-frame difference method is an improvement on the inter-frame difference method. Given three consecutive frames, the classic inter-frame difference method is first applied to the first two frames and to the last two frames, giving two result images. The part common to both results is then taken as the target area of the second frame. The principle is shown in Fig. 4.

Fig. 4 Three-frame difference method schematic

Let the gray values at point (x, y) in three consecutive frames k − 1, k, and k + 1 be fk − 1(x, y), fk(x, y), and fk + 1(x, y). Following the principle shown in Fig. 4, the differences are calculated as follows:

$$ {D}_{k-1,k}\left(x,y\right)=\left|{f}_k\left(x,y\right)-{f}_{k-1}\left(x,y\right)\right| $$
(9)
$$ {D}_{k,k+1}\left(x,y\right)=\left|{f}_{k+1}\left(x,y\right)-{f}_k\left(x,y\right)\right| $$
(10)

Fig. 5 Athlete joint point capture

An appropriate threshold T is selected to threshold the result image to obtain a binarized image.

$$ {T}_{k-1,k}\left(x,y\right)=\left\{\begin{array}{l}1,{D}_{k-1,k}\left(x,y\right)\ge T\\ {}0,{D}_{k-1,k}\left(x,y\right)<T\end{array}\right. $$
(11)
$$ {T}_{k,k+1}\left(x,y\right)=\left\{\begin{array}{l}1,{D}_{k,k+1}\left(x,y\right)\ge T\\ {}0,{D}_{k,k+1}\left(x,y\right)<T\end{array}\right. $$
(12)

The above-processed images Tk − 1, k(x, y) and Tk, k + 1(x, y) are logically intersected with each other to obtain the final target result set Tk(x, y).

$$ {T}_k\left(x,y\right)=\left\{\begin{array}{l}1,{T}_{k-1,k}\left(x,y\right)\cap {T}_{k,k+1}\left(x,y\right)=1\\ {}0,{T}_{k-1,k}\left(x,y\right)\cap {T}_{k,k+1}\left(x,y\right)\ne 1\end{array}\right. $$
(13)

In the same way, the point at which Tk(x, y) is 1 represents the target area of motion.
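The three-frame procedure of Eqs. (9) to (13) can be sketched in NumPy as follows. The function name `three_frame_difference` and the one-row example are assumptions for illustration.

```python
import numpy as np

def three_frame_difference(f_prev, f_curr, f_next, threshold):
    """Binary mask T_k for frame k from frames k-1, k, and k+1 (Eqs. 9 to 13)."""
    f_prev = f_prev.astype(np.float64)
    f_curr = f_curr.astype(np.float64)
    f_next = f_next.astype(np.float64)
    d_prev = np.abs(f_curr - f_prev)   # Eq. (9)
    d_next = np.abs(f_next - f_curr)   # Eq. (10)
    t_prev = d_prev >= threshold       # Eq. (11)
    t_next = d_next >= threshold       # Eq. (12)
    return (t_prev & t_next).astype(np.uint8)  # Eq. (13): logical intersection

# Hypothetical one-row frames: a single bright pixel moves one column per frame.
frames = np.zeros((3, 1, 5), dtype=np.uint8)
frames[0, 0, 1] = 200  # frame k-1
frames[1, 0, 2] = 200  # frame k
frames[2, 0, 3] = 200  # frame k+1

mask = three_frame_difference(frames[0], frames[1], frames[2], threshold=50)
print(mask.tolist())  # [[0, 0, 1, 0, 0]]
```

Each pairwise difference flags both the old and the new position of the object; the intersection keeps only the position occupied in frame k.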

3.4 Mixed Gaussian background modeling

Mixed Gaussian background modeling is a background description method based on the statistics of pixel samples. It represents the background by statistics of the pixel’s probability density collected over a long period, such as the number of modes and the mean and standard deviation of each mode. Target pixels are then identified by a statistical test (such as the 3σ principle, where σ is the standard deviation). Complex dynamic backgrounds can also be modeled, but the amount of computation may be large.

In the mixed Gaussian background model, the color information of different pixels is assumed to be uncorrelated, and each pixel is processed independently. In a video, the values taken by a pixel over the image sequence can be regarded as a random process that continuously generates pixel values, so the color of each pixel can be described by Gaussian distributions. In the multimodal model, each pixel is represented by several Gaussian distributions superimposed with different weights. Each Gaussian distribution corresponds to one color state that the pixel may present. Over time, the weights and parameters of each Gaussian distribution are continuously updated. When processing a color image, the R, G, and B channels of a pixel are assumed to be independent of each other and to have the same variance. To describe the state of a pixel at a given moment, K Gaussian models are created for that pixel. The probability function of the Gaussian mixture is as follows:

$$ p\left({X}_t\right)=\sum \limits_{i=1}^K{\omega}_{i,t}\times \eta \left({X}_t,{\mu}_{i,t},{\Sigma}_{i,t}\right) $$
(14)
$$ \eta \left({X}_t,{\mu}_{i,t},{\Sigma}_{i,t}\right)=\frac{1}{{\left(2\pi \right)}^{\frac{n}{2}}{\left|{\Sigma}_{i,t}\right|}^{\frac{1}{2}}}{e}^{-\frac{1}{2}{\left({X}_t-{\mu}_{i,t}\right)}^T{\Sigma}_{i,t}^{-1}\left({X}_t-{\mu}_{i,t}\right)} $$
(15)

where Xt is the value of a pixel at time t, n is the dimension of Xt, K is the number of Gaussian models, and μi,t and ωi,t are the mean and weight of the ith Gaussian model at time t. η(Xt, μi,t, Σi,t) is the probability density function and Σi,t is the covariance matrix of the ith Gaussian model, with i = 1, 2, …, K.

The detailed algorithm flow is as follows:

(1) According to Eq. (16), each new pixel value Xt is compared with the current K modes until a mode matching the new value is found, i.e., the deviation from the mean of that mode is within 2.5σ.

$$ \left|{X}_t-{\mu}_{i,t-1}\right|\le 2.5{\sigma}_{i,t-1} $$
(16)

(2) If the matched mode meets the background requirement, the pixel belongs to the background; otherwise, it belongs to the foreground.

(3) The weight of each mode is updated according to Eq. (17), where α is the learning rate, Mk,t = 1 for the mode that matches successfully, and Mk,t = 0 otherwise. The weights of the modes are then normalized.

$$ {w}_{k,t}=\left(1-\alpha \right)\ast {w}_{k,t-1}+\alpha \ast {M}_{k,t} $$
(17)

(4) The mean and standard deviation of the unmatched modes remain unchanged, and the parameters of the matched mode are updated according to Eqs. (18) to (20).

$$ \rho =\alpha \ast \eta \left({X}_t\left|{\mu}_k,{\sigma}_k\right.\right) $$
(18)
$$ {\mu}_t=\left(1-\rho \right)\ast {\mu}_{t-1}+\rho \ast {X}_t $$
(19)
$$ {\sigma}_t^2=\left(1-\rho \right)\ast {\sigma}_{t-1}^2+\rho \ast {\left({X}_t-{\mu}_t\right)}^T\left({X}_t-{\mu}_t\right) $$
(20)

(5) If no mode is matched in step (1), the mode with the smallest weight is replaced: its mean is set to the current pixel value, its standard deviation to a large initial value, and its weight to a small value.

(6) The modes are sorted in descending order of w/σ, so that modes with large weight and small standard deviation come first.

(7) The first B modes are selected as the background, where B satisfies Eq. (21) and T represents the proportion of the background.

$$ B=\arg \underset{b}{\min}\left(\sum \limits_{k=1}^b{w}_k>T\right) $$
(21)

Here T represents the proportion of the background; by choosing its value appropriately, the best background description for each pixel can be selected.
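The algorithm flow above can be sketched for a single grayscale pixel as follows. This is a simplified illustration, not the implementation used in the experiments: the class name `PixelMixture`, the default parameter values, the seeding of the model from the first sample, the ordering of modes by w/σ, and the small floor on σ are all assumptions made here.

```python
import numpy as np

class PixelMixture:
    """Mixture-of-Gaussians background model for ONE grayscale pixel (sketch)."""

    def __init__(self, first_value, k=3, alpha=0.05, init_sigma=30.0,
                 init_weight=0.05, background_ratio=0.7):
        self.alpha = alpha                        # learning rate in Eq. (17)
        self.init_sigma = init_sigma              # "large" initial deviation, step (5)
        self.init_weight = init_weight            # "small" initial weight, step (5)
        self.background_ratio = background_ratio  # T in Eq. (21)
        self.mu = np.full(k, float(first_value))
        self.sigma = np.full(k, float(init_sigma))
        self.w = np.zeros(k)
        self.w[0] = 1.0  # assumption: all weight starts on the first sample

    def update(self, x):
        """Process one sample and return True if it is classified as background."""
        x = float(x)
        # Step (6): order the modes by w / sigma, largest first.
        order = np.argsort(-(self.w / self.sigma), kind="stable")
        self.mu, self.sigma, self.w = self.mu[order], self.sigma[order], self.w[order]
        # Step (7), Eq. (21): smallest B whose cumulative weight exceeds T.
        n_bg = int(np.searchsorted(np.cumsum(self.w), self.background_ratio,
                                   side="right")) + 1
        n_bg = min(n_bg, len(self.w))
        # Step (1), Eq. (16): first active mode within 2.5 sigma of the sample.
        hits = np.flatnonzero((np.abs(x - self.mu) <= 2.5 * self.sigma) & (self.w > 0))
        matched = int(hits[0]) if hits.size else -1
        # Step (3), Eq. (17): weight update with M = 1 only for the matched mode.
        m = np.zeros_like(self.w)
        if matched >= 0:
            m[matched] = 1.0
        self.w = (1.0 - self.alpha) * self.w + self.alpha * m
        if matched >= 0:
            # Step (4), Eqs. (18) to (20): update only the matched mode.
            mu, sigma = self.mu[matched], self.sigma[matched]
            eta = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
            rho = self.alpha * eta
            new_mu = (1.0 - rho) * mu + rho * x
            new_var = (1.0 - rho) * sigma ** 2 + rho * (x - new_mu) ** 2
            self.mu[matched] = new_mu
            self.sigma[matched] = max(np.sqrt(new_var), 1e-3)  # numerical floor
        else:
            # Step (5): replace the mode with the smallest weight.
            j = int(np.argmin(self.w))
            self.mu[j], self.sigma[j], self.w[j] = x, self.init_sigma, self.init_weight
        self.w /= self.w.sum()  # normalize the weights
        # Step (2): background only if the matched mode is among the first B modes.
        return matched >= 0 and matched < n_bg

# Hypothetical pixel history: a stable background value, a brief intruder,
# and then an object that stays in place.
model = PixelMixture(first_value=100)
steady = [model.update(100) for _ in range(50)]
intruder = model.update(200)
back_again = model.update(100)
long_stay = [model.update(200) for _ in range(100)]
print(all(steady), intruder, back_again, long_stay[0], long_stay[-1])
```

In this run the stable value is classified as background and the brief intruder as foreground. A value that persists is first reported as foreground and is later absorbed into the background as its weight grows, which is how the model adapts to changes in the scene.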

Among these methods, the background difference method is suitable for situations where light and shadow do not change significantly, and its background modeling and updating process is relatively complicated. The optical flow method is extremely sensitive to noise and computationally expensive, but camera motion and shake during shooting do not affect its detection results, so it is robust in that respect. The inter-frame difference method is simple to compute and resistant to interference; however, when the foreground stops moving it cannot detect the complete foreground, and the sampling frequency and the speed of the foreground target also affect detection. Mixed Gaussian background modeling is robust to dynamic changes of the scene but sensitive to changes in illumination.

4 Experiment

Variability is an inherent feature of human motion: a movement cannot be repeated exactly, and the differences are difficult to judge by eye. In this section, the joints of the serving arm are labeled with marker points and the tennis serve is recorded with a high-speed camera. During video processing, the image sequence is first denoised, and the moving foreground is then extracted by the target detection algorithm. The coordinates of the marker points are extracted and their trajectories are analyzed. From the analysis of the marker trajectories and the serve data, the best hitting point of the tennis ball is predicted. Using video image processing in this way improves the quality of training, helps athletes master motor skills, and raises training efficiency.

The collected videos of flat serves were analyzed with video image techniques. For noise removal, median filtering, wavelet denoising, and sparse denoising were compared. Mixed Gaussian background modeling was then used to extract the moving foreground, which was further processed to obtain the coordinates of the three marker points. Three stages of the serve were selected: the ball toss, the backswing, and the hit. Through data analysis, a range of best hitting points was obtained; hitting the ball within this range can improve the accuracy of the serve.

The joints of the athlete are captured during actual movement, so that the movement of each joint can be followed. Figure 6 is an example of using this method to locate the athlete’s joint points. Even during intense movement, each joint of the player is captured effectively, without missing any point.

Fig. 6 Results of tennis singles

Tennis has become a worldwide competitive sport. In China, the main tennis events include the China Open and the Shanghai Masters. Competitive tennis is now treated as an important part of the development of competitive sports in China. Strengthening scientific research on tennis competition, exploring the factors and rules that determine winning, and providing scientific and technological support are therefore important ways to raise the competitive level of tennis in China, and the subject is worthy of further study (Fig. 6).

A coordinate system is established with the left elbow joint point as the origin, which allows the joint points to be tracked. In practice, the coordinates of each joint point are obtained by applying the corresponding coordinate transformation. On this basis, the coordinates of individual joints, and an analysis of all the joints together, can be obtained and the corresponding conclusions drawn (Fig. 7).

Fig. 7 Example of dynamic tracking of human joint node model

The main function of the tennis player information capture system is to collect information on athletes during actual movement and to analyze the corresponding data. As shown in Fig. 8, the system transmits the athletes’ information to the computer information processing module, where it is integrated and processed. Because the analysis must be combined with the athletes’ training process, the system includes a terminal display module with which athletes can carry out simulated exercises; the movement is captured by high-definition cameras and the corresponding conclusions are drawn. In this way, a closed loop of information collection on the athletes’ movement is achieved, which helps improve the athletes’ skills. The simulation results are shown in Fig. 8.

Fig. 8 Simulation results

5 Results and discussion

Sports analysis based on video image processing is a research hotspot and a difficult problem in computer vision. It detects moving objects in video sequences, extracts key parts of the human body, and obtains useful information for recognizing human movements and postures. This paper predicts the best hitting point for the tennis serve. We color-coded the joints of the serving arm, recorded the serve with a high-speed camera, and used the coordinates of the marker points in each frame in place of the joint coordinates to study the trajectory of the arm during the serve. During video processing, interference from colors in the environment similar to the marker color must be avoided, so mixed Gaussian background modeling is used to extract the moving foreground. After the moving foreground is obtained, the marker points are extracted by their color features, the result is binarized, a contour search is performed, each contour is enclosed by its minimum circle, and the center coordinates of that circle are returned as the joint point coordinates. Through trajectory analysis, we find that the arm movement shows periodic characteristics during the serve, and we identify the inherent characteristics of the best hitting point in the trajectory of each set of serving movements. Predicting the best hitting point can improve the efficiency of the serve and serve the purpose of assisting training.
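The marker extraction step can be sketched as follows. To keep the example free of external dependencies, the center of a marker is approximated here by the centroid of the binarized pixels rather than by the center of the minimum enclosing circle of its contour, so this is a stand-in for the procedure described above, not a reproduction of it. The function name `marker_center`, the color tolerance, and the synthetic frame are assumptions for illustration.

```python
import numpy as np

def marker_center(rgb_frame, foreground_mask, marker_rgb, tolerance=40.0):
    """Return the (x, y) center of one colored marker inside the foreground.

    Pixels are kept if they lie in the moving foreground AND their color is
    within `tolerance` (Euclidean RGB distance) of the marker color.
    """
    diff = rgb_frame.astype(np.float64) - np.asarray(marker_rgb, dtype=np.float64)
    color_close = np.sqrt((diff ** 2).sum(axis=2)) <= tolerance
    binary = color_close & foreground_mask.astype(bool)  # binarization
    ys, xs = np.nonzero(binary)
    if xs.size == 0:
        return None  # marker not visible in this frame
    return float(xs.mean()), float(ys.mean())  # centroid instead of circle center

# Hypothetical 20x20 frame: gray scene, a red marker on the arm, and a red
# object in the static background that must not be mistaken for the marker.
frame = np.full((20, 20, 3), 128, dtype=np.uint8)
frame[6:9, 11:14] = (255, 0, 0)    # marker, centered at x = 12, y = 7
frame[15:18, 2:5] = (255, 0, 0)    # background object with the same color

foreground = np.zeros((20, 20), dtype=bool)
foreground[:, 8:] = True           # moving foreground covers columns 8 and above

center = marker_center(frame, foreground, marker_rgb=(255, 0, 0))
print(center)  # (12.0, 7.0)
```

Restricting the color test to the moving foreground is what removes the background object with the same color, which is the reason foreground extraction precedes marker extraction in the pipeline.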