1 Introduction

Tech United Eindhoven is a RoboCup team of Eindhoven University of Technology. Our team consists of PhD, MSc and BSc students, supplemented with academic staff members from different departments. The team was founded in 2005, originally participating only in the Middle-Size League (MSL). Six years later the service robot AMIGO was added to the team, which has since also participated in the RoboCup@Home league. Knowledge acquired in designing our soccer robots proved to be an important resource in creating a service robot [3].

This paper describes the major scientific improvements of the past year that helped us win RoboCup 2014. First we introduce our current robot platform, followed by a description of the robot skills we have improved, with a focus on accurate shooting. Hereafter we describe our progress in strategy and human-robot interaction, and lastly the advancements in a new four-wheeled soccer robot platform we designed in collaboration with an industrial partner.

Many of the points of improvement described in this paper are a direct result of rule changes. In 2012 the mid-line passing rule was introduced, which was a large boost for the league in terms of stimulating smart team-play. Requiring teams to make a pass before scoring provides an interesting academic challenge, but it also makes the matches more fun to watch for spectators.Footnote 1, Footnote 2 Rule changes for RoboCup 2014 limited the continuous dribbling distance, allowed robot coaching along channels that are natural to human beings, and replaced the mid-line passing rule by a more general ‘pass before scoring’ rule. The combination of these rule changes and the introduction of human-robot interaction to the Middle-Size League (Sect. 4.2) has moved the competition towards an even higher level of multi-agent coordination (Sect. 4).

2 Robot Platform

Our robots have been named TURTLEs (acronym for Tech United RoboCup Team: Limited Edition). Currently we are employing the fifth redesign of these robots, built in 2010, together with a goalkeeper robot which was built one year later (Fig. 1).

Three 12 V Maxon motors, driven by Elmec Violin 25/60 amplifiers and powered by two Makita 24 V, 3.3 Ah batteries, drive our omnidirectional platform. Our solenoid shooting mechanism, powered by a 450 V, 4.7 mF capacitor, provides an adjustable, accurate and powerful shot [4]. Each robot, except for the goalkeeper, is equipped with an active ball handling mechanism, enabling it to control the ball while driving forwards, while turning, and even while driving backwards [1]. As mentioned before, we aimed at improving our passing abilities and conducted experiments on directly catching lob balls with our ball handling system (Sect. 3.1.2).

To acquire information about its surroundings, each robot uses an omnivision unit, consisting of a camera focussed on a parabolic mirror [2]. An electronic compass is used to differentiate between omnivision images of our own half and of the opponent half of the field. We also added a Kinect sensor to each robot. A detailed list of hardware specifications, along with CAD files of the base, upper-body, ball handling and shooting mechanism, has been published on a ROP wiki.Footnote 3

Fig. 1. Fifth generation TURTLE robots, with the goalkeeper robot on the left.

To facilitate data acquisition and high-bandwidth motion control, the robots are equipped with EtherCAT devices provided by Beckhoff. These are connected to the on-board host computer via Ethernet. Each robot is equipped with an industrial mini-PC running a preemptive Linux kernel. The software is automatically generated from Matlab/Simulink models via the RTW toolbox, recently renamed to ‘Simulink Coder’. In order to allow asynchronous processing we have created a multitasking target for Simulink’s code generation toolchain.Footnote 4

Software for our robots is divided into three main executables: Vision, Worldmodel and Motion. On-board and between robots, these communicate via a real-time database tool made by the CAMBADA team [5]. The vision module provides a localization of the ball, the obstacles and the robot itself. The worldmodel then combines this information with data acquired from other team members into a unified representation of the world. Vision runs at 60 Hz and the worldmodel at 20 Hz; motion contains the controllers for shooting, ball handling and driving, and therefore samples at a much higher rate (1000 Hz). On top of the controllers, the motion executable also contains strategy and path planning, partly implemented as a subtask running at a much lower sample rate.
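
To illustrate this multi-rate structure, the sketch below shows how a 1000 Hz motion loop could run strategy as a lower-rate subtask. This is a simplified illustration, not our generated Simulink code; the function names and the 100 Hz subtask rate are assumptions.

```python
import time

MOTION_DT = 0.001   # 1000 Hz controller sample time
SUBTASK_DIV = 10    # strategy/path planning every 10th tick (100 Hz, assumed)

def motion_loop(read_worldmodel, run_controllers, run_strategy, n_ticks):
    """Fixed-rate motion loop with a lower-rate strategy subtask."""
    next_deadline = time.monotonic()
    for tick in range(n_ticks):
        world = read_worldmodel()        # latest 20 Hz worldmodel estimate
        if tick % SUBTASK_DIV == 0:
            run_strategy(world)          # low-rate subtask
        run_controllers(world)           # shooting, ball handling, driving
        next_deadline += MOTION_DT
        time.sleep(max(0.0, next_deadline - time.monotonic()))
```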

3 Improved Skills

Considering the rule changes in the Middle-Size League, it is likely that passing and catching will become increasingly important compared to dribbling. During RoboCup 2014 in Brazil, this was indeed the case. In the sections below, we describe how we prepared for RoboCup 2014 by improving the accuracy of our flat passes and our ability to accurately catch and shoot lob balls. The latter is not only beneficial for passing, but also for shots at goal.

3.1 Shooting

The electrical scheme of our kicker consists of a battery pack charging a capacitor via a DC-DC converter (Fig. 2). Once fully charged, which takes roughly 20 s, the capacitor can be discharged via an IGBT switch, creating a pulse-width modulated (PWM) signal. The energy of the capacitor drives a solenoid actuator connected to a mechanical transmission (a shooting lever). The lever can be adjusted in height to allow for lob and flat shots.
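
As a point of reference, the energy stored in the fully charged capacitor follows directly from the component values given above:

$$\begin{aligned} E = \tfrac{1}{2}CV^{2} = \tfrac{1}{2} \cdot 4.7\cdot 10^{-3}\,\mathrm {F} \cdot (450\,\mathrm {V})^{2} \approx 476\,\mathrm {J} \end{aligned}$$

Only part of this energy reaches the ball; the PWM duty cycle determines how much of it is released in a shot.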

Fig. 2. Schematic overview of our shooting system. One half of the plunger is made of a non-magnetic material; the other half consists of a soft-magnetic material.

3.1.1 Shoot Lob Balls

To accurately shoot lob balls, the shooting system needs to be calibrated. Preferably we do this under conditions as close as possible to those our robots face during the matches, i.e., on the official field with the same ball that will be used for competition. But during a tournament, testing time on the field is limited. Therefore our approach was to simply put the robot at the maximum distance from which it could take a lob shot during a game, tune the PWM duty cycle until the ball landed exactly in the goal, and store the resulting duty cycle value. By linearly interpolating between zero and the duty cycle obtained during calibration, we could shoot from any spot within shooting range. The same calibration was used for all robots.

Although the above method is fast, it is also inaccurate. The relation between shooting distance and required duty cycle is non-linear, and since each robot has its own mechanical and electrical components, each robot has its own shooting characteristics. Therefore, calibration of each robot individually would be better.

For RoboCup 2014 we designed and implemented a tool to quickly perform robot-dependent calibration. Furthermore, we empirically identified that the relation between the shooting distance (x) and the required duty cycle (u) is exponential for a lob shot (Eq. 1). Parameters a, b and c are robot-dependent; they have to be obtained by measuring the travelled distance for multiple duty cycles. To make a correct fit, at least four measurements are required, though more are preferred.

$$\begin{aligned} u = b^{-1}\ln \left( a^{-1}(c-x)\right) \end{aligned}$$
(1)
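
As an illustration of this calibration procedure, the sketch below fits the robot-dependent parameters a, b and c to a set of (duty cycle, distance) measurements and then inverts the fitted model according to Eq. 1. The measurement values and initial guesses are made up for the example; only the model itself comes from the text above.

```python
# Sketch of robot-dependent lob-shot calibration (Eq. 1). The (u, x)
# measurements and the initial guesses below are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def distance_model(u, a, b, c):
    """Forward model x(u) = c - a*exp(b*u), i.e. the inverse of Eq. 1."""
    return c - a * np.exp(b * u)

# At least four (duty cycle, distance) pairs for the three parameters:
u_meas = np.array([0.30, 0.45, 0.60, 0.80, 1.00])  # PWM duty cycles
x_meas = np.array([2.1, 3.4, 4.5, 5.6, 6.3])       # travelled distance [m]

(a, b, c), _ = curve_fit(distance_model, u_meas, x_meas, p0=(5.0, -1.0, 7.0))

def duty_cycle_for(x):
    """Eq. 1: duty cycle required to lob the ball over a distance x."""
    return np.log((c - x) / a) / b

print(duty_cycle_for(5.0))  # duty cycle for a 5 m lob shot
```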

3.1.2 Catch Lob Balls

During the technical challenge of RoboCup 2013 we showed an initial attempt to shoot and catch lob passes. Our approach to catching the ball there was to simply wait until the bounces were low enough to intercept it as if it were a flat pass. Building on these first tries, this year we worked on a much more challenging lob pass approach, in which we use our current ball handling system to grab the ball exactly where it hits the ground after its flight phase. We call this coordinate the point of intercept (POI).

The teammate shooting the lob ball communicates to the receiving robot where the ball is expected to land (the feedforward position, FFP). When consecutive bounces are taken into account, multiple FFPs exist (see the example in Fig. 3). Each of them has a certain inaccuracy, for now modelled as a circle around the point itself. When a lob ball is expected but not yet shot, the receiving robot drives towards one of the feedforward positions, chosen based on the estimated time needed to reach each of them.

Once the ball is in the air, a Kinect camera mounted on the receiving robot is used to measure the ball position. Based on these observations, a simplified ball model, without drag and spin, predicts the ball trajectory. The receiving robot will respond to this ball-tracking based POI prediction, but only if the prediction is located within the uncertainty circle. In case the estimated POI is located outside the circle, the robot will wait at the edge of the circle.
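
A minimal sketch of such a drag- and spin-free prediction is given below: ballistic trajectories are fitted to timestamped ball positions and extrapolated to the moment the ball returns to the ground. The observation values are fabricated for the example, and the real system’s filtering and coordinate transformations are omitted.

```python
# Sketch of the drag- and spin-free ball model used to predict the
# point of intercept (POI) from Kinect observations (sample data only).
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def predict_poi(t, x, y, z):
    """Fit ballistic trajectories to timestamped (x, y, z) ball positions
    and return the (x, y) landing point where z returns to zero."""
    vx, x0 = np.polyfit(t, x, 1)                   # linear horizontal motion
    vy, y0 = np.polyfit(t, y, 1)
    vz, z0 = np.polyfit(t, z + 0.5 * G * t**2, 1)  # remove gravity, fit line
    # Solve z0 + vz*t - 0.5*G*t^2 = 0 for the positive root:
    t_land = (vz + np.sqrt(vz**2 + 2 * G * z0)) / G
    return x0 + vx * t_land, y0 + vy * t_land

# A few fabricated observations of a lob ball in flight:
t = np.array([0.00, 0.05, 0.10, 0.15, 0.20])
x = 1.0 + 3.0 * t
y = 0.5 + 1.0 * t
z = 0.3 + 5.0 * t - 0.5 * G * t**2
print(predict_poi(t, x, y, z))  # predicted landing point (x, y)
```

The uncertainty-circle check then reduces to comparing this prediction against the communicated FFP.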

Fig. 3. Lob ball intercept strategy: the receiving robot chooses one of the points of intercept.

3.1.3 Shooting-Lever Velocity Feedback Control

Similar to what we described for lob shots, our control of flat shots and flat passes is currently fully based on feedforward. As said before, many disturbances are robot-, ball- or field-dependent; feedback control would allow us to compensate for these.

Fig. 4. Shooting lever end-effector for more accurate passing.

We use an encoder mounted on the rotational joint of the shooting lever (Fig. 2) to provide a feedback signal for velocity control. For full-power shots, the end-effector of the shooting lever is pushed into the ball almost entirely before the ball itself even starts to move.Footnote 5 Using the angular velocity of the lever as a feedback signal to control the resulting ball velocity would be difficult in this case, because the dynamic behaviour of the deformed ball is hard to predict exactly.

For slow shots, on the other hand, it is possible to make the lever and ball move as one body before the ball leaves the robot. Especially for passing, being able to accurately control the ball velocity would be of great help.

Particularly challenging are the limited time available (a shot takes between 20 and 50 ms) and the limited spatial resolution of the encoder (130 ticks over the entire stroke of the shooting lever). Furthermore, the solenoid actuator can only push in a single direction, so no overshoot is allowed (Fig. 4).
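
A minimal sketch of what such a one-sided velocity loop could look like is shown below. The 1 kHz rate matches our motion executable; the proportional structure, gain and feedforward term are illustrative assumptions, not our implemented controller.

```python
# One-sided lever-velocity feedback sketch, assuming a 1 kHz control
# loop and encoder ticks as the only measurement. Gain and tick pitch
# are illustrative values.
DT = 0.001        # control period [s] (1000 Hz motion loop)
TICK = 1.0 / 130  # stroke fraction per encoder tick (130 ticks/stroke)

def velocity_feedback(v_ref, ticks_prev, ticks_now, u_ff, kp=0.05):
    """One control step: feedforward duty cycle plus a proportional
    correction, clamped to [0, 1] because the solenoid can only push."""
    v_meas = (ticks_now - ticks_prev) * TICK / DT  # lever velocity [stroke/s]
    u = u_ff + kp * (v_ref - v_meas)
    return min(max(u, 0.0), 1.0)  # no negative action: overshoot is fatal
```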

4 Improved Strategy

Our strategy takes into account the estimated positions of all peers and opponents, represented in a worldmodel. We developed a method that also uses velocities and the estimated game state to assess the feasibility of various tactical actions (plans), instead of instantaneously seeking the free space on the field.

As a first step in moving to a more plan-based level of cognition, we have created a skill-selector, which we will describe in the upcoming section. Furthermore, we worked on in-game optimization of decision making in refbox tasks, either via human coaching (Sect. 4.2) or via machine learning (Sect. 4.3).

4.1 Skill Selector GUI

In our strategy, we first assign a unique role to each of the robots. Every role contains a number of actions/skills which can be executed during play. The main attacker, for instance, has five different skills to choose from: flat shot, lob shot, pass, dribble and push-attack (i.e., bouncing the ball towards the goal with the side of the robot).

To decide which skill to use at a certain moment in time, hard-coded conditional statements are evaluated. In the original system, these conditions were solely true/false evaluations (e.g., to shoot at goal, there must be a clear path to the goal). They are evaluated in the order in which they appear in the code, and the first condition that holds immediately discards all other possible actions. This creates situations in which the TURTLEs do not take the optimal action. In order to solve this problem, a more generic framework for skill selection has been developed.

In our improved skill-selector framework, the hard-coded conditions for each of the skills are complemented with normalized ranking functions (e.g., while turning towards the goal, the ranking for shooting at the goal will increase). After evaluating all ranking functions, the skill selector chooses the skill with the highest overall ranking. In case multiple rankings are equal, the default skill ‘dribble’ is selected. To make sure a newly chosen skill consistently ranks higher than the current skill before switching, a hysteresis function has been added.
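
The sketch below illustrates this selection logic: ranking functions are evaluated, ties fall back to dribbling, and hysteresis prevents rapid switching. The skill names are those of the main attacker listed above; the ranking values and the hysteresis margin are illustrative assumptions.

```python
# Ranking-based skill selection with hysteresis (illustrative values).
HYSTERESIS = 0.1  # a challenger must beat the active skill by this margin

def select_skill(rankings, current, default="dribble"):
    """rankings: dict mapping skill name -> normalized ranking in [0, 1].
    Returns the skill to execute next."""
    best = max(rankings, key=rankings.get)
    ranks = sorted(rankings.values(), reverse=True)
    if len(ranks) > 1 and ranks[0] == ranks[1]:
        return default                # equal top rankings: fall back to dribble
    if current in rankings and rankings[best] < rankings[current] + HYSTERESIS:
        return current                # not convincingly better: keep current skill
    return best

rankings = {"flat shot": 0.55, "lob shot": 0.3, "pass": 0.5,
            "dribble": 0.4, "push-attack": 0.2}
print(select_skill(rankings, current="pass"))  # stays with 'pass' (0.55 < 0.6)
```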

For debugging and tuning purposes we have created a graphical user interface which visualises skill-selector output for a given game state (Fig. 5).

Fig. 5. Skill-selector visualization.

4.2 Human Coaching

For the 2014 world championship, human-robot coaching was allowed in our league. Coaching instructions are intended to pass high-level instructions like ‘shoot more often’, as opposed to low-level commands like ‘shoot now’. As a first step, this year we used QR codes to tell our robots which predefined play to use, e.g., during a free-kick.

We use a freely available open source library to scan the video stream from our robot’s Kinect sensor for QR codes.Footnote 6 With the maximum allowed QR-code size (i.e., 30\(\times \)30 cm), containing three characters of encoded information, we experimentally determined the maximum distance at which the code could still be scanned. Averaged over 35 trials, using seven different character combinations, this distance turned out to be 5.1 m (with a standard deviation of 0.29 m). False positives in the code detection occurred regularly, especially against a non-plain background. But since none of these false positives matched any of the known strings, we could simply keep scanning until a combination of symbols was recognized that was actually grounded in the robot’s knowledge base.
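
The sketch below illustrates this scan-and-filter loop. For illustration, pyzbar and OpenCV are used as stand-ins for our actual library, and the three-character play codes are hypothetical examples.

```python
# QR-based coaching command detection with whitelist filtering.
# pyzbar/OpenCV and the play codes below are illustrative assumptions.
import cv2
from pyzbar.pyzbar import decode

KNOWN_PLAYS = {b"FK1", b"FK2", b"CRN"}  # plays grounded in the knowledge base

def scan_for_coaching(frame):
    """Return a known play code found in the camera frame, or None.
    Unknown decodes (false positives) are simply ignored."""
    for symbol in decode(frame):
        if symbol.data in KNOWN_PLAYS:
            return symbol.data
    return None

cap = cv2.VideoCapture(0)  # stand-in for the Kinect video stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    play = scan_for_coaching(frame)
    if play is not None:
        print("coaching instruction:", play)
        break
```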

In every trial of the experiment in which the code was detected, it was recognized within four seconds. Since under the current rules coaching is only allowed during the ‘dead time’ between the stop and start of a refbox task, we were interested in how often a robot could actually get within five metres of the coaching spot and stay there for at least four seconds to receive a coaching instruction. We therefore looked back at logged data of the final match of RoboCup 2013 in Eindhoven. In total this match involved 58 refbox tasks, 21 of which did not involve a direct scoring risk (i.e., at least one of our robots was available to come to the side for coaching). Taking into account constraints on the robots’ acceleration and velocity, 17 of these coaching moments would have succeeded with our current QR-code detection system.

4.3 Learning Refbox Play Decisions

In the previous section we described a way to perform a hard, human-imposed reset of our robots’ decision making. On top of these hard resets, we also worked on a basic reinforcement learning algorithm for a more subtle optimization of strategic play choice during refbox tasks.

A reinforcement learning algorithm is built around actions, states and rewards [6]. Applying this framework to our free-kick strategy, we use six existing refbox plays as our action space (single kick and shoot, double kick and pass, etc.). Based on which opponent we face and the location of the free-kick (the state), one play may result in slightly better scoring chances than the others. As a reward function we give a high virtual reward for a scored goal, a lower reward for a shot attempt, a small punishment for loss of ball possession and a severe punishment for a goal scored by the opponent (all weighted by the time passed after the refbox task start signal).

Within this framework of rewards, states and actions we are able to store an expected reward for each state, based on past experience.
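
A minimal tabular sketch of this bookkeeping is given below: the expected reward per (state, play) combination is maintained as a running average and used for epsilon-greedy play selection. The state encoding, learning rate, exploration rate and the play names beyond the two mentioned above are illustrative assumptions.

```python
# Tabular expected-reward bookkeeping for refbox plays (illustrative).
from collections import defaultdict
import random

PLAYS = ["single kick and shoot", "double kick and pass",
         "play3", "play4", "play5", "play6"]  # six plays; last four hypothetical
ALPHA = 0.2      # learning rate for the running reward estimate
EPSILON = 0.1    # occasional exploration of non-greedy plays

q = defaultdict(float)  # (state, play) -> expected reward

def choose_play(state):
    if random.random() < EPSILON:
        return random.choice(PLAYS)
    return max(PLAYS, key=lambda p: q[(state, p)])

def update(state, play, reward):
    """Exponential running average of the observed reward."""
    q[(state, play)] += ALPHA * (reward - q[(state, play)])

# One free-kick: state = (opponent, field region), reward as in Sect. 4.3
state = ("opponent_A", "left_wing")
play = choose_play(state)
update(state, play, reward=1.0)  # e.g. a shot attempt followed this play
```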

5 Four-Wheeled Platform with Suspension System

Ever since our first generation of soccer robots, we have used a robot base with three omniwheels positioned in a triangle. Such a three-wheeled design simplifies control because, regardless of field irregularities, all wheels remain in contact with the ground. But there are also disadvantages. Although driving straight forward is the most common direction of acceleration, it is also the direction in which our three-wheeled robot experiences the least traction. While accelerating forward, the robot tends to tilt backwards, putting most of its weight on the only wheel that cannot be used to transfer torque to the ground. For our current robot design, traction is the limiting factor in achieving higher accelerations.

Fig. 6. Base structure.

For a four-wheeled base accelerating forward, i.e., in the direction of the ball-handling mechanism, additional pressure is put on wheels that are actively used in the acceleration. We worked with an industrial partner to realize a prototype of a four-wheeled robot.Footnote 7 On top of the RoboCup rulebook requirements with respect to weight and size, an additional requirement was formulated: without any of the wheels losing contact with the floor, the robot should be able to take bumps of at least 10 mm in any direction while maintaining a ground clearance of 15 mm (Fig. 6).

To meet this latter requirement, a suspension system is needed. In the current prototype design, each of the wheels is equipped with an independent suspension system. Wheel and motor are still directly connected via a gearbox, but the combination of the two is connected to the base via a passive spring-damper combination. The prototype of the four-wheeled base has been produced, and during RoboCup 2014 the robot played several matches.

6 Conclusions

In this paper we have discussed concrete steps towards more accurate shooting which, together with better ball tracking abilities, will enable passing via lob balls. We have also presented proof-of-concept experiments for QR-code based human coaching and for learning algorithms in refbox strategy.

Altogether these improvements helped us to recapture the world title, and this progress contributed to a more dynamic and scientifically challenging level of robot soccer during RoboCup 2014, while at the same time maintaining the attractiveness of our competition for a general audience.