
1 Introduction

Flick keyboards and QWERTY keyboards are commonly used to input characters on smartphones. However, if they are used on a smartwatch, they occupy a large proportion of the screen, because 20 or more keys must be arranged on a small screen. In addition, because the keys are too small compared with the fingertip, keys surrounding the target key are often entered by mistake; this is the fat finger error [1]. Therefore, to reduce both screen occupation and input errors, the number of keys must be reduced.

Using gestures can reduce screen occupancy. However, since all gestures must be learned, it takes time before users can start entering characters if the gestures are complicated. A smartwatch is often used in conjunction with a smartphone. In this case, the smartwatch is used to input short sentences, so each period of use is short. Focusing on this point, we consider it important that the operations of the input system can be learned in a short time.

Voice input is a technique for entering characters without using the touch screen, and it requires no special training. Although speech recognition rates have improved recently, recognition errors still occur, and speakers also make mistakes. Because it is very difficult to correct such wrong words with voice commands alone, a character input system for correcting them is necessary.

To create an input system that has low screen occupation and is easy to use, we enter characters with a combination of a few keys and simple gestures. In addition, by selecting keys with a slide-in, we reduce the width of the keys.

2 Related Studies

In FlickKey Mini Keyboard [2] and Flit Keyboard [3], the number of keys is reduced by assigning nine letters to one key. However, since one character is selected by flicking in one of eight directions, more accurate finger movement is required than with a general flick method. Although the number of keys is reduced, the keyboard still occupies more than half of the screen.

ZoomBoard [4] is a QWERTY keyboard with a function that zooms in on the touched position. In the initial state, a reduced QWERTY keyboard is displayed, but it is too small for selecting a single key. When the keyboard is touched, the area centered on the touch point is zoomed in. In this state, the keys are large enough to be selected with a fingertip. When the enlargement ratio is high, a slight fluctuation of the touch position moves the target key off the display. Thus, if the area for displaying the keyboard is small, zooming in multiple times is needed. With the Callout keyboard [5], when the user touches the reduced QWERTY keyboard, the key at the fingertip position is enlarged and displayed in a pop-up window. The ZShift keyboard [5] enlarges the keyboard around the touch position in the same way as ZoomBoard and displays it in a pop-up window. Since with Callout and ZShift the user can adjust the fingertip position using visual feedback, multiple zooming steps are not needed. In practice, however, these keyboards occupy about half of the screen, because displaying a reduced QWERTY keyboard requires a certain area.

Gboard [6] enters one word with one gesture. In this method, the user draws a gesture connecting the keys of a word on a shrunken QWERTY keyboard. Since the keys are too small, the keys detected from the touch position are not reliable. However, the word can be estimated by comparing the gesture with the motion patterns stored in the dictionary. A dictionary for Japanese input is not provided. In addition, adapting this method to Japanese requires an additional method for separating long gestures into words, since Japanese sentences are not separated into words by spaces.

In Shuttle Board [7], 10 keys assigned to the rows of the Japanese syllabary table and 5 keys assigned to its columns are displayed. One Hiragana character is entered by determining the row from the position where the finger touches the screen and the column from the position where the finger is released. Characters can be entered continuously. Since this method needs to display 15 keys on the screen, each key is too small to be tapped reliably. Accordingly, the character input error rate exceeds 10%.

In all of the above methods, the keyboard occupies about half of the screen. To input characters comfortably on a smartwatch with a small screen, it is necessary to lower the screen occupation ratio by reducing the number of keys.

TouchOne keyboard [8] is a one-line keyboard [9] that uses eight keys arranged around the screen. Minimum keyboard [10] is similar. Since these methods reduce the number of keys by assigning three or four letters to one key, a character cannot be determined by tapping a single key. Instead, the keys for one word are typed consecutively, and the word is then estimated by searching a dedicated dictionary with the resulting key sequence. This approach is designed exclusively for English; no dictionary for Japanese is provided. To enter a word that is not in the dictionary, the user must specify its letters one by one by touching a key and then moving in a certain direction.
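The key-sequence dictionary lookup described above can be sketched in Python. This is a minimal illustration under stated assumptions: the letter-to-key grouping is a hypothetical phone-keypad-style layout, not the actual TouchOne or Minimum keyboard layout, and the word list is a toy example. A real system would also rank the candidates, for example by word frequency.

```python
from collections import defaultdict

# Hypothetical grouping of letters onto eight keys (three or four letters each).
# This is NOT the real TouchOne or Minimum keyboard layout.
KEY_GROUPS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]
LETTER_TO_KEY = {ch: k for k, group in enumerate(KEY_GROUPS) for ch in group}


def key_sequence(word):
    """Return the sequence of key indices a user would tap for a word."""
    return tuple(LETTER_TO_KEY[ch] for ch in word)


def build_index(words):
    """Index dictionary words by their key sequence."""
    index = defaultdict(list)
    for word in words:
        index[key_sequence(word)].append(word)
    return dict(index)


def candidates(index, taps):
    """Return every dictionary word consistent with the tapped key sequence."""
    return index.get(tuple(taps), [])


index = build_index(["good", "home", "gone", "hood", "tea"])
print(candidates(index, key_sequence("good")))  # ['good', 'home', 'gone', 'hood']
```

With this layout the four words "good", "home", "gone", and "hood" all share one key sequence, which shows why such keyboards depend on a dictionary and cannot fix a character from a single tap.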

5-TILES keyboard [11] places five keys in a horizontal row at the bottom of the screen and assigns five or six letters to each key. To input a character, the user touches a key, moves the fingertip to another key if necessary, and releases the finger from the screen. The input character is selected by the combination of the touch position and the release position. Since the width of the screen must be divided into five equal parts to arrange five keys horizontally, each key is only about 5 mm wide on a smartwatch. Therefore, fat finger errors cannot be avoided.

Methods based on handwritten character input do not use keyboards. Google Handwriting Input [12] and 7notes [13] are such applications for smartphones, and Analog Keyboard for Android Wear [14] works on smartwatches. However, since the screen of a smartwatch is small, the area where the user writes is only one character in size. Because characters are written consecutively in the same place, a method for segmenting the input strokes into the strokes of each character is needed.

If one character is input by one continuous gesture, characters can be separated at the ends of the gestures. EdgeWrite [15, 16] uses such gestures, designed based on the stroke order of the alphabet, and EdgeWrite for Japanese Katakana [17] has also been studied. However, each stroke in EdgeWrite must connect two corners of the square input area, and all strokes of a character must be joined into one continuous gesture. Thus, the gestures differ from the pen motions a person makes when writing characters. As a result, training to memorize the gestures is needed before starting character input.

3 Proposed Method

3.1 User Interface at Stand-by Status

The purpose of this research is to develop a character input method for Japanese that has low screen occupation and is easy to use. From the viewpoint of ease of use, keys are preferable, because users can tell what will be entered from the letters written on the key tops. However, from the viewpoint of screen occupation, the number of keys must be small, because the lower limit of key size required for reliable touch is 7 mm × 7 mm [18]. By combining keys with gestures, the number of keys can be reduced. If the gestures are simple and few in number, users can memorize them immediately. Therefore, we decided to enter a character by selecting a key with a tap or a simple stroke.

To lower the screen occupancy, the number of keys displayed on the screen in the stand-by state must be reduced. Thus, we decided to enter one character in two steps. Characters are divided into groups of five. For Japanese Hiragana, each row of the Japanese syllabary table is used as a group. For alphabets, every five letters in alphabetical order form one group. In the first step, one of the groups is selected; in the second step, a character within that group is selected. Since there are fewer groups than characters, the number of keys that must be displayed on the screen is reduced.
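The two-step scheme can be sketched as a simple table lookup. The sketch assumes the standard rows of the Japanese syllabary table; how the actual system fills the shorter や and わ rows is not stated in the text, so those member lists are assumptions. The function name `enter` is hypothetical.

```python
# Rows (Gyou) of the Japanese syllabary table, used as groups.
# Member lists follow the standard table; the や and わ rows are assumptions.
GYOU = {
    "あ": "あいうえお", "か": "かきくけこ", "さ": "さしすせそ",
    "た": "たちつてと", "な": "なにぬねの", "は": "はひふへほ",
    "ま": "まみむめも", "や": "やゆよ", "ら": "らりるれろ",
    "わ": "わをん",
}


def enter(gyou, dan_index):
    """Step 1 selects a row (Gyou); step 2 selects a member of that row."""
    return GYOU[gyou][dan_index]


n_groups = len(GYOU)
n_chars = sum(len(members) for members in GYOU.values())
print(n_groups, n_chars)  # 10 46
print(enter("か", 1))      # き
```

Only the 10 group keys need to be reachable in the stand-by state, instead of one key for each of the 46 basic characters.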

Figure 1 shows the interface in the stand-by state. Each of the left edge, the upper edge, and the right edge of the screen is divided into two segments, and a pair of groups is assigned to each segment. The length of each segment is about 12 mm on a 1.6-in. screen. This length is sufficient for selection with a finger, but half of it would be too short. Therefore, we allocated two groups to one segment, and one of them is selected by a stroke.
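The assignment of groups to segments in the Hiragana mode can be sketched as follows. The segment names, the clockwise order, and the assumption that the rows are assigned in syllabary order are illustrative; the text states only that the ten rows occupy the segments from the lower left to the upper right and that the lower right segment holds symbols and the mode change command.

```python
# Six edge segments, listed clockwise from the lower left to the lower right.
# Segment names are hypothetical labels used only in this sketch.
SEGMENTS = ["left-lower", "left-upper", "top-left", "top-right",
            "right-upper", "right-lower"]
# Rows in syllabary order (assumed to be the assignment order).
ROWS = ["あ", "か", "さ", "た", "な", "は", "ま", "や", "ら", "わ"]


def hiragana_layout():
    """Pair consecutive rows and assign each pair to one segment."""
    layout = {seg: (ROWS[2 * i], ROWS[2 * i + 1])
              for i, seg in enumerate(SEGMENTS[:5])}
    layout["right-lower"] = ("symbols", "mode change")
    return layout


layout = hiragana_layout()
for segment, pair in layout.items():
    print(segment, pair)
```

Five segments times two groups covers exactly the ten rows, which leaves the sixth segment free for symbols and mode switching.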

Fig. 1.

Screen display in the stand-by state. (a) Hiragana mode: the 10 rows of the Japanese syllabary table are assigned to the segments from the lower left to the upper right. (b) Alphanumeric mode: alphabets are assigned to the left segments and the left top segment, and numbers are placed at the upper right. In both modes, symbols and the mode change command are assigned to the lower right segment. (c) When BS (backspace), SP (space), or ENT (enter) is entered, the display temporarily changes to this layout.

Each segment is 2 mm wide. This is too narrow to tap, but it is enough for selection with a slide-in. A slide-in is an operation in which the user first touches outside the screen with a finger and then moves it inward without lifting it. Since the fingertip always passes through the edge of the screen during a slide-in, the crossing is detected correctly even with a width of only 2 mm.
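A minimal sketch of slide-in detection, assuming that a finger entering from outside the screen is first reported at a point inside the 2-mm edge band. The screen geometry (a square 320 × 320 px display with a 1.63-in. diagonal, as on the ZenWatch 2 used in the experiments) converts millimetres to pixels. The inward-movement threshold `MIN_INWARD_PX`, the segment names, and the rule that the left and right edges take priority in the corners are hypothetical choices, not details of the original implementation.

```python
import math
from typing import List, Optional, Tuple

# Assumed geometry: square 320 x 320 px display with a 1.63-in. diagonal.
SCREEN_PX = 320
SIDE_MM = 1.63 * 25.4 / math.sqrt(2)   # about 29.3 mm per side
PX_PER_MM = SCREEN_PX / SIDE_MM         # about 10.9 px/mm
BAND_PX = 2.0 * PX_PER_MM               # 2-mm edge band, about 21.9 px
MIN_INWARD_PX = 20.0                    # hypothetical inward-movement threshold


def edge_segment(x: float, y: float) -> Optional[str]:
    """Return the edge segment containing a touch-down point, or None.

    Origin is the top-left corner; y grows downward. In the corners the
    left and right edges take priority (an assumption of this sketch).
    """
    half = SCREEN_PX / 2
    if x < BAND_PX:
        return "left-upper" if y < half else "left-lower"
    if x > SCREEN_PX - BAND_PX:
        return "right-upper" if y < half else "right-lower"
    if y < BAND_PX:
        return "top-left" if x < half else "top-right"
    if y > SCREEN_PX - BAND_PX:
        return "bottom"  # BS / SP / ENT keys
    return None


def detect_slide_in(path: List[Tuple[float, float]]) -> Optional[str]:
    """Report the crossed segment if the stroke starts in the band and moves inward."""
    if not path:
        return None
    segment = edge_segment(*path[0])
    if segment is None:
        return None  # touch started in the interior: not a slide-in
    centre = (SCREEN_PX / 2, SCREEN_PX / 2)
    # Moving inward means getting closer to the screen centre.
    inward = math.dist(path[0], centre) - math.dist(path[-1], centre)
    return segment if inward >= MIN_INWARD_PX else None


print(detect_slide_in([(5, 80), (60, 100)]))  # left-upper
```

The band is only about 22 px wide, so an ordinary tap would rarely land in it; a slide-in, by contrast, is first reported inside it and then moves inward.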

Figure 1(a) shows the stand-by display of the Hiragana mode, in which the 10 rows of the Japanese syllabary table are assigned to the segments from the lower left to the upper right. Each row is called a Gyou in Japanese. Their names, “”, “”…, are indicated on the segments. In the stand-by display of the alphanumeric mode shown in Fig. 1(b), alphabets are assigned to the lower left, the upper left, and the left top segments. Since there are no popular names for the groups of alphabets, the range of letters selectable from each segment is written on it. Numbers are divided into a 0–4 group and a 5–9 group and placed at the upper right; on the segment they are indicated together as “” (Kanji meaning “number”). In both modes, symbols (“”) and the mode change command (“” or “”) are assigned to the lower right segment.

The bottom edge of the screen is divided into three equal parts, to which backspace (BS), space (SP), and enter (ENT) are assigned. When one of these three keys is selected, the display temporarily changes to Fig. 1(c). Since the height of the three keys is then enlarged to 5 mm, they can be tapped. This function makes it quick to enter these keys consecutively.

3.2 Entering Hiragana

One Hiragana character is entered by first choosing a Gyou (a row of the Japanese syllabary table) and then choosing a Dan (a column of the table). First, you make a slide-in crossing the segment on which the name of the desired Gyou is written. When your finger passes through the segment, the screen is divided into two areas, either vertically as shown in Fig. 2(b) or horizontally; the direction of the division depends on the position of the segment. The name of each row assigned to the selected segment is displayed in large letters in its own area. Next, when your finger has moved a certain distance, the area where your finger is placed is highlighted in green, as shown in Fig. 2(c). By lifting your finger from the screen at this point, you select the highlighted row. Our system prevents fat finger errors by displaying large letters on the screen and indicating the selection by color.
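Selecting a row by the release position can be sketched as a function evaluated on every move event, whose value at the moment of release is taken as the result. `HIGHLIGHT_DIST_PX` stands in for the unspecified "certain distance"; it and the `split` labels are hypothetical. Because the result depends only on the current finger position, returning the finger near the starting point cancels the selection, and the choice can change freely until release.

```python
import math
from typing import Optional, Tuple

SCREEN_PX = 320
HIGHLIGHT_DIST_PX = 40.0  # hypothetical value for the "certain distance"


def highlighted_row(start: Tuple[float, float], current: Tuple[float, float],
                    pair: Tuple[str, str], split: str) -> Optional[str]:
    """Return the row highlighted at `current`, or None if nothing is highlighted.

    `split` is "left-right" or "top-bottom", depending on the segment crossed.
    Releasing the finger while this returns None cancels the selection.
    """
    if math.dist(start, current) < HIGHLIGHT_DIST_PX:
        return None  # still near the start point: no highlight, release cancels
    x, y = current
    if split == "left-right":
        return pair[0] if x < SCREEN_PX / 2 else pair[1]
    return pair[0] if y < SCREEN_PX / 2 else pair[1]


pair = ("あ", "か")
print(highlighted_row((5, 200), (100, 100), pair, "top-bottom"))  # あ
print(highlighted_row((5, 200), (100, 250), pair, "top-bottom"))  # か
print(highlighted_row((5, 200), (15, 205), pair, "top-bottom"))   # None
```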

Fig. 2.

Screen display while entering a Hiragana character. (a) First, select one segment by crossing it with a slide-in. (b) Since the names of two rows are displayed, move your finger onto one of them. (c) After the background turns green, release your finger from the screen to select the row. (d) Tap one of the five characters in the row to enter it. To enter a voiced, semi-voiced, or lowercase character, first change the display by tapping the lower right button, then tap the character. (Color figure online)

If you do not see the name of the Gyou you want to select, you are selecting the wrong segment. You can cancel the selection by returning your finger near the starting point of the slide-in, even after the area turns green. Since the Gyou is selected by the position where you release your finger, you can change your selection at any time until you release it. The user interface of our system is thus designed to reduce the cost of correcting operation errors.

When the group is selected, the screen is divided into 3 × 3 blocks, as shown in Fig. 2(d). The characters of the selected group are displayed in five of the blocks. Tapping one of the characters enters it, after which the system returns to the stand-by state and displays Fig. 2(a) again. The lower left block holds a button that cancels the selection and returns to the stand-by state.
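Hit-testing a tap against the 3 × 3 grid can be sketched as below. The text fixes the lower left block (cancel) and, in the description of the modifier buttons, the bottom center and lower right buttons. Which five of the remaining six blocks hold the characters is not stated, so placing them in reading order in the upper two rows is an assumption, as are the action labels.

```python
SCREEN_PX = 320  # square screen, as on the ZenWatch 2 used in the experiments
N = 3            # 3 x 3 blocks


def block_at(x, y):
    """Return the (row, column) of the block containing a tap."""
    col = min(int(x * N // SCREEN_PX), N - 1)
    row = min(int(y * N // SCREEN_PX), N - 1)
    return row, col


def build_grid(members):
    """Place up to five characters in the upper two rows (assumed order)."""
    upper = [(r, c) for r in range(N - 1) for c in range(N)]
    grid = dict(zip(upper, members))
    grid[(2, 0)] = "CANCEL"          # return to the stand-by state
    grid[(2, 1)] = "ADD_LOWERCASE"   # bottom center modifier button
    grid[(2, 2)] = "TOGGLE_VARIANT"  # lower right modifier button
    return grid


def tap(grid, x, y):
    """Return the character or action under a tap, or None for an empty block."""
    return grid.get(block_at(x, y))


grid = build_grid("かきくけこ")
print(tap(grid, 10, 10), tap(grid, 170, 150), tap(grid, 10, 300))  # か こ CANCEL
```

Each block is about a third of the 29-mm screen side, which is why the tap targets here are far larger than the 2-mm edge segments.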

The button at the lower right corner changes the displayed characters to voiced, semi-voiced, and lowercase characters. Since this operation cycles, repeated presses step through the variants and finally return the characters to the original ones. Tapping the button at the bottom center displays characters with a lowercase character added, such as “”, “”, and “”. Each of these is a “” Dan character followed by a lowercase “” Gyou character. These two buttons can be used at the same time.
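The cycling of the lower right button can be sketched with Unicode normalization: appending the combining voiced (U+3099) or semi-voiced (U+309A) sound mark and applying NFC yields the precomposed character when one exists. The order base, voiced, semi-voiced, lowercase, and skipping states that change nothing in the row, are assumptions about how the display cycles; the lowercase table is written by hand.

```python
import unicodedata

DAKUTEN = "\u3099"     # combining voiced sound mark
HANDAKUTEN = "\u309A"  # combining semi-voiced sound mark
# Hiragana that have a lowercase (small) form.
SMALL = dict(zip("あいうえおつやゆよわ", "ぁぃぅぇぉっゃゅょゎ"))


def with_mark(ch, mark):
    """Return the precomposed marked character, or ch if none exists."""
    composed = unicodedata.normalize("NFC", ch + mark)
    return composed if len(composed) == 1 else ch


def row_states(members):
    """Display states visited by repeated presses, starting from the base row."""
    states = [members]
    for variant in ("".join(with_mark(c, DAKUTEN) for c in members),
                    "".join(with_mark(c, HANDAKUTEN) for c in members),
                    "".join(SMALL.get(c, c) for c in members)):
        if variant != members:  # skip states that would change nothing
            states.append(variant)
    return states


def press(states, current):
    """Advance one state; after the last one, wrap back to the base row."""
    return (current + 1) % len(states)


print(row_states("はひふへほ"))  # ['はひふへほ', 'ばびぶべぼ', 'ぱぴぷぺぽ']
```

For the た row the same function yields the base row, the voiced row, and a row in which only つ becomes っ, so a single button covers all three kinds of modification.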

Since characters displayed on a smartwatch screen are small, input errors often occur with methods that attach the voiced or semi-voiced mark to characters already entered. In our system, these characters are selected from enlarged characters, which reduces errors between similarly shaped characters such as “” and “”.

3.3 Entering Alphanumeric Characters

The alphabet characters are divided into groups of five in alphabetical order, giving six groups. The last group contains z, Caps Lock (CL), period, and comma. As with Hiragana, two groups are assigned to one segment. First, make a slide-in crossing a segment. Figure 3(a) shows the display when the slide-in passes through the left top segment, where the last two groups are indicated. One of the two groups is then selected by the release position. Figure 3(b) shows the display after the u–y group is chosen: the five characters from “u” to “y” are displayed in large letters on five buttons, and tapping one of them enters it. The lowercase characters are changed to capital letters by tapping the button at the bottom right. Because this button toggles, tapping it again returns the characters to lowercase.
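The alphabet grouping and its assignment to segments can be sketched as follows. The group contents come from the text; pairing consecutive groups onto the lower left, upper left, and left top segments matches the description of Fig. 3(a), in which the left top segment shows the last two groups. The segment labels and function names are hypothetical.

```python
import string


def alphabet_groups():
    """Five letters per group in alphabetical order; z shares the last group."""
    letters = string.ascii_lowercase
    groups = [list(letters[i:i + 5]) for i in range(0, 25, 5)]  # a-e ... u-y
    groups.append(["z", "CL", ".", ","])                         # CL = Caps Lock
    return groups


def segment_pairs(groups):
    """Assign two consecutive groups to each of the three alphabet segments."""
    names = ["left-lower", "left-upper", "top-left"]
    return {name: (groups[2 * i], groups[2 * i + 1]) for i, name in enumerate(names)}


def displayed(group, caps):
    """Labels shown on the buttons; the bottom-right button toggles `caps`."""
    return [s.upper() if caps and len(s) == 1 and s.isalpha() else s for s in group]


groups = alphabet_groups()
print(len(groups))                           # 6
print(segment_pairs(groups)["top-left"][0])  # ['u', 'v', 'w', 'x', 'y']
print(displayed(groups[4], caps=True))       # ['U', 'V', 'W', 'X', 'Y']
```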

Fig. 3.

Screen display while entering alphabet characters. (a) Display after a slide-in crossing the left top segment; the last two groups are indicated. (b) Display after the u–y group is chosen; the five characters from “u” to “y” are shown on large buttons. (c) Display after tapping the bottom center button; all six groups and their characters are shown.

We expected that Japanese users would make more mistakes in selecting alphabet groups than Hiragana groups, because the alphabet groups are not commonly known. Therefore, a function that shows all groups is allocated to the button at the bottom center to reduce the cost of correcting such mistakes. Tapping this button displays Fig. 3(c), in which all six groups and all of their characters are shown. Tapping one of the groups displays its members. For novices, this is easier than cancelling and then selecting the correct segment.

Numerals are assigned to the upper right segment. They are divided into two groups, 0 to 4 and 5 to 9, and entered in the same way as alphabets. However, on the number selection interface, the capital/lowercase button is replaced by a button that switches between 0–4 and 5–9.

3.4 Entering Symbols

Symbols are assigned to the lower right segment in both the Hiragana mode and the alphanumeric mode; however, single-byte symbols are entered in the alphanumeric mode and double-byte symbols in the Hiragana mode. Symbols are also divided into groups of five. When symbols are selected, the first group is displayed. The button at the lower right moves to the next group, and the button at the bottom center moves back to the previous group. You switch groups with these buttons until the symbol you want is displayed, and then tap it. Since we expected that memorizing the grouping of symbols would require long training, we chose this simple approach.
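Paging through the symbol groups can be sketched as a small class. The symbol set is a hypothetical placeholder, not the system's actual symbol list, and wrapping around at either end is an assumption; the text states only that the first group is shown initially and that buttons move between groups.

```python
class SymbolPager:
    """Pages through symbols in groups of five (hypothetical sketch)."""

    def __init__(self, symbols, size=5):
        self.groups = [symbols[i:i + size] for i in range(0, len(symbols), size)]
        self.index = 0  # the first group is shown when symbols are selected

    @property
    def current(self):
        return self.groups[self.index]

    def next_group(self):
        """Move to the next group, wrapping to the first one (assumed)."""
        self.index = (self.index + 1) % len(self.groups)
        return self.current

    def previous_group(self):
        """Move back one group, wrapping to the last one (assumed)."""
        self.index = (self.index - 1) % len(self.groups)
        return self.current


pager = SymbolPager("!?.,:;()[]@#&*")  # placeholder symbol set
print(pager.current, pager.next_group(), pager.next_group())  # !?.,: ;()[] @#&*
```

No grouping has to be memorized: the user only pages until the wanted symbol appears, at the cost of a few extra taps for symbols in later groups.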

4 Input Speed and Error Rate of Beginners

Smartwatches are generally used along with smartphones. In this case, smartwatches are not used for entering long sentences, and each period of use is short. Therefore, we consider the input speed of beginners to be more important than that of skilled users.

Five university students who had never used our system or any similar system were chosen as beginners, and their input speed and error rate were measured. The experiment was performed with a ZenWatch 2 WI501Q (1.63-in. screen, 320 × 320 pixels). For comparison, the same experiment was performed with an expert.

4.1 Experimental Method

Each test subject wore the smartwatch on the non-dominant arm. Subjects sat on a chair and operated the smartwatch with their arms raised horizontally in front of their chests. Since we did not specify which finger to use, each subject used his or her preferred finger. Before the experiment, we gave a 5-min presentation on the procedure for entering characters with our system, the meaning of the display, the experiment sequence, and some precautions. We asked the subjects to enter characters as accurately as possible.

In the experiment, a trial of entering five Hiragana words is performed five times, with a 3-min break between trials. Each trial starts when the subject touches the screen. At this time, one Hiragana task word is displayed at the top of the screen. The characters entered by the subject are shown underlined just below the task word. After entering all characters of the task word, the subject taps the ENT key. (ENT is accepted even if the entered characters do not match the task word.) This completes the input of one word, and the next word is displayed. When five words have been entered, the trial is complete.

The task words were selected from the word list of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) [19] of the National Institute for Japanese Language and Linguistics. From each of the 4-, 5-, and 6-letter nouns in Hiragana notation, excluding numerals and quantifiers, 100 words were extracted in descending order of frequency. Words that are the same in Hiragana notation but different in Kanji notation were then merged, and the frequency of a merged word is the sum of the frequencies of the original words. Finally, the top 50 words were chosen for each of the 4-, 5-, and 6-letter lengths.

A total of 150 words were gathered into one list, which was shuffled randomly for each test subject. The list was then divided into sets of 5 words from the beginning, and each set was used in only one trial. Therefore, no subject ever entered the same word twice, and each subject received different sets of words.
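The word-selection and set-splitting procedure can be sketched as below. Each corpus entry is assumed to be a tuple of (Kanji notation, Hiragana reading, frequency); the tiny entry list is a made-up illustration, not BCCWJ data, and the function names and optional random seed are hypothetical. In the experiment, `select_task_words` would be applied separately to the 4-, 5-, and 6-letter nouns and the three results concatenated.

```python
import random
from collections import defaultdict


def select_task_words(entries, n_extract=100, n_keep=50):
    """Take the most frequent entries, merge same readings, keep the top words."""
    top = sorted(entries, key=lambda e: e[2], reverse=True)[:n_extract]
    merged = defaultdict(int)
    for _kanji, reading, freq in top:
        merged[reading] += freq  # same Hiragana, different Kanji: sum frequencies
    return sorted(merged, key=merged.get, reverse=True)[:n_keep]


def make_trial_sets(words, set_size=5, seed=None):
    """Shuffle the list for one subject and cut it into sets of five words."""
    rng = random.Random(seed)
    shuffled = list(words)
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


# Made-up entries for illustration only (not BCCWJ frequencies).
entries = [("橋", "はし", 5), ("箸", "はし", 4), ("端", "はし", 3),
           ("雨", "あめ", 10), ("飴", "あめ", 1), ("空", "そら", 7)]
print(select_task_words(entries, n_extract=6, n_keep=2))  # ['はし', 'あめ']
```

Merging by reading matters because the subjects type Hiragana: 橋, 箸, and 端 are one task word from the input method's point of view.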

4.2 Experimental Result

The input time for one word is the difference between the time when the word was displayed and the time when ENT was tapped. The input time of one trial is the sum of the input times of its five words. The input speed, measured in CPM (characters per minute), is computed by dividing the number of characters entered in the trial by the input time of the trial in minutes.
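The speed computation can be written out directly. The record format (display time and ENT time in seconds, plus the number of characters entered) is an assumption for this sketch.

```python
def trial_cpm(words):
    """Input speed of one trial in characters per minute.

    `words` holds one (display_time_s, enter_time_s, n_chars_entered) tuple
    per task word.
    """
    total_chars = sum(n for _, _, n in words)
    total_s = sum(enter - shown for shown, enter, _ in words)
    return total_chars / (total_s / 60.0)


# Five words of five characters, each entered in 15 s: 25 characters in 75 s.
print(trial_cpm([(0.0, 15.0, 5)] * 5))  # 20.0
# Seconds per character at the expert's 67.7 CPM:
print(round(60 / 67.7, 2))              # 0.89
```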

Figure 4 shows the average input speed of the five beginners. In the first trial, the input speed was about 20.6 [CPM], achieved without any training. The speed increased as the trials progressed and reached 28.7 [CPM] in the fifth trial. This speed is sufficient to write a short sentence, such as a reply to an e-mail, within one minute, which could be faster than taking a smartphone out of a pocket and writing the reply on it. Since the total operation time of the 5 trials was 320 s on average, we think the system can be used practically after about 5 min of training.

Fig. 4.

Average input speed of beginners. In each trial, 5 words (25 characters on average) were entered. The input speeds of the 5 test subjects were averaged for each trial. The vertical bars show the standard deviation. The first trial was performed without any practice. The average total time spent on the 5 trials was 320 s.

The average speed of an expert over 5 trials was 67.7 [CPM]. This means that a user familiar with the system can enter one character in under one second. Since the standard deviation between trials was 2.74 [CPM], the input speed is stable.

Figure 5 shows the average Total Error Rate (TER) [20] of the beginners. TER is calculated by dividing the sum of the number of wrong characters and the number of corrected characters by the total number of characters in the five task words. Wrong characters are characters that remain incorrect in the final input. Corrected characters are characters that were entered by mistake and then deleted.
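A minimal sketch of the TER computation as defined above. Counting the wrong characters requires aligning each final input with its task word; using the minimum string distance (Levenshtein distance) for that alignment is an assumption of this sketch, and the number of corrected (deleted) characters is assumed to be logged by the application.

```python
def min_string_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def total_error_rate(task_words, final_inputs, corrected_counts):
    """(wrong characters + corrected characters) / characters in the task words."""
    wrong = sum(min_string_distance(t, s) for t, s in zip(task_words, final_inputs))
    corrected = sum(corrected_counts)
    return (wrong + corrected) / sum(len(t) for t in task_words)


print(total_error_rate(["さくら", "あたま"], ["さから", "あたま"], [2, 0]))  # 0.5
```

In the example, one character remains wrong and two were corrected, against six task characters in total.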

Fig. 5.

Average total error rate (TER) of beginners. The errors include both characters entered incorrectly and characters corrected.

The TERs of the first and second trials are over 10% because the subjects were not yet familiar with the operations for entering characters. In the second trial, the TER was higher than in the first trial, but the input speed also increased. A possible reason is that the subjects moved their fingers faster: if finger movement becomes faster while the gestures are not yet reliable, both the error rate and the input speed increase. From the third trial on, the TER decreased below 10%, and in the fifth trial it was 4.7%. Comparing the two graphs, one reason for the speed-up from the third to the fifth trial is considered to be the reduction in input errors.

5 Input Speed During One Month

As a simulation of smartwatch usage, we conducted an experiment in which 10 words were entered once a day for one month. The words were prepared in the same way as in the experiment for beginners, except that the number of 4-, 5-, and 6-letter words was increased to 150 each. Therefore, each word is entered only once in this experiment as well. On average, 50 characters were entered every day, which corresponds to several short e-mail replies.

This experiment was conducted with one university student, a complete novice who had not participated in the previous experiment. The time of day of each trial varied depending on the subject’s circumstances. The result is shown in Fig. 6. The input speed in the first trial is a little faster than the average of the beginners but within the range of their standard deviation. The speed increased steadily and exceeded 50 [CPM] on the last day. Although the input speed fluctuates from day to day, the fitted curve is still rising on the last day, so the input speed is expected to increase further. The time spent on the task was less than 2 min per day, and a little over one minute in the second half of the experimental period. This result shows that even short periods of operation improve the skill. Because the total operation time over the 30 days was only 39.5 min, we can say that the system is learned quickly.

Fig. 6.

Input speed in the one-month experiment. In this experiment, 10 words (50 characters on average) were entered every day. The total input time over the month was 39.5 min. Even with short usage of less than 2 min per day, the input speed improved.

6 Comparison with Related Methods

6.1 Screen Occupation

We compared the area available for text display in our method with that of competing methods: Google Japanese Input, 5-TILES, and the TouchOne keyboard. Google Japanese Input is the flick keyboard for Android Wear provided by Google. 5-TILES is known as a method with low screen occupancy. The TouchOne keyboard allocates keys on the edge of the screen, as our method does.

We displayed each method on the ZenWatch 2 after setting the background color to cyan; the screenshots are shown in Fig. 7. In each image, we counted the number of pixels with the background color. Since the remaining pixels are used by the input system to display its user interface, the screen occupancy of each system is computed as the proportion of the remaining pixels to all pixels of the screen.
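The pixel-counting procedure can be sketched as follows. The exact cyan value (0, 255, 255) and the requirement of an exact color match are assumptions; anti-aliased screenshots might need a color tolerance. The function accepts any flat sequence of RGB tuples, so, for example, the pixel data of an RGB image loaded with Pillow (`Image.getdata()`) could be passed to it.

```python
CYAN = (0, 255, 255)  # assumed background color


def screen_occupation_rate(pixels, background=CYAN):
    """Fraction of pixels NOT showing the background (used by the input UI)."""
    pixels = list(pixels)
    background_count = sum(1 for p in pixels if p == background)
    return (len(pixels) - background_count) / len(pixels)


# Toy 4 x 4 screenshot: 12 cyan pixels and 4 keyboard pixels.
toy = [CYAN] * 12 + [(40, 40, 40)] * 4
print(screen_occupation_rate(toy))  # 0.25
```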

Fig. 7.

Comparison of screen occupation. In the images above, the area where sentences can be displayed is colored cyan. The larger the cyan area, the lower the screen occupation rate (SOR). The SOR of each method is shown under its image. Our method has the lowest SOR, although that of 5-TILES is almost the same. (Color figure online)

The screen occupation rate (SOR) of each method is shown under the images in Fig. 7. Our method has the lowest SOR, but that of 5-TILES is almost the same. The SOR of Google Japanese Input is very high: only a small area, less than 10% of the screen, can be used for displaying text. The SOR of TouchOne is also high. One reason is that the method uses an edge area more than 3 mm wide; because character input with TouchOne starts by touching the edge area, this width is needed. In contrast, our slide-in requires a width of only 2 mm, which is an advantage of our method. In addition, TouchOne uses the bottom of the screen exclusively for mode selection, whereas our method allocates one segment of the screen edge to that function, so no extra area is needed.

6.2 Input Speed

In terms of screen occupancy, our system and 5-TILES were almost equal. Therefore, we compared the character input speed of the two methods. Prior to the experiment, each subject practiced both methods for a sufficient number of days, until he judged himself to be accustomed to them.

The experiment was performed as follows. Each subject performs one trial per day with each method, with an interval of at least one hour between the two trials. These trials are repeated for 5 days, and the method tested first is chosen randomly each day. In each trial, the subject enters each of 45 Hiragana letters, excluding “”, once. One Hiragana character is displayed on the screen as the test letter, and the subject enters it below the test letter. If the entered character is wrong, the entry is not accepted, so the subject must correct it. When the correct character is entered, the next test letter is shown. When all 45 characters have been entered correctly, the trial is finished.

The average input speed over the 5 days is shown in Table 1. For all three subjects, our method was faster than 5-TILES. The speed advantage was confirmed by a t-test at a significance level of 5%.

Table 1. Comparison of our method and 5-TILES in input speed. [CPM] is the average input speed in characters per minute (CPM). SD is the standard deviation of each subject over the 5 trials. Our method is faster than 5-TILES.

With 5-TILES, one alphabet character is entered either by one stroke connecting two keys or by one tap. Because Hiragana is entered by transliterating it into the Roman alphabet, entering one Hiragana character other than a vowel requires two alphabet characters. Therefore, 85 operations (65 strokes + 20 taps) are required in total in each trial. On the other hand, our method needs one slide-in and one tap per Hiragana character, so 90 operations (45 slide-ins + 45 taps) are required in each trial. In terms of the number of operations, the two methods are almost equal.
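The operation counts follow from the assumptions stated above: of the 45 test characters, the 5 vowels need one Roman letter each and the other 40 need two, with one operation per Roman letter, while our method always needs one slide-in and one tap per character.

$$
N_{\text{5-TILES}} = 5 \times 1 + 40 \times 2 = 85, \qquad
N_{\text{ours}} = 45 \times (1 + 1) = 90
$$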

However, strokes in 5-TILES require exact pointing, because each must start and end on a key only about 5 mm wide, and taps require the same accuracy. In contrast, our slide-in accepts rough positioning, because it only needs to cross a segment about 12 mm long and end in a 12 × 20 mm area, and the keys to be tapped are 9 × 9 mm. We think that one of the reasons for the difference in input speed is the difference in the accuracy required by each operation.

Table 2 shows the Corrected Error Rate (Cerr) of the three subjects. Since in this experiment an entry is not accepted if the character is wrong, the Total Error Rate (TER) equals Cerr. The Cerr of our system was under 5% for all subjects, less than half that of 5-TILES. This is another reason for the difference in input speed. We think the input errors of 5-TILES increased because of the fat finger problem: since the keys of 5-TILES are much smaller than a fingertip, the target key is completely hidden by the finger. Therefore, many errors are expected to occur, especially at the end of strokes.

Table 2. Comparison of our method and 5-TILES in error rate. Cerr is the average corrected error rate over the five trials for each subject. SD is the standard deviation of each subject over the 5 trials. The Cerr of our method is less than half that of 5-TILES.

7 Conclusion

We proposed a character input method for smartwatches using slide-in. Because a slide-in always crosses the segments allocated on the screen edge, the width of the segments can be reduced to 2 mm. As a result, the screen occupancy of our method is 26.4% on a 1.63-in. screen, leaving a larger area for character display than the competing methods.

In the experiment with novices, the input speed at the beginning of use was 20.6 [CPM] on average over 5 subjects, and it increased to 28.7 [CPM] in the last of the 5 trials, during which the total operation time was about 5 min 20 s and 125 characters were entered. The error rate was over 10% at first, but it decreased from the third trial and finally became 4.7%. From this result, we can say that the subjects were able to get used to the system in a short time. In our system, the screen display changes with each finger movement, which helps beginners choose operations. Large buttons of 9 × 9 mm are effective in reducing touch mistakes.

In the experiment in which 10 words were entered per day for 30 days, the input speed exceeded 50 [CPM] on the last day, which is about 70% of the expert’s speed. The total operation time in this test was 39.5 min. This suggests that users can acquire the skill even with short, fragmentary usage.