1 Introduction

Images offer a very rich way of representing information about our world. This is true for the basic two-dimensional monochrome projection image of a classical photograph, an artistic drawing or a painting. It is also true for film and video, which add the temporal dimension; for color images, which add a spectral dimension that generalizes to multi- and hyperspectral images; and for 3D images, which register information about the third spatial dimension through stereo, depth scanning, tomography or holography. We can talk about 2D, 3D, 4D or 5D images, referring to how many of the spatial, temporal and spectral dimensions are represented in the image. In the rest of this paper I will simply use the term “image” to refer to image data of any number of dimensions.
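
As a concrete illustration, the minimal numpy sketch below shows one common way such images are laid out as arrays; the axis ordering and sizes are my own assumptions, since conventions vary between libraries.

```python
import numpy as np

# One common way to lay out images of increasing dimensionality as
# arrays (axis ordering and sizes are assumptions; conventions vary).
photo  = np.zeros((256, 256), dtype=np.uint8)         # 2D monochrome still
color  = np.zeros((256, 256, 3), dtype=np.uint8)      # + spectral: RGB
video  = np.zeros((25, 256, 256, 3), dtype=np.uint8)  # + temporal: 25 frames
volume = np.zeros((64, 256, 256), dtype=np.uint8)     # + depth: stacked slices

for name, a in [("photo", photo), ("color", color),
                ("video", video), ("volume", volume)]:
    print(f"{name}: shape {a.shape}, {a.nbytes / 1024:.0f} kB")
```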

One reason why images are so important is that the highest-capacity channel for getting information into our brains is our visual system. Images would be almost meaningless if no one were there to see them. And it is not only we humans who make great use of our visual systems. There are theories claiming that the “invention” of visual systems by evolution caused the “Cambrian explosion”, the first major dramatic increase in the number and diversity of species [2]. Through the development of image analysis and computer vision, the entity “seeing” the image may now be a machine.

Images existed long before computers were invented, but it is not surprising that images have become a very important data structure in computers. It is likely that images form the greatest part of all data handled by all the computers in the world. We take it for granted today that any computer can handle images, even with high spatial and temporal resolution and in full color. But that has not always been the case. As one of the long-standing conference series in the field, the Scandinavian Conference on Image Analysis, SCIA, celebrates the milestone of its 20th conference, it may be worth reflecting on what has been driving the development of computer capacity for handling images. That is the topic of this paper.

Computers can be used to handle images in two fundamentally different ways.

  1. We may have some data that we want to turn into an image so that we can use our visual system to perceive, enjoy or understand the data. We have the fields of computer graphics and visualization. This can be seen as a forward problem: data to image.

  2. We may load an image of some part of the world into the computer and try to analyse it, either to extract some useful quantitative data or to reach some kind of understanding of what is in the image. We have the fields of image analysis, image understanding and computer vision. This is an inverse problem: image to data. From the very complex representation in an image we need to find the underlying information of interest. Like most inverse problems it is underdetermined; there may be many possible solutions, and we need models to choose the best one.

We may combine the two and use visualization of real world images to interactively analyse the images and extract more meaningful data from them than could be done without computer support.

2 The Early History

Working memory and storage capacity were very limited and expensive resources in early computers, and images are heavy data structures. Even a very low resolution image, say \(256\times 256\) pixels at 8-bit greyscale, occupies 64 kB, typically the whole working memory of a research computer of the early 1970s, or of the first generation of personal computers appearing towards the end of that decade. Since a whole image could not be stored in memory, there were no display units that could visualize it. The only output units generally available for the early computers were alphanumeric printers and, a bit later, alphanumeric screens.
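
The arithmetic behind these figures is easy to verify; a minimal sketch:

```python
# Memory footprint of a low resolution greyscale image (worked example).
width, height, bits_per_pixel = 256, 256, 8
n_bytes = width * height * bits_per_pixel // 8
print(n_bytes, "bytes =", n_bytes // 1024, "kB")  # 65536 bytes = 64 kB
```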

The first digital images were obtained by scanning a physical picture or an optical image of some part of the world pixel by pixel, reading the intensity into computer memory. Application areas were mainly remote sensing (mostly military) and medicine. There was also research trying to develop more general computer vision.

2.1 Remote Sensing

The first (civilian) remote sensing satellite, Landsat 1, was launched in 1972. It transmitted images from three different sensors down to earth. The highest resolution sensor, MSS, had pixels of \(68\times 83\) meters over areas covering \(185\times 185\) km, with four spectral bands [1]. This meant that the images were \(2720\times 2228\) pixels, or about 6 megapixels per band. That was far beyond the primary memory capacity of any computer in those days; it was even more than many computers had as secondary storage. So the images were written to photographic film and analysed visually, in the same way as aerial photographs had been analysed since the First World War. But the availability of these images also triggered research on computerized image analysis, although that had to be done without being able to see the digital images that were being analysed.
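
The image dimensions quoted above follow directly from the scene coverage and the pixel footprint; a minimal sketch of the arithmetic:

```python
# Landsat 1 MSS geometry: 185 x 185 km scenes, 68 x 83 m pixels, 4 bands.
scene_m = 185_000
cols = scene_m // 68   # ~2720 pixels across track
rows = scene_m // 83   # ~2228 pixels along track
per_band = cols * rows
print(f"{cols} x {rows} = {per_band / 1e6:.1f} Mpixels per band,"
      f" {4 * per_band / 1e6:.0f} M samples over 4 bands")
```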

The founder of the Swedish IAPR section SSAB and one of the key persons behind the SCIA conference series was Thorleif Orhaug, who was head of the research unit at the Swedish Defence Research Institute developing remote sensing techniques. That group also had a drum scanner that could read photographs and produce high resolution digital images, and write high resolution images to film or photographic paper. This was, to my knowledge, the first high resolution digital image input-output device in Scandinavia. So it was possible to input and output digital images of high resolution, but no one could see the images while they were in the computer, and thus could not interact with them in any efficient manner. Progress in actually being able to analyse the satellite images with computers in useful ways was very slow.

2.2 Computed X-Ray Images

In the early 1970s Cormack and Hounsfield invented computed tomography, CT [3]. This was the first time useful images were the result of computations, not only direct measurements. It revolutionized medicine by providing images of cross sections of the body rather than projections through the body. The first tomographic images were very small, some tens of kilobytes, but there were still no computers around that could store and display them in a useful way, so the images were written to film and handled like all other medical X-ray images. Even though the first CT systems only provided single, or a few, images, this was the first step towards truly volumetric images, since several CT slices stacked on top of each other form a 3D volume image. The 3D reconstruction had to happen in the head of the radiologist, based on a number of 2D slices displayed as a mosaic on X-ray film. Most early image processing research in the radiology field was focused on creating better images faster, not on doing any computerized analysis of the images.

2.3 Automated Cell Image Analysis

Long before the advent of computed tomography, already in the 1950s, a system for cell image analysis for automated early detection of cervical cancer was developed in the US. The background was that Papanicolaou a decade earlier had shown that precancerous lesions could be detected by visual inspection of cell smears. One distinguishing feature was that cancer cell nuclei were bigger than those of normal cells [4]. But the visual screening was tedious and expensive. So a system was developed that scanned the microscopy samples and “looked” for nuclei larger than 10 microns in diameter. The system was based on hard-wired analogue video processing circuits, since there were no useful computers in the 1950s [5].
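
In modern terms the screening criterion amounts to thresholding followed by size filtering of connected components. A minimal sketch in Python with scipy (all names and parameter values hypothetical, and far simpler than the robust segmentation the later systems required):

```python
import numpy as np
from scipy import ndimage

def flag_large_nuclei(grey, pixel_size_um, intensity_threshold,
                      min_diameter_um=10.0):
    """Flag dark blobs whose equivalent diameter exceeds ~10 microns."""
    nuclei = grey < intensity_threshold        # dark nuclei, bright background
    labels, n = ndimage.label(nuclei)          # connected components
    areas = ndimage.sum(nuclei, labels, range(1, n + 1))  # pixels per blob
    # pixel area of a disc with the minimum diameter
    min_area = np.pi * (min_diameter_um / (2.0 * pixel_size_um)) ** 2
    return [i + 1 for i, area in enumerate(areas) if area > min_area]
```

Note that two overlapping normal nuclei also merge into one large component under this criterion, which is exactly the failure mode described next.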

The system was a failure, since it could not tell the difference between a cancer cell and two normal cells lying on top of each other. But as soon as computers became available in the 1960s and 1970s, research projects started to develop image analysis systems for the same purpose. Those systems used image processing to detect more features than the nuclear diameter, such as shape and nuclear texture. They also needed to do robust automated segmentation of the cells. The development was initially done with the line printer of the computer as the only tool for displaying the cell images. A cell image was typically around \(128\times 128\) pixels and could be processed in the computer memory. To screen a whole smear, many thousand image fields needed to be analysed, and several slide scanners were developed to achieve that [6]. All the image analysis had to be done on the fly, since there was no way of storing the hundreds of megabytes that comprised a specimen. Around 30 years later, memory capacities had caught up with the need, and slide scanners that actually stored the images were developed as the first step towards digital pathology, a very active field today.

2.4 Early Computer Vision

In addition to the applied work on remote sensing and medical applications, there were also early attempts at using the first generations of computers for developing computer vision systems. In 1966, Seymour Papert at MIT wrote a proposal for building a vision system as a summer project [7]. The abstract of the proposal opens by stating a simple goal: “The summer vision project is an attempt to use our summer workers effectively in the construction of a significant part of a visual system”. The difficulty of creating a vision system was somewhat underestimated at that time.

2.5 The Birth of Computer Graphics

The first generations of computers generally operated in batch mode. Piles of punched cards containing both programs and data were handed in at a computer center, and a few hours later a printed list with results (or, more often, error listings) could be collected. In 1962 Sutherland presented Sketchpad, a computer drawing system that was the first graphical user interface [8]. This was truly pioneering work, but there were no products available that made it possible for people to use the graphical interface. The IBM 2250 Graphics Display Unit was announced in 1964. It cost around USD 280,000 in 1970, equivalent to around USD 2 million in 2017 currency, so it was not widely available [9]. The system was vector oriented and could draw bright lines on a darker background on a display area of around \(30\times 30\) cm with \(1\mathrm{k}\times 1\mathrm{k}\) resolution. The Tektronix 4010 series, which appeared in the early 1970s, had the same resolution and drawing capability but was based on storage tube technology, which meant that graphics could only be erased by flashing the entire screen, erasing everything, after which a new graphical image could be written. Since the price tag was almost 100 times lower, around USD 3,000, they became much more widely available. Based on the Tektronix 4010 and similar rather primitive graphical display devices it was possible to create the first interactive image analysis systems. Even this limited display capacity helped increase the effectiveness of algorithm development significantly.

2.6 The Need of Special Image Processing Hardware

The very limited memory and processing capacity of early computers led to much interest in special hardware architectures for image processing. An early noteworthy Scandinavian project was PICAP [10]. Internationally there were the GLOPR, Cytocomputer and CLIP projects [11]. They all in different ways exploited neighbourhood relations and the potential for parallel processing in images, usually through pipelining architectures, although in the CLIP case through a physically parallel architecture. Single or a few units were built, but the special architectures never caught on to become successful products.

Fig. 1. The number of papers presented on different topics at the SCIA conferences. Topics were judged from the titles of the papers; a paper may belong to several categories, but each has been assigned to one here based on an estimate of its dominant topic. The total number of papers presented at all SCIA conferences so far, including this year, is 2148.

3 Image Analysis Established as a Research Field - Early SCIA Conferences

3.1 First SCIA Linköping 1980

The first SCIA conference, which took place in January 1980, was clearly influenced by the need for increased processing power. There was for instance a paper about the work on PICAP II, a successor of the pioneering PICAP system but with a very different architecture. The need for interactive image analysis systems, and the possibility of creating such systems thanks to improved display facilities, led to a number of papers on image processing software systems. In total 17 papers dealt with system design, more than at any other SCIA conference. The main application fields discussed at the conference were those mentioned above: remote sensing and medical image analysis, in particular microscopy, as well as various industrial applications [12].

3.2 Second SCIA, Helsinki 1981 and Third, København 1983

At the second SCIA in June the following year (the organizers had realized it was easier to attract international colleagues to Scandinavia in June than in January), the great interest in developing image analysis systems was illustrated by a survey from VTT which had identified 123 such systems in the literature, many from the US, but with Scandinavia strongly overrepresented in relation to its population. A hardware architecture was presented for GOP, the general operator processor, which implemented hierarchical local processing with feedback between levels. It was similar to convolutional networks, but with hand-coded filter coefficients rather than coefficients defined by machine learning (see the sketch below). There was also a strong focus on the theoretical foundations of image analysis. K.S. Fu, one of the global pioneers in image analysis, gave a keynote lecture on syntactic methods in image analysis. J.I. Zuravlev from the Soviet Academy of Sciences gave a most theoretical lecture, spending an hour intensively writing equations on a blackboard, proving the solution to the pattern recognition problem gamma, QED. T. Kohonen presented the first results on self-organizing neural networks. The size of the conference and the profile of topics were very similar at SCIA 3 two years later [12].
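
The contrast between hand-coded and learned filter coefficients can be made concrete with a small sketch (the Sobel operator is my choice of illustration, not necessarily one of the GOP operators):

```python
import numpy as np
from scipy.signal import convolve2d

# A hand-coded local operator: the coefficients below (a Sobel edge
# detector) are designed by a person, not learned from data.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

image = np.random.rand(64, 64)                 # stand-in for a real image
edges = convolve2d(image, sobel_x, mode="same")
# A convolutional network applies the same kind of local operator, but
# with coefficients set by training rather than by hand.
```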

3.3 Image Processing Hardware Developments

During the following years a number of companies were founded, in particular in Linköping, based on the idea that image processing could serve a number of useful applications but that general purpose computer architectures were not really suited to processing image data and were far too slow. Imtec was based on the PICAP II architecture. It later split; one branch, Teragon, developed a first generation desktop publishing system for use at newspapers. The other branch, which retained the name Imtec, moved to Uppsala to develop medical image analysis based on the research that had been going on there since the early seventies. Sectra was formed initially to develop data coding for secure transmissions but soon also became active in the radiology image processing field. The GOP became the basis of Context Vision, a company that first addressed image analysis and computer vision in a very general way but later specialized in medical image enhancement.

These companies developed graphics subsystems capable of showing high resolution full color images. But they did not come cheaply: the \(1024\times 1280\) full color display system developed by Imtec for medical image display cost more than USD 10,000 in manufacturing cost for a single unit. The international companies specializing in computers for research, such as Digital Equipment, Sun and Silicon Graphics, also offered products with high quality display options for tens of thousands of dollars.

3.4 SCIA 4 Trondheim 1985 and 5 Stockholm 1987

At the 5th SCIA the Scandinavian commercial efforts were at their peak, with a large industrial exhibit and much optimism about image analysis finally having developed from an academic curiosity into mainstream commercial products relevant for many application fields. But the number of application papers presented was the lowest of all the SCIA conferences; the application focus had moved to the commercial efforts. Instead, papers dealing with computer vision problems, understanding 3D scenes and video sequences, started appearing at SCIA 4 and were the most common topics at SCIA 5 [12].

3.5 Computers Becoming a Consumer Product

In parallel to these developments, computers were moving out of the research labs and becoming a consumer product. The IBM PC was announced in August 1981, at about the same time as the second SCIA. It had rudimentary color graphics capability: the CGA-standard graphics card had 16 kB video memory and could display \(640\times 200\) binary monochrome graphics, or up to four colors at \(320\times 200\) resolution. The vendors realized that in order to sell computers to consumers they had to offer good graphics capabilities at an affordable price, and the potential mass market made it possible to develop such products. So during the 1980s there were several generations of improving graphics standards. Specialized brands such as the Amiga were first with good graphics performance, but towards the end of the decade standard PCs could also display images and videos with reasonable quality and performance.

So in the early 1990s we had affordable basic image display systems, and standard computers had reached a capacity for storing and processing images that made it possible to develop image analysis applications without special hardware. The special image processing hardware developments more or less disappeared. The exponential growth of computer capacity dubbed “Moore's law” had led to standard computers having the capacity the special architectures had had a decade earlier. The companies that had based their business idea on special image processing hardware had either disappeared or reformulated their business to be strongly application oriented using mainly standard hardware. The lesson learned was that image analysis is too narrow a field to support special hardware that keeps ahead of general purpose computing hardware.

3.6 SCIA 6 Oulu 1989, 7 Aalborg 1991 and 8 Tromsø 1993

The fact that general purpose computers had reached sufficient capacity to process images of useful size and resolution led to a boom in research in the field. The following three SCIA conferences were the largest of all, with attendances of around 250 persons. The relative number of papers on computer architectures and systems had decreased, since people no longer had to develop their own tools to be able to do research on image analysis. There were numerous computer vision papers dealing both with static 3D scenes and with motion problems. There were also many method oriented papers dealing with mathematical morphology, segmentation, feature extraction, object recognition and classification. As an example, H. Knutsson presented a paper about representation of local structure using tensors which has since been widely cited. The main application fields were still medicine and remote sensing, but document analysis was also discussed in many papers. Most applications were, however, in rather narrow fields and far from having general impact in society [12].

4 Computer Games Driving Graphics Developments

The development was different for the other, forward, aspect of processing images in computers. Computer games had found mass markets, and there were very strong economic motivations for developing powerful image display and realistic real time 3D rendering systems. Several vendors, e.g. ATI, 3Dfx and NVIDIA, were competing intensely, developing ever more powerful 3D rendering chips and display systems [13]. At this time the invention of the world wide web had turned the internet from a convenient way for researchers to communicate into a mainstream communication platform for multimedia, again needing great capacity for image handling and display, but not so much for image analysis.

4.1 SCIA 9 Uppsala 1995, 10 Lappeenranta 1997, 11 Kangerlussuaq 1999 and 12 Bergen 2001

Towards the end of the millennium the size of the SCIA conferences decreased steadily. In particular the number of computer vision related topics decreased; instead there were more application papers, reaching around a third of all papers. On the methods side, the number of papers dealing with recognition problems increased. During these years we also saw the first boost in the interest in neural networks: at the first conferences there had been a single paper on that topic, now there were between five and ten. But the interest died after the 12th conference, returning to single papers per conference [12, 14].

4.2 Image Analysis Lacking a “Killer Application”

So the situation was that image generation and display had a very strong impact on the general public, with no corresponding development for image analysis or computer vision. The markets were very different; there was no “killer application” like computer games for the image analysis field. The powerful display capabilities were of course useful when doing image analysis research, but the processing was done in standard software. Many image analysis applications were actually running slower than they had earlier, since the increasing computer capacity had made it possible to write image analysis algorithms in very high level systems such as Matlab and still get them executed at acceptable speeds, although much more slowly than the same algorithms would have run if implemented more efficiently.

4.3 SCIA 13 Göteborg 2003, 14 Joensuu 2005, 15 Aalborg 2007, 16 Oslo 2009

SCIA 13 saw a recovery in the number of accepted papers, almost reaching the numbers of the conferences ten years earlier. Application papers decreased in number, and instead we saw more papers on methods of various kinds; texture, segmentation, feature extraction and shape analysis were popular topics. The first presentations about local binary patterns, LBP, at a SCIA conference were given in 2003, the year after the groundbreaking publication in PAMI [15]. They covered an extension to larger neighbourhoods and a combination with neural networks. The LBP methods were later also applied to face detection and analysis, a popular topic at this time [16].
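
The basic LBP operator itself is simple enough to sketch in a few lines; a minimal Python version of the \(3\times 3\) variant (my own reimplementation following the idea in [15], not the authors' code):

```python
import numpy as np

def lbp_8neighbour(img):
    """Basic 3x3 local binary pattern: each interior pixel gets an 8-bit
    code, one bit per neighbour at least as bright as the centre."""
    c = img[1:-1, 1:-1]                     # centre pixels
    # neighbour offsets, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << bit)
    return codes
```

A histogram of the resulting codes, e.g. np.bincount(codes.ravel(), minlength=256), then serves as a texture descriptor.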

5 Image Processing to Benefit from Graphics Hardware

Around the turn of the millennium it became possible to do some programming of the graphical processing units. Initially this required quite difficult low level programming, but ten years ago the CUDA software development kit was made public, and later alternatives such as OpenCL offered further convenience and functionality. So the hardware developed for generating images became available for helping us analyse images. That is the second major way in which graphics has made a decisive difference for image analysis (the first one, remember, was when it became possible to display images on computer screens).
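
Today the same pattern is available at a much higher level. As a minimal illustration (assuming the CuPy library, a NumPy-compatible GPU array package that postdates the developments described here), a pixelwise operation can be moved to the GPU by changing only the array type:

```python
import numpy as np
import cupy as cp  # assumed dependency for this sketch

def contrast_stretch(img):
    # Pixelwise normalization: plain array arithmetic that executes on
    # the CPU for numpy arrays and as CUDA kernels for cupy arrays.
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo + 1e-9)

cpu = np.random.rand(4096, 4096).astype(np.float32)
gpu = cp.asarray(cpu)                       # copy to GPU memory
out = cp.asnumpy(contrast_stretch(gpu))     # compute on GPU, copy back
```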

The availability of high performance graphics processors that can be programmed to do general purpose parallel image processing operations at very high speed has been exploited in a large number of image analysis applications, significantly speeding up the processing. But the greatest impact has been on convolutional neural networks. As mentioned above, the concept of artificial neural networks is not new, but the interest had died due to a lack of successful applications. Then, a few years ago, something happened: we saw a leap in classification performance on well established, difficult computer vision tasks. The reason was the availability of huge numbers of images and the massive computing power of GPUs, which made it possible to train the networks much more extensively than before. Today deep convolutional neural networks are used for all kinds of image analysis tasks, often with impressive performance. So finally special hardware and software architectures are having an impact on image analysis. But that hardware was not developed for image analysis; it was developed for image synthesis and display for computer games.

5.1 SCIA 17 Ystad 2011, 18 Espoo 2013, 19 København 2015 and 20 Tromsø 2017

During the most recent SCIA conferences the focus has been on methods for analysing images and recognizing features and structures. More than half of the presented applications were from the medical field. The number of presented papers had been declining, and the 19th SCIA was the smallest of all so far. But fortunately this year's conference, the 20th SCIA, is breaking that trend: with 87 accepted papers we have to go back a decade to find a bigger SCIA conference, and it is in the middle size range among all the conferences. As expected, a record number of papers deal with neural networks: 17 papers, which is almost 20% of all papers at the conference, and also 20% of all neural network papers presented at any SCIA conference [17].

6 Conclusion and Future Work

So where do we stand today? The forward image processing, generating realistic images in real time, is seeing enormous markets; the computer game market is greater than the film market in economic terms. We are rapidly approaching the point where it is impossible to distinguish between computer generated characters and filmed real human actors. This is likely to have a huge impact on entertainment, in particular when it is combined with virtual reality so that the spectator can be immersed in the action in the artificially generated world.

The inverse problem, image analysis, is also finally finding some mass market applications. Most mobile phones and digital cameras today have face detection, helping to focus on the most important parts of the image, a common research topic at the SCIA conferences a decade ago. We can also do image search on the internet by showing an image to Google and receiving similar images in return; this too was a research problem discussed at SCIA a decade ago. But these applications do not have anywhere near the impact of the gaming industry.

6.1 Augmented Reality

We may see a future high impact image analysis application in augmented reality. If the system can understand the visual neighbourhood of the person taking part in a game, the virtual and real worlds can be joined in very powerful ways. We have already seen great public impact of primitive augmented reality in the Pokémon GO game, but there the integration with the world was based only on location, not vision. In addition to gaming, this technology could find use in significantly improved user interfaces to smart phones. If the phone can understand what you are saying and see where you are pointing, through an improved heads-up display that is comfortable enough to wear all day, we could get away from staring at and manipulating tiny screens.

6.2 Self-Driving Cars - a “Killer Application”

The “killer application” for image analysis that is beginning to appear is self-driving cars. Even though there are other sensors than passive imaging cameras, most of them generate image information that needs to be analysed in very strict real time. The car industry is big enough to motivate special hardware development for the needed image analysis capacity, so even if the hardware developed for games is currently having a major impact on the development of image analysis for self-driving cars, we will most likely see specially developed image analysis hardware solutions in the near future. Self-driving cars could transform our society in very profound ways, even helping to solve the climate crisis. But it is crucial that the image analysis works in a really reliable way; otherwise we may have a much too literal killer application.

6.3 Big Brother Is Watching

Another image analysis application that can have major social impact is identification of individuals based on general appearance, gait and facial features. We already have the image analysis performance to identify a person and to follow that person from one camera to another, if the cameras are set up to cover a large area such as a campus or a whole city. This can be of major use in monitoring areas for possible terrorism threats or other unauthorized behaviour. But it really is an implementation of “Big Brother” that is quite scary in its possibilities for misuse by authoritarian authorities.

6.4 Concluding Remarks

When the SCIA conferences started, computers were hardly capable of processing images with meaningful performance. The general development of microelectronics gave us exponential growth of computing capacity, millions of times larger now than at the time of the first SCIA. Consumer markets for graphics gave us even higher performance parallel processing units. For a long time I had the feeling that we were not at all living up to that capacity in creating applications with any greater impact on society. But now my conclusion is that we are at a point in time where image analysis is beginning to have real social impact. Perhaps at the 30th SCIA you will be able to look back at a research field that has had a major role in transforming society.