Description of study area
The city of Malindi, which is a major tourist destination as well as a significant local market town and small industrial center, is located on the Indian Ocean in Coast Province, Kenya (3°14'S latitude, 40°04'E longitude) and has a population of 81,000 people [24]. The climate is tropical. The long rains fall from April through June and the short rains from October through November, with precipitation varying from 750 to over 1,200 mm per year.
Malindi comprises commercial, undeveloped, farmed, and residential areas. Although most roads in Malindi are a mixture of sand and dirt, the city center is paved and has both covered and uncovered engineered drainage systems along main streets. The main roads used for travel and transport in and out of the city, as well as some roads along the beach, are also paved. Tourism, fishing, commercial trade and retail, and service professions are the major economic activities. Many residents also engage in small-scale farming for personal consumption and sale. Skilled labor accounts for less than 5% of the workforce in Malindi [24]. The informal economic sector comprises street vendors, sex workers, and tour guide services. According to the latest census, approximately 60% of the urban population has access to piped water, either in the home or at community taps.
Malaria parasite transmission occurs year-round on the coast of Kenya, although the incidence usually increases shortly after the onset of the rains [25, 26]. No malaria prevalence studies have been carried out within Malindi.
Sample frame development
Keating et al. (2003) described the sampling strategy as it was originally conceived and implemented [22]. Although the approach used in this study was fundamentally the same, changes to the design were made and are described herein. An existing base-map for Malindi was updated in September of 2002 using ArcView 3.3® and Garmin E-Trex® GPS receivers to include additional landmarks and points of reference throughout the study area. Latitude, longitude, and elevation data were collected for roads and reference points, and information on land-use type and the level of drainage was noted. The information contained in the 1999 census, existing town maps, and environmental descriptions from the field formed the basis for urban boundary demarcation in this study. ArcView 3.2® was used in 2001 to generate and overlay a series of 270 meter × 270 meter grid cells on the base-map [22]. The grid created in 2001 was also used in this study because data collection teams were already familiar with the boundaries of the existing grid cell locations relative to important landmarks. Two hundred and forty-eight grid cells fell within the urban study area and constituted the sampling frame for this study.
Stratifying Malindi into two categories involved assessing the level of drainage in each grid cell and assigning a value of 1 if the grid cell was well drained and 0 if poorly drained. A grid cell was classified as well drained if functional (e.g. clear of debris or vegetation at the time of observation) engineered drainage systems covered the majority of the grid cell and no standing water was visible, or if the grid cell was located on a slope and no standing water was visible. A grid cell was classified as poorly drained if it was located in a depression or valley and had either no drainage systems, or the drainage systems were blocked with debris or vegetation. Topographical and town maps were used to assist with this process. Although the same person characterized all grid cells in this study, no algorithm or coding system was used to standardize the process. Seventy-three grid cells were classified as well drained and 175 grid cells were classified as poorly drained. Twenty-five grid cells were selected from each stratum (n = 50). A systematic random sample with a random start was used to select grid cells. This ensured that the probability of selection was equal for all grid cells within a given stratum. The probability of selection was equal to 0.1429 (25/175) for the poorly drained stratum and 0.3425 (25/73) for the well-drained stratum. These numbers were used in the calculation of sampling weights. Figure 1 illustrates the randomly selected grid cells used in this study. The number of grid cells selected was a function of time and logistic feasibility. The boundaries of selected grid cells were located in the field using hand-held navigational units (GPS), a compass, and base-maps with landmarks, paths, and roads indicated. Latitude and longitude readings were taken at the corners and center of each selected grid cell to confirm the location and extent of grid cell boundaries.
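The selection step can be illustrated with a short Python sketch. It is not the procedure used in the field: the grid cell identifiers, the seed, and the fractional-interval form of systematic sampling are assumptions made for illustration. Only the stratum sizes (73 and 175) and the number of grid cells drawn per stratum (25) come from the text.

```python
import random

def systematic_sample(units, n, rng):
    """Draw n units with a fractional sampling interval and a random start."""
    N = len(units)
    interval = N / n                   # e.g. 175 / 25 = 7.0, 73 / 25 = 2.92
    start = rng.random() * interval    # random start in [0, interval)
    # Step through the list at a fixed interval; min() guards against
    # floating-point overshoot at the very end of the list.
    positions = [min(int(start + i * interval), N - 1) for i in range(n)]
    return [units[p] for p in positions]

rng = random.Random(2002)              # arbitrary seed, for reproducibility only

# Hypothetical grid cell identifiers for each stratum.
well_drained = [f"W{i:03d}" for i in range(73)]
poorly_drained = [f"P{i:03d}" for i in range(175)]

sample_well = systematic_sample(well_drained, 25, rng)
sample_poor = systematic_sample(poorly_drained, 25, rng)

# Every cell in a stratum has the same selection probability n / N.
p_well = 25 / len(well_drained)        # 25 / 73
p_poor = 25 / len(poorly_drained)      # 25 / 175
```

Because the interval is larger than 1 in both strata, no grid cell can be drawn twice, and each cell in a stratum is selected with probability n/N.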
Water body identification
Each selected grid cell was visually inspected for the presence of water bodies during November and December 2002. All accessible water bodies within the selected grid cells were identified to avoid the bias associated with sampling in areas most likely to contain anopheline larvae. In this analysis, multiple water-filled containers in close proximity (e.g. bucket or tire piles) were considered to be one aquatic habitat. Likewise, artificial water storage containers existing in isolation from other containers were considered to be one aquatic habitat.
Standard dipping methods were used to collect mosquito larvae at each water body [27]. Larvae were preserved and transported to the laboratory for further identification. Water bodies containing no larvae were revisited two weeks later to confirm the absence of mosquito larvae. Environmental and human-ecological information was recorded for each water body.
Household sampling
A two-stage cluster design was used to select households from within the 50 grid cells. The grid cells served as primary sampling units (PSU), or clusters, from which the ultimate sampling units (USU), or households, were selected. Equal allocation and a design effect of 3 [23] were used to calculate the target sample size. The most conservative estimate of p (0.50) was used in the sample size calculation. The confidence level was set at 95% (z = 1.96). The maximum tolerable error was equal to 10%. The sample size formula was: n ≥ 3(1.96)²(0.50)(1 - 0.50) / (0.10)². An additional 10% was added to account for non-response, yielding a target sample size of 318 households in the well-drained stratum and 318 households in the poorly drained stratum (n = 636). Because household-level enumeration lists were not available for grid cells, a random direction method was used to approach approximately 13 houses (318/25) from within each selected grid cell. The middle of the grid cell was located and a random direction was selected for each interviewer. Interviewers traveled along their respective axis until a household respondent was identified. Additional houses were sampled along the same axis until the boundary of the grid cell was reached. At that point, a new direction from the center was selected and the process was repeated until approximately 13 houses had been sampled. In grid cells containing fewer than 13 households, all responsive households were approached. Selected households with no resident adult available were revisited once and then replaced with the closest house.
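The sample size arithmetic can be checked with a few lines of Python. The function and parameter names are illustrative. The rounding order (round the base size up, then add 10% and round to the nearest household) is an assumption: the text does not state it, but it is one order that reproduces the reported target of 318 per stratum.

```python
import math

def households_per_stratum(z=1.96, p=0.50, error=0.10, deff=3, nonresponse=0.10):
    """Target households per stratum for estimating a proportion."""
    base = deff * z**2 * p * (1 - p) / error**2   # 288.12 before rounding
    n = math.ceil(base)                           # 289: smallest n meeting the bound
    return round(n * (1 + nonresponse))           # 317.9 rounds to 318

per_stratum = households_per_stratum()
total_households = 2 * per_stratum                # equal allocation across two strata
per_grid_cell = math.ceil(per_stratum / 25)       # 318 / 25 = 12.72, about 13 houses
```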
A questionnaire was developed and pre-tested in Malindi during October 2002. Households were defined as residential units with one or more individuals in occupation. Multiple families residing in the same house were considered one household. Multiple structures within a compound occupied by dependents of the household head were considered one household. The total number of households per selected grid cell was obtained by counting the occupied households contained within each selected grid cell.
Interviews were conducted with any resident adult (>15 years) willing to be interviewed. A brief explanation of the study was provided and informed consent obtained. Variables used in this analysis were created based on responses to specific questions related to agricultural practices, land-use, home ownership, and access to electricity. The Tulane University Institutional Review Board (IRB) and the Kenya National Ethical Review Board approved the study.
Data analysis
The first objective was to describe the water bodies identified and determine if the characteristics were fundamentally the same in well-drained and poorly drained areas, and to identify human-ecological factors associated with the probability of anopheline larvae occurring in a water body. The variables used were based on field observations and information obtained at the time the water body was identified. Land-use was equal to 1 if the water body was located in a residential, or residential and commercial area, and equal to 0 if located in an agricultural or undeveloped area. Water body type was recorded as a description (e.g. ditch, pond). Water body size was equal to 1 if less than or equal to 3 meters squared, and 0 if not. Water body nature was dichotomized as natural (0) or human-made (1). The level of permanency was equal to 1 if the water body was permanent or semi-permanent (> 3 mo), and 0 if temporary (< 3 mo). Because this study used a cross-sectional approach, and water bodies could not be observed over time, permanency was determined based on the source of water, water body size, previous experience with specific habitat types, and expert opinion. Substrate type was classified as cement (1), mud or soil (2), or rubber or plastic (3). The substrate variable was equal to 1 if the substrate was cement or plastic and 0 if mud or soil. No rubber substrates were observed. The distance to the nearest house variable was recorded as 1 if a water body was located < 20 meters from the nearest house, and 0 if > 20 meters. A 20-meter dichotomization criterion was used based on the distribution of the data, as many houses were less than 20 meters from a water body, with very little variation in values. The pollution and floating debris variable was recorded as 1 if debris was present or the water body was discolored or foul smelling, and 0 if absent. This distinction was also based on expert opinion, as no physicochemical analysis of water samples was conducted. The shade variable was equal to 1 if some canopy coverage was present, or if a structure provided shade, and 0 if no coverage was present.
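The coding rules above can be summarized in a small Python function. This is a sketch only: the field names and the example record are hypothetical, and the treatment of boundary values (for example, a water body exactly 20 meters from a house is coded 0 here) is an assumption, since the text does not specify it.

```python
def code_water_body(rec):
    """Convert one field record (a dict) into the binary indicators described above."""
    return {
        "land_use": 1 if rec["land_use"] in {"residential", "residential_commercial"} else 0,
        "size": 1 if rec["area_m2"] <= 3 else 0,                 # 1 = small water body
        "nature": 1 if rec["human_made"] else 0,                 # 0 = natural
        "permanency": 1 if rec["permanent_or_semi"] else 0,      # judged in the field
        "substrate": 1 if rec["substrate"] in {"cement", "plastic"} else 0,
        "near_house": 1 if rec["dist_house_m"] < 20 else 0,
        "polluted": 1 if (rec["debris"] or rec["discolored"] or rec["foul_smell"]) else 0,
        "shade": 1 if rec["shaded"] else 0,
    }

# Hypothetical record, not an observation from the study.
example = {
    "land_use": "residential", "area_m2": 2.5, "human_made": True,
    "permanent_or_semi": False, "substrate": "mud", "dist_house_m": 12,
    "debris": False, "discolored": True, "foul_smell": False, "shaded": False,
}
coded = code_water_body(example)
```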
Chi-square analysis was conducted to determine if the proportions of the respective independent variable categories for water bodies differed by strata (n = 29). Chi-square analysis was also conducted to determine if the proportions of water bodies positive for anopheline larvae differed by strata, and by the respective categories of the data described above (n = 29). Although the original intention was to test the direction and magnitude of the controlled effect of the respective covariates using logistic regression, the low number of water bodies identified, coupled with the low number of anopheline larvae collected and lack of variability in some habitat characteristics precluded further analysis on the distribution or abundance of anopheline larvae.
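For readers unfamiliar with the test, the following Python sketch computes a Pearson chi-square statistic for a 2 × 2 table, such as stratum by presence or absence of a characteristic. The analysis in this study was run in STATA. The function name is illustrative, no continuity correction is applied, and the counts are placeholders rather than study data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table of counts.

    Assumes every row and column total is greater than zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Placeholder counts only, NOT data from this study.
placeholder_table = [[8, 12], [15, 5]]
stat, p_value = chi_square_2x2(placeholder_table)
```

With very small tables, some expected cell counts can fall below 5, in which case the chi-square approximation should be interpreted with caution.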
The second research objective was to quantify the effect of household-level farming on the probability of water body occurrence within the grid cell, while controlling for potential confounders. Chi-square analysis was first conducted to investigate whether grid cells differed by strata in terms of household level farming, the existence of water bodies, and access to resources (n = 50). The research hypothesis was that the abundance of farming within or around the household, as recorded on the household questionnaire, increases the probability that at least 1 potential anopheline larval habitat (water body) exists within the community. The density of houses, distance from the city center, the level of access to resources, and the level of drainage per grid cell were treated as potential confounders. The dependent variable was binary and equal to 1 if at least one water body was found within the grid cell, and 0 if no water bodies were found within the grid cell (PSU = 50, USU = 629).
In this analysis, the grid cell (cluster) served as a surrogate for a community. Although communities are rarely uniform in space or character, and spatial units may fail to capture physical or human-ecological heterogeneity at a smaller scale, this analysis assumes a specific level of homogeneity within the grid cell based on previous studies conducted within Malindi [22, 23]. The proportion of households sampled per grid cell reporting house ownership plus access to electricity was used as a surrogate for the community's overall ability to access resources and thus control their own environment. The proportion of households reporting home ownership plus electricity for each selected grid cell was calculated from the household survey. The variable was dichotomized to equal 1 if the grid cell value was above the overall mean and 0 if below the overall mean. The proportion of households engaged in farming in or around the household per grid cell was calculated in the same way, and further dichotomized to equal 1 if the grid cell value was above the overall mean and 0 if below. In all cases where household level survey data were used to create a grid cell level variable, the number of households satisfying each respective condition was divided by the total number of households sampled within each grid cell to obtain the respective proportion.
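The construction of a grid cell level indicator from household responses can be sketched as follows. The cell identifiers and responses are hypothetical. The sketch assumes that "overall mean" refers to the mean of the grid cell proportions and that a cell exactly at the mean is coded 0; the text specifies neither point.

```python
from statistics import mean

def dichotomize_by_mean(flags_by_cell):
    """flags_by_cell maps a grid cell ID to a list of booleans, one per sampled
    household, indicating whether the household satisfies the condition."""
    # Proportion of sampled households satisfying the condition in each cell.
    proportions = {cell: sum(flags) / len(flags) for cell, flags in flags_by_cell.items()}
    overall_mean = mean(proportions.values())
    # 1 if the cell proportion is above the overall mean, 0 otherwise.
    indicator = {cell: 1 if prop > overall_mean else 0 for cell, prop in proportions.items()}
    return proportions, indicator

# Hypothetical responses (e.g. home ownership plus electricity) for three cells.
responses = {
    "cell_A": [True, True, False, False],
    "cell_B": [True, False, False, False],
    "cell_C": [True, True, True, True],
}
proportions, indicator = dichotomize_by_mean(responses)
```

The same function applies to the farming variable by passing, for each household, whether farming was reported in or around the household.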
The number of households per grid cell was a continuous variable and served as a surrogate measure of population density per grid cell. The distance from the city center variable was created using ArcView 3.3®. The spatial join function was used to calculate and assign a distance from the centroid of each selected grid cell to a point designated as the city center. This study used a roundabout just south of the old commercial district as the center of town based on its centrality, urban features, and access. The variable was also continuous and served as a proximate determinant of a range of community level variables, including access to infrastructure and services, levels of pollution, community level land-use and the relative socioeconomic status of the area.
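As a rough stand-in for the distance calculation, the sketch below computes great-circle distances from grid cell centroids to a city center point using the haversine formula. It does not reproduce the ArcView spatial join. The coordinates, identifiers, and Earth radius are assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical coordinates: a city center point and two grid cell centroids.
city_center = (-3.2200, 40.1200)
centroids = {"cell_A": (-3.2150, 40.1180), "cell_B": (-3.2400, 40.1050)}

distance_km = {
    cell: haversine_km(lat, lon, city_center[0], city_center[1])
    for cell, (lat, lon) in centroids.items()
}
```

Over the extent of a single town, great-circle and projected planar distances differ very little, so either would serve as the continuous covariate.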
Multi-level logistic regression was used to quantify the controlled effect of household-level agricultural activity on the probability of water body occurrence. Sampling weights, equal to the inverse of the probability that a grid cell was selected, were applied to the regression. The grid cell, and corresponding data from households interviewed therein, was treated as a cluster using the "cluster" option in STATA for the regression. Robust standard errors were used because data were collected at both the grid cell and household level. An alpha level of 0.05 was used to indicate significance. Data management and analysis were done using STATA version 7. ArcView 3.3® was used to generate the maps in this study.
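The sampling weights follow directly from the stratum sizes and the number of grid cells selected. The short Python sketch below shows the arithmetic only and does not reproduce the STATA model. Variable names are illustrative.

```python
FRAME = {"well_drained": 73, "poorly_drained": 175}  # grid cells per stratum
CELLS_SELECTED = 25                                   # grid cells drawn per stratum

# Weight = inverse of the probability that a grid cell was selected.
weights = {
    stratum: 1 / (CELLS_SELECTED / n_cells)           # 2.92 and 7.0 respectively
    for stratum, n_cells in FRAME.items()
}

# Sanity check: the weighted sample should represent the whole frame of 248 cells.
represented_cells = sum(w * CELLS_SELECTED for w in weights.values())
```

Each selected grid cell in the poorly drained stratum therefore represents 7 cells in the frame, and each selected cell in the well-drained stratum represents 2.92.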