Jeff Ricklefs and Chris Snyder, GIS analysts with Washington State Department of Natural Resources (DNR), push their way through thickets of salmonberry, vine maple, and devil’s club to reach a tiny stream deep in a stream basin in Capitol State Forest, an approximately 90,000-acre area outside Olympia, Washington. They are collecting stream data. Because the best time to do this is winter when the leaves are down and the streams are flowing, it is raining. And cold.
Jeff and Chris are here because of issues with DNR’s hydro layer, a statewide GIS layer showing stream location and type. Developed in the 1980s, DNR’s hydro layer was partially hand drawn from topographic maps and aerial photos on which the ground was not always visible. Some streams are untyped, others are mistyped, and still others are missing.
These issues affect planning projects like DNR’s calculation of sustainable timber harvest levels for state trust lands in western Washington. For this calculation, DNR needs to know stream type and location to estimate the width of riparian buffers. These issues also affect the planning of individual timber sales, which help support public school construction and other needs for trust lands beneficiaries.
Unfortunately, it is impractical to field map thousands of stream miles across a million plus acres of western Washington forested state trust lands to update DNR’s hydro layer. So what Jeff and Chris are doing, out here in the devil’s club, is gathering data from sample stream basins to build a synthetic stream model for Capitol State Forest as a proof-of-concept. If effective, a similar model could be built for all state trust lands west of the Cascade Mountains, no rain gear required.
Stream Types
On state trust lands in western Washington, DNR uses a numerical system (1 through 5) to categorize streams based on fish presence and physical characteristics, such as stream width and steepness. Type 1 streams are the largest, Type 5 streams are the smallest. Type 4 and 5 streams are non-fish bearing.
What is a Synthetic Stream Model, and How Does it Work?
A synthetic stream model is a mathematical tool (literally, a collection of mathematical equations) that can be applied to a GIS database to produce a synthetic stream map. The term “synthetic” means the map is based on predictions of where a stream is likely to be. The model can be applied to any area for which there is enough information to run it, and the map it produces can be viewed in GIS like any other stream map.
The method Jeff and Chris selected to build their model is “binary logistic regression.” This statistical method involves predicting a binary response (true/false) based on one or more variables. These predictions are based on observation: the model is “trained” to recognize streams using field-collected data, as will be explained later in this article.
To understand how the model works, imagine a forest as a grid of 6 ft. by 6 ft. squares or “pixels.” Each pixel is associated with data for each of the variables used in the model, such as soil conditions, slope, and elevation. Using this information, the model determines whether each pixel is likely or unlikely to be classified as a stream based on the variables it was given. For example, the model tells us that “based on soil conditions, slope, and elevation, this pixel is highly likely (or unlikely) to be a stream.” The answer is expressed as a probability between 0 (not likely to be a stream) and 1 (highly likely to be a stream). To find the stream, one looks for the pixels with the highest probabilities (Figure 1).
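As a rough illustration of how a binary logistic regression turns a pixel's variables into a probability, consider the minimal sketch below. The variable names, the intercept, and the coefficients are all invented for this example; the article does not publish the fitted values, and the real models were fit to field data.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
# They are NOT the values fitted by the DNR analysts.
INTERCEPT = -4.0
COEFFICIENTS = {
    "slope_pct": -0.05,        # assumed variable: local slope in percent
    "log_basin_acres": 1.2,    # assumed variable: log of upstream basin size
    "incision_ft": 0.8,        # assumed variable: channel incision depth
}

def stream_probability(pixel):
    """Return the modeled probability (between 0 and 1) that a pixel is a stream."""
    # Linear combination of the pixel's variables ...
    z = INTERCEPT + sum(COEFFICIENTS[name] * pixel[name] for name in COEFFICIENTS)
    # ... passed through the logistic function to give a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Two made-up pixels: one in a gentle, incised draw and one on a steep hillside.
pixels = [
    {"slope_pct": 10.0, "log_basin_acres": 3.0, "incision_ft": 2.0},
    {"slope_pct": 60.0, "log_basin_acres": 0.5, "incision_ft": 0.0},
]
probabilities = [stream_probability(p) for p in pixels]
```

With these invented numbers, the first pixel scores well above 0.5 and the second scores close to 0, which is the kind of contrast the model relies on to trace a stream across the grid.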
Jeff and Chris set out to develop a synthetic stream model that, when applied to DNR’s GIS database, would predict both stream location and type per DNR’s State Lands stream typing system across Capitol State Forest. The model would provide an improved planning tool for DNR foresters, engineers, and planners. To develop the model, they first built one model that indicated which pixels are likely to be Type 4 streams, and a second model that did the same for Type 5 streams based on physical characteristics only (Type 3 streams were handled differently, as will be explained later in this article; also, Type 1 and 2 streams were treated the same as Type 3 since their buffer requirements are the same). These models were later combined into a single synthetic stream model that predicted all three stream types (3, 4, and 5).
Selecting the Right Variables
Each separate stream model would need its own unique combination of variables because a combination that effectively identifies one stream type may over- or under-predict another. To begin, Jeff and Chris first developed a list of 60 variables based on literature and professional judgement. Then they used statistical methods to determine which variables were important and limit the number of possible combinations. For example, they eliminated combinations in which two variables were highly similar.
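One way to picture the screening step is sketched below: measure how closely each pair of variables tracks the other, then discard any combination that contains a near-duplicate pair. The use of Pearson correlation, the 0.9 cutoff, and the variable names are assumptions made for this example; the article does not say which statistic or threshold was used.

```python
from itertools import combinations
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def allowed_combinations(data, size, max_abs_r=0.9):
    """List variable combinations of a given size that contain no highly similar pair.

    max_abs_r is an assumed cutoff, not a value from the article.
    """
    names = sorted(data)
    too_similar = {
        frozenset((a, b))
        for a, b in combinations(names, 2)
        if abs(pearson(data[a], data[b])) > max_abs_r
    }
    return [
        combo
        for combo in combinations(names, size)
        if not any(frozenset(pair) in too_similar for pair in combinations(combo, 2))
    ]

# Made-up per-pixel values for three hypothetical variables.
sample = {
    "slope": [5.0, 12.0, 30.0, 45.0, 8.0],
    "slope_smoothed": [6.0, 11.0, 29.0, 44.0, 9.0],
    "elevation": [900.0, 400.0, 650.0, 300.0, 1200.0],
}
combos = allowed_combinations(sample, size=2)
```

Here the pairing of the two slope variables is dropped because they carry nearly the same information, while each slope variable can still be paired with elevation.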
The best combinations were used to build and test over 1 million “candidate models” for both Type 4 and 5 streams. Model building and testing were done using high-powered computers located in the Natural Resources Building in Olympia.
To build each candidate model, they divided each of the basins from which they collected stream data into 6 ft. by 6 ft. pixels in GIS. Next, they generated data for every pixel for each variable. Primarily, they used topographic data collected through light detection and ranging (LiDAR), which is a remote sensing method that uses lasers mounted on planes or helicopters to make highly detailed images of the earth’s surface. Finally, they provided each model with the data they collected in the field (the “training data set”).
The model’s task was to look for a correlation between the variables and actual stream location (the training data set). For example, was there a strong correlation between elevation, soil conditions, and precipitation and Type 4 stream location? What about basin size and slope? A strong correlation means that a variable or combination of variables is highly predictive of stream location and type.
The training data set also served another purpose: to train the model how to identify streams for each variable. For example, if slope is one of the variables, how steep does the slope need to be? How deeply incised a channel? The model answered these questions by analyzing the training data set for each variable.
From this process, Jeff and Chris selected the candidate models for Type 4 and 5 streams that were the most effective at making correct predictions. To make the final selection, they generated a synthetic stream map from each of the selected models so they could compare predicted to actual stream location and type in each of the sample basins.
Flip a Coin: Turning Probabilities into Lines on a Map
In generating synthetic stream maps from the candidate models, Jeff and Chris encountered a challenge: selecting the right pixels for each stream. Obviously, a pixel with a probability above .9 should definitely be classified as a stream. But what about .6 or .5? Are those streams as well? How about .49? And should the cutpoint (the probability that separates stream pixels from non-stream pixels) be the same for each basin? For each stream type?
To solve this problem, Peter Gould, Forest Inventory Lead in Olympia, suggested an ingenious solution that the team called “spatially aggregated probability.” Following is a simplified example to demonstrate how it works, using a square to represent a stream basin. In this very simplified example, each pixel has been assigned a probability as shown in Figure 2.
So how does one find the stream? Consider a simple coin flip. If one were to flip a coin 10 times, the probability of getting “heads” on each coin flip would be 50 percent, or .5. To arrive at the number of heads one would expect in 10 flips, the probabilities of each flip are added together:
.5+.5+.5+.5+.5+.5+.5+.5+.5+.5 = 5 heads
For the problem at hand, the object is not to determine how many times heads will occur in 10 coin flips, but to determine how many pixels will have a stream in a given stream basin. To begin, all of the probabilities in the grid are added together as shown in Figure 3.
Rounding the total down to three, the event occurs three times in this grid; in other words, three of these pixels should be classified as a stream. But which ones? Recall that the cutpoint is the dividing probability: anything above it should be classified as a stream, and anything below it should not. So in this case, the chosen cutpoint must result in three pixels in this grid being classified as streams. That cutpoint is derived by solving the model’s probability equation. In this example, the answer is .48 (Figure 4).
The elegance of this solution is that it provides a cutpoint that is tailored to each basin and to each stream type. To Jeff and Chris’ knowledge, this solution had never been tried before and was a major breakthrough in developing these models.
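The arithmetic behind spatially aggregated probability can be sketched in a few lines. The grid values below are made up for this example and are not the values in the article's figures. The sketch also takes a shortcut that is an assumption, not the team's method: instead of solving the model's probability equation, it simply uses the lowest probability among the top-ranked pixels as the cutpoint.

```python
import math

def aggregated_cutpoint(probabilities):
    """Return (number_of_stream_pixels, cutpoint) for one basin.

    The pixel count is the sum of all probabilities, rounded down.
    The cutpoint is taken as the lowest probability among that many
    top-ranked pixels (a simplifying assumption for this sketch).
    """
    n_stream = math.floor(math.fsum(probabilities))
    if n_stream == 0:
        return 0, None  # no pixel is expected to be a stream
    ranked = sorted(probabilities, reverse=True)
    return n_stream, ranked[n_stream - 1]

# A made-up 3 x 3 basin, flattened row by row.
basin = [0.05, 0.10, 0.90,
         0.08, 0.85, 0.30,
         0.48, 0.20, 0.12]

n_stream, cutpoint = aggregated_cutpoint(basin)

# Pixels at or above the cutpoint are mapped as stream.
is_stream = [p >= cutpoint for p in basin]
```

The probabilities in this made-up basin sum to 3.08, so three pixels are classified as stream, and the cutpoint falls at the third-highest value. A different basin with different probabilities would get its own count and its own cutpoint, which is the point of the approach. If several pixels tie at the cutpoint, this simple version would select more pixels than the count calls for.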
Selecting and Combining the Models
For each stream type, Jeff and Chris selected the model that gave the best estimate of stream length at both the landscape and basin level, placed the stream in the correct location, and was also simple and intuitive. The candidate models selected for Type 4 and 5 streams are shown in Table 1.
What About Type 3 Streams?
Early in this process, Jeff and Chris realized a potential problem with identifying Type 3 streams: the model assigns a probability to each pixel individually, not in the context of nearby pixels. Thus the model may identify pixels as Type 3 streams even though a fish barrier (such as a waterfall) is present somewhere in the middle of the stream.
Jeff and Chris wanted their model to classify the stream reaches upstream of a barrier as Type 4 (non-fish bearing) rather than Type 3. This decision does not reflect DNR’s stream typing rules, which require a biological survey to determine if fish are present upstream of a barrier. However, a decision was made at the outset of model development to base stream typing in the model on physical criteria only.
Jeff and Chris used an automated technique to search the Type 4 stream network for areas where the average gradient exceeded specific thresholds. Segments of stream downstream of those areas were identified as Type 3 streams (Figure 5).
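The general idea of a gradient search can be sketched as follows. Everything specific here is an assumption for illustration: the article does not give the thresholds, the averaging distance, or the data structure used, so the three-reach window, the 20 percent threshold, and the list-of-reaches layout are invented.

```python
def type_reaches(gradients_pct, window=3, threshold_pct=20.0):
    """Label stream reaches, ordered from downstream to upstream, as Type 3 or 4.

    gradients_pct holds the gradient (percent) of each reach. The first
    run of `window` reaches whose average gradient exceeds `threshold_pct`
    is treated as a fish barrier. Reaches downstream of it are labeled
    Type 3; the barrier and everything upstream stay Type 4.
    Both parameters are hypothetical values chosen for this sketch.
    """
    barrier_start = None
    for start in range(len(gradients_pct) - window + 1):
        run = gradients_pct[start:start + window]
        if sum(run) / window > threshold_pct:
            barrier_start = start
            break
    if barrier_start is None:
        # Assumption: with no steep area found, labels are left unchanged.
        return [4] * len(gradients_pct)
    return [3 if i < barrier_start else 4 for i in range(len(gradients_pct))]

# Made-up gradients for eight reaches, listed from the stream mouth upward.
reach_gradients = [2.0, 4.0, 3.0, 18.0, 25.0, 30.0, 6.0, 5.0]
labels = type_reaches(reach_gradients)
```

In this made-up profile, the three gentle reaches nearest the mouth are labeled Type 3, and the steep run plus the reaches above it remain Type 4, even though the uppermost reaches are gentle again.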
Some Type 3 streams in the model may be Type 1 or 2. As explained previously, the model did not differentiate between these three stream types (1, 2, and 3) since their buffer requirements are the same.
Is the Synthetic Stream Model More Accurate Than the Hydro Layer?
In a word, yes (Figure 6). Jeff and Chris tested the final synthetic stream model against the existing hydro layer and a basin threshold model, a common type of synthetic stream model that predicts streams based on basin size only. They tested for ability to predict stream length, type, and location. The synthetic model was far more accurate than either the hydro layer or the basin threshold model, particularly for Type 5 streams.
Feedback from the field has been positive. “It works better than other models I’ve seen,” says Scott Sargent, Manager of DNR’s Black Hills District. “We still have to proof the streams in the field, but it’s great for planning purposes.”
Derwood Duncan, Littlerock Unit Forester for DNR’s South Puget Sound Region, says the higher accuracy of the synthetic stream layer also helps them determine who to involve in road projects and when. On roads that may have stream crossings, “we might, for example, send a road engineer out with the forester when they’re doing initial field recon” for setting up a timber sale. They also use the synthetic stream layer to spot areas where many small streams come together, because such areas may be bogs or wetlands.
The higher accuracy of the Capitol Forest synthetic stream layer also will help inform the next sustainable timber harvest level. As explained previously, stream type and location affect DNR’s estimate of the width of riparian buffers, which in turn affects estimates of harvest volume. For that reason, DNR has incorporated the synthetic stream layer for Capitol State Forest into the model it will use to calculate the next sustainable timber harvest level and conduct the environmental analysis.
The uniqueness of this model has already attracted attention from other organizations such as University of Washington, Precision Forestry Cooperative, and several government agencies across Washington and Oregon. Over time, it is likely to attract more interest from organizations facing the same problem: accurately describing a stream network across a large and diverse land base, without investing millions of dollars in basin-by-basin surveys.
This article was first published as “Where the Streams are: Developing a Synthetic Stream Model for Western Washington State Trust Lands” in In the Woods, Issue 2, Vol 2; November 2015; DNR agency intranet publication.