What is Georeferencing?
Georeferencing refers to the laborious effort of translating textual locality descriptions into coordinates (or shapes) that can be mapped and are computer readable. The application of an estimate of spatial error can also be included as part of the georeferencing effort, which makes the data more useful since records with large spatial uncertainty can be filtered out if needed. Use of modern GPS at the time of data collection typically negates the need for manual georeferencing and thus newer data often come with coordinates taken in the field and with a presumed small spatial error.
Georeferencing in Fishes of Texas
Early incoming data (Versions 1.0 and 2.0) typically came to us lacking latitude and longitude, and coordinates that were present were unreliable. To ensure consistently high quality of spatial placement of records, we manually assigned latitude and longitude coordinates and error estimates to collection localities. Since that initial effort, it has become commonplace for collections to include coordinates in their databases, and their quality has greatly improved. Collectors now routinely submit coordinates with specimens since GPS has become ubiquitous, and many collections have georeferenced their own older collection records as well. Newer data imported as part of Version 3.0 of the project often included what we determined to be accurate coordinates, often with a spatial error indicated as a radius from the set of coordinates. Due to this change, we’ve accepted, at least on a preliminary basis, many more georeferences provided by donors. Thus, at the time we released Version 3 of the database and website, our data include a combination of records georeferenced manually by our team and those that came directly from donor sources.
Things to Know About Our Version 3 Georeferences
- Georeferences received with new records were preliminarily accepted. Our intent is to review the donor georeferences and edit as needed.
- Many records that were previously not georeferenced have now been georeferenced at their donor institutions, and we were able to extract those coordinates and apply them to those records. This especially affects records from our neighbor states and shared basins.
- Many of our oldest records in Tracks 1 and 2 were collected during Texas boundary and railroad surveys in the early to mid-1800s and lacked specific locality details (little or no text and no coordinates). In Version 3 of the database we updated most of the oldest records by reviewing the original survey reports and maps to extrapolate their locations. Delving into the literature also allowed us to improve collection dates and collector names.
- Since many of the newer records lacked spatial error, and our spatial algorithms require our georeferences to have a spatial error, we usually applied a default error radius of 45 meters or used the number of digits provided after the decimal to derive an error.
- When we could unambiguously match a textual locality description to that of a previously georeferenced location, we copied its georeference data to the ungeoreferenced locality.
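The rule for records that arrived without a spatial error can be sketched in Python. This is an illustration only: the text above does not give the formula used to turn decimal digits into an error, so the conversion here (one unit of the last reported digit, converted to meters with an approximate 111,320 m per degree of latitude) is an assumption, as are the function names. Only the 45-meter default comes from the text.

```python
import math

DEFAULT_ERROR_M = 45.0              # default radius stated in the text
METERS_PER_DEGREE_LAT = 111_320.0   # approximation (assumption)

def decimal_places(coord_text):
    """Count digits after the decimal point in a coordinate given as text."""
    text = coord_text.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])

def error_from_precision(lat_text, lon_text):
    """Hypothetical rule: error = diagonal of one 'last digit' cell, in meters."""
    places = min(decimal_places(lat_text), decimal_places(lon_text))
    step_deg = 10.0 ** -places
    lat = float(lat_text)
    dy = step_deg * METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of latitude.
    dx = step_deg * METERS_PER_DEGREE_LAT * math.cos(math.radians(lat))
    return math.hypot(dx, dy)

def assign_error(lat_text, lon_text, use_precision=False):
    """Return the default radius, or a precision-derived one when requested."""
    if use_precision:
        return error_from_precision(lat_text, lon_text)
    return DEFAULT_ERROR_M
```

Coordinates are passed as text because converting to a float first would discard trailing zeros, and with them the reported precision.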
Georeferencing Methods
Key Things to Know About Our Georeferencing
- Georeferences include coordinates (latitude and longitude) in decimal degrees (WGS84 datum) and an error radius measured in meters that together define a circle. All serious users should understand that the true location of the occurrence may lie anywhere within the circle.
- We edited locality descriptions to correct and normalize spelling (using standard place names), and improved syntax of locality descriptions.
- Our notes indicate why locations could not be georeferenced and describe any decisions made about coordinates and error.
- Generally, georeferences were determined without consideration of taxa in order to ensure biases did not come into play. However, some georeferences were adjusted later based on distributional or habitat information when consensus could be reached between at least two staff.
- Once the data were georeferenced, we were able to use a Geographic Information System to populate our database with categorical variables such as county, river drainage, USGS hydrologic unit codes, and more. Records that could not be, or were not, georeferenced lack these variables. The variables were extracted from the coordinates alone, regardless of error, so it is possible for categorical variables to be incorrectly assigned. Because our query interface allows querying on these categorical spatial parameters, this can cause some confusion. For example, a location with coordinates in county A and an error radius that extends into county B will be returned only when county A is designated in the query. Even though the overlapping error estimate means that the occurrence could have been in county B, it cannot be found by querying for county B.
- Some records could not be georeferenced. This usually happens when there is an internal conflict in a locality description or a named place cannot be found.
- Fish obviously are limited to water, but not all coordinates in our database fall on water features. Users should remember that some locality descriptions are vague and do not allow precise placement of points.
- Georeferences are continually refined based on new information, and comments from FoTX users are important for improving our georeferences.
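To make the "point plus error radius" idea concrete, the following Python sketch tests whether a candidate location falls inside a georeference's error circle. It is not project code: the record layout and function names are hypothetical, and the haversine formula treats the Earth as a sphere rather than using the WGS84 ellipsoid, which is an approximation that matters little at the scale of typical error radii.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_error_circle(georef, lat, lon):
    """True if (lat, lon) lies inside the circle defined by the georeference."""
    distance = haversine_m(georef["lat"], georef["lon"], lat, lon)
    return distance <= georef["error_m"]

# Hypothetical record: coordinates in decimal degrees, error radius in meters.
georef = {"lat": 30.0, "lon": -97.0, "error_m": 1000.0}
near = within_error_circle(georef, 30.005, -97.0)  # roughly 556 m north
far = within_error_circle(georef, 30.02, -97.0)    # roughly 2.2 km north
```

Any location for which the check returns true is a possible true collection site under the circle model.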
Georeferencing Details
Initially we pursued various methods for georeferencing the records contained in the database. Among the options were automated programs such as GeoLocate, which (at the time) did not provide estimates of error and placed some points in ways we were not satisfied with. Furthermore, it required substantial prior editing of locality descriptions to function optimally. Our Track 1 and 2 datasets (Versions 1.0 and 2.0 of the FoTX project) were georeferenced semi-manually using protocols outlined by other large-scale georeferencing projects (HerpNET; MaNIS) and in published documents (Wieczorek et al. 2004. International Journal of Geographical Information Science 18:745-767). The methodology described in those documents defines a point with coordinates and assigns an error radius that defines a circle, anywhere within which the true location is equally likely to lie. In accordance with those protocols, an online error calculator was used to determine error radii, but coordinates were determined manually, in NAD27 datum (converted later to WGS84), using Terrain Navigator (Maptech) loaded with 1/24,000 and 1/100,000 US Geological Survey (USGS) topographic maps. Only localities that were clearly described, without internal conflicts, and located within the project's geographic scope (i.e., on the Texas mainland, bays, estuaries, or inland of the barrier islands' surf side) were georeferenced (although for Version 3.0 the scope was increased). In some instances, assumptions were made about misspellings or other intended meaning in the verbatim locality descriptions. In some instances in which a location could not be georeferenced, we contacted the collector or examined original field notes, if available, for more information. Notes and assumptions are described in comment fields associated with each record and displayed on the website.
Error extents (defined as a radius extending from each georeferenced point) were restricted by higher geographic information, typically county or state lines. For example, a stream reach defined as being in a county would be limited to the extent of the county. However, in many instances error radii extend into counties other than the county in which the point was placed. These are not errors on the part of the georeferencer, but rather, an unavoidable consequence of using circles to indicate errors. The most obvious examples are stream localities that define political borders, but this often occurs for stream localities further away from political borders as well. In the extreme case of streams that are political borders, the points are placed on the stream center and the county determinations displayed in our database are made via GIS. Thus, all coordinates were assigned by GIS-based methods to only one county, though many samples may have included specimens from both political units, especially when from small streams that are political boundaries. Those users searching for records from any county thus may want to include neighboring counties in their searches in addition to the county of interest.
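The difference between the single county assigned from the point and the set of counties the error circle actually reaches can be illustrated with a small Python sketch. Everything here is hypothetical: the two square "counties", the record, and the function names are invented for the example, and coordinates are assumed to be in a projected system measured in meters so that plane geometry applies. It is not the GIS procedure used to build the database.

```python
import math

def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_segment(x, y, x1, y1, x2, y2):
    """Shortest distance from point (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(x - x1, y - y1)
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))

def circle_touches_polygon(x, y, radius, polygon):
    """True if the error circle overlaps the polygon at all."""
    if point_in_polygon(x, y, polygon):
        return True
    n = len(polygon)
    return any(
        distance_to_segment(x, y, *polygon[i], *polygon[(i + 1) % n]) <= radius
        for i in range(n)
    )

# Two hypothetical 10 km square counties sharing a border at x = 10000.
counties = {
    "A": [(0, 0), (10000, 0), (10000, 10000), (0, 10000)],
    "B": [(10000, 0), (20000, 0), (20000, 10000), (10000, 10000)],
}
record = {"x": 9500.0, "y": 5000.0, "error_m": 1200.0}

assigned = [c for c, poly in counties.items()
            if point_in_polygon(record["x"], record["y"], poly)]
possible = [c for c, poly in counties.items()
            if circle_touches_polygon(record["x"], record["y"], record["error_m"], poly)]
# assigned is ["A"]; possible is ["A", "B"]
```

A query on the assigned county alone would miss this record for county B, which is why searching neighboring counties as well is worthwhile.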
Fish occurrences differ from those of terrestrial animals in that they are always restricted to water. However, not all points in the database are placed on water bodies. In some cases, verbatim donor locality descriptions are too vague to assign coordinates to a specific water body (e.g., 'Travis County' or 'IH35'), and the point may fall on a feature that is clearly not fish habitat, although the true location is encompassed by the error radius. Some water features such as small ponds may not show up in gazetteers or other resources and have to be approximated by their higher geography only (e.g., 'John Smith's Pond in Austin'). However, most localities in the dataset were described clearly as stream localities, and points are placed on streams in such cases. For localities that are known to be on streams, an error defined by a circle is a gross exaggeration, since the organisms are obviously constrained to the narrow intersection of the stream channel and the error circle.
When measuring by stream from a named place, the measurement generally started at the point on the stream closest to the geographic center of the named place (although some measurements may start at major road crossings).
All verbatim locality descriptions were assigned standard locality names according to the following protocol: The term 'unspecified' is used before a feature (creek, ditch, etc.) in cases where no name was given in the original description (e.g., 'bayou at IH 35' would be 'unspecified bayou at IH 35'). 'Unnamed' is used in reference to a water body that is known to be unnamed. Descriptions are generally written so that the waterbody name is given first, and other higher-level information or distances from a place follow the waterbody's name. Generally, roads were given the highway designations 'SH' (for Farm to Market roads, Ranch to Market (or simply Ranch) roads, and County roads, which are all various types of Texas 'State' highways), 'US' for US highways, and 'IH' for Interstate Highways. These are the three major designations used in USGS maps. Some exceptions may exist, including many records georeferenced before June 26, 2006, which may be designated with FM, RR, CR or other designations.
Individual georeferences are inherently subjective, and some users may disagree with the placement and error sizes assigned to some locations. In an effort to reduce variation in our georeferences attributable to georeferencers, we relied on a small number of georeferencers (for Track 1 data, 68% of our georeferences were produced by the same person and only four people were involved in the georeferencing process). Any errors or disagreements should be brought to our attention using the comment forms provided on the specimen pages.
After georeferencing, we were able to use a Geographic Information System to populate our database with categorical variables such as county, river drainage, USGS hydrologic unit codes, and many more. Records that could not be georeferenced thus lack this step. These determinations are based strictly on coordinates for georeferenced locations and do not take into consideration the error radius associated with each coordinate set.
Case Studies
We followed the protocol outlined above, but here we clarify several situations found repeatedly in the locality data, specific to stream and river locations, which make up the bulk of our data.
'x stream near y named place'
Localities like this are treated as a named place. The point is placed on the creek nearest the geographic center of named place Y. Extent runs halfway to a point on the creek that is closest to the nearest other named place of similar size/type.
'x stream, z distance from y named place'
Georeferencing of localities described like this depends on the path of the stream and its position relative to the named place. Generally, one of two options is used to georeference this type of location, whichever is more appropriate.
Option 1. Treat as a distance from a named place.
This option was typically used for instances where a stream flows generally through, away from, or toward a named place. The locality is treated as a distance measured from a named place. The point is placed on the creek as measured by stream (or road if appropriate) at the given distance.
Option 2. Treat as a named place.
Contrary to option 1, the stream could flow 'by' a named place. The stream may flow generally perpendicular to the direction given (option 2a above) or perpendicular and then parallel to the direction given (option 2b above). For option 2a the point is placed on the creek halfway between crossings of lines drawn NE and SE out of the named place geographic center. The extent of the reach is ½ of the distance between the crossings of the two intercardinal lines. For option 2b the point is placed on the creek due east of the named place's geographic center. The extent runs to the crossing of the line drawn NE out of the named place Y. Depending on the path of the stream the extent could be in either direction NE or SE. In this example the NE line makes the most sense to use because the SE line may never cross the creek. The distance given in the locality description is checked to make sure the method reasonably positions the point.
If the direction given is a cardinal direction (N, S, E, or W), then the lines that define the extent are drawn at the intercardinals (NE, NW, SW, or SE). If the direction given is an intercardinal direction, the lines that define the extent are drawn between the intercardinals (SSW, WSW, NNW, etc.). This means that if a description is given as E, for example, the stream reach falls within 45 degrees on either side of that direction, as drawn out of the geographic center of the named place. If the description is given as NE, the reach falls within 22.5 degrees on either side. If the description is given as NNE, the reach falls within 11.25 degrees on either side. This is, to our understanding, consistent with the MaNIS protocol for calculating errors for points that are described as being 'z distance in x direction from y named place (assumed by air)'. This method was put into effect after January 22, 2007; records georeferenced before this date were typically handled more conservatively and given an angle of 45 degrees regardless of the direction given in the description.
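The direction rule can be written as a small lookup in Python. This sketch reads the angles in the text as the spread on either side of the stated direction, which is what bounding lines drawn at the intercardinals imply for a cardinal direction (E is bounded by NE and SE). The function and variable names, and the `legacy` flag standing in for records georeferenced before January 22, 2007, are hypothetical.

```python
# 16-point compass, bearings in degrees clockwise from north.
POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
BEARINGS = {name: i * 22.5 for i, name in enumerate(POINTS)}

# Spread on either side of the stated direction, keyed by label length:
# cardinal (1 letter), intercardinal (2 letters), secondary intercardinal (3).
SPREAD_BY_LENGTH = {1: 45.0, 2: 22.5, 3: 11.25}

def direction_sector(direction, legacy=False):
    """Return (lower, upper) bounding bearings for a compass direction."""
    label = direction.strip().upper()
    if label not in BEARINGS:
        raise ValueError(f"unknown direction: {direction!r}")
    bearing = BEARINGS[label]
    # Older records used a 45-degree spread regardless of the direction given.
    spread = 45.0 if legacy else SPREAD_BY_LENGTH[len(label)]
    return ((bearing - spread) % 360, (bearing + spread) % 360)

east = direction_sector("E")     # bounded by NE and SE
north = direction_sector("N")    # wraps through 0 degrees
```

For `"N"` the lower bound wraps around to 315 degrees, so any code consuming the sector has to handle a lower bound greater than the upper bound.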
'x stream at y named place'
Localities described like this are always treated as a named place. If a stream flows within the extents of a named place, the reach within the named place's geographical extents was defined. Otherwise, if the stream flows outside the geographic extents of the named place and never enters the named place, the point was placed on the stream nearest the geographic center and the extents of the town are translated to the river.
'x stream immediately (or just) up (or down) stream of y named place'
In locality descriptions like this the words 'immediately up/downstream' and 'just up/downstream' are usually taken to imply that the collection was made on the stream at the named place.
'x stream, between distance a and b up/downstream of y named place'
For localities described like this the point is placed on the stream at the midpoint of the range as measured from the named place. The error is calculated by path, but the distance between the midpoint and either range limit (they should be the same) is added to the starting extent of the named place.
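The arithmetic for a distance range can be sketched as follows. This is an illustration under assumptions: the function name and the returned field names are hypothetical, all values are assumed to be in the same length unit, and the resulting starting extent would still need to be passed through the error calculation described earlier.

```python
def range_georeference(dist_a, dist_b, place_extent):
    """Midpoint placement and adjusted starting extent for 'between a and b'.

    dist_a, dist_b: range limits measured along the stream from the named place.
    place_extent: starting extent of the named place, in the same unit.
    """
    near, far = sorted((dist_a, dist_b))
    midpoint = (near + far) / 2.0
    # Distance from the midpoint to either range limit (identical by construction).
    half_range = (far - near) / 2.0
    return {
        "offset_along_stream": midpoint,
        "starting_extent": place_extent + half_range,
    }

# Example: 'between 2 and 4 miles downstream' of a place with a 1.5 mile extent.
result = range_georeference(2.0, 4.0, 1.5)
# result["offset_along_stream"] is 3.0 and result["starting_extent"] is 2.5
```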
Georeferencing Resources
Online tools
- Protocols and various other resources
- Online Error Calculator (version 070228)
- Length conversion calculator
- Getty Thesaurus of Names Online
- USGS National Map Viewer
- TXDoT
- Southwest Paddler
- Google Maps
- Wikipedia, The Free Encyclopedia
- Various internet searches performed on Google
- USGS Map Symbols
Computer programs
- Terrain Navigator (displaying both 1/24,000 and 1/100,000 USGS topographic maps)
- Google Earth, version 4.3.7204.0836 (beta)
Books
- Brune, Gunnar. 1981. Springs of Texas, Vol 1. Branch Smith, Inc. Fort Worth, Texas.
- Abate, Frank R. 1991. Omni Gazetteer of the United States of America. Omnigraphics, Inc. Detroit, Michigan.
- Mapsco. 2005. The Roads of Texas. Addison, Texas.
- Delorme Mapping. 1995. Texas Atlas and Gazetteer. Freeport, Maine.
- Palacios Roji Garcia, Joaquin and A. Palacios Roji Garcia. 1999. Mexico Tourist Road Atlas. Guia Roji. S.A. de C.V.
- Texas Department of Transportation, Transportation Planning and Programming Division, Mapping Branch. 2004. County Maps of Texas.