Address Geocoding Tutorial
This tutorial teaches you how to convert street addresses into geographic coordinates (latitude/longitude) for spatial analysis. Address geocoding is essential when you have location data as addresses rather than coordinates.
What You'll Learn
- Converting single addresses to coordinates
- Batch processing multiple addresses efficiently
- Understanding geocoding quality levels
- Choosing between geocoding providers
- Integrating geocoded addresses with SocialMapper
- Handling errors and edge cases
Why Use Address Geocoding?
Address geocoding enables you to:
- Convert address lists into mappable coordinates
- Analyze service accessibility by street address
- Integrate business locations with demographic data
- Create custom POI datasets from address databases
Available Providers
SocialMapper includes two geocoding providers:
| Provider | Coverage | Best For | Limitations |
|---|---|---|---|
| Nominatim | Global | General use, international addresses | Rate limits on free tier |
| Census Bureau | US only | High accuracy for US addresses | US addresses only |
If the primary provider cannot geocode an address, the system automatically tries the configured fallback providers.
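The fallback behaviour can be pictured as trying providers in order until one returns a match. The following is a minimal sketch of that idea, not SocialMapper's implementation: `geocode_with_fallback`, the two stand-in providers, and their return values are hypothetical, and no network calls are made.

```python
from typing import Callable, Optional, Sequence, Tuple

Coordinates = Tuple[float, float]
# Hypothetical provider signature: returns (lat, lon), or None when no match.
Provider = Callable[[str], Optional[Coordinates]]


def geocode_with_fallback(
    address: str, providers: Sequence[Tuple[str, Provider]]
) -> Optional[Tuple[str, Coordinates]]:
    """Try each (name, provider) pair in order and return the first hit."""
    for name, provider in providers:
        try:
            coords = provider(address)
        except Exception:
            # Treat a provider error like a miss and move on to the next one.
            continue
        if coords is not None:
            return name, coords
    return None


# Stand-in providers for illustration only (no real API is contacted).
def fake_census(address: str) -> Optional[Coordinates]:
    return (35.2271, -80.8431) if address.endswith("NC") else None


def fake_nominatim(address: str) -> Optional[Coordinates]:
    return (48.8566, 2.3522) if "Paris" in address else None


chain = [("census", fake_census), ("nominatim", fake_nominatim)]
us_hit = geocode_with_fallback("100 N Tryon St, Charlotte, NC", chain)
intl_hit = geocode_with_fallback("Paris, France", chain)
no_hit = geocode_with_fallback("nowhere", chain)
```

Returning the provider name alongside the coordinates mirrors the tutorial's habit of reporting which provider produced a result.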
Getting Started
Import Required Components
from socialmapper.geocoding import (
    geocode_address,
    geocode_addresses,
    AddressInput,
    GeocodingConfig,
    AddressProvider,
    AddressQuality
)
Example 1: Single Address Geocoding
Let's start by geocoding a famous address:
# Create address input
address = AddressInput(
    address="1600 Pennsylvania Avenue NW, Washington, DC 20500",
    id="white_house",
    source="tutorial"
)
# Configure geocoding
config = GeocodingConfig(
    primary_provider=AddressProvider.NOMINATIM,
    fallback_providers=[AddressProvider.CENSUS],
    min_quality_threshold=AddressQuality.APPROXIMATE
)
# Geocode the address
result = geocode_address(address, config)
if result.success:
    print(f"Coordinates: {result.latitude:.6f}, {result.longitude:.6f}")
    print(f"Quality: {result.quality.value}")
    print(f"Confidence: {result.confidence_score:.2f}")
    print(f"Provider: {result.provider_used.value}")
else:
    print(f"Failed: {result.error_message}")
Output:
Understanding Quality Levels
Geocoding results have different quality levels based on precision:
# Test different address types
test_addresses = [
    "1600 Pennsylvania Avenue NW, Washington, DC 20500",  # Street address
    "Washington, DC",  # City level
    "North Carolina"  # State level
]
for addr in test_addresses:
    address = AddressInput(address=addr)
    result = geocode_address(address, config)
    if result.success:
        print(f"{addr[:30]:<30} → Quality: {result.quality.value}")
Output:
1600 Pennsylvania Avenue NW, W → Quality: approximate
Washington, DC → Quality: centroid
North Carolina → Quality: approximate
Quality Level Hierarchy
- EXACT: Precise rooftop or parcel-level match
- INTERPOLATED: Estimated along street segment
- APPROXIMATE: Near the location but not exact
- CENTROID: Geographic center of area (city/state)
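To apply a minimum quality threshold, the levels need an ordering. The sketch below assumes the list above runs from most to least precise and models that with a hypothetical `Quality` enum; it is not SocialMapper's `AddressQuality`, and whether that class supports comparisons is not stated in this tutorial.

```python
from enum import IntEnum


class Quality(IntEnum):
    """Hypothetical ranking: a higher value means a more precise match."""
    CENTROID = 1
    APPROXIMATE = 2
    INTERPOLATED = 3
    EXACT = 4


def meets_threshold(quality: Quality, minimum: Quality) -> bool:
    """Return True when `quality` is at least as precise as `minimum`."""
    return quality >= minimum


ok = meets_threshold(Quality.INTERPOLATED, Quality.APPROXIMATE)
too_coarse = meets_threshold(Quality.CENTROID, Quality.APPROXIMATE)
```

Under this ordering, a threshold of APPROXIMATE accepts interpolated and exact matches but rejects centroid-level results.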
Example 2: Batch Address Processing
Process multiple addresses efficiently:
# North Carolina city halls
addresses = [
    "100 N Tryon St, Charlotte, NC",
    "301 E Hargett St, Raleigh, NC",
    "120 E Main St, Durham, NC",
    "100 N Greene St, Greensboro, NC",
    "100 Coxe Ave, Asheville, NC"
]
# Create address inputs
address_inputs = [
    AddressInput(
        address=addr,
        id=f"nc_{i}",
        source="city_halls"
    )
    for i, addr in enumerate(addresses, 1)
]
# Configure for batch processing
config = GeocodingConfig(
    primary_provider=AddressProvider.CENSUS,  # Better for US addresses
    fallback_providers=[AddressProvider.NOMINATIM],
    min_quality_threshold=AddressQuality.APPROXIMATE,
    enable_cache=True,
    batch_size=3,
    batch_delay_seconds=0.5  # Respect API rate limits
)
# Batch geocode
results = geocode_addresses(address_inputs, config, progress=True)
# Analyze results
successful = [r for r in results if r.success]
print(f"Successful: {len(successful)}/{len(results)} ({len(successful)/len(results)*100:.1f}%)")
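The `batch_size` and `batch_delay_seconds` settings suggest that addresses are processed in fixed-size groups with a pause between groups. How SocialMapper does this internally is not shown here; the following standalone sketch illustrates the general pattern with hypothetical helpers `chunked` and `process_in_batches`.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def process_in_batches(
    items: Sequence[T],
    worker: Callable[[T], R],
    batch_size: int = 3,
    delay_seconds: float = 0.5,
) -> List[R]:
    """Apply `worker` to every item, sleeping between (not after) batches."""
    results: List[R] = []
    batches = list(chunked(items, batch_size))
    for index, batch in enumerate(batches):
        results.extend(worker(item) for item in batch)
        if index < len(batches) - 1:
            time.sleep(delay_seconds)
    return results


batches = list(chunked(["a", "b", "c", "d", "e"], 3))
upper = process_in_batches(["a", "b", "c", "d", "e"], str.upper, delay_seconds=0.0)
```

With five items and a batch size of 3, the work splits into one group of three and one group of two, so only a single delay is incurred.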
Displaying Batch Results
import pandas as pd
from tabulate import tabulate
# Convert to DataFrame
data = []
for result in successful:
    data.append({
        'Address': result.input_address.address[:40],
        'Latitude': f"{result.latitude:.4f}",
        'Longitude': f"{result.longitude:.4f}",
        'Quality': result.quality.value
    })
df = pd.DataFrame(data)
print(tabulate(df, headers='keys', tablefmt='github', showindex=False))
Output:
| Address | Latitude | Longitude | Quality |
|-----------------------------------------|-----------|-----------|-------------|
| 100 N Tryon St, Charlotte, NC | 35.2271 | -80.8431 | approximate |
| 301 E Hargett St, Raleigh, NC | 35.7804 | -78.6382 | approximate |
| 120 E Main St, Durham, NC | 35.9940 | -78.8986 | approximate |
| 100 N Greene St, Greensboro, NC | 36.0726 | -79.7920 | approximate |
| 100 Coxe Ave, Asheville, NC | 35.5951 | -82.5515 | approximate |
Integration with SocialMapper
Feed the geocoded addresses into a demographic analysis:
# Save geocoded results to CSV
from pathlib import Path
import pandas as pd
df = pd.DataFrame([{
    'name': r.input_address.address.split(',')[0],
    'latitude': r.latitude,
    'longitude': r.longitude,
    'address': r.input_address.address
} for r in successful])
# to_csv does not create missing directories, so make sure 'output' exists
Path('output').mkdir(exist_ok=True)
df.to_csv('output/geocoded_addresses.csv', index=False)
# Use with SocialMapper
from socialmapper import SocialMapperClient, SocialMapperBuilder
with SocialMapperClient() as client:
    config = (SocialMapperBuilder()
        .with_custom_pois('output/geocoded_addresses.csv')
        .with_travel_time(15)
        .with_census_variables("total_population", "median_household_income")
        .with_exports(csv=True)
        .build()
    )
    result = client.run_analysis(config)
    if result.is_ok():
        analysis = result.unwrap()
        print(f"Analyzed {analysis.poi_count} geocoded locations")
        print(f"Census data for {analysis.census_units_analyzed} areas")
Error Handling
Handle common geocoding issues gracefully:
# Test problematic addresses
problem_addresses = [
    "This is not a real address",
    "123 Nonexistent Street, Nowhere, XX 99999",
    "",  # Empty address
    "Paris"  # Ambiguous - which Paris?
]
for addr in problem_addresses:
    if not addr:
        print("Empty address - skipping")
        continue
    address = AddressInput(address=addr)
    result = geocode_address(address, config)
    if result.success:
        print(f"'{addr}' → {result.latitude:.4f}, {result.longitude:.4f}")
        print(f"  ⚠️ Quality: {result.quality.value} - verify this is correct!")
    else:
        print(f"'{addr}' → Failed: {result.error_message}")
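Beyond checking `result.success`, it can help to sanity-check coordinates and set failures aside for later review. The sketch below is one possible approach using a hypothetical `Result` dataclass as a stand-in for the real result object; the field names follow the tutorial, but the class itself and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Result:
    """Hypothetical stand-in for a geocoding result."""
    address: str
    success: bool
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    error_message: Optional[str] = None


def is_usable(result: Result) -> bool:
    """A result is usable if it succeeded and its coordinates are in range."""
    if not result.success or result.latitude is None or result.longitude is None:
        return False
    return -90.0 <= result.latitude <= 90.0 and -180.0 <= result.longitude <= 180.0


def partition(results: Iterable[Result]) -> Tuple[List[Result], List[Result]]:
    """Split results into (usable, rejected) so failures can be reviewed later."""
    usable: List[Result] = []
    rejected: List[Result] = []
    for result in results:
        (usable if is_usable(result) else rejected).append(result)
    return usable, rejected


# Illustrative sample data, not real geocoder output.
sample = [
    Result("Washington, DC", True, 38.8951, -77.0364),
    Result("This is not a real address", False, error_message="No matches found"),
    Result("bad coordinates", True, 123.0, -77.0),
]
usable, rejected = partition(sample)
```

Keeping the rejected results, rather than discarding them, makes it possible to retry them later with a cleaned address or a different provider.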
Configuration Patterns
High-Accuracy US Addresses
For government or medical applications requiring precision:
config = GeocodingConfig(
    primary_provider=AddressProvider.CENSUS,
    min_quality_threshold=AddressQuality.EXACT,
    require_country_match=True,
    default_country='US'
)
Fast Processing for Large Datasets
When speed matters more than fallback options:
config = GeocodingConfig(
    primary_provider=AddressProvider.NOMINATIM,
    fallback_providers=[],  # No fallbacks for speed
    min_quality_threshold=AddressQuality.APPROXIMATE,
    batch_size=10,
    batch_delay_seconds=0.1
)
International Addresses
For global address datasets:
config = GeocodingConfig(
    primary_provider=AddressProvider.NOMINATIM,
    require_country_match=False,
    timeout_seconds=15,
    max_retries=3
)
Best Practices
- Always validate results: Check result.success before using coordinates
- Set appropriate quality thresholds:
  - Medical/Emergency: EXACT only
  - Business analysis: APPROXIMATE or better
  - Regional studies: CENTROID acceptable
- Use caching: Avoid re-geocoding the same addresses
- Respect rate limits: Add delays for batch processing
- Include fallback providers: Improve success rates
- Clean addresses first: Remove special characters, standardize format
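As an illustration of the "clean addresses first" advice, the following sketch normalizes whitespace and punctuation with the standard `re` module. The function `clean_address` and its specific rules are assumptions made for this example; real datasets usually need rules tuned to their own quirks.

```python
import re


def clean_address(raw: str) -> str:
    """Normalize whitespace and punctuation in a free-text address."""
    text = raw.strip()
    # Drop everything except word characters, whitespace, and , . # ' / -
    text = re.sub(r"[^\w\s,.#'/-]", "", text)
    # Collapse runs of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    # Normalize comma spacing and collapse repeated commas into one.
    text = re.sub(r"\s*,(\s*,)*\s*", ", ", text)
    return text.strip(" ,")


cleaned = clean_address("  100 N Tryon St ,,  Charlotte,NC  ")
stripped = clean_address("120 E Main St*, Durham, NC!")
```

Here `cleaned` becomes `"100 N Tryon St, Charlotte, NC"` and `stripped` becomes `"120 E Main St, Durham, NC"`. Characters such as `#` and `/` are kept because they often carry unit or fraction information.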
Performance Tips
- Enable caching: Geocoded addresses are cached automatically
- Batch processing: More efficient than individual requests
- Provider selection: Census for US, Nominatim for international
- Preprocess addresses: Clean and standardize before geocoding
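To show why caching pays off, here is a small in-memory cache keyed by a normalized address string. It is a sketch of the general idea only: `CachedGeocoder` and `fake_lookup` are hypothetical, and nothing here reflects how SocialMapper's own cache is implemented.

```python
from typing import Callable, Dict, Tuple

Coordinates = Tuple[float, float]


class CachedGeocoder:
    """Wrap a lookup function so each distinct address is resolved only once."""

    def __init__(self, lookup: Callable[[str], Coordinates]) -> None:
        self._lookup = lookup
        self._cache: Dict[str, Coordinates] = {}
        self.calls = 0  # Number of times the wrapped lookup actually ran.

    def geocode(self, address: str) -> Coordinates:
        # Case and extra whitespace should not create separate cache entries.
        key = " ".join(address.lower().split())
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._lookup(address)
        return self._cache[key]


# Hypothetical lookup that always returns the same point (no network access).
def fake_lookup(address: str) -> Coordinates:
    return (35.7804, -78.6382)


geocoder = CachedGeocoder(fake_lookup)
first = geocoder.geocode("301 E Hargett St, Raleigh, NC")
second = geocoder.geocode("  301 e hargett st,  raleigh, nc ")
```

Both calls resolve to the same cache key, so the underlying lookup runs once. In a rate-limited setting that is one fewer request per repeated address.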
Common Issues and Solutions
Issue: "No matches found" - Solution: Simplify address, remove apartment numbers, check spelling
Issue: "Rate limit exceeded" - Solution: Add delays, reduce batch size, enable caching
Issue: "Wrong location returned" - Solution: Add state/country, check quality level, verify provider
Issue: "Timeout errors" - Solution: Increase timeout, check internet connection, try fallback provider
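For rate-limit and timeout errors, a common remedy is to retry with increasing delays. The sketch below shows exponential backoff around an arbitrary callable; `retry_with_backoff` and `flaky_lookup` are hypothetical names, and the choice of `TimeoutError` as the retryable exception is an assumption, since the tutorial does not say which exceptions the geocoder raises.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on TimeoutError with doubling delays."""
    attempt = 0
    while True:
        try:
            return operation()
        except TimeoutError:
            if attempt >= max_retries:
                raise  # Out of retries: surface the last error to the caller.
            sleep(base_delay_seconds * (2 ** attempt))
            attempt += 1


# Hypothetical flaky operation: times out twice, then succeeds.
attempts = {"count": 0}


def flaky_lookup() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


# Record the requested delays instead of actually sleeping.
delays: List[float] = []
outcome = retry_with_backoff(flaky_lookup, sleep=delays.append)
```

Passing the sleep function as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` applies.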
Use Case Examples
Business Locations Analysis
# Geocode store locations and analyze demographics
store_addresses = pd.read_csv('store_locations.csv')
# ... geocode and analyze with SocialMapper
Service Accessibility Study
# Convert clinic addresses to coordinates for travel time analysis
clinic_addresses = load_clinic_addresses()
# ... geocode and create isochrones
Address Data Cleaning
# Validate and standardize addresses through geocoding
raw_addresses = get_customer_addresses()
# ... geocode to get standardized formatted addresses
Next Steps
After completing this tutorial:
- Try geocoding your own address datasets
- Experiment with different quality thresholds
- Compare provider accuracy for your region
- Build complete address-to-demographics workflows
- Create custom POI datasets from address lists
Full Code
The complete tutorial script is available at:
examples/tutorials/05_address_geocoding.py
Key Takeaways
- Address geocoding bridges text addresses and geographic analysis
- Quality levels indicate precision - choose based on use case
- Batch processing is more efficient for multiple addresses
- Provider selection affects accuracy and coverage
- Integration with SocialMapper enables demographic analysis
- Proper error handling ensures robust workflows