Wildlife Movement Prediction using Deep Learning 🦅
This project leverages deep learning techniques to analyze GPS tracking data of wildlife, focusing on bird migration patterns. Using geospatial and temporal data, we preprocess and visualize trajectories, compute inter-location distances using the Haversine formula, and apply sequential modeling techniques (e.g., GRU) to predict animal movement.
📂 Dataset Information
- Source: Movement Ecology Dataset (hosted via Google Drive)
- Size: 89,867 rows × 15 columns (~90,000 entries)
- Fields: Timestamp, Location (lat/lon), Species, Sensor Type, Vegetation Indexes, etc.
💻 Project Goals
- Clean and preprocess geospatial data
- Compute movement trajectories
- Build and train a GRU-based deep learning model
- Predict next location(s) in movement sequence
%pip install gdown geopy pandas numpy matplotlib seaborn scikit-learn tensorflow folium ipywidgets haversine --quiet
Note: you may need to restart the kernel to use updated packages.
import math
import gdown
import numpy as np
import pandas as pd
# Extract the File ID from your link
file_id = "1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX" # Extracted from your Google Drive link
# Correct Google Drive direct download URL
download_url = f"https://drive.google.com/uc?id={file_id}"
# Define output file name
output_file = "migration_original.csv"
# Download the file
gdown.download(download_url, output_file, quiet=False)
Downloading... From: https://drive.google.com/uc?id=1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX To: /Users/apple/Desktop/Wildlife_Movement_Prediction/migration_original.csv 100%|██████████████████████████████████████| 22.3M/22.3M [00:07<00:00, 2.92MB/s]
'migration_original.csv'
# Load the dataset
df = pd.read_csv('migration_original.csv')
print(df.shape)
df.head()
(89867, 15)
index | event-id | visible | timestamp | location-long | location-lat | manually-marked-outlier | visible.1 | sensor-type | individual-taxon-canonical-name | tag-local-identifier | individual-local-identifier | study-name | ECMWF Interim Full Daily Invariant Low Vegetation Cover | NCEP NARR SFC Vegetation at Surface | ECMWF Interim Full Daily Invariant High Vegetation Cover |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1082620685 | True | 2009-05-27 14:00:00.000 | 24.58617 | 61.24783 | NaN | True | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | 0.039229 | NaN | 0.960771 |
1 | 1082620686 | True | 2009-05-27 20:00:00.000 | 24.58217 | 61.23267 | NaN | True | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | 0.040803 | NaN | 0.959197 |
2 | 1082620687 | True | 2009-05-28 05:00:00.000 | 24.53133 | 61.18833 | NaN | True | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | 0.052201 | NaN | 0.947799 |
3 | 1082620688 | True | 2009-05-28 08:00:00.000 | 24.58200 | 61.23283 | NaN | True | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | 0.040818 | NaN | 0.959182 |
4 | 1082620689 | True | 2009-05-28 14:00:00.000 | 24.58250 | 61.23267 | NaN | True | gps | Larus fuscus | 91732 | 91732A | Navigation experiments in lesser black-backed ... | 0.040753 | NaN | 0.959247 |
# Check for unique values in all the columns
for column in df.columns:
    print(f'The Unique Columns present in "{column}" are: ',df[column].unique(), "\n")
The Unique Columns present in "event-id" are: [1082620685 1082620686 1082620687 ... 1082710937 1082710938 1082710939] The Unique Columns present in "visible" are: [ True] The Unique Columns present in "timestamp" are: ['2009-05-27 14:00:00.000' '2009-05-27 20:00:00.000' '2009-05-28 05:00:00.000' ... '2015-08-26 21:00:00.000' '2015-08-27 06:00:00.000' '2015-08-27 09:00:00.000'] The Unique Columns present in "location-long" are: [24.58617 24.58217 24.53133 ... 35.69217 35.71483 35.66567] The Unique Columns present in "location-lat" are: [61.24783 61.23267 61.18833 ... 64.95367 64.97133 65.019 ] The Unique Columns present in "manually-marked-outlier" are: [nan] The Unique Columns present in "visible.1" are: [ True] The Unique Columns present in "sensor-type" are: ['gps'] The Unique Columns present in "individual-taxon-canonical-name" are: ['Larus fuscus'] The Unique Columns present in "tag-local-identifier" are: [91732 91733 91734 91735 91737 91738 91739 91740 91741 91742 91743 91744 91745 91746 91747 91748 91749 91750 91751 91752 91754 91755 91756 91758 91759 91761 91762 91763 91764 91765 91766 91767 91769 91771 91774 91775 91776 91777 91778 91779 91780 91781 91782 91783 91785 91786 91787 91788 91789 91794 91795 91797 91798 91799 91800 91802 91803 91807 91809 91810 91811 91812 91813 91814 91815 91816 91819 91821 91823 91824 91825 91826 91827 91828 91829 91830 91831 91832 91835 91836 91837 91838 91839 91843 91845 91846 91848 91849 91852 91854 91858 91861 91862 91864 91865 91866 91870 91871 91872 91875 91876 91877 91878 91880 91881 91884 91885 91893 91894 91897 91900 91901 91903 91907 91908 91910 91911 91913 91916 91918 91919 91920 91921 91924 91929 91930] The Unique Columns present in "individual-local-identifier" are: ['91732A' '91733A' '91734A' '91735A' '91737A' '91738A' '91739A' '91740A' '91741A' '91742A' '91743A' '91744A' '91745A' '91746A' '91747A' '91748A' '91749A' '91750A' '91751A' '91752A' '91754A' '91755A' '91756A' '91758A' '91759A' '91761A' '91762A' '91763A' 
'91764A' '91765A' '91766A' '91767A' '91769A' '91771A' '91774A' '91775A' '91776A' '91777A' '91778A' '91779A' '91780A' '91781A' '91782A' '91783A' '91785A' '91786A' '91787A' '91788A' '91789A' '91794A' '91795A' '91797A' '91798A' '91799A' '91800A' '91802A' '91803A' '91807A' '91809A' '91810A' '91811A' '91812A' '91813A' '91814A' '91815A' '91816A' '91819A' '91821A' '91823A' '91824A' '91825A' '91826A' '91827A' '91828A' '91829A' '91830A' '91831A' '91832A' '91835A' '91836A' '91837A' '91838A' '91839A' '91843A' '91845A' '91846A' '91848A' '91849A' '91852A' '91854A' '91858A' '91861A' '91862A' '91864A' '91865A' '91866A' '91870A' '91871A' '91872A' '91875A' '91876A' '91877A' '91878A' '91880A' '91881A' '91884A' '91885A' '91893A' '91894A' '91897A' '91900A' '91901A' '91903A' '91907A' '91908A' '91910A' '91911A' '91913A' '91916A' '91918A' '91919A' '91920A' '91921A' '91924A' '91929A' '91930A'] The Unique Columns present in "study-name" are: ['Navigation experiments in lesser black-backed gulls (data from Wikelski et al. 2015)'] The Unique Columns present in "ECMWF Interim Full Daily Invariant Low Vegetation Cover" are: [0.03922896 0.0408028 0.0522006 ... 0.82435717 0.82432803 0.82430923] The Unique Columns present in "NCEP NARR SFC Vegetation at Surface" are: [nan] The Unique Columns present in "ECMWF Interim Full Daily Invariant High Vegetation Cover" are: [0.96077104 0.9591972 0.9477994 ... 0.17564283 0.17567197 0.17569077]
'''
1. Some columns contain only null values or a single value for the entire dataset.
   They carry no information for the model, so we drop them.
2. "individual-local-identifier" is "tag-local-identifier" with an "A" suffix,
   so the two are redundant and we keep only "tag-local-identifier".
3. "ECMWF Interim Full Daily Invariant Low Vegetation Cover" and "ECMWF Interim Full Daily Invariant High Vegetation Cover"
   are complementary (in the rows shown above they sum to 1), so we keep only one of them for training.
4. "event-id" is a per-row identifier with no predictive value, so it is dropped as well.
'''
# define Columns to drop
columns_to_drop = ["event-id","visible", "visible.1", "sensor-type", "individual-taxon-canonical-name", "study-name", "manually-marked-outlier",
"NCEP NARR SFC Vegetation at Surface", "individual-local-identifier", "ECMWF Interim Full Daily Invariant Low Vegetation Cover"]
# drop unwanted columns
data = df.drop(columns=columns_to_drop)
data.head()
index | timestamp | location-long | location-lat | tag-local-identifier | ECMWF Interim Full Daily Invariant High Vegetation Cover |
---|---|---|---|---|---|
0 | 2009-05-27 14:00:00.000 | 24.58617 | 61.24783 | 91732 | 0.960771 |
1 | 2009-05-27 20:00:00.000 | 24.58217 | 61.23267 | 91732 | 0.959197 |
2 | 2009-05-28 05:00:00.000 | 24.53133 | 61.18833 | 91732 | 0.947799 |
3 | 2009-05-28 08:00:00.000 | 24.58200 | 61.23283 | 91732 | 0.959182 |
4 | 2009-05-28 14:00:00.000 | 24.58250 | 61.23267 | 91732 | 0.959247 |
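The manual inspection behind the column drop can also be checked programmatically. The following is a minimal sketch, not the notebook's own method: the helper names `constant_columns` and `are_complementary`, the short column names `low_veg` and `high_veg`, and the toy frame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def constant_columns(frame: pd.DataFrame) -> list:
    """Return columns holding at most one distinct value (an all-NaN column counts as one value)."""
    counts = frame.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()

def are_complementary(a: pd.Series, b: pd.Series, tol: float = 1e-6) -> bool:
    """True when a + b equals 1 (within tol) on every row."""
    return bool(np.allclose(a + b, 1.0, atol=tol))

# Toy frame: vegetation values are copied from the first rows shown above;
# the second tag value is made up so that the tag column is not constant.
toy = pd.DataFrame({
    "visible": [True, True, True],
    "manually-marked-outlier": [np.nan, np.nan, np.nan],
    "tag-local-identifier": [91732, 91732, 91733],
    "low_veg": [0.039229, 0.040803, 0.052201],
    "high_veg": [0.960771, 0.959197, 0.947799],
})

print(constant_columns(toy))
print(are_complementary(toy["low_veg"], toy["high_veg"]))
```

On the real data, running `constant_columns(df)` before dropping would be one way to confirm the list in `columns_to_drop`.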
# Check for data types and Null counts using info() method
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 89867 entries, 0 to 89866 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 timestamp 89867 non-null object 1 location-long 89867 non-null float64 2 location-lat 89867 non-null float64 3 tag-local-identifier 89867 non-null int64 4 ECMWF Interim Full Daily Invariant High Vegetation Cover 89867 non-null float64 dtypes: float64(3), int64(1), object(1) memory usage: 3.4+ MB
# -------------------- STEP 1: Parse Timestamps -------------------- #
# Ensure timestamps are in datetime format
data["timestamp"] = pd.to_datetime(data["timestamp"])
# -------------------- STEP 2: Sort Data by Tag and Time -------------------- #
# Sorting by "tag-local-identifier", then "timestamp", keeps each bird's fixes contiguous and in chronological order
data = data.sort_values(by=["tag-local-identifier", "timestamp"]).reset_index(drop=True)
# -------------------- STEP 3: Extract Date-Time Features -------------------- #
data["year"] = data["timestamp"].dt.year
data["month"] = data["timestamp"].dt.month
data["hour"] = data["timestamp"].dt.hour
# -------------------- STEP 4: Compute Time Difference per Bird -------------------- #
data["time_diff(hrs)"] = (
    data.groupby("tag-local-identifier")["timestamp"]
    .diff().dt.total_seconds() / 3600
)
# Replace NaN with 0 for the first row per bird (safe assignment)
data["time_diff(hrs)"] = data["time_diff(hrs)"].fillna(0)
data.head(10)
index | timestamp | location-long | location-lat | tag-local-identifier | ECMWF Interim Full Daily Invariant High Vegetation Cover | year | month | hour | time_diff(hrs) |
---|---|---|---|---|---|---|---|---|---|
0 | 2009-05-27 14:00:00 | 24.58617 | 61.24783 | 91732 | 0.960771 | 2009 | 5 | 14 | 0.0 |
1 | 2009-05-27 20:00:00 | 24.58217 | 61.23267 | 91732 | 0.959197 | 2009 | 5 | 20 | 6.0 |
2 | 2009-05-28 05:00:00 | 24.53133 | 61.18833 | 91732 | 0.947799 | 2009 | 5 | 5 | 9.0 |
3 | 2009-05-28 08:00:00 | 24.58200 | 61.23283 | 91732 | 0.959182 | 2009 | 5 | 8 | 3.0 |
4 | 2009-05-28 14:00:00 | 24.58250 | 61.23267 | 91732 | 0.959247 | 2009 | 5 | 14 | 6.0 |
5 | 2009-05-28 20:00:00 | 24.58617 | 61.24767 | 91732 | 0.960761 | 2009 | 5 | 20 | 6.0 |
6 | 2009-05-29 05:00:00 | 24.58600 | 61.24767 | 91732 | 0.960736 | 2009 | 5 | 5 | 9.0 |
7 | 2009-05-29 08:00:00 | 24.58617 | 61.24767 | 91732 | 0.960761 | 2009 | 5 | 8 | 3.0 |
8 | 2009-05-29 14:00:00 | 24.58650 | 61.24750 | 91732 | 0.960799 | 2009 | 5 | 14 | 6.0 |
9 | 2009-05-29 20:00:00 | 24.56967 | 61.23883 | 91732 | 0.957722 | 2009 | 5 | 20 | 6.0 |
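Grouping by tag before calling `diff()` matters at the boundary between two birds. The toy example below is a sketch with made-up tags and timestamps (the column name `tag` is an assumption); it contrasts a global difference with a per-bird difference.

```python
import pandas as pd

# Two hypothetical birds, already sorted by tag and time
toy = pd.DataFrame({
    "tag": ["A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2009-05-27 14:00", "2009-05-27 20:00",
        "2009-06-01 08:00", "2009-06-01 11:00",
    ]),
})

# Global diff: the first fix of bird B is compared with the last fix of bird A
global_gap = toy["timestamp"].diff().dt.total_seconds() / 3600

# Per-bird diff: the first fix of every bird has no predecessor, so it is NaN
per_bird_gap = toy.groupby("tag")["timestamp"].diff().dt.total_seconds() / 3600

print(global_gap.tolist())
print(per_bird_gap.tolist())
```

In the global version the third row reports a 108-hour gap that belongs to no single bird, which would later distort any speed derived from it.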
Haversine Formula
The Haversine formula gives the great-circle distance between two latitude/longitude points (here, a bird's previous and current fix), treating the Earth as a sphere of radius $R$:
$$ a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\Delta\lambda}{2}\right) $$
$$ c = 2 \cdot \text{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right) $$
$$ d = R \cdot c $$
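Where:
- $\phi_1, \phi_2$ are the latitudes of the two points in radians, and $\Delta\phi = \phi_2 - \phi_1$,
- $\lambda_1, \lambda_2$ are the longitudes of the two points in radians, and $\Delta\lambda = \lambda_2 - \lambda_1$,
- $R$ is the Earth's radius (mean radius = 6,371 km),
- $c$ is the central angle between the two points in radians,
- $d$ is the distance between the points in kilometers.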
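As a quick numeric sanity check of the formula, the sketch below evaluates it for one degree of latitude and for the first two fixes of tag 91732 shown earlier. The function name `haversine_km` and the constant `EARTH_RADIUS_KM` are illustrative names, not part of the notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the definition above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c

# One degree of latitude along a meridian
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)

# First two fixes of tag 91732 (coordinates copied from the table above)
first_step = haversine_km(61.24783, 24.58617, 61.23267, 24.58217)

print(round(one_degree, 2))  # about 111.19 km
print(round(first_step, 2))  # about 1.70 km
```

A step of roughly 1.7 km over six hours is consistent with a bird that is still near its starting site.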
# -------------------- STEP 5: Define Haversine Distance Function -------------------- #
def haversine(lat1, lon1, lat2, lon2):
    """Compute the great-circle distance (Haversine formula) between two GPS coordinates."""
    R = 6371 # Earth radius in kilometers
    phi1, phi2 = map(math.radians, [lat1, lat2])
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = math.sin(delta_phi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c # Distance in km
# -------------------- STEP 6: Compute Distance per Bird -------------------- #
# Compute previous lat/lon per bird before applying Haversine formula
data["prev_lat"] = data.groupby("tag-local-identifier")["location-lat"].shift(1)
data["prev_lon"] = data.groupby("tag-local-identifier")["location-long"].shift(1)
# Apply Haversine function to compute distances
data["distance(km)"] = data.apply(
    lambda row: haversine(row["prev_lat"], row["prev_lon"], row["location-lat"], row["location-long"])
    if pd.notna(row["prev_lat"]) and pd.notna(row["prev_lon"]) else 0,
    axis=1
)
# Drop temporary columns
data.drop(columns=["prev_lat", "prev_lon"], inplace=True)
# ------------------------ STEP 7: Compute Speed (Avoid Division by Zero) ------------------------ #
data["speed(km/hr)"] = data["distance(km)"] / data["time_diff(hrs)"]
# Replace inf, -inf, NaN → First np.nan, then 0
data["speed(km/hr)"] = data["speed(km/hr)"].replace([np.inf, -np.inf], np.nan)
data["speed(km/hr)"] = data["speed(km/hr)"].fillna(0) # Replace NaN with 0
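The effect of the inf/NaN clean-up is easiest to see on a toy frame. The values below are made up for illustration; only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a first fix (0 km, 0 h), a normal step,
# a duplicated timestamp (2 km in 0 h), and a stationary bird
toy = pd.DataFrame({
    "distance(km)": [0.0, 1.7, 2.0, 0.0],
    "time_diff(hrs)": [0.0, 6.0, 0.0, 3.0],
})

# pandas does not raise on division by zero: 0/0 gives NaN and x/0 gives inf
raw_speed = toy["distance(km)"] / toy["time_diff(hrs)"]

# Same clean-up as in Step 7: inf -> NaN -> 0
clean_speed = raw_speed.replace([np.inf, -np.inf], np.nan).fillna(0)

print(raw_speed.tolist())
print(clean_speed.tolist())
```

One consequence worth keeping in mind: after this step a speed of 0 can mean either "stationary" or "undefined". If that distinction matters downstream, leaving the NaN in place and masking those rows later is a possible alternative.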
!pip install folium
import folium
from folium.plugins import AntPath
import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import colormaps
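# Reload the raw dataset.
# Note: this overwrites the preprocessed `data` from Steps 1-7; the derived features are recomputed in Steps 8-9
data = pd.read_csv("migration_original.csv")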
# Convert timestamp and extract year/month
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['year'] = data['timestamp'].dt.year
data['month'] = data['timestamp'].dt.month
# Get unique years, months, and tags
unique_years = sorted(data['year'].unique())
unique_months = sorted(data['month'].unique())
unique_tags = sorted(data['tag-local-identifier'].unique())
# Map each tag to a hex color. Note: 'tab10' has only 10 distinct colors, so several tags share a color
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}
# Create multi-select lists for year and month selection
year_selector = widgets.SelectMultiple(
    options=unique_years,
    value=[unique_years[0]],
    description='Years',
    layout=widgets.Layout(height='100px', width='150px')
)
month_selector = widgets.SelectMultiple(
    options=unique_months,
    value=[unique_months[0]],
    description='Months',
    layout=widgets.Layout(height='100px', width='150px')
)
# Button to update the map
update_button = widgets.Button(description="Update Map")
# Output widget to display the map
output = widgets.Output()
# Function to plot movement interactively
def plot_movement_interactive(years, months):
    filtered_data = data[data["year"].isin(years) & data["month"].isin(months)]
    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected period.")
        return None
    first_point = (
        filtered_data.iloc[0]["location-lat"],
        filtered_data.iloc[0]["location-long"]
    )
    m = folium.Map(location=first_point, zoom_start=6)
    for tag in filtered_data["tag-local-identifier"].unique():
        # Sort by time so the polyline follows the bird's actual path
        bird_data = filtered_data[filtered_data["tag-local-identifier"] == tag].sort_values("timestamp")
        bird_color = tag_colors[tag]
        path = list(zip(bird_data["location-lat"], bird_data["location-long"]))
        folium.PolyLine(path, color=bird_color, weight=2.5, opacity=0.8).add_to(m)
        for _, row in bird_data.iterrows():
            folium.CircleMarker(
                location=(row["location-lat"], row["location-long"]),
                radius=3,
                color=bird_color,
                fill=True,
                fill_color=bird_color,
                popup=f"Tag: {tag}<br>Time: {row['timestamp']}"
            ).add_to(m)
    return m
# Button click handler
def on_button_click(b):
    output.clear_output(wait=True)
    selected_years = list(year_selector.value)
    selected_months = list(month_selector.value)
    map_plot = plot_movement_interactive(selected_years, selected_months)
    if map_plot is not None:
        with output:
            display(map_plot)
update_button.on_click(on_button_click)
# Display UI
display(widgets.HBox([year_selector, month_selector]))
display(update_button, output)
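Because `tab10` is a listed colormap with 10 entries, sampling it for well over 100 tags necessarily repeats colors. If one color per bird is wanted, evenly spaced hues are one option. The sketch below uses only the standard library; the helper name `distinct_hex_colors` and its default saturation and value are assumptions, not part of the notebook.

```python
import colorsys

def distinct_hex_colors(n, saturation=0.75, value=0.9):
    """Return n hex colors with evenly spaced hues around the color wheel."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append(f"#{round(r * 255):02x}{round(g * 255):02x}{round(b * 255):02x}")
    return colors

# Illustration with the first three tags from the dataset
example_tags = [91732, 91733, 91734]
palette = dict(zip(example_tags, distinct_hex_colors(len(example_tags))))
print(palette)
```

A dictionary built this way could stand in for `tag_colors`. Neighbouring hues remain hard to tell apart by eye when there are many tags, so the popup text is still the reliable way to identify a bird.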
import folium
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output
from folium.plugins import TimestampedGeoJson
import matplotlib.pyplot as plt
from matplotlib import colormaps
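# Load and preprocess the dataset.
# Note: this reload replaces `data` with the raw file, so earlier derived columns are lost and recomputed in Steps 8-9
data = pd.read_csv("migration_original.csv")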
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['year'] = data['timestamp'].dt.year
data['month'] = data['timestamp'].dt.month
# Get unique tag-local-identifiers
unique_tags = sorted(data['tag-local-identifier'].unique())
# Map each tag to a hex color. Note: 'tab10' has only 10 distinct colors, so several tags share a color
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}
# Create dropdown widgets
tag_selector = widgets.Dropdown(
    options=unique_tags,
    value=unique_tags[0],
    description='Tag:',
    layout=widgets.Layout(width='200px')
)
year_selector = widgets.Dropdown(
    options=[],
    description='Year:',
    layout=widgets.Layout(width='200px')
)
month_selector = widgets.Dropdown(
    options=[],
    description='Month:',
    layout=widgets.Layout(width='200px')
)
update_button = widgets.Button(description="Update Map")
output = widgets.Output()
# Update year and month dropdowns dynamically
def update_year_month_dropdowns(tag):
    filtered_data = data[data["tag-local-identifier"] == tag]
    unique_years = sorted(filtered_data['year'].unique())
    unique_months = sorted(filtered_data['month'].unique())
    year_selector.options = unique_years
    if unique_years:
        year_selector.value = unique_years[0]
    month_selector.options = unique_months
    if unique_months:
        month_selector.value = unique_months[0]
# Function to plot movement using TimestampedGeoJson
def plot_movement_interactive(tag, year, month):
    filtered_data = data[
        (data["tag-local-identifier"] == tag) &
        (data["year"] == year) &
        (data["month"] == month)
    ]
    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected tag, year, and month.")
        return None
    filtered_data = filtered_data.sort_values(by="timestamp")
    bird_color = tag_colors[tag]
    first_point = (filtered_data.iloc[0]["location-lat"], filtered_data.iloc[0]["location-long"])
    m = folium.Map(location=first_point, zoom_start=8)
    features = []
    path_coordinates = []
    # One timestamped point feature per GPS fix (GeoJSON order is [lon, lat])
    for _, row in filtered_data.iterrows():
        point_feature = {
            'type': 'Feature',
            'geometry': {
                'type': 'Point',
                'coordinates': [row["location-long"], row["location-lat"]]
            },
            'properties': {
                'time': row['timestamp'].isoformat(),
                'popup': f"Tag: {tag}<br>Time: {row['timestamp']}",
                'icon': 'circle',
                'iconstyle': {
                    'fillColor': bird_color,
                    'fillOpacity': 0.6,
                    'stroke': 'false',
                    'radius': 5
                }
            }
        }
        features.append(point_feature)
        path_coordinates.append([row["location-long"], row["location-lat"]])
    # Add movement line
    line_feature = {
        'type': 'Feature',
        'geometry': {
            'type': 'LineString',
            'coordinates': path_coordinates
        },
        'properties': {
            'times': [row['timestamp'].isoformat() for _, row in filtered_data.iterrows()],
            'style': {
                'color': bird_color,
                'weight': 2
            }
        }
    }
    features.append(line_feature)
    TimestampedGeoJson(
        {'type': 'FeatureCollection', 'features': features},
        period='PT1M',
        add_last_point=True,
        auto_play=True,
        loop=False,
        max_speed=30,
        loop_button=True,
        date_options='YYYY/MM/DD HH:mm:ss',
        time_slider_drag_update=True
    ).add_to(m)
    return m
# Button click logic
def on_button_click(b):
    output.clear_output(wait=True)
    selected_tag = tag_selector.value
    selected_year = year_selector.value
    selected_month = month_selector.value
    map_plot = plot_movement_interactive(selected_tag, selected_year, selected_month)
    if map_plot is not None:
        with output:
            display(map_plot)
# When tag changes, update year/month dropdowns
def on_tag_change(change):
    update_year_month_dropdowns(change['new'])
tag_selector.observe(on_tag_change, names='value')
update_year_month_dropdowns(tag_selector.value)
update_button.on_click(on_button_click)
# Display all UI components
display(widgets.VBox([tag_selector, year_selector, month_selector]))
display(update_button, output)
# -------------------- STEP 8: Compute Bearing (Direction of Movement) -------------------- #
# Function to calculate bearing between two GPS points
def calculate_bearing(lat1, lon1, lat2, lon2):
    """
    Calculate the initial bearing (direction) from point (lat1, lon1) to (lat2, lon2).
    The result is in degrees (0° = North, 90° = East, 180° = South, 270° = West).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    delta_lon = lon2 - lon1
    x = np.sin(delta_lon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(delta_lon)
    initial_bearing = np.arctan2(x, y)
    initial_bearing = np.degrees(initial_bearing)
    return (initial_bearing + 360) % 360 # Normalize to 0-360 degrees
# Initialize a new bearing column
data["bearing"] = np.nan # Start with NaN for all rows
# Compute bearing for each bird (tag) individually
for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()
    tag_data.sort_values("timestamp", inplace=True)
    # Shifted coordinates to get previous point
    lat1 = tag_data["location-lat"].shift(1)
    lon1 = tag_data["location-long"].shift(1)
    lat2 = tag_data["location-lat"]
    lon2 = tag_data["location-long"]
    # Compute bearing
    bearings = calculate_bearing(lat1, lon1, lat2, lon2)
    # Fill NaN (first fix of each bird) with 0 and assign to main DataFrame
    data.loc[tag_data.index, "bearing"] = bearings.fillna(0)
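The bearing convention (0° = North, clockwise) can be verified with the four cardinal directions and with the first step of tag 91732. This is a scalar sketch using the standard library; `bearing_deg` is an illustrative name, not the notebook's function.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees (0 = North, 90 = East) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    x = math.sin(d_lambda) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(d_lambda)
    return (math.degrees(math.atan2(x, y)) + 360) % 360

# Cardinal directions from the origin (0° N, 0° E)
cardinal = {"north": (1, 0), "east": (0, 1), "south": (-1, 0), "west": (0, -1)}
for name, (lat2, lon2) in cardinal.items():
    print(name, round(bearing_deg(0, 0, lat2, lon2), 1))

# First step of tag 91732: slightly west of due south
first_step = bearing_deg(61.24783, 24.58617, 61.23267, 24.58217)
print(round(first_step, 1))
```

Since 0° is a valid heading (due north), filling the first fix of each bird with 0 makes "no previous point" indistinguishable from "flew north". Whether that matters depends on how the bearing is used by the model.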
# -------------------- STEP 9: Encode Cyclic Time Features & Compute Movement Metrics -------------------- #
# Ensure timestamp exists
if "timestamp" not in data.columns:
    raise KeyError("Column 'timestamp' not found. Required for extracting hour/month.")
# Convert timestamp to datetime
data["timestamp"] = pd.to_datetime(data["timestamp"])
# Extract hour and month
data["hour"] = data["timestamp"].dt.hour
data["month"] = data["timestamp"].dt.month
# Encode hour and month as cyclic features
data["hour_sin"] = np.sin(2 * np.pi * data["hour"] / 24)
data["hour_cos"] = np.cos(2 * np.pi * data["hour"] / 24)
data["month_sin"] = np.sin(2 * np.pi * data["month"] / 12)
data["month_cos"] = np.cos(2 * np.pi * data["month"] / 12)
# Drop original hour and month
data.drop(["hour", "month"], axis=1, inplace=True)
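The point of the sine/cosine encoding is that values at the ends of a cycle end up close together. A small standard-library sketch (the helper names `encode_cyclic` and `gap` are assumptions) makes this concrete for hours of the day.

```python
import math

def encode_cyclic(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def gap(a, b, period=24):
    """Euclidean distance between two encoded values."""
    (s1, c1), (s2, c2) = encode_cyclic(a, period), encode_cyclic(b, period)
    return math.hypot(s1 - s2, c1 - c2)

print(gap(23, 0))             # small: 23:00 and 00:00 are neighbours
print(gap(0, 12))             # largest possible: midnight vs noon
print(encode_cyclic(14, 24))  # hour 14, the first row of the dataset
```

With the raw integer hour, 23 and 0 would be 23 units apart. After encoding they are closer than any pair of hours two or more apart, which is the behaviour a sequence model needs at day and year boundaries.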
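# -------------------- Compute time difference in hours (per bird) -------------------- #
# Sort first, then diff within each tag so one bird's first fix is not compared with another bird's last fix
data = data.sort_values(by=["tag-local-identifier", "timestamp"]).reset_index(drop=True)
data["time_diff(hrs)"] = (
    data.groupby("tag-local-identifier")["timestamp"].diff().dt.total_seconds() / 3600
)
# -------------------- Compute distance using haversine -------------------- #
# Third-party package (pip install haversine); this import shadows the haversine() defined in Step 5
from haversine import haversine
def compute_distance(row):
    """Distance (km) from the bird's previous fix; 0 when there is no previous fix."""
    if pd.isnull(row["prev_lat"]) or pd.isnull(row["prev_lon"]):
        return 0
    return haversine(
        (row["prev_lat"], row["prev_lon"]),
        (row["location-lat"], row["location-long"])
    )
# Previous coordinates within each bird, then distance row by row
data["prev_lat"] = data.groupby("tag-local-identifier")["location-lat"].shift(1)
data["prev_lon"] = data.groupby("tag-local-identifier")["location-long"].shift(1)
data["distance(km)"] = data.apply(compute_distance, axis=1)
data.drop(columns=["prev_lat", "prev_lon"], inplace=True)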
# -------------------- Compute speed (km/hr) -------------------- #
data["speed(km/hr)"] = data["distance(km)"] / data["time_diff(hrs)"]
data["speed(km/hr)"] = data["speed(km/hr)"].replace([np.inf, -np.inf], np.nan).fillna(0)
# -------------------- STEP 10: Reorder Columns -------------------- #
desired_order = [
"tag-local-identifier", "timestamp", "year", "month_sin", "month_cos",
"hour_sin", "hour_cos", "time_diff(hrs)", "distance(km)", "speed(km/hr)",
"ECMWF Interim Full Daily Invariant High Vegetation Cover", "bearing",
"location-long", "location-lat"
]
# Only keep columns that exist in the DataFrame
existing_columns = [col for col in desired_order if col in data.columns]
# Warn about missing columns (optional)
missing_columns = [col for col in desired_order if col not in data.columns]
if missing_columns:
    print(f"⚠️ Warning: The following columns were not found and will be skipped: {missing_columns}")
# Reorder DataFrame with existing columns
data = data[existing_columns]
data.head()
index | tag-local-identifier | timestamp | year | month_sin | month_cos | hour_sin | hour_cos | time_diff(hrs) | distance(km) | speed(km/hr) | ECMWF Interim Full Daily Invariant High Vegetation Cover | bearing | location-long | location-lat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 91732 | 2009-05-27 14:00:00 | 2009 | 0.5 | -0.866025 | -0.500000 | -0.866025 | NaN | 0.000000 | 0.000000 | 0.960771 | 0.000000 | 24.58617 | 61.24783 |
1 | 91732 | 2009-05-27 20:00:00 | 2009 | 0.5 | -0.866025 | -0.866025 | 0.500000 | 6.0 | 1.699247 | 0.283208 | 0.959197 | 187.236713 | 24.58217 | 61.23267 |
2 | 91732 | 2009-05-28 05:00:00 | 2009 | 0.5 | -0.866025 | 0.965926 | 0.258819 | 9.0 | 5.632128 | 0.625792 | 0.947799 | 208.929407 | 24.53133 | 61.18833 |
3 | 91732 | 2009-05-28 08:00:00 | 2009 | 0.5 | -0.866025 | 0.866025 | -0.500000 | 3.0 | 5.643323 | 1.881108 | 0.959182 | 28.716637 | 24.58200 | 61.23283 |
4 | 91732 | 2009-05-28 14:00:00 | 2009 | 0.5 | -0.866025 | -0.500000 | -0.866025 | 6.0 | 0.032132 | 0.005355 | 0.959247 | 123.620959 | 24.58250 | 61.23267 |
# Drop features not used for modeling (`columns=` already implies axis=1)
data.drop(columns=['year', 'time_diff(hrs)', 'ECMWF Interim Full Daily Invariant High Vegetation Cover'], inplace=True)
data.head()
index | tag-local-identifier | timestamp | month_sin | month_cos | hour_sin | hour_cos | distance(km) | speed(km/hr) | bearing | location-long | location-lat |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 91732 | 2009-05-27 14:00:00 | 0.5 | -0.866025 | -0.500000 | -0.866025 | 0.000000 | 0.000000 | 0.000000 | 24.58617 | 61.24783 |
1 | 91732 | 2009-05-27 20:00:00 | 0.5 | -0.866025 | -0.866025 | 0.500000 | 1.699247 | 0.283208 | 187.236713 | 24.58217 | 61.23267 |
2 | 91732 | 2009-05-28 05:00:00 | 0.5 | -0.866025 | 0.965926 | 0.258819 | 5.632128 | 0.625792 | 208.929407 | 24.53133 | 61.18833 |
3 | 91732 | 2009-05-28 08:00:00 | 0.5 | -0.866025 | 0.866025 | -0.500000 | 5.643323 | 1.881108 | 28.716637 | 24.58200 | 61.23283 |
4 | 91732 | 2009-05-28 14:00:00 | 0.5 | -0.866025 | -0.500000 | -0.866025 | 0.032132 | 0.005355 | 123.620959 | 24.58250 | 61.23267 |
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Calculate distance and speed for each bird using the Haversine formula (vectorized)
data['speed'] = 0.0
data['distance'] = 0.0
for tag in data["tag-local-identifier"].unique():
    # Sort by time so shift(1) and diff() refer to the previous fix of the same bird
    tag_data = data[data["tag-local-identifier"] == tag].sort_values("timestamp")
    lat1 = np.radians(tag_data["location-lat"].shift(1))
    lon1 = np.radians(tag_data["location-long"].shift(1))
    lat2 = np.radians(tag_data["location-lat"])
    lon2 = np.radians(tag_data["location-long"])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    distance = 6371 * c
    time_diff = tag_data["timestamp"].diff().dt.total_seconds() / 3600
    speed = distance / time_diff
    # A zero time gap gives inf; treat inf and NaN (first fix) as 0
    speed = speed.replace([np.inf, -np.inf], np.nan).fillna(0)
    data.loc[tag_data.index, "speed"] = speed
    data.loc[tag_data.index, "distance"] = distance.fillna(0)
# Select bird
bird_tag = 91732
# Work on a copy so that filling NaNs does not modify `data`
bird_data = data[data["tag-local-identifier"] == bird_tag].copy()
bird_data.loc[:, "bearing"] = bird_data["bearing"].fillna(0)
# Plotting
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle(f"Visualizations for Bird Tag: {bird_tag}", fontsize=16)
# Top row
axes[0, 0].hist(bird_data["speed"], bins=50, color='skyblue')
axes[0, 0].set_title("Speed Distribution")
axes[0, 0].set_xlabel("Speed (km/hr)")
axes[0, 1].scatter(bird_data["location-long"], bird_data["location-lat"], c='blue', s=10)
axes[0, 1].set_title("Path (Lat vs Long)")
axes[0, 1].set_xlabel("Longitude")
axes[0, 1].set_ylabel("Latitude")
# `tick_labels` requires Matplotlib 3.9 or newer (older versions use `labels`)
axes[0, 2].boxplot(
    [bird_data["speed"], bird_data["distance"], bird_data["bearing"]],
    tick_labels=["Speed", "Distance", "Bearing"]
)
axes[0, 2].set_title("Outlier Detection (Boxplots)")
# Bottom row
# Time Series: Speed
axes[1, 0].plot(bird_data["timestamp"], bird_data["speed"], color='green')
axes[1, 0].set_title("Speed over Time")
axes[1, 0].set_xlabel("Time")
axes[1, 0].tick_params(axis='x', rotation=45)
# Scatter: Distance vs Speed
axes[1, 1].scatter(bird_data["distance"], bird_data["speed"], alpha=0.5, color='purple')
axes[1, 1].set_title("Distance vs Speed")
axes[1, 1].set_xlabel("Distance (km)")
axes[1, 1].set_ylabel("Speed (km/hr)")
# Histogram: Bearings
axes[1, 2].hist(bird_data["bearing"], bins=30, color='orange')
axes[1, 2].set_title("Bearing Distribution")
axes[1, 2].set_xlabel("Bearing (°)")
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display
# Function to calculate time difference and plot graphs
def plot_time_gap_analysis(tag_id):
    """
    Analyzes the relationship between time intervals and speed/distance.
    Parameters:
    tag_id (int or str): The unique identifier of the bird.
    """
    data_tag = data[data['tag-local-identifier'] == tag_id].copy()
    # Bail out early if this tag has no rows
    if data_tag.empty:
        print(f"No data available for tag {tag_id}")
        return
    data_tag['timestamp'] = pd.to_datetime(data_tag['timestamp'])
    data_tag = data_tag.sort_values(by='timestamp')
    # Compute time difference in hours
    data_tag['time_diff'] = data_tag['timestamp'].diff().dt.total_seconds() / 3600
    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    fig.suptitle(f'Time Interval Analysis for Bird {tag_id}', fontsize=14)
    # Histogram of time intervals
    axes[0].hist(data_tag['time_diff'].dropna(), bins=30, edgecolor='black')
    axes[0].set_title('Time Interval Distribution')
    axes[0].set_xlabel('Time Interval (hours)')
    axes[0].set_ylabel('Frequency')
    # Scatter plot of distance vs. time interval
    axes[1].scatter(data_tag['time_diff'], data_tag['distance(km)'], alpha=0.5)
    axes[1].set_title('Distance vs. Time Interval')
    axes[1].set_xlabel('Time Interval (hours)')
    axes[1].set_ylabel('Distance (km)')
    # Scatter plot of speed vs. time interval
    axes[2].scatter(data_tag['time_diff'], data_tag['speed(km/hr)'], alpha=0.5)
    axes[2].set_title('Speed vs. Time Interval')
    axes[2].set_xlabel('Time Interval (hours)')
    axes[2].set_ylabel('Speed (km/hr)')
    plt.show()
# Create dropdown widget to select bird tag
tag_selector = widgets.Dropdown(
    options=data['tag-local-identifier'].unique(),
    description='Select Tag:',
    style={'description_width': 'initial'}
)
# Link the dropdown to the plotting function; the returned widget displays both
widgets.interactive(plot_time_gap_analysis, tag_id=tag_selector)
import numpy as np
import pandas as pd
import folium
from folium.plugins import MarkerCluster
from sklearn.cluster import DBSCAN
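# Assumes `data` has the columns 'location-lat', 'location-long', and 'speed(km/hr)'
resting_threshold = 0.025 # km/hr
# Note: the first fix of each bird has speed 0 by construction, so it is selected here as well
resting_points = data[data['speed(km/hr)'] <= resting_threshold].copy()
# Clustering with DBSCAN
# With metric='haversine', eps is an angle in radians (not degrees or km):
# 0.1 rad x 6371 km is roughly 637 km. Divide a distance in km by 6371 to convert.
epsilon = 0.1 # Adjust based on typical stopover site size
min_samples = 10 # Minimum points to form a cluster
db = DBSCAN(eps=epsilon, min_samples=min_samples, metric='haversine').fit(np.radians(resting_points[['location-lat', 'location-long']]))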
# Assign cluster labels
resting_points['cluster'] = db.labels_
# Create a folium map centered at the first resting point
center_lat, center_long = resting_points.iloc[0][['location-lat', 'location-long']]
m = folium.Map(location=[center_lat, center_long], zoom_start=8)
# Color mapping for clusters
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'lightblue', 'pink', 'black', 'gray']
marker_cluster = MarkerCluster().add_to(m)
# Plot resting points with cluster colors
for _, row in resting_points.iterrows():
    cluster = row['cluster']
    color = colors[cluster % len(colors)] if cluster != -1 else "black" # Noise in black
    folium.CircleMarker(
        location=[row['location-lat'], row['location-long']],
        radius=4,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=f"Cluster: {cluster}"
    ).add_to(marker_cluster)
# Show the map
m
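When DBSCAN is run with `metric='haversine'` on coordinates in radians, `eps` is an angular distance in radians too. The helper below is a small sketch for converting between kilometres and radians; the function names and the 5 km radius are hypothetical and only illustrate the conversion.

```python
EARTH_RADIUS_KM = 6371.0  # same mean radius used for the Haversine distance

def km_to_radians(km):
    """Convert a surface distance in km to a central angle in radians."""
    return km / EARTH_RADIUS_KM

def radians_to_km(rad):
    """Convert a central angle in radians to a surface distance in km."""
    return rad * EARTH_RADIUS_KM

# What the eps value of 0.1 used above corresponds to on the ground
print(round(radians_to_km(0.1), 1))  # 637.1

# Hypothetical stopover radius of 5 km expressed as an eps value
stopover_radius_km = 5.0
eps_for_5_km = km_to_radians(stopover_radius_km)
print(eps_for_5_km)
```

A value computed this way could be passed as `eps` to `DBSCAN(..., metric='haversine')`. The right radius depends on the species and the site, so it is best treated as a parameter to tune rather than a fixed constant.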