Wildlife Movement Prediction using Deep Learning 🦅¶

This project leverages deep learning techniques to analyze GPS tracking data of wildlife, focusing on bird migration patterns. Using geospatial and temporal data, we preprocess and visualize trajectories, compute inter-location distances using the Haversine formula, and apply sequential modeling techniques (e.g., GRU) to predict animal movement.


📂 Dataset Information¶

  • Source: Movement Ecology Dataset (hosted via Google Drive)
  • Size: ~90,000 entries
  • Fields: Timestamp, Location (lat/lon), Species, Sensor Type, Vegetation Cover, etc.

💻 Project Goals¶

  • Clean and preprocess geospatial data
  • Compute movement trajectories
  • Build and train a GRU-based deep learning model
  • Predict next location(s) in movement sequence
%pip install gdown geopy haversine pandas numpy matplotlib seaborn scikit-learn tensorflow --quiet
Note: you may need to restart the kernel to use updated packages.
import numpy as np
import pandas as pd
import math
import gdown

# Extract the File ID from your link
file_id = "1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX"  # Extracted from your Google Drive link

# Correct Google Drive direct download URL
download_url = f"https://drive.google.com/uc?id={file_id}"

# Define output file name
output_file = "migration_original.csv"

# Download the file
gdown.download(download_url, output_file, quiet=False)
Downloading...
From: https://drive.google.com/uc?id=1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX
To: /Users/apple/Desktop/Wildlife_Movement_Prediction/migration_original.csv
100%|██████████████████████████████████████| 22.3M/22.3M [00:07<00:00, 2.92MB/s]
'migration_original.csv'
# Load the dataset
df = pd.read_csv('migration_original.csv')
print(df.shape)
df.head()
(89867, 15)
event-id visible timestamp location-long location-lat manually-marked-outlier visible.1 sensor-type individual-taxon-canonical-name tag-local-identifier individual-local-identifier study-name ECMWF Interim Full Daily Invariant Low Vegetation Cover NCEP NARR SFC Vegetation at Surface ECMWF Interim Full Daily Invariant High Vegetation Cover
0 1082620685 True 2009-05-27 14:00:00.000 24.58617 61.24783 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.039229 NaN 0.960771
1 1082620686 True 2009-05-27 20:00:00.000 24.58217 61.23267 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.040803 NaN 0.959197
2 1082620687 True 2009-05-28 05:00:00.000 24.53133 61.18833 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.052201 NaN 0.947799
3 1082620688 True 2009-05-28 08:00:00.000 24.58200 61.23283 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.040818 NaN 0.959182
4 1082620689 True 2009-05-28 14:00:00.000 24.58250 61.23267 NaN True gps Larus fuscus 91732 91732A Navigation experiments in lesser black-backed ... 0.040753 NaN 0.959247
# Check for unique values in all the columns
for column in df.columns:
  print(f'The unique values present in "{column}" are: ', df[column].unique(), "\n")
The unique values present in "event-id" are:  [1082620685 1082620686 1082620687 ... 1082710937 1082710938 1082710939] 

The unique values present in "visible" are:  [ True] 

The unique values present in "timestamp" are:  ['2009-05-27 14:00:00.000' '2009-05-27 20:00:00.000'
 '2009-05-28 05:00:00.000' ... '2015-08-26 21:00:00.000'
 '2015-08-27 06:00:00.000' '2015-08-27 09:00:00.000'] 

The unique values present in "location-long" are:  [24.58617 24.58217 24.53133 ... 35.69217 35.71483 35.66567] 

The unique values present in "location-lat" are:  [61.24783 61.23267 61.18833 ... 64.95367 64.97133 65.019  ] 

The unique values present in "manually-marked-outlier" are:  [nan] 

The unique values present in "visible.1" are:  [ True] 

The unique values present in "sensor-type" are:  ['gps'] 

The unique values present in "individual-taxon-canonical-name" are:  ['Larus fuscus'] 

The unique values present in "tag-local-identifier" are:  [91732 91733 91734 91735 91737 91738 91739 91740 91741 91742 91743 91744
 91745 91746 91747 91748 91749 91750 91751 91752 91754 91755 91756 91758
 91759 91761 91762 91763 91764 91765 91766 91767 91769 91771 91774 91775
 91776 91777 91778 91779 91780 91781 91782 91783 91785 91786 91787 91788
 91789 91794 91795 91797 91798 91799 91800 91802 91803 91807 91809 91810
 91811 91812 91813 91814 91815 91816 91819 91821 91823 91824 91825 91826
 91827 91828 91829 91830 91831 91832 91835 91836 91837 91838 91839 91843
 91845 91846 91848 91849 91852 91854 91858 91861 91862 91864 91865 91866
 91870 91871 91872 91875 91876 91877 91878 91880 91881 91884 91885 91893
 91894 91897 91900 91901 91903 91907 91908 91910 91911 91913 91916 91918
 91919 91920 91921 91924 91929 91930] 

The unique values present in "individual-local-identifier" are:  ['91732A' '91733A' '91734A' '91735A' '91737A' '91738A' '91739A' '91740A'
 '91741A' '91742A' '91743A' '91744A' '91745A' '91746A' '91747A' '91748A'
 '91749A' '91750A' '91751A' '91752A' '91754A' '91755A' '91756A' '91758A'
 '91759A' '91761A' '91762A' '91763A' '91764A' '91765A' '91766A' '91767A'
 '91769A' '91771A' '91774A' '91775A' '91776A' '91777A' '91778A' '91779A'
 '91780A' '91781A' '91782A' '91783A' '91785A' '91786A' '91787A' '91788A'
 '91789A' '91794A' '91795A' '91797A' '91798A' '91799A' '91800A' '91802A'
 '91803A' '91807A' '91809A' '91810A' '91811A' '91812A' '91813A' '91814A'
 '91815A' '91816A' '91819A' '91821A' '91823A' '91824A' '91825A' '91826A'
 '91827A' '91828A' '91829A' '91830A' '91831A' '91832A' '91835A' '91836A'
 '91837A' '91838A' '91839A' '91843A' '91845A' '91846A' '91848A' '91849A'
 '91852A' '91854A' '91858A' '91861A' '91862A' '91864A' '91865A' '91866A'
 '91870A' '91871A' '91872A' '91875A' '91876A' '91877A' '91878A' '91880A'
 '91881A' '91884A' '91885A' '91893A' '91894A' '91897A' '91900A' '91901A'
 '91903A' '91907A' '91908A' '91910A' '91911A' '91913A' '91916A' '91918A'
 '91919A' '91920A' '91921A' '91924A' '91929A' '91930A'] 

The unique values present in "study-name" are:  ['Navigation experiments in lesser black-backed gulls (data from Wikelski et al. 2015)'] 

The unique values present in "ECMWF Interim Full Daily Invariant Low Vegetation Cover" are:  [0.03922896 0.0408028  0.0522006  ... 0.82435717 0.82432803 0.82430923] 

The unique values present in "NCEP NARR SFC Vegetation at Surface" are:  [nan] 

The unique values present in "ECMWF Interim Full Daily Invariant High Vegetation Cover" are:  [0.96077104 0.9591972  0.9477994  ... 0.17564283 0.17567197 0.17569077] 

'''
    1. Some columns contain only null values or a single constant value for the
       entire dataset, so they contribute nothing to the model and will be dropped.

    2. "individual-local-identifier" is simply "tag-local-identifier" with an "A"
       suffix, so keeping both is redundant.

    3. "ECMWF Interim Full Daily Invariant Low Vegetation Cover" and
       "ECMWF Interim Full Daily Invariant High Vegetation Cover" are complementary
       (they sum to one), so only one of the two needs to be kept.

    A quick check of points 2 and 3 is sketched after the preview below.
'''

# Define the columns to drop
columns_to_drop = ["event-id","visible", "visible.1", "sensor-type", "individual-taxon-canonical-name", "study-name", "manually-marked-outlier",
                   "NCEP NARR SFC Vegetation at Surface", "individual-local-identifier", "ECMWF Interim Full Daily Invariant Low Vegetation Cover"]

# drop unwanted columns
data = df.drop(columns=columns_to_drop)
data.head()
timestamp location-long location-lat tag-local-identifier ECMWF Interim Full Daily Invariant High Vegetation Cover
0 2009-05-27 14:00:00.000 24.58617 61.24783 91732 0.960771
1 2009-05-27 20:00:00.000 24.58217 61.23267 91732 0.959197
2 2009-05-28 05:00:00.000 24.53133 61.18833 91732 0.947799
3 2009-05-28 08:00:00.000 24.58200 61.23283 91732 0.959182
4 2009-05-28 14:00:00.000 24.58250 61.23267 91732 0.959247
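A quick check of points 2 and 3 from the rationale above (a sketch; it assumes the raw frame df is still in memory):

# Verify the drop rationale on the raw frame
suffix_match = (df["individual-local-identifier"] == df["tag-local-identifier"].astype(str) + "A").all()
print("individual-local-identifier == tag-local-identifier + 'A' for every row:", suffix_match)

veg_sum = (df["ECMWF Interim Full Daily Invariant Low Vegetation Cover"]
           + df["ECMWF Interim Full Daily Invariant High Vegetation Cover"])
print("Low + High vegetation cover sums to ~1 everywhere:", bool(np.allclose(veg_sum, 1.0)))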
# Check for data types and Null counts using info() method
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89867 entries, 0 to 89866
Data columns (total 5 columns):
 #   Column                                                    Non-Null Count  Dtype  
---  ------                                                    --------------  -----  
 0   timestamp                                                 89867 non-null  object 
 1   location-long                                             89867 non-null  float64
 2   location-lat                                              89867 non-null  float64
 3   tag-local-identifier                                      89867 non-null  int64  
 4   ECMWF Interim Full Daily Invariant High Vegetation Cover  89867 non-null  float64
dtypes: float64(3), int64(1), object(1)
memory usage: 3.4+ MB
# -------------------- STEP 1: Parse Timestamps -------------------- #
# Ensure timestamps are in datetime format
data["timestamp"] = pd.to_datetime(data["timestamp"])
# -------------------- STEP 2: Sort Data by Tag and Time -------------------- #
# Sort by "tag-local-identifier" and "timestamp" so each bird's track is contiguous and time-ordered
data = data.sort_values(by=["tag-local-identifier", "timestamp"]).reset_index(drop=True)
# -------------------- STEP 3: Extract Date-Time Features -------------------- #
data["year"] = data["timestamp"].dt.year
data["month"] = data["timestamp"].dt.month
data["hour"] = data["timestamp"].dt.hour
# STEP 4: Compute Time Difference per Bird
data["time_diff(hrs)"] = (
    data.groupby("tag-local-identifier")["timestamp"]
    .diff().dt.total_seconds() / 3600
)

# Replace NaN with 0 for the first row per bird (safe assignment)
data["time_diff(hrs)"] = data["time_diff(hrs)"].fillna(0)
data.head(10)
timestamp location-long location-lat tag-local-identifier ECMWF Interim Full Daily Invariant High Vegetation Cover year month hour time_diff(hrs)
0 2009-05-27 14:00:00 24.58617 61.24783 91732 0.960771 2009 5 14 0.0
1 2009-05-27 20:00:00 24.58217 61.23267 91732 0.959197 2009 5 20 6.0
2 2009-05-28 05:00:00 24.53133 61.18833 91732 0.947799 2009 5 5 9.0
3 2009-05-28 08:00:00 24.58200 61.23283 91732 0.959182 2009 5 8 3.0
4 2009-05-28 14:00:00 24.58250 61.23267 91732 0.959247 2009 5 14 6.0
5 2009-05-28 20:00:00 24.58617 61.24767 91732 0.960761 2009 5 20 6.0
6 2009-05-29 05:00:00 24.58600 61.24767 91732 0.960736 2009 5 5 9.0
7 2009-05-29 08:00:00 24.58617 61.24767 91732 0.960761 2009 5 8 3.0
8 2009-05-29 14:00:00 24.58650 61.24750 91732 0.960799 2009 5 14 6.0
9 2009-05-29 20:00:00 24.56967 61.23883 91732 0.957722 2009 5 20 6.0

Haversine Formula:¶

To calculate the distance between two consecutive GPS fixes (previous and current latitude/longitude), we use the Haversine formula. It gives the great-circle distance between two points on the Earth's surface, accounting for the Earth's (approximately) spherical shape.

$$ a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\Delta\lambda}{2}\right) $$

$$ c = 2 \cdot \text{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right) $$

$$ d = R \cdot c $$

Where:

  • $ \phi_1, \phi_2 $ are the latitudes of the two points in radians,
  • $ \lambda_1, \lambda_2 $ are the longitudes of the two points in radians,
  • $ \Delta\phi = \phi_2 - \phi_1 $ and $ \Delta\lambda = \lambda_2 - \lambda_1 $ are the latitude and longitude differences,
  • $ R $ is the Earth's radius (mean radius ≈ 6,371 km),
  • $ d $ is the distance between the points in kilometers.
# -------------------- STEP 5: Define Haversine Distance Function -------------------- #
def haversine(lat1, lon1, lat2, lon2):
    """Compute the great-circle distance (Haversine formula) between two GPS coordinates."""
    R = 6371  # Earth radius in kilometers
    phi1, phi2 = map(math.radians, [lat1, lat2])
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = math.sin(delta_phi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    return R * c  # Distance in km
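# Quick sanity check with illustrative coordinates (not from the dataset):
# Helsinki (60.17 N, 24.94 E) to Stockholm (59.33 N, 18.07 E) should come out
# at roughly 400 km with R = 6371 km.
print(f"Helsinki -> Stockholm: {haversine(60.17, 24.94, 59.33, 18.07):.1f} km")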
# -------------------- STEP 6: Compute Distance per Bird -------------------- #
# Compute previous lat/lon per bird before applying Haversine formula
data["prev_lat"] = data.groupby("tag-local-identifier")["location-lat"].shift(1)
data["prev_lon"] = data.groupby("tag-local-identifier")["location-long"].shift(1)

# Apply Haversine function to compute distances
data["distance(km)"] = data.apply(
    lambda row: haversine(row["prev_lat"], row["prev_lon"], row["location-lat"], row["location-long"])
    if pd.notna(row["prev_lat"]) and pd.notna(row["prev_lon"]) else 0, axis=1
)

# Drop temporary columns
data.drop(columns=["prev_lat", "prev_lon"], inplace=True)
# ------------------------ STEP 7: Compute Speed (Avoid Division by Zero) ------------------------ #
data["speed(km/hr)"] = data["distance(km)"] / data["time_diff(hrs)"]

# Replace inf/-inf (from zero time differences) with NaN, then fill NaN with 0
data["speed(km/hr)"] = data["speed(km/hr)"].replace([np.inf, -np.inf], np.nan)
data["speed(km/hr)"] = data["speed(km/hr)"].fillna(0)  # Replace NaN with 0
!pip install folium
import folium
from folium.plugins import AntPath
import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Load the dataset
data = pd.read_csv("migration_original.csv")

# Convert timestamp and extract year/month
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['year'] = data['timestamp'].dt.year
data['month'] = data['timestamp'].dt.month

# Get unique years, months, and tags
unique_years = sorted(data['year'].unique())
unique_months = sorted(data['month'].unique())
unique_tags = sorted(data['tag-local-identifier'].unique())

# Set up color mapping for each unique tag
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}

# Create dropdowns for year and month selection
year_selector = widgets.SelectMultiple(
    options=unique_years,
    value=[unique_years[0]],
    description='Years',
    layout=widgets.Layout(height='100px', width='150px')
)

month_selector = widgets.SelectMultiple(
    options=unique_months,
    value=[unique_months[0]],
    description='Months',
    layout=widgets.Layout(height='100px', width='150px')
)

# Button to update the map
update_button = widgets.Button(description="Update Map")

# Output widget to display the map
output = widgets.Output()

# Function to plot movement interactively
def plot_movement_interactive(years, months):
    filtered_data = data[data["year"].isin(years) & data["month"].isin(months)]

    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected period.")
        return None

    first_point = (
        filtered_data.iloc[0]["location-lat"],
        filtered_data.iloc[0]["location-long"]
    )
    m = folium.Map(location=first_point, zoom_start=6)

    for tag in filtered_data["tag-local-identifier"].unique():
        bird_data = filtered_data[filtered_data["tag-local-identifier"] == tag]
        bird_color = tag_colors[tag]
        path = list(zip(bird_data["location-lat"], bird_data["location-long"]))

        folium.PolyLine(path, color=bird_color, weight=2.5, opacity=0.8).add_to(m)

        for _, row in bird_data.iterrows():
            folium.CircleMarker(
                location=(row["location-lat"], row["location-long"]),
                radius=3,
                color=bird_color,
                fill=True,
                fill_color=bird_color,
                popup=f"Tag: {tag}<br>Time: {row['timestamp']}"
            ).add_to(m)

    return m

# Button click handler
def on_button_click(b):
    output.clear_output(wait=True)
    selected_years = list(year_selector.value)
    selected_months = list(month_selector.value)
    map_plot = plot_movement_interactive(selected_years, selected_months)

    if map_plot:
        with output:
            display(map_plot)

update_button.on_click(on_button_click)

# Display UI
display(widgets.HBox([year_selector, month_selector]))
display(update_button, output)
[Interactive output: year/month selectors, an "Update Map" button, and the rendered folium map appear here when the notebook is run.]
import folium
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output
from folium.plugins import TimestampedGeoJson
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Load and preprocess the dataset
data = pd.read_csv("migration_original.csv")
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['year'] = data['timestamp'].dt.year
data['month'] = data['timestamp'].dt.month

# Get unique tag-local-identifiers
unique_tags = sorted(data['tag-local-identifier'].unique())

# Assign unique colors to each tag using colormap
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}

# Create dropdown widgets
tag_selector = widgets.Dropdown(
    options=unique_tags,
    value=unique_tags[0],
    description='Tag:',
    layout=widgets.Layout(width='200px')
)

year_selector = widgets.Dropdown(
    options=[],
    description='Year:',
    layout=widgets.Layout(width='200px')
)

month_selector = widgets.Dropdown(
    options=[],
    description='Month:',
    layout=widgets.Layout(width='200px')
)

update_button = widgets.Button(description="Update Map")
output = widgets.Output()

# Update year and month dropdowns dynamically
def update_year_month_dropdowns(tag):
    filtered_data = data[data["tag-local-identifier"] == tag]
    unique_years = sorted(filtered_data['year'].unique())
    unique_months = sorted(filtered_data['month'].unique())

    year_selector.options = unique_years
    if unique_years:
        year_selector.value = unique_years[0]

    month_selector.options = unique_months
    if unique_months:
        month_selector.value = unique_months[0]

# Function to plot movement using TimestampedGeoJson
def plot_movement_interactive(tag, year, month):
    filtered_data = data[
        (data["tag-local-identifier"] == tag) &
        (data["year"] == year) &
        (data["month"] == month)
    ]

    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected tag, year, and month.")
        return None

    filtered_data = filtered_data.sort_values(by="timestamp")
    bird_color = tag_colors[tag]
    first_point = (filtered_data.iloc[0]["location-lat"], filtered_data.iloc[0]["location-long"])
    m = folium.Map(location=first_point, zoom_start=8)

    features = []
    path_coordinates = []

    for _, row in filtered_data.iterrows():
        point_feature = {
            'type': 'Feature',
            'geometry': {
                'type': 'Point',
                'coordinates': [row["location-long"], row["location-lat"]]
            },
            'properties': {
                'time': row['timestamp'].isoformat(),
                'popup': f"Tag: {tag}<br>Time: {row['timestamp']}",
                'icon': 'circle',
                'iconstyle': {
                    'fillColor': bird_color,
                    'fillOpacity': 0.6,
                    'stroke': 'false',
                    'radius': 5
                }
            }
        }
        features.append(point_feature)
        path_coordinates.append([row["location-long"], row["location-lat"]])

    # Add movement line
    line_feature = {
        'type': 'Feature',
        'geometry': {
            'type': 'LineString',
            'coordinates': path_coordinates
        },
        'properties': {
            'times': [row['timestamp'].isoformat() for _, row in filtered_data.iterrows()],
            'style': {
                'color': bird_color,
                'weight': 2
            }
        }
    }
    features.append(line_feature)

    TimestampedGeoJson(
        {'type': 'FeatureCollection', 'features': features},
        period='PT1M',
        add_last_point=True,
        auto_play=True,
        loop=False,
        max_speed=30,
        loop_button=True,
        date_options='YYYY/MM/DD HH:mm:ss',
        time_slider_drag_update=True
    ).add_to(m)

    return m

# Button click logic
def on_button_click(b):
    output.clear_output(wait=True)
    selected_tag = tag_selector.value
    selected_year = year_selector.value
    selected_month = month_selector.value
    map_plot = plot_movement_interactive(selected_tag, selected_year, selected_month)
    if map_plot:
        with output:
            display(map_plot)

# When tag changes, update year/month dropdowns
def on_tag_change(change):
    update_year_month_dropdowns(change['new'])

tag_selector.observe(on_tag_change, names='value')
update_year_month_dropdowns(tag_selector.value)
update_button.on_click(on_button_click)

# Display all UI components
display(widgets.VBox([tag_selector, year_selector, month_selector]))
display(update_button, output)
[Interactive output: tag/year/month dropdowns, an "Update Map" button, and the animated folium map appear here when the notebook is run.]
# -------------------- STEP 8: Compute Bearing (Direction of Movement) -------------------- #

# Function to calculate bearing between two GPS points
def calculate_bearing(lat1, lon1, lat2, lon2):
    """
    Calculate the initial bearing (direction) from point (lat1, lon1) to (lat2, lon2).
    The result is in degrees (0° = North, 90° = East, 180° = South, 270° = West).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])

    delta_lon = lon2 - lon1
    x = np.sin(delta_lon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(delta_lon)

    initial_bearing = np.arctan2(x, y)
    initial_bearing = np.degrees(initial_bearing)

    return (initial_bearing + 360) % 360  # Normalize to 0-360 degrees
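# Quick sanity checks with illustrative points (not from the dataset):
# moving due east along the equator should give ~90 deg, due north ~0 deg, e.g.
#   calculate_bearing(0.0, 0.0, 0.0, 1.0)  -> 90.0
#   calculate_bearing(0.0, 0.0, 1.0, 0.0)  -> 0.0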

# Initialize a new bearing column
data["bearing"] = np.nan  # Start with NaN for all rows

# Compute bearing for each bird (tag) individually
for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()
    tag_data.sort_values("timestamp", inplace=True)

    # Shifted coordinates to get previous point
    lat1 = tag_data["location-lat"].shift(1)
    lon1 = tag_data["location-long"].shift(1)
    lat2 = tag_data["location-lat"]
    lon2 = tag_data["location-long"]

    # Compute bearing
    bearings = calculate_bearing(lat1, lon1, lat2, lon2)

    # Fill NaN with 0 and assign to main DataFrame
    data.loc[tag_data.index, "bearing"] = bearings.fillna(0)
# -------------------- STEP 9: Encode Cyclic Time Features & Compute Movement Metrics -------------------- #

# Ensure timestamp exists
if "timestamp" not in data.columns:
    raise KeyError("Column 'timestamp' not found. Required for extracting hour/month.")

# Convert timestamp to datetime
data["timestamp"] = pd.to_datetime(data["timestamp"])

# Extract hour and month
data["hour"] = data["timestamp"].dt.hour
data["month"] = data["timestamp"].dt.month

# Encode hour and month as cyclic features
data["hour_sin"] = np.sin(2 * np.pi * data["hour"] / 24)
data["hour_cos"] = np.cos(2 * np.pi * data["hour"] / 24)
data["month_sin"] = np.sin(2 * np.pi * data["month"] / 12)
data["month_cos"] = np.cos(2 * np.pi * data["month"] / 12)

# Drop original hour and month
data.drop(["hour", "month"], axis=1, inplace=True)

# -------------------- Compute time difference in hours (per bird) -------------------- #
data["time_diff(hrs)"] = (
    data.groupby("tag-local-identifier")["timestamp"]
    .diff().dt.total_seconds() / 3600
)

# -------------------- Compute distance using haversine -------------------- #
from haversine import haversine

def compute_distance(row1, row2):
    if pd.isnull(row1["location-lat"]) or pd.isnull(row1["location-long"]) or \
       pd.isnull(row2["location-lat"]) or pd.isnull(row2["location-long"]):
        return 0
    return haversine(
        (row1["location-lat"], row1["location-long"]),
        (row2["location-lat"], row2["location-long"])
    )

# Calculate distance row-by-row, resetting to 0 at the first fix of each bird
data["distance(km)"] = [
    compute_distance(data.iloc[i - 1], data.iloc[i])
    if i != 0 and data.iloc[i]["tag-local-identifier"] == data.iloc[i - 1]["tag-local-identifier"]
    else 0
    for i in range(len(data))
]

# -------------------- Compute speed (km/hr) -------------------- #
data["speed(km/hr)"] = data["distance(km)"] / data["time_diff(hrs)"]
data["speed(km/hr)"] = data["speed(km/hr)"].replace([np.inf, -np.inf], np.nan).fillna(0)
# -------------------- STEP 10: Reorder Columns -------------------- #
desired_order = [
    "tag-local-identifier", "timestamp", "year", "month_sin", "month_cos",
    "hour_sin", "hour_cos", "time_diff(hrs)", "distance(km)", "speed(km/hr)",
    "ECMWF Interim Full Daily Invariant High Vegetation Cover", "bearing",
    "location-long", "location-lat"
]

# Only keep columns that exist in the DataFrame
existing_columns = [col for col in desired_order if col in data.columns]

# Warn about missing columns (optional)
missing_columns = [col for col in desired_order if col not in data.columns]
if missing_columns:
    print(f"⚠️ Warning: The following columns were not found and will be skipped: {missing_columns}")

# Reorder DataFrame with existing columns
data = data[existing_columns]
data.head()
tag-local-identifier timestamp year month_sin month_cos hour_sin hour_cos time_diff(hrs) distance(km) speed(km/hr) ECMWF Interim Full Daily Invariant High Vegetation Cover bearing location-long location-lat
0 91732 2009-05-27 14:00:00 2009 0.5 -0.866025 -0.500000 -0.866025 NaN 0.000000 0.000000 0.960771 0.000000 24.58617 61.24783
1 91732 2009-05-27 20:00:00 2009 0.5 -0.866025 -0.866025 0.500000 6.0 1.699247 0.283208 0.959197 187.236713 24.58217 61.23267
2 91732 2009-05-28 05:00:00 2009 0.5 -0.866025 0.965926 0.258819 9.0 5.632128 0.625792 0.947799 208.929407 24.53133 61.18833
3 91732 2009-05-28 08:00:00 2009 0.5 -0.866025 0.866025 -0.500000 3.0 5.643323 1.881108 0.959182 28.716637 24.58200 61.23283
4 91732 2009-05-28 14:00:00 2009 0.5 -0.866025 -0.500000 -0.866025 6.0 0.032132 0.005355 0.959247 123.620959 24.58250 61.23267
data.drop(columns=['year', 'time_diff(hrs)', 'ECMWF Interim Full Daily Invariant High Vegetation Cover'], inplace=True)
data.head()
tag-local-identifier timestamp month_sin month_cos hour_sin hour_cos distance(km) speed(km/hr) bearing location-long location-lat
0 91732 2009-05-27 14:00:00 0.5 -0.866025 -0.500000 -0.866025 0.000000 0.000000 0.000000 24.58617 61.24783
1 91732 2009-05-27 20:00:00 0.5 -0.866025 -0.866025 0.500000 1.699247 0.283208 187.236713 24.58217 61.23267
2 91732 2009-05-28 05:00:00 0.5 -0.866025 0.965926 0.258819 5.632128 0.625792 208.929407 24.53133 61.18833
3 91732 2009-05-28 08:00:00 0.5 -0.866025 0.866025 -0.500000 5.643323 1.881108 28.716637 24.58200 61.23283
4 91732 2009-05-28 14:00:00 0.5 -0.866025 -0.500000 -0.866025 0.032132 0.005355 123.620959 24.58250 61.23267
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Calculate speed for each bird using Haversine formula
data['speed'] = 0.0
data['distance'] = 0.0

for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()

    lat1 = np.radians(tag_data["location-lat"].shift(1))
    lon1 = np.radians(tag_data["location-long"].shift(1))
    lat2 = np.radians(tag_data["location-lat"])
    lon2 = np.radians(tag_data["location-long"])

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    distance = 6371 * c

    time_diff = tag_data["timestamp"].diff().dt.total_seconds() / 3600
    speed = distance / time_diff
    speed = speed.fillna(0)

    data.loc[tag_data.index, "speed"] = speed
    data.loc[tag_data.index, "distance"] = distance.fillna(0)

# Select a single bird and replace any NaN bearings
bird_tag = 91732
bird_data = data[data["tag-local-identifier"] == bird_tag].copy()
bird_data.loc[:, "bearing"] = bird_data["bearing"].fillna(0)


# Plotting
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle(f"Visualizations for Bird Tag: {bird_tag}", fontsize=16)

# Top row
axes[0, 0].hist(bird_data["speed"], bins=50, color='skyblue')
axes[0, 0].set_title("Speed Distribution")
axes[0, 0].set_xlabel("Speed (km/hr)")

axes[0, 1].scatter(bird_data["location-long"], bird_data["location-lat"], c='blue', s=10)
axes[0, 1].set_title("Path (Lat vs Long)")
axes[0, 1].set_xlabel("Longitude")
axes[0, 1].set_ylabel("Latitude")

axes[0, 2].boxplot(
    [bird_data["speed"], bird_data["distance"], bird_data["bearing"]],
    tick_labels=["Speed", "Distance", "Bearing"]
)

axes[0, 2].set_title("Outlier Detection (Boxplots)")

# Bottom row — NEW PLOTS
# Time Series: Speed
axes[1, 0].plot(bird_data["timestamp"], bird_data["speed"], color='green')
axes[1, 0].set_title("Speed over Time")
axes[1, 0].set_xlabel("Time")
axes[1, 0].tick_params(axis='x', rotation=45)

# Scatter: Distance vs Speed
axes[1, 1].scatter(bird_data["distance"], bird_data["speed"], alpha=0.5, color='purple')
axes[1, 1].set_title("Distance vs Speed")
axes[1, 1].set_xlabel("Distance (km)")
axes[1, 1].set_ylabel("Speed (km/hr)")

# Histogram: Bearings
axes[1, 2].hist(bird_data["bearing"], bins=30, color='orange')
axes[1, 2].set_title("Bearing Distribution")
axes[1, 2].set_xlabel("Bearing (°)")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
[Figure: six-panel summary for bird tag 91732: speed distribution, path (latitude vs. longitude), outlier boxplots, speed over time, distance vs. speed, and bearing distribution.]
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display

# Function to calculate time difference and plot graphs
def plot_time_gap_analysis(tag_id):
    """
    Analyzes the relationship between time intervals and speed/distance.

    Parameters:
    tag_id (int or str): The unique identifier of the bird.
    """
    data_tag = data[data['tag-local-identifier'] == tag_id].copy()
    data_tag['timestamp'] = pd.to_datetime(data_tag['timestamp'])
    data_tag = data_tag.sort_values(by='timestamp')

    # Compute time difference in hours
    data_tag['time_diff'] = data_tag['timestamp'].diff().dt.total_seconds() / 3600

    if data_tag.empty:
        print(f"No data available for tag {tag_id}")
        return

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    fig.suptitle(f'Time Interval Analysis for Bird {tag_id}', fontsize=14)

    # Histogram of time intervals
    axes[0].hist(data_tag['time_diff'].dropna(), bins=30, edgecolor='black')
    axes[0].set_title('Time Interval Distribution')
    axes[0].set_xlabel('Time Interval (hours)')
    axes[0].set_ylabel('Frequency')

    # Scatter plot of distance vs. time interval
    axes[1].scatter(data_tag['time_diff'], data_tag['distance(km)'], alpha=0.5)
    axes[1].set_title('Distance vs. Time Interval')
    axes[1].set_xlabel('Time Interval (hours)')
    axes[1].set_ylabel('Distance (km)')

    # Scatter plot of speed vs. time interval
    axes[2].scatter(data_tag['time_diff'], data_tag['speed(km/hr)'], alpha=0.5)
    axes[2].set_title('Speed vs. Time Interval')
    axes[2].set_xlabel('Time Interval (hours)')
    axes[2].set_ylabel('Speed (km/hr)')

    plt.show()

# Create dropdown widget to select bird tag
tag_selector = widgets.Dropdown(
    options=data['tag-local-identifier'].unique(),
    description='Select Tag:',
    style={'description_width': 'initial'}
)

# Display dropdown and link it to the function
# display(tag_selector)
widgets.interactive(plot_time_gap_analysis, tag_id=tag_selector)
[Interactive output: a tag dropdown that renders the three time-interval plots for the selected bird.]
import numpy as np
import pandas as pd
import folium
from folium.plugins import MarkerCluster
from sklearn.cluster import DBSCAN

# Assumes data has the columns: ['timestamp', 'location-lat', 'location-long', 'speed(km/hr)']
resting_threshold = 0.025  # km/hr
resting_points = data[data['speed(km/hr)'] <= resting_threshold].copy()

# Clustering with DBSCAN
epsilon = 0.1  # With metric='haversine', eps is in radians (0.1 rad ≈ 637 km); shrink to match the stopover-site scale of interest
min_samples = 10  # Minimum points to form a cluster
db = DBSCAN(eps=epsilon, min_samples=min_samples, metric='haversine').fit(np.radians(resting_points[['location-lat', 'location-long']]))

# Assign cluster labels
resting_points['cluster'] = db.labels_
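# DBSCAN labels: -1 marks noise (isolated resting fixes); 0, 1, 2, ... are clusters.
# Quick summary of what was found (a sketch; counts depend on eps/min_samples above):
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
n_noise = int((db.labels_ == -1).sum())
print(f"Resting-point clusters: {n_clusters}, noise points: {n_noise}")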

# Create a folium map centered at the first resting point
center_lat, center_long = resting_points.iloc[0][['location-lat', 'location-long']]
m = folium.Map(location=[center_lat, center_long], zoom_start=8)

# Color mapping for clusters
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'lightblue', 'pink', 'black', 'gray']
marker_cluster = MarkerCluster().add_to(m)

# Plot resting points with cluster colors
for _, row in resting_points.iterrows():
    cluster = row['cluster']
    color = colors[cluster % len(colors)] if cluster != -1 else "black"  # Noise in black
    folium.CircleMarker(
        location=[row['location-lat'], row['location-long']],
        radius=4,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=f"Cluster: {cluster}"
    ).add_to(marker_cluster)

# Show the map
m
[Map output: resting-point clusters rendered on an interactive folium map when the notebook is trusted and run.]
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Dropout, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping



X = data.drop(columns=['timestamp', 'location-long', 'location-lat']).to_numpy()
y = data[['location-long', 'location-lat']].to_numpy()
tags = data['tag-local-identifier'].to_numpy()


unique_tags = np.unique(tags)


def create_sequences(tag_data, tag_labels, seq_length):
    X_seq, y_seq = [], []
    for i in range(len(tag_data) - seq_length):  
        X_seq.append(tag_data[i:i + seq_length])
        y_seq.append(tag_labels[i + seq_length]) 
    return np.array(X_seq), np.array(y_seq)
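# Shape check (illustrative): with seq_length = 10, a bird with T fixes yields
# T - 10 samples; X_seq has shape (T - 10, 10, num_features) and y_seq has shape
# (T - 10, 2): each sample is 10 consecutive fixes and the target is the 11th position.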


X_sequences, y_sequences = [], []
sequence_length = 10  


for tag in unique_tags:
    tag_indices = np.where(tags == tag)[0]  # Get indices for the tag
    tag_data = X[tag_indices]
    tag_labels = y[tag_indices]

    if len(tag_data) > sequence_length:
        X_seq, y_seq = create_sequences(tag_data, tag_labels, sequence_length)
        X_sequences.append(X_seq)
        y_sequences.append(y_seq)


X_sequences = np.vstack(X_sequences)
y_sequences = np.vstack(y_sequences)

# Train / validation / test split (80% / 10% / 10%)
X_train, X_temp, y_train, y_temp = train_test_split(X_sequences, y_sequences, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)


scaler_X = StandardScaler()
X_train = scaler_X.fit_transform(X_train.reshape(-1, X_train.shape[2])).reshape(X_train.shape)
X_val = scaler_X.transform(X_val.reshape(-1, X_val.shape[2])).reshape(X_val.shape)
X_test = scaler_X.transform(X_test.reshape(-1, X_test.shape[2])).reshape(X_test.shape)


scaler_y = StandardScaler()
y_train = scaler_y.fit_transform(y_train)
y_val = scaler_y.transform(y_val)
y_test = scaler_y.transform(y_test)
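# Note: both scalers are fit on the training split only and then applied to the
# validation and test splits, so no statistics from held-out data leak into training.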


num_features = X_train.shape[2] 


model = Sequential([
    Input(shape=(sequence_length, num_features)),  
    GRU(64, return_sequences=True),
    GRU(32, return_sequences=False),
    Dense(16, activation='relu'),
    Dense(2)
])
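# Layer-by-layer output shapes (batch dimension omitted):
#   Input                          -> (sequence_length, num_features)
#   GRU(64, return_sequences=True) -> (sequence_length, 64)
#   GRU(32)                        -> (32,)
#   Dense(16, relu)                -> (16,)
#   Dense(2)                       -> (2,)  scaled (longitude, latitude) prediction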



model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])


history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)]
)


model.save("my_model.keras")



plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Training History')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
Epoch 1/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 8s 3ms/step - loss: 0.4611 - mae: 0.4933 - val_loss: 0.2269 - val_mae: 0.3314
Epoch 2/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.2014 - mae: 0.3132 - val_loss: 0.1678 - val_mae: 0.2746
Epoch 3/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1566 - mae: 0.2651 - val_loss: 0.1383 - val_mae: 0.2474
Epoch 4/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1356 - mae: 0.2414 - val_loss: 0.1142 - val_mae: 0.2190
Epoch 5/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1191 - mae: 0.2233 - val_loss: 0.1172 - val_mae: 0.2151
Epoch 6/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.1074 - mae: 0.2084 - val_loss: 0.0984 - val_mae: 0.1973
Epoch 7/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0972 - mae: 0.1954 - val_loss: 0.0990 - val_mae: 0.1947
Epoch 8/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0870 - mae: 0.1827 - val_loss: 0.0818 - val_mae: 0.1729
Epoch 9/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0750 - mae: 0.1696 - val_loss: 0.0730 - val_mae: 0.1649
Epoch 10/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0689 - mae: 0.1608 - val_loss: 0.0722 - val_mae: 0.1606
Epoch 11/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0674 - mae: 0.1577 - val_loss: 0.0734 - val_mae: 0.1597
Epoch 12/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0604 - mae: 0.1488 - val_loss: 0.0640 - val_mae: 0.1523
Epoch 13/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0553 - mae: 0.1423 - val_loss: 0.0605 - val_mae: 0.1480
Epoch 14/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0537 - mae: 0.1399 - val_loss: 0.0585 - val_mae: 0.1433
Epoch 15/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0516 - mae: 0.1356 - val_loss: 0.0587 - val_mae: 0.1433
Epoch 16/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0451 - mae: 0.1287 - val_loss: 0.0562 - val_mae: 0.1376
Epoch 17/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0451 - mae: 0.1276 - val_loss: 0.0504 - val_mae: 0.1285
Epoch 18/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0428 - mae: 0.1234 - val_loss: 0.0498 - val_mae: 0.1293
Epoch 19/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0398 - mae: 0.1205 - val_loss: 0.0509 - val_mae: 0.1287
Epoch 20/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0374 - mae: 0.1158 - val_loss: 0.0466 - val_mae: 0.1288
Epoch 21/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0364 - mae: 0.1146 - val_loss: 0.0437 - val_mae: 0.1234
Epoch 22/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0327 - mae: 0.1096 - val_loss: 0.0464 - val_mae: 0.1187
Epoch 23/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0345 - mae: 0.1106 - val_loss: 0.0412 - val_mae: 0.1188
Epoch 24/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0313 - mae: 0.1073 - val_loss: 0.0409 - val_mae: 0.1184
Epoch 25/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0305 - mae: 0.1055 - val_loss: 0.0452 - val_mae: 0.1219
Epoch 26/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0297 - mae: 0.1042 - val_loss: 0.0382 - val_mae: 0.1149
Epoch 27/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0283 - mae: 0.1019 - val_loss: 0.0404 - val_mae: 0.1162
Epoch 28/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0273 - mae: 0.0997 - val_loss: 0.0549 - val_mae: 0.1246
Epoch 29/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0298 - mae: 0.1024 - val_loss: 0.0391 - val_mae: 0.1137
Epoch 30/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0263 - mae: 0.0984 - val_loss: 0.0370 - val_mae: 0.1086
Epoch 31/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0255 - mae: 0.0962 - val_loss: 0.0355 - val_mae: 0.1066
Epoch 32/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0247 - mae: 0.0948 - val_loss: 0.0363 - val_mae: 0.1097
Epoch 33/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0242 - mae: 0.0937 - val_loss: 0.0332 - val_mae: 0.1051
Epoch 34/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0223 - mae: 0.0907 - val_loss: 0.0412 - val_mae: 0.1210
Epoch 35/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0232 - mae: 0.0923 - val_loss: 0.0334 - val_mae: 0.1038
Epoch 36/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0201 - mae: 0.0866 - val_loss: 0.0338 - val_mae: 0.1060
Epoch 37/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0213 - mae: 0.0884 - val_loss: 0.0338 - val_mae: 0.1040
Epoch 38/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0210 - mae: 0.0881 - val_loss: 0.0313 - val_mae: 0.0993
Epoch 39/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0200 - mae: 0.0862 - val_loss: 0.0423 - val_mae: 0.1148
Epoch 40/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0199 - mae: 0.0867 - val_loss: 0.0292 - val_mae: 0.0982
Epoch 41/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0195 - mae: 0.0854 - val_loss: 0.0328 - val_mae: 0.1010
Epoch 42/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0202 - mae: 0.0855 - val_loss: 0.0285 - val_mae: 0.0977
Epoch 43/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0203 - mae: 0.0858 - val_loss: 0.0300 - val_mae: 0.0985
Epoch 44/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0195 - mae: 0.0837 - val_loss: 0.0285 - val_mae: 0.0948
Epoch 45/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0189 - mae: 0.0826 - val_loss: 0.0299 - val_mae: 0.1025
Epoch 46/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0179 - mae: 0.0804 - val_loss: 0.0316 - val_mae: 0.0988
Epoch 47/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0177 - mae: 0.0811 - val_loss: 0.0289 - val_mae: 0.0955
Epoch 48/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0179 - mae: 0.0814 - val_loss: 0.0294 - val_mae: 0.0992
Epoch 49/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0162 - mae: 0.0779 - val_loss: 0.0271 - val_mae: 0.0935
Epoch 50/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0156 - mae: 0.0763 - val_loss: 0.0275 - val_mae: 0.0932
Epoch 51/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0163 - mae: 0.0778 - val_loss: 0.0312 - val_mae: 0.1004
Epoch 52/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0162 - mae: 0.0773 - val_loss: 0.0300 - val_mae: 0.1010
Epoch 53/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0171 - mae: 0.0792 - val_loss: 0.0290 - val_mae: 0.0932
Epoch 54/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0153 - mae: 0.0755 - val_loss: 0.0352 - val_mae: 0.1062
Epoch 55/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0167 - mae: 0.0786 - val_loss: 0.0253 - val_mae: 0.0881
Epoch 56/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0142 - mae: 0.0730 - val_loss: 0.0269 - val_mae: 0.0900
Epoch 57/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0145 - mae: 0.0735 - val_loss: 0.0283 - val_mae: 0.0918
Epoch 58/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0163 - mae: 0.0763 - val_loss: 0.0256 - val_mae: 0.0891
Epoch 59/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0137 - mae: 0.0719 - val_loss: 0.0250 - val_mae: 0.0884
Epoch 60/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0144 - mae: 0.0734 - val_loss: 0.0291 - val_mae: 0.0938
Epoch 61/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0157 - mae: 0.0742 - val_loss: 0.0259 - val_mae: 0.0880
Epoch 62/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0140 - mae: 0.0721 - val_loss: 0.0246 - val_mae: 0.0863
Epoch 63/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0139 - mae: 0.0721 - val_loss: 0.0259 - val_mae: 0.0872
Epoch 64/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0133 - mae: 0.0706 - val_loss: 0.0246 - val_mae: 0.0857
Epoch 65/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0154 - mae: 0.0746 - val_loss: 0.0277 - val_mae: 0.0900
Epoch 66/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0126 - mae: 0.0700 - val_loss: 0.0284 - val_mae: 0.0925
Epoch 67/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0134 - mae: 0.0712 - val_loss: 0.0253 - val_mae: 0.0857
Epoch 68/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0135 - mae: 0.0715 - val_loss: 0.0337 - val_mae: 0.1003
Epoch 69/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0168 - mae: 0.0760 - val_loss: 0.0244 - val_mae: 0.0845
Epoch 70/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0124 - mae: 0.0681 - val_loss: 0.0276 - val_mae: 0.0909
Epoch 71/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0138 - mae: 0.0708 - val_loss: 0.0236 - val_mae: 0.0844
Epoch 72/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0140 - mae: 0.0708 - val_loss: 0.0282 - val_mae: 0.0912
Epoch 73/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0149 - mae: 0.0727 - val_loss: 0.0238 - val_mae: 0.0842
Epoch 74/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0114 - mae: 0.0665 - val_loss: 0.0246 - val_mae: 0.0909
Epoch 75/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0143 - mae: 0.0717 - val_loss: 0.0238 - val_mae: 0.0840
Epoch 76/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0152 - mae: 0.0720 - val_loss: 0.0248 - val_mae: 0.0866
Epoch 77/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0118 - mae: 0.0671 - val_loss: 0.0235 - val_mae: 0.0835
Epoch 78/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0124 - mae: 0.0677 - val_loss: 0.0259 - val_mae: 0.0861
Epoch 79/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0125 - mae: 0.0684 - val_loss: 0.0246 - val_mae: 0.0887
Epoch 80/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0118 - mae: 0.0672 - val_loss: 0.0249 - val_mae: 0.0871
Epoch 81/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0128 - mae: 0.0686 - val_loss: 0.0254 - val_mae: 0.0860
Epoch 82/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0150 - mae: 0.0718 - val_loss: 0.0241 - val_mae: 0.0840
Epoch 83/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0112 - mae: 0.0659 - val_loss: 0.0261 - val_mae: 0.0886
Epoch 84/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0154 - mae: 0.0722 - val_loss: 0.0257 - val_mae: 0.0879
Epoch 85/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0124 - mae: 0.0677 - val_loss: 0.0289 - val_mae: 0.0904
Epoch 86/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0139 - mae: 0.0706 - val_loss: 0.0245 - val_mae: 0.0833
Epoch 87/100
2216/2216 ━━━━━━━━━━━━━━━━━━━━ 7s 3ms/step - loss: 0.0114 - mae: 0.0651 - val_loss: 0.0238 - val_mae: 0.0840
[Figure: training and validation loss curves across the training epochs.]
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import load_model
from tensorflow.keras.losses import MeanSquaredError

# Load the trained model saved earlier in this notebook
model_path = 'my_model.keras'
model = load_model(model_path, compile=False)

# Compile the model (useful for evaluation or further training)
model.compile(loss=MeanSquaredError(), optimizer='adam')

# Predict on the scaled test dataset
y_pred_scaled = model.predict(X_test)

# Inverse transform to get actual lat/long values
y_pred = scaler_y.inverse_transform(y_pred_scaled)
y_true = scaler_y.inverse_transform(y_test)

# Compute performance metrics
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"Mean Absolute Error (MAE): {mae}")
print(f"Root Mean Squared Error (RMSE): {rmse}")

# Create a DataFrame to compare true vs predicted values
results_df = pd.DataFrame({
    'True_Longitude': y_true[:, 0],
    'True_Latitude': y_true[:, 1],
    'Predicted_Longitude': y_pred[:, 0],
    'Predicted_Latitude': y_pred[:, 1]
})

# Calculate individual and combined absolute errors
results_df['Longitude_Error'] = results_df['True_Longitude'] - results_df['Predicted_Longitude']
results_df['Latitude_Error'] = results_df['True_Latitude'] - results_df['Predicted_Latitude']
results_df['Absolute_Error'] = np.sqrt(
    results_df['Longitude_Error']**2 + results_df['Latitude_Error']**2
)

# Export the results to a CSV file
results_df.to_csv('test_predictions_with_errors.csv', index=False)

# Preview the first few rows
print("\nTest Predictions with Errors:")
print(results_df.head())
277/277 ━━━━━━━━━━━━━━━━━━━━ 0s 825us/step
Mean Absolute Error (MAE): 1.332922318958555
Root Mean Squared Error (RMSE): 2.8953520972254196

Test Predictions with Errors:
   True_Longitude  True_Latitude  Predicted_Longitude  Predicted_Latitude  \
0        35.02833       36.69917            33.357147           34.084949   
1        18.72817       54.68117            17.790884           49.833965   
2        43.10750       11.54833            41.998451           18.181252   
3        36.68800       65.00750            35.898228           65.054451   
4        32.79250       -1.80633            32.822536           -2.764980   

   Longitude_Error  Latitude_Error  Absolute_Error  
0         1.671183        2.614221        3.102741  
1         0.937286        4.847205        4.936993  
2         1.109049       -6.632922        6.725001  
3         0.789772       -0.046951        0.791167  
4        -0.030036        0.958650        0.959121  
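The MAE and RMSE above are in raw degrees, which mix latitude and longitude scales. To express the error as a ground distance, one option (a sketch; it assumes y_true and y_pred from the previous cell are still in memory, with columns [longitude, latitude]) is to compute the great-circle error per test point:

# Great-circle (Haversine) error per test point, in km
lat_t, lat_p = np.radians(y_true[:, 1]), np.radians(y_pred[:, 1])
dlat = lat_p - lat_t
dlon = np.radians(y_pred[:, 0] - y_true[:, 0])
a = np.sin(dlat / 2) ** 2 + np.cos(lat_t) * np.cos(lat_p) * np.sin(dlon / 2) ** 2
km_errors = 2 * 6371 * np.arcsin(np.sqrt(a))
print(f"Mean great-circle error: {km_errors.mean():.1f} km, median: {np.median(km_errors):.1f} km")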
import folium
from folium import PolyLine, CircleMarker
import pandas as pd

# Load the predictions CSV written in the previous step
results_df = pd.read_csv('test_predictions_with_errors.csv')

# Sample the first N points for clarity in visualization
N = 20
sample_df = results_df.head(N)

# Initialize the map centered at the starting actual location
start_coords = [sample_df['True_Latitude'].iloc[0], sample_df['True_Longitude'].iloc[0]]
m = folium.Map(location=start_coords, zoom_start=5)

# Plot actual path (in blue)
actual_path = list(zip(sample_df['True_Latitude'], sample_df['True_Longitude']))
PolyLine(actual_path, color='blue', weight=4, opacity=0.8, tooltip="Actual Path").add_to(m)

# Plot predicted path (in red)
predicted_path = list(zip(sample_df['Predicted_Latitude'], sample_df['Predicted_Longitude']))
PolyLine(predicted_path, color='red', weight=4, opacity=0.8, tooltip="Predicted Path").add_to(m)

# Draw error lines and point markers
for _, row in sample_df.iterrows():
    actual_point = (row['True_Latitude'], row['True_Longitude'])
    predicted_point = (row['Predicted_Latitude'], row['Predicted_Longitude'])

    # Line connecting actual to predicted
    PolyLine([actual_point, predicted_point], color='gray', weight=1, opacity=0.5).add_to(m)

    # Markers
    CircleMarker(actual_point, radius=3, color='blue', fill=True, fill_color='blue').add_to(m)
    CircleMarker(predicted_point, radius=3, color='red', fill=True, fill_color='red').add_to(m)

# Show the map
m
[Map output: actual (blue) and predicted (red) paths with grey error lines, rendered on an interactive folium map when the notebook is trusted and run.]

✅ Conclusion¶

This project demonstrated a complete end-to-end pipeline for preprocessing, modeling, and predicting wildlife movement from GPS tracking data using deep learning. We used a real-world geospatial dataset of lesser black-backed gull tracks and trained a GRU-based recurrent neural network to model and forecast trajectories.

🚀 Future Scope¶

  • Integrate real-time animal movement data using APIs (e.g., Movebank)
  • Deploy the model as a Streamlit or Flask web app for conservationists
  • Expand to multi-species, multi-continent trajectory analysis
  • Add reinforcement learning to optimize migratory route prediction
  • Collaborate with wildlife reserves for real-world deployment

📌 Author: Aarjun Mahule
📅 Date: May 2025
📍 Nagpur, India