
August 2025

Fine-Tuning GPT model to Predict NIFTY50 Prices

In the previous post we prepared a dataset of minute-level Nifty50 index prices for the last 10 years, did some data cleansing, and split the data into training and validation sets. Please skim through that post if you haven't yet. In this blog we're going to use the training set to fine-tune a GPT model in Azure AI Foundry. Then we'll use the validation set to check whether the fine-tuned model can make a profit for us.

Prepare Training Conversations

The prepared dataset is stored under the dataset directory. We'll use train_price_movements.csv to fine-tune the model. Here is a glimpse of the file's contents.

import pandas as pd
training_set = pd.read_csv('dataset/train_price_movements.csv')
training_set.head()
Output

date 09:15 09:16 09:17 09:18 09:19 09:20 09:21 09:22 09:23 ... 15:20 15:21 15:22 15:23 15:24 15:25 15:26 15:27 15:28 15:29
0 2015-01-09 0.0 0.09 -0.06 0.08 0.08 -0.00 0.00 0.02 -0.09 ... -0.01 0.02 0.01 0.01 0.03 -0.00 -0.04 -0.01 0.01 -0.03
1 2015-01-12 0.0 -0.45 0.01 0.04 0.06 0.05 -0.02 0.03 0.06 ... 0.04 -0.04 0.02 0.00 -0.01 0.04 0.01 -0.01 0.00 -0.01
2 2015-01-13 0.0 0.11 -0.08 -0.05 -0.02 -0.03 -0.01 0.01 -0.09 ... 0.08 0.01 0.01 0.01 0.01 0.00 0.00 -0.01 -0.02 0.03
3 2015-01-14 0.0 -0.08 0.07 0.02 -0.04 -0.01 -0.03 -0.11 0.04 ... 0.08 -0.01 0.02 0.03 0.01 0.01 0.00 0.02 0.02 0.01
4 2015-01-15 0.0 0.18 -0.55 -0.10 0.18 0.32 -0.23 -0.12 0.22 ... -0.04 -0.07 -0.11 -0.06 -0.11 -0.05 0.06 -0.05 0.03 0.00


Our expectation is that, when fed the price movement up to 2:30 PM, the model should tell us the most probable price the index will reach before the market closes. Based on that price we'll either buy or sell (short) the index at 2:30.

OpenAI GPTs are conversational models, so the training dataset should be converted into a set of conversations. Each conversation should have a system message, a user message and an assistant message. In our case the system message will be a role assignment for the LLM, the user message will be the market movement, and the assistant message will be the target price for that day.
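For concreteness, here is a minimal sketch of one such training conversation in the JSONL chat format (pretty-printed for readability; in the actual file each conversation sits on a single line, and the values below are taken from the example generated later in this post, with the long movement list truncated):

{"messages": [
    {"role": "system", "content": "You are a stock market expert. You will predict the most profit making position and what is the expected percentage change in the next 1 hour given the percentage movement between 9.15 AM and 2.30 PM."},
    {"role": "user", "content": "The price movements in percentage from 9:15 AM to 2:30 PM on 2015-01-09 (Friday) are as follows: [0.0, 0.09, -0.06, ...]."},
    {"role": "assistant", "content": "1.0053"}
]}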

How to determine target price?

Before converting the current dataset into conversations, we need to establish a method for determining the target price for each day in the training dataset. This is essential because the training conversation requires the target_price to be included as the assistant's response. Therefore, we must devise a systematic approach to calculate the target price using the price movement data available after 2:30 PM.

We have the price movement in percentage from 2:31 PM to 3:29 PM. As we're going to square off our position by 3:25 PM, let's consider only the values up to 3:25 PM. Each number in the dataset is the percentage difference relative to the previous minute's price, so the price at any minute is the running product of (1 + change/100) factors applied to the 2:30 PM base price.

We want the price oscillation with respect to the 2:30 PM price at every minute until 3:25 PM. If we're buying at 2:30, we should sell when the price is near its peak; if we're selling (shorting), we should buy back when the price is near its bottom. Here we are going to identify the most probable price (or movement) close to the maximum or minimum and use that as the target price.

Please note that we're not taking the best price here, because the best price will be an outlier in the sequence and may not properly represent the trend in the price setting. Instead, we take the price oscillation range and use the first standard deviation. This gives us roughly a two-thirds chance of reaching the target price, and that is our expectation from the LLM as well. To make a profit in the long term, we trade away the highest possible profit in a day for more stable returns.

Let's create a sequence of price factors with respect to the base price at 2:30 and find its mean and std. If the mean is at or above 1 (an upward drift), we set the target as mean + std; otherwise the short target is mean - std. Let's do this for the first sequence in this dataset.

# Set the date column as index
training_set_indexed = training_set.set_index('date')

target_date = '2015-01-09'
target_time = '14:30'

# Get data for the target date
day_data = training_set_indexed.loc[target_date]

# Print the price movement at 2:30 PM
print(f"Price movement at {target_time} on {target_date}: {day_data.loc[target_time]}%")

# Calculate cumulative price changes from 2:30 PM onwards
# Starting with 1.0 at 2:30 PM (baseline)
price_diff_with_respect_to_230 = []
time_labels = []
cur = 1.0  # Starting value at 2:30 PM

# Generate times from 2:31 PM to 3:25 PM
for hour in [14, 15]:  # 2 PM and 3 PM in 24-hour format
    start_min = 31 if hour == 14 else 0  # Start from 31 min for 2 PM, 0 min for 3 PM
    end_min = 60 if hour == 14 else 26   # End at 60 min for 2 PM, 26 min for 3 PM

    for minute in range(start_min, end_min):
        time_str = f"{hour}:{minute:02d}"

        if time_str in day_data.index:
            # Apply the percentage change: new_value = current * (1 + percentage_change/100)
            percentage_change = day_data.loc[time_str]
            cur = cur * (1 + percentage_change / 100)
            price_diff_with_respect_to_230.append(cur)
            time_labels.append(time_str)

# Create DataFrame with results
result_df = pd.DataFrame({
    'time': time_labels,
    'cumulative_price_factor': price_diff_with_respect_to_230
})

print(f"\nCumulative price changes from 2:30 PM baseline:")
print(f"Mean: {result_df['cumulative_price_factor'].mean():.2f}")
print(f"Std: {result_df['cumulative_price_factor'].std():.2f}")
print(f"Min: {result_df['cumulative_price_factor'].min():.2f}")
print(f"Max: {result_df['cumulative_price_factor'].max():.2f}")

mean = result_df['cumulative_price_factor'].mean()
std = result_df['cumulative_price_factor'].std()

if mean >= 1:
    position = "Call"
    target_percentage = round((mean + std), 4)
else:
    position = "Put"
    target_percentage = round((mean - std), 4)

print(f"{position}. Target Percentage: {target_percentage}")
Output
Price movement at 14:30 on 2015-01-09: 0.01%

Cumulative price changes from 2:30 PM baseline:
Mean: 1.35
Std: 0.24
Min: 0.90
Max: 1.84
Call. Target Percentage: 1.0053

By multiplying the target_percentage with the base price, i.e. the price at 2:30 PM, we get the target price. We have the price sequence data in dataset/train_daily_opens.csv. Let's calculate the position and target price for 2015-01-09.

training_prices = pd.read_csv('dataset/train_daily_opens.csv')
# training_prices.head()
training_prices.set_index('date', inplace=True)
day_price = training_prices.loc[target_date]
base_price = day_price.loc[target_time]
print(f"Base Price at {target_time} on {target_date}: {base_price}")
target_price = base_price * target_percentage
print(f"{position}. Target Price: {target_price}")
Output
Base Price at 14:30 on 2015-01-09: 8240.0
Call. Target Price: 8283.672
This price calculation is only for illustration. We're not going to pass it to the LLM, because we're not going to give the model any price-related context. We'll pass only the price movement in percentage and expect it to return the multiplication factor. We'll then multiply the factor with the base price and determine the target price programmatically.

To prepare the training dataset for fine-tuning, we need to convert the entire dataset into a conversational format and save it as a JSONL file. For each day in the dataset, we will calculate the multiplication factor (target_percentage) following the above method. The model will be trained to predict only the target_percentage given the market movement sequence. The target_price will be derived later using the base_price at the time of evaluation.

training_conversations = []

# traverse every item in the training_set_indexed
for index, row in training_set_indexed.iterrows():
    date = index  # The date is the index, not row['date']

    # Convert date string to datetime to get day name
    date_obj = pd.to_datetime(date)
    day_of_the_date = date_obj.day_name()

    # Filter row to only include times from 9:15 AM to 2:30 PM
    # Create time range from 09:15 to 14:30
    times_until_230 = []
    for hour in range(9, 15):  # 9 AM to 2 PM
        start_min = 15 if hour == 9 else 0  # Start from 15 min for 9 AM, 0 min for others
        end_min = 30 if hour == 14 else 60  # Stop at 14:29 for 2 PM; 14:30 is added below
        for minute in range(start_min, end_min):
            times_until_230.append(f"{hour:02d}:{minute:02d}")

    # Add 14:30 (2:30 PM)
    times_until_230.append("14:30")

    # Filter row data to only include times until 2:30 PM
    row_until_230 = row[row.index.isin(times_until_230)]

    # Get the current day's data for post-2:30 PM calculations
    day_data = row  # This is the full day's data

    percent_diff_with_respect_to_230 = []
    time_labels = []
    cur = 1.0  # Take the base as 1

    for hour in [14, 15]:
        start_min = 31 if hour == 14 else 0
        end_min = 60 if hour == 14 else 26  # Consider movements only up to 15:25

        for minute in range(start_min, end_min):
            time_str = f"{hour}:{minute:02d}"

            if time_str in day_data.index:
                percentage_change = day_data.loc[time_str]
                cur = cur * (1 + percentage_change / 100)
                percent_diff_with_respect_to_230.append(cur)
                time_labels.append(time_str)

    result_df = pd.DataFrame({
        'time': time_labels,
        'cumulative_price_factor': percent_diff_with_respect_to_230
    })

    mean = result_df['cumulative_price_factor'].mean()
    std = result_df['cumulative_price_factor'].std()

    if mean >= 1:
        position = "Call"
        target_percentage = round((mean + std), 4)
    else:
        position = "Put"
        target_percentage = round((mean - std), 4)

    base_price = training_prices.loc[date].loc['14:30']
    target_price = base_price * target_percentage

    # Use row_until_230 instead of full row for the conversation
    conversation = {
        "messages": [
            {
                "role": "system",
                "content": "You are a stock market expert. You will predict the most profit making position and what is the expected percentage change in the next 1 hour given the percentage movement between 9.15 AM and 2.30 PM."
            },
            {
                "role": "user",
                "content": "The price movements in percentage from 9:15 AM to 2:30 PM on " + str(date) + " (" + day_of_the_date + ") are as follows: " + str(row_until_230.values.tolist()) + "."
            },
            {
                "role": "assistant",
                "content": str(target_percentage)
            }
        ]
    }

    training_conversations.append(conversation)

print(f"Generated {len(training_conversations)} training conversations")
print(training_conversations[0])

# Store `training_conversations` in a jsonl file.
import json

with open("dataset/training_conversations_gpt.jsonl", "w", encoding="utf-8") as f:
    for conversation in training_conversations:
        f.write(json.dumps(conversation) + "\n")
Output
Generated 1975 training conversations
{'messages': [{'role': 'system', 'content': 'You are a stock market expert. You will predict the most profit making position and what is the expected percentage change in the next 1 hour given the percentage movement between 9.15 AM and 2.30 PM.'}, {'role': 'user', 'content': 'The price movements in percentage from 9:15 AM to 2:30 PM on 2015-01-09 (Friday) are as follows: [0.0, 0.09, -0.06, 0.08, 0.08, -0.0, 0.0, 0.02, -0.09, 0.0, 0.08, -0.08, -0.07, 0.03, -0.02, 0.04, -0.06, 0.0, -0.08, 0.01, 0.07, -0.05, 0.0, -0.07, 0.04, 0.01, 0.02, 0.01, -0.07, -0.03, 0.09, -0.05, 0.02, -0.04, -0.01, 0.06, -0.1, 0.04, 0.0, -0.0, -0.03, -0.01, 0.05, 0.04, -0.04, 0.03, 0.02, 0.01, 0.02, -0.02, 0.01, -0.02, 0.02, -0.03, 0.0, -0.04, -0.01, -0.04, 0.02, 0.02, 0.03, 0.01, -0.02, -0.0, 0.03, 0.01, 0.0, 0.03, 0.0, -0.01, 0.02, 0.01, -0.02, 0.02, 0.03, -0.0, -0.01, -0.04, 0.02, -0.01, 0.01, -0.02, -0.03, 0.02, -0.01, 0.02, -0.02, -0.02, 0.0, -0.01, -0.01, -0.02, 0.03, -0.02, 0.01, -0.0, -0.01, 0.02, 0.02, -0.01, 0.02, -0.02, 0.03, 0.01, 0.0, -0.01, 0.02, -0.01, -0.05, 0.03, 0.03, -0.02, -0.02, -0.02, -0.02, 0.01, 0.02, -0.01, -0.0, 0.04, -0.04, -0.0, -0.03, -0.02, -0.01, 0.02, -0.01, 0.02, 0.02, -0.13, -0.03, -0.03, 0.02, -0.0, -0.0, -0.02, -0.13, 0.06, -0.07, 0.0, 0.04, 0.05, -0.02, -0.05, -0.01, -0.01, 0.02, 0.03, 0.04, -0.04, 0.0, 0.04, 0.0, -0.01, 0.01, -0.06, 0.02, -0.01, 0.01, -0.01, 0.02, 0.02, -0.01, 0.01, 0.0, 0.0, -0.04, 0.02, -0.05, -0.06, -0.45, 0.05, 0.02, 0.01, 0.04, -0.11, -0.09, -0.02, 0.12, 0.04, 0.08, 0.02, -0.05, 0.01, 0.01, -0.0, -0.05, 0.03, 0.0, 0.07, -0.06, 0.04, -0.09, -0.08, 0.05, 0.01, -0.0, 0.03, 0.0, -0.06, -0.04, 0.0, 0.01, -0.02, -0.08, 0.03, 0.06, -0.02, -0.01, 0.06, 0.03, 0.07, -0.03, -0.1, 0.1, -0.01, -0.02, 0.15, 0.21, -0.02, 0.03, -0.17, 0.15, 0.07, 0.12, 0.08, -0.06, 0.07, -0.06, -0.05, 0.02, 0.07, -0.03, 0.07, 0.01, 0.02, 0.01, -0.06, -0.01, -0.07, -0.15, -0.07, -0.04, 0.01, -0.07, 0.16, -0.06, -0.07, -0.04, 0.02, 0.01, 0.0, 0.04, 0.02, 0.04, -0.0, 0.04, -0.08, 0.02, 0.07, -0.0, -0.07, 0.03, 0.03, 0.04, 0.02, -0.07, 0.01, -0.04, 0.01, 0.03, -0.0, 0.01, -0.02, 0.0, -0.1, 0.04, 0.02, -0.03, -0.04, -0.05, -0.02, -0.01, -0.17, -0.12, 0.14, -0.09, 0.02, 0.0, 0.1, 0.01, 0.04, 0.03, 0.05, -0.08, 0.08, -0.0, 0.04, 0.07, -0.02, -0.16, 0.09, 0.06, 0.04, 0.01, 0.01, 0.02, 0.04, 0.01, -0.05, -0.08, 0.04, -0.03, 0.0, 0.02, 0.01, 0.06, -0.04, -0.12, 0.01, 0.06, 0.04, 0.05, 0.01, 0.07, 0.01, -0.05, -0.01, 0.06, 0.02, -0.03, -0.0, -0.03, 0.14, -0.01, -0.03, 0.07, 0.05, 0.09, 0.02, -0.02, -0.07, -0.0, -0.02, -0.04].'}, {'role': 'assistant', 'content': '1.0053'}]}

Training GPT LLM

I refer to the Azure Documentation to fine-tune an OpenAI GPT model. To do so we need to use the Serverless Training Infrastructure, which is cheaper and easier to handle than the alternative, Managed Compute Infrastructure. Azure offers three kinds of fine-tuning:

  1. SFT - Supervised Fine Tuning
  2. DPO - Direct Preference Optimization
  3. RFT - Reinforcement Fine-Tuning

Though RFT is recommended for tasks like stock market prediction, we'll go with SFT this time for its simplicity; our training file is prepared for SFT only. In future articles, we'll explore the other training methods and compare their performance.

To follow this article and fine-tune a model, you need to be an owner or contributor of a paid Azure account; free or trial Azure accounts may not work. We Microsoft employees get $150 of Azure credit per month, and I'm using that credit for this project.

Now we're going to follow the step-by-step instructions provided at Customize a model with fine-tuning. It is a well-written document with screenshots, so I'm not going to elaborate on each step here in this blog.
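If you prefer code over clicking through the portal, the same SFT job can, in principle, be submitted with the openai SDK against your Azure OpenAI resource. Below is only a sketch, assuming fine-tuning is enabled for your resource and base model in that region, and reusing the same environment variables we'll put in .env later for inference; the portal flow described in this post is what I actually used.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_API_KEY"],
    azure_endpoint=os.environ["AZURE_API_BASE"],
    api_version=os.environ["AZURE_API_VERSION"],
)

# Upload the training conversations prepared in the previous section
training_file = client.files.create(
    file=open("dataset/training_conversations_gpt.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the supervised fine-tuning job on the chosen base model
# (the exact base-model name available for fine-tuning may differ per region)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4.1-mini",
)

# Check the job status
print(client.fine_tuning.jobs.retrieve(job.id).status)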

Here is a short quote from the Azure documentation.

If you train the model on a large amount of internal data, without first pruning the dataset for only the highest quality examples you could end up with a model that performs much worse than expected.

This is the reason we have done all the pre-processing in the previous dataset prep blog.

Below are the steps to follow to fine-tune a GPT model. I've decided to use gpt-4.1-mini as the base model; it is faster and cheaper because of its size.

1. Create an Azure Hub project

Create a new Azure Hub resource in Azure AI Foundry, because the Azure AI Foundry resource doesn't have an option to fine-tune models.

2. Start Fine-Tuning

Once the project is opened in the Azure AI Foundry, select Fine-Tuning from ...More. Refer to the screenshot below.

Fine-Tuning Screenshot

In the newly opened dialogue, click the Fine Tune a Model button and select your base model. In my case, I selected gpt-4.1-mini. This step will deploy a resource with the pre-trained model in a specified location. Detailed step-by-step instructions with screenshots can be found in the documentation mentioned above.

Once the resource is created, you'll be asked to select the fine-tuning method and training data.

Select Training Data

Keep the method at its default, Supervised, and then select the JSONL file we created in the previous section using the "Upload Files" method. We don't provide a validation file here. We could have split the training set further into training and validation portions, but for simplicity we're not doing that now.

The validation set prepared in the previous blog is intentionally excluded from this step. This ensures that the validation dataset remains untouched by the LLM during training, preserving its integrity for independent backtesting and evaluation purposes.

Leave every other value at its default and click Submit. Fine-tuning will now start. It can take a few hours depending on how quickly the loss decreases. Here is the loss curve after 45 minutes of training.

Fine Tuning Metrics

The training completed after 2 hours. If you're using bigger models like gpt-4.1, you can expect it to take even longer.

Backtesting

With the training complete and the final loss stabilizing around 0.6, the model shows potential for making accurate predictions. The next step is to test its real-world performance by evaluating if GPT-4.1-mini can generate profitable trades in NIFTY50 index trading on our validation set.

Deploy the fine-tuned model and copy its API key and endpoint into a .env file as shown below.

(.venv) bala@bala-swe-test-ubuntu-2404:~/0xba1a.github.com$ cat .env
AZURE_API_KEY='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
AZURE_API_BASE='https://ai-bkannan-per-XXXXXXXXXXXXXXXXX.cognitiveservices.azure.com/'
AZURE_API_VERSION='2024-12-01-preview'
AZURE_DEPLOYMENT="gpt-4-1-mini-2025-04-14-ft-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
Replace XXX with appropriate values.

We use the AzureOpenAI client from the openai library to run inference against the newly deployed model. Similar to the training conversation prep, we need to prepare conversations from the validation dataset and send them to the LLM.

Let's start with an initial capital of 1 million rupees and see how much profit the model makes over the 247 days of the validation dataset.

Make sure that you've installed openai in your venv.

(.venv) $ pip install openai

This is a big chunk of code. In the notebook you will find it in a single cell; for easier understanding I'll present it here in multiple chunks. First, import all the necessary modules, set up the variables needed to interact with Azure AI Foundry, and create an AzureOpenAI client for inferencing.

import pandas as pd
from dotenv import load_dotenv
import os
from openai import AzureOpenAI
import time
import random

load_dotenv()

endpoint = os.environ.get('AZURE_API_BASE')
model_name = "gpt-4.1-mini"
deployment = os.environ.get('AZURE_DEPLOYMENT')

subscription_key = os.environ.get('AZURE_API_KEY')
api_version = os.environ.get('AZURE_API_VERSION')

client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key=subscription_key,
)

Next, set up some global variables to control the flow. offset and count let us test the validation set partially. cash_in_hand represents how much cash is available on any trading day; initially we set it to 1 million INR. Then we have a metrics object which counts the number of profits, losses, time-outs, etc. A time-out means 3:26 PM is reached before either the target or the stop-loss is hit for the day. The messages variable will hold the system and user prompts before they are sent to the LLM.

count = 1000
offset = 0
for_index = 0
cash_in_hand = 1000000 # 1 Million starting investment
metrics = {
    "total_trades": 0,
    "profitable_trades": 0,
    "unprofitable_trades": 0,
    "total_profit": 0,
    "total_loss": 0,
    "time_up": 0,
    "stop_loss": 0
}
messages = []  # will hold the system and user prompts for each trading day

Load the validation datasets: the price movements and the daily open prices.

validation_set = pd.read_csv('dataset/val_price_movements.csv')
validation_set.set_index('date', inplace=True)

validation_price = pd.read_csv('dataset/val_daily_opens.csv')
validation_price.set_index('date', inplace=True)

Create a list of time strings from 09:15 AM to 2:30 PM. This list will serve as an index filter for the validation dataset.

# Create time range from 09:15 to 14:30
times_until_230 = []
for hour in range(9, 15):  # 9 AM to 2 PM
    start_min = 15 if hour == 9 else 0  # Start from 15 min for 9 AM, 0 min for others
    end_min = 30 if hour == 14 else 60
    for minute in range(start_min, end_min):
        times_until_230.append(f"{hour:02d}:{minute:02d}")

# Add 14:30 (2:30 PM)
times_until_230.append("14:30")

Start iterating over every day in the validation set, skipping the first offset days without processing them.

for index, row in validation_set.iterrows():

    for_index += 1
    if for_index <= offset:
        continue

When starting to process a day, first print the date and the cash-in-hand. Then there is a random delay between 3 and 6 seconds to avoid hitting the LLM API rate limits. Then we get the date from the index and find out the day of the week for that trading day.

print(f"\n*** Processing date: {index}. Cash In Hand: {cash_in_hand} ***")
    # random delay between 3 to 6 seconds for avoiding rate limiting
    time.sleep(random.uniform(3, 6))
    date = index  # The date is the index, not row['date']

    # Convert date string to datetime to get day name
    date_obj = pd.to_datetime(date)
    day_of_the_date = date_obj.day_name()

Filter only the values up to 2:30 PM from the validation dataset and construct a chat message to be passed to the LLM.

    # Filter row data to only include times until 2:30 PM
    row_until_230 = row[row.index.isin(times_until_230)]

    # Get the current day's data for post-2:30 PM calculations
    day_data = row  # This is the full day's data

    messages = [
        {
            "role": "system",
            "content": "You are a stock market expert. You will predict the most profit making position and what is the expected percentage change in the next 1 hour given the percentage movement between 9.15 AM and 2.30 PM."
        },
        {
            "role": "user",
            "content": "The price movements in percentage from 9:15 AM to 2:30 PM on " + str(date) + " (" + day_of_the_date + ") are as follows: " + str(row_until_230.values.tolist()) + "."
        }
    ]

Call the LLM and get its response.

    response = client.chat.completions.create(
        messages=messages,
        temperature=1.0,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0,
        model=deployment
    )

    print(response.choices[0].message.content)

Calculate the position and target price based on the response from the LLM. If the LLM's output is 1 or more, it is a "Call"; otherwise it's a "Put". We don't have the corresponding options prices, so we trade against the index price itself.

    target_percentage = float(response.choices[0].message.content.strip())
    base_price = validation_price.loc[date].loc['14:30']
    metrics['total_trades'] += 1
    if target_percentage >= 1:
        position = "Call"
        target_price = base_price * target_percentage
        stop_loss = base_price * 0.99 # 1% fall is stop loss
        print(f"Position: {position}, Base Price: {base_price} Target Price: {target_price}, Stop Loss: {stop_loss}")
    else:
        position = "Put"
        target_price = base_price * target_percentage
        stop_loss = base_price * 1.01 # 1% rise is stop loss
        print(f"Position: {position}, Base Price: {base_price} Target Price: {target_price}, Stop Loss: {stop_loss}")

Now the actual validation happens. Iterate over the prices after 2:30 PM up to the 3:26 PM cutoff. Book profit or loss based on the rules below.

  • If the target price is reached, book profit
  • If the stop-loss is reached, book loss
  • If neither condition has occurred by 3:26 PM, close the position at whatever the current price is
    # Price sequence from 2:30 PM to the market close
    price_after_230 = validation_price.loc[date].loc['14:30':]
    for time_str, price in price_after_230.items():
        # Calculate the expected price movement
        if position == "Call":

            if price >= target_price:
                # Book profit
                total_stocks = cash_in_hand // base_price
                profit = (price - base_price) * total_stocks
                cash_in_hand += profit
                print(f"{time_str}: Booking Profit Rs.{profit} at price {price} for {total_stocks} stocks")
                metrics['profitable_trades'] += 1
                break

            elif price <= stop_loss:
                total_stocks = cash_in_hand // base_price
                loss = (base_price - price) * total_stocks
                cash_in_hand -= loss
                print(f"{time_str}: Booking Loss Rs.{loss} at price {price} for {total_stocks} stocks")
                metrics['unprofitable_trades'] += 1
                metrics['stop_loss'] += 1
                break

            if time_str == '15:26':
                metrics['time_up'] += 1
                total_stocks = cash_in_hand // base_price
                final_price = price

                if final_price > base_price:
                    profit = (final_price - base_price) * total_stocks
                    cash_in_hand += profit
                    print(f"{time_str}: Booking Profit Rs.{profit} at price {final_price} for {total_stocks} stocks")
                    metrics['profitable_trades'] += 1

                else:
                    loss = (base_price - final_price) * total_stocks
                    cash_in_hand -= loss
                    print(f"{time_str}: Booking Loss Rs.{loss} at price {final_price} for {total_stocks} stocks")
                    metrics['unprofitable_trades'] += 1

                break

        else:

            if price <= target_price:
                total_stocks = cash_in_hand // base_price
                profit = (base_price - price) * total_stocks
                cash_in_hand += profit
                print(f"{time_str}: Booking Profit Rs.{profit} at price {price} for {total_stocks} stocks")
                metrics['profitable_trades'] += 1
                break

            elif price >= stop_loss:
                total_stocks = cash_in_hand // base_price
                loss = (price - base_price) * total_stocks
                cash_in_hand -= loss
                print(f"{time_str}: Booking Loss Rs.{loss} at price {price} for {total_stocks} stocks")
                metrics['unprofitable_trades'] += 1
                break

            if time_str == '15:26':
                metrics['time_up'] += 1
                total_stocks = cash_in_hand // base_price
                final_price = price

                if final_price < base_price:
                    profit = (base_price - final_price) * total_stocks
                    cash_in_hand += profit
                    print(f"{time_str}: Booking Profit Rs.{profit} at price {final_price} for {total_stocks} stocks")
                    metrics['profitable_trades'] += 1

                else:
                    loss = (final_price - base_price) * total_stocks
                    cash_in_hand -= loss
                    print(f"{time_str}: Booking Loss Rs.{loss} at price {final_price} for {total_stocks} stocks")
                    metrics['unprofitable_trades'] += 1

                break

At the end of the loop body, break out if the expected number of trades has been reached. Finally, print the final cash-in-hand and the metrics.

    if metrics['total_trades'] >= count:
        break

print(f"\n*** Final Cash in Hand: Rs.{cash_in_hand} ***")
print(f"Metrics: {metrics}")

Verdict

Overall, the fine-tuned gpt-4.1-mini made a loss for me! At the end of 246 trades, the final cash remaining in hand was Rs.9,74,114, approximately a Rs.26,000 loss.

...
...
...
*** Processing date: 2015-03-18. Cash In Hand: 971755.4500000001 ***
0.9978
Position: Put, Base Price: 8693.8 Target Price: 8674.673639999999, Stop Loss: 8780.738
14:43: Booking Profit Rs.2358.75 at price 8672.55 for 111.0 stocks

*** Final Cash in Hand: Rs.974114.2000000001 ***
Metrics: {'total_trades': 246, 'profitable_trades': 145, 'unprofitable_trades': 101, 'total_profit': 0, 'total_loss': 0, 'time_up': 136, 'stop_loss': 3}
But the important statistic here is that the number of profitable trades is higher than the number of unprofitable trades: profitable trades account for roughly 59% of all trades (145 out of 246). We hit the stop-loss only 3 times.
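A quick back-of-the-envelope check (values copied from the run above) shows why a roughly 59% win rate can still lose money: the average losing trade must be larger than the average winning trade.

# Values copied from the backtest output above
total_trades = 246
profitable_trades = 145
final_cash = 974114.20
starting_cash = 1000000

win_rate = profitable_trades / total_trades
net_pnl = final_cash - starting_cash

print(f"Win rate: {win_rate:.1%}")                                # ~58.9%
print(f"Average P&L per trade: Rs.{net_pnl / total_trades:.2f}")  # roughly Rs.-105 per trade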

Conclusion

The concerning aspect is the high occurrence of time-outs, which happened 55% of the time (136 trades). This suggests that the 1-sigma target price might be overly ambitious: while in theory it should be reachable roughly two-thirds of the time, the results indicate otherwise. This approach is also quite rudimentary, relying on the LLM to deduce all patterns and features from the limited data provided. To improve, we should consider engineering additional features, such as the day of the week, and implementing a custom loss function that penalizes time-outs more effectively.

Despite the lack of profitability, this experiment was a fascinating learning experience. The absence of profit doesn't necessarily mean there's no underlying pattern—it simply means we haven't uncovered it yet. If you have suggestions or ideas for a better approach, feel free to connect with me on LinkedIn. Additionally, if you spot any issues in the code, please raise them on GitHub. Thank you for taking the time to read and engage with this content!

The complete code can be found in this Notebook.

Fine Tuning LLMs to Predict NIFTY50 Price

Language has developed naturally over time, without strict rules in its early stages. This makes it one of the first structured systems shaped by human behavior. By studying language, we can uncover patterns in human thinking and actions. Large Language Models (LLMs) have shown impressive abilities to understand and predict human language. In a similar way, business systems like pricing have also evolved organically. Pricing patterns existed long before formal accounting rules were created. If LLMs can predict the next word in a sentence by understanding language patterns, could they also predict the next price using historical data? This article dives into how LLMs can be trained and fine-tuned to analyze NIFTY50 price trends and predict future movements, exploring their potential in financial markets.

My focus here is only intraday trading, because we don't have any price values for NIFTY during the hours the market is closed, and there are too many factors affecting the next day's opening price. Trying to predict prices across days with the current data would be nothing but hallucination.

Our system will keep watching the market movement until 2:30 PM. Based on the movement so far, it will take a call or put position at 2:30 and square off before the market closes. We'll ask the model to predict a target price, and a stop-loss will be calculated programmatically. Additionally, the cutoff time to close the position will be 3:25 PM, so if neither the target price nor the stop-loss is reached, the position will be squared off at whatever price prevails at 3:25 PM.

Data Preparation

The performance of a model relies heavily on the data it is trained on, so collecting and preparing the historical Nifty50 data is crucial. Simply dumping the historical data into any LLM won't do any good other than making you pay for the GPUs.

Prepare the environment

 $ python3 -m venv .venv
 $ source .venv/bin/activate
 $ pip install pandas

I took 9 years of Nifty-50 candlestick data from this GitHub repo. It has minute-level Open, High, Low and Close information for every market day. We just need one number per minute, so let's remove everything other than Open.

import pandas as pd

candle_stick_data = pd.read_csv("dataset/nifty50_candlestick_data.csv")
candle_stick_data["datetime"] = pd.to_datetime(candle_stick_data["Date"] + " " + candle_stick_data["Time"], format="%d-%m-%Y %H:%M:%S")
candle_stick_data.set_index("datetime", inplace=True)
candle_stick_data.drop(columns=["Date", "Time", "High", "Low", "Close", "Instrument"], inplace=True, errors="ignore")

n50_minute_level_opens = candle_stick_data
n50_minute_level_opens.head()
Output

Open
datetime
2015-01-09 09:15:00 8285.45
2015-01-09 09:16:00 8292.60
2015-01-09 09:17:00 8287.40
2015-01-09 09:18:00 8294.25
2015-01-09 09:19:00 8300.60


We need to train the model on daily movements, so the data should be grouped date-wise. In the dataset the seconds values are not consistent; as we don't care about seconds, we'll normalize timestamps to the minute. Also, remove any data that falls outside typical Indian market hours.

market_hours_filter = (n50_minute_level_opens.index.time >= pd.Timestamp('09:15:00').time()) & \
                      (n50_minute_level_opens.index.time <= pd.Timestamp('15:30:00').time())

n50_min_opens = n50_minute_level_opens[market_hours_filter].copy()

n50_min_opens['date'] = n50_min_opens.index.date
n50_min_opens['time'] = n50_min_opens.index.strftime('%H:%M')

n50_daily_opens = n50_min_opens.pivot_table(
    index='date',
    columns='time',
    values='Open',
    aggfunc='first'  # In case there are duplicates, take the first value
)

n50_daily_opens.head()
Output

time 09:15 09:16 09:17 09:18 09:19 09:20 09:21 09:22 09:23 09:24 ... 15:20 15:21 15:22 15:23 15:24 15:25 15:26 15:27 15:28 15:29
date
2015-01-09 8285.45 8292.60 8287.40 8294.25 8300.6 8300.50 8300.65 8302.45 8294.85 8295.20 ... 8280.8 8282.35 8283.40 8284.35 8286.9 8286.65 8283.45 8282.35 8283.25 8280.50
2015-01-12 8291.35 8254.20 8255.25 8258.15 8263.2 8267.45 8266.05 8268.80 8273.85 8266.75 ... 8329.5 8326.55 8328.05 8328.05 8327.2 8330.20 8330.90 8329.95 8329.95 8328.85
2015-01-13 8346.15 8355.15 8348.70 8344.50 8342.5 8340.35 8339.75 8340.45 8333.30 8326.05 ... 8304.9 8305.75 8306.50 8307.15 8308.0 8308.20 8308.25 8307.25 8305.85 8308.20
2015-01-14 8307.25 8300.85 8307.00 8309.05 8305.4 8304.70 8302.20 8293.10 8296.70 8306.85 ... 8280.1 8278.90 8280.90 8283.60 8284.3 8285.35 8285.50 8286.95 8288.30 8288.90
2015-01-15 8425.20 8440.45 8394.35 8386.05 8401.1 8428.00 8408.25 8398.00 8416.70 8421.95 ... 8497.6 8491.80 8482.05 8477.25 8468.0 8463.80 8469.05 8464.80 8467.25 8467.45


This data is refreshing to look at. Nifty50 was in its eight thousands in 2015; if you had invested in the index, you would've tripled your money over the past 10 years. But this gives us a small problem. I want the LLM to understand the trend in human price-setting behaviour, and whether the price is 8000 or 24000, the trend should be the same. If I pass different absolute prices (tokens), the LLM may treat them as different behaviours, which may lead it to give less importance to the underlying feature that defines the trend.

So, I decided to pass the price difference in percentage instead of the price itself. The idea is to keep the open price at 9:15 as the reference and calculate the percentage difference for 9:16, then use the 9:16 price as the reference to calculate the difference for 9:17, and so on for the whole day, always relative to the previous minute's price. My assumption is that whatever the price level is, we humans tend to set the new price relative to it.

# Calculate percentage price movements within each day
# For each day, calculate percentage change from previous minute
n50_daily_price_movements = n50_daily_opens.pct_change(axis=1, fill_method=None) * 100

# Set the first column (first minute of each day) to 0 as there's no reference price
n50_daily_price_movements.iloc[:, 0] = 0

n50_daily_price_movements.head()
Output

time 09:15 09:16 09:17 09:18 09:19 09:20 09:21 09:22 09:23 09:24 ... 15:20 15:21 15:22 15:23 15:24 15:25 15:26 15:27 15:28 15:29
date
2015-01-09 0.0 0.086296 -0.062707 0.082656 0.076559 -0.001205 0.001807 0.021685 -0.091539 0.004219 ... -0.008453 0.018718 0.012678 0.011469 0.030781 -0.003017 -0.038616 -0.013279 0.010866 -0.033200
2015-01-12 0.0 -0.448057 0.012721 0.035129 0.061152 0.051433 -0.016934 0.033269 0.061073 -0.085813 ... 0.042037 -0.035416 0.018015 0.000000 -0.010206 0.036027 0.008403 -0.011403 0.000000 -0.013205
2015-01-13 0.0 0.107834 -0.077198 -0.050307 -0.023968 -0.025772 -0.007194 0.008394 -0.085727 -0.087000 ... 0.080137 0.010235 0.009030 0.007825 0.010232 0.002407 0.000602 -0.012036 -0.016853 0.028293
2015-01-14 0.0 -0.077041 0.074089 0.024678 -0.043928 -0.008428 -0.030103 -0.109610 0.043410 0.122338 ... 0.078563 -0.014493 0.024158 0.032605 0.008450 0.012675 0.001810 0.017500 0.016291 0.007239
2015-01-15 0.0 0.181005 -0.546179 -0.098876 0.179465 0.320196 -0.234338 -0.121904 0.222672 0.062376 ... -0.042347 -0.068255 -0.114817 -0.056590 -0.109116 -0.049598 0.062029 -0.050183 0.028943 0.002362

Let's analyze the price movement for insights. Since the true value of an asset (like Nifty50) is often unclear, we use the current and previous prices as proxies—a concept tied to Daniel Kahneman's Anchoring Bias. This bias suggests that sudden price increases are likely to be corrected downward, while sharp decreases are adjusted upward. As a result, the average price movement should ideally converge to zero.

# Calculate statistics excluding NaN values and the first column (which is all zeros)
movements_data = n50_daily_price_movements.iloc[:, 1:].values.flatten()  # Exclude first column
movements_data_clean = movements_data[~pd.isna(movements_data)]  # Remove NaN values

print(f"Total data points: {len(movements_data_clean):,}")
print(f"Mean movement: {movements_data_clean.mean():.4f}%")
print(f"Std deviation: {movements_data_clean.std():.4f}%")
print(f"Min movement: {movements_data_clean.min():.4f}%")
print(f"Max movement: {movements_data_clean.max():.4f}%")
Output
Total data points: 849,150
Mean movement: -0.0002%
Std deviation: 0.0409%
Min movement: -2.4480%
Max movement: 6.2991%

Our assumption didn't go wrong. The mean value is very close to zero, -0.0002%. The standard deviation is around 0.04%, which means that over the last 10 years, more than 68% of the time the new price was quoted within ±0.04% of the current price.

Note: In our case the ±1σ coverage is 82.69%, meaning about 82% of the time the new price was quoted within ±0.04% of the current price. Similarly, the Nifty50 movement distribution's ±2σ coverage is 96.01%, whereas a normal distribution gives 95.45%; only at 2σ does the price movement converge to normal statistics. The ±3σ coverage is 98.64% versus 99.73% for a normal distribution, so we have more outliers here.

This insight reveals that we tend to adopt a more cautious approach under normal circumstances, but when pushed to the edge we take significantly higher risks. The difference in the 2σ-to-3σ band is notable: while 4.28% of data points are expected there for a normal distribution, only 2.63% are present; nearly half are missing. This suggests that when we decide to take risks, we often overextend, taking outsized risks about 50% of the time.

I'm leaving it as an exercise for you to calculate these ranges yourself; a starting sketch follows.
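If you want a starting point, here is a minimal sketch that computes the empirical coverage of the ±1σ, ±2σ and ±3σ bands, assuming movements_data_clean from the cell above is still in scope:

import numpy as np

mu = movements_data_clean.mean()
sigma = movements_data_clean.std()

# Share of minute-level movements that fall within k standard deviations of the mean
for k in (1, 2, 3):
    within = np.abs(movements_data_clean - mu) <= k * sigma
    print(f"±{k}σ: {within.mean() * 100:.2f}% of data points")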

Let's clean the data.

  1. Fill any NaN with previous value
  2. Fix the precision to two decimal digits. This is safe, since our approximate std is about 0.04%.
    # Fill NaN values with previous value (forward fill along rows)
    n50_daily_price_movements = n50_daily_price_movements.ffill(axis=1)
    
    # Round to 2 decimal places
    n50_daily_price_movements = n50_daily_price_movements.round(2)
    
    print(f"Price movements DataFrame shape: {n50_daily_price_movements.shape}")
    print(f"NaN values remaining: {n50_daily_price_movements.isna().sum().sum()}")
    
    n50_daily_price_movements.head()
    
    Output
    Price movements DataFrame shape: (2273, 375)
    NaN values remaining: 0
    
time 09:15 09:16 09:17 09:18 09:19 09:20 09:21 09:22 09:23 09:24 ... 15:20 15:21 15:22 15:23 15:24 15:25 15:26 15:27 15:28 15:29
date
2015-01-09 0.0 0.09 -0.06 0.08 0.08 -0.00 0.00 0.02 -0.09 0.00 ... -0.01 0.02 0.01 0.01 0.03 -0.00 -0.04 -0.01 0.01 -0.03
2015-01-12 0.0 -0.45 0.01 0.04 0.06 0.05 -0.02 0.03 0.06 -0.09 ... 0.04 -0.04 0.02 0.00 -0.01 0.04 0.01 -0.01 0.00 -0.01
2015-01-13 0.0 0.11 -0.08 -0.05 -0.02 -0.03 -0.01 0.01 -0.09 -0.09 ... 0.08 0.01 0.01 0.01 0.01 0.00 0.00 -0.01 -0.02 0.03
2015-01-14 0.0 -0.08 0.07 0.02 -0.04 -0.01 -0.03 -0.11 0.04 0.12 ... 0.08 -0.01 0.02 0.03 0.01 0.01 0.00 0.02 0.02 0.01
2015-01-15 0.0 0.18 -0.55 -0.10 0.18 0.32 -0.23 -0.12 0.22 0.06 ... -0.04 -0.07 -0.11 -0.06 -0.11 -0.05 0.06 -0.05 0.03 0.00


To further refine the dataset, we will remove high-stress days. These are days where extreme emotions drive price movements, leading to significant variations. Including such days in the dataset might confuse the model, as it could attempt to scale these outliers alongside normal trading days. By excluding these high-stress days, we aim to create a more consistent dataset that better represents typical market behavior.

By removing these high-stress days, we ensure that the dataset focuses on regular market conditions, which are more representative of typical trading patterns. This adjustment will help the model generalize better and avoid overfitting to rare, extreme scenarios.

To identify and filter out high-stress days, we will use the daily standard deviation as a measure of volatility. Days with a standard deviation outside the range of ±2σ (calculated from the mean daily standard deviation) will be considered high-stress days and excluded from the dataset. Here we calculate the std of every day, put those values in a series, and then calculate the mean and std of that series, so don't be confused by the "std of std".

# Calculate daily standard deviation for each trading day
daily_std = n50_daily_price_movements.std(axis=1)  # std across columns (time) for each day

print(f"Daily std statistics:")
print(f"Mean daily std: {daily_std.mean():.4f}%")
print(f"Std of daily std: {daily_std.std():.4f}%")
print(f"Min daily std: {daily_std.min():.4f}%")
print(f"Max daily std: {daily_std.max():.4f}%")

# Calculate the mean and std of daily standard deviations
mean_daily_std = daily_std.mean()
std_daily_std = daily_std.std()

# Define the acceptable range (±2σ)
lower_bound = mean_daily_std - 2 * std_daily_std
upper_bound = mean_daily_std + 2 * std_daily_std

print(f"\nAcceptable daily std range: {lower_bound:.4f}% to {upper_bound:.4f}%")

# Filter days that fall within ±2σ of mean daily std
days_within_2sigma = (daily_std >= lower_bound) & (daily_std <= upper_bound)

print(f"\nDays analysis:")
print(f"Total days before filtering: {len(n50_daily_price_movements)}")
print(f"Days within ±2σ: {days_within_2sigma.sum()}")
print(f"Days to remove: {len(n50_daily_price_movements) - days_within_2sigma.sum()}")
print(f"Percentage kept: {days_within_2sigma.sum() / len(n50_daily_price_movements) * 100:.2f}%")

# Apply the filter
n50_daily_price_movements_filtered = n50_daily_price_movements[days_within_2sigma]
n50_daily_opens_filtered = n50_daily_opens[days_within_2sigma]

print(f"\nFiltered dataset shape:")
print(f"Price movements: {n50_daily_price_movements_filtered.shape}")
print(f"Daily opens: {n50_daily_opens_filtered.shape}")

# Show some examples of removed days (outliers)
outlier_days = n50_daily_price_movements[~days_within_2sigma]
if len(outlier_days) > 0:
    print(f"\nExamples of removed days (high/low volatility):")
    print(f"Highest volatility day: {daily_std.idxmax()} (std: {daily_std.max():.4f}%)")
    print(f"Lowest volatility day: {daily_std.idxmin()} (std: {daily_std.min():.4f}%)")

n50_daily_price_movements_filtered.head()
Output
Daily std statistics:
Mean daily std: 0.0351%
Std of daily std: 0.0211%
Min daily std: 0.0104%
Max daily std: 0.4735%

Acceptable daily std range: -0.0070% to 0.0772%

Days analysis:
Total days before filtering: 2273
Days within ±2σ: 2221
Days to remove: 52
Percentage kept: 97.71%

Filtered dataset shape:
Price movements: (2221, 375)
Daily opens: (2221, 375)

Examples of removed days (high/low volatility):
Highest volatility day: 2020-03-13 (std: 0.4735%)
Lowest volatility day: 2024-03-02 (std: 0.0104%)

time 09:15 09:16 09:17 09:18 09:19 09:20 09:21 09:22 09:23 09:24 ... 15:20 15:21 15:22 15:23 15:24 15:25 15:26 15:27 15:28 15:29
date
2015-01-09 0.0 0.09 -0.06 0.08 0.08 -0.00 0.00 0.02 -0.09 0.00 ... -0.01 0.02 0.01 0.01 0.03 -0.00 -0.04 -0.01 0.01 -0.03
2015-01-12 0.0 -0.45 0.01 0.04 0.06 0.05 -0.02 0.03 0.06 -0.09 ... 0.04 -0.04 0.02 0.00 -0.01 0.04 0.01 -0.01 0.00 -0.01
2015-01-13 0.0 0.11 -0.08 -0.05 -0.02 -0.03 -0.01 0.01 -0.09 -0.09 ... 0.08 0.01 0.01 0.01 0.01 0.00 0.00 -0.01 -0.02 0.03
2015-01-14 0.0 -0.08 0.07 0.02 -0.04 -0.01 -0.03 -0.11 0.04 0.12 ... 0.08 -0.01 0.02 0.03 0.01 0.01 0.00 0.02 0.02 0.01
2015-01-15 0.0 0.18 -0.55 -0.10 0.18 0.32 -0.23 -0.12 0.22 0.06 ... -0.04 -0.07 -0.11 -0.06 -0.11 -0.05 0.06 -0.05 0.03 0.00

This cleansing removed 52 days from the dataset.

Most blogs and influencers focus only on these 52 days. If we took all the blog posts and YouTube transcripts and arranged the dates they mention in a series, the 2σ dates would probably fall within these 52 days. You can decide whether to make a profit on the 2,221 normal days or wait for one of those 52.

Finally, let's split the dataset into training and validation sets and store them as separate files. The model will be trained only on the training dataset. The validation set will not be exposed to the model during training or fine-tuning. We'll use the validation dataset to backtest whether the model is making any profit for us.

Let's move every ninth day into the validation set. Since nine is not a multiple of the five-day trading week, consecutive validation days will fall on different days of the week (see the quick check below).
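Here is a small sketch (assuming the filtered DataFrame from the previous cell) that prints the weekdays of the first few validation days to confirm the rotation:

# Every 9th trading day (0-indexed positions 8, 17, 26, ...) goes to validation
val_dates = pd.to_datetime(n50_daily_price_movements_filtered.index[8::9])
print(val_dates.day_name()[:10].tolist())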

# Split into training and validation sets
# Every 9th day goes to validation, rest goes to training
total_days = len(n50_daily_price_movements_filtered)

# Create boolean masks for train/validation split
validation_mask = [(i % 9 == 8) for i in range(total_days)]  # Every 9th day (0-indexed, so 8th position)
training_mask = [not val for val in validation_mask]

# Split the datasets
train_price_movements = n50_daily_price_movements_filtered[training_mask]
val_price_movements = n50_daily_price_movements_filtered[validation_mask]

train_daily_opens = n50_daily_opens_filtered[training_mask]
val_daily_opens = n50_daily_opens_filtered[validation_mask]

# Save the datasets to CSV files in the dataset directory
import os

# Create dataset directory if it doesn't exist
os.makedirs('dataset', exist_ok=True)

# Save training datasets
train_price_movements.to_csv('dataset/train_price_movements.csv')
train_daily_opens.to_csv('dataset/train_daily_opens.csv')

# Save validation datasets
val_price_movements.to_csv('dataset/val_price_movements.csv')
val_daily_opens.to_csv('dataset/val_daily_opens.csv')

print("Datasets saved successfully!")

Conclusion

Let's wrap up this blog here. In the upcoming posts, we will explore various models and training methodologies to determine which approach yields the best results. The dataset prepared in this blog will serve as the foundation for all those experiments. You can access the code discussed in this blog in the n50_dataset_prep notebook.