OpenAI o1 Vs. GPT 4o – Can Thinking LLMs Actually Make A Difference?

Disclaimer: these are test scenarios, and we don’t recommend relying on ChatGPT’s answers in serious situations.

AI is evolving, and with OpenAI’s latest release of the o1-preview model, there’s a noticeable shift. This isn’t just about speed or spitting out information anymore. It’s about something more interesting: what feels like the AI actually thinking through problems.

So, here’s the question: is this shift real, or is it just marketing hype? To find out, we ran both models—GPT-4o and o1-preview—through a series of tests designed to see how they handled complex problems. The results? Well, it was fascinating to watch.

Starting Off With Math

a. We kicked things off with a simple cubic equation. The question: find all solutions to the equation x³ − 3x + 2 = 0.
Now, at first glance, this seems straightforward. You might assume both models would handle it with ease—and you’d be right. They both solved the equation accurately. But what was really interesting was how differently they got there.
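Both models’ answer is easy to sanity-check. The Rational Root Theorem that GPT-4o leaned on can be sketched in a few lines of Python (the candidates are simply the divisors of the constant term 2):

```python
def f(x):
    return x**3 - 3*x + 2

# Rational Root Theorem: any rational root of x^3 - 3x + 2 must divide
# the constant term 2, so the only candidates are ±1 and ±2.
roots = [x for x in (1, -1, 2, -2) if f(x) == 0]
print(roots)  # [1, -2]
```

Dividing out (x − 1) leaves x² + x − 2 = (x − 1)(x + 2), so x = 1 is in fact a double root, which matches the factoring both models arrived at.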

GPT-4o’s approach

  • GPT-4o went straight for the Rational Root Theorem. It quickly identified the roots x=1 and x=−2, factored the equation, and called it a day.
  • Quick and efficient, no unnecessary fluff.

O1-Preview’s Approach

This is where things got interesting. o1-preview didn’t just rush to the solution. It paused for a noticeable amount of time, 35 seconds, before even starting. It was almost as if it were processing the problem the way a human would: trying different strategies, analyzing, and rechecking.
When it did arrive at the answer, it explained every single step in extreme detail. It wasn’t just factoring; it was showing how it factored, why it chose those specific steps, and what alternatives it considered along the way.

b. An Optimization Challenge

Next, we gave them a real-world optimization problem: you are building a rectangular fence that needs to enclose 100 square meters of land. If one side of the fence costs twice as much as the other, what are the dimensions that minimize the cost?
Here, the challenge wasn’t just about getting the answer but about how each model approached the problem. It required some calculus and an understanding of how costs can be minimized under constraints.

GPT-4o’s approach

  • GPT-4o immediately wrote down the cost function, took the derivative, and solved it in about 20 seconds. It found the correct answer: 14.14 meters by 7.07 meters.
  • Quick and correct, but nothing to write home about in terms of explanation. It was like it assumed you already knew the steps and didn’t need to be walked through them.
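GPT-4o’s answer checks out with a little calculus. A minimal sketch, assuming the pricier pair of sides lies along the dimension x and costs twice as much per meter (unit costs c and 2c):

```python
import math

# Constraint: x * y = 100. Total cost: C = 2*(2c)*x + 2*c*y = c*(4x + 200/x).
# Setting dC/dx = 4 - 200/x**2 = 0 gives x**2 = 50.
x = math.sqrt(50)
y = 100 / x
print(round(x, 2), "x", round(y, 2))  # 7.07 x 14.14
```

Which dimension comes out shorter depends only on which pair of sides you assume carries the double cost; the 14.14 × 7.07 answer is the same either way.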

O1-Preview’s Approach

This model took its time again—20 seconds of thinking—but gave a far more thorough explanation. It detailed how the area constraint fit into the cost equation, how it derived the formula for cost, and even double-checked its answer by discussing why those specific dimensions minimized the cost.
There was a lot more thought and depth to the response.

Takeaway

GPT-4o was faster, but o1-preview took a more methodical approach. If you’re looking for quick answers, GPT-4o will get the job done. But if you want to actually understand the reasoning behind the answer, o1-preview is the better choice.

Putting Complex Reasoning To The Test

a. The Space Shuttle Problem

We wanted to see how both models handled more scientific reasoning. So, we asked:
Explain how a space shuttle re-enters Earth’s atmosphere without burning up. Include concepts of heat transfer, friction, and energy dissipation.
This is where the models really started to differentiate themselves.

GPT-4o’s approach

It did a solid job, honestly. It explained the basic idea of heat generated from friction and the use of heat shields. It touched on how air compression heats the shuttle, leading to high temperatures that are managed by thermal protection systems.
However, the explanation felt a bit… basic. It was like it was telling you what you already know but without really diving into the science behind it.

O1-Preview’s Approach

Again, it took longer—20 seconds of “thinking”—but the response was worth the wait. It didn’t just talk about friction; it went into air compression, explaining that the real source of heat isn’t friction but rather the compression of air in front of the shuttle. It even touched on convection, conduction, and radiation as part of the heat dissipation process.
What really stood out was the depth of the explanation. It felt like a physicist was walking you through the problem, not just summarizing a textbook.
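o1-preview’s point about compression can be put into numbers with the textbook stagnation-temperature relation T₀ = T · (1 + (γ − 1)/2 · M²). A minimal sketch with assumed illustrative values (γ = 1.4, an upper-atmosphere temperature of roughly 220 K, and re-entry at roughly Mach 25):

```python
gamma = 1.4     # ratio of specific heats for air (perfect-gas assumption)
T_inf = 220.0   # K, free-stream temperature in the upper atmosphere (assumption)
mach = 25.0     # re-entry Mach number, roughly orbital speed (assumption)

# Stagnation temperature of the air compressed ahead of the vehicle
T_stag = T_inf * (1 + (gamma - 1) / 2 * mach**2)
print(f"{T_stag:.0f} K")  # 27720 K
```

This idealized perfect-gas figure wildly overshoots reality, because at such temperatures air dissociates and radiates energy away, which is precisely why thermal protection systems and energy dissipation matter so much.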

b. Logically Solving A Riddle

We threw a logic riddle at both models:
A man is looking at a portrait. Someone asks him, ‘Whose picture are you looking at?’ The man replies, ‘Brothers and sisters, I have none. But this man’s father is my father’s son.’ Who is in the picture?
Now, this is where we expected both models to perform well—it’s a classic riddle, and both should be able to reason through it.

GPT-4o’s approach

The model was quick and accurate. It immediately identified that the man in the portrait is his son, and it explained the riddle clearly.

O1-Preview’s Approach

The model was also accurate, but this time, the explanation was deeper. It didn’t just solve the riddle; it took the time to break down each part of the sentence, almost like a teacher explaining it to a student who might be unfamiliar with this kind of logical puzzle.

Takeaway

The extra explanation wasn’t necessary, but it made a difference. If you’ve heard this riddle before, GPT-4o’s answer is fine. But if it’s your first time encountering this type of problem, o1-preview holds your hand through the reasoning process.

Real-World Decision Making

a. Thinking Like A CEO

We threw a more strategic, real-world challenge at both models:
You’re the CEO of a tech startup, and you’re offered two investment options. One promises high short-term returns but comes with considerable risk. The other provides stable but slower growth. How do you evaluate these options, and which would you choose?
This isn’t just a math problem—it requires an understanding of business strategy, risk management, and long-term thinking.

GPT-4o’s Approach

GPT-4o evaluated the two options logically and quickly laid out the pros and cons. It recommended the high-risk, high-reward option only if the company had a solid financial cushion and strong growth ambitions. For startups in early stages, it leaned toward the stable growth path.
The response was practical, focusing on key metrics like cash flow, risk tolerance, and market volatility.
However, it felt like a quick pros-and-cons list—it gave the right information but didn’t dig into the strategy behind the decision. It was more transactional than strategic.

o1-Preview’s Approach

o1-preview, on the other hand, took its time, thinking before diving into the answer. It didn’t just list the pros and cons but dug into the nuances of each option. It emphasized diversifying investments, suggesting you don’t have to pick one or the other.
It recommended testing the high-risk option with a small part of the budget, while keeping the bulk of investments in stable growth.
The model considered more strategic long-term thinking, including market conditions, investor expectations, and even company culture.
This felt less like checking boxes and more like you were speaking to a seasoned business advisor, taking into account every angle before making a recommendation.
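o1-preview’s “test with a small slice” idea is easy to sketch as an expected-value calculation. The numbers below are assumptions for illustration only, not from either model’s answer: the risky option returns +40% or −30% with equal odds, and the stable option returns 8%:

```python
risky_share = 0.20  # put 20% of the budget into the risky bet (assumption)

risky_ev = 0.5 * 0.40 + 0.5 * (-0.30)  # expected return of the risky option
blended = risky_share * risky_ev + (1 - risky_share) * 0.08
print(f"Blended expected return: {blended:.1%}")  # 7.4%
```

Under these assumed numbers the blend actually expects slightly less than going all-stable (8.0%), which shows why the decision hinges on risk appetite and upside exposure rather than expected value alone.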

Takeaway

If you’re looking for speedy advice, GPT-4o gets you there fast. It’s concise and efficient but lacks depth. If, however, you need to explore strategic options and consider a variety of factors, o1-preview dives deeper into the decision-making process, making it better for complex business scenarios where long-term thinking matters. And if you’re in search of an AI solution that truly stands out, take a look at our AI development services.

Attempting a Medical Diagnosis

b. Diagnosing a Complex Case

Finally, we wanted to see how the models handled a medical diagnosis, an area where accuracy is paramount. We gave them this scenario:
A 45-year-old male presents with fatigue, shortness of breath, weight loss, and enlarged lymph nodes. His blood tests reveal anemia, and a chest X-ray confirms the enlarged lymph nodes. What are the possible diagnoses, and what further tests would you recommend?
This isn’t just about listing diseases—it’s about reasoning through the case, considering possibilities, and recommending a sensible diagnostic approach.

GPT-4o’s Approach

GPT-4o quickly rattled off a list of potential diagnoses: lymphoma, leukemia, tuberculosis, and sarcoidosis. It also suggested the right follow-up tests—lymph node biopsy, CBC, and CT scan.
While the response was clinically sound, it felt surface-level. It gave the right answers but lacked depth in explaining why certain conditions were prioritized or what made each test essential in this specific context.

o1-Preview’s Approach

o1-preview took 14 seconds to think through the case, and the results showed. It didn’t just provide a list of diseases but discussed why lymphoma and leukemia were the most likely based on the patient’s symptom progression.
It also gave a detailed explanation of how specific tests, like flow cytometry or bone marrow biopsy, would help confirm or rule out these conditions.
The model even explored more secondary possibilities, such as chronic infections and autoimmune diseases, while explaining the rationale behind each follow-up test.

Takeaway

GPT-4o delivered a competent answer but lacked the understanding that o1-preview brought to the table. If you’re looking for a quick diagnostic checklist, GPT-4o is efficient. But for a more thoughtful, in-depth analysis of complex medical cases, o1-preview provides the level of care and consideration you’d expect from a seasoned healthcare professional.

Which Model Is Better?

So, GPT-4o vs. o1-preview: which model comes out ahead? In our experience, it’s not about which model is “better” overall; it’s about what you need.

Speed

If you need fast, correct answers without too much fluff, GPT-4o is your model. It gets to the point quickly, solves problems effectively, and doesn’t linger on unnecessary details.

Depth and Thinking

On the other hand, o1-preview is for those who want more than just an answer. It takes its time (sometimes noticeably so) but offers deeper, more thorough explanations. In some cases, it felt like the model was actually thinking through the problem, testing strategies, and rechecking itself.
