#126 Perspectives in the Layers of Cinematic Portrayals
Launching a new section - "Science Unveiled"
September 23, 2023
Last week, I was watching the movie Rocky & Rani, and a scene in it led me to criticize Bollywood movies, followed by a lot of questioning of perspectives (mine as well as others') in life. It turned out to be a good eye-opener. Let me first describe the scene -
Approximately 15 minutes into the movie, as they introduce Ranveer Singh's character and give us a glimpse of what the movie is exploring, a perfect moment arises to introduce the lead actress to the audience. To give you some context, I've always been critical of those unnecessary item songs or scenes in movies that don't really have anything to do with the story. I mean, they are called “item” songs. Seriously? On one hand, people in the movie business talk about feminism and equality, but on the other hand, they include these scenes and songs, which, by the way, have nothing to do with the main story and are all about objectification. This seems a bit two-faced, doesn't it?
Now, let's talk about Alia Bhatt, who is the lead actress in this movie. When she's introduced, she's wearing a bindi and kajal, tying the back strings of her saree, and draping the dupatta as she walks elegantly towards a newsroom where she's going to interview a politician. (This introduction scene was replayed, btw!) In the interview, Alia Bhatt eloquently highlights how the politician has been objectifying women, raising doubts about his intent for public service. She passionately speaks about how many men, or maybe even most, treat women as mere objects.
But hold on a second! I couldn't help but notice that the way they introduced her character in the movie is a lot like what she's speaking out against in the interview. It's almost like they're doing the very thing they're criticizing. And what's more, these scenes happen one after the other! It feels a bit like saying one thing and doing another, doesn't it? That's what hypocrisy is.
But can that scene really be termed as objectification?
Perspectives. This is where perspectives come in. I was fortunate to see a different perspective - one that distinguishes the celebration of feminine energy and glamour from objectification. After seeing it, I came to the realization that there might not be anything wrong with that scene. The filmmakers were trying to showcase the character's grace, confidence, and elegance, using traditional symbols like the bindi and kajal to emphasize her Indian heritage.
It turns out my earlier prejudice against Bollywood movies, combined with the fact that the scene played twice (due to reasons beyond my control), influenced me to label it as objectification. The key lesson here is that sometimes our biases and external factors we can't control can lead us to conclusions that aren't entirely accurate. It underscores the importance of engaging our own critical thinking to grasp situations and consider various perspectives before making judgments.
Think and grow!
Cheers!
Science Unveiled
For quite some time, I have been thinking of making a few changes to my newsletter, i.e. adding some more detail beyond the usual Saturday story/lesson. And I think I have finally decided on something.
Starting from the 126th edition of The Passion Pad, i.e. starting today, my newsletter will feature a brand new section called “Science Unveiled”, which will try to explain research papers, mostly revolving around AI, LLMs, and ML, in easy-to-understand language. Every week I will bring up a research paper and explain the complex stuff happening in the AI space in simple terms. You might still need a little background, though. Why? Because I started this newsletter 2 years ago with the intention of growing my knowledge, and I believe I have been very successful at this, considering the depth and the insights I have gained while researching a topic to write on every Saturday. I believe this step solidifies that goal of knowledge even further and pushes me to learn more. With that being said, let's get started with today's topic -
“The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python”
Paper link: https://arxiv.org/pdf/2305.15507.pdf
Brief Summary
LLMs are being used for code generation nowadays, but the question is how well these LLMs actually understand the code. The authors conducted an experiment in which they swapped the meanings of two built-in identifiers in Python. The LLMs failed to generate correct Python code under the swap, indicating that they still lack an abstract understanding of the task. This makes them poorly suited for tasks that fall outside the scope of their training data. Furthermore, the authors demonstrate that scaling up the model does not improve abstract understanding; in fact, it may even worsen it. In other words, performance on this task deteriorates as the model scales up.
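To make the setup concrete, here is a minimal sketch (plain Python, my own illustration rather than a snippet from the paper) of what such a swap actually does at runtime:

```python
# After this assignment, the name len refers to the built-in print,
# and the name print refers to the built-in len.
len, print = print, len

len("hello")             # prints "hello" (len is now the old print)
length = print("hello")  # length == 5; nothing is printed
```

Code that is correct under the swap therefore has to use the identifiers with their new, swapped meanings, and that is exactly what the models fail to do.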
In-depth
Large Language Models (LLMs) have become a popular topic of conversation lately. If you frequently use ChatGPT or any other AI tool that works with natural language, it's likely powered by an LLM. In the realm of software, LLMs are employed for various tasks including programming, code generation, and code completion. While LLMs have shown improved performance with larger model sizes in many applications, this paper's code generation task is an example of inverse scaling. But what exactly is inverse scaling?
Researchers have observed that the output quality of a model sometimes decreases as the model size increases. This can happen because of biases in the training data: a particular bias can cause a drop in output quality. Spotting these biases, however, often depends on prior knowledge, i.e. data that seems biased to me may not seem biased to you simply because of a difference in knowledge. Hence, the authors of this paper proposed another way to demonstrate inverse scaling, using a code generation task with LLMs.
Programming languages have fixed syntax and semantics, so it is easy to automate the generation of examples. They are scientifically interesting because coding problems can be generated automatically and evaluated against an objective ground truth, whereas most natural language tasks carry enough ambiguity that they require human annotation to produce high-quality examples. The research also matters for commercial tools such as GitHub Copilot, which developers use heavily.
The authors define a code generation task as follows -
Write a statement that swaps the definitions of two built-in Python identifiers, e.g.:
len, print = print, len
Along with the redeclaration statement written above, the model is given the function name, followed by a docstring that specifies what the function is supposed to do.
The model is then supposed to generate the rest of the body of the function.
The task is framed as a classification task where the input is the swap statement, the function declaration, and the docstring, and the output is one of two classes, “bad” or “good”.
The class “bad” means that, in the auto-generated code, the model used the identifiers with their usual meanings, ignoring the swap statement, while the class “good” means the model took the swap statement into account and generated the code accordingly, as illustrated in the sketch below.
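Here is a hedged sketch of what such an example might look like; the function count_chars and its docstring are my own illustration, not taken from the paper's dataset:

```python
len, print = print, len  # the swap statement given in the prompt

def count_chars(text):
    """Return the number of characters in text."""
    # A “bad” continuation ignores the swap and writes:
    #     return len(text)   # under the swap this would PRINT text
    # A “good” continuation respects the swap: print now refers to
    # the original len, so it counts characters.
    return print(text)
```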
The dataset for this task comprised code collected by scraping GitHub. Only repositories with more than 100 stars and an open-source CC-BY-4.0 license in the README were considered. Random functions were then selected from the code that used at least two built-in functions and had a docstring.
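As a rough illustration (my own sketch, not the authors' actual pipeline), the function-selection step might look something like this, using Python's ast module:

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))

def eligible_functions(source: str):
    """Yield names of functions that have a docstring and call at
    least two distinct built-ins, mirroring the paper's filter."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node):
            called = {
                c.func.id
                for c in ast.walk(node)
                if isinstance(c, ast.Call) and isinstance(c.func, ast.Name)
            }
            if len(called & BUILTIN_NAMES) >= 2:
                yield node.name
```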
The evaluation was done on autoregressive models - OpenAI GPT-3, Salesforce CodeGen, Meta AI OPT - and one family of sequence-to-sequence conditional autoregressive language models (Google FLAN-T5).
The analysis shows that text-based autoregressive LLMs, even those further trained on code, demonstrate inverse scaling on this task. The code-based models, on the other hand, exhibit flat scaling, which may potentially transition to positive scaling at the largest tested size; however, they fail to significantly improve upon the performance of the text-based models.
The authors also evaluated chat LLMs, such as gpt-4, to investigate inverse scaling and assess how well these models comprehend the code. The Anthropic models (claude-instant and claude) demonstrate higher accuracy (10-18%) with positive scaling and consistently generate valid outputs. The OpenAI models (gpt-3.5-turbo and gpt-4), on the other hand, exhibit lower accuracy (< 4%) with flat or inverse scaling and occasionally produce invalid outputs.
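For intuition, accuracy numbers like these come from a loop along the following lines; respects_swap here is a hypothetical helper standing in for the actual model call plus the good/bad check, which the paper's harness implements differently:

```python
def task_accuracy(examples, respects_swap):
    """examples: iterable of prompts (swap statement + function
    declaration + docstring). respects_swap(prompt) returns True if
    the model's continuation honors the swap (class “good”)."""
    results = [respects_swap(prompt) for prompt in examples]
    return sum(results) / len(results)
```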
The central idea of the experiment is that if you swap the capabilities of two built-in functions in Python, such as print becoming len and len becoming print, LLMs still use the functions as originally defined. This implies that LLMs are unable to comprehend, reason about, and adapt the usage of functions after a swap, and the phenomenon becomes more apparent as the size of the language model increases. It appears that LLMs rely more on shortcut learning than on logical inference.

LLMs demonstrate a significantly better capability when dealing with problems related to terms or concepts that appear with higher frequency in their training data. This observation leads to the hypothesis that LLMs may not primarily rely on robust abstract reasoning to solve problems; instead, they seem to solve them, at least partially, by recognizing patterns in their training data that align with, resemble, or are otherwise connected to the text of the prompts given to them.
Sample from the dataset -
The input consists of a swap statement that swaps the identifiers len and open, the function name importfile(), and a docstring specifying what the function should do.
The incorrect continuation still uses the open function to open the file and len to compute the length of the byte array MAGIC_NUMBER; in the correct continuation, open and len are swapped.
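Since the original figure isn't reproduced here, below is a hedged reconstruction of that sample based on the description above; the exact docstring, body, and MAGIC_NUMBER value in the dataset differ:

```python
MAGIC_NUMBER = b"\x00MAGIC"  # hypothetical placeholder value
len, open = open, len        # the swap statement from the prompt

def importfile(path):
    """Check the file's magic number before importing it."""
    # Incorrect continuation (ignores the swap):
    #     with open(path, "rb") as f:
    #         header = f.read(len(MAGIC_NUMBER))
    # Correct continuation (honors the swap: len now opens files,
    # open now measures length):
    with len(path, "rb") as f:
        header = f.read(open(MAGIC_NUMBER))
    return header == MAGIC_NUMBER
```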
Feel free to dig into the paper to get a better understanding. I hope this new section will turn out to be even better than the previous one and gain more traction! Let me know in the comment section if you like it!
Take care, have fun, see you soon :)