Hi, I am Naseela
Research @ USC | MS CS @ USC | Natural Language Processing | Bias and Fairness in AI | Network Science
Rewriting the Code (RTC) | Grace Hopper Celebration (GHC) | Society of Women Engineers (SWE)
Bio

Hello! I'm Naseela Pervez, a pre-doctoral research assistant at the University of Southern California (USC). My academic journey in Computer Science culminated in a Master's degree from USC, where I immersed myself in cutting-edge courses such as Applications of Natural Language Processing, Machine Learning, Data Mining, and Algorithm Analysis and Design. These hands-on experiences have shaped my research interests and prepared me for the challenges of doctoral studies. I'm now actively seeking PhD positions to further my research.

My work lies at the intersection of Natural Language Processing, Fairness and Bias in AI, Network Science, and Computational Linguistics. As a Research Assistant at USC's Information Sciences Institute (ISI) and part of the MINERVA team, I've had the privilege of engaging in groundbreaking projects that push the boundaries of our understanding of AI and its societal implications.

At MINERVA, I'm tackling the complex challenge of parsimonious document labeling. This project involves developing a novel clustering-based approach to generate document-specific labels. One of our key findings is the crucial role that background information, supplemented by Large Language Models (LLMs), plays in efficient label generation. This research not only advances our understanding of document classification but also opens up new possibilities for more accurate and context-aware information retrieval systems.
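To make the idea concrete, here is a minimal sketch of what a clustering-based labeling pipeline can look like. The embedding model, cluster count, and helper structure are illustrative assumptions, not the actual MINERVA implementation:

```python
# Illustrative sketch only: embed documents, cluster them, and label per
# cluster rather than per document. Model name and cluster count are
# hypothetical placeholders, not the project's actual configuration.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def cluster_for_labeling(docs, n_clusters=5):
    # Embed each document into a dense vector space.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(docs)

    # Group similar documents so labels can be generated per cluster;
    # one LLM label request per cluster (supplemented with background
    # information) keeps the overall label set parsimonious.
    km = KMeans(n_clusters=n_clusters, n_init="auto").fit(embeddings)

    clusters = {}
    for doc, cid in zip(docs, km.labels_):
        clusters.setdefault(int(cid), []).append(doc)
    return clusters  # an LLM labeling step would consume these groups
```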

One of my primary research focuses has been leveraging natural language processing to analyze scientific documents and uncover gender and prestige bias within the scientific community. A particularly intriguing discovery from this work revealed that large language models tend to adopt a more masculine writing style when generating scientific content. This finding raises critical questions about the ethical implications of using AI assistance in academic publishing and highlights the need for more diverse and inclusive AI systems.

My commitment to advancing AI extends beyond the laboratory. I'm a strong advocate for diversity in STEM, actively participating in organizations like Rewriting the Code and the Society of Women Engineers. As a woman in research, I feel a deep responsibility to promote and mentor younger women and individuals from underrepresented groups in our field. I seize every opportunity to motivate and inspire the next generation of diverse minds in STEM, believing that a more inclusive scientific community leads to more comprehensive and impactful research outcomes.

I'm particularly excited about my upcoming presentation at the SDProc workshop at ACL 2024, where I'll be sharing our paper, "Artificial Intuition: Efficient Classification of Scientific Abstracts." This opportunity to contribute to the global dialogue on AI and computational linguistics is both thrilling and humbling.

If you're interested in innovative AI research, exploring collaborations, or have information about PhD opportunities that align with these interests, I'd be thrilled to connect. Let's work together to shape the future of AI and ensure it serves and represents all of humanity.

News and Highlights
July 2024
Attending ACL 2024 to present "Artificial Intuition: Efficient Classification of Scientific Abstracts" at SDProc 2024.
June 2024
Our paper, "Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts," was accepted at NLAI 2024.
June 2024
Submitted our paper, "Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts," to the 5th International Conference on NLP & Artificial Intelligence Techniques (NLAI 2024).
June 2024
Our paper "Artificial Intuition: Efficient Classification of Scientific Documents" is accepted at SDProc at ACL 2024.
May 2024
Submitted our paper, "Artificial Intuition: Efficient Classification of Scientific Abstracts," to the Scholarly Document Processing (SDProc) workshop at ACL 2024.
February 2024
Starting a position as a research assistant at the USC Information Sciences Institute (ISI), working on statistical analysis of real-world patent-to-patent citation networks.
February 2024
Published a pre-print, "Integrating MLSecOps in the Biotechnology Industry 5.0," which is set to be published as a book chapter in Fall 2024.
February 2024
Starting a position as a research assistant at USC's MINERVA (The Management of INnovation, Entrepreneurial Research, and Venture Analysis), focusing on applying Natural Language Processing (NLP) tools to explore the intersection of public policy, systems engineering, and finance.
December 2023
Graduated from the master's program in Computer Science at the University of Southern California (USC).
April 2022
Presented the research article "Application of Deep Learning for COVID Twitter Sentimental Analysis towards Mental Depression" at ICTIS 2022, Ahmedabad (attended virtually).
January 2022
Starting my Master's in Computer Science at USC.
August 2021
Graduated with a bachelor's degree in Information Technology from SRM Institute of Science and Technology.
Publications
Pervez, N., & Titus, A. J. (2024). Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts. (Accepted at NLAI 2024).
Sakhrani, S., Pervez, N., Kumar A., Morstatter, F, Graddy-Reed & A., Belz, A.(2024) Artificial Intuition: Efficient Classification of Scientific Abstracts (Accepted at SDProc at ACL 2024).
Pervez, N., & Titus, A. J. (2024). Integrating MLSecOps in the Biotechnology Industry 5.0.IntechOpen. doi: 10.5772/intechopen.114972.
Projects
All images generated using DALL-E.
Does Explainable AI Agree With Psychologists?
Applied LIME and SHAP explainable-AI techniques to diverse classification models, including Random Forest, Logistic Regression, and SVM, to assess personality traits based on the Big Five model, IRI-based empathy/distress traits, and MFT-based morality traits. Using LIME and SHAP, I identified the 50% most influential words in each original text, then conducted a comprehensive statistical analysis comparing established LIWC (Linguistic Inquiry and Word Count) features between the original texts and the transformed texts containing only those influential words. The analysis revealed no statistically significant association between the theoretically established ground truth of the psychology community (the LIWC features) and the text once it was reduced to the words LIME and SHAP highlighted. This sheds light on the limits of feature interpretability in text classification across different models when a text is reduced to its most influential components.
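As an illustration of the explanation step, the sketch below runs LIME over a toy scikit-learn text classifier. The data, class names, and feature count are placeholders, not the project's actual setup:

```python
# Minimal LIME sketch over a toy text classifier; the two-example
# dataset and "extraversion" framing are hypothetical placeholders.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["I love meeting new people", "I prefer quiet evenings alone"]
labels = [1, 0]  # e.g., high vs. low extraversion

# Train a simple classifier whose predictions LIME will explain.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

# LIME perturbs the input text and fits a local surrogate model to rank
# which words most influenced the prediction.
explainer = LimeTextExplainer(class_names=["low", "high"])
exp = explainer.explain_instance(texts[0], pipeline.predict_proba, num_features=3)
influential_words = [word for word, weight in exp.as_list()]
print(influential_words)  # words kept when building the "transformed" text
```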
Fact Verification - Can Mighty Transformers Identify Claims and Evidence?
The Fact Extraction and VERification (FEVER) benchmark is an important dataset for evaluating automated fact-checking systems. It contains 185,445 claims manually verified against Wikipedia pages and labeled as Supported, Refuted, or NotEnoughInfo. My goal was to improve the performance of a fact-checking model on this benchmark. Through enhancements to the model architecture and text representations, I boosted accuracy by 4% absolute, achieving 93% fact verification accuracy.

Specifically, I employed a neural network model with several key components. First, input encoding was done with HuggingFace's pretrained BERT model, a transformer trained on large corpora to produce semantically rich text embeddings. Using BERT allowed the model to better understand the underlying meaning of the claims and evidence documents, an important enhancement over previous baseline methods. Second, I implemented a document retrieval module that finds relevant Wikipedia pages based on the claim text, identifying candidate evidence documents through sparse keyword matching and dense embedding similarity. Next, an evidence extraction module selects pertinent sentences from the retrieved Wikipedia documents; focusing on relevant evidence sentences reduces noise and improves the quality of evidence fed into later stages. Finally, a claim-evidence inference module predicts whether the claim is Supported, Refuted, or NotEnoughInfo based on the encoded claim and extracted evidence representations, modeling the interaction between the claim and evidence text.

The model was trained end-to-end on the FEVER dataset so the components work effectively together. The gains from BERT and the enhanced architecture account for the 4% absolute accuracy improvement over previous baselines, demonstrating the value of strong pretrained language models and an architecture tailored for fact verification.
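The snippet below sketches the claim-evidence inference step with a HuggingFace sequence-classification head. The base checkpoint, example texts, and label order are illustrative assumptions; the actual system was fine-tuned on FEVER rather than using a fresh head:

```python
# Sketch of joint claim-evidence classification with BERT; the
# checkpoint, inputs, and label ordering are hypothetical placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # Supported / Refuted / NotEnoughInfo
)

claim = "The Eiffel Tower is in Paris."
evidence = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."

# Encoding claim and evidence as one sentence pair lets BERT's attention
# model their interaction before the classification head scores the pair.
inputs = tokenizer(claim, evidence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
label = ["Supported", "Refuted", "NotEnoughInfo"][logits.argmax(dim=-1).item()]
print(label)
```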
Little Go
I built an AI agent to play the 5x5 Chinese board game Little Go, achieving over 90% win rates against benchmark opponents. The agent uses a Minimax search algorithm with Alpha-Beta Pruning, implemented in Python, to explore future game states and select optimal moves. Minimax models the two-player game as adversarial search, evaluating possible future game states up to a depth of 13 moves and assigning maximizing and minimizing scores to the agent's and opponent's possible moves via a heuristic evaluation function. Alpha-Beta Pruning discards branches that are guaranteed to be worse than already explored options, increasing search efficiency. The heuristic evaluation function was designed through iterative testing and tuning, incorporating factors such as the number of stable stones, liberties, and territory to estimate the value of board positions. The agent's opening book and endgame databases were also optimized by analyzing patterns in winning games. By combining the optimized Minimax search with powerful heuristics, domain-specific adaptations, and move precomputations, the agent evaluates complex positions and makes tactically strong moves. The over-90% win rate demonstrates the agent's proficiency in strategic decision-making, and the specialized techniques developed could generalize to AI for other complex board games.
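The core search can be sketched as follows; here legal_moves, apply, is_terminal, and evaluate are hypothetical stand-ins for the Little Go game logic and the tuned heuristic described above, so this is a shape of the algorithm rather than the agent itself:

```python
# Generic minimax with alpha-beta pruning. The state interface
# (legal_moves, apply, is_terminal) and evaluate() are placeholders
# for the actual Little Go rules and heuristic evaluation function.
def minimax(state, depth, alpha, beta, maximizing):
    if depth == 0 or state.is_terminal():
        return evaluate(state)  # heuristic: stable stones, liberties, territory
    if maximizing:
        best = float("-inf")
        for move in state.legal_moves():
            best = max(best, minimax(state.apply(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if beta <= alpha:   # opponent already has a better option: prune
                break
        return best
    else:
        best = float("inf")
        for move in state.legal_moves():
            best = min(best, minimax(state.apply(move), depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:   # we already have a better option: prune
                break
        return best
```

Pruning never changes the value minimax returns; it only skips branches that provably cannot affect the final choice, which is what makes a 13-move lookahead tractable.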
Restaurant Recommender System
A hybrid recommendation system was developed to predict user ratings for restaurants by employing a weighted average approach that combined an item-based collaborative filtering recommender using Pearson correlation similarity with an XGBoost regressor model. The item-based collaborative filtering component generated recommendations based on similarity of rating patterns between different restaurants, while the XGBoost regressor predicted numeric rating values. By combining these two components using a weighted average ensembling approach, the hybrid model achieved an RMSE of 0.97 on the prediction task, demonstrating improved accuracy over a baseline model RMSE of 1.09. This hybrid integration of collaborative filtering and regression techniques enabled more robust and accurate rating predictions. To optimize the runtime performance of the recommendation system, PySpark RDD APIs were leveraged to distribute the data processing and model training. The Spark RDDs allowed parallelization of computations across clusters, significantly speeding up the processing time for the dataset containing 3 million rows of user-restaurant ratings. The use of PySpark reduced the total runtime from approximately 7200 seconds in the original configuration to just 1800 seconds after optimization, underscoring a 4x improvement in processing time. This considerable enhancement in efficiency enabled more rapid iteration during model development and facilitated application of the system to larger real-world datasets. Overall, the PySpark implementation delivered substantial gains in scalability and speed for the recommendation engine.
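A minimal sketch of the weighted-average blend is shown below; the 0.1/0.9 split and the toy predictions are hypothetical placeholders, not the tuned values from the project:

```python
# Weighted-average ensembling of the two components; alpha here is a
# hypothetical weight, not the value tuned for the actual system.
import numpy as np

def hybrid_predict(cf_preds, xgb_preds, alpha=0.1):
    """Blend item-based CF predictions with XGBoost regressor predictions.

    alpha weights the collaborative-filtering component; the remainder
    goes to the model-based (XGBoost) component.
    """
    cf = np.asarray(cf_preds, dtype=float)
    xgb = np.asarray(xgb_preds, dtype=float)
    return alpha * cf + (1 - alpha) * xgb

# Toy usage: two (user, restaurant) pairs scored by each component.
print(hybrid_predict([3.5, 4.0], [3.8, 4.4]))  # blended rating predictions
```

A small alpha reflects the common pattern where the regression model carries most of the signal while the collaborative-filtering score nudges predictions toward observed rating patterns.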
Made with ♥ by Naseela