Hi, I'm Anurag Singh
AI/ML Engineer & Data Analyst
Specialized in building AI/ML solutions, analyzing complex datasets, and deploying production-ready models. I transform data into actionable insights using cutting-edge machine learning and deep learning techniques.
"Every remarkable creation begins with perfectly balanced symmetry."
I'm an AI/ML Engineer and Data Analyst specializing in building intelligent systems that learn from data. My expertise ranges from training local LLMs and deploying offline AI models to creating scalable data pipelines and REST APIs. I'm passionate about using machine learning to solve real-world business problems.
From implementing NLP models with spaCy to deploying containerized applications with Docker, I bring end-to-end technical capabilities. I've worked extensively with PyTorch, scikit-learn, and local LLMs (Ollama), building solutions that are both powerful and production-ready.
"Building systems that ship, not just prototypes that impress."
I'm a Software Engineer specializing in AI/ML systems and data platforms that solve real business problems. At TRPW Strategic Partners, I've delivered production systems that eliminated 80% of manual work through intelligent automation, processing 10,000+ transactions monthly and handling datasets with 500K+ rows.
My work spans the full stack: from training local LLMs (Ollama) and deploying computer vision pipelines (OpenCV, Tesseract, Google Cloud Vision) to building Flask applications with PostgreSQL and AWS infrastructure. I architect solutions that are scalable, production-ready, and business-focused.
Whether it's financial reconciliation platforms, AI-powered spreadsheet validation, or employee productivity analytics, I build systems that deliver measurable impact from day one.
Education & Experience
Data Analyst
TRPW Strategic Partners • Gurgaon, India
March 2024 - Present
Leading AI/ML, OCR, and audit automation projects. Automated data recognition and processing for Paytm KYC audits and internal projects including Limestone Reconciliation, BetelTMS, and Excellia AI using Python, Pandas, NumPy, and SQL, reducing manual work by 80%. Integrated Google Cloud Vision, OpenCV, Tesseract OCR, and AI-driven spreadsheet validation with multithreading and real-time previews.
Master of Computer Applications (MCA)
Galgotias College of Engineering & Technology
2021 - 2023 • CGPA: 7.5
Greater Noida, India. Specialized in AI/ML, data science, and software development. Completed projects in machine learning, natural language processing, and cloud computing.
Bachelor of Computer Applications (BCA)
Singhania University
2018 - 2021 • CGPA: 7.5
Rajasthan, India. Foundation in computer science fundamentals, programming, database management, and software development principles.
Last Updated: December 2024
Built financial reconciliation platform handling 500K+ row datasets with custom formula engine and ML anomaly detection. Developed AI-powered spreadsheet validation using local LLMs (Ollama Gemma3/Phi3). Created employee productivity tracking system serving 50+ users with role-based analytics.
Architected end-to-end solutions using Python, Flask, PostgreSQL, AWS S3, and integrated Google Cloud Vision, OpenCV, and Tesseract OCR for automated data extraction. Deployed production systems with Gunicorn, Nginx, and Docker.
My Specializations
AI/ML Engineering
Building and deploying machine learning models using PyTorch and scikit-learn. Expertise in local LLMs (Ollama), NLP with spaCy, and offline AI solutions.
15+ ML Projects
Data Analysis & Visualization
Transforming complex datasets into actionable insights using Pandas, NumPy, and visualization libraries. Expert in statistical analysis and data-driven decision making.
20+ Analytics Projects
Backend & Cloud Solutions
Developing scalable REST APIs with Flask and deploying on AWS (S3, EC2). Containerization with Docker, database management, and Linux server administration.
12+ Production Systems
Technical Skills
- Languages: Python, JavaScript, SQL
- Backend & Web: Flask, REST APIs, Gunicorn, Nginx, HTML/CSS
- Databases: PostgreSQL, MySQL, MongoDB
- Data & ML: Pandas, NumPy, Matplotlib, Seaborn, PyTorch, scikit-learn, spaCy (NLP/NER), Orange3, Ollama (Local LLMs)
- Cloud & DevOps: AWS (S3, EC2, IAM), Docker, Linux, Git
- Computer Vision: OpenCV, Tesseract OCR, Google Cloud Vision API
- Automation: Selenium, BeautifulSoup, multithreading
Proficient: Python, Flask, Pandas, PostgreSQL, AWS, Docker, ML deployment
Expert: Data automation, AI/ML system design, scalable data pipelines
Featured Projects
Excellia AI - AI-Powered Spreadsheet Validation
AI-powered spreadsheet validation platform using local LLMs (Ollama Gemma3/Phi3) that reduced analyst review time by 70% for datasets with 100K+ rows. Built Flask application with threaded job queue, natural language transformations, and dual-layer quality assurance (rule-based + ML anomaly detection). Handles datasets up to 500K rows with real-time preview and Excel/CSV export.
Tech: Python, Flask, Ollama (Gemma3/Phi3), Pandas, NumPy, scikit-learn, Orange3, multithreading
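A minimal sketch of what a threaded job queue for chunked validation can look like, using only Python's standard library. The names `validate_row` and `run_validation_jobs`, and the "every cell must be non-empty" rule, are illustrative assumptions and not the Excellia AI code itself.

```python
import queue
import threading


def validate_row(row):
    """Hypothetical rule-based check: every cell must be non-empty after trimming."""
    return all(str(cell).strip() for cell in row)


def run_validation_jobs(chunks, num_workers=4):
    """Validate chunks of rows on worker threads.

    Returns a dict mapping chunk index to the number of invalid rows in that chunk.
    """
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work for this worker
                jobs.task_done()
                break
            chunk_id, rows = item
            invalid = sum(1 for row in rows if not validate_row(row))
            with lock:  # guard the shared results dict
                results[chunk_id] = invalid
            jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for chunk_id, rows in enumerate(chunks):
        jobs.put((chunk_id, rows))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker

    jobs.join()  # block until every queued item has been processed
    for t in threads:
        t.join()
    return results
```

Splitting a large sheet into chunks keeps each job small, so a preview of early results could be shown while later chunks are still being processed.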
BetelTMS - Work Log Analytics System
Full-stack employee productivity tracking platform serving 50+ employees across a 4-tier role hierarchy (Employee → Manager → Project Head → Admin). Built with Flask, PostgreSQL, and AWS S3 for secure file storage with role-based access control. The analytics engine processes 10,000+ work log entries and generates automated CSV reports on productivity and project timelines, with dynamic filtering by employee, project, or date range.
Tech: Python, Flask, PostgreSQL, AWS S3, Pandas, Gunicorn, Nginx, HTML/CSS/JavaScript, Flask-Mail
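One way a 4-tier hierarchy like this can be modelled is with an ordered enum, where a higher tier inherits the permissions of the tiers below it. The sketch below is an assumption-laden illustration: the `Role` values, `can_access`, `visible_logs`, and the rule that managers and above see all logs are hypothetical and not taken from BetelTMS.

```python
from enum import IntEnum


class Role(IntEnum):
    """Four tiers, ordered so that a larger value means broader access."""
    EMPLOYEE = 1
    MANAGER = 2
    PROJECT_HEAD = 3
    ADMIN = 4


def can_access(user_role, required_role):
    """Return True if the user's tier is at or above the required tier."""
    return user_role >= required_role


def visible_logs(user, logs):
    """Filter work logs by role.

    Simplifying assumption: managers and above see every log,
    while employees see only their own entries.
    """
    if can_access(user["role"], Role.MANAGER):
        return list(logs)
    return [log for log in logs if log["employee"] == user["name"]]
```

In a Flask application the same `can_access` check would typically sit inside a route decorator, so each view declares the minimum tier it requires.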
Limestone Reconciliation - TRPW
Automated financial reconciliation platform that reduced manual data processing by 80%, eliminating 32+ hours of weekly work for a finance team handling 10,000+ monthly transactions. Built a Python-based data processing engine with regex-based cleaning and a custom formula language aimed at non-technical users. Features real-time data preview, automated discrepancy analysis, and CSV report generation. Optimized for 500K+ row datasets with ML anomaly detection using isolation forests.
Tech: Python, Pandas, NumPy, Matplotlib, SQL, scikit-learn, Regex, psutil, memory-profiler
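As a rough illustration of regex-based cleaning followed by discrepancy analysis, the sketch below normalizes raw amount strings and compares two sources by transaction ID. It uses only the standard library; `clean_amount`, `find_discrepancies`, and the convention that parentheses mark a negative amount are assumptions for the example, not the Limestone implementation.

```python
import re
from decimal import Decimal

# Anything that is not a digit, a decimal point, or a minus sign is dropped.
_NON_NUMERIC = re.compile(r"[^\d.\-]")


def clean_amount(raw):
    """Convert a raw amount string such as "$1,250.50" or "(300)" to a Decimal.

    Assumed convention: an amount wrapped in parentheses is negative.
    """
    text = raw.strip()
    negative = text.startswith("(") and text.endswith(")")
    value = Decimal(_NON_NUMERIC.sub("", text))
    return -value if negative else value


def find_discrepancies(ledger, bank):
    """Compare two {txn_id: raw_amount} mappings.

    Returns {txn_id: (ledger_amount, bank_amount)} for every transaction whose
    cleaned amounts differ, with None marking a side where the ID is missing.
    """
    issues = {}
    for txn_id in sorted(set(ledger) | set(bank)):
        left = clean_amount(ledger[txn_id]) if txn_id in ledger else None
        right = clean_amount(bank[txn_id]) if txn_id in bank else None
        if left != right:
            issues[txn_id] = (left, right)
    return issues
```

`Decimal` is used instead of `float` so that amounts like `1250.50` and `1250.5` compare as equal without rounding error.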
YouTube Channel
Join me on my YouTube channel where I share tutorials, project walkthroughs, and insights into the world of AI, data science, and software development. Subscribe for weekly content!