Data Engineer

Veerabagu Krishnasamy

About Candidate

Education

Diploma in Electronics and Communications Engineering, 1999

Experience

Python Developer Nov 2022 - Oct 2024
Scotiabank

• Hands-on experience in developing data integration solutions using Azure Data Factory (ADF), Azure SQL Server, and Azure Logic Apps.
• Proficient in building and optimizing Data Flow Activities within ADF for efficient data transformations.
• Skilled in migrating pipelines across environments to support continuous integration and deployment efforts.
• Strong understanding of Azure SQL Database and SQL Data Warehouse, with the ability to translate logical models into physical implementations.
• Collaborated with Cloud and DevOps teams to manage IAM policies and role-based access control across various Azure services.
• Adept at gathering, analyzing, and documenting business requirements to align technical solutions with stakeholder goals.
• Designed and developed end-to-end ETL pipelines using ADF to extract, transform, and load data from on-premises and cloud sources into Azure Data Lake Storage Gen2.
• Managed and resolved production issues on Apache NiFi, ensuring uninterrupted data flow, swift troubleshooting, and minimal system downtime.
• Designed, developed, and optimized ETL pipelines for banking domains, including Trades, Orders, and Positions, leveraging pandas and PySpark for efficient data processing and analytics.
• Streamlined and standardized modules by adhering to industry standards using PEP 8 and black, enhancing code readability and maintainability.
• Created comprehensive documentation using Google-style docstrings and Sphinx, enabling the generation of HTML pages for modules and improving developer onboarding and module integration.
• Identified and rectified redundant code snippets, significantly reducing technical debt and improving codebase efficiency.
• Led the migration of existing code from pandas to PySpark, enhancing the scalability and performance of data processing tasks.
• Created expectations using Great Expectations and validated new data against them in real time to generate exception reports (see the validation sketch below).
• Packaged applications using Docker and managed deployment over Kubernetes clusters, utilizing JFrog and Rancher for efficient container orchestration and version control.
• Implemented robust code versioning practices using Git and Bitbucket, employing a branching strategy to ensure smooth collaboration and codebase management.
• Utilized Jira for effective project management, tracking progress, managing tasks, and ensuring timely delivery of project milestones.
• Retrieved time-series data from Oracle DB using cx_Oracle and pandas.
• Pre-processed data and engineered features using pandas.
• Trained anomaly detection models with scikit-learn's Isolation Forest and performed hyperparameter tuning (see the anomaly-detection sketch below).
• Logged model details with MLflow, saving the artifact path and run ID in a config file.
• Designed and implemented GenAI-driven data pipelines, incorporating chunking, Retrieval-Augmented Generation (RAG) over vector databases, and Azure OpenAI's ChatCompletion API (GPT-3.5 Turbo, GPT-4, GPT-4o) for advanced insights and keyword extraction (see the RAG sketch below).
• Deployed scalable AI workflows on Azure, utilizing AKS and Web Apps, and built RESTful APIs with Flask and FastAPI for seamless integration and performance optimization.
• Applied the logged MLflow model for real-time anomaly detection.
• Deployed training and scoring pipelines on a Linux server, scheduling the mlpipeline and scoring jobs with crontab.
• Developed robust ETL pipelines in Databricks to ingest transactional data from Elasticsearch, transforming and storing it across bronze, silver, and gold Delta Lake layers for regulatory and financial reporting.
• Flattened deeply nested JSON structures from core banking systems and normalized them for downstream analytics using PySpark and Spark SQL in Databricks notebooks (see the flattening sketch below).
• Performed schema validation and data type standardization in silver tables, ensuring high data quality for sensitive banking metrics such as balances, risk scores, and KYC flags.
• Engineered gold-layer datasets with business-critical aggregations, joins, and time-series analyses to support fraud detection, compliance dashboards, and portfolio performance tracking.
• Automated and monitored ETL jobs using Databricks Workflows and integrated version control via Git to meet audit and change management requirements in a secure banking environment.
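To make the data-validation bullets concrete, here is a minimal sketch using Great Expectations' older pandas-oriented API (newer releases use a Data Context workflow instead). The column names and bounds are hypothetical placeholders, not the actual production suite.

```python
# Minimal sketch: batch validation with Great Expectations' legacy pandas API.
# trade_id and notional are hypothetical column names.
import great_expectations as ge
import pandas as pd

df = pd.DataFrame(
    {"trade_id": ["T1", "T2", None], "notional": [1_000_000.0, 250_000.0, 980_000.0]}
)

# Wrap the DataFrame so expectation methods become available on it.
ge_df = ge.from_pandas(df)
ge_df.expect_column_values_to_not_be_null("trade_id")
ge_df.expect_column_values_to_be_between("notional", min_value=0, max_value=10_000_000)

# Validate the whole suite; failed expectations feed the exception report.
result = ge_df.validate()
if not result.success:
    print("Validation failed:", result)
```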
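The anomaly-detection and MLflow-logging bullets describe a common scikit-learn pattern: train an Isolation Forest, log it to MLflow, and record the run ID for the scoring job. Below is a minimal sketch of that pattern; the features are synthetic and the hyperparameter values are illustrative, not the tuned production settings.

```python
# Minimal sketch: Isolation Forest training logged with MLflow.
import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.ensemble import IsolationForest

X_train = np.random.default_rng(42).normal(size=(1000, 4))  # stand-in features

with mlflow.start_run() as run:
    model = IsolationForest(n_estimators=200, contamination=0.01, random_state=42)
    model.fit(X_train)
    mlflow.log_params({"n_estimators": 200, "contamination": 0.01})
    mlflow.sklearn.log_model(model, artifact_path="isolation_forest")
    print("run_id to store in the config file:", run.info.run_id)

# A scoring job can later reload the logged model by run ID.
loaded = mlflow.sklearn.load_model(f"runs:/{run.info.run_id}/isolation_forest")
scores = loaded.decision_function(X_train)  # lower score = more anomalous
labels = loaded.predict(X_train)            # -1 = anomaly, 1 = normal
```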
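For the GenAI bullets, here is a hedged sketch of the RAG pattern, written against the openai>=1.0 AzureOpenAI client (the résumé names the ChatCompletion API; the call below is its current form). The endpoint, API key, deployment names, and the tiny in-memory "index" are hypothetical stand-ins for the real vector database.

```python
# Minimal sketch: retrieve relevant chunks, then ask an Azure OpenAI model.
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example.openai.azure.com",  # placeholder
    api_key="...",                                      # placeholder
    api_version="2024-02-01",
)

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

# Pretend these chunks came from the document-chunking step.
chunks = ["Trade settlement occurs T+2.", "Positions are snapshotted nightly."]
index = [(chunk, embed(chunk)) for chunk in chunks]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: -cosine(q, item[1]))
    return [chunk for chunk, _ in ranked[:k]]

question = "When do trades settle?"
context = "\n".join(retrieve(question))
answer = client.chat.completions.create(
    model="gpt-4o",  # Azure deployment name; placeholder
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```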
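The Databricks bullets mention flattening nested JSON into silver-layer Delta tables. A minimal PySpark sketch of that step follows; the field names, paths, and schema are hypothetical.

```python
# Minimal sketch: flatten nested JSON (bronze) into a silver Delta table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten-demo").getOrCreate()

raw = spark.read.json("s3a://bucket/bronze/transactions/")  # placeholder path

flat = (
    raw
    # Promote nested struct fields to top-level columns.
    .withColumn("customer_id", F.col("customer.id"))
    .withColumn("balance", F.col("account.balance").cast("decimal(18,2)"))
    # Explode an array of transactions into one row per element.
    .withColumn("txn", F.explode("account.transactions"))
    .select("customer_id", "balance", "txn.amount", "txn.timestamp")
)

flat.write.format("delta").mode("append").save("/mnt/silver/transactions")
```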

Python Developer Aug 2020 - Nov 2022
TD Bank

• Designed and developed Azure Data Factory (ADF) pipelines, configuring Linked Services and Azure Key Vault to securely connect with databases and flat files for data movement into Azure Data Lake Storage Gen2.
• Executed seamless data migration from on-premises virtual machines (VMs) to ADLS using ADF.
• Implemented initial data load transformations using Data Flow Activities, storing the processed data in ADLS Gen2 for further use.
• Scheduled and monitored ADF pipelines using time-based triggers, ensuring reliable and timely execution of workflows.
• Conducted hands-on migration of legacy on-prem applications to Azure Cloud, optimizing performance and scalability.
• Performed transformation and loading of curated data into Azure SQL Data Warehouse using ADF's Copy Activity.
• Developed Logic Apps to automate the ingestion of daily incremental updates (Excel files) from SharePoint into ADLS Gen2.
• Created and maintained incremental data pipelines in ADF to load daily updates from ADLS Gen2 into SQL Data Warehouse.
• Implemented automated email alerts for success and failure events at each pipeline activity level for proactive monitoring.
• Worked extensively with key ADF components such as Linked Services, Data Flows, Copy Activities, Lookup Activities, Source Connections, and Azure Data Lake Storage to deliver scalable data integration solutions.
• Developed and optimized cloud solutions using AWS components (EC2, EMR, Lambda) and automated tasks such as Hadoop job migration and data processing (see the boto3 sketch below).
• Automated API integration, log rotation, and AWS services using Python and shell scripts, and managed CI/CD pipelines with Git and Jenkins.
• Managed all backend tasks, including RabbitMQ automation, API migration from Sybase to Oracle, and troubleshooting Python applications.
• Customized and deployed JupyterLab, Jupyter Notebook, and JupyterHub for data scientists, including package creation and versioning solutions.
• Leveraged Python modules for web crawling and optimized multi-threading for performance enhancement in various processes (see the thread-pool sketch below).
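As an illustration of the AWS automation mentioned above, here is a minimal boto3 sketch that invokes a Lambda function synchronously. The function name, region, and payload are hypothetical.

```python
# Minimal sketch: call an AWS Lambda function from an automation script.
import json
import boto3

lam = boto3.client("lambda", region_name="us-east-1")

response = lam.invoke(
    FunctionName="rotate-logs",        # placeholder function name
    InvocationType="RequestResponse",  # synchronous call
    Payload=json.dumps({"retention_days": 14}).encode(),
)
print(json.loads(response["Payload"].read()))
```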
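The web-crawling bullet refers to multi-threaded fetching, which for I/O-bound work is typically done with a thread pool. A minimal sketch, assuming requests is installed and using placeholder URLs:

```python
# Minimal sketch: fetch pages concurrently with a thread pool.
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

urls = [f"https://example.com/page/{i}" for i in range(20)]  # placeholders

def fetch(url: str) -> tuple[str, int]:
    resp = requests.get(url, timeout=10)
    return url, resp.status_code

# I/O-bound work benefits from threads despite the GIL.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(fetch, u): u for u in urls}
    for fut in as_completed(futures):
        try:
            url, status = fut.result()
            print(url, status)
        except requests.RequestException as exc:
            print(futures[fut], "failed:", exc)
```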

Python Developer Mar 2017 - Apr 2020
HCL Technologies Ltd

Data Engineering:
• Collaborated directly with clients to gather, clarify, and document business and technical requirements for data integration solutions.
• Designed and developed robust ADF pipelines, Linked Services, and Datasets in version 2 to support complex data processing workflows.
• Engineered various pipelines tailored to specific business use cases, ensuring scalability and performance.
• Configured key Azure cloud services, including Azure Blob Storage and Azure SQL Database, to support data ingestion and storage.
• Implemented email notification workflows using Azure Logic Apps to alert stakeholders of pipeline execution outcomes.
• Scheduled ADF pipelines using time-based triggers to automate daily data loads and ensure data availability.
• Developed and maintained PySpark code to retrieve and process data from the refined data layer for downstream consumption.
• Actively participated in Agile ceremonies, including daily stand-ups, sprint planning, and backlog grooming sessions, to align development efforts with sprint goals.
• Delivered a solution for an insurance client, leveraging LLMs to extract insights from claim data at multiple levels of depth, generating CXO-level reports and executive summaries.

Software Engineering:
• Engaged in all SDLC stages, including design, development, testing, and implementation.
• Re-engineered modules to enhance system efficiency and incorporate new features.
• Collaborated with stakeholders to gather requirements and create high-level and detailed design documents.
• Fixed and deployed Python bug fixes for key applications used by customers and internal teams.
• Utilized JIRA for bug tracking and Git for version control and deployment.
• Implemented CI/CD pipelines using Ansible playbooks with Jenkins and SonarQube.
• Developed applications in UNIX environments and utilized relevant commands.
• Created business decision graphs using Python's matplotlib library and maintained technical documentation (see the chart sketch below).
• Worked with feature engineers on defect reproduction, troubleshooting, and root cause analysis.
• Conducted peer reviews of design and code and recommended cost-effective AWS solutions.
• Automated tasks using crontab and participated in Agile and Scrum practices for project management.
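To illustrate the "business decision graphs" bullet, a minimal matplotlib sketch follows; the options and dollar figures are made up for demonstration.

```python
# Minimal sketch: a cost-vs-benefit decision chart with matplotlib.
import matplotlib.pyplot as plt

options = ["Option A", "Option B", "Option C"]
cost = [120, 95, 140]      # hypothetical cost in $k
benefit = [180, 160, 210]  # hypothetical benefit in $k

fig, ax = plt.subplots(figsize=(6, 4))
x = range(len(options))
ax.bar([i - 0.2 for i in x], cost, width=0.4, label="Cost ($k)")
ax.bar([i + 0.2 for i in x], benefit, width=0.4, label="Benefit ($k)")
ax.set_xticks(list(x))
ax.set_xticklabels(options)
ax.set_ylabel("$k")
ax.set_title("Cost vs. benefit by option")
ax.legend()
fig.savefig("decision_chart.png", dpi=150)
```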

System Specialist Jun 2013 - Mar 2017
Datapage Digital Services Pvt Ltd

• Directed end-to-end Wintel administration, including monitoring 43 Windows Domain Controllers using Microsoft SCOM for automated ticketing.
• Led troubleshooting for AD DS, DHCP, DNS, and client-related issues such as printer and connectivity problems.
• Developed, configured, and managed Group Policies, redesigned the Active Directory hierarchy, and created new policies for software deployment and settings.
• Identified risks and developed mitigation plans; implemented ITIL Service Management processes for incident, configuration, and change management.
• Trained and mentored the team, allocated tasks, and reported on performance indicators and value delivery.
• Managed the installation, maintenance, and upgrade of anti-virus software, firewalls, WSUS patch management, and native ADS tools.
• Monitored server performance, applied test patches and hotfixes, and ensured timely updates and virus definition updates.
• Coordinated with Symantec, production, development, and application teams for virus management, patch deployment, and firewall policies.
• Planned and deployed user policies on firewalls, configured leased lines, monitored network traffic, and reviewed network logs for misuse.

Skills

Python: 100%
SQL: 80%
PySpark: 95%
Azure Databricks: 90%
Azure Delta Lake: 80%
Apache Spark: 90%
Apache NiFi
GitHub
Agile Methodology
ETL Pipelines
