
Alen Jeffie Penelope

Lead Data Engineer
Bengaluru
Summary

Lead Data Engineer / Senior Data Architect with over 10 years of experience in the design and development of data pipelines and in processing huge volumes of data using tools and frameworks in the Hadoop and Spark ecosystems. Proficient in a variety of platforms, languages and methods, including Hadoop, Spark Core, Spark SQL, Hive scripts, NoSQL (HBase), HDFS, Airflow, Kafka, S3, EMR, EC2, Lambda, CloudWatch, DynamoDB and Impala, both on-premises and in the cloud (AWS).

Overview
6 years of post-secondary education
10 years of professional experience
Work History

Lead Data Engineer

Retail Kloud9 Technologies Pvt Ltd
2020-07 - Current

Client : Nike, United States of America

Project Name: NGAP Migration

The main objective of this project is to move all Airflow jobs from EDF (a persistent cluster) to NGAP (Next Generation Analytics Platform – MAP cluster).

Roles & Responsibilities:

  • Responsible for migrating data pipelines using Big Data tools including Hive, Airflow, Spark, S3 and EMR.
  • Tuned system resources (executors, cores, memory, etc.) and Spark configuration for better job processing.
  • Optimized the Spark code by replacing Hive writes with S3 writes, removing unwanted count checks and using the Parquet file format with Snappy compression for performance.
  • Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR) and creating and storing data in S3 buckets.
  • Worked with Jenkins and GitHub during automated deployments.
  • Involved in requirements gathering, analysis, coding and code reviews, unit and integration testing.
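The resource-tuning bullet above can be illustrated with a small sketch. This is a hypothetical helper, not project code: it applies a common rule of thumb for sizing Spark executors (about 5 cores per executor, one core and 1 GB per node reserved for OS/Hadoop daemons, roughly 7% of executor memory reserved for overhead); the function name and cluster numbers are illustrative.

```python
# Hypothetical sketch of the common Spark executor-sizing heuristic:
# 5 cores per executor, 1 core and 1 GB per node left for OS/daemons,
# ~7% of executor memory reserved for off-heap overhead.
def size_executors(nodes, cores_per_node, mem_per_node_gb,
                   cores_per_executor=5, overhead_frac=0.07):
    usable_cores = cores_per_node - 1                  # leave 1 core per node
    executors_per_node = usable_cores // cores_per_executor
    total_executors = nodes * executors_per_node - 1   # leave 1 slot for the driver
    mem_per_executor = (mem_per_node_gb - 1) / executors_per_node
    heap_gb = int(mem_per_executor * (1 - overhead_frac))
    return {
        "num_executors": total_executors,
        "executor_cores": cores_per_executor,
        "executor_memory_gb": heap_gb,
    }

# e.g. a hypothetical 10-node cluster with 16 cores and 64 GB per node
conf = size_executors(10, 16, 64)
# conf -> {"num_executors": 29, "executor_cores": 5, "executor_memory_gb": 19}
```

The resulting values map onto `--num-executors`, `--executor-cores` and `--executor-memory` in a `spark-submit` invocation.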

Application Development Team Lead

Accenture Solutions Pvt Ltd
2019-01 - 2020-07

Client : Telia, Sweden

Project Name : Airflow Migration

The main objective of this project is to extract and load the CRC data into CDL and to migrate all Talend jobs to Airflow pipelines.

Roles & Responsibilities:

  • Designed and developed ETL pipelines to automate ingestion of structured and semi-structured data (JSON, XML, CSV, Avro, Parquet, relational DBs, etc.) into Hadoop storage (HDFS, Hive, Impala and HBase) using the Spark data processing engine and its various transformations.
  • Deployed code to different environments using Jenkins as part of the CI/CD process and worked with version control tools such as Bitbucket.
  • Optimized HQL queries running over very large compressed data and significantly reduced their running time.
  • Performance-tuned Spark applications by setting the correct level of parallelism and tuning memory.
  • Handled large datasets using partitions, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
  • Reviewed code and pair-programmed with developers regularly to provide technical guidance.
  • Responsible for designing Spark jobs and scheduling them using Airflow.
  • Followed Agile methodology, created user stories and tracked status in JIRA for project delivery and execution.
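The broadcast technique mentioned above is essentially a map-side hash join: the small dimension table is shipped to every worker so the large fact table can be joined without a shuffle. A pure-Python stand-in (not PySpark; all names and data are illustrative) sketches the idea:

```python
# Hypothetical sketch of the idea behind a Spark broadcast join: materialise
# the small side as an in-memory hash map ("broadcast"), then join the large
# side row by row with no shuffle.
def broadcast_join(fact_rows, dim_rows, key):
    lookup = {row[key]: row for row in dim_rows}   # the "broadcast" variable
    joined = []
    for row in fact_rows:                          # map-side join
        dim = lookup.get(row[key])
        if dim is not None:                        # inner-join semantics
            joined.append({**row, **dim})
    return joined

facts = [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 30},
         {"cust_id": 9, "amount": 10}]
dims = [{"cust_id": 1, "country": "SE"}, {"cust_id": 2, "country": "NO"}]
result = broadcast_join(facts, dims, "cust_id")
# result keeps the two fact rows that match a dimension row
```

In Spark the same effect is typically achieved by broadcasting the smaller DataFrame so the join avoids shuffling the large one.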

Senior Information Data Architect

GE Power
2017-03 - 2019-01

Project Name: Performance Management Backlog

The main objective of this project is to extract and load GE Power equipment data into a data lake.

Roles & Responsibilities:

  • Analyzed the business requirements and modeled the data load for various sources.
  • Highlighted gaps in reporting and data management capabilities and designed mitigation strategies.
  • Presented the requirements and their solutions in Data Governance and Data Architecture meetings.
  • Implemented a data extraction tool that uses metadata to generate Extract, Transform and Load (ETL) loader code.
  • Responsible for fixing failed jobs scheduled on the TAC server.
  • Followed Agile methodology and acted as Scrum Master in the scrum team.
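The metadata-driven extraction approach mentioned above can be sketched minimally: table and column metadata drive the generated extract SQL, so onboarding a new source only means adding a metadata entry. This is a hypothetical illustration; the schema, table and column names are invented, not the project's actual metadata model.

```python
# Hypothetical sketch of metadata-driven ETL code generation: a metadata
# record describes the source table, and the extract SQL is generated from it.
def generate_extract_sql(meta):
    cols = ", ".join(c["name"] for c in meta["columns"])
    sql = f"SELECT {cols} FROM {meta['schema']}.{meta['table']}"
    if meta.get("incremental_column"):
        # incremental loads filter on a watermark bound at run time
        sql += f" WHERE {meta['incremental_column']} > :last_load_ts"
    return sql

meta = {
    "schema": "power",                       # illustrative names throughout
    "table": "equipment_readings",
    "columns": [{"name": "equipment_id"}, {"name": "reading_ts"},
                {"name": "output_mw"}],
    "incremental_column": "reading_ts",
}
sql = generate_extract_sql(meta)
```

Adding a new source then requires only a new metadata record, not new loader code.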

IT Analyst

Tata Consultancy Services
2016-02 - 2017-03

Client : PNC, United States of America

Project Name: EPK Metric Dashboard

The main objective of this project is to extract the mnemonics data from the Zenoss database into the EPK database for all the metrics.

Roles & Responsibilities:

  • Responsible for interacting with the client and gathering requirements.
  • Responsible for preparing required documents such as the STM, Detailed Design Specification, Architecture and Runbook.
  • Responsible for developing workflows as per GFB standards.
  • Worked with source/target files such as mainframe files (imported using datamaps), flat files, and Oracle and Teradata tables.
  • Responsible for unit testing and preparing the Unit Test Case document.
  • Provided support during UAT and PROD implementation.
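The source-to-target metric extract described above follows a standard read-transform-load shape. A minimal sketch, using `sqlite3` purely as a stand-in for the real Zenoss and EPK databases (table and column names are illustrative assumptions, not the actual schemas):

```python
import sqlite3

# Hypothetical sketch: read metric rows from a source database and load them
# into a target table. sqlite3 in-memory databases stand in for Zenoss (source)
# and EPK (target); all names are illustrative.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE metrics (mnemonic TEXT, value REAL)")
src.executemany("INSERT INTO metrics VALUES (?, ?)",
                [("cpu_util", 72.5), ("mem_used", 61.0)])

tgt = sqlite3.connect(":memory:")
tgt.execute("CREATE TABLE epk_metrics (mnemonic TEXT, value REAL)")

# extract from source, load into target
rows = src.execute("SELECT mnemonic, value FROM metrics").fetchall()
tgt.executemany("INSERT INTO epk_metrics VALUES (?, ?)", rows)
tgt.commit()

loaded = tgt.execute("SELECT COUNT(*) FROM epk_metrics").fetchone()[0]
```

In the real project the two connections would point at the Zenoss and EPK databases and the copy would typically run on a schedule per metric.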

Senior Software Engineer

HCL Technologies Pvt Ltd
2011-05 - 2016-02

Client : Commonwealth Bank of Australia, Australia

Project Name : Global Asset Management

The main objective of this project is to provide small work request (SWR) development for the applications in the GAM segment. The applications involved are SSRS, FTS Interface, Charles River and GAM Scheduler.

Roles & Responsibilities:

  • Responsible for interacting with business users for requirement gathering and preparing the BRD document.
  • Responsible for analyzing Functional Specifications and preparing Technical Design documents.
  • Created various transformations such as Joiner, Aggregator, Expression, Filter, Update Strategy and Lookup.
  • Involved in enhancement and maintenance activities of the data warehouse.
  • Responsible for unit testing and integration testing.
  • Prepared supporting documents such as the UTC, Production Implementation and Tech Handover documents.
  • Worked with IT Change Management on production rollouts and implementation issues.
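The Informatica-style transformations named above map roughly onto plain operations on rows. A hypothetical plain-Python sketch (column names and data are illustrative, not project artifacts) of a Lookup, a Filter and an Aggregator chained together:

```python
from collections import defaultdict

# Hypothetical sketch of three Informatica-style transformations in plain
# Python: Lookup (reference dict), Filter (drop rows), Aggregator (sum per key).
lookup = {"AU": "Australia", "NZ": "New Zealand"}   # Lookup transformation

rows = [
    {"country": "AU", "amount": 100},
    {"country": "AU", "amount": 250},
    {"country": "NZ", "amount": 80},
    {"country": "AU", "amount": -40},
]

# Filter transformation: drop rows with negative amounts
filtered = [r for r in rows if r["amount"] >= 0]

# Aggregator transformation: sum amount per country, enriched via the Lookup
totals = defaultdict(int)
for r in filtered:
    totals[lookup[r["country"]]] += r["amount"]
# totals -> {"Australia": 350, "New Zealand": 80}
```

In Informatica the same pipeline would be wired visually; the row-level semantics are the same.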
Education

Master of Engineering, Software Engineering

SRM Easwari Engineering College, Anna University
2009-01 - 2011-01

Bachelor of Engineering, Computer Science

R.V.S College of Engineering and Technology, affiliated to Anna University
2005-01 - 2009-01
Skills

Big Data Processing Engines: Hadoop (Hive, HDFS, HBase, Impala, Sqoop, YARN, Avro); Spark (Spark Core, Spark SQL, Streaming); Streaming: Kafka

Hadoop Distribution: Cloudera

Cloud: AWS (S3, CloudWatch, Lambda, IAM, Athena, DynamoDB, Step Functions, EMR, EC2)

Languages: Python

NoSQL: HBase, DynamoDB

Workflow Orchestration: Airflow, TAC, Control-M

Databases: Oracle, MSSQL, Greenplum, PostgreSQL

Version Control: SVN, GitHub, Bitbucket

Agile: JIRA

ETL Tools: Informatica PowerCenter, Talend Big Data
