0 2025-Intro
1. Introduction
I'm Jacob. I graduated from Stevens Institute of Technology in Hoboken, New Jersey, with a master's degree in Computer Science. My concentration was data science and data systems.
After I graduated, I took a contract job at a startup named Wrevel in New York City, working as a backend developer building the company website with PHP and a MySQL database.
After one year, I finished my project and decided to come back to Shanghai to work as a DevOps engineer at Blackboard. Blackboard provides an online teaching and learning SaaS platform for universities and companies.
My projects mainly included migrating our Jenkins CI/CD servers and jobs from an on-premise server lab to the AWS cloud, building the AWS infrastructure and Jenkins servers with AWS CloudFormation and Ansible playbooks as IaC tools. I moved the Jenkins master from bare servers to Docker, designed and built new pipeline jobs to replace the old multi-branch and freestyle jobs, and converted all of the old, manually created Jenkins jobs to Groovy pipeline code. I also created and moved part of the Jenkins servers and jobs from EC2 and Docker to a Kubernetes cluster, aiming to spin Jenkins agents up and down dynamically based on build demand. Besides that, my daily operations work covered scheduled release jobs that update client sites with Chef cookbooks and Jenkins pipeline jobs.
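Since the Jenkins migration comes up a lot, here is a minimal sketch of how the old jobs could be inventoried before converting them to pipeline code. It assumes the python-jenkins client and placeholder credentials; it is an illustration, not the actual migration script.

```python
# Rough sketch: list every job on the controller and flag which ones are still
# freestyle (hand-created) versus already pipeline-as-code.
import jenkins

# Hypothetical Jenkins URL and credentials.
server = jenkins.Jenkins("https://jenkins.example.com", username="admin", password="api-token")

for job in server.get_jobs():
    config_xml = server.get_job_config(job["name"])
    # Freestyle jobs use a <project> root element; pipeline jobs use <flow-definition>.
    kind = "pipeline" if "flow-definition" in config_xml else "freestyle"
    print(f"{job['name']}: {kind}")
```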
After that I landed a new opportunity in the SAP Jam and Work Zone team, working as a Senior DevOps engineer.
Jam is roughly an enterprise content management and social collaboration system. Compared to the job before, I had more opportunities to manage and operate on-premises data centers; we have more than 30 DCs globally.
Our application is built with Ruby on Rails and runs in Docker, and besides writing code we operate our DCs with Chef Solo. My most important project was building out our Jam application on Kubernetes, on AWS and Azure, from scratch, and converting part of the monolithic services into microservices.
The whole procedure went roughly like this: containerize our infrastructure components such as Elasticsearch, RabbitMQ, and HAProxy; set up Kubernetes clusters on Azure via AKS and on AWS via EKS with Terraform code; and build out cloud infrastructure such as CDN, S3 (storage accounts on Azure), RDS MySQL, Aurora or Azure databases, and VNets, also with Terraform code.
Use Helm charts and Kustomize, following a GitOps approach with ArgoCD as the CD tool, to deploy all Jam components into the Kubernetes cluster. Use SealedSecrets with ArgoCD so that secrets are encrypted before being pushed into the Git repo, and only the encrypted secrets are stored there.
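To make the GitOps idea concrete, here is a small sketch of the kind of ArgoCD Application manifest that points the cluster at a Helm chart in Git. The repo URL, chart path, and namespace are placeholders, rendered here with Python/PyYAML just for illustration.

```python
# Sketch: render an ArgoCD Application manifest for one component.
# Everything named "example" or "jam-web" is a hypothetical placeholder.
import yaml

application = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "jam-web", "namespace": "argocd"},
    "spec": {
        "project": "default",
        "source": {
            "repoURL": "https://git.example.com/jam/deploy.git",  # Git repo ArgoCD watches
            "targetRevision": "main",
            "path": "charts/jam-web",                             # Helm chart path in the repo
        },
        "destination": {"server": "https://kubernetes.default.svc", "namespace": "jam"},
        # Automated sync keeps the cluster converged on what is in Git.
        "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
    },
}

print(yaml.safe_dump(application, sort_keys=False))
```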
Monitor the cluster with the Prometheus Operator, writing our own ServiceMonitors, metrics exporters, and Alertmanager rules, which help us collect data from a variety of sources and fire alerts.
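As a flavor of what "writing our own metrics/exporter" means, here is a tiny sketch of a custom exporter that a ServiceMonitor could scrape. The metric name, port, and the random value standing in for a real queue lookup are all made up.

```python
# Minimal custom Prometheus exporter sketch using the prometheus_client library.
from prometheus_client import start_http_server, Gauge
import random
import time

# Hypothetical metric: how many messages are waiting in a work queue.
QUEUE_DEPTH = Gauge("jam_queue_depth", "Number of messages waiting in the work queue")

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics on :8000 for the ServiceMonitor to scrape
    while True:
        QUEUE_DEPTH.set(random.randint(0, 50))  # stand-in for a real RabbitMQ/queue query
        time.sleep(15)
```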
Collect application logs, operational logs, and audit logs with EFK. I also enabled a service mesh inside the cluster: installed Istio, enabled Envoy sidecar injection in our main namespace, tracked network tracing with Kiali and Jaeger, and enabled mutual X.509 TLS for internal and external service calls.
After that, I started adopting security-as-a-service tools like hadolint and applied them to the pipeline, mainly to scan our Dockerfiles in the CI pipeline.
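A simplified sketch of that CI step, under the assumption that the hadolint binary is available on the runner and supports JSON output; the file paths and field names are illustrative.

```python
# Sketch: run hadolint over every Dockerfile in the repo and fail the CI job on findings.
import json
import subprocess
import sys
from pathlib import Path

failed = False
for dockerfile in Path(".").rglob("Dockerfile*"):
    result = subprocess.run(
        ["hadolint", "--format", "json", str(dockerfile)],
        capture_output=True,
        text=True,
    )
    for finding in json.loads(result.stdout or "[]"):
        print(f"{dockerfile}:{finding['line']} {finding['code']} {finding['message']}")
        failed = True

sys.exit(1 if failed else 0)  # non-zero exit fails the pipeline stage
```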
Later I worked on a new product called DWZ (Digital Work Zone), and also moved part of our self-maintained service components, such as the object store, RabbitMQ, and the log service, onto the SAP BTP platform.
At UBS I have worked in two branch companies. I worked in the AM (Asset Management) team first, mainly as a cloud architect, designing and setting up the hybrid cloud infrastructure for the WIND market data service, which onboards the service onto Azure public cloud as the upstream data source and sends the data to the on-premise downstream AM trading system.
After that, I was internally transferred to the Global Investment Bank research team as a Senior DevOps engineer.
I worked with the global team to migrate the global research analyst platform to the China region. That mainly meant setting up the on-premise data center for the different environments (servers, load balancers, NAS storage) and deploying middleware and services such as Postgres, Elasticsearch, Kafka, Nginx, and Tomcat, with GitLab CI as the CI pipeline and newly designed Puppet modules as the CD pipeline, to automate all buildout steps and avoid any manual work.
Apart from that project, I now work in the global team to manage and operate the global systems. For example, I helped set up the OpenAI service in Azure cloud, with Terraform code and an Azure DevOps pipeline to automate the setup, and helped set up the Azure translation service in an HCI AKS cluster for a Chinese-English translation service specifically for China analysts.
Recently I have been working on a new project with Vault, Azure AD, and the PWM API to automate the password rotation process, replacing the manual yearly rotation of account passwords.
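For a sense of the flow, here is a simplified sketch of one rotation step: generate a new password, store it in Vault's KV v2 engine via hvac, then apply it through a hypothetical PWM endpoint. The Vault path, PWM URL, and payload are placeholders, not the real implementation.

```python
# Sketch of a single account-rotation step (all endpoints and paths are made up).
import secrets
import string

import hvac
import requests

def rotate(account: str) -> None:
    new_password = "".join(secrets.choice(string.ascii_letters + string.digits) for _ in range(24))

    # Store the new password in Vault KV v2 (placeholder address and token).
    vault = hvac.Client(url="https://vault.example.com", token="s.placeholder")
    vault.secrets.kv.v2.create_or_update_secret(
        path=f"service-accounts/{account}",
        secret={"password": new_password},
    )

    # Hypothetical PWM API call to push the new password to the directory account.
    requests.post(
        "https://pwm.example.com/api/v1/setpassword",
        json={"username": account, "password": new_password},
        timeout=30,
    ).raise_for_status()

rotate("svc-research-batch")  # hypothetical service account
```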
Besides these, I passed the AWS Solutions Architect exam, the Azure Solutions Architect Expert exam (AZ-305), the Certified Istio Service Mesh exam, and the GitLab CI/CD exam.
I also created multiple tutorial books for my colleagues this year, mainly about Azure, AWS, Elasticsearch, Redis, Chef & Ansible, and Istio. So that's all about me over these past couple of years.
Quick version
I'm Jacob. I graduated from Stevens Institute of Technology in Hoboken, New Jersey, US, with a master's degree in Computer Science. My concentration was data science and data systems.
After I graduated, I took a job at a startup named Wrevel in New York City, working as a backend developer building the company website with PHP and a MySQL database.
After one year, I finished my project and decided to come back to Shanghai to work as a DevOps engineer at Blackboard. Blackboard provides an online teaching and learning SaaS platform for universities and companies.
My projects mainly included migrating our Jenkins CI/CD servers and jobs from an on-premise server lab to the AWS cloud, including building the AWS infrastructure and Jenkins servers with AWS CloudFormation and Ansible playbooks. I moved the Jenkins master from bare servers to Docker, designed and built new pipeline jobs to replace the old multi-branch and freestyle jobs, and converted all of the old, manually created Jenkins jobs to Groovy pipeline code.
After that I landed a new opportunity in the SAP Jam and Work Zone team, working as a Senior DevOps engineer.
Our application is built with Ruby on Rails and runs in Docker, and besides writing code we operate our DCs with Chef Solo. My most important project was building out our Jam application on Kubernetes, on AWS and Azure, from scratch, and converting part of the monolithic services into microservices.
The whole procedure went roughly like this: containerize our infrastructure components such as Elasticsearch, RabbitMQ, and HAProxy; set up Kubernetes clusters on Azure via AKS and on AWS via EKS with Terraform code; and build out cloud infrastructure such as CDN, S3 (storage accounts), and RDS MySQL, also with Terraform code.
Use Helm charts and Kustomize, following a GitOps approach with ArgoCD as the CD tool, to deploy all components into the Kubernetes cluster. Use SealedSecrets with ArgoCD to encrypt secrets and deploy the encrypted secrets.
Monitor the cluster with the Prometheus Operator, writing our own ServiceMonitors, metrics exporters, and Alertmanager rules, which help us collect data from a variety of sources and fire alerts.
Collect application logs with EFK. I also enabled a service mesh inside the cluster: installed Istio, enabled Envoy sidecar injection in our main namespace, tracked network tracing, and enabled mutual X.509 TLS for internal and external service calls.
At UBS Shanghai I worked in the AM (Asset Management) team first, mainly as a cloud architect, working with the Wind infra team and the Microsoft team to design and set up the hybrid cloud infrastructure for the WIND market data service, which onboards the service onto Azure public cloud as the upstream data source and sends the data to the on-premise downstream AM trading system.
After that, I was internally transferred to the Global IB research team as a Senior DevOps engineer.
I worked with the global team to migrate the global research analyst platform to the China region, mainly setting up the on-premise data center for the different environments, with newly designed GitLab CI as the CI pipeline and new Puppet modules as the CD pipeline, to automate all buildout steps.
Apart from that project, I now work in the global team to manage and operate the global systems. For example, I helped set up the OpenAI service in Azure cloud, with Terraform code and an Azure DevOps pipeline to automate the setup process.
Recently I completed a PWM project with Vault, Azure AD, and the PWM API, automating the password rotation process to replace the manual yearly rotation of account passwords.
Right now I'm working with the Microsoft Azure team to build out the Azure Cognitive Services translation service in an HCI AKS cluster, providing a Chinese-English translation service for China analysts.
Besides these, I passed the AWS Solutions Architect exam, the Azure Solutions Architect Expert exam (AZ-305), the Certified Istio Service Mesh exam, and the GitLab CI/CD exam.
I also created multiple tutorial books for my colleagues this year, mainly about Azure, AWS, Elasticsearch, Redis, Chef & Ansible, and Istio. So that's all about me over these past couple of years.
Why I'm leaving my current team
Pursuing a new career path
I have worked as a DevOps engineer for over 8 years, and I have always wanted to take my career to the next level. I'm pursuing SRE because I'm passionate about reliability engineering. I deeply value my DevOps experience, which focuses especially on automation, CI/CD, cloud migration, and collaboration on new services, but SRE balances speed with reliability targets (SLIs/SLOs). I'd like to work on designing resilient systems, managing error budgets, improving service performance and site reliability, and treating operations as a software problem.
This role can help me specialize in maximizing uptime and user experience through data-driven engineering, which aligns perfectly with my skills in observability and incident management.
Still, I can say our research application is a heavy service with an old setup, and the core service depends on a vendor service, so it still sits in on-premise data centers. Most of the tech stack is a traditional setup rather than containers. For example, over the next couple of years, our most critical project is migrating services from Red Hat 7 to Red Hat 8 for all regions. So I hope I can find opportunities to work on projects with modern tech stacks like containerization, cloud, and service mesh.
I work in a global team that is actually based in London, and I'm the only one here in Shanghai. Although I enjoy my work with my current team, I have thrived in more collaborative environments where there are more chances to learn from each other.
So I have to join their meetings, discussions, and troubleshooting sessions late at night every day.
- After spending 3 years in the financial industry, I have discovered a real passion for financial tech. I truly believe there is much more to explore in this industry than daily BAU operational work, and I want to double down on this decision and pursue it further. So I want more opportunities to get involved with new projects.
- I have always wanted to take my career to the next level. I think this opportunity at your firm would be a step up and a better match for what I can bring to the table. I'm looking for a solution architect role.
- At a financial company, I don't see my future, because, compared with new technologies, code, or new solutions, they are more concerned with procedures. Every day I spend more time writing documents, walking through steps, and meeting with the compliance team than working on my code and my operations jobs. I don't even have a chance to learn new technologies there, so I feel I would be wasted on this team if I stayed longer.
- I really enjoyed our initial conversation and learning about the company. Although I wasn't actively looking for a change at the time, it really felt like this new role entails exciting challenges that I would like to be a part of. I'm truly intrigued by this career opportunity at your firm and believe that my skill set largely aligns with what you are looking for.
Common Questions
What do you think of overtime work?
I recognize that overtime can be necessary in certain situations, like urgent deadlines, and I'm committed to supporting the team when it truly adds value.
That said, I believe consistent overtime often indicates inefficiencies. My focus is on maximizing productivity during core hours through prioritization, streamlined workflows, and clear planning.
This way, I deliver high-quality results efficiently while maintaining sustainable work rhythms.
What's your main consideration in your current job hunt?
- Growth opportunities: I want work that challenges me just beyond my current skills, pushing me to learn (like taking on cross-functional projects or mastering new tools).
- Alignment with the company's mission: I need to see how my role directly contributes to the bigger picture, so my work feels meaningful, not just task-oriented.
- Team culture: a collaborative, supportive environment where feedback flows freely and mistakes are seen as learning chances. These factors help me grow sustainably and stay motivated long-term.
What's your biggest weakness?
My biggest weakness is the fact that I don't yet have much experience in talking to large groups of people.
This is something I'm very keen to improve on, so if any opportunities were to arise in this position to gain experience in this area, I would certainly want to take advantage of that.
Do you have any questions for us?
- What's the first thing you will need me to concentrate on in this role if I am successful?
- What opportunities are there for professional development and growth in this position?
- Can you please tell me more about the team I would be a part of in this role?
- What's the culture like within this organization?
- What are the biggest challenges your company is facing right now, and what could I do to help you overcome them?
- What would my success in the position look like 12 months from now?
- What do the top-performing employees in this company do to achieve success?
- What are the next steps of the selection process, and when can I expect to hear from you?
Where do you see yourself in five years?
- In five years from now, I will hopefully still be working for you, either in this role or perhaps having advanced to a higher level.
- In five years, I will have developed into a trusted, loyal, and committed member of the company who can be relied upon to do a great job for you.
- I feel that in five years' time, I will have sufficient experience and expertise to help train newer members of the organization as and when they join.
Project blockers and how I resolved them
This actually happened at SAP. I introduced and set up the open-source monitoring tools Prometheus and Grafana as our Kubernetes cluster monitoring and alerting solution. Before that, we were using the traditional monitoring tool Zabbix, which was set up by the Ops team and only provided basic information such as server CPU, memory, network throughput, and disk usage.
We hit a couple of blockers in this project.
First, no one on our team had experience setting up the whole monitoring, dashboard, and alerting system from scratch.
Second, most of them did not have faith in an open-source solution and preferred enterprise solutions like Zabbix or Datadog.
However, on cloud-native infrastructure the Prometheus solution has clear advantages: it can monitor not only the basic infrastructure but can also be easily customized to monitor different services, like cache, queue, and search services. Compared to enterprise solutions, it's more flexible, we have full management privileges, and it's currently the most prevalent monitoring solution.
After I gave my team a detailed introduction to the solution, I started a proof of concept (POC), choosing one of our internal test Kubernetes clusters as the demo environment to test its viability.
During the POC I also hit some blockers: I'm not an expert in every service, so I needed to consult the developers who wrote each service and work with them to customize its metrics.
After a couple of months, the whole system was set up. It provided much more stability for our internal tools: we could easily troubleshoot and receive alarms from the alerting system. Then we adopted the solution and pushed it to our main production service cluster. Everyone on our team started to learn it and got their own task of writing metrics for the different microservices.
After the setup, I also brought in the Loki, Grafana, and Promtail stack as a log aggregator that works easily alongside the Prometheus solution. It was like killing two birds with one stone.
2. Self Description
- I consider myself a good learner; I'm willing to learn new things, and I learn efficiently and effectively. You can check my GitHub: I push tech articles to it almost every day. Every time I face a project, there are technologies I've never touched before. I don't just jump into the project; I'm willing to spend two or three days exploring the tech first. That way I can always deliver high-quality projects.
- I consider myself more extroverted than introverted. I prefer to work in a team rather than alone, and I like to share my opinions and listen to others' advice on the job.
- I'm pretty good at time management; I've never missed a deadline. I don't like procrastination: if a job or ticket must be done in a sprint, I do it right away rather than holding it back until the due date. So I can always deliver assigned issues on time, without holding anything back.
Describe yourself in 3 words.
Resilient, knowledgeable, and adaptable.
- I'm resilient because I never get stressed in difficult situations, and I can easily prioritize tasks regardless of how many I have.
- I am knowledgeable in this industry because I have applicable academic qualifications, and I also have experience in similar job-related roles.
- Finally, I am adaptable because I am the type of person who will always take on duties outside of my job description, and I will also help out your business at short notice as and when needed. So, for example, if you need me to work extra hours or take on extra duties, I will always do that, and I will be prepared to cover the work of co-workers who are off sick.
3. What is the definition of DevOps?
DevOps is the gray area between the development (Dev) and operations (Ops) teams in the product development process. DevOps is a culture that emphasizes communication, integration, and collaboration in the product development cycle.
As such, it removes the barriers between the software development and operations teams, enabling them to integrate and deploy products quickly and continuously.
- Accept failure as normal; try to spot it and resolve it in time.
- Reduce the cost of failure by implementing gradual changes, like incremental migrations and system updates.
- Build automation tools to reduce the uncertainty of manual work, like building CI/CD pipelines.
- Measure everything: build monitoring tools, log aggregators, and alerting systems to help supervise the whole system.
- Reduce cost.
- Reduce the average time required to recover from failures.
- Increase the frequency of deployments.
- Reduce the deployment failure rate.
4. Well-Architected Framework
- Security
- Reliability
- Performance Efficiency
- Cost Optimization
- Operational Excellence
1. Security
Data protection:
- Encrypt and protect your data at rest with encryption algorithms and in transit with SSL/TLS.
- Versioning: for example, most of our data was stored in S3 buckets on AWS. With versioning, one object can have multiple versions, which protects against accidental overwrites and deletes.
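As a quick illustration of the versioning point, here is a small sketch of enabling it with boto3; the bucket name is a placeholder.

```python
# Sketch: turn on S3 versioning for a bucket (hypothetical bucket name).
import boto3

s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="jam-prod-assets",
    VersioningConfiguration={"Status": "Enabled"},
)
# With versioning enabled, overwriting or deleting an object keeps previous versions recoverable.
```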
Privilege Management
- Password management (such as password rotation policies).
- Enable Multi-Factor Authentication (MFA) on your account to log into the cloud service; we use Duo on our phones, which sends a notification every time we want to log into the account.
- Role-based access control: we give every server (such as an EC2 instance) a different role, which restricts it to only the resources it needs to access.
- Access Control Lists (ACLs): different operators only get temporary access to certain servers, and only when authorized by managers.
Infrastructure Protection
We place our servers in separate VPCs, AZs, security groups, and public/private subnets, and they can only be accessed through a jump host / bastion host.
Detective Controls
We use CloudTrail to audit and monitor all accounts; you can discover and troubleshoot security and operational issues by capturing a comprehensive history of the changes that occurred in your AWS account within a specified period of time.
We also use CloudWatch with SNS notifications to monitor AWS resources, and Prometheus and EFK on our clusters.
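As a hedged sketch of the detective-controls idea, here is how recent change history could be pulled from CloudTrail with boto3; the event-name filter is just an example.

```python
# Sketch: look up the last day of ConsoleLogin events recorded by CloudTrail.
from datetime import datetime, timedelta

import boto3

cloudtrail = boto3.client("cloudtrail")
response = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "ConsoleLogin"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    MaxResults=50,
)
for event in response["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Username"))
```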
2. Reliability (everything turned into code)
- Automatically recover from failure
- Scale horizontally
Automatically recover from failure
- For AWS servers like EC2, we use CloudFormation to put all server settings into code.
- For AWS RDS, we use automatic backups and Multi-AZ, which synchronously replicates the database to a secondary RDS instance to protect against RDS failure.
- For other servers, like Kubernetes clusters or other virtual machines, we use Terraform to put all server settings into code.
Scale horizontally
- On AWS we use Auto Scaling groups for our EC2 servers (see the sketch after this list).
- HPA and Deployments for our containers.
- Back up all logs into an S3 bucket.
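A rough sketch of the EC2 side of horizontal scaling: attaching a target-tracking policy to an Auto Scaling group so capacity follows CPU load. The group and policy names are placeholders.

```python
# Sketch: target-tracking scaling policy for a hypothetical Auto Scaling group.
import boto3

autoscaling = boto3.client("autoscaling")
autoscaling.put_scaling_policy(
    AutoScalingGroupName="jenkins-agents-asg",  # placeholder ASG name
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
        "TargetValue": 60.0,  # add/remove instances to keep average CPU near 60%
    },
)
```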
3. Performance Efficiency
- Compute => autoscaling.
- Storage => EBS volume types (SSD vs. HDD); compare their throughput and IOPS.
- Database => read replicas; DynamoDB for web sessions and metadata; Aurora with multiple writers and readers.
- Time-space trade-off => ElastiCache (Redis) / Azure Cache for Redis, and Direct Connect.
4. Cost Optimization
- Monitor usage and spending every month.
- Autoscaling.
- Decommission resources that you no longer need; stop resources that are temporarily not needed.
- Consolidate low-utilization resources; for example, for some RDS databases we use a shared RDS instance instead.