Hellmar Becker, ING
Continuous Lifecycle London
Automate Hadoop Cluster Deployment in a Banking Ecosystem
Lessons from Practice
May 4, 2016
Who am I?
• The Goal
• Prelude: Hadoop Patterns in ING
• Chapter 1: First Steps
• Chapter 2: Standardizing
• Chapter 3: The Cloud
• Conclusion
• Questions
The Goal
IN WHICH we look at the challenges that a bank has to face in the 21st century, and how this translates into decisions made in the IT landscape.
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
The world of ING – Data Driven Since 1881
Customers: 33 million private, corporate and institutional customers
Countries: 41, in Europe, Asia, Australia, North and South America
Employees: 52,000
We accelerate through the Concept of One
Provide standardized and easy to use global capabilities and services
Accelerate strategy and concentrate on business value
Concept of One
Prelude: Hadoop Patterns in ING
IN WHICH we describe the journey of some interesting characters that set out to get Hadoop adopted within a large, venerable institution, and across the world.
Data Lake and Advanced Analytics within ING
External and internal reporting for own or regulatory purposes
Integrate all data sources within the bank into one processing platform
• Batch data streams
• Live transactions
• Model building for customer interaction
Better understand customer needs in an increasingly digital world
Data can help us offer tailored products and services
Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models
Open source software where possible – Hadoop as a core component
1. File Storage
2. Deep Data
3. Analytical Hadoop
4. Real Time
Hadoop Usage Patterns
Analytical Hadoop
• Our first use case
• Development and Production environments
• P environment has Production level security but Test level SLA
FileStore and Deep Data
• Completely automated
• Full DTAP street (Development, Test, Acceptance, Production)
Patterns and their maintenance
• Vendors give us tools to do a GUI based install
• Maintain several clusters in parallel, DTAP!
• Auditability!
• Not for us, we need to do automated installs
• APIs and scripting facilities do exist, but are often poorly tested and documented
Standard installation doesn’t cut it
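The requirements on this slide (several clusters in parallel, auditability, no GUI) point towards keeping each environment's cluster definition as plain text under version control. The following is a minimal sketch of that idea, not the setup used at ING: the settings, node counts and function names are all hypothetical. It merges one shared base definition with per-environment overrides and serializes the result deterministically, so that a change shows up as a small, reviewable diff.

```python
import json

# Hypothetical base definition shared by all DTAP environments.
BASE = {
    "cluster_name": "datalake",
    "worker_nodes": 3,
    "security": "kerberos",
}

# Hypothetical per-environment overrides; only the differences are listed.
OVERRIDES = {
    "D": {"worker_nodes": 2},
    "T": {},
    "A": {"worker_nodes": 6},
    "P": {"worker_nodes": 12},
}


def render_environment(env):
    """Merge the base definition with one environment's overrides."""
    if env not in OVERRIDES:
        raise ValueError(f"unknown environment: {env}")
    config = dict(BASE)
    config.update(OVERRIDES[env])
    config["environment"] = env
    return config


def to_auditable_text(config):
    """Serialize with sorted keys so that diffs between versions stay stable."""
    return json.dumps(config, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    for env in "DTAP":
        print(to_auditable_text(render_environment(env)))
```

Because the four environments differ only in their listed overrides, an auditor can read the differences between Acceptance and Production directly from the text, which a GUI based install does not offer.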
• Layers – IaaS, PaaS, Application (we want IaaS not PaaS)
• Organizational divide: Platform team vs. Infra team
• Different privileges
• Different tool choices
• Trust and collaboration need to be actively built
• Convince security audit teams!
Organizational challenge
Chapter 1: First Steps
IN WHICH a first expedition ventures into uncharted territory, encounters strange monsters and reconsiders their equipment.
• First take by Exploration teams (Analytical Pattern)
• Unusual Ops mode: No Production system (although we use production data)
• Install everything with Ansible
• YAML based, ssh based access
• All text files. Easy to put in git and to document
• The Power of Root
• Great power and flexibility
• Risk people and GUI users do not like it
• You are on your own
• We tried to learn from this!
Tooling part 1
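The "all text files" point can be illustrated with the host list that Ansible reads. Below is a minimal sketch, assuming the common INI-style inventory layout (a `[group]` header followed by one host per line). The group names and hostnames are hypothetical and the function is not part of Ansible itself.

```python
def render_inventory(groups):
    """Render a mapping of group name -> host list as an INI-style inventory."""
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        lines.extend(sorted(groups[group]))
        lines.append("")  # blank line after each group
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical hosts of a small analytical cluster.
    cluster = {
        "masters": ["hadoop-master-01.example.com"],
        "workers": [
            "hadoop-worker-02.example.com",
            "hadoop-worker-01.example.com",
        ],
    }
    print(render_inventory(cluster))
```

The output is sorted, so regenerating the file for an unchanged cluster produces no diff in git. A file like this would typically be handed to `ansible-playbook` through its `-i` option.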
Chapter 2: Standardizing
IN WHICH a larger party sets out with better equipment, reaches the shores of a new world but finds that still, much is to be improved.
• Now we needed a Datalake integrated solution with full support
• Also need a full DTAP street
Infra team has legacy tooling (proprietary tools) but limited flexibility.
• Basically, we roll our specific configuration into homemade rpm packages.
Tool choice for application deployment: CA Lisa aka Nolio
• GUI based
• No version control (tagging added as an afterthought)
• Slow and awkward to use
• Dumbed down by organizational restrictions
Conclusion: Don’t go there!
Implementing the FileStore and DeepData patterns
• By then, we had a lot of structure to help us
• Standardized build server with GitBlit, Artifactory, Jenkins
• Agile Way of Working
• Now deployment is a split approach
• Infra parts use TEM (and Ambari blueprint) to deploy full Hadoop stack
• On top of the stack we deploy our own applications with Nolio
• Handovers CIO-Infra still hurt us
• We do have: Deployment on a given system at the press of a button
• We do have: Automatic propagation of Git changes into Artifactory via Jenkins
• We do not (yet) have: Automated propagation D->T->A->P via Jenkins
Implementing the FileStore and DeepData patterns
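The missing piece named above, automated propagation D->T->A->P, comes down to an ordering rule: an artifact may only move to a stage once it has passed every earlier stage, in order. The sketch below shows that rule in isolation. It is an assumption about how such a gate could look, not the pipeline used at ING, and it does not call Jenkins. All names are hypothetical.

```python
# The DTAP street in promotion order.
STAGES = ["D", "T", "A", "P"]


def next_stage(current):
    """Return the stage that follows `current`, or None after Production."""
    index = STAGES.index(current)  # raises ValueError for unknown stages
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


def may_promote(history, target):
    """Allow promotion to `target` only if `history` lists all earlier stages in order."""
    required = STAGES[:STAGES.index(target)]
    return history == required


if __name__ == "__main__":
    history = ["D", "T"]  # stages a hypothetical artifact has already passed
    print(next_stage(history[-1]))
    print(may_promote(history, "A"))
    print(may_promote(history, "P"))
```

A build job could call `may_promote` before each deployment step and refuse to continue when a stage was skipped.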
Chapter 3: The Cloud
IN WHICH our heroes learn from the cloud experience and from explorers around the world, and make deployment a safe experience for everyone.
• ING Private Cloud is essentially Datacenter v2.0
• However, we get the chance to rethink our tooling
• Puppet integrates nicely with RH Satellite and is used to provision PaaS solutions
• Ansible is gaining ground in the internal discussion
• External Ansible community: the Meetup has grown a lot over the last year and is now larger than Puppet and Chef combined
• ING has an initiative to come up with a standardized way to deploy packaged software, based on Ansible
The Cloud
Conclusion
• Be aware: Deployment of mostly prepackaged software is different from developing your own software
• Full automation might not be needed because we do not change as quickly as, e.g., a mobile app
• Use tools that are scriptable
• GUIs suck
• Own your stack
Questions
• Crane Gears by Kevin Utting is licensed under CC BY 2.0
• Hellmar in Nîmes / With Python in Mindanao, by the author
• Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
Attributions