Automate Hadoop Cluster Deployment in a Banking Ecosystem

Hellmar Becker, ING. Continuous Lifecycle London. Automate Hadoop Cluster Deployment in a Banking Ecosystem: Lessons from Practice. May 4, 2016.

Upload: hellmar-becker

Post on 09-Jan-2017


Page 1: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Hellmar Becker, ING

Continuous Lifecycle London

Automate Hadoop Cluster Deployment in a Banking Ecosystem

Lessons from Practice

May 4, 2016

Page 2: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Who am I?


Page 3: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Automate Hadoop Cluster Deployment in a Banking Ecosystem

• The Goal
• Prelude: Hadoop Patterns in ING
• Chapter 1: First Steps
• Chapter 2: Standardizing
• Chapter 3: The Cloud
• Conclusion
• Questions

Page 4: Automate Hadoop Cluster Deployment in a Banking Ecosystem

The Goal

IN WHICH we look at the challenges that a bank has to face in the 21st century, and how this translates into decisions made in the IT landscape.

Page 5: Automate Hadoop Cluster Deployment in a Banking Ecosystem

The world of ING – Data Driven Since 1881

• Market leaders Benelux
• Growth markets
• Commercial Banking
• Challengers

• Customers: 33 million private, corporate and institutional customers
• Countries: 41, in Europe, Asia, Australia, North and South America
• Employees: 52,000

Page 6: Automate Hadoop Cluster Deployment in a Banking Ecosystem


Page 7: Automate Hadoop Cluster Deployment in a Banking Ecosystem

We accelerate through the Concept of One


Provide standardized and easy to use global capabilities and services

Accelerate strategy and concentrate on business value

Concept of One

Page 8: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Prelude: Hadoop Patterns in ING


IN WHICH we describe the journey of some interesting characters that set out to get Hadoop adopted within a large, venerable institution, and across the world.

Page 9: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Data Lake and Advanced Analytics within ING


External and internal reporting for own or regulatory purposes

Integrate all data sources within the bank into one processing platform

• Batch data streams
• Live transactions
• Model building for customer interaction

Better understand customer needs in an increasingly digital world

Data can help us offer tailored products and services

Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models

Open source software where possible – Hadoop as a core component

Page 10: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Hadoop Usage Patterns

1. File Storage
2. Deep Data
3. Analytical Hadoop
4. Real Time

Page 11: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Patterns and their maintenance

Analytical Hadoop
• Our first use case
• Development and Production environments
• P environment has Production-level security but Test-level SLA

FileStore and Deep Data
• Completely automated
• Full DTAP street (Development, Test, Acceptance, Production)
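The DTAP street named on this slide is an ordered sequence of environments. As a minimal sketch of that ordering in Python: the gating rule in `may_deploy` (a release must have passed every earlier stage) is an illustrative assumption, not something stated in the talk.

```python
from enum import IntEnum
from typing import Optional, Set


class Stage(IntEnum):
    """The four DTAP stages, numbered in promotion order."""
    DEVELOPMENT = 1
    TEST = 2
    ACCEPTANCE = 3
    PRODUCTION = 4


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after Production."""
    if stage is Stage.PRODUCTION:
        return None
    return Stage(stage + 1)


def may_deploy(target: Stage, passed: Set[Stage]) -> bool:
    """Assumed rule: a release may enter `target` only if it has
    already passed every stage that comes before it."""
    return all(s in passed for s in Stage if s < target)
```

For example, `may_deploy(Stage.ACCEPTANCE, {Stage.DEVELOPMENT, Stage.TEST})` holds, while a release that has only seen Development may not go to Production.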

Page 12: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Standard installation doesn’t cut it

• Vendors give us tools to do a GUI-based install
• Maintain several clusters in parallel, DTAP!
• Auditability!
• Not for us, we need to do automated installs
• APIs and scripting facilities do exist, but are often poorly tested and documented

Page 13: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Organizational challenge

• Layers – IaaS, PaaS, Application (we want IaaS not PaaS)
• Organizational divide: Platform team vs. Infra team
• Different privileges
• Different tool choices
• Trust and collaboration need to be actively built
• Convince security audit teams!

Page 14: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Chapter 1: First Steps

IN WHICH a first expedition ventures into uncharted territory, encounters strange monsters and reconsiders their equipment.

Page 15: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Tooling part 1

• First take by Exploration teams (Analytical Pattern)
• Unusual Ops mode: No Production system (although we use production data)
• Install everything with Ansible
• YAML based, ssh based access
• All text files. Easy to put in git and to document
• The Power of Root
• Great power and flexibility
• Risk people and GUI users do not like it
• You are on your own
• We tried to learn from this!
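The point about everything being text files that fit in git can be illustrated with a small sketch. The Python below renders an INI-style Ansible inventory with stable ordering, so that a change to the cluster shows up as a clean diff. The group names, host names, and the parent group `hadoop` are hypothetical; the talk does not show ING's actual inventory layout.

```python
from typing import Dict, List


def render_inventory(groups: Dict[str, List[str]], parent: str = "hadoop") -> str:
    """Render an INI-style Ansible inventory as plain text.

    Groups and hosts are sorted so the output is stable and
    produces minimal diffs when committed to git.
    """
    lines: List[str] = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        lines.extend(sorted(groups[group]))
        lines.append("")  # blank line between sections
    # A parent group that contains all other groups as children.
    lines.append(f"[{parent}:children]")
    lines.extend(sorted(groups))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical cluster layout, for illustration only.
    cluster = {
        "workers": ["w2.example.internal", "w1.example.internal"],
        "masters": ["m1.example.internal"],
    }
    print(render_inventory(cluster), end="")
```

Generating the file rather than editing it by hand is one option; keeping a hand-written inventory in git works just as well for small clusters.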

Page 16: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Chapter 2: Standardizing


IN WHICH a larger party sets out with better equipment, reaches the shores of a new world but finds that still, much is to be improved.

Page 17: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Implementing the FileStore and DeepData patterns

• Now we needed a Datalake integrated solution with full support
• Also need a full DTAP street

Infra team has legacy tooling (proprietary tools) but limited flexibility.
• Basically, we roll our specific configuration into homemade rpm packages.

Tool choice for application deployment: CA Lisa aka Nolio
• GUI based
• No version control (tagging added as an afterthought)
• Slow and awkward to use
• Dumbed down by organizational restrictions

Conclusion: Don’t go there!

Page 18: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Implementing the FileStore and DeepData patterns

• By then, we had a lot of structure to help us
• Standardized build server with GitBlit, Artifactory, Jenkins
• Agile Way of Working

• Now deployment is a split approach
• Infra parts use TEM (and Ambari blueprint) to deploy full Hadoop stack
• On top of the stack we deploy our own applications with Nolio

• Handovers CIO-Infra still hurt us
• We do have: Deployment on a given system at the press of a button
• We do have: Automatic propagation of Git changes into Artifactory via Jenkins
• We do not (yet) have: Automated propagation D->T->A->P via Jenkins
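An Ambari blueprint, as mentioned on this slide, is a JSON document that describes the stack and its host groups; a second JSON document maps real hosts onto those groups. The sketch below builds both as Python dictionaries. The field names follow the publicly documented Ambari blueprint layout and should be checked against the Ambari version in use; the blueprint name, stack version, and host names are hypothetical, and the talk does not show ING's actual blueprint.

```python
import json
from typing import Dict, List


def make_blueprint(name: str, stack_name: str, stack_version: str,
                   host_groups: Dict[str, List[str]]) -> dict:
    """Build a blueprint: which components run in which host group."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
        "host_groups": [
            {"name": group, "components": [{"name": c} for c in components]}
            for group, components in host_groups.items()
        ],
    }


def make_cluster_template(blueprint_name: str,
                          hosts: Dict[str, List[str]]) -> dict:
    """Build a cluster creation template: which hosts fill each host group."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in fqdns]}
            for group, fqdns in hosts.items()
        ],
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    blueprint = make_blueprint(
        "hadoop-small", "HDP", "2.4",
        {"master": ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"],
         "worker": ["DATANODE", "NODEMANAGER"]},
    )
    template = make_cluster_template(
        "hadoop-small",
        {"master": ["m1.example.internal"],
         "worker": ["w1.example.internal", "w2.example.internal"]},
    )
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(template, indent=2))
```

Because the same blueprint can be paired with a different cluster template per environment, this split fits the DTAP setup described in the talk; only the host mapping changes between stages.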

Page 19: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Chapter 3: The Cloud

IN WHICH our heroes learn from the cloud experience and from explorers around the world, and make deployment a safe experience for everyone.


Page 20: Automate Hadoop Cluster Deployment in a Banking Ecosystem

The Cloud

• ING Private Cloud is essentially Datacenter v2.0
• However, we get the chance to rethink our tooling
• Puppet integrates nicely with RH Satellite and is used to provision PaaS solutions
• Ansible is gaining ground in the internal discussion
• External Ansible community: the Meetup has grown a lot over the last year. Now more than Puppet and Chef combined
• ING has an initiative to come up with a standardized way to deploy packaged software, based on Ansible

Page 21: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Conclusion

Page 22: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Conclusion

• Be aware: Deployment of mostly prepackaged software is different from developing your own software
• Full automation might not be needed because we do not change as quickly as, e.g., a mobile app
• Use tools that are scriptable
• GUIs suck
• Own your stack

Page 23: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Questions

Page 24: Automate Hadoop Cluster Deployment in a Banking Ecosystem

Attributions

• Crane Gears by Kevin Utting is licensed under CC BY 2.0
• Hellmar in Nîmes / With Python in Mindanao, by the author
• Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0