1. Data Pipeline:
Big Data meets Salesforce
Carolina Ruiz Medina
Principal Developer on Product Innovation
cruiz@financialforce.com
@carolenlanube
Agustina García Peralta
Principal Developer on Platform Strategy
agarcia@financialforce.com
@agarciaodeian
4. About
GREAT ALONE. BETTER TOGETHER.
• Native to Salesforce App Cloud since 2009
• Investors include Salesforce Ventures
• Customers in 27 countries
• 650+ employees, San Francisco based
• Dreamforce.FinancialForce.com
5. Agenda
• Data Pipeline - Overview
• Pipeline Use Cases
• How Pipeline works – Demos
• Big Data
• Take away
• Q&A
6. Asynchronous Apex
• @future
• Queueable
• Batch Apex
• Flex Queue (since Summer ’15)
Common scenario – Large amount of data
7. • Any other option?
• Data Pipeline: New feature to integrate Apache Pig into Salesforce
Common scenario – Large amount of data
8. • What does it do?
• Process massive amounts of data in parallel.
• Key elements
• MapReduce: software framework for writing programs that process large amounts of data in parallel
• Hadoop cluster: a cluster for storing and analyzing large amounts of data
Apache Pig Background
Enables developers to create executions for analyzing LARGE AMOUNTS of data in PARALLEL
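To make the MapReduce idea above concrete, here is a minimal sketch in plain Python. It is not Pig, Hadoop, or any Salesforce API: the record shape, function names, and the use of a thread pool as a stand-in for cluster nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit (key, value) pairs for one chunk of records."""
    return [(record["currency"], record["amount"]) for record in chunk]


def shuffle(mapped_chunks):
    """Shuffle: gather every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(records, chunk_size=2, workers=4):
    """Split the input, map the chunks concurrently, then shuffle and reduce."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    # Threads stand in for cluster nodes here (hypothetical simplification).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


# Hypothetical sample data: transaction amounts per currency.
SAMPLE = [
    {"currency": "USD", "amount": 100},
    {"currency": "EUR", "amount": 50},
    {"currency": "USD", "amount": 25},
    {"currency": "GBP", "amount": 10},
    {"currency": "EUR", "amount": 5},
]
TOTALS = map_reduce(SAMPLE)
```

The point of the split is that each chunk can be mapped independently, so adding workers scales the map step without changing the program.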
9. • How does it work?
• It uses Pig Latin
• Data-flow language
• Between SQL and Java
• We can create our own UDFs (user-defined functions)
Apache Pig Background
10. • Why is it relevant?
• Technology associated with Hadoop, but it can be used by other frameworks, such as Salesforce
• Is there anything unique to Apache Pig running in Salesforce?
• Running in multitenant environment
Apache Pig Background
11. • Under Pilot program; GA planned for Summer ’16 (Safe Harbor)
• How does Data Pipeline work?
• Runs Pig scripts written in the Pig Latin language
What is Data Pipeline?
Data Pipeline Pig Script
Apex?
12. • Execution feature
• Run asynchronously
• In Parallel
• From where?
• Developer Console
• During deploy
• Tooling API 33.0 onwards
What is Data Pipeline?
13. • Anything else?
• It is an ETL (Extract, Transform, Load) tool
• Pig Scripts can be included into a package
What is Data Pipeline?
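As a rough illustration of the Extract, Transform, Load shape mentioned above, the following plain-Python sketch runs the three stages over an in-memory list. The record fields, the `min_amount` threshold, and the CSV output are hypothetical stand-ins; a real Data Pipeline job would express these steps in a Pig script.

```python
import csv
import io


def extract(source):
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(source)


def transform(records, min_amount):
    """Transform: keep large transactions and normalise the currency code."""
    return [
        {"id": r["id"], "currency": r["currency"].upper(), "amount": r["amount"]}
        for r in records
        if r["amount"] >= min_amount
    ]


def load(records):
    """Load: write the transformed records to CSV text (stand-in for a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "currency", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample records.
RAW = [
    {"id": 1, "currency": "usd", "amount": 500},
    {"id": 2, "currency": "eur", "amount": 20},
    {"id": 3, "currency": "gbp", "amount": 300},
]
CSV_TEXT = load(transform(extract(RAW), min_amount=100))
```

Keeping the three stages as separate functions mirrors the data-flow style: each step takes a relation in and hands a new one on.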
15. Data Pipeline – Advantages vs other processes
1. Performance
2. Ability to execute scripts in parallel
3. Not hitting governor limits
4. De-couples On-line Transaction Processing and On-line Analytical Processing
5. Allows you to think in terms of data flow
16. How can Pipeline help us?
We have a large volume of Financial Transactions
… and we need to process them now!
… so our users can use them: to report, to print, or to let another process (such as revaluation) finish quickly
Prepare data for Currency Revaluation
SObject to SObject
17. How can Pipeline help us?
We have a large volume of Financial Transactions
… and we need to process them now!
… for our manager to look at the progress, to export data quickly...
Extracting information from a large amount of data
SObject to File
18. To build the solution, let's look at Pig Script first
What is Pig Script?
Operators
JOIN
GROUP
DISTINCT
ORDER
…
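The operators listed above are relational operations on whole data sets. The sketch below shows, in plain Python over lists of dictionaries, roughly what each one does; it is an illustration of the semantics only, not Pig Latin syntax, and the function names and sample fields are hypothetical.

```python
from collections import defaultdict


def join(left, right, key):
    """JOIN: inner-join two relations on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def group(rows, key):
    """GROUP: collect the rows that share the same key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def distinct(rows):
    """DISTINCT: drop rows that are exact duplicates, keeping the first."""
    seen = set()
    result = []
    for row in rows:
        marker = tuple(sorted(row.items()))
        if marker not in seen:
            seen.add(marker)
            result.append(row)
    return result


def order(rows, key, descending=False):
    """ORDER: sort the relation by one field."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


# Hypothetical sample relations.
transactions = [
    {"id": 1, "account": "A", "amount": 100},
    {"id": 2, "account": "B", "amount": 40},
    {"id": 2, "account": "B", "amount": 40},  # exact duplicate
    {"id": 3, "account": "A", "amount": 60},
]
accounts = [
    {"account": "A", "currency": "USD"},
    {"account": "B", "currency": "EUR"},
]

unique = distinct(transactions)
joined = join(unique, accounts, "account")
by_currency = group(joined, "currency")
ranked = order(joined, "amount", descending=True)
```

Each step takes a relation and returns a new one, which is what makes a script read as a data flow rather than as a loop over records.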
35. Big Data – Big Objects
Feature | Custom Object | Big Object
Creation | Manual & Metadata | Metadata
API name | myObject__c | myObject__b
Enable Reports, Track Activities, Track Field History, etc. | Options available | Options not available
Field types | All | Text; Date/Time; Lookup; Numbers!!!
36. Big Data – Big Objects
Feature | Custom Object | Big Object
Able to edit / delete fields? | Yes | No
Triggers; Field Sets; etc. | Options available | Options not available
37. Big Data – Big Objects
Feature | Custom Object | Big Object
How to populate records | All options | Bulk API; SOAP API; Data Pipeline
Can I amend a record? | Yes | No. Only clone is available
Can I see data by creating a Tab? | Yes | No. Only via SOQL
For free? | Yes | No. Talk with Salesforce about it
Storage? | It counts against the storage limitation | It DOES NOT count against the storage limitation
Yes!!
39. • Script size and complexity: up to 20 operators, 20 loads and 10 stores per script
• Run up to 30 scripts a day
• Bulk API
• STORE calls it, so its limits apply
• Does not support some operators, such as COUNT
• Can’t bypass Salesforce Platform rules: triggers, validations, required fields, etc.
• Once you run the process there is no way back
Data Pipeline - Limitations
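The load and store limits above are easy to check before submitting a script. The following is a naive sketch that counts `LOAD` and `STORE` keywords in a Pig script held as text; the helper names are hypothetical, it only strips `--` line comments, and it would miscount a keyword that appears inside a quoted path. The 20-operator limit is not checked because the slides do not say what counts as an operator.

```python
import re

# Limits as quoted on the slide.
MAX_LOADS = 20
MAX_STORES = 10


def count_statements(script, keyword):
    """Count occurrences of a keyword (LOAD or STORE), ignoring -- comments."""
    without_comments = re.sub(r"--.*", "", script)
    return len(re.findall(rf"\b{keyword}\b", without_comments, flags=re.IGNORECASE))


def check_script(script):
    """Return a list of human-readable limit violations (empty if none)."""
    loads = count_statements(script, "LOAD")
    stores = count_statements(script, "STORE")
    problems = []
    if loads > MAX_LOADS:
        problems.append(f"{loads} loads exceeds the limit of {MAX_LOADS}")
    if stores > MAX_STORES:
        problems.append(f"{stores} stores exceeds the limit of {MAX_STORES}")
    return problems


# Generic Pig Latin used only as sample input; the paths are hypothetical.
SAMPLE_SCRIPT = """
-- load the input
txns = LOAD 'transactions' AS (id:int, amount:double);
big = FILTER txns BY amount > 100.0;
STORE big INTO 'big_transactions';
"""
SAMPLE_PROBLEMS = check_script(SAMPLE_SCRIPT)
```

A check like this could run as a pre-deploy step so an oversized script is caught before it uses up one of the 30 daily runs.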
40. Data Pipeline – Take away
1. New Feature is in Pilot
2. Run Scripts via:
Developer Console
Deploy
Tooling API (since API 33.0)
3. Run Scripts Asynchronously and in Parallel
4. Better performance
5. Easy to use!!
41. Q&A
ISV Scale: Big Data for ISV – 4pm
Park Central Hotel, Franciscan Ballroom
42. • https://pig.apache.org/
• http://goo.gl/h5N7Sa
• https://goo.gl/KXQSKC
Links and more
Carolina Ruíz Medina
cruiz@financialforce.com
@CarolEnLaNube
@CodeCoffeeCloud
www.codeandvoge.com
http://www.meetup.com/es/South-Spain-Salesforce-Developer-Group/
Agustina García Peralta
agarcia@financialforce.com
@agarciaodeian
www.agarciaodeian.com
http://www.meetup.com/es/Spain-Salesforce-Developer-User-Group/
First, a few quick words about FinancialForce.com.
FinancialForce.com builds ERP apps that are native to the Salesforce App Cloud, including accounting, professional services automation, human resources and inventory applications. Our apps can be subscribed to separately or as part of a whole ERP family.
Our company investors include Salesforce Ventures, which made their original investment in us in 2009.
We have customers in 27 countries around the world and over 650 employees, including those at our headquarters at 595 Market St. here in San Francisco.
We have quite a few sessions and parties planned here this week; you can learn more about those at Dreamforce.Financialforce.com. Feel free to join us.