Session 2: Financial Data Collection & APIs - Your Gateway to Market Data
Learning Objectives
By the end of this session, you will be able to:
Understand why different data sources exist and how they serve different needs in finance
Set up and use free financial APIs including yfinance, Alpha Vantage, and financialdatasets.ai
Write simple code to download stock data without complex programming
Handle common data issues like missing data and API limits
Compare data from multiple sources to ensure accuracy
Present your data collection process clearly in a professional video
Section 1: The Financial Hook
The $50,000 Data Mistake
In March 2023, a junior analyst at a small hedge fund made a critical error. She downloaded Tesla’s stock price from a free website that hadn’t adjusted for a stock split. Her analysis used a pre-split price of about $600 per share, when the split-adjusted price was about $200 (Tesla split its stock 3-for-1 in August 2022).
The fund almost made a $50,000 trade based on this wrong data before a senior analyst caught the error.
The lesson: In finance, bad data leads to bad decisions. And bad decisions cost money.
Why Data Quality Matters
Professional investors spend millions on data because:
Accuracy drives profits: On a $500 million portfolio, a 1% pricing error misstates its value by $5 million
Speed matters: Getting data 1 second faster can mean capturing opportunities
Completeness counts: Missing dividends or splits destroys return calculations
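To see why a missing dividend distorts returns, here is a minimal sketch with made-up one-year numbers (all values are hypothetical, and dividends are simply added back rather than reinvested). The same stock shows a 2% price return but a 5% total return once the dividend is counted.

```python
# Hypothetical one-year numbers for illustration only
start_price = 100.00   # price at the start of the year
end_price = 102.00     # price at the end of the year
dividends_paid = 3.00  # cash dividends received per share during the year

# Price return ignores dividends entirely
price_return = (end_price - start_price) / start_price

# Total return adds the dividends back in (simple version, no reinvestment)
total_return = (end_price - start_price + dividends_paid) / start_price

print(f"Price return: {price_return:.2%}")
print(f"Total return: {total_return:.2%}")
```

If your data source drops the dividend, your analysis silently reports the first number instead of the second.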
Your Data Journey
Today you’ll learn to:
Access professional-grade financial data (for free!)
Understand why data sources differ
Build simple but reliable data collection systems
Validate data like a professional
Real-World Application Timeline
Week 1: Manual data collection (like Session 1)
Week 2: Automated data with APIs (today!)
Week 3: Multiple sources for validation
Week 4: Building data pipelines
Career: Managing millions of dollars based on data quality
Section 2: Foundational Financial Concepts & Models
Understanding Financial Data Sources
Types of Financial Data:
Price Data
Open, High, Low, Close (OHLC)
Volume traded
Adjusted vs. unadjusted prices
Fundamental Data
Financial statements
Earnings reports
Company metrics
Alternative Data
News sentiment
Social media mentions
Satellite imagery (yes, really!)
The Data Provider Ecosystem
FREE TIER (Good for Learning)
├── Yahoo Finance (yfinance)
│ └── Most popular, reliable for basics
├── Alpha Vantage
│ └── Good API, requires free key
└── Financial Datasets AI
└── New, AI-focused, generous free tier
INSTITUTIONAL (What Professionals Use)
├── Bloomberg Terminal ($24,000/year)
├── Refinitiv Eikon ($22,000/year)
└── S&P Capital IQ ($20,000+/year)
Key Concept: Adjusted vs. Unadjusted Prices
Unadjusted Price: The actual price on that day
Adjusted Price: Accounts for splits and dividends
What is a Stock Split? A stock split is when a company divides its existing shares into multiple shares. In a 4-for-1 split, each share becomes 4 shares, but the total value stays the same (like cutting a pizza into more slices - more pieces, same amount of pizza).
Example: Apple’s 4-for-1 split in 2020
Unadjusted: $500 → $125 (looks like a 75% loss!)
Adjusted: Both show as $125 (no change)
Always use adjusted prices for return calculations!
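The split adjustment itself is simple arithmetic: divide every price before the split date by the split ratio. Below is a minimal sketch using rounded, illustrative closes around Apple's 4-for-1 split (the prices are assumptions, not actual quotes). Real adjusted series, such as Yahoo's, also adjust for dividends, which this sketch skips.

```python
# Illustrative unadjusted closes around Apple's 4-for-1 split (Aug 2020).
# Prices are rounded, hypothetical values, not actual quotes.
unadjusted = [
    ("2020-08-27", 500.00),  # before the split
    ("2020-08-28", 500.00),  # last trading day before the split
    ("2020-08-31", 125.00),  # first trading day after the split
]
split_date = "2020-08-31"
split_ratio = 4  # 4-for-1: each old share became 4 new shares

adjusted = []
for date, price in unadjusted:
    # ISO date strings compare correctly as text.
    # Pre-split prices are divided by the ratio; later prices stay unchanged.
    if date < split_date:
        adjusted.append((date, price / split_ratio))
    else:
        adjusted.append((date, price))

for (date, raw), (_, adj) in zip(unadjusted, adjusted):
    print(f"{date}: unadjusted ${raw:.2f} -> adjusted ${adj:.2f}")
```

After adjustment every day reads $125.00, so the fake "75% loss" disappears.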
API Basics (No Programming Experience Needed!)
What’s an API? API stands for Application Programming Interface. In simple terms:
Think of it as a data vending machine
You send a request (put in money and select your item)
You get data back (receive your snack)
Real-world analogy: When you use a weather app on your phone, it doesn’t generate weather data—it asks a weather service’s API for the data and displays it nicely. Financial APIs work the same way: your code asks for stock prices, and the API sends them back.
Three Simple Steps:
Install the tool (one-time setup)
Ask for data (ticker and dates)
Receive your data (prices, volumes, etc.)
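Steps 2 and 3 can be made concrete without touching the network. In this minimal sketch, the request is just a web address built from parameters (using the Alpha Vantage OVERVIEW URL format that appears later in this session), and the reply is JSON text turned into a Python dictionary. The sample reply and its value are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Step 2 "Ask for data": a request is just a web address with parameters.
base_url = "https://www.alphavantage.co/query"
params = {"function": "OVERVIEW", "symbol": "JNJ", "apikey": "YOUR_FREE_API_KEY"}
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)

# Step 3 "Receive your data": the API answers with text in JSON format.
# This reply is made up for illustration; a real reply has many more fields.
sample_reply = '{"Symbol": "JNJ", "DividendPerShare": "1.23"}'
data = json.loads(sample_reply)  # turn the JSON text into a Python dict
print(data["Symbol"], data["DividendPerShare"])
```

Note that the number arrives as text ("1.23"), which is why later code converts it with float().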
🎯 AI Learning Support - Understanding APIs
Learning Goal: Understand what APIs are and why they matter for finance professionals.
Starting Prompt: “What is an API?”
🚀 Hints to Improve Your Prompt:
Ask for a relatable analogy (restaurant, library, etc.)
Specify you need finance context
Include what you already understand
Ask about authentication (API keys)
💡 Better Version Hints:
Ask about free vs. paid APIs in finance
Request examples of financial data APIs
Inquire about rate limits and costs
Ask how APIs differ from downloading CSV files
🎯 Your Challenge: Create a prompt that helps you understand not just what APIs are, but why financial firms pay thousands of dollars for premium API access.
Section 3: The Financial Gym - Partner Practice & AI Copilot Learning
Solo Warm-Up (15 minutes)
Exercise 1: Your First API Call
# THE SIMPLEST POSSIBLE API CALL
# No functions, no classes, just three lines!
import yfinance as yf
# Download Apple stock data for January 2024 (the end date is exclusive)
apple_data = yf.download('AAPL', start='2024-01-01', end='2024-02-01')
# Look at what we got
print(apple_data.head())
What each line does:
Line 1: Import the tool (like opening Excel)
Line 2: Download data (like clicking “Download”)
Line 3: Show first 5 rows (like viewing a spreadsheet)
AI Copilot Learning Phase (20 minutes)
Step 1: Understanding Data Differences
Try this experiment:
# Get Microsoft data from yfinance
msft = yf.download('MSFT', start='2024-01-01', end='2024-01-05')
print("Yahoo Finance MSFT Close:")
print(msft['Close'])
Now ask AI: “Why might Microsoft’s closing price be slightly different on different financial websites for the same date?”
Step 2: Exploring API Options
🚀 Professional Prompt Sample A (Grade: A): “I’m learning about financial APIs. I know about yfinance for basic data. What are the pros and cons of Alpha Vantage vs financialdatasets.ai for a student learning financial modeling? I need to understand which to learn first.”
❌ Weak Prompt Sample (Grade: D): “List all financial APIs.”
Reciprocal Teaching Component (25 minutes)
Partner Exercise: API Comparison
Partner A (10 min):
Explain what an API is using a real-world analogy
Show your partner how to make a simple yfinance call
Explain what “adjusted close” means
Partner B (10 min):
Explain why we need API keys for some services
Show what happens when an API call fails
Explain the difference between real-time and historical data
Together (5 min):
Discuss: Which API would you use for a retirement portfolio analysis? Why?
Collaborative Challenge (20 minutes)
Build a Simple Data Collector
# SIMPLE DIVIDEND DATA COLLECTOR
# Work together to complete this code
# Step 1: Import tool
import yfinance as yf
# Step 2: Pick three dividend stocks
stock1 = 'JNJ' # Johnson & Johnson
stock2 = 'KO' # Coca-Cola
stock3 = 'PG' # Procter & Gamble
# Step 3: Get data for each stock (you complete this!)
# Hint: Use yf.download() like in the example above
# Step 4: Compare closing prices
# Which stock has the highest price?
# Which moved the most in January?
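If you get stuck on Step 4, here is a minimal sketch of the comparison logic using made-up first and last January closes (the numbers are hypothetical, not real quotes). With real data you would take these values from the Close column of each download.

```python
# Hypothetical first and last January closes (made-up numbers, not real quotes)
jan_closes = {
    "JNJ": (160.00, 158.00),
    "KO": (60.00, 61.50),
    "PG": (150.00, 157.50),
}

moves = {}
for ticker, (first_close, last_close) in jan_closes.items():
    # Percent change from the first to the last close of the month
    moves[ticker] = (last_close - first_close) / first_close * 100

# Highest last close, and largest move in either direction
highest_price = max(jan_closes, key=lambda t: jan_closes[t][1])
biggest_mover = max(moves, key=lambda t: abs(moves[t]))

for ticker, pct in moves.items():
    print(f"{ticker}: {pct:+.2f}%")
print("Highest price:", highest_price)
print("Moved the most:", biggest_mover)
```

Note that the most expensive stock (JNJ here) is not the one that moved the most (PG): price level and price change are different questions.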
💡 AI Learning Support - Troubleshooting
Learning Goal: Learn to debug common API issues independently.
Starting Prompt: “I’m getting an error with my API call. Help!”
🚀 Improvement Hints:
Include the exact error message
Specify which API and what data you’re trying to get
Mention what you’ve already tried
Ask for systematic troubleshooting steps
💡 Advanced Hints:
Ask about common API error patterns
Request rate limit handling strategies
Inquire about fallback options
Ask how professionals handle API outages
🎯 Your Challenge: Create a prompt that would help you debug API issues like a senior developer, including preventive measures.
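One common rate-limit strategy is "retry with exponential backoff": if a call fails, wait, try again, and double the wait each time. The sketch below assumes a hypothetical `flaky_fetch()` stand-in that fails twice before succeeding, so it runs without any network; in practice you would pass in a function that calls yfinance or Alpha Vantage and catch only the errors you expect.

```python
import time

def fetch_with_retry(fetch, max_attempts=3, base_delay=1.0):
    """Call fetch(); on failure wait and retry, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as error:  # in real code, catch the specific errors you expect
            if attempt == max_attempts:
                raise  # give up: let the caller see the last error
            wait = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {wait:.1f}s")
            time.sleep(wait)

# Hypothetical stand-in for an API call that fails twice, then succeeds
calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network hiccup")
    return {"symbol": "KO", "price": 60.0}

result = fetch_with_retry(flaky_fetch, base_delay=0.1)
print(result)
```

Doubling the wait gives a struggling server progressively more breathing room, which is usually kinder than retrying at a fixed pace.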
Section 4: The Financial Coaching - Your DRIVER Learning Guide
Complete DRIVER Case Study: Building a Dividend Aristocrat Screener
What is a Dividend Aristocrat? A Dividend Aristocrat is a company in the S&P 500 that has increased its dividend payment every year for at least 25 consecutive years. These are considered the most reliable dividend-paying companies because they’ve proven they can raise dividends through recessions, market crashes, and economic uncertainty. Examples include Coca-Cola, Johnson & Johnson, and Procter & Gamble.
Scenario: You’re interviewing for a financial analyst position. The interviewer asks: “Show me how you’d identify the best dividend-paying stocks using Python. Keep it simple - I want to see your thought process, not complex code.”
D - Define & Discover
Understanding the Problem
What makes a great dividend stock?
Consistent dividend payments
Growing dividends over time
Sustainable payout ratios
Strong company fundamentals
What is a Payout Ratio? The payout ratio tells you what percentage of a company’s earnings are paid out as dividends. For example, if a company earns $4 per share and pays $1 in dividends, the payout ratio is 25% (1/4 = 0.25). A common rule of thumb treats a payout ratio below about 60% as sustainable (norms vary by industry): the company keeps enough earnings to grow while still rewarding shareholders.
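Here is the payout-ratio example above as code, plus the dividend yield that the screener will rank on. The $4 earnings and $1 dividend come from the example; the $50 share price is a hypothetical value added only to show the yield formula, and the 60% line is the rule of thumb, not a hard limit.

```python
# Numbers from the payout-ratio example above
earnings_per_share = 4.00
dividend_per_share = 1.00
# Hypothetical share price, added only to show the yield calculation
share_price = 50.00

payout_ratio = dividend_per_share / earnings_per_share  # share of earnings paid out
dividend_yield = dividend_per_share / share_price       # income per dollar invested

print(f"Payout ratio: {payout_ratio:.0%}")
print(f"Dividend yield: {dividend_yield:.2%}")
# Rule of thumb, not a hard law: flag payout ratios at or above 60%
print("Below 60% rule of thumb?", payout_ratio < 0.60)
```

Both ratios share the dividend in the numerator; the payout ratio asks "can the company afford it?", while the yield asks "what do I earn for the price I pay?"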
🎯 AI Learning Support - Problem Exploration
Learning Goal: Understand key metrics for dividend stock analysis.
Starting Prompt: “What metrics should I use for dividend stocks?”
🚀 Enhancement Hints:
Specify you’re building a screener
Mention metrics you already know
Ask for priority ranking
Include your investment timeframe
💡 Professional Hints:
Ask about dividend sustainability metrics
Request sector-specific considerations
Inquire about red flags to avoid
Ask about institutional criteria
🎯 Your Challenge: Develop a prompt that gets you institutional-grade screening criteria, not just basic metrics.
Design Criteria:
Must work with free APIs only
Code simple enough to explain line-by-line
Focus on 5-10 well-known dividend stocks
Compare at least 2 data sources
R - Represent
Logic Flow (No Complex Diagrams!)
1. Pick dividend stocks to analyze
↓
2. Download price data
↓
3. Download dividend data
↓
4. Calculate dividend yield
↓
5. Rank stocks by yield
↓
6. Validate with second source
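One way to check a plan before writing the real screener is to turn each box of the flow into a tiny function and run it on fake data. This is a minimal sketch under that assumption: the tickers and numbers are hypothetical, and steps 2 and 3 are stubs where real code would call yfinance.

```python
# A plan skeleton: one small function per step of the flow above.
# The data is hard-coded and hypothetical so the plan can be checked offline.

def pick_stocks():                        # Step 1
    return ["AAA", "BBB", "CCC"]          # hypothetical tickers

def download_prices(tickers):             # Step 2 (stub for an API call)
    sample = {"AAA": 50.0, "BBB": 80.0, "CCC": 40.0}
    return {t: sample[t] for t in tickers}

def download_dividends(tickers):          # Step 3 (stub for an API call)
    sample = {"AAA": 1.5, "BBB": 2.0, "CCC": 1.6}
    return {t: sample[t] for t in tickers}

def calculate_yields(prices, dividends):  # Step 4: yield in percent
    return {t: dividends[t] / prices[t] * 100 for t in prices if prices[t] > 0}

def rank_by_yield(yields):                # Step 5: highest yield first
    return sorted(yields.items(), key=lambda item: item[1], reverse=True)

tickers = pick_stocks()
ranking = rank_by_yield(
    calculate_yields(download_prices(tickers), download_dividends(tickers))
)
for ticker, pct in ranking:
    print(f"{ticker}: {pct:.2f}%")
# Step 6 (validate with a second source) would compare these yields to another API.
```

Because each step is isolated, you can later swap a stub for a real API call without touching the other steps.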
🚀 AI Learning Support - Planning
Learning Goal: Plan before coding.
Starting Prompt: “Help me plan my dividend screener.”
🚀 Better Planning Hints:
List the steps you’ve already thought of
Specify your simplicity constraints
Ask about missing components
Request validation checkpoints
💡 Professional Elements:
Ask about error handling steps
Request data quality checks
Inquire about output formatting
Ask about documentation needs
🎯 Your Challenge: Create a prompt that helps you build a plan so clear that another student could implement it.
I - Implement
Simple Dividend Screener
# DIVIDEND ARISTOCRAT SCREENER
# Professional-grade but simple enough for beginners
# Step 1: Import our tool
import yfinance as yf
# Step 2: List of famous dividend stocks
dividend_stocks = ['JNJ', 'KO', 'PEP', 'PG', 'MMM']
stock_names = ['Johnson & Johnson', 'Coca-Cola', 'PepsiCo', 'P&G', '3M']
# Step 3: Collect data for each stock
print("=== DIVIDEND STOCK ANALYSIS ===\n")
for i in range(len(dividend_stocks)):
    # Get stock data
    ticker = dividend_stocks[i]
    name = stock_names[i]
    # Fetch the company's current snapshot (not a price history)
    stock = yf.Ticker(ticker)
    info = stock.info
    # Get current price and annual dividend ('or 0' also covers a value stored as None)
    current_price = info.get('currentPrice', 0) or 0
    annual_dividend = info.get('dividendRate', 0) or 0
    # Calculate dividend yield
    if current_price > 0:
        dividend_yield = (annual_dividend / current_price) * 100
    else:
        dividend_yield = 0
    # Display results
    print(f"{name} ({ticker}):")
    print(f"  Current Price: ${current_price:.2f}")
    print(f"  Annual Dividend: ${annual_dividend:.2f}")
    print(f"  Dividend Yield: {dividend_yield:.2f}%")
    print()
💻 AI Learning Support - Code Understanding
Learning Goal: Understand defensive coding practices.
Starting Prompt: “What does .get() do in Python?”
🚀 Context Improvements:
Include the specific code line
Ask why .get() vs. bracket notation
Inquire about the second parameter
Ask when to use each approach
💡 Deeper Understanding:
Ask about error prevention benefits
Request financial data examples
Inquire about None vs. 0 defaults
Ask about production best practices
🎯 Your Challenge: Create a prompt that helps you understand not just syntax, but why defensive coding matters when handling financial data.
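A small experiment shows the difference between bracket notation and .get(), and one trap worth knowing: the default only applies when a key is missing, not when it is present with the value None. The info dictionary here is made up; whether a real data field comes back missing or as None can vary, so this sketch assumes either can happen.

```python
# A made-up 'info' dictionary: price present, dividend field present but empty
info = {"currentPrice": 120.0, "dividendRate": None}

# Bracket notation fails loudly when a key is missing...
try:
    payout = info["payoutRatio"]
except KeyError:
    payout = None
    print("KeyError: 'payoutRatio' is not in the dictionary")

# ...while .get() returns the default you choose instead of crashing.
price = info.get("currentPrice", 0)   # key exists, so its value is returned
payout = info.get("payoutRatio", 0)   # key missing, so the default 0 is returned

# Caution: the key exists with value None, so .get() returns None, not 0.
dividend = info.get("dividendRate", 0)
dividend = dividend or 0              # treat None as 0 before doing math
print(price, payout, dividend)
```

Without the last conversion, a check like `dividend > 0` would raise a TypeError, which is exactly the kind of crash defensive coding is meant to prevent.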
V - Validate
Cross-Check with Alpha Vantage
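# VALIDATION: Check one stock with a different source
# First, get your free API key from: https://www.alphavantage.co/support/#api-key
import requests
# Alpha Vantage setup (replace with your key)
api_key = 'YOUR_FREE_API_KEY'
symbol = 'JNJ'
# Build the URL (like a web address for data)
url = f'https://www.alphavantage.co/query?function=OVERVIEW&symbol={symbol}&apikey={api_key}'
# Get the data (stop with a clear error if the request itself failed)
response = requests.get(url, timeout=10)
response.raise_for_status()
data = response.json()
# Rate-limit and error replies carry a message instead of company fields
if 'DividendPerShare' not in data:
    print("Unexpected reply from Alpha Vantage:", data)
# Fields arrive as text and may read "None"; convert safely
def to_number(text):
    try:
        return float(text)
    except (TypeError, ValueError):
        return 0.0
# Extract dividend information
av_dividend = to_number(data.get('DividendPerShare'))
av_yield = to_number(data.get('DividendYield')) * 100  # reported as a fraction
print(f"\nValidation for {symbol}:")
print(f"Alpha Vantage Annual Dividend: ${av_dividend:.2f}")
print(f"Alpha Vantage Dividend Yield: {av_yield:.2f}%")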
🔍 AI Learning Support - Data Validation
Learning Goal: Understand data discrepancies across sources.
Starting Prompt: “Why are my numbers different from different APIs?”
🚀 Specific Improvements:
Include actual numbers and sources
Specify the type of data (dividends, prices)
Mention the size of discrepancy
Ask about materiality thresholds
💡 Professional Validation:
Ask about common causes of differences
Request industry tolerance standards
Inquire about reconciliation methods
Ask when differences matter vs. don’t
🎯 Your Challenge: Develop a prompt that helps you understand when data differences are errors vs. acceptable variations.
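Deciding "error or acceptable variation" usually comes down to a tolerance: compute the percent difference between sources and flag anything above a threshold you chose in advance. In this minimal sketch the dividend values and the 2% tolerance are hypothetical; one possible (unverified) reason for small gaps is that sources may measure the annual dividend over different periods.

```python
def compare_sources(value_a, value_b, tolerance_pct=1.0):
    """Return the % difference between two sources and whether it is within tolerance."""
    if value_a == 0:
        raise ValueError("cannot compute a percent difference from zero")
    diff_pct = abs(value_a - value_b) / abs(value_a) * 100
    return diff_pct, diff_pct <= tolerance_pct

# Hypothetical dividend-per-share values for one stock from two sources
yf_dividend = 4.00   # e.g., the yfinance figure
av_dividend = 3.96   # e.g., the Alpha Vantage figure

diff, ok = compare_sources(yf_dividend, av_dividend, tolerance_pct=2.0)
print(f"Difference: {diff:.2f}% -> {'OK' if ok else 'INVESTIGATE'}")
```

Writing the tolerance down before you look at the numbers keeps you from explaining away a real error after the fact.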
E - Evolve
Extend Your Analysis
What you’ve learned can grow into:
Screening 50+ stocks automatically
Adding payout ratio analysis
Historical dividend growth rates
Sector-based comparisons
Automated daily updates
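"Historical dividend growth rates" is a natural first extension. This minimal sketch computes a compound annual growth rate (CAGR) and an Aristocrat-style "raised every year?" check on made-up dividend history; real history would come from an API such as yfinance's dividend data.

```python
# Hypothetical annual dividends per share, oldest first (made-up numbers)
dividends_by_year = {2019: 2.00, 2020: 2.10, 2021: 2.20, 2022: 2.30, 2023: 2.43}

years = sorted(dividends_by_year)
first = dividends_by_year[years[0]]
last = dividends_by_year[years[-1]]
n_years = years[-1] - years[0]

# Compound annual growth rate: the steady yearly rate that turns first into last
cagr = (last / first) ** (1 / n_years) - 1

# Aristocrat-style check: did the dividend rise every single year?
raised_every_year = all(
    dividends_by_year[y] > dividends_by_year[y - 1] for y in years[1:]
)
print(f"Dividend CAGR {years[0]}-{years[-1]}: {cagr:.2%}")
print("Raised every year:", raised_every_year)
```

The two measures complement each other: CAGR shows how fast dividends grew, while the yearly check shows whether growth was unbroken.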
🎯 AI Learning Support - Pattern Recognition
Learning Goal: Identify enhancement priorities.
Starting Prompt: “What features should I add to my screener?”
🚀 Priority Hints:
Describe current capabilities first
Specify your user (long-term investors)
Ask for ranked suggestions
Include implementation complexity
💡 Strategic Thinking:
Ask about user value vs. effort
Request competitive analysis
Inquire about data availability
Ask about maintenance burden
🎯 Your Challenge: Create a prompt that helps you prioritize features like a product manager, not just a developer.
R - Reflect
Key Insights
Different APIs have different strengths
Data validation is crucial
Simple code can do professional analysis
Free tools are sufficient for learning
📋 AI Learning Support - Synthesis
Learning Goal: Connect learning to career applications.
Starting Prompt: “How do professionals use APIs?”
🚀 Career Context:
Specify what you’ve learned
Ask about real job applications
Inquire about scale differences
Request transition advice
💡 Professional Reality:
Ask about Bloomberg terminal vs. free APIs
Request team workflow insights
Inquire about compliance requirements
Ask about career progression path
🎯 Your Challenge: Develop a prompt that bridges the gap between classroom API usage and managing data for a $100M portfolio.
Section 5: Assignment
Scenario
You’re a data analyst at an investment research firm. Your team needs reliable financial data for a new sector-focused portfolio strategy. Design and implement a robust data collection system.
Requirements
Create a video (approximately 10-15 minutes) demonstrating:
Portfolio design with at least 5 securities
API selection and implementation strategy
Code execution showing data collection for your portfolio
Error handling and data validation approaches
Execution Format
Use your completed Jupyter notebook or Python script
Run your code cell-by-cell while explaining what each part does
Show outputs and interpret them immediately
Demonstrate how your system handles errors or edge cases
Deliverables
Video demonstration showing code execution and analysis
Python code file (.py or .ipynb)
Section 6: Reflect & Connect - Financial Insights Discussion
Individual Reflection (10 minutes)
Write your thoughts on:
What surprised you about financial data APIs?
Why do data differences exist between sources?
How will you use APIs in your career?
What was the hardest concept to understand?
Small Group Discussion (15 minutes)
In groups of 3-4, discuss:
Compare API experiences
Which API was easiest to use?
What errors did you encounter?
How did you solve problems?
Data quality insights
What differences did you find?
Which source do you trust more?
How would you validate professionally?
Real-world applications
How would this scale to 500 stocks?
What additional data would you want?
How often would you update data?
Class Synthesis (15 minutes)
Key Topics for Discussion:
The Build vs. Buy Decision
When to use free APIs
When to pay for data
Cost-benefit analysis
Data Quality Standards
Acceptable error margins
Validation requirements
Documentation needs
Career Applications
Entry-level: Using provided data
Mid-level: Choosing data sources
Senior: Data strategy decisions
Connecting to Practice
Industry Insight: “At Vanguard, we spend $50 million annually on data. But the tools you’re learning - Python, APIs, validation - are exactly what our analysts use daily. The scale is different, but the skills are the same.”
Maria Chen, Head of Data, Vanguard
Key Takeaways Board
Create class list of:
Most useful APIs
Common error messages
Validation techniques
Career applications
Section 7: Looking Ahead - From APIs to Analysis
Skills Mastered
✅ Technical Competencies:
Making API calls successfully
Understanding data structures
Comparing multiple sources
Basic error handling
✅ Professional Skills:
Data quality awareness
Validation mindset
Problem-solving approach
Clear documentation
Building Bridges to Session 3
Next session’s Data Cleaning & Validation builds on today:
From Getting to Cleaning: Today we got data; next we’ll make it analysis-ready
From Errors to Solutions: Today we saw problems; next we’ll fix them systematically
From Manual to Automated: Today we validated manually; next we’ll automate quality checks
From Single to Multiple: Today we cross-checked one stock against a second source; next we’ll validate whole portfolios
Connecting Concepts
Session 2: Get Data → Session 3: Clean Data → Session 4: Analyze Patterns
Session 2 (Get Data): API calls, download prices, check sources
Session 3 (Clean Data): handle missing data, fix inconsistencies, remove outliers
Session 4 (Analyze Patterns): find trends, calculate statistics, make predictions
Preview Challenge
Before next session, think about:
What if prices are missing for holidays?
How do you handle stock splits in data?
What if two sources disagree significantly?
How do you document data quality issues?
Professional Development Path
Your API skills enable:
Immediate: Pull data for any stock analysis
Next Month: Build automated data pipelines
Six Months: Create trading strategies
One Year: Design data architecture
Preparation for Session 3
Technical Prep:
Ensure yfinance works reliably
Get Alpha Vantage API key
Practice with 5-10 stocks
Conceptual Prep:
Think about data quality
Consider missing data scenarios
Review validation approaches
Professional Prep:
Research data vendor differences
Understand clean vs. raw data
Consider audit trail needs
Section 8: Appendix - Solutions & Implementation Guide
Complete Solution Code
# COMPLETE DIVIDEND SCREENER SOLUTION
# Simple enough for beginners, professional enough for interviews
# ===== PART 1: BASIC SETUP =====
import yfinance as yf
import time  # For delays between API calls
# Well-known dividend payers (most are Dividend Aristocrats;
# 3M (MMM) cut its dividend in 2024, so it no longer qualifies)
stocks = ['JNJ', 'KO', 'PEP', 'PG', 'MMM', 'CL', 'EMR', 'GPC']
names = ['Johnson & Johnson', 'Coca-Cola', 'PepsiCo', 'P&G',
         '3M', 'Colgate', 'Emerson', 'Genuine Parts']
# Store results
results = []
# ===== PART 2: COLLECT DATA =====
print("Collecting dividend data...\n")
for i in range(len(stocks)):
    ticker = stocks[i]
    name = names[i]
    # Get stock info; skip this stock (instead of crashing) if the call fails
    try:
        info = yf.Ticker(ticker).info
    except Exception as error:  # network problems, rate limits, bad tickers
        print(f"✗ Skipped {name}: {error}")
        continue
    # Extract data (safety checks: 'or 0' also covers values stored as None)
    price = info.get('currentPrice', 0) or 0
    dividend = info.get('dividendRate', 0) or 0
    payout = info.get('payoutRatio', 0) or 0
    # Calculate yield
    if price > 0 and dividend > 0:
        div_yield = (dividend / price) * 100
    else:
        div_yield = 0
    # Store result
    results.append({
        'ticker': ticker,
        'name': name,
        'price': price,
        'dividend': dividend,
        'yield': div_yield,
        'payout': payout
    })
    # Display progress
    print(f"✓ Processed {name}")
    # Small delay to avoid hitting rate limits
    time.sleep(0.5)
# ===== PART 3: RANK AND DISPLAY =====
print("\n=== DIVIDEND STOCKS RANKED BY YIELD ===\n")
# Sort by yield (highest first)
results.sort(key=lambda x: x['yield'], reverse=True)
# Display results
for stock in results:
    print(f"{stock['name']} ({stock['ticker']})")
    print(f"  Price: ${stock['price']:.2f}")
    print(f"  Annual Dividend: ${stock['dividend']:.2f}")
    print(f"  Dividend Yield: {stock['yield']:.2f}%")
    print(f"  Payout Ratio: {stock['payout']:.1%}")
    print()
# ===== PART 4: FIND HIGHEST-YIELDING STOCK =====
if results:
    best = results[0]  # Already sorted, so first has the highest yield
    print(f"BEST DIVIDEND YIELD: {best['name']} at {best['yield']:.2f}%")
Practice Problem Solutions
Exercise 1: Simple API Call
# Solution: Download and display Apple data
import yfinance as yf
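# Download one month of Apple data (the end date is exclusive)
apple = yf.download('AAPL', start='2024-01-01', end='2024-02-01')
# Newer yfinance versions label columns with two levels, e.g. ('High', 'AAPL');
# keep only the first level so apple['High'] is a single column
if apple.columns.nlevels > 1:
    apple.columns = apple.columns.get_level_values(0)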
# Show first few rows
print("First 5 days of Apple data:")
print(apple.head())
# Show basic statistics
print("\nApple price statistics:")
print(f"Highest price: ${apple['High'].max():.2f}")
print(f"Lowest price: ${apple['Low'].min():.2f}")
print(f"Average volume: {apple['Volume'].mean():,.0f}")
Common Student Mistakes
Forgetting API Limits
Wrong: Making 100 calls instantly
Right: Add time.sleep(1) between calls
Not Handling Missing Data
Wrong: info['dividendRate']
Right: info.get('dividendRate', 0)
Overcomplicating Code
Wrong: Complex classes and functions
Right: Simple step-by-step code
No Error Checking
Wrong: Assuming API always works
Right: Check if data exists before using
API Quick Reference
yfinance Basics:
import yfinance as yf
# Download historical data (the end date is exclusive)
data = yf.download('AAPL', start='2024-01-01', end='2024-12-31')
# Get company info
ticker = yf.Ticker('AAPL')
info = ticker.info
dividend = info.get('dividendRate', 0)
Alpha Vantage Basics:
import requests
# Get your free key from: https://www.alphavantage.co/support/#api-key
key = 'YOUR_FREE_API_KEY'
url = f'https://www.alphavantage.co/query?function=OVERVIEW&symbol=AAPL&apikey={key}'
response = requests.get(url, timeout=10)
data = response.json()
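Alpha Vantage replies can look successful at the HTTP level yet contain no data. This sketch assumes, based on typical replies rather than a documented contract, that error and rate-limit replies are small dictionaries keyed "Error Message", "Note", or "Information", and that an unknown symbol may return an empty dictionary. It also converts text fields such as "None" safely. The sample replies are made up.

```python
def check_av_response(data):
    """Raise a clear error if an Alpha Vantage JSON reply is not usable data.

    Assumption: error and rate-limit replies arrive as a small dict with a key
    such as "Error Message", "Note" or "Information" instead of data fields.
    """
    for key in ("Error Message", "Note", "Information"):
        if key in data:
            raise RuntimeError(f"Alpha Vantage said: {data[key]}")
    if not data:
        raise RuntimeError("Empty reply: check the symbol and function name")
    return data

def to_float(value):
    """Convert a text field to float; text like 'None' or '-' becomes None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# Made-up replies for illustration
good = {"Symbol": "AAPL", "DividendPerShare": "1.23", "DividendYield": "None"}
limited = {"Note": "API call frequency limit reached (example text)"}

checked = check_av_response(good)
print(to_float(checked["DividendPerShare"]), to_float(checked["DividendYield"]))
try:
    check_av_response(limited)
except RuntimeError as err:
    print(err)
```

Returning None (rather than 0) for unparseable fields keeps "no data" distinct from "a real zero", which matters when you later compare sources.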
Implementation Notes for Instructors
Pre-Session Setup
Test yfinance is working
Have Alpha Vantage keys ready
Prepare backup CSV files
Test API rate limits
Common Issues & Solutions
“No data found”: Check ticker symbol spelling
Rate limits: Add delays between calls
Missing dividends: Some stocks don’t pay dividends
Connection errors: Have offline backup data
Assessment Tips
Watch for copy-pasted code they can’t explain
Ask them to change ticker symbols live
Have them explain error messages
Test their understanding of yields
Data Files
Note: Sample data provided after course completion.
For Session 2, students should:
Use live APIs (learning experience)
Save their downloaded data as backup
Compare multiple sources themselves
Instructor Resources:
Backup CSV files for common stocks
Pre-written error handling examples
API comparison spreadsheet
Troubleshooting guide
End of Session 2: Financial Data Collection & APIs
Next Session: Data Cleaning & Validation - Making Your Data Analysis-Ready