Session 2: Financial Data Collection & APIs - Your Gateway to Market Data


Learning Objectives#

By the end of this session, you will be able to:

  1. Understand why different data sources exist and how they serve different needs in finance

  2. Set up and use free financial APIs including yfinance, Alpha Vantage, and financialdatasets.ai

  3. Write simple code to download stock data without complex programming

  4. Handle common data issues like missing data and API limits

  5. Compare data from multiple sources to ensure accuracy

  6. Present your data collection process clearly in a professional video


Section 1: The Financial Hook#

The $50,000 Data Mistake#

In March 2023, a junior analyst at a small hedge fund made a critical error. She downloaded Tesla’s stock price from a free website that hadn’t adjusted for a stock split. Her analysis showed Tesla trading at $600 per share when it was actually at $200 (post-split adjusted).

The fund almost made a $50,000 trade based on this wrong data before a senior analyst caught the error.

The lesson: In finance, bad data leads to bad decisions. And bad decisions cost money.

Why Data Quality Matters#

Professional investors spend millions on data because:

  • Accuracy drives profits: A 1% data error can mean millions in losses

  • Speed matters: Getting data 1 second faster can mean capturing opportunities

  • Completeness counts: Missing dividends or splits destroys return calculations

Your Data Journey#

Today you’ll learn to:

  1. Access professional-grade financial data (for free!)

  2. Understand why data sources differ

  3. Build simple but reliable data collection systems

  4. Validate data like a professional

Real-World Application Timeline#

Week 1: Manual data collection (like Session 1)
Week 2: Automated data with APIs (today!)
Week 3: Multiple sources for validation
Week 4: Building data pipelines
Career: Managing millions of dollars based on data quality

Section 2: Foundational Financial Concepts & Models#

Understanding Financial Data Sources#

Types of Financial Data:

  1. Price Data

    • Open, High, Low, Close (OHLC)

    • Volume traded

    • Adjusted vs. unadjusted prices

  2. Fundamental Data

    • Financial statements

    • Earnings reports

    • Company metrics

  3. Alternative Data

    • News sentiment

    • Social media mentions

    • Satellite imagery (yes, really!)

The Data Provider Ecosystem#

FREE TIER (Good for Learning)
├── Yahoo Finance (yfinance)
│   └── Most popular, reliable for basics
├── Alpha Vantage
│   └── Good API, requires free key
└── Financial Datasets AI
    └── New, AI-focused, generous free tier

INSTITUTIONAL (What Professionals Use)
├── Bloomberg Terminal ($24,000/year)
├── Refinitiv Eikon ($22,000/year)
└── S&P Capital IQ ($20,000+/year)

Key Concept: Adjusted vs. Unadjusted Prices#

Unadjusted Price: The actual price quoted on that day.

Adjusted Price: The price restated to account for splits and dividends.

What is a Stock Split? A stock split is when a company divides its existing shares into multiple shares. In a 4-for-1 split, each share becomes 4 shares, but the total value stays the same (like cutting a pizza into more slices - more pieces, same amount of pizza).

Example: Apple’s 4-for-1 split in 2020

  • Unadjusted: $500 → $125 (looks like a 75% loss!)

  • Adjusted: Both show as $125 (no change)

Always use adjusted prices for return calculations!
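To see why the adjustment matters, the short sketch below reproduces the Apple example in plain Python. The prices are the rounded figures from the example, and the helper name `split_adjust` is made up for illustration. Real data providers also adjust for dividends, which this sketch ignores.

```python
# Rounded prices from the Apple example: $500 before the 4-for-1 split, $125 after.
price_before_split = 500.0
price_after_split = 125.0
split_ratio = 4  # 4-for-1 split


def split_adjust(price, ratio):
    """Restate a pre-split price in post-split terms (hypothetical helper)."""
    return price / ratio


# Naive return using unadjusted prices: looks like a large loss.
naive_return = (price_after_split - price_before_split) / price_before_split

# Return using the split-adjusted earlier price: nothing actually changed.
adjusted_before = split_adjust(price_before_split, split_ratio)
adjusted_return = (price_after_split - adjusted_before) / adjusted_before

print(f"Unadjusted return: {naive_return:.0%}")
print(f"Adjusted return:   {adjusted_return:.0%}")
```

The only difference between the two results is whether the earlier price was divided by the split ratio before the return was computed.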

API Basics (No Programming Experience Needed!)#

What’s an API? API stands for Application Programming Interface. In simple terms:

  • Think of it as a data vending machine

  • You send a request (put in money and select your item)

  • You get data back (receive your snack)

Real-world analogy: When you use a weather app on your phone, it doesn’t generate weather data—it asks a weather service’s API for the data and displays it nicely. Financial APIs work the same way: your code asks for stock prices, and the API sends them back.

Three Simple Steps:

  1. Install the tool (one-time setup)

  2. Ask for data (ticker and dates)

  3. Receive your data (prices, volumes, etc.)
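For web-based APIs, the "ask for data" step usually comes down to building a web address that carries your request. Here is a minimal sketch using only Python's standard library. The endpoint and the parameter names `function`, `symbol`, and `apikey` follow the Alpha Vantage call used later in this session; `build_request_url` is a hypothetical helper and the key is a placeholder. Nothing is sent over the network.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_request_url(symbol, api_key):
    """Assemble the request URL (hypothetical helper; no request is sent)."""
    base = "https://www.alphavantage.co/query"
    params = {"function": "OVERVIEW", "symbol": symbol, "apikey": api_key}
    return f"{base}?{urlencode(params)}"


url = build_request_url("JNJ", "YOUR_FREE_API_KEY")  # placeholder key
print(url)

# The service reads the same pieces back out of the URL on its side.
print(parse_qs(urlparse(url).query))
```

Libraries such as yfinance hide this step, which is why the first example in this session needs no URL at all.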

🎯 AI Learning Support - Understanding APIs#

Learning Goal: Understand what APIs are and why they matter for finance professionals.

Starting Prompt: “What is an API?”

🚀 Hints to Improve Your Prompt:

  • Ask for a relatable analogy (restaurant, library, etc.)

  • Specify you need finance context

  • Include what you already understand

  • Ask about authentication (API keys)

💡 Better Version Hints:

  • Ask about free vs. paid APIs in finance

  • Request examples of financial data APIs

  • Inquire about rate limits and costs

  • Ask how APIs differ from downloading CSV files

🎯 Your Challenge: Create a prompt that helps you understand not just what APIs are, but why financial firms pay thousands of dollars for premium API access.


Section 3: The Financial Gym - Partner Practice & AI Copilot Learning#

Solo Warm-Up (15 minutes)#

Exercise 1: Your First API Call

# THE SIMPLEST POSSIBLE API CALL
# No functions, no classes, just three lines!

import yfinance as yf

# Download Apple stock data
apple_data = yf.download('AAPL', start='2024-01-01', end='2024-01-31')

# Look at what we got
print(apple_data.head())

What each line does:

  • Line 1: Import the tool (like opening Excel)

  • Line 2: Download data (like clicking “Download”)

  • Line 3: Show first 5 rows (like viewing a spreadsheet)

AI Copilot Learning Phase (20 minutes)#

Step 1: Understanding Data Differences

Try this experiment:

# Get Microsoft data from yfinance
msft = yf.download('MSFT', start='2024-01-01', end='2024-01-05')
print("Yahoo Finance MSFT Close:")
print(msft['Close'])

Now ask AI: “Why might Microsoft’s closing price be slightly different on different financial websites for the same date?”

Step 2: Exploring API Options

🚀 Professional Prompt Sample A (Grade: A): “I’m learning about financial APIs. I know about yfinance for basic data. What are the pros and cons of Alpha Vantage vs financialdatasets.ai for a student learning financial modeling? I need to understand which to learn first.”

❌ Weak Prompt Sample (Grade: D): “List all financial APIs.”

Reciprocal Teaching Component (25 minutes)#

Partner Exercise: API Comparison

Partner A (10 min):

  1. Explain what an API is using a real-world analogy

  2. Show your partner how to make a simple yfinance call

  3. Explain what “adjusted close” means

Partner B (10 min):

  1. Explain why we need API keys for some services

  2. Show what happens when an API call fails

  3. Explain the difference between real-time and historical data

Together (5 min):

  • Discuss: Which API would you use for a retirement portfolio analysis? Why?

Collaborative Challenge (20 minutes)#

Build a Simple Data Collector

# SIMPLE DIVIDEND DATA COLLECTOR
# Work together to complete this code

# Step 1: Import tool
import yfinance as yf

# Step 2: Pick three dividend stocks
stock1 = 'JNJ'   # Johnson & Johnson
stock2 = 'KO'    # Coca-Cola
stock3 = 'PG'    # Procter & Gamble

# Step 3: Get data for each stock (you complete this!)
# Hint: Use yf.download() like in the example above

# Step 4: Compare closing prices
# Which stock has the highest price?
# Which moved the most in January?
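
For Step 4, "moved the most" can be read as the largest percentage change between the first and last close of the month. The sketch below shows only the arithmetic: the tickers and prices are invented placeholders, not market data, and `percent_move` is a hypothetical helper. Swap in the closes you downloaded in Step 3.

```python
# Illustrative first and last closing prices for a month (NOT real market data).
example_closes = {
    "AAA": (100.0, 104.0),
    "BBB": (50.0, 49.0),
    "CCC": (200.0, 203.0),
}


def percent_move(first_close, last_close):
    """Percentage change from the first close to the last close."""
    return (last_close - first_close) / first_close * 100


moves = {t: percent_move(first, last) for t, (first, last) in example_closes.items()}

# "Moved the most" is judged by absolute size, so a big drop counts too.
biggest_mover = max(moves, key=lambda t: abs(moves[t]))
highest_price = max(example_closes, key=lambda t: example_closes[t][1])

print(moves)
print("Biggest mover:", biggest_mover)
print("Highest last close:", highest_price)
```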

💡 AI Learning Support - Troubleshooting#

Learning Goal: Learn to debug common API issues independently.

Starting Prompt: “I’m getting an error with my API call. Help!”

🚀 Improvement Hints:

  • Include the exact error message

  • Specify which API and what data you’re trying to get

  • Mention what you’ve already tried

  • Ask for systematic troubleshooting steps

💡 Advanced Hints:

  • Ask about common API error patterns

  • Request rate limit handling strategies

  • Inquire about fallback options

  • Ask how professionals handle API outages

🎯 Your Challenge: Create a prompt that would help you debug API issues like a senior developer, including preventive measures.


Section 4: The Financial Coaching - Your DRIVER Learning Guide#

Complete DRIVER Case Study: Building a Dividend Aristocrat Screener#

What is a Dividend Aristocrat? A Dividend Aristocrat is a company in the S&P 500 that has increased its dividend payment every year for at least 25 consecutive years. These are considered the most reliable dividend-paying companies because they’ve proven they can raise dividends through recessions, market crashes, and economic uncertainty. Examples include Coca-Cola, Johnson & Johnson, and Procter & Gamble.

Scenario: You’re interviewing for a financial analyst position. The interviewer asks: “Show me how you’d identify the best dividend-paying stocks using Python. Keep it simple - I want to see your thought process, not complex code.”

D - Define & Discover#

Understanding the Problem

What makes a great dividend stock?

  • Consistent dividend payments

  • Growing dividends over time

  • Sustainable payout ratios

  • Strong company fundamentals

What is a Payout Ratio? The payout ratio tells you what percentage of a company’s earnings is paid out as dividends. For example, if a company earns $4 per share and pays $1 in dividends, the payout ratio is 25% (1/4 = 0.25). As a rule of thumb, a payout ratio below about 60% is considered sustainable, because the company keeps enough earnings to grow while still rewarding shareholders.
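The payout ratio arithmetic can be written as a few lines of Python. This is a minimal sketch: `payout_ratio` and `SUSTAINABLE_LIMIT` are illustrative names, and returning `None` for zero or negative earnings is an assumption made here, not a market convention.

```python
def payout_ratio(dividends_per_share, earnings_per_share):
    """Share of earnings paid out as dividends (hypothetical helper)."""
    if earnings_per_share <= 0:
        return None  # the ratio is not meaningful with zero or negative earnings
    return dividends_per_share / earnings_per_share


SUSTAINABLE_LIMIT = 0.60  # the 60% threshold mentioned in the text

ratio = payout_ratio(dividends_per_share=1.0, earnings_per_share=4.0)
print(f"Payout ratio: {ratio:.0%}")
print("Below the 60% threshold:", ratio < SUSTAINABLE_LIMIT)
```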

🎯 AI Learning Support - Problem Exploration

Learning Goal: Understand key metrics for dividend stock analysis.

Starting Prompt: “What metrics should I use for dividend stocks?”

🚀 Enhancement Hints:

  • Specify you’re building a screener

  • Mention metrics you already know

  • Ask for priority ranking

  • Include your investment timeframe

💡 Professional Hints:

  • Ask about dividend sustainability metrics

  • Request sector-specific considerations

  • Inquire about red flags to avoid

  • Ask about institutional criteria

🎯 Your Challenge: Develop a prompt that gets you institutional-grade screening criteria, not just basic metrics.

Design Criteria:

  • Must work with free APIs only

  • Code simple enough to explain line-by-line

  • Focus on 5-10 well-known dividend stocks

  • Compare at least 2 data sources

R - Represent#

Logic Flow (No Complex Diagrams!)

1. Pick dividend stocks to analyze
   
2. Download price data
   
3. Download dividend data
   
4. Calculate dividend yield
   
5. Rank stocks by yield
   
6. Validate with second source

🚀 AI Learning Support - Planning

Learning Goal: Plan before coding.

Starting Prompt: “Help me plan my dividend screener.”

🚀 Better Planning Hints:

  • List the steps you’ve already thought of

  • Specify your simplicity constraints

  • Ask about missing components

  • Request validation checkpoints

💡 Professional Elements:

  • Ask about error handling steps

  • Request data quality checks

  • Inquire about output formatting

  • Ask about documentation needs

🎯 Your Challenge: Create a prompt that helps you build a plan so clear that another student could implement it.

I - Implement#

Simple Dividend Screener

# DIVIDEND ARISTOCRAT SCREENER
# Professional-grade but simple enough for beginners

# Step 1: Import our tool
import yfinance as yf

# Step 2: List of famous dividend stocks
dividend_stocks = ['JNJ', 'KO', 'PEP', 'PG', 'MMM']
stock_names = ['Johnson & Johnson', 'Coca-Cola', 'PepsiCo', 'P&G', '3M']

# Step 3: Collect data for each stock
print("=== DIVIDEND STOCK ANALYSIS ===\n")

for i in range(len(dividend_stocks)):
    # Get stock data
    ticker = dividend_stocks[i]
    name = stock_names[i]
    
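    # Look up the company's summary information (a dictionary)
    stock = yf.Ticker(ticker)
    info = stock.info

    # Get current price and annual dividend.
    # "or 0" also covers fields that are present but empty (None).
    current_price = info.get('currentPrice') or 0
    annual_dividend = info.get('dividendRate') or 0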
    
    # Calculate dividend yield
    if current_price > 0:
        dividend_yield = (annual_dividend / current_price) * 100
    else:
        dividend_yield = 0
    
    # Display results
    print(f"{name} ({ticker}):")
    print(f"  Current Price: ${current_price:.2f}")
    print(f"  Annual Dividend: ${annual_dividend:.2f}")
    print(f"  Dividend Yield: {dividend_yield:.2f}%")
    print()

💻 AI Learning Support - Code Understanding

Learning Goal: Understand defensive coding practices.

Starting Prompt: “What does .get() do in Python?”

🚀 Context Improvements:

  • Include the specific code line

  • Ask why .get() vs. bracket notation

  • Inquire about the second parameter

  • Ask when to use each approach

💡 Deeper Understanding:

  • Ask about error prevention benefits

  • Request financial data examples

  • Inquire about None vs. 0 defaults

  • Ask about production best practices

🎯 Your Challenge: Create a prompt that helps you understand not just syntax, but why defensive coding matters when handling financial data.

V - Validate#

Cross-Check with Alpha Vantage

# VALIDATION: Check one stock with different source
# First, get your free API key from: https://www.alphavantage.co/support/#api-key

import requests

# Alpha Vantage setup (replace with your key)
api_key = 'YOUR_FREE_API_KEY'
symbol = 'JNJ'

# Build the URL (like a web address for data)
url = f'https://www.alphavantage.co/query?function=OVERVIEW&symbol={symbol}&apikey={api_key}'

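# Get the data (the timeout stops the script from waiting forever)
response = requests.get(url, timeout=10)
data = response.json()

# With a bad key or an exceeded rate limit, the reply is a short message
# instead of company data, so the fields below will be missing.
if 'Symbol' not in data:
    print("No company data returned:", data)

# Alpha Vantage sends numbers as text and may use "None" for missing values,
# so convert carefully instead of calling float() directly.
def to_number(text):
    try:
        return float(text)
    except (TypeError, ValueError):
        return 0.0

# Extract dividend information
av_dividend = to_number(data.get('DividendPerShare'))
av_yield = to_number(data.get('DividendYield')) * 100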

print(f"\nValidation for {symbol}:")
print(f"Alpha Vantage Annual Dividend: ${av_dividend:.2f}")
print(f"Alpha Vantage Dividend Yield: {av_yield:.2f}%")
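
Once you have the same figure from two sources, the validation step is a comparison against a tolerance. The sketch below is one way to express that. The 1% tolerance is an assumption chosen for illustration, not an industry standard, and the helper names and sample figures are invented.

```python
def relative_difference(value_a, value_b):
    """Absolute difference as a fraction of the average of the two values."""
    average = (value_a + value_b) / 2
    if average == 0:
        return 0.0
    return abs(value_a - value_b) / average


def sources_agree(value_a, value_b, tolerance=0.01):
    """True when the values differ by no more than `tolerance` (assumed 1%)."""
    return relative_difference(value_a, value_b) <= tolerance


# Illustrative dividend-per-share figures, not real quotes.
print(sources_agree(4.96, 4.96))  # identical values
print(sources_agree(4.96, 4.76))  # values roughly 4% apart
```

Dividing by the average rather than by one of the two values keeps the check symmetric, so it does not matter which source you list first.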

🔍 AI Learning Support - Data Validation

Learning Goal: Understand data discrepancies across sources.

Starting Prompt: “Why are my numbers different from different APIs?”

🚀 Specific Improvements:

  • Include actual numbers and sources

  • Specify the type of data (dividends, prices)

  • Mention the size of discrepancy

  • Ask about materiality thresholds

💡 Professional Validation:

  • Ask about common causes of differences

  • Request industry tolerance standards

  • Inquire about reconciliation methods

  • Ask when differences matter vs. don’t

🎯 Your Challenge: Develop a prompt that helps you understand when data differences are errors vs. acceptable variations.

E - Evolve#

Extend Your Analysis

What you’ve learned can grow into:

  1. Screening 50+ stocks automatically

  2. Adding payout ratio analysis

  3. Historical dividend growth rates

  4. Sector-based comparisons

  5. Automated daily updates
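
As a taste of item 3, a historical dividend growth rate is commonly summarized as a compound annual growth rate (CAGR). This is a minimal sketch with invented numbers; `dividend_cagr` is a hypothetical helper, and in practice you would feed it annual dividend totals from your data source.

```python
def dividend_cagr(first_dividend, last_dividend, years):
    """Compound annual growth rate between two annual dividends."""
    if first_dividend <= 0 or years <= 0:
        raise ValueError("need a positive starting dividend and time span")
    return (last_dividend / first_dividend) ** (1 / years) - 1


# Illustrative numbers only: a dividend that doubles from $1.00 to $2.00 in 10 years.
growth = dividend_cagr(1.00, 2.00, years=10)
print(f"Annual dividend growth: {growth:.2%}")
```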

🎯 AI Learning Support - Pattern Recognition

Learning Goal: Identify enhancement priorities.

Starting Prompt: “What features should I add to my screener?”

🚀 Priority Hints:

  • Describe current capabilities first

  • Specify your user (long-term investors)

  • Ask for ranked suggestions

  • Include implementation complexity

💡 Strategic Thinking:

  • Ask about user value vs. effort

  • Request competitive analysis

  • Inquire about data availability

  • Ask about maintenance burden

🎯 Your Challenge: Create a prompt that helps you prioritize features like a product manager, not just a developer.

R - Reflect#

Key Insights

  1. Different APIs have different strengths

  2. Data validation is crucial

  3. Simple code can do professional analysis

  4. Free tools are sufficient for learning

📋 AI Learning Support - Synthesis

Learning Goal: Connect learning to career applications.

Starting Prompt: “How do professionals use APIs?”

🚀 Career Context:

  • Specify what you’ve learned

  • Ask about real job applications

  • Inquire about scale differences

  • Request transition advice

💡 Professional Reality:

  • Ask about Bloomberg terminal vs. free APIs

  • Request team workflow insights

  • Inquire about compliance requirements

  • Ask about career progression path

🎯 Your Challenge: Develop a prompt that bridges the gap between classroom API usage and managing data for a $100M portfolio.


Section 5: Assignment#

Scenario#

You’re a data analyst at an investment research firm. Your team needs reliable financial data for a new sector-focused portfolio strategy. Design and implement a robust data collection system.

Requirements#

Create a video (approximately 10-15 minutes) demonstrating:

  • Portfolio design with at least 5 securities

  • API selection and implementation strategy

  • Code execution showing data collection for your portfolio

  • Error handling and data validation approaches

Execution Format#

  • Use your completed Jupyter notebook or Python script

  • Run your code cell-by-cell while explaining what each part does

  • Show outputs and interpret them immediately

  • Demonstrate how your system handles errors or edge cases

Deliverables#

  1. Video demonstration showing code execution and analysis

  2. Python code file (.py or .ipynb)


Section 6: Reflect & Connect - Financial Insights Discussion#

Individual Reflection (10 minutes)#

Write your thoughts on:

  1. What surprised you about financial data APIs?

  2. Why do data differences exist between sources?

  3. How will you use APIs in your career?

  4. What was the hardest concept to understand?

Small Group Discussion (15 minutes)#

In groups of 3-4, discuss:

  1. Compare API experiences

    • Which API was easiest to use?

    • What errors did you encounter?

    • How did you solve problems?

  2. Data quality insights

    • What differences did you find?

    • Which source do you trust more?

    • How would you validate professionally?

  3. Real-world applications

    • How would this scale to 500 stocks?

    • What additional data would you want?

    • How often would you update data?

Class Synthesis (15 minutes)#

Key Topics for Discussion:

  1. The Build vs. Buy Decision

    • When to use free APIs

    • When to pay for data

    • Cost-benefit analysis

  2. Data Quality Standards

    • Acceptable error margins

    • Validation requirements

    • Documentation needs

  3. Career Applications

    • Entry-level: Using provided data

    • Mid-level: Choosing data sources

    • Senior: Data strategy decisions

Connecting to Practice#

Industry Insight: “At Vanguard, we spend $50 million annually on data. But the tools you’re learning - Python, APIs, validation - are exactly what our analysts use daily. The scale is different, but the skills are the same.”

Source: Maria Chen, Head of Data, Vanguard

Key Takeaways Board#

Create a class list of:

  • Most useful APIs

  • Common error messages

  • Validation techniques

  • Career applications


Section 7: Looking Ahead - From APIs to Analysis#

Skills Mastered#

Technical Competencies:

  • Making API calls successfully

  • Understanding data structures

  • Comparing multiple sources

  • Basic error handling

Professional Skills:

  • Data quality awareness

  • Validation mindset

  • Problem-solving approach

  • Clear documentation

Building Bridges to Session 3#

Next session’s Data Cleaning & Validation builds on today:

  1. From Getting to Cleaning: Today we got data; next we’ll make it analysis-ready

  2. From Errors to Solutions: Today we saw problems; next we’ll fix them systematically

  3. From Manual to Automated: Today we validated manually; next we’ll automate quality checks

  4. From Single to Multiple: Today we used one stock; next we’ll handle portfolios

Connecting Concepts#

Session 2: Get Data  →  Session 3: Clean Data  →  Session 4: Analyze Patterns

API calls               Handle missing data       Find trends
Download prices         Fix inconsistencies       Calculate statistics
Check sources           Remove outliers           Make predictions

Preview Challenge#

Before next session, think about:

  • What if prices are missing for holidays?

  • How do you handle stock splits in data?

  • What if two sources disagree significantly?

  • How do you document data quality issues?

Professional Development Path#

Your API skills enable:

  • Immediate: Pull data for any stock analysis

  • Next Month: Build automated data pipelines

  • Six Months: Create trading strategies

  • One Year: Design data architecture

Preparation for Session 3#

  1. Technical Prep:

    • Ensure yfinance works reliably

    • Get Alpha Vantage API key

    • Practice with 5-10 stocks

  2. Conceptual Prep:

    • Think about data quality

    • Consider missing data scenarios

    • Review validation approaches

  3. Professional Prep:

    • Research data vendor differences

    • Understand clean vs. raw data

    • Consider audit trail needs


Section 8: Appendix - Solutions & Implementation Guide#

Complete Solution Code#

# COMPLETE DIVIDEND SCREENER SOLUTION
# Simple enough for beginners, professional enough for interviews

# ===== PART 1: BASIC SETUP =====
import yfinance as yf
import time  # For delays between API calls

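# List of well-known dividend growers. Aristocrat membership changes over
# time, so check the current index list before relying on it.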
stocks = ['JNJ', 'KO', 'PEP', 'PG', 'MMM', 'CL', 'EMR', 'GPC']
names = ['Johnson & Johnson', 'Coca-Cola', 'PepsiCo', 'P&G', 
         '3M', 'Colgate', 'Emerson', 'Genuine Parts']

# Store results
results = []

# ===== PART 2: COLLECT DATA =====
print("Collecting dividend data...\n")

for i in range(len(stocks)):
    ticker = stocks[i]
    name = names[i]
    
    # Get stock info
    stock = yf.Ticker(ticker)
    info = stock.info
    
    # Extract data (with safety checks).
    # "or 0" also covers fields that are present but empty (None).
    price = info.get('currentPrice') or 0
    dividend = info.get('dividendRate') or 0
    payout = info.get('payoutRatio') or 0
    
    # Calculate yield
    if price > 0 and dividend > 0:
        div_yield = (dividend / price) * 100
    else:
        div_yield = 0
    
    # Store result
    results.append({
        'ticker': ticker,
        'name': name,
        'price': price,
        'dividend': dividend,
        'yield': div_yield,
        'payout': payout
    })
    
    # Display progress
    print(f"✓ Processed {name}")
    
    # Small delay to avoid hitting rate limits
    time.sleep(0.5)

# ===== PART 3: RANK AND DISPLAY =====
print("\n=== DIVIDEND ARISTOCRATS RANKED BY YIELD ===\n")

# Sort by yield (highest first)
results.sort(key=lambda x: x['yield'], reverse=True)

# Display results
for stock in results:
    print(f"{stock['name']} ({stock['ticker']})")
    print(f"  Price: ${stock['price']:.2f}")
    print(f"  Annual Dividend: ${stock['dividend']:.2f}")
    print(f"  Dividend Yield: {stock['yield']:.2f}%")
    print(f"  Payout Ratio: {stock['payout']:.1%}")
    print()

# ===== PART 4: FIND BEST DIVIDEND STOCK =====
if results:
    best = results[0]  # Already sorted, so first is best
    print(f"BEST DIVIDEND YIELD: {best['name']} at {best['yield']:.2f}%")

Practice Problem Solutions#

Exercise 1: Simple API Call

# Solution: Download and display Apple data
import yfinance as yf

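# Download one month of Apple data.
# Ticker.history() returns plain column names ('High', 'Low', ...), so the
# statistics below come out as single numbers. Recent yfinance versions of
# yf.download() can return two-level column names instead.
apple = yf.Ticker('AAPL').history(start='2024-01-01', end='2024-01-31')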

# Show first few rows
print("First 5 days of Apple data:")
print(apple.head())

# Show basic statistics
print("\nApple price statistics:")
print(f"Highest price: ${apple['High'].max():.2f}")
print(f"Lowest price: ${apple['Low'].min():.2f}")
print(f"Average volume: {apple['Volume'].mean():,.0f}")

Common Student Mistakes#

  1. Forgetting API Limits

    • Wrong: Making 100 calls instantly

    • Right: Add time.sleep(1) between calls

  2. Not Handling Missing Data

    • Wrong: info['dividendRate']

    • Right: info.get('dividendRate', 0)

  3. Overcomplicating Code

    • Wrong: Complex classes and functions

    • Right: Simple step-by-step code

  4. No Error Checking

    • Wrong: Assuming API always works

    • Right: Check if data exists before using
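
Mistakes 2 and 4 can be guarded against with one small helper. The sketch below uses a made-up dictionary in place of a real API response; `safe_number` is a hypothetical name. Note that `.get(key, 0)` alone does not help when the key exists but holds `None`, which is why the helper checks for that case explicitly.

```python
def safe_number(info, key, default=0.0):
    """Return info[key] as a float, or `default` if it is missing or unusable."""
    value = info.get(key)
    if value is None:
        return default
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# A made-up response shaped like a provider's info dictionary.
fake_info = {"currentPrice": 150.0, "dividendRate": None, "payoutRatio": "None"}

print(safe_number(fake_info, "currentPrice"))  # present and numeric
print(safe_number(fake_info, "dividendRate"))  # present but None
print(safe_number(fake_info, "payoutRatio"))   # text that is not a number
print(safe_number(fake_info, "missingKey"))    # absent

try:
    fake_info["missingKey"]  # bracket notation raises on a missing key
except KeyError as error:
    print("KeyError:", error)
```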

API Quick Reference#

yfinance Basics:

# Download historical data
data = yf.download('AAPL', start='2024-01-01', end='2024-12-31')

# Get company info
ticker = yf.Ticker('AAPL')
info = ticker.info
dividend = info.get('dividendRate', 0)

Alpha Vantage Basics:

# Get your free key from: https://www.alphavantage.co/support/#api-key
import requests

key = 'YOUR_FREE_API_KEY'  # replace with your key
url = f'https://www.alphavantage.co/query?function=OVERVIEW&symbol=AAPL&apikey={key}'
response = requests.get(url, timeout=10)
data = response.json()

Implementation Notes for Instructors#

Pre-Session Setup#

  1. Test yfinance is working

  2. Have Alpha Vantage keys ready

  3. Prepare backup CSV files

  4. Test API rate limits

Common Issues & Solutions#

  • “No data found”: Check ticker symbol spelling

  • Rate limits: Add delays between calls

  • Missing dividends: Some stocks don’t pay dividends

  • Connection errors: Have offline backup data
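
The "add delays" and "have a backup" advice can be combined in one retry pattern. The sketch below is a teaching example under stated assumptions: `fetch_with_retry` is a hypothetical helper, the doubling delay is one common choice rather than a requirement of any API, and the flaky data source is simulated so the code runs without a network connection.

```python
import time


def fetch_with_retry(fetch, retries=3, base_delay=1.0, fallback=None, sleep=time.sleep):
    """Call fetch(); on failure wait and retry, then return the fallback.

    `fetch` is any zero-argument function that returns data or raises.
    The delay doubles after each failed attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as error:  # broad on purpose for a teaching sketch
            print(f"Attempt {attempt + 1} failed: {error}")
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback


# Simulated flaky data source: fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated outage")
    return {"symbol": "JNJ", "status": "ok"}


# Pass a do-nothing sleep so the demonstration does not actually wait.
result = fetch_with_retry(flaky_fetch, sleep=lambda seconds: None)
print(result)
```

In class, the `fallback` argument could be data loaded from one of the backup CSV files, so the exercise can continue even if every attempt fails.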

Assessment Tips#

  • Watch for copy-pasted code they can’t explain

  • Ask them to change ticker symbols live

  • Have them explain error messages

  • Test their understanding of yields

Data Files#

Note: Sample data provided after course completion.

For Session 2, students should:

  1. Use live APIs (learning experience)

  2. Save their downloaded data as backup

  3. Compare multiple sources themselves

Instructor Resources:

  • Backup CSV files for common stocks

  • Pre-written error handling examples

  • API comparison spreadsheet

  • Troubleshooting guide


End of Session 2: Financial Data Collection & APIs

Next Session: Data Cleaning & Validation - Making Your Data Analysis-Ready