
NETS 212: Scalable and Cloud Computing (Fall 2020)

What is the "cloud"? How do we build software systems and components that scale to millions of users and petabytes of data, and are "always available"?

In the modern Internet, virtually all large Web services (Google, Yahoo, Facebook, iTunes, Amazon, eBay, Bing, etc.) run atop multiple geographically distributed data centers. Services must scale across thousands of machines, tolerate faults, and support thousands of concurrent requests. Increasingly, the major providers (including Amazon, Google, Microsoft, HP, and IBM) are looking at "hosting" third-party applications in their data centers, forming so-called "cloud computing" services. A significant number of these services also process "streaming" data: geocoding information from cell phones, tweets, streaming video, etc.

This course, aimed at a sophomore with exposure to basic programming within the context of a single machine, focuses on the issues and programming models related to such cloud and distributed data processing technologies: data partitioning, storage schemes, stream processing, and "mostly shared-nothing" parallel algorithms.

NETS212 is a required course for the NETS program and for the Data Science Minor.

Instructors

Andreas Haeberlen
Office hour: Wednesdays 10:00-11:00am (on Zoom)

Zachary G. Ives
Office hour: Mondays 2:00-3:00pm (via gather.town)

Teaching assistants

Chaim Fishman chaimj@sas.upenn.edu OH: Sundays 10:30-11:30am EDT
Sarah Payne paynesa@sas.upenn.edu OH: Mondays 2:00-3:00pm EDT
Peter Chen cbaile@seas.upenn.edu OH: Mondays 5:00-6:00pm EDT
Joan Shaho jshaho@seas.upenn.edu OH: Tuesdays noon-1:00pm EDT
Stefan Papazov spapazov@seas.upenn.edu OH: Tuesdays 6:00-7:00pm EDT
Vatsal Jain vatsal99@seas.upenn.edu OH: Wednesdays 1:00-2:00am EDT
Alexander Go alexdgo@seas.upenn.edu OH: Wednesdays 3:00-4:00pm EDT
Anthony Mansur amansur@seas.upenn.edu OH: Wednesdays 4:00-5:00pm EDT
Tashweena Heeramun htash@seas.upenn.edu OH: Thursdays 8:00-9:00am EDT
Bharath Jaladi bjaladi@seas.upenn.edu OH: Thursdays 11:00am-noon EDT
Jamie Wang jamwa@wharton.upenn.edu OH: Thursdays 3:30-4:30pm EDT
Lydia Ma malydia@wharton.upenn.edu OH: Fridays 11:00am-noon EDT
Vraj Shroff vshroff@sas.upenn.edu OH: Fridays 10:00-11:00am EDT

We will be using ohq.io for the TA office hours.

Format

The Fall 2020 version of this class will be entirely online, due to COVID-19. We will make prerecorded lectures available for download, and we will use the class slots for discussion, review, and Q&A. The review sessions will be recorded as well. There will be regular homework assignments, two midterms (online, via GradeScope), and a final team project. We will use Piazza for course-related discussions, and there will be occasional lab sessions.

Time and location

Q&A: Tuesdays 1:30pm EDT (Zoom link)

Prerequisites

CIS 120, Introduction to Programming
CIS 160, Discrete Mathematics
Co-requisite: CIS 121, Data Structures

Textbooks

Spark: The Definitive Guide, by Bill Chambers and Matei Zaharia (O'Reilly)
ISBN 9781491912218; read online for free, or buy for approx. $54.

Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer (Morgan & Claypool)
ISBN 978-1608453429; read online for free, or buy for approx. $40.

Additional materials will be provided as handouts or in the form of light technical papers.

Grading

Homework 35%, Term project 35%, Exams 20%, Participation/quizzes 10%

Policies

You are encouraged to discuss your homework assignments with your classmates; however, any code you submit must be your own work. You may not share code with others or copy code from outside sources, except where the assignment specifically allows it. Plagiarism can have serious consequences.

Recordings and other materials

We will make the recordings from the lectures, Q&A sessions, and labs available on this web page for the duration of this course. These recordings, as well as the other course materials (slides, handouts, framework code), are solely for your personal, educational use and may not be shared, copied, or redistributed without permission of the instructors. You are not allowed to record class sessions yourself. Unauthorized sharing or recording is a violation of the Code of Academic Integrity.

Project and awards

The final team project is to build a small Facebook-like application using Node.js and Amazon's DynamoDB. Based on network analysis, the application should make friend recommendations; it should also visualize the social network. In previous years, Facebook sponsored an award for the best term project. You can learn more about the winners from previous years in the Hall of Fame.

Assignments

Homework assignments will be available for download; you can submit your solution here. If necessary, you can request an extension.

Tentative schedule

Date | Topic | Details | Reading | Remarks
Sep 1 Introduction [Q&A] Course introduction [Video] [Slides]
What is the Cloud, and why is it interesting? [Video] [Slides]
Data-centric computing [Video] [Slides]
Course goals [Video] [Slides]
Logistics [Video] [Slides]
Policies [Video] [Slides]
Overview of topics [Video] [Slides]
Sep 3 The Cloud What is the Cloud? [Video] [Slides] [Quiz]
Cloud hardware [Video] [Slides] [Quiz]
Problems with classical scaling [Video] [Slides] [Quiz]
Utility computing [Video] [Slides] [Quiz]
Kinds of clouds [Video] [Slides] [Quiz]
Virtualization [Video] [Slides] [Quiz]
Cloud challenges [Video] [Slides] [Quiz]
Armbrust: A view of cloud computing HW0 released
Sep 8 Concurrency [Q&A] Scalability and parallelization; Amdahl's law [Video] [Slides] [Quiz]
Synchronization/concurrency/consistency [Video] [Slides] [Quiz]
Mutual exclusion and locking [Video] [Slides] [Quiz]
NUMA, shared-nothing [Video] [Slides] [Quiz]
Frontend/backend, sharding [Video] [Slides] [Quiz]
Vogels: Eventually consistent
Sep 10 The Internet The Internet; packet switching [Video] [Slides] [Quiz]
Path properties; TCP [Video] [Slides] [Quiz]
HW1 overview [Video] [Slides] [Quiz]
MDN: A re-introduction to JavaScript HW0 due; HW1 released
Sep 15 Faults and Failures [Q&A] Fault models [Video] [Slides] [Quiz]
Examples of non-crash faults [Video] [Slides] [Quiz]
Replication; durability and availability [Video] [Slides] [Quiz]
Primary-backup replication [Video] [Slides] [Quiz]
Quorum replication [Video] [Slides] [Quiz]
Network partitions; CAP theorem [Video] [Slides] [Quiz]
Tseitlin: The antifragile organization
Sep 15 Last day to add
Sep 17 Cloud basics History of cloud computing [Video] [Slides] [Quiz]
Interacting with the cloud [Video] [Slides] [Quiz]
EC2 basics [Video] [Slides] [Quiz]
EBS basics [Video] [Slides] [Quiz]
Overview of some other AWS services [Video] [Slides] [Quiz]
Cloud computing features, issues, and challenges: a big picture HW1MS1 due
Sep 22 Cloud storage Key-value stores [Video] [Slides] [Quiz]
KVS and concurrency [Video] [Slides] [Quiz]
KVS and the Cloud [Video] [Slides] [Quiz]
Case study: S3 [Video] [Slides] [Quiz]
Case study: DynamoDB [Video] [Slides] [Quiz]
Cooper et al.: PNUTS to Sherpa - Lessons from Yahoo!'s Cloud Database
Sep 24 Spark Introduction to Spark TBA HW1MS2 due; HW2 released
Sep 29 Programming in Spark Programming in Spark TBA
Oct 1 Spark internals Spark internals TBA
Oct 6 Graph algorithms Representing data in graphs
Iterative algorithms
The BSP model
Basic example: Shortest path
Advanced example I
Advanced example II
Lin & Dyer, Chapter 5 HW2MS1 due
Oct 8 First midterm exam
Oct 12 Last day to drop
Oct 13 Random-walk algorithms Random-surfer model
PageRank challenges and solutions
Adsorption
TBA HW2MS2 due; HW3 released
Oct 15 Iterative processing Advanced processing TBA
Oct 20 Web programming Web overview
Basic HTML, basic CSS, Bootstrap
Client/server model
DNS, DNSSEC
HTTP/HTTPS, headers, verbs
Cookies and their uses
Server design: Concurrency, events
Research spotlight: The cookie ecosystem
TBA
Oct 22 Node.js Project overview and strategy
Motivation: CGI and its problems
Node.js; basic operation
"Hello world" with Node
Express
Diagnostics
Working with databases: DynamoDB API
Session management
TBA HW3 due; HW4 released
Oct 27 Dynamic content The DOM
React
Promises, callbacks
The async library
TBA Project handout released
Oct 29 AJAX Motivation: Interactivity
Accessing the DOM from JavaScript
XMLHttpRequest; the same-origin policy
AJAX, and a concrete example
AJAX with jQuery, JSON
Lazy loading
Web sockets; Socket.io
Working with APIs; Google Maps case study
TBA HW4MS1 due
Oct 30 Last day to designate course as pass/fail
Nov 3 Web services Web services
Data interchange; challenges
Data formats
Research spotlight
TBA Team formation deadline; project begins
Nov 5 XML XML, DTDs
XML DOM
TBA HW4MS2 due
Nov 9 Last day to withdraw
Nov 10 First project check-in
Nov 10 Advanced topic TBA
Nov 12 Security Cryptography
Digital signatures
Case study: RSA
OAuth
OWASP Top 10
Research spotlight: Side-channel attacks
TBA
Nov 17 Second project check-in
Nov 17 Databases Relational model; schemata
SQL basics; declarative approach
Transactions; ACID
Challenges with scaling classical databases
SparkSQL
Research spotlight
Nov 19 Peer-to-peer Centralization and its consequences
Case study: BitTorrent; incentives
Fully decentralized systems
Unstructured/structured topologies
Consistent hashing
Key-based routing
Security challenges
Research spotlight: Amazon Dynamo
Rodrigues and Druschel: P2P systems
Nov 24 Case study: Bitcoin Cryptocurrencies
Blockchain
Proof of work; mining
Bitcoin script
Smart contracts
Permissioned networks
Scalability limits of Bitcoin
Research spotlight: Algorand
Nakamoto: Bitcoin Third check-in with TA
Nov 26 Thanksgiving - no class
Dec 1 Case study: Facebook Facebook's TAO
Facebook's Haystack
Fourth check-in with TA
Dec 3 Advanced topics Cloud research at Penn I
Cloud research at Penn II
Dec 8 Second midterm exam
Dec 10 Monday schedule - no class
Dec 15-22 Project demos (via Zoom), written reports due