CS Colloquium 2/1/2013 "The Weakening and Delayed Effects of Long Tail Distributions in Big Data Access" Posted Jan. 23, 2013
The Weakening and Delayed Effects
of Long Tail Distributions in Big Data Accesses
Dr. Xiaodong Zhang, Ph.D.
Robert M. Critchfield Professor in Engineering and Chair
Computer Science and Engineering Department
The Ohio State University
Locality is an important and classic concept in computer science that supports a variety of system designs, implementations, and programming models: frequently used data are stored in hierarchical caching and buffer systems for fast access. This type of access pattern is characterized by the power law, or Zipf distribution, with "long tail" effects. With the rapid advancement of computer and network systems, data accesses have fundamentally changed in both space and time. Data can be stored in virtually unlimited amounts on low-cost disks, while data access latency has been significantly shortened by advanced storage and search technologies. The distributed systems that process data in increasingly large volumes (called big data) have become big and flat, reflecting the new system design concept of "scale-out".
Based on our long-term analysis of large volumes of Internet streaming data, we have found that the long tail effects have been weakened and delayed for big data accesses. We have developed a statistical model, called the stretched exponential model, to characterize big data access patterns. This model has been verified by many big data applications worldwide over the last four years.
I will present the stretched exponential model: its development, trace data verifications, and its usage in many big data applications.
Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering and Chair of the Computer Science and Engineering Department at the Ohio State University. His research interests focus on data management in computer and distributed systems. He has made strong efforts to transfer his academic research into advanced technology to update the design and implementation of major general-purpose computing systems. He received his Ph.D. in Computer Science from the University of Colorado at Boulder, where he received the Distinguished Engineering Alumni Award in 2011. He is a Fellow of the ACM and a Fellow of the IEEE.