Course Information:

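  • General description. The rapid development of modern information technology means that many discoveries and decisions in science, technology, business, and medicine are now made by analyzing massive data sets. However, relying on data analysis alone can easily lead to wrong conclusions, and the privacy of data about individuals is also a concern. This course provides a basic introduction to big data analysis techniques and their applications: historical background and case studies; privacy issues; data analysis techniques, including databases, data mining, and machine learning; sampling and statistical significance; data analysis tools, including R, SQL, and Python; and data visualization techniques and tools.
  • Prerequisites. Statistical learning, computer programming.
  • Textbook. Lecture notes prepared by the instructor.
  • Office hours. AM 9:00-12:00, PM 2:30-5:00
  • Assignments & tests. Mainly project reports.
  • Course grade. The final course grade is determined as follows: 15/100 for class attendance and related participation, and 85/100 for the project report.
  • Online Q&A
Course Content: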
  • Topic 1: Introduction
  • Topic 2: Big data platforms
    • Readings:
      • Hadoop for Dummies
  • Topic 3: Spark
  • Topic 4: Correlation and causation; Privacy
  • Topic 5: Regression
  • Topic 6: Classification and clustering
    • Readings:
      • Doing Data Science: Chapter 3: Algorithms
      • L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees, Wadsworth, Belmont, 1984.
      • J. R. Quinlan. Bagging, boosting, and C4.5, AAAI 96.
      • Christopher J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 2:121 - 167, 1998.
      • A. K. Jain, M. N. Murty, and P. J. Flynn, Data Clustering: A Review, ACM Computing Surveys, 31(3), 1999.
      • J. Grabmeier and A. Rudolph, Techniques of Cluster Algorithms in Data Mining, Data Mining and Knowledge Discovery, 6:303-360, 2002.
  • Topic 7: Text analysis
    • Readings:
      • Sholom M. Weiss, Nitin Indurkhya, Tong Zhang and Fred J. Damerau, (2005) Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer
  • Topic 8: Association rule mining
    • Readings:
      • R. Agrawal, T. Imielinski, and A. Swami. Mining Association Rules Between Sets of Items in Large Databases, SIGMOD 93.
      • A. Savasere, E. Omiecinski, and S. Navathe. An Efficient Algorithm for Mining Association Rules in Large Databases, VLDB 95.
      • S. Brin, R. Motwani, and C. Silverstein. Beyond Market Baskets: Generalizing Association Rules to Correlations, SIGMOD 97.
      • A. Ceglar and J. F. Roddick. Association Mining. ACM Computing Surveys, Vol. 38, Issue 2, 2006. (A survey paper on association rule mining)
  • Topic 9: Recommendation
    • Readings:
      • Dietmar Jannach et al. (2010) Recommender Systems: An Introduction. Cambridge University Press.
  • Topic 10: Data visualization and Tableau
    • Readings:
      • Wong, P. C., Shen, H. W., Johnson, C. R., Chen, C., & Ross, R. B. (2012). The top 10 challenges in extreme-scale visual analytics. IEEE Computer Graphics and Applications, 32(4), 63.
