环球教育


Big Data in OSSD

2020-06-10 环球教育

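  Big data now shows up almost everywhere in daily life. It can even be used in matchmaking, to pair a person with a suitable partner. Data itself comes in many types, and today we take a look at them.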


  1. Introduction


  “More people like to eat fruits than vegetables”


  In order to draw general conclusions, such as the one above, information must be gathered, organized, and displayed clearly. Before we get into the details, let's learn the basics: what is data?


  Data: Information providing the basis of a discussion from which conclusions may be drawn; data often takes the form of numbers that can be displayed graphically or in a table. In general, data is a collection of facts, such as numbers, words, measurements, observations, or even just descriptions of things. A data set is the data used in a particular study.
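  The gather, organize, and display steps described above can be sketched in a few lines of Python. This is only an illustration: the survey responses are made up, and the names `responses`, `tally`, and `more_fruit` are hypothetical.

```python
from collections import Counter

# Gather: hypothetical survey responses (made up for illustration only).
responses = ["fruit", "vegetable", "fruit", "fruit", "vegetable"]

# Organize: count how many times each answer occurs.
tally = Counter(responses)

# Display: print a simple two-column table, most common answer first.
for answer, count in tally.most_common():
    print(f"{answer:<10} {count}")

# Conclude: compare the two counts.
more_fruit = tally["fruit"] > tally["vegetable"]
```

  Here `responses` plays the role of the data set, and the comparison at the end is the kind of general conclusion the opening quote expresses.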


  The data we collect are observations of the characteristics of individuals in a population. These characteristics can assume one of several different values. For instance, if we were observing the hair colour of students at school, some students would have black hair, some blond, some brown, some red, some with highlights, some with lowlights, and some would be without hair. Because these characteristics can vary, we use the term variable to describe them.


  Hair colour is one example of a variable, but there are many, many more. Gender, age, height, weight, grade, number of hours of sleep, number of hours worked at a part-time job, income, religion, ancestry, and favourite subject in school are a few among a multitude of different variables that we can observe. To be clear about what we are measuring, we will define the different types of variables.


  For example, the following table displays employment by industry in the province of Ontario. Variables are the quantities being measured (e.g. the list of industries), whereas observations are the data collected for a specific year. For example, in 2013, 1.289 million people were employed in the Educational services industry.
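  One way to picture the difference between variables and observations is a small table in code. The sketch below assumes a simple layout in which each record is one observation and each key is a variable; the student records are hypothetical and are not taken from the Ontario table.

```python
# Each dict is one observation (one student); each key is a variable.
# All values below are hypothetical.
students = [
    {"hair_colour": "black", "age": 16, "hours_of_sleep": 7.5},
    {"hair_colour": "brown", "age": 17, "hours_of_sleep": 6.0},
    {"hair_colour": "red", "age": 16, "hours_of_sleep": 8.25},
]

# The variables are the column names shared by every observation.
variables = sorted(students[0].keys())

# Reading one variable across all observations gives a column of values.
ages = [student["age"] for student in students]
```

  The same layout would apply to an employment table: each industry-and-year cell is an observed value, and "industry" and "year" are the variables that identify it.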


 


  2. Classification


  Data can be classified as qualitative data or quantitative data.


  Qualitative data is descriptive information (it describes something).


  Quantitative data is numerical information (numbers).
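  As a rough sketch of this split, the hypothetical helper `kind_of_data` below labels a value as quantitative when it is stored as a number and qualitative otherwise. It assumes that numbers are stored as numbers and descriptions as text, which real data sets do not always guarantee.

```python
from numbers import Number


def kind_of_data(value):
    """Return 'quantitative' for numeric values, 'qualitative' otherwise."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, Number) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"


print(kind_of_data(16))       # an age
print(kind_of_data("brown"))  # a hair colour
```

  The rule is deliberately naive: a number used purely as a label, such as a jersey number or a postal code, describes a category and would still be qualitative even though it looks numeric.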




  Illustration using examples:


  Hair colour, gender, religion, ancestry, and favourite subject in school are all considered qualitative data. These characteristics cannot be described with a number but rather by a specific category. For this reason, these types of data are also called categorical data.


  For instance, if you asked several people what religion they belonged to, their responses could include Muslim, Jewish, Protestant, Catholic, Hindu, and Buddhist. Since each religion or category is viewed with equal importance and none ranks above another, this variable is called nominal data: its values simply name the categories.


  However, if we wanted to see the degree to which students agreed with new changes to the dress code, some possible responses could include: strongly disagree, somewhat disagree, neither agree nor disagree, somewhat agree, and strongly agree. Since these responses can be ordered in a logical or natural way, this is referred to as ordinal data.
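  The practical difference between nominal and ordinal data is that ordinal categories can be sorted meaningfully. The sketch below uses the five dress-code responses from the text as the scale; the sample `answers` and the names `AGREEMENT_SCALE` and `RANK` are hypothetical.

```python
# The five dress-code responses from the text, in their natural order.
AGREEMENT_SCALE = [
    "strongly disagree",
    "somewhat disagree",
    "neither agree nor disagree",
    "somewhat agree",
    "strongly agree",
]

# Map each label to its position on the scale.
RANK = {label: position for position, label in enumerate(AGREEMENT_SCALE)}

# Hypothetical answers from four students.
answers = [
    "somewhat agree",
    "strongly disagree",
    "strongly agree",
    "somewhat disagree",
]

# Ordinal data can be sorted by rank on the scale.
ordered = sorted(answers, key=RANK.__getitem__)
print(ordered)
```

  Nominal data such as religion has no equivalent of `RANK`: the categories can be counted, but no ordering of them carries meaning.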


  Age, height, weight, grade, number of hours of sleep, number of hours worked at a part-time job, and income are all data or variables that can assume a quantity and can be expressed by a number. For this reason, these variables are considered quantitative or numerical data. Even among numerical data, there are two slightly different kinds.


  The grade of a student usually takes on only whole-number values: a student can be in Grade 4, 5, or 6, but there is no such thing as Grade 4½. Since this data can assume only specific values or a fixed set of values, grade is considered discrete data.


  On the other hand, height, weight, number of hours worked, and number of hours of sleep can take on an infinite number of possible values; as a result, these data or variables are continuous. Although we usually round such values to the nearest whole number, with sufficiently precise measuring devices there would be an infinite number of possible weights (to the milligram or microgram), heights (to the millimetre), and hours of sleep or work (to the second or millisecond). Measurements such as length, weight, and temperature are examples of continuous data.
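  The contrast between discrete and continuous data can be sketched as follows. The helper `is_valid_grade`, the assumed range of Grades 1 to 12, and the sample measurements are all hypothetical and chosen only for illustration.

```python
def is_valid_grade(value):
    """Discrete: a grade must be a whole number (assumed range 1 to 12)."""
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 12


# Continuous: a hypothetical height in metres, recorded at increasing precision.
height_m = 1.7348
rounded = [round(height_m, digits) for digits in range(4)]
print(rounded)

# Between any two continuous values there is always another possible value.
shorter, taller = 1.73, 1.74
midpoint = (shorter + taller) / 2
```

  A grade either is or is not on the fixed list of allowed values, while a height can always be measured more finely, which is why rounding is a choice about precision rather than a property of the data.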


  Do you now have a better understanding of data?


  Here is a short question to test yourself.


  Question: Which one of the following is quantitative data?


  A. She is black and white.


  B. She has two ears.


  C. She has long hair.


  D. She has a long tail.


  The answer will be revealed in the next issue! The answer to the previous issue's question is also revealed here.



